# Music genre identification:
## Classifier feature generation
### John Burt
#### August 2019


In this notebook, I generate features from music clips to train a song genre classifier. The features are saved as .csv files for the classifier model to load for training and testing. I take this intermediate step because feature generation from sound files takes hours on my laptop, so I really don't want to do it very often!

#### Methods:

Most of the features I chose are based on harmonic-percussive source separation (HPSS) and Mel-frequency cepstral coefficients (MFCCs):

- Zero crossing rate
- Mean frequency bin amplitudes of the harmonic decomposition of the spectrogram
- Max and median frequency bin amplitudes of the mean-centered tempogram
- Mean of each MFCC coefficient over time

I also used a moving average procedure to create a lower resolution (fewer values) version of each feature type, and tested the performance of this reduced feature set against the original full resolution set.



#### Extra packages required:
- librosa


####  Source data:
- The original data is from ["FMA: A Dataset For Music Analysis"](https://github.com/mdeff/fma). That dataset is a dump of sound clips and associated metadata from the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads. 

- For the meetup series, the music data was reduced further to a "warmup set" of 4000 samples of folk and 4000 samples of hip-hop music. The code below can read either the warmup set or the original FMA clip dataset, which is much larger and has more genre categories; switch between them by setting `srcdir` and `metadatafile`.

### Notebook setup and load the song metadata

Filter metadata by music clips actually present.

In [1]:
# remove warnings
import warnings
warnings.filterwarnings('ignore')
# ---

%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

import pandas as pd
pd.options.display.max_columns = 100

import numpy as np

import librosa
import librosa.display
import ast
import os
import fnmatch

def get_metadata(srcdir, metafile):
    """Read and fix up music clip metadata,
      return only data rows of music clip files present in srcdir"""
    exts = ['*.wav', '*.mp3']
    
    # read the metadata csv file
    df = pd.read_csv(os.path.join(srcdir, metafile))

    # search through source folder for sound files
    clippaths = []
    clipIDs = []
    for ext in exts:
        for root, dirnames, filenames in os.walk(srcdir):
            for filename in fnmatch.filter(filenames, ext):
                clippaths.append(os.path.join(root, filename))
                # clip file names are the zero-padded track ID, e.g. 000002.mp3
                clipIDs.append(int(filename.split('.')[0]))
    
    # extract only metadata for sound files present
    # (.loc with a list returns rows in clipIDs order, matching clippaths)
    df = df.set_index('track_id')
    df = df.loc[clipIDs]
    df = df.reset_index()
    
    # create clip filepath column
    df['filepath'] = clippaths
    
    # create top genre name column
    if 'genre_top' not in df.columns:
        # track_genres holds a stringified list of dicts; parse it without eval
        df['genre_top'] = [ast.literal_eval(g)[0]['genre_title'] for g in df['track_genres']]
    
    return df
    
# data source directory and metadata file name
# srcdir = './warmup/' # PDSG small 'warmup' dataset: 200 clips
# metadatafile = 'pdsg_musicgenre_warmup_small.csv'
srcdir = './fma_small/' #  Free Music Archive small dataset: 8000 clips
metadatafile = 'raw_tracks.csv'

# read the metadata file
df = get_metadata(srcdir, metadatafile)

print(df.shape)
# df.head()

(8000, 41)
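
The filtering step relies on each clip's file name being its zero-padded track ID. A minimal sketch of that matching in plain Python; the paths and the tiny metadata list are made up for illustration, and `clip_id_from_path` is a hypothetical helper, not part of the notebook:

```python
import os

def clip_id_from_path(path):
    """Track ID encoded in a clip file name such as 000140.mp3."""
    name = os.path.basename(path)
    return int(os.path.splitext(name)[0])

# illustrative clip paths and metadata rows (not real data)
clip_paths = ['fma_small/000/000002.mp3', 'fma_small/000/000140.mp3']
metadata = [
    {'track_id': 2, 'track_title': 'a'},
    {'track_id': 3, 'track_title': 'b'},
    {'track_id': 140, 'track_title': 'c'},
]

# keep only the metadata rows whose clip file is present
present_ids = {clip_id_from_path(p) for p in clip_paths}
kept = [row for row in metadata if row['track_id'] in present_ids]
print([row['track_id'] for row in kept])
```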


In [18]:
df.head()


Unnamed: 0,track_id,album_id,album_title,album_url,artist_id,artist_name,artist_url,artist_website,license_image_file,license_image_file_large,license_parent_id,license_title,license_url,tags,track_bit_rate,track_comments,track_composer,track_copyright_c,track_copyright_p,track_date_created,track_date_recorded,track_disc_number,track_duration,track_explicit,track_explicit_notes,track_favorites,track_file,track_genres,track_image_file,track_information,track_instrumental,track_interest,track_language_code,track_listens,track_lyricist,track_number,track_publisher,track_title,track_url,filepath,genre_top
0,2,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,5.0,Attribution-NonCommercial-ShareAlike 3.0 Inter...,http://creativecommons.org/licenses/by-nc-sa/3.0/,[],256000.0,0,,,,11/26/2008 01:48:12 AM,11/26/2008,1,02:48,Radio-Unsafe,,2,music/WFMU/AWOL/AWOL_-_A_Way_Of_Life/AWOL_-_03...,"[{'genre_id': '21', 'genre_title': 'Hip-Hop', ...",https://freemusicarchive.org/file/images/album...,,0,4656,en,1293,,3,,Food,http://freemusicarchive.org/music/AWOL/AWOL_-_...,./fma_small/000\000002.mp3,Hip-Hop
1,5,1.0,AWOL - A Way Of Life,http://freemusicarchive.org/music/AWOL/AWOL_-_...,1,AWOL,http://freemusicarchive.org/music/AWOL/,http://www.AzillionRecords.blogspot.com,http://i.creativecommons.org/l/by-nc-sa/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,5.0,Attribution-NonCommercial-ShareAlike 3.0 Inter...,http://creativecommons.org/licenses/by-nc-sa/3.0/,[],256000.0,0,,,,11/26/2008 01:48:20 AM,11/26/2008,1,03:26,Radio-Unsafe,,6,music/WFMU/AWOL/AWOL_-_A_Way_Of_Life/AWOL_-_06...,"[{'genre_id': '21', 'genre_title': 'Hip-Hop', ...",https://freemusicarchive.org/file/images/album...,,0,1933,en,1151,,6,,This World,http://freemusicarchive.org/music/AWOL/AWOL_-_...,./fma_small/000\000005.mp3,Hip-Hop
2,10,6.0,Constant Hitmaker,http://freemusicarchive.org/music/Kurt_Vile/Co...,6,Kurt Vile,http://freemusicarchive.org/music/Kurt_Vile/,http://kurtvile.com,http://i.creativecommons.org/l/by-nc-nd/3.0/88...,http://fma-files.s3.amazonaws.com/resources/im...,,Attribution-NonCommercial-NoDerivatives (aka M...,http://creativecommons.org/licenses/by-nc-nd/3.0/,[],192000.0,0,Kurt Vile,,,11/25/2008 05:49:06 PM,11/26/2008,1,02:41,Radio-Safe,,178,music/WFMU/Kurt_Vile/Constant_Hitmaker/Kurt_Vi...,"[{'genre_id': '10', 'genre_title': 'Pop', 'gen...",https://freemusicarchive.org/file/images/album...,,0,54881,en,50135,,1,,Freeway,http://freemusicarchive.org/music/Kurt_Vile/Co...,./fma_small/000\000010.mp3,Pop
3,140,61.0,The Blind Spot,http://freemusicarchive.org/music/Alec_K_Redfe...,54,Alec K. Redfearn & the Eyesores,http://freemusicarchive.org/music/Alec_K_Redfe...,http://www.aleckredfearn.com,http://i.creativecommons.org/l/by-nc-nd/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,6.0,Attribution-Noncommercial-No Derivative Works ...,http://creativecommons.org/licenses/by-nc-nd/3...,[],128000.0,0,,,,11/26/2008 01:44:07 AM,11/26/2008,1,04:13,,,5,music/WFMU/Alec_K_Redfearn_and_the_Eyesores/Th...,"[{'genre_id': '17', 'genre_title': 'Folk', 'ge...",https://freemusicarchive.org/file/images/album...,,0,1593,en,1299,,2,,Queen Of The Wires,http://freemusicarchive.org/music/Alec_K_Redfe...,./fma_small/000\000140.mp3,Folk
4,141,60.0,Every Man For Himself,http://freemusicarchive.org/music/Alec_K_Redfe...,54,Alec K. Redfearn & the Eyesores,http://freemusicarchive.org/music/Alec_K_Redfe...,http://www.aleckredfearn.com,http://i.creativecommons.org/l/by-nc-nd/3.0/us...,http://fma-files.s3.amazonaws.com/resources/im...,6.0,Attribution-Noncommercial-No Derivative Works ...,http://creativecommons.org/licenses/by-nc-nd/3...,[],128000.0,0,,,,11/26/2008 01:44:10 AM,11/26/2008,1,03:02,,,1,music/WFMU/Alec_K_Redfearn_and_the_Eyesores/Ev...,"[{'genre_id': '17', 'genre_title': 'Folk', 'ge...",https://freemusicarchive.org/file/images/album...,,0,839,en,725,,4,,Ohio,http://freemusicarchive.org/music/Alec_K_Redfe...,./fma_small/000\000141.mp3,Folk
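
The `genre_top` column is derived from the `track_genres` string of each row. A minimal sketch of that parsing step, using a made-up sample string shaped like the rows shown above (the `genre_url` key and its value are assumptions for illustration, and `top_genre` is a hypothetical helper). It uses `ast.literal_eval`, which only accepts Python literals, so a malformed or malicious cell cannot run code:

```python
import ast

def top_genre(track_genres):
    """Return the title of the first genre in a stringified list of dicts."""
    genres = ast.literal_eval(track_genres)  # parses literals only
    return genres[0]['genre_title']

# illustrative value, shaped like the track_genres column
sample = "[{'genre_id': '21', 'genre_title': 'Hip-Hop', 'genre_url': 'example'}]"
print(top_genre(sample))
```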


## Moving average function

I use moving average with non-overlapping sample frames to reduce the number of features of a given type, creating a lower resolution dataset.  

In [2]:
def moving_mean(x, windowsize):
    """Split the 1-D sequence x into non-overlapping bins of windowsize
       values and return an array of the bin means.
       If len(x) is not a multiple of windowsize, the last bin is padded
       with NaN, so its mean is NaN."""
    a = list(x)
    # pad with NaN up to the next multiple of windowsize
    extra = (-len(a)) % windowsize
    a.extend([np.nan]*extra)
    return np.mean(np.array(a).reshape((len(a)//windowsize, windowsize)), axis=1)
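
A small worked example of the binning, as a sketch that mirrors `moving_mean` (the name `binned_mean` is hypothetical). It also shows a side effect worth knowing: when the input length is not a multiple of the window size, the final bin contains NaN padding, so its mean is NaN.

```python
import numpy as np

def binned_mean(x, windowsize):
    """Mean of consecutive, non-overlapping bins; the last bin is NaN-padded."""
    a = np.asarray(x, dtype=float)
    extra = (-len(a)) % windowsize          # values needed to fill the last bin
    a = np.concatenate([a, np.full(extra, np.nan)])
    return a.reshape(-1, windowsize).mean(axis=1)

even = binned_mean([1, 2, 3, 4, 5, 6], 3)       # bins: [1,2,3], [4,5,6]
ragged = binned_mean([1, 2, 3, 4, 5, 6, 7], 3)  # last bin: [7, nan, nan]
print(even)
print(ragged)
```

Swapping `mean` for `np.nanmean` would average only the real values in the last bin instead; that is a design choice, not what the function above does.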


## Generate the features

Calculate audio features for a sound clip, return them as a single vector.

Features created:

- zero crossing rate
- mean frequency bin amplitudes of the harmonic decomposition of the spectrogram
- max and median frequency bin amplitudes of the mean-centered tempogram
- mean of each MFCC coefficient over time
- reduced resolution (fewer feature values) versions of the above feature sets, made with the moving average function

In [8]:
from sklearn.preprocessing import minmax_scale

def generate_features(y, sr, fftsize=512, hop_length = 50, 
                      margin=16, nmfcc = 2000, windowsize = 5,
                     create_labels=False):
    """Generate audio features from wave clip"""
    
    # zero crossing rate
    zero_cross_rate = np.sum(librosa.zero_crossings(y))/len(y)

    # get spectrogram (STFT) of the sound clip
    D = librosa.stft(y, hop_length=hop_length, n_fft=fftsize)

    # decompose spectrogram into harmonic (steady over time)
    #  and percussive (broadband, short in time) components
    D_harmonic, D_percussive = librosa.decompose.hpss(D, margin=margin)

    # harmonic features: mean amplitude of harmonic components at each frequency bin
    harmonic_freqs_mean = minmax_scale(np.mean(np.abs(D_harmonic), axis=1))
    # generate the reduced resolution version 
    harmonic_freqs_mean_r = moving_mean(harmonic_freqs_mean, windowsize)

    # generate tempogram, the tempo frequency aspect of the audio clip
    oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    tempogram = np.abs(librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
                                   hop_length=hop_length))
    # subtract each row's mean over time to center it at zero
    tempo_adj = tempogram - np.mean(tempogram, axis=1, keepdims=True)
    # calc max and median of each centered tempo bin
    tempo_freqs_max =  minmax_scale(np.max(tempo_adj, axis=1))
    tempo_freqs_med =  minmax_scale(np.median(tempo_adj, axis=1))
    # generate reduced resolution version
    tempo_freqs_max_r = moving_mean(tempo_freqs_max, windowsize*2)
    tempo_freqs_med_r = moving_mean(tempo_freqs_med, windowsize*2)

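    # calculate MFCCs (Mel-frequency cepstral coefficients)
    # (an nmfcc above the number of mel bands, 128 by default in librosa,
    #  yields only that many coefficients)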
    mfc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=nmfcc, n_fft=fftsize)
    mfc_mean = minmax_scale(np.mean(mfc,axis=1))
    mfc_mean_r = moving_mean(mfc_mean, windowsize)
    
    # combine all features into one list
    features = ([zero_cross_rate] +
                list(harmonic_freqs_mean) +
                list(tempo_freqs_max) +
                list(tempo_freqs_med) +
                list(mfc_mean) +
                list(harmonic_freqs_mean_r) +
                list(tempo_freqs_max_r) +
                list(tempo_freqs_med_r) +
                list(mfc_mean_r))

    if not create_labels:
        return features

    # create feature labels to use as csv header labels,
    # in the same order as the feature list
    labels = ['zc_rate']
    labels.extend(['hf_mean_'+str(i) for i in range(len(harmonic_freqs_mean))])
    labels.extend(['tf_max_'+str(i) for i in range(len(tempo_freqs_max))])
    labels.extend(['tf_med_'+str(i) for i in range(len(tempo_freqs_med))])
    labels.extend(['mfc_mean_'+str(i) for i in range(len(mfc_mean))])
    labels.extend(['hf_mean_r_'+str(i) for i in range(len(harmonic_freqs_mean_r))])
    labels.extend(['tf_max_r_'+str(i) for i in range(len(tempo_freqs_max_r))])
    labels.extend(['tf_med_r_'+str(i) for i in range(len(tempo_freqs_med_r))])
    labels.extend(['mfc_mean_r_'+str(i) for i in range(len(mfc_mean_r))])

    return features, labels
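
To make the first feature concrete: the zero crossing rate is the fraction of adjacent samples whose signs differ, so it rises with the dominant frequency of the signal. The sketch below is a plain NumPy approximation on synthetic tones, not the librosa call used above (which may treat the first sample differently); the function name, sample rate and tone frequencies are assumptions chosen for illustration.

```python
import numpy as np

def zero_crossing_rate(y):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(y)
    return np.mean(signs[1:] != signs[:-1])

sr = 8000
t = np.arange(sr) / sr                      # one second of samples
# small phase offset keeps samples from landing exactly on zero
low = np.sin(2 * np.pi * 100 * t + 0.1)     # 100 Hz tone
high = np.sin(2 * np.pi * 1000 * t + 0.1)   # 1000 Hz tone

zcr_low = zero_crossing_rate(low)
zcr_high = zero_crossing_rate(high)
print(zcr_low, zcr_high)
```

A pure tone of frequency f crosses zero about 2f times per second, so the rate is roughly 2f/sr: about 0.025 for the low tone and 0.25 for the high one.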

## Generate features for each music clip

In [3]:
# adjustable parameters for feature generation
fftsize=512
hop_length = fftsize*2
margin=16
nmfcc = 2000
windowsize = 5

# build list of track IDs 
track_id = []
genre = []
feat_labels = []
feature_arr = None

# todo: save more metadata: 
#     album_id, artist_id, artist_name, track_bit_rate, track_explicit, 
#     track_favorites, track_interest, track_listens, track_number, 
#     track_title, 

# loop through metadata, gen features for all sound clips found 
#  NOTE: more efficient to loop through list of sound clips in src dir
for i,info in df.iterrows():
    try:
        y, sr = librosa.load(info['filepath'], sr=None)
        
        if not feat_labels: 
            # first clip: also request the column labels
            features, feat_labels = generate_features(y, sr, fftsize=fftsize, hop_length=hop_length, 
                          margin=margin, nmfcc=nmfcc, windowsize=windowsize,
                          create_labels=True)
        else:
            features = generate_features(y, sr, fftsize=fftsize, hop_length=hop_length, 
                          margin=margin, nmfcc=nmfcc, windowsize=windowsize)
    except Exception as err:
        # keep the ID and genre lists aligned with the rows of feature_arr:
        # a failed clip gets missing values, and its row is dropped later
        track_id.append(np.nan)
        genre.append(np.nan)
        print('skipped', info['filepath'], err)
        continue

    # on first successful clip, create the array to hold the feature data
    if feature_arr is None:
        numsongs = df.shape[0]
        feature_arr = np.zeros([numsongs,len(features)])

    feature_arr[i,:] = features
    track_id.append(info['track_id'])
    genre.append(info['genre_top'])

    if i%100 == 0: print(i,end=',')
    

0,10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,290,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000,1010,1020,1030,1040,1050,1060,1070,1080,1090,1100,1110,1120,1130,1140,1150,1160,1170,1180,1190,1200,1210,1220,1230,1240,1250,1260,1270,1280,1290,1300,1310,1320,1330,1340,1350,1360,1370,1380,1390,1400,1410,1420,1430,1440,1450,1460,1470,1480,1490,1500,1510,1520,1530,1540,1550,1560,1570,1580,1590,1600,1610,1620,1630,1640,1650,1660,1670,1680,1690,1700,1710,1720,1730,1740,1750,1760,1770,1780,1790,1800,1810,1820,1830,1840,1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020,2030,2040,2050,2060,2070,2080,2090,2100,2110,2120,2130,2140,2150,2160,2170,2180,2190,2200,2210,2

## Create a dataframe of all features with appropriate column labels

In [17]:
# create dataframe from feature data
feature_df = pd.concat([pd.DataFrame({'track_id':track_id, 'genre':genre}),
                        pd.DataFrame(feature_arr,columns=feat_labels)], axis=1)

print('feature_df.shape', feature_df.shape)



feature_df.shape (8000, 1312)


In [24]:
feature_df.head()

Unnamed: 0,track_id,genre,zc_rate,hf_mean_0,hf_mean_1,hf_mean_2,hf_mean_3,hf_mean_4,hf_mean_5,hf_mean_6,hf_mean_7,hf_mean_8,hf_mean_9,hf_mean_10,hf_mean_11,hf_mean_12,hf_mean_13,hf_mean_14,hf_mean_15,hf_mean_16,hf_mean_17,hf_mean_18,hf_mean_19,hf_mean_20,hf_mean_21,hf_mean_22,hf_mean_23,hf_mean_24,hf_mean_25,hf_mean_26,hf_mean_27,hf_mean_28,hf_mean_29,hf_mean_30,hf_mean_31,hf_mean_32,hf_mean_33,hf_mean_34,hf_mean_35,hf_mean_36,hf_mean_37,hf_mean_38,hf_mean_39,hf_mean_40,hf_mean_41,hf_mean_42,hf_mean_43,hf_mean_44,hf_mean_45,hf_mean_46,...,tf_med_r_15,tf_med_r_16,tf_med_r_17,tf_med_r_18,tf_med_r_19,tf_med_r_20,tf_med_r_21,tf_med_r_22,tf_med_r_23,tf_med_r_24,tf_med_r_25,tf_med_r_26,tf_med_r_27,tf_med_r_28,tf_med_r_29,tf_med_r_30,tf_med_r_31,tf_med_r_32,tf_med_r_33,tf_med_r_34,tf_med_r_35,tf_med_r_36,tf_med_r_37,tf_med_r_38,mfc_mean_r_0,mfc_mean_r_1,mfc_mean_r_2,mfc_mean_r_3,mfc_mean_r_4,mfc_mean_r_5,mfc_mean_r_6,mfc_mean_r_7,mfc_mean_r_8,mfc_mean_r_9,mfc_mean_r_10,mfc_mean_r_11,mfc_mean_r_12,mfc_mean_r_13,mfc_mean_r_14,mfc_mean_r_15,mfc_mean_r_16,mfc_mean_r_17,mfc_mean_r_18,mfc_mean_r_19,mfc_mean_r_20,mfc_mean_r_21,mfc_mean_r_22,mfc_mean_r_23,mfc_mean_r_24,mfc_mean_r_25
0,2.0,Hip-Hop,0.098443,0.455681,1.0,0.552833,0.691586,0.223427,0.190449,0.282134,0.112812,0.05303,0.054595,0.06135,0.129584,0.241001,0.123676,0.064296,0.044909,0.018547,0.023608,0.106943,0.076901,0.033017,0.037002,0.034361,0.032486,0.101843,0.087506,0.016727,0.021955,0.02664,0.036114,0.042516,0.030011,0.01421,0.012372,0.012951,0.014189,0.025279,0.030154,0.030579,0.029993,0.029536,0.024369,0.01944,0.016828,0.010327,0.010791,0.013708,...,0.404484,0.370969,0.336046,0.347666,0.297363,0.275084,0.250757,0.227422,0.216936,0.202315,0.188395,0.183433,0.175257,0.169991,0.165051,0.162472,0.160967,0.159643,0.159151,0.158847,0.158745,0.158712,0.1587084,,0.651562,0.815139,0.793893,0.805969,0.808567,0.800488,0.798006,0.807708,0.804929,0.807728,0.796693,0.793465,0.808957,0.825307,0.75893,0.771155,0.786987,0.753463,0.843214,0.84867,0.831296,0.800685,0.811004,0.823353,0.812255,
1,5.0,Hip-Hop,0.059377,0.42514,1.0,0.318026,0.072528,0.02799,0.025655,0.041813,0.02509,0.006222,0.00457,0.004319,0.006661,0.008314,0.025222,0.00683,0.001975,0.002691,0.00207,0.000859,0.001499,0.001776,0.000871,0.001157,0.001419,0.00149,0.001667,0.003163,0.003797,0.003429,0.004479,0.003033,0.002294,0.002165,0.001811,0.001483,0.002121,0.00159,0.001919,0.001513,0.001803,0.001365,0.001144,0.000733,0.000579,0.000724,0.00046,0.000652,...,0.448302,0.379175,0.36942,0.340201,0.312188,0.288517,0.272923,0.249419,0.23175,0.211191,0.197542,0.182649,0.17182,0.165078,0.157833,0.153375,0.151099,0.14982,0.149016,0.148706,0.148561,0.148525,0.148521,,0.670229,0.817565,0.797072,0.805324,0.807423,0.799754,0.793943,0.805547,0.803523,0.80451,0.793769,0.791152,0.802685,0.818094,0.757645,0.768226,0.78611,0.755455,0.837408,0.84416,0.82709,0.801544,0.809923,0.82028,0.810429,
2,10.0,Pop,0.08108,0.046655,0.404121,0.501663,0.479119,1.0,0.982343,0.438204,0.18388,0.286772,0.405532,0.18038,0.132385,0.293372,0.951195,0.210948,0.057912,0.1176,0.390736,0.157645,0.106173,0.088221,0.518317,0.706688,0.125511,0.089306,0.208887,0.376781,0.19057,0.086085,0.168071,0.300983,0.095261,0.226336,0.146125,0.211181,0.244796,0.089597,0.039194,0.1469,0.277096,0.087904,0.038905,0.054717,0.212436,0.123462,0.05776,0.036142,...,0.52938,0.483576,0.433567,0.39164,0.324726,0.275151,0.22328,0.187716,0.155148,0.123368,0.090574,0.064233,0.043965,0.028953,0.017927,0.011429,0.006202,0.002953,0.001189,0.000396,9.4e-05,1.3e-05,1.444373e-07,,0.589892,0.728892,0.713855,0.727771,0.737698,0.717258,0.711715,0.729103,0.726565,0.730614,0.712844,0.71035,0.726578,0.747558,0.671771,0.68212,0.707222,0.662943,0.77037,0.774209,0.757046,0.718576,0.73291,0.747245,0.734415,
3,140.0,Folk,0.027289,0.135343,0.634874,1.0,0.380972,0.053111,0.008895,0.003221,0.000823,0.000651,0.00064,0.001096,0.000945,0.000847,0.00024,0.000207,0.000303,0.00041,0.00025,0.00034,0.000414,0.000386,0.000273,0.000177,0.000186,0.000223,0.000323,0.000364,0.000187,0.000132,9.5e-05,0.000103,0.000193,0.000229,0.00036,0.000665,0.000533,0.000345,0.000254,0.000196,0.000164,0.000165,0.000253,0.00023,0.000202,0.000169,0.000166,0.000317,...,0.398342,0.342819,0.364683,0.35497,0.332846,0.319778,0.300136,0.304876,0.292123,0.279694,0.277644,0.278487,0.273822,0.27051,0.268737,0.268182,0.267407,0.266777,0.266659,0.266537,0.266513,0.266512,0.266513,,0.720204,0.856103,0.839121,0.842825,0.843418,0.841189,0.837602,0.844675,0.84442,0.844746,0.835615,0.837828,0.845396,0.849864,0.814959,0.817017,0.824609,0.813355,0.867176,0.874886,0.859476,0.844086,0.848417,0.854874,0.848466,
4,141.0,Folk,0.034166,0.22289,1.0,0.728333,0.31205,0.146076,0.654426,0.391532,0.065078,0.098935,0.057677,0.10511,0.099396,0.024248,0.012116,0.012397,0.009679,0.018025,0.009333,0.005101,0.003334,0.0114,0.018985,0.010412,0.007691,0.005181,0.003246,0.003331,0.005166,0.002688,0.00171,0.00294,0.005647,0.005446,0.006057,0.005454,0.00667,0.007291,0.006627,0.008647,0.002887,0.002673,0.004211,0.002914,0.002602,0.003335,0.004879,0.005504,...,0.480073,0.486263,0.474067,0.470374,0.436849,0.429303,0.408387,0.390735,0.364293,0.349009,0.336088,0.327075,0.320759,0.315707,0.312856,0.31063,0.308961,0.308661,0.308418,0.308351,0.308345,0.308334,0.3083327,,0.661806,0.776472,0.772283,0.781399,0.780277,0.771505,0.76929,0.779674,0.778395,0.779456,0.767453,0.769752,0.778985,0.787135,0.742566,0.752804,0.763981,0.738562,0.803796,0.809378,0.794638,0.774338,0.782123,0.789851,0.782272,


## Split feature data into high resolution and low resolution feature sets

Save them as separate data files:
- 'features_hirez_v2.csv'
- 'features_lorez_v2.csv'

In [38]:
# prepare high rez feature set
colnames = ['track_id', 'genre','zc_rate']
colnames.extend([s for s in feature_df.columns if ('hf_mean' in s) and ('_r' not in s)])
colnames.extend([s for s in feature_df.columns if ('tf_max' in s) and ('_r' not in s)])
colnames.extend([s for s in feature_df.columns if ('tf_med' in s) and ('_r' not in s)])
colnames.extend([s for s in feature_df.columns if ('mfc_mean' in s) and ('_r' not in s)])

# create a new df with only high resolution feature columns
# (copy, so the in-place edits below don't act on a view of feature_df)
hirez_df = feature_df[colnames].copy()

# delete rows with nans in track_id or genre
hirez_df.dropna(axis=0, inplace=True, how='any', subset=['track_id','genre'])

# delete any feature columns containing nans
hirez_df.dropna(axis=1, inplace=True, how='any')

# set track_id as int
hirez_df['track_id'] = hirez_df['track_id'].astype(int)

# save dataframe
hirez_df.to_csv(srcdir+'features_hirez_v2.csv', index=False)

# prepare low rez feature set
colnames = ['track_id', 'genre','zc_rate']
colnames.extend([s for s in feature_df.columns if ('hf_mean_r' in s)])
colnames.extend([s for s in feature_df.columns if ('tf_max_r' in s)])
colnames.extend([s for s in feature_df.columns if ('tf_med_r' in s)])
colnames.extend([s for s in feature_df.columns if ('mfc_mean_r' in s)])

# create a new df with only low resolution feature columns
# (copy, so the in-place edits below don't act on a view of feature_df)
lorez_df = feature_df[colnames].copy()

# delete rows with nans in track_id or genre
lorez_df.dropna(axis=0, inplace=True, how='any', subset=['track_id','genre'])

# delete nan feature columns: the last bin of each reduced feature set
# is NaN whenever moving_mean() had to pad it
lorez_df.dropna(axis=1, inplace=True, how='any')

# set track_id as int
lorez_df['track_id'] = lorez_df['track_id'].astype(int)

# save dataframe
lorez_df.to_csv(srcdir+'features_lorez_v2.csv', index=False)

print('hirez_df.shape:', hirez_df.shape)
print('lorez_df.shape:', lorez_df.shape)


hirez_df.shape: (7997, 1156)
lorez_df.shape: (7997, 155)
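
The column counts printed above, 1156 and 155, can be reproduced with a little arithmetic. This is a sketch under assumptions inferred from the parameters rather than stated in the notebook: 257 STFT frequency bins for `fftsize=512`, 384 tempogram bins (librosa's default window length), and 128 MFCC values (limited by librosa's default 128 mel bands even though `nmfcc` is 2000). In the reduced sets, only complete bins survive, because a NaN-padded partial bin is dropped as a NaN column.

```python
fftsize = 512
windowsize = 5

# per-clip vector lengths (the last two are assumed librosa defaults)
n_harm = fftsize // 2 + 1   # 257 STFT frequency bins
n_tempo = 384               # assumed tempogram bins
n_mfcc = 128                # assumed cap from 128 mel bands

def n_full_bins(n, w):
    """Number of complete bins of width w in a vector of length n."""
    return n // w

# 3 leading columns: track_id, genre, zc_rate
hirez_cols = 3 + n_harm + 2 * n_tempo + n_mfcc
lorez_cols = (3 + n_full_bins(n_harm, windowsize)
              + 2 * n_full_bins(n_tempo, windowsize * 2)
              + n_full_bins(n_mfcc, windowsize))
print(hirez_cols, lorez_cols)
```

Both totals match the shapes printed by the cell above, which supports the assumed bin counts.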
