# Music genre identification:
## XGBoost based genre classifier
### John Burt
#### August 2019


In this notebook, I train an XGBoost classifier to classify song genre using pre-generated audio features. 


#### Methods:

Most of the features I chose are based on harmonic-percussive source separation (HPSS) and mel-frequency cepstral coefficients (MFCC):

- Zero crossing rate
- Mean harmonic frequency amplitudes over the duration of the sound clip
- Max and median percussive tempo frequency amplitudes over the duration of the sound clip
- Mean MFCC frequency amplitude over the duration of the sound clip

These features are generated in another notebook and saved as a csv file for this notebook to load.
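
As a rough illustration of the simplest feature in the list, the zero crossing rate is the fraction of samples at which the waveform changes sign. The sketch below computes it with plain NumPy on two synthetic sine waves. The helper name `zero_crossing_rate` and the test tones are illustrative assumptions; the `generate_features` function in this notebook uses `librosa.zero_crossings`, whose handling of the first sample and of exact zeros may differ slightly.

```python
import numpy as np

def zero_crossing_rate(y):
    """Fraction of samples at which the sign differs from the previous sample
    (hypothetical helper, not the librosa implementation)."""
    y = np.asarray(y, dtype=float)
    negative = np.signbit(y)
    crossings = negative[1:] != negative[:-1]
    return np.sum(crossings) / len(y)

sr = 8000                      # assumed sample rate for the toy signals
t = np.arange(sr) / sr         # one second of samples
# small phase offset keeps samples away from exact zeros
tone_low = np.sin(2 * np.pi * 100 * t + 0.1)
tone_high = np.sin(2 * np.pi * 1000 * t + 0.1)

rate_low = zero_crossing_rate(tone_low)
rate_high = zero_crossing_rate(tone_high)
print(rate_low, rate_high)     # the higher-pitched tone crosses zero more often
```

A sine wave crosses zero twice per cycle, so the rate grows roughly in proportion to frequency, which is why it can help separate bright, noisy clips from smoother ones.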

#### Extra packages required:
- librosa
- xgboost


#### Source data:

The feature data is generated from music clips:

- The original data is from ["FMA: A Dataset For Music Analysis"](https://github.com/mdeff/fma). That dataset is a dump of sound clips and associated metadata from the Free Music Archive (FMA), an interactive library of high-quality, legal audio downloads. 


- For the meetup series, the music samples were reduced further to a "warmup set" of 4000 samples of folk and 4000 samples of hip-hop music (making this a binary classification problem). This code uses the warmup set, but it can also use the original music clip dataset, which is much larger and has more genre categories.




In [50]:
# remove warnings
import warnings
warnings.filterwarnings('ignore')
# ---

%matplotlib inline
from matplotlib import pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

import pandas as pd
pd.options.display.max_columns = 100

import numpy as np

#  high and low resolution feature datasets
# (previously generated by another notebook/script)
feature_rez = 'hi'
featuredatafile = 'features_'+feature_rez+'rez.csv'

# data source directory
srcdir = './warmup/' # PDSG 'warmup' dataset (200 samples)
# metadatafile = 'pdsg_musicgenre_warmup_small.csv'

# srcdir = './fma_small/' # small version of original dataset (8000 samples)
# metadatafile = 'raw_tracks.csv'

# read the feature data file
df = pd.read_csv(srcdir+featuredatafile)

print('Read feature data - df.shape:',df.shape)


Read feature data - df.shape: (200, 1156)


## Prep training data

In [51]:
from sklearn.model_selection import train_test_split

# change label column to categorical
df['genre'] = pd.Categorical(df['genre'])

# for the test prediction, I just train with all labelled data
X_train = df.iloc[:,2:].values

# convert string genre categories to numbers
y_train = df['genre'].cat.codes.values
y_labels = df['genre']

print('X_train.shape:',X_train.shape)


X_train.shape: (200, 1154)
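
To make the label encoding concrete: `pd.Categorical` sorts string categories alphabetically by default, so `cat.codes` assigns 0 to the first name in sorted order. A minimal sketch with made-up labels (the toy `genres` list is an assumption, not data from the feature file):

```python
import pandas as pd

genres = ['Hip-Hop', 'Folk', 'Folk', 'Hip-Hop']   # toy labels for illustration
cat = pd.Series(pd.Categorical(genres))

# integer code for each row, and the code -> name lookup
codes = cat.cat.codes.values
code_to_name = dict(enumerate(cat.cat.categories))
print(codes, code_to_name)
```

Keeping `code_to_name` (or the categories themselves) around makes it possible to translate predictions back to genre names later without relying on a remembered order.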


## Train classifier

In [52]:
import xgboost as xgb

# use default parameters
clf = xgb.XGBClassifier().fit(X_train, y_train)
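
Because the classifier is fit on every labelled row, the notebook itself gives no estimate of how well it generalizes. One common way to get such an estimate is stratified k-fold cross-validation. The sketch below wraps scikit-learn's `cross_val_score` in a hypothetical helper, `cv_accuracy`, that accepts any scikit-learn-compatible classifier. The helper name, fold count, and seed are assumptions, not part of the original workflow.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

def cv_accuracy(clf, X, y, n_splits=5, seed=0):
    """Mean and standard deviation of accuracy over stratified folds
    (hypothetical helper)."""
    # stratification keeps the genre balance similar in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy')
    return scores.mean(), scores.std()
```

With the variables in this notebook, a call such as `cv_accuracy(xgb.XGBClassifier(), X_train, y_train)` would return the mean and spread of the fold accuracies before the final model is fit on all of the data.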


## Predict genre of test data

### Function to generate features from audio clip

In [53]:
import librosa
import librosa.display
from sklearn.preprocessing import minmax_scale

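def moving_mean(x, windowsize):
    """Split the 1-D sequence x into consecutive bins of windowsize length 
       and return an array of the bin means"""
    a = list(x)
    # pad with NaN so the length is a multiple of windowsize
    if len(a)%windowsize == 0: extra=0 
    else: extra=windowsize-(len(a)%windowsize)
    a.extend([np.nan]*extra)
    # NOTE: np.mean propagates NaN, so a padded last bin has a NaN mean.
    # np.nanmean would ignore the padding, but the training features
    # must then be generated the same way.
    return np.mean(np.array(a).reshape(( len(a)//windowsize, windowsize )),axis=1)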

def generate_features(y, sr, feat_rez='hi', fftsize=512, hop_length = 50, 
                      margin=16, nmfcc = 2000, windowsize = 5):
    """Generate features from wave data, same as with the training data"""
    
    zero_cross_rate = np.sum(librosa.zero_crossings(y))/len(y)

    D = librosa.stft(y, hop_length=hop_length, n_fft=fftsize)
    D_harmonic, D_percussive = librosa.decompose.hpss(D, margin=margin)

    harmonic_freqs_mean = minmax_scale(np.mean(np.abs(D_harmonic), axis=1))

    oenv = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    tempogram = np.abs(librosa.feature.tempogram(onset_envelope=oenv, sr=sr,
                                   hop_length=hop_length))
    tempo_adj = (tempogram.T - np.mean(tempogram, axis=1).T).T  
    tempo_freqs_max =  minmax_scale(np.max(tempo_adj, axis=1))
    tempo_freqs_med =  minmax_scale(np.median(tempo_adj, axis=1))

    mfc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=nmfcc, n_fft=fftsize)
    mfc_mean = minmax_scale(np.mean(mfc,axis=1))

    harmonic_freqs_mean_r = moving_mean(harmonic_freqs_mean, windowsize)
    tempo_freqs_max_r = moving_mean(tempo_freqs_max, windowsize*2)
    tempo_freqs_med_r = moving_mean(tempo_freqs_med, windowsize*2)
    mfc_mean_r = moving_mean(mfc_mean, windowsize)
    
    if feat_rez == 'hi':
        # combine all features into one list
        return   ([zero_cross_rate] + 
                    list(harmonic_freqs_mean) +
                    list(tempo_freqs_max) +
                    list(tempo_freqs_med) +
                    list(mfc_mean) 
                   )
    else:
        return   ([zero_cross_rate] + 
                    list(harmonic_freqs_mean_r) +
                    list(tempo_freqs_max_r) +
                    list(tempo_freqs_med_r) +
                    list(mfc_mean_r) 
                   )
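
The low-resolution features are produced by averaging each feature vector over fixed-size bins. The sketch below shows that binning step in isolation on a toy vector, using a hypothetical helper `bin_means` that returns both the plain and the NaN-aware bin means. When the length is not a multiple of the window, the final bin contains NaN padding, so only the NaN-aware mean gives a finite value there.

```python
import numpy as np

def bin_means(x, windowsize):
    """Means of consecutive, non-overlapping bins of length windowsize
    (illustrative helper; returns plain and NaN-aware means)."""
    a = np.asarray(x, dtype=float)
    extra = -len(a) % windowsize                 # padding needed to fill the last bin
    padded = np.concatenate([a, np.full(extra, np.nan)])
    bins = padded.reshape(-1, windowsize)        # one row per bin
    return np.mean(bins, axis=1), np.nanmean(bins, axis=1)

plain, nan_aware = bin_means([1, 2, 3, 4, 5, 6, 7], 3)
print(plain)       # last bin is nan because it contains padding
print(nan_aware)   # last bin averages only the real value
```

XGBoost treats NaN as a missing value, so either variant can be fed to the classifier, provided the training and test features are built with the same one.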


### Load audio clips and generate features for classification

In [54]:
import os
import fnmatch

# parameters used in training set for feature generation
fftsize=512
hop_length = 50
margin=16
nmfcc = 2000
windowsize = 5

# load test data
testdir = './warmup_test/'

# keep a list of filenames
clipfilenames = []

# search through source folder for sound files
# NOTE: this code assumes all clips in root of testdir
for root, dirnames, filenames in os.walk(testdir):
    numclips = len(fnmatch.filter(filenames, '*.wav'))
    for i, filename in enumerate(fnmatch.filter(filenames, '*.wav')):

        y, sr = librosa.load(os.path.join(root, filename), sr=None)

        features = generate_features(y, sr, feat_rez=feature_rez, fftsize=fftsize, hop_length=hop_length, 
                      margin=margin, nmfcc=nmfcc, windowsize=windowsize)

        # on first iteration, create the array to hold the feature data
        if i == 0:
            feature_arr = np.zeros([numclips,len(features)])

        feature_arr[i,:] = features
        clipfilenames.append(filename)

print('X_train.shape',X_train.shape)
print('test data shape',feature_arr.shape)


X_train.shape (200, 1154)
test data shape (50, 1154)
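
An alternative to preallocating the array on the first iteration is to collect one feature row per clip in a list and stack the rows at the end, which also makes it easy to check the column count against the training matrix. The sketch below uses a stand-in feature function and made-up clip names (`fake_features`, `clip_names`, and `n_train_features` are all assumptions for illustration); in the notebook the rows would come from `generate_features`.

```python
import numpy as np

def fake_features(name, n_features=6):
    """Stand-in for generate_features: a deterministic vector per clip name."""
    rng = np.random.default_rng(len(name))
    return list(rng.random(n_features))

clip_names = ['a.wav', 'bb.wav', 'ccc.wav']   # hypothetical file names
n_train_features = 6                          # would be X_train.shape[1]

# one row of features per clip, stacked into a 2-D array at the end
rows = [fake_features(name) for name in clip_names]
feature_arr = np.vstack(rows)

# the test matrix must have the same number of columns as the training matrix
assert feature_arr.shape == (len(clip_names), n_train_features)
print(feature_arr.shape)
```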


## Predict genres from features and save to file for submission

In [56]:
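# predict returns the integer class codes (0 or 1) used in y_train
y_out = clf.predict(feature_arr)

# thresholding leaves class codes unchanged; it only matters for probabilities
y_pred = (np.where(y_out>.5,1,0))

# codes follow the alphabetical category order: 0 = 'Folk', 1 = 'Hip-Hop'
pred_df = pd.DataFrame({'filename':clipfilenames, 
                         'genre':np.where(y_pred==0,'Folk','Hip-Hop')})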

pred_df.to_csv(testdir+'test_predictions.csv', index=False)
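
The hard-coded `'Folk'`/`'Hip-Hop'` mapping relies on the category order used during training. A sketch of a more general decoding step, which maps integer predictions back to names through the stored categories, might look like this. The toy `train_genres`, `y_out`, and file names are made up for illustration; in the notebook the categories would come from `df['genre'].cat.categories` and the codes from `clf.predict`.

```python
import numpy as np
import pandas as pd

# toy stand-ins for the training labels and the classifier output
train_genres = pd.Categorical(['Hip-Hop', 'Folk', 'Folk', 'Hip-Hop'])
y_out = np.array([0, 1, 1, 0])

# translate integer codes back to category names
decoded = pd.Categorical.from_codes(y_out, categories=train_genres.categories)

pred_df = pd.DataFrame({'filename': ['a.wav', 'b.wav', 'c.wav', 'd.wav'],
                        'genre': np.asarray(decoded)})
print(pred_df)
```

This form keeps working if the larger dataset with more genre categories is used, since nothing in it assumes exactly two classes.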
