# Music genre classification notebook

This notebook contains the full technical details of our work: loading the data, extracting features, tuning the models, and evaluating them.

## Libraries 

In [45]:
%load_ext autoreload
%autoreload 2
%reload_ext autoreload

# feature extraction and data preprocessing
import librosa
import numpy as np
import pandas as pd

from server.utils import Util
from server.feature_extractor import FeatureAggregator

# Utils
import matplotlib.pyplot as plt
from tqdm import tqdm
from concurrent.futures import ThreadPoolExecutor

# model selection and evaluation
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.metrics import classification_report

# models
from sklearn.neighbors import KNeighborsClassifier
from sklearn.dummy import DummyClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
import xgboost as xgb

import warnings
warnings.filterwarnings('ignore')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Extracting music and features

We use the [GTZAN genre collection](http://marsyasweb.appspot.com/download/data_sets/) dataset for classification.
<br>
<br>
We classify songs into the following genres:
 * Classical 
 * Metal
 * Blues
 * Hiphop
 * Pop
 * Rock
 * Country
 * Reggae 
 * Jazz
 
Each genre contains 100 songs, so the full dataset has 900 songs.

In [3]:
genres=['classical', 'metal', 'blues', 'hiphop', 'pop', 'rock', 'country', 'reggae', 'jazz']

In [10]:
def get_data(genre, n_songs=100):
    """Load the first n_songs tracks of a genre as raw waveforms."""
    data = list()
    for i in range(n_songs):
        # Files are named <genre>.<5-digit index>.au, e.g. blues.00007.au
        path = "../../{0}/{0}.{1:05d}.au".format(genre, i)
        # librosa.load returns (waveform, sample_rate); we keep only the waveform
        waveform, _ = librosa.load(path)
        data.append(waveform)
        if i % 30 == 0:
            print("Got {0} songs for genre {1}".format(i, genre))
    return (data, genre)

In [11]:
with ThreadPoolExecutor(len(genres)) as pool:
    results = pool.map(get_data, genres)
results = list(results)

Got 0 songs for genre reggae
Got 0 songs for genre pop
Got 0 songs for genre classical
Got 0 songs for genre jazz
Got 0 songs for genre hiphop
Got 0 songs for genre rock
Got 0 songs for genre blues
Got 0 songs for genre metal
Got 0 songs for genre country
Got 30 songs for genre reggae
Got 30 songs for genre pop
Got 30 songs for genre metal
Got 30 songs for genre jazz
Got 30 songs for genre country
Got 30 songs for genre classical
Got 30 songs for genre hiphop
Got 30 songs for genre rock
Got 30 songs for genre blues
Got 60 songs for genre reggae
Got 60 songs for genre pop
Got 60 songs for genre classical
Got 60 songs for genre metal
Got 60 songs for genre country
Got 60 songs for genre jazz
Got 60 songs for genre hiphop
Got 60 songs for genre blues
Got 60 songs for genre rock
Got 90 songs for genre pop
Got 90 songs for genre reggae
Got 90 songs for genre classical
Got 90 songs for genre country
Got 90 songs for genre metal
Got 90 songs for genre jazz
Got 90 songs for genre hiphop
Got 90
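
The log lines above interleave because every genre is loaded in its own thread. `pool.map` nevertheless returns the results in the order of the input list. A minimal sketch of that behaviour, using a hypothetical stand-in loader (`fake_load` is not part of this notebook) that sleeps instead of reading audio:

```python
from concurrent.futures import ThreadPoolExecutor
import random
import time

demo_genres = ['classical', 'metal', 'blues']

def fake_load(genre):
    # Hypothetical stand-in for get_data: sleeps a random time instead of reading files
    time.sleep(random.uniform(0.0, 0.05))
    return ([0.0, 0.0, 0.0], genre)

with ThreadPoolExecutor(len(demo_genres)) as pool:
    demo_results = list(pool.map(fake_load, demo_genres))

# The threads finish in arbitrary order, but map yields results in input order
returned_order = [genre for _, genre in demo_results]
print(returned_order)
```

Because each result also carries its genre name, the later labelling step does not depend on this ordering.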

### Extracting Features

We will extract
 * Mel-frequency cepstral coefficients (MFCC). You can read about them here:
     * [Wikipedia](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)
     * [Kishore Prahallad. *Spectrogram, Cepstrum and Mel-Frequency Analysis.*](http://www.speech.cs.cmu.edu/15-492/slides/03_mfcc.pdf)
     * [habr (article in Russian)](https://habr.com/post/140828/)
     * [Beth Logan. *Mel Frequency Cepstral Coefficients for Music Modeling*](http://musicweb.ucsd.edu/~sdubnov/CATbox/Reader/logan00mel.pdf)
 * Centroids. Measure of spectral brightness
 $$
     C_t = \frac{\sum_{i=1}^N f_i M_t[f_i]}{\sum_{i=1}^N M_t[f_i]}
 $$
 * Rolloff. The frequency $R_t$ below which 85% of the spectral magnitude is concentrated
 $$
     R_t: \sum_{i=1}^{R_t}M_t[f_i] = 0.85 \cdot \sum_{i=1}^N M_t[f_i]
 $$
 * Flux. Measure of Spectral Change
 $$
     F_t = \|M_t[f] - M_{t-1}[f] \|
 $$
 * Chromagram. Read about it here
     * [Wikipedia](https://en.wikipedia.org/wiki/Chroma_feature)
 * Low Energy. Percentage of windows with less than average energy
 * Rhythm features
     * BPM (tempo in beats per minute)
     * Tempogram
     * autocorrelation
 * Zero Crossing Rate. The rate at which the signal changes sign; high values indicate noisy content
 $$
     zcr = \frac{1}{T-1}\sum_{t=1}^{T-1} \mathbf{1}_{\mathbb{R}_{<0}}(S_t S_{t-1})
 $$
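
As a rough illustration of the formulas above, here is a minimal NumPy sketch. It assumes `M` is a magnitude spectrogram of shape `(n_bins, n_frames)` and `freqs` holds the frequency of each bin. The function names are ours, and the notebook's `FeatureAggregator` may compute these quantities differently.

```python
import numpy as np

def spectral_features(M, freqs, rolloff=0.85):
    """Centroid, rolloff frequency and flux of a magnitude spectrogram M (n_bins, n_frames)."""
    total = M.sum(axis=0) + 1e-12  # small constant avoids division by zero
    # Centroid: magnitude-weighted mean frequency of each frame
    centroid = (freqs[:, None] * M).sum(axis=0) / total
    # Rolloff: first bin where the cumulative magnitude reaches 85% of the total
    cumulative = np.cumsum(M, axis=0)
    rolloff_bin = (cumulative >= rolloff * total).argmax(axis=0)
    rolloff_freq = freqs[rolloff_bin]
    # Flux: Euclidean norm of the difference between consecutive frames
    flux = np.linalg.norm(np.diff(M, axis=1), axis=0)
    return centroid, rolloff_freq, flux

def zero_crossing_rate(s):
    """Fraction of adjacent sample pairs whose product is negative."""
    s = np.asarray(s, dtype=float)
    return float(np.mean(s[1:] * s[:-1] < 0))

# Toy spectrogram: 4 frequency bins x 2 frames, all energy in one bin per frame
freqs = np.array([0.0, 100.0, 200.0, 300.0])
M = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
centroid, rolloff_freq, flux = spectral_features(M, freqs)
zcr = zero_crossing_rate([1.0, 1.0, -1.0, -1.0])
```

In the toy example the energy moves from the 100 Hz bin to the 300 Hz bin, so the centroid and rolloff follow it and the flux is nonzero.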
 
 
For vector-valued (per-frame) features we compute the following statistics:
 * Mean (MFCC, Chroma, Centroids, Flux, Rolloff, ZCR, etc.)
 * Median (Centroids, Flux, Rolloff, ZCR, etc.)
 * Standard deviation (Centroids, Flux, Rolloff, ZCR, etc.; for MFCC, the determinant of the covariance matrix)
 * Max (Centroids, Flux, Rolloff, ZCR, etc.)
 * Min (Centroids, Flux, Rolloff, ZCR, etc.)
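
The statistics listed above turn a per-frame feature of arbitrary length into a fixed number of scalars. A minimal sketch of that reduction follows; the `summarize` helper and its naming scheme are hypothetical and not necessarily what `FeatureAggregator` does internally.

```python
import numpy as np

def summarize(name, values):
    """Reduce a per-frame feature vector to five named summary statistics."""
    values = np.asarray(values, dtype=float)
    stats = {'mean': np.mean, 'median': np.median, 'std': np.std,
             'max': np.max, 'min': np.min}
    return {"{}_{}".format(name, stat): float(fn(values)) for stat, fn in stats.items()}

# Example: a made-up centroid track with four frames
features = summarize('centroid', [1.0, 2.0, 3.0, 6.0])
print(features)
```

Every song then yields a feature row of the same width regardless of its duration, which is what the classifiers below require.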

In [14]:
# Flatten the per-genre results into one list of waveforms and a parallel list of labels
music_list = list()
genre_list = list()
for songs, genre in results:
    music_list += songs
    genre_list += [genre] * len(songs)

In [26]:
extractor = FeatureAggregator(music_list, parallel=True)
extracted, feature_names = extractor.get_features()

Got mfcc for 0 songs
Got zero_cross_rate for 0 songs
Got centroid data for 0 songs
Got rhythm data for 0 songs
Got rmse data for 0 songs
Got zero_cross_rate for 100 songs
Got zero_cross_rate for 200 songs
Got zero_cross_rate for 300 songs
Got mfcc for 100 songs
Got zero_cross_rate for 400 songs
Got zero_cross_rate for 500 songs
Got zero_cross_rate for 600 songs
Got centroid data for 100 songs
Got zero_cross_rate for 700 songs
Got mfcc for 200 songs
Got rhythm data for 100 songs
Got zero_cross_rate for 800 songs
Done  <class 'server.feature_extractor.ZeroCrossing'>
Got mfcc for 300 songs
Got centroid data for 200 songs
Got mfcc for 400 songs
Got rhythm data for 200 songs
Got mfcc for 500 songs
Got centroid data for 300 songs
Got rmse data for 100 songs
Got rhythm data for 300 songs
Got mfcc for 600 songs
Got centroid data for 400 songs
Got mfcc for 700 songs
Got rhythm data for 400 songs
Got mfcc for 800 songs
Got centroid data for 500 songs
Done  <class 'server.feature_extractor.MFCC'>

In [28]:
# save all the extracted features to the csv file extracted_data.csv
data_to_save = np.hstack((extracted, np.array(genre_list).reshape(-1,1)))
pd.DataFrame(data_to_save, columns=feature_names + ['genre']).to_csv("extracted_data.csv", encoding="utf-8")

## Data Science

This section briefly describes the models, the hyperparameter tuning, and the evaluation procedure used in the code below.

### Models declaration
**Bad results**: 
 * DummyClassifier (a chance-level baseline, included only for comparison)

**Satisfactory results**
 * KNeighborsClassifier 
 * xgboost

**Good results** 
 * SVC
 * RandomForestClassifier
 * VotingClassifier: ensemble of the previous two (SVC and RandomForestClassifier) with soft voting
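
With equal weights, soft voting averages the class probabilities predicted by the member models and picks the class with the highest average. A minimal NumPy sketch with made-up probabilities (the numbers are illustrative only, not outputs of our models):

```python
import numpy as np

# Hypothetical predicted probabilities for 2 songs and 3 classes
proba_svc = np.array([[0.5, 0.3, 0.2],
                      [0.1, 0.5, 0.4]])
proba_rfc = np.array([[0.2, 0.6, 0.2],
                      [0.2, 0.2, 0.6]])

# Soft voting: average the probabilities, then take the most probable class
mean_proba = (proba_svc + proba_rfc) / 2
soft_vote = mean_proba.argmax(axis=1)
print(soft_vote)
```

Here the ensemble overrides the SVC on both songs, because the random forest is more confident about a different class. This is also why `SVC` has to be created with `probability=True` to take part in soft voting.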

### How do we fit parameters
We tune hyperparameters with `GridSearchCV`, using 3-fold stratified cross-validation.
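
A minimal sketch of such a grid search, run on synthetic data from `make_classification` rather than on our features (the dataset and the small grid are assumptions made for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 3-class problem standing in for the extracted audio features
X_demo, y_demo = make_classification(n_samples=120, n_features=8, n_informative=4,
                                     n_classes=3, random_state=0)

# Try every value in the grid and score it with 3-fold stratified cross-validation
grid = GridSearchCV(estimator=KNeighborsClassifier(),
                    param_grid={'n_neighbors': [2, 5, 8]},
                    cv=StratifiedKFold(n_splits=3))
grid.fit(X_demo, y_demo)
print(grid.best_params_, round(grid.best_score_, 3))
```

After fitting, the `GridSearchCV` object behaves like the best estimator refitted on all the data it was given, so it can be passed anywhere a plain classifier is expected.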

### How do we evaluate models
We repeat `train_test_split` with different `random_state` values (10 by default, with a 20% stratified test set) and average the resulting classification reports.
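
The averaging idea can be sketched as follows. This is not the notebook's own helper: instead of parsing the text report it relies on `classification_report(..., output_dict=True)`, which is only available in newer scikit-learn releases, and it runs on synthetic data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def averaged_report(model, X, y, n=5):
    """Average the classification report over n stratified train/test splits."""
    reports = []
    for rs in range(n):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=rs, stratify=y)
        model.fit(X_tr, y_tr)
        rep = classification_report(y_te, model.predict(X_te),
                                    output_dict=True, zero_division=0)
        rep.pop('accuracy', None)  # scalar entry, not a per-class row
        reports.append(pd.DataFrame(rep).T)
    # Element-wise mean of the per-split report tables
    return (sum(reports) / n).round(3)

# Synthetic 3-class data standing in for the extracted features
X_demo, y_demo = make_classification(n_samples=150, n_features=8, n_informative=4,
                                     n_classes=3, random_state=0)
report = averaged_report(KNeighborsClassifier(), X_demo, y_demo, n=5)
print(report)
```

Averaging over several splits reduces the dependence of the scores on one particular random split, which matters with only 20 test songs per genre.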

In [57]:
# Repeat train_test_split over several random_state values and average the classification reports
def check_model(model, X, y, encoder, n=10):
    num_genres = len(np.unique(y))
    average = np.zeros((num_genres + 1, 4), dtype = float)
    for rs in tqdm(range(n)):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rs, stratify=y)
        model.fit(X_train, y_train)
        target = encoder.inverse_transform(np.arange(num_genres))
        df = parse_class_report(classification_report(y_test, model.predict(X_test), target_names=target))
        average += df.values
    df.iloc[:,:] = np.round(average / n, 3)
    return df

# Same procedure for xgboost: repeat the split over several random_state values and average the reports
def check_model_xgb(X, y, encoder, n=10):
    num_genres = len(np.unique(y))
    average = np.zeros((num_genres + 1, 4), dtype = float)
    for rs in tqdm(range(n)):
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=rs, stratify=y)
        dtrain = xgb.DMatrix(X_train, label=y_train)
        dtest = xgb.DMatrix(X_test, label=y_test)
        param = {'objective': "multi:softmax", "num_class": np.unique(y_train).size}
        evallist = [(dtrain, 'train'), (dtest, 'eval')]
        bst = xgb.train(param, dtrain, evals=evallist, verbose_eval=False)
        target = encoder.inverse_transform(np.arange(num_genres))
        df = parse_class_report(classification_report(y_test, bst.predict(dtest), target_names=target))
        average += df.values
    df.iloc[:,:] = np.round(average / n, 3)
    return df

# Transforming the classification_report string into a pandas.DataFrame.
# Assumes the older report layout whose last line is the 'avg / total' row.
def parse_class_report(class_rep_str):
    # Split every line into whitespace-separated tokens and drop empty lines
    all_tokens = (line.split() for line in class_rep_str.split('\n'))
    rows = [tokens for tokens in all_tokens if tokens]

    # Header row: prepend a name for the label column
    rows[0] = ['genre'] + rows[0]
    # Last row: merge the tokens 'avg', '/', 'total' into a single label
    rows[-1] = ['avg / total'] + rows[-1][3:]

    df = pd.DataFrame(rows[1:], columns=rows[0])
    # convert_objects was removed from pandas; pd.to_numeric replaces it
    return df.set_index('genre').apply(pd.to_numeric, errors='coerce')

### Models
#### Preparing data

In [42]:
# DataFrame.from_csv was removed from pandas; read_csv with index_col=0 is the equivalent
df = pd.read_csv("extracted_data.csv", index_col=0, encoding='utf-8')

scaler = StandardScaler()
X = scaler.fit_transform(np.array(df.values[:, :-1], dtype = float))

genre_list = df.values[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)
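
`LabelEncoder` assigns integer codes to the genre names in sorted order, which is why the reports below list the genres alphabetically and why `encoder.inverse_transform(np.arange(num_genres))` recovers the names for `target_names`. A small sketch with a made-up label list:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

demo_encoder = LabelEncoder()
# Codes follow the sorted order of the distinct labels: blues=0, jazz=1, rock=2
y_demo = demo_encoder.fit_transform(['rock', 'blues', 'jazz', 'blues'])
# Map the codes 0..2 back to their genre names
names = [str(name) for name in demo_encoder.inverse_transform(np.arange(3))]
print(y_demo, names)
```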

#### GridSearchCV and final preparation

In [47]:
dummy = DummyClassifier()
knn = KNeighborsClassifier()
rfc = RandomForestClassifier()
svc = SVC(probability=True)

param_grid_rfc = { 
    'n_estimators': [ 70, 150, 370],
    'max_features': ['log2'],
    'max_depth' : [10,15, 20],
    'criterion' :['gini']
}

param_grid_svc = {
    "C": np.logspace(0,2,num=20), 
    "kernel": ["poly", "rbf", "sigmoid"]   
}

param_grid_knn = {
    'n_neighbors' : [2,5,8,10,15]
}

skf = StratifiedKFold(n_splits=3)

grid_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid_rfc, cv=skf, n_jobs = -1)
grid_svc = GridSearchCV(estimator=svc, param_grid=param_grid_svc, cv=skf, n_jobs = -1)
grid_knn = GridSearchCV(estimator=knn, param_grid=param_grid_knn, cv=skf, n_jobs = -1)

eclf = VotingClassifier(estimators=[('svc', grid_svc), ('rfc', grid_rfc)], voting='soft')

In [46]:
# DummyClassifier
check_model(dummy, X, y, encoder)

100%|██████████| 10/10 [00:00<00:00, 187.77it/s]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.126 | 0.13 | 0.128 | 20.0 |
| classical | 0.118 | 0.11 | 0.113 | 20.0 |
| country | 0.07 | 0.08 | 0.072 | 20.0 |
| hiphop | 0.091 | 0.08 | 0.084 | 20.0 |
| jazz | 0.118 | 0.125 | 0.12 | 20.0 |
| metal | 0.104 | 0.1 | 0.103 | 20.0 |
| pop | 0.132 | 0.15 | 0.14 | 20.0 |
| reggae | 0.101 | 0.09 | 0.094 | 20.0 |
| rock | 0.1 | 0.11 | 0.104 | 20.0 |
| avg / total | 0.105 | 0.106 | 0.106 | 180.0 |


In [50]:
# KNeighbors
check_model(grid_knn, X, y, encoder)

100%|██████████| 10/10 [00:05<00:00,  1.81it/s]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.583 | 0.625 | 0.601 | 20.0 |
| classical | 0.892 | 0.8 | 0.839 | 20.0 |
| country | 0.624 | 0.625 | 0.614 | 20.0 |
| hiphop | 0.651 | 0.605 | 0.626 | 20.0 |
| jazz | 0.672 | 0.76 | 0.709 | 20.0 |
| metal | 0.669 | 0.72 | 0.69 | 20.0 |
| pop | 0.676 | 0.805 | 0.732 | 20.0 |
| reggae | 0.67 | 0.54 | 0.588 | 20.0 |
| rock | 0.365 | 0.275 | 0.309 | 20.0 |
| avg / total | 0.645 | 0.639 | 0.635 | 180.0 |


In [59]:
# xgboost
check_model_xgb(X, y, encoder, n=10)

100%|██████████| 10/10 [00:02<00:00,  4.31it/s]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.646 | 0.635 | 0.636 | 20.0 |
| classical | 0.891 | 0.885 | 0.886 | 20.0 |
| country | 0.6 | 0.575 | 0.584 | 20.0 |
| hiphop | 0.629 | 0.585 | 0.604 | 20.0 |
| jazz | 0.724 | 0.705 | 0.71 | 20.0 |
| metal | 0.752 | 0.8 | 0.768 | 20.0 |
| pop | 0.732 | 0.78 | 0.751 | 20.0 |
| reggae | 0.622 | 0.69 | 0.649 | 20.0 |
| rock | 0.462 | 0.385 | 0.408 | 20.0 |
| avg / total | 0.674 | 0.671 | 0.666 | 180.0 |


In [48]:
# RandomForest
check_model(grid_rfc, X, y, encoder)

100%|██████████| 10/10 [01:21<00:00,  8.80s/it]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.676 | 0.64 | 0.653 | 20.0 |
| classical | 0.899 | 0.915 | 0.905 | 20.0 |
| country | 0.628 | 0.62 | 0.619 | 20.0 |
| hiphop | 0.719 | 0.665 | 0.691 | 20.0 |
| jazz | 0.748 | 0.755 | 0.75 | 20.0 |
| metal | 0.722 | 0.815 | 0.761 | 20.0 |
| pop | 0.753 | 0.8 | 0.771 | 20.0 |
| reggae | 0.675 | 0.715 | 0.689 | 20.0 |
| rock | 0.51 | 0.395 | 0.435 | 20.0 |
| avg / total | 0.703 | 0.701 | 0.698 | 180.0 |


In [49]:
# Support Vectors
check_model(grid_svc, X, y, encoder)

100%|██████████| 10/10 [01:50<00:00, 11.21s/it]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.77 | 0.805 | 0.784 | 20.0 |
| classical | 0.954 | 0.905 | 0.928 | 20.0 |
| country | 0.7 | 0.7 | 0.695 | 20.0 |
| hiphop | 0.718 | 0.73 | 0.723 | 20.0 |
| jazz | 0.859 | 0.825 | 0.834 | 20.0 |
| metal | 0.793 | 0.79 | 0.788 | 20.0 |
| pop | 0.799 | 0.755 | 0.772 | 20.0 |
| reggae | 0.718 | 0.755 | 0.732 | 20.0 |
| rock | 0.534 | 0.51 | 0.517 | 20.0 |
| avg / total | 0.763 | 0.752 | 0.752 | 180.0 |


In [55]:
# Voting Classifier
check_model(eclf, X, y, encoder)

100%|██████████| 10/10 [03:20<00:00, 21.00s/it]


| genre | precision | recall | f1-score | support |
|-------|-----------|--------|----------|---------|
| blues | 0.815 | 0.775 | 0.791 | 20.0 |
| classical | 0.928 | 0.925 | 0.924 | 20.0 |
| country | 0.747 | 0.715 | 0.728 | 20.0 |
| hiphop | 0.743 | 0.745 | 0.743 | 20.0 |
| jazz | 0.831 | 0.83 | 0.827 | 20.0 |
| metal | 0.786 | 0.82 | 0.8 | 20.0 |
| pop | 0.78 | 0.775 | 0.773 | 20.0 |
| reggae | 0.727 | 0.75 | 0.735 | 20.0 |
| rock | 0.537 | 0.51 | 0.519 | 20.0 |
| avg / total | 0.765 | 0.761 | 0.76 | 180.0 |


### Summing it all up
The comparison table below lists the averaged scores (the `avg / total` row) of every model:

| Model           | precision | recall | f1-score |
|-----------------|-----------|--------|----------|
| DummyClassifier | 0.105     | 0.106  | 0.106    |
| KNeighbors      | 0.645     | 0.639  | 0.635    |
| xgboost         | 0.674     | 0.671  | 0.666    |
| RandomForest    | 0.703     | 0.701  | 0.698    |
| SupportVectors  | 0.763     | 0.752  | 0.752    |
| Voting          | 0.765     | 0.761  | 0.760    |
