In [1]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


# Feature Extraction and Machine Learning Techniques for Musical Genre Determination <br /><small>by [Rosy Davis](mailto:rosydavis@ieee.org), CSUN MSEE 2017</small>

## Introduction

This notebook runs the MFCC support vector classification benchmarks for my master's project, "Feature Extraction
and Machine Learning Techniques for Musical Genre Determination," for which I will be
receiving a Master of Science in 
[Electrical Engineering](https://www.csun.edu/engineering-computer-science/electrical-computer-engineering/) 
from [California State University, Northridge](https://www.csun.edu/) in December 2017. 
My advisor at CSUN is 
[Dr. Xiyi Hang](http://www.csun.edu/faculty/profiles/xiyi.hang.14). This project is also hosted on [ScholarWorks, via the CSUN library](http://scholarworks.csun.edu/handle/10211.3/199917). Only a partial list of the most relevant references appears in this notebook; the full list appears in the accompanying paper.

In this project, two approaches to musical genre classification were investigated: the use of support vector classification on Mel-frequency cepstral coefficient (MFCC) features (Experiment 1, this notebook), and the use of neural networks on image data generated via the discrete wavelet transform (DWT) (Experiments 2-5, the "[NeuralNetworkModels.ipynb](NeuralNetworkModels.ipynb)" notebook).

### Contents

* [Setup](#Setup)
* [Experiment 1: MFCC Benchmarking](#Experiment-1:-MFCC-Benchmarking)
    * [Small Dataset](#Small-Dataset)
    * [Extended Dataset](#Extended-Dataset)

## Setup

In [2]:
# Seed the random number generator
seed = 42 # set to None to auto-generate

# For nice tables:
from IPython.display import display

# This block is adapted from FMA: A Dataset For Music Analysis
# Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2
# from their provided usage code
import os

import numpy as np                     # For math and analysis
import pandas as pd                    # For data structures
import sklearn as skl                  # (scikit-learn) for various machine learning tasks
import sklearn.utils, sklearn.preprocessing, sklearn.decomposition, sklearn.svm
import fma.utils as fma_utils          # Utilities provided for loading and manipulating the
                                       # Free Music Archive dataset.

# My code for utilities and long-running timers:
import code_timing as timer   
import utilities as ut   

In [3]:
# Adapted from fma usage code:
AUDIO_DIR = os.path.join(os.getcwd(), "data/fma_small/")

# Load the metadata files
tracks = fma_utils.load(AUDIO_DIR + 'tracks.csv')
features = fma_utils.load(AUDIO_DIR + 'features.csv')

# Make sure everything in features is in tracks and vice versa
np.testing.assert_array_equal(features.index, tracks.index)

In [4]:
def fit_svm(kernel, idx_tuple):
    """Fit an SVC with the given kernel on the current MFCC split and record its accuracies.

    Relies on the notebook-level X_*_mfcc / y_* variables and the mfcc_benchmarks table.
    """
    # idx_tuple is (size_name, kernel_name) and selects one row of mfcc_benchmarks
    (size_name, kernel_name) = idx_tuple
    print("{} SVM training SVM begun at {}.".format(kernel_name, timer.datetimestamp()))
    clf = skl.svm.SVC(kernel=kernel)
    clf.fit(X_train_mfcc, y_train)
    print("\tTraining {} SVM finished at {}. Calculating\n\t\tscores...".format(
                        kernel_name,
                        timer.datetimestamp()))
    # Select row and column in a single .loc call: chained indexing
    # (.loc[row][col] = ...) can assign to a temporary copy instead of the table.
    mfcc_benchmarks.loc[idx_tuple, "Training Accuracy"] = clf.score(X_train_mfcc, y_train)
    mfcc_benchmarks.loc[idx_tuple, "Validation Accuracy"] = clf.score(X_val_mfcc, y_val)
    mfcc_benchmarks.loc[idx_tuple, "Test Accuracy"] = clf.score(X_test_mfcc, y_test)
    print('\tTrain accuracy: {:.2%}; validation accuracy: {:.2%}; test accuracy: {:.2%}\n'.format(
                            mfcc_benchmarks.loc[idx_tuple, "Training Accuracy"], 
                            mfcc_benchmarks.loc[idx_tuple, "Validation Accuracy"], 
                            mfcc_benchmarks.loc[idx_tuple, "Test Accuracy"]))
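
The four `kernel` strings passed to `fit_svm` select different similarity functions inside `SVC`. As a rough illustration only (this is not scikit-learn's implementation), the textbook forms of those kernels can be written in plain Python. The `gamma`, `coef0`, and `degree` values and the vectors `u` and `v` below are placeholder assumptions, not the settings used for the runs recorded in this notebook.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(a, b):
    # <a, b>
    return dot(a, b)

def poly_kernel(a, b, gamma=1.0, coef0=0.0, degree=3):
    # (gamma * <a, b> + coef0) ** degree
    return (gamma * dot(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=1.0):
    # exp(-gamma * ||a - b||^2)
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(a, b, gamma=1.0, coef0=0.0):
    # tanh(gamma * <a, b> + coef0)
    return math.tanh(gamma * dot(a, b) + coef0)

# Two made-up feature vectors (hypothetical, for illustration only)
u = [1.0, 2.0]
v = [0.5, -1.0]

kernel_values = {
    "linear": linear_kernel(u, v),
    "poly": poly_kernel(u, v),
    "rbf": rbf_kernel(u, v),
    "sigmoid": sigmoid_kernel(u, v),
}
for name, value in kernel_values.items():
    print(f"{name}: {value:.4f}")
```

The RBF kernel depends only on the distance between the two vectors, so it equals 1 when they coincide and decays toward 0 as they move apart; the other three depend on the inner product.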

In [5]:
iterables = [["Small","Extended"], ["Linear","Polynomial","RBF","Sigmoid"]]
idx = pd.MultiIndex.from_product(iterables, names=["Dataset Size", "SVC Kernel"])
mfcc_benchmarks = pd.DataFrame(index = idx, 
                               columns = ["Training Accuracy",
                                          "Validation Accuracy",
                                          "Test Accuracy"])

## Experiment 1: MFCC Benchmarking

### Small Dataset

In [6]:
# Code adapted from FMA (commenting mine):

# Use the small dataset (which is balanced re: genre and <8GB):
small = tracks['set', 'subset'] <= 'small'

# Get the pre-split sets from the FMA dataset
train = tracks['set', 'split'] == 'training'
val = tracks['set', 'split'] == 'validation'
test = tracks['set', 'split'] == 'test'

# Pull the main genre information on the examples in each of the pre-split sets:
y_train = tracks.loc[small & train, ('track', 'genre_top')]
y_val = tracks.loc[small & val, ('track', 'genre_top')]
y_test = tracks.loc[small & test, ('track', 'genre_top')]

In [7]:
# Check out the data:
print(f"Data split into {y_train.size} training examples, "
      f"{y_val.size} validation examples, and "
      f"{y_test.size} testing examples.")

unique_genres = tracks.loc[small & train, ('track', 'genre_top')].unique().categories
print("Small dataset contains {} tracks:".format((tracks.loc[small & (train | 
                                                                       test | 
                                                                       val)]).shape[0]))
for item in unique_genres:
    num_train = y_train[y_train == item].shape[0]
    num_val = y_val[y_val == item].shape[0]
    num_test = y_test[y_test == item].shape[0]
    print(f"\t{item}: {num_train} training, {num_val} validation, and {num_test} test.")
print()

Data split into 6400 training examples, 800 validation examples, and 800 testing examples.
Small dataset contains 8000 tracks:
	Hip-Hop: 800 training, 100 validation, and 100 test.
	Pop: 800 training, 100 validation, and 100 test.
	Folk: 800 training, 100 validation, and 100 test.
	Rock: 800 training, 100 validation, and 100 test.
	Experimental: 800 training, 100 validation, and 100 test.
	International: 800 training, 100 validation, and 100 test.
	Electronic: 800 training, 100 validation, and 100 test.
	Instrumental: 800 training, 100 validation, and 100 test.



In [8]:
# Pull the Mel-frequency cepstral coefficients (pre-calculated) for each of the pre-split 
# sets--the MFCCs are used as the baseline:
X_train_mfcc = features.loc[y_train.index, 'mfcc']
X_val_mfcc = features.loc[y_val.index, 'mfcc']
X_test_mfcc = features.loc[y_test.index, 'mfcc']
print(("* MFCC data loaded: {} training examples, {} validation examples, and "
       "{} test\nexamples.").format(X_train_mfcc.shape[0], 
                                   X_val_mfcc.shape[0], 
                                   X_test_mfcc.shape[0]))
print()

# Shuffle training examples:
X_train_mfcc, y_train = skl.utils.shuffle(X_train_mfcc, y_train, random_state=seed)
print("* Shuffle complete.")

# Standardize MFCC features by removing the mean and scaling to unit variance.
scaler = skl.preprocessing.StandardScaler(copy=False)
X_train_mfcc = scaler.fit_transform(X_train_mfcc)
X_test_mfcc = scaler.transform(X_test_mfcc)
X_val_mfcc = scaler.transform(X_val_mfcc)
print("* Preprocessing for 0 mean/unit variance complete.")

print(("\nThe MFCC training data contains {} features per example and "
       "{} classes.").format(X_train_mfcc.shape[1],
                             unique_genres.shape[0]))

# Now perform the benchmarking:
fit_svm("linear", ("Small","Linear"))
fit_svm("poly", ("Small","Polynomial"))
fit_svm("rbf", ("Small","RBF"))
fit_svm("sigmoid", ("Small","Sigmoid"))

* MFCC data loaded: 6400 training examples, 800 validation examples, and 800 test
examples.

* Shuffle complete.
* Preprocessing for 0 mean/unit variance complete.

The MFCC training data contains 140 features per example and 8 classes.
Linear SVM training SVM begun at Tuesday, 2017 November 28, 7:47 AM.
	Training Linear SVM finished at Tuesday, 2017 November 28, 7:48 AM. Calculating
		scores...
	Train accuracy: 60.53%; validation accuracy: 46.12%; test accuracy: 41.62%

Polynomial SVM training SVM begun at Tuesday, 2017 November 28, 7:48 AM.
	Training Polynomial SVM finished at Tuesday, 2017 November 28, 7:48 AM. Calculating
		scores...
	Train accuracy: 70.16%; validation accuracy: 45.00%; test accuracy: 38.88%

RBF SVM training SVM begun at Tuesday, 2017 November 28, 7:48 AM.
	Training RBF SVM finished at Tuesday, 2017 November 28, 7:49 AM. Calculating
		scores...
	Train accuracy: 75.81%; validation accuracy: 53.12%; test accuracy: 46.38%

Sigmoid SVM training SVM begun at Tuesday, 2

### Extended Dataset

In [9]:
# Pull up the small dataset y values, just for its list of genres
print("Desired genres: {}".format(unique_genres.tolist()))

# Now filter the tracks based on those genres:
eight_genre = tracks[tracks[("track","genre_top")].isin(unique_genres)]
print(f"Eight-genre dataset contains {eight_genre.shape[0]} tracks:")
for item in unique_genres:
    num = eight_genre[eight_genre[("track", "genre_top")] == item].shape[0]
    print("\t{}: {}".format(item, num))
print()

size_key = 'large'
print ("Using the {} dataset as the base for the extended dataset.".format(size_key))

size_selector = eight_genre['set', 'subset'] <= size_key

# Get the pre-split sets from the FMA dataset
train = eight_genre['set', 'split'] == 'training'
val = eight_genre['set', 'split'] == 'validation'
test = eight_genre['set', 'split'] == 'test'

# Pull the main genre information on the examples in each of the pre-split sets:
y_train = eight_genre.loc[size_selector & train, ('track', 'genre_top')]
y_val = eight_genre.loc[size_selector & val, ('track', 'genre_top')]
y_test = eight_genre.loc[size_selector & test, ('track', 'genre_top')]
print(f"Data split into {y_train.size} training examples, "
      f"{y_val.size} validation examples, and "
      f"{y_test.size} testing examples.")

print(f"Extended dataset contains {eight_genre.shape[0]} tracks:")
for item in unique_genres:
    num_train = y_train[y_train == item].shape[0]
    num_val = y_val[y_val == item].shape[0]
    num_test = y_test[y_test == item].shape[0]
    print(f"\t{item}: {num_train} training, {num_val} validation, and {num_test} test.")
print()

Desired genres: ['Hip-Hop', 'Pop', 'Folk', 'Rock', 'Experimental', 'International', 'Electronic', 'Instrumental']
Eight-genre dataset contains 46317 tracks:
	Hip-Hop: 3552
	Pop: 2332
	Folk: 2803
	Rock: 14182
	Experimental: 10608
	International: 1389
	Electronic: 9372
	Instrumental: 2079

Using the large dataset as the base for the extended dataset.
Data split into 37316 training examples, 4350 validation examples, and 4651 testing examples.
Extended dataset contains 46317 tracks:
	Hip-Hop: 2910 training, 319 validation, and 323 test.
	Pop: 1815 training, 313 validation, and 204 test.
	Folk: 2275 training, 229 validation, and 299 test.
	Rock: 11394 training, 1324 validation, and 1464 test.
	Experimental: 8557 training, 966 validation, and 1085 test.
	International: 1124 training, 137 validation, and 128 test.
	Electronic: 7662 training, 871 validation, and 839 test.
	Instrumental: 1579 training, 191 validation, and 309 test.
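
Unlike the small dataset, the extended dataset is not balanced across genres, so raw accuracy is easier to interpret next to a trivial reference point. The sketch below is an illustrative aid, not part of the original experiment: it takes the per-genre counts printed above and computes the accuracy of a hypothetical classifier that always predicts the most common training genre. The names `extended_counts` and `majority_baseline` are introduced here for illustration.

```python
# Per-genre counts copied from the printed output above: (training, validation, test)
extended_counts = {
    "Hip-Hop": (2910, 319, 323),
    "Pop": (1815, 313, 204),
    "Folk": (2275, 229, 299),
    "Rock": (11394, 1324, 1464),
    "Experimental": (8557, 966, 1085),
    "International": (1124, 137, 128),
    "Electronic": (7662, 871, 839),
    "Instrumental": (1579, 191, 309),
}

# The genre a constant classifier would pick: the most common one in the training split
majority_genre = max(extended_counts, key=lambda g: extended_counts[g][0])

def majority_baseline(counts, genre, split_index):
    """Accuracy on one split of always predicting `genre`."""
    split_total = sum(c[split_index] for c in counts.values())
    return counts[genre][split_index] / split_total

split_totals = [sum(c[i] for c in extended_counts.values()) for i in range(3)]
baselines = [majority_baseline(extended_counts, majority_genre, i) for i in range(3)]

for name, total, acc in zip(("training", "validation", "test"), split_totals, baselines):
    print(f"{name}: {total} examples, always-'{majority_genre}' accuracy = {acc:.2%}")
```

With these counts, the constant classifier lands at roughly 30 to 31 percent on each split, compared with the 12.5 percent chance level of the balanced eight-genre small dataset.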



In [10]:
# Pull the Mel-frequency cepstral coefficients (pre-calculated) for each of the pre-split 
# sets--the MFCCs are used as the baseline:
X_train_mfcc = features.loc[y_train.index, 'mfcc']
X_val_mfcc = features.loc[y_val.index, 'mfcc']
X_test_mfcc = features.loc[y_test.index, 'mfcc']
print(("* MFCC data loaded: {} training examples, {} validation examples, and "
       "{} test\nexamples.").format(X_train_mfcc.shape[0], 
                                   X_val_mfcc.shape[0], 
                                   X_test_mfcc.shape[0]))
print()

# Shuffle training examples:
X_train_mfcc, y_train = skl.utils.shuffle(X_train_mfcc, y_train, random_state=seed)
print("* Shuffle complete.")

# Standardize MFCC features by removing the mean and scaling to unit variance.
scaler = skl.preprocessing.StandardScaler(copy=False)
X_train_mfcc = scaler.fit_transform(X_train_mfcc)
X_test_mfcc = scaler.transform(X_test_mfcc)
X_val_mfcc = scaler.transform(X_val_mfcc)
print("* Preprocessing for 0 mean/unit variance complete.")

print(("\nThe MFCC training data contains {} features per example and "
       "{} classes.").format(X_train_mfcc.shape[1],
                             unique_genres.shape[0]))

* MFCC data loaded: 37316 training examples, 4350 validation examples, and 4651 test
examples.

* Shuffle complete.
* Preprocessing for 0 mean/unit variance complete.

The MFCC training data contains 140 features per example and 8 classes.
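
The preprocessing cell fits the scaler on the training data only and then reuses the same statistics for the validation and test sets. A minimal plain-Python sketch of that idea follows, assuming per-feature z-scoring with the population standard deviation; the helper names and the toy rows are hypothetical, and this is not the `StandardScaler` implementation.

```python
import statistics

def fit_standardizer(rows):
    """Compute per-feature mean and population standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_standardizer(rows, means, stds):
    """Scale rows using statistics that were computed elsewhere (the training set)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy data: three training rows and one held-out row, two features each
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
held_out_rows = [[2.0, 40.0]]

means, stds = fit_standardizer(train_rows)
train_scaled = apply_standardizer(train_rows, means, stds)
held_out_scaled = apply_standardizer(held_out_rows, means, stds)

print("training means:", means)
print("scaled held-out row:", held_out_scaled[0])
```

Because the held-out row is scaled with the training statistics, its features are not guaranteed to have zero mean or unit variance; that is expected, and it keeps information about the validation and test sets out of the fitted preprocessing.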


In [11]:
fit_svm("linear", ("Extended","Linear"))
fit_svm("poly", ("Extended","Polynomial"))
fit_svm("rbf", ("Extended","RBF"))
fit_svm("sigmoid", ("Extended","Sigmoid"))

Linear SVM training SVM begun at Tuesday, 2017 November 28, 7:49 AM.
	Training Linear SVM finished at Tuesday, 2017 November 28, 8:24 AM. Calculating
		scores...
	Train accuracy: 61.27%; validation accuracy: 62.25%; test accuracy: 57.82%

Polynomial SVM training SVM begun at Tuesday, 2017 November 28, 8:28 AM.
	Training Polynomial SVM finished at Tuesday, 2017 November 28, 8:36 AM. Calculating
		scores...
	Train accuracy: 72.37%; validation accuracy: 58.64%; test accuracy: 56.35%

RBF SVM training SVM begun at Tuesday, 2017 November 28, 8:40 AM.
	Training RBF SVM finished at Tuesday, 2017 November 28, 8:47 AM. Calculating
		scores...
	Train accuracy: 74.54%; validation accuracy: 63.59%; test accuracy: 61.21%

Sigmoid SVM training SVM begun at Tuesday, 2017 November 28, 8:52 AM.
	Training Sigmoid SVM finished at Tuesday, 2017 November 28, 10:18 AM. Calculating
		scores...
	Train accuracy: 45.32%; validation accuracy: 45.01%; test accuracy: 45.99%



In [12]:
display(mfcc_benchmarks)
ut.save_obj(mfcc_benchmarks, "mfcc_benchmarks")

| Dataset Size | SVC Kernel | Training Accuracy | Validation Accuracy | Test Accuracy |
|---|---|---|---|---|
| Small | Linear | 0.605313 | 0.46125 | 0.41625 |
| Small | Polynomial | 0.701562 | 0.45 | 0.38875 |
| Small | RBF | 0.758125 | 0.53125 | 0.46375 |
| Small | Sigmoid | 0.409062 | 0.36 | 0.3475 |
| Extended | Linear | 0.612713 | 0.622529 | 0.578155 |
| Extended | Polynomial | 0.723711 | 0.586437 | 0.563535 |
| Extended | RBF | 0.745444 | 0.635862 | 0.612126 |
| Extended | Sigmoid | 0.45321 | 0.450115 | 0.459901 |
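
One way to read the benchmark table is to pick, for each dataset size, the kernel with the highest validation accuracy and then look at its test accuracy and its train-to-test gap. The sketch below does this with the values copied from the table; the selection rule is a common convention assumed here, not something the notebook itself prescribes, and the names `results` and `best_kernel_by_validation` are hypothetical.

```python
# (dataset, kernel) -> (training, validation, test) accuracy, copied from the table above
results = {
    ("Small", "Linear"): (0.605313, 0.46125, 0.41625),
    ("Small", "Polynomial"): (0.701562, 0.45, 0.38875),
    ("Small", "RBF"): (0.758125, 0.53125, 0.46375),
    ("Small", "Sigmoid"): (0.409062, 0.36, 0.3475),
    ("Extended", "Linear"): (0.612713, 0.622529, 0.578155),
    ("Extended", "Polynomial"): (0.723711, 0.586437, 0.563535),
    ("Extended", "RBF"): (0.745444, 0.635862, 0.612126),
    ("Extended", "Sigmoid"): (0.45321, 0.450115, 0.459901),
}

def best_kernel_by_validation(results, dataset):
    """Return the kernel name with the highest validation accuracy for one dataset."""
    candidates = {k: v for (d, k), v in results.items() if d == dataset}
    return max(candidates, key=lambda k: candidates[k][1])

for dataset in ("Small", "Extended"):
    kernel = best_kernel_by_validation(results, dataset)
    train_acc, val_acc, test_acc = results[(dataset, kernel)]
    print(f"{dataset}: best kernel by validation accuracy is {kernel} "
          f"(test {test_acc:.2%}, train-test gap {train_acc - test_acc:.2%})")
```

Choosing on validation accuracy rather than test accuracy keeps the test set as an untouched estimate of how the selected model generalizes.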
