# Using Neural Networks and SVM to Classify Music Genres

## Introduction
Many people enjoy listening to music, and some may want to determine the genre of
what they are listening to. The midi format is a popular music format that captures
many aspects of a piece of music, such as instruments and key or time signatures.
Midi files are also easy to create with instruments like a portable piano or stage
piano, since you can record your playing and export it in the midi format. In this
tutorial, I will walk through extracting features from midi data, matching them with
genre labels from the Million Song Dataset, and training neural networks and SVMs on
them to predict the genre of any given midi file.

## Tutorial Content
* [Package Installation and Imports](#step-1)
* [Download and Parse Genre Labels](#step-2)
* [Download, Parse and Match Midi Files](#step-3)
* [Extract Features from Matched Midi Files](#step-4)
* [Partition Dataset and Format Features](#step-5)
* [Construct Models to Fit Labeled Features](#step-6)
* [Calculate Test Accuracy](#step-7)
* [Make a Prediction on a Midi File](#step-8)
* [Analysis, Next Steps and Additional Information](#next-steps)
* [Author and Credits](#credits)

## Required Files and Libraries
The following will be required for this tutorial:
* The **`numpy`** module.
* The **`pandas`** module.
* The **`scikit-learn`** module.
* The **`pretty_midi`** module for which documentation can be found at https://craffel.github.io/pretty-midi/.
* The midi file dataset that can be found and downloaded from http://colinraffel.com/projects/lmd/.
* The genre labels that can be found and downloaded from http://www.tagtraum.com/msd_genre_datasets.html.

<a id="step-1"></a>

## Step 1: Package Installation and Imports
The **`pretty_midi`** module can be installed using pip:
    
    $ pip install pretty_midi

The other packages like `numpy`, `pandas` and `scikit-learn` can also be installed using pip:
    
    $ pip install numpy

    $ pip install pandas

    $ pip install scikit-learn

We may also install these modules using the package manager of our choice, like `apt`, `homebrew`, `pacman`, `yaourt`, etc.

After these are installed, we need to import the modules and make sure that the following
lines of code run. If they do not, we will need to try reinstalling the modules or use another
installation method instead.

Aside from the modules mentioned above, we will also use the `warnings` package to turn warnings into
errors when a corrupted midi file is read, so that we can catch these errors and skip the file. The `os`
package is used to traverse the folder structure of the midi files, find each file, and extract the track ID
from its directory path.

In [1]:
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
import pretty_midi
import warnings
import os
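
If one of the imports above fails, it can help to see which module is missing before reinstalling anything. The following is a minimal sketch using only the standard library; the helper name `find_missing_modules` is made up for this example and is not part of any of the packages.

```python
import importlib.util

# Import names for the packages used in this tutorial. Note that
# scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "pretty_midi"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

This only checks that each module can be located, not that it imports without errors, so running the import cell is still the real test.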

<a id="step-2"></a>

## Step 2: Download and Parse Genre Labels
In this step we will download and parse the genre labels and construct a pandas dataframe to store the data.
Do the following:
* Go to http://www.tagtraum.com/msd_genre_datasets.html
* Scroll down to the Genre Ground Truth section to download the zip file under the label "CD1".
* Then unzip the file in this folder.

Now we can write a function to read the file and create a dataframe to store the contents.
After writing the function, we can call it on the path to the genre label file, and if we
print the first couple lines of the output, we should get a dataframe with the Track IDs and the corresponding 
Genres.

We can also construct a list of genre labels that we have seen as well as a dictionary mapping genres to the
index of that genre in the list to make it easier and faster to convert between a number and a genre. This will
facilitate training and the prediction in later steps.

In [2]:
def get_genres(path):
    """
    This function reads the genre labels and puts it into a pandas DataFrame.
    
    @input path: The path to the genre label file.
    @type path: String
    
    @return: A pandas dataframe containing the genres and midi IDs.
    @rtype: pandas.DataFrame
    """
    ids = []
    genres = []
    with open(path) as f:
        line = f.readline()
        while line:
            if line[0] != '#':
                [x, y, *_] = line.strip().split("\t")
                ids.append(x)
                genres.append(y)
            line = f.readline()
    genre_df = pd.DataFrame(data={"Genre": genres, "TrackID": ids})
    return genre_df

# Get the Genre DataFrame
genre_path = "msd_tagtraum_cd1.cls"
genre_df = get_genres(genre_path)

# Create Genre List and Dictionary
label_list = list(set(genre_df.Genre))
label_dict = {lbl: label_list.index(lbl) for lbl in label_list}

# Print to Visualize
print(genre_df.head(), end="\n\n")
print(label_list, end="\n\n")
print(label_dict, end="\n\n")

      Genre             TrackID
0  Pop_Rock  TRAAAAK128F9318786
1       Rap  TRAAAAW128F429D538
2  Pop_Rock  TRAAABD128F429CF47
3      Jazz  TRAAAED128E0783FAB
4  Pop_Rock  TRAAAEF128F4273421

['Jazz', 'Folk', 'Electronic', 'Latin', 'Vocal', 'Blues', 'Rap', 'International', 'RnB', 'Reggae', 'New Age', 'Pop_Rock', 'Country']

{'Jazz': 0, 'Folk': 1, 'Electronic': 2, 'Latin': 3, 'Vocal': 4, 'Blues': 5, 'Rap': 6, 'International': 7, 'RnB': 8, 'Reggae': 9, 'New Age': 10, 'Pop_Rock': 11, 'Country': 12}
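
The order of `list(set(...))` for strings can differ from one Python session to the next, so the genre indices printed above may not be the same on another run. If stable indices matter (for example, when reusing a trained model later), one option is to sort the genres first. This is a small sketch with a made-up sample list standing in for `genre_df.Genre`; the variable names are chosen for this example only.

```python
# Hypothetical sample of genre strings standing in for genre_df.Genre
sample_genres = ["Pop_Rock", "Rap", "Pop_Rock", "Jazz", "Pop_Rock"]

# Sorting the unique genres fixes the order across runs
sorted_labels = sorted(set(sample_genres))

# enumerate gives each genre its index without searching the list repeatedly
label_to_index = {genre: index for index, genre in enumerate(sorted_labels)}

print(sorted_labels)
print(label_to_index)
```

With sorted labels, the numbers in the outputs of the later steps would change, since they were produced with the unsorted ordering shown above.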



<a id="step-3"></a>

## Step 3: Download, Parse and Match Midi Files
In this step we will download the midi files, parse them, then match them with the tracks for which we have
genre labels.

We can follow the procedure below:
* Go to http://colinraffel.com/projects/lmd/
* Scroll down and click on the link in the section that says "LMD-matched" to download a midi dataset where each file is matched to an entry in the million song dataset.
* After the download you can untar the file in this folder using the command
    
    
    $ tar -xvf lmd_matched.tar.gz
    
Now we can write a function to traverse the folder structure of the midi files, create a dataframe
to hold the file paths, then join it with the genre label dataframe so that we only read files
that have a label in the genre dataframe that we created in [step 2](#step-2). At the end, we can again print out
part of our dataframe of matched midi file paths along with their corresponding genres to see the structure.

In [3]:
def get_matched_midi(midi_folder, genre_df):
    """
    This function loads in midi file paths that are found in the given folder, puts this data into a
    pandas DataFrame, then matches each entry with a genre described in get_genres.
    
    @input midi_folder: The path to the midi files.
    @type midi_folder: String
    @input genre_df: The genre label dataframe generated by get_genres.
    @type genre_df: pandas.DataFrame
    
    @return: A dataframe of track id and path to a midi file with that track id.
    @rtype: pandas.DataFrame
    """
    # Get All Midi Files
    track_ids, file_paths = [], []
    for dir_name, subdir_list, file_list in os.walk(midi_folder):
        if len(dir_name) == 36:
            track_id = dir_name[18:]
            file_path_list = ["/".join([dir_name, file]) for file in file_list]
            for file_path in file_path_list:
                track_ids.append(track_id)
                file_paths.append(file_path)
    all_midi_df = pd.DataFrame({"TrackID": track_ids, "Path": file_paths})
    
    # Inner Join with Genre Dataframe
    df = pd.merge(all_midi_df, genre_df, on='TrackID', how='inner')
    return df.drop(["TrackID"], axis=1)

# Obtain DataFrame with Matched Genres to File Paths
midi_path = "lmd_matched"
matched_midi_df = get_matched_midi(midi_path, genre_df)

# Print to Check Correctness
print(matched_midi_df.head())

                                                Path     Genre
0  lmd_matched/I/I/I/TRIIIMY128F4259A8E/1347136e2...  Pop_Rock
1  lmd_matched/I/I/I/TRIIIMY128F4259A8E/e77645a3b...  Pop_Rock
2  lmd_matched/I/I/U/TRIIUGA128F42AD432/861fea2b6...  Pop_Rock
3  lmd_matched/I/I/U/TRIIUGA128F42AD432/e9af70dd5...  Pop_Rock
4  lmd_matched/I/I/U/TRIIUGA128F42AD432/93cbbf687...  Pop_Rock
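
The function above finds track folders by checking that the directory string is 36 characters long and slicing from position 18, which only works when the dataset sits in a folder named exactly `lmd_matched` relative to the working directory. A sketch of a variant that reads the track ID from the folder name instead is shown below. The helper `find_midi_files` is hypothetical, and the check that track IDs are 18 characters starting with `TR` is an assumption based on the IDs printed in this tutorial.

```python
import os
import tempfile


def find_midi_files(midi_folder):
    """
    Yield (track_id, file_path) pairs, taking the track ID from the name of
    the folder that directly contains each midi file.
    """
    for dir_name, _subdirs, file_list in os.walk(midi_folder):
        track_id = os.path.basename(dir_name)
        # Assumption: track folders are named like TRIIIMY128F4259A8E
        if len(track_id) == 18 and track_id.startswith("TR"):
            for file_name in file_list:
                yield track_id, os.path.join(dir_name, file_name)


# Build a tiny throwaway folder tree to try the function on
with tempfile.TemporaryDirectory() as root:
    track_dir = os.path.join(root, "I", "I", "I", "TRIIIMY128F4259A8E")
    os.makedirs(track_dir)
    open(os.path.join(track_dir, "example.mid"), "w").close()
    found = list(find_midi_files(root))

print(found[0][0])
```

Because the ID comes from `os.path.basename`, the same code works no matter where the dataset folder is located.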


<a id="step-4"></a>

## Step 4: Extract Features from Matched Midi Files
In this step we will use the Python package **`pretty_midi`** and write a function to extract
features and relevant information from midi files. This step will set up a design matrix for training.
After we write our function, we can call it and store the design matrix in a variable for further use.
Parsing each midi file takes some time, so with the amount of data in our dataframe, this step will
take about an hour to run.

We will write a function called `normalize_features` to normalize a feature vector such that each value
approximately falls in the range (-1, 1), which will make it easier and faster for the machine
learning model to converge.

After that, we need a function `get_features` to get some useful features from a given midi file.
Here we need to decide which of the features that `pretty_midi` can give us are most likely to help
classify the music genre. As an example, I will extract the tempo, the number of time signature
changes, the resolution and the first time signature. In this function we also have to detect
corrupted files and make sure not to include them in our training.
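
The idea of turning warnings into errors so that a problem file can be skipped can be sketched on its own with the standard `warnings` module. In this illustration `load_noisy` is a made-up stand-in for a loader that warns about suspicious input; it is not a `pretty_midi` function.

```python
import warnings


def load_noisy(value):
    """Hypothetical loader that warns about suspicious input."""
    if value < 0:
        warnings.warn("input looks corrupted", RuntimeWarning)
    return value


def load_or_none(value):
    """Return the loaded value, or None if loading raised a warning or error."""
    try:
        with warnings.catch_warnings():
            # Inside this block every warning is raised as an exception
            warnings.simplefilter("error")
            return load_noisy(value)
    except Exception:
        return None


print(load_or_none(5))
print(load_or_none(-1))
```

The `catch_warnings` context manager restores the previous warning filters on exit, so the stricter setting only applies while the file is being read.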

Then we will write a function to construct a matrix of all the features of the entire set of data that
we obtained in [step 3](#step-3).

In [4]:
%%time
def normalize_features(features):
    """
    This function normalizes the features to approximately the range (-1, 1).
    
    @input features: The array of features.
    @type features: List of float
    
    @return: Normalized features.
    @rtype: List of float
    """
    tempo = (features[0] - 150) / 300
    num_sig_changes = (features[1] - 2) / 10
    resolution = (features[2] - 260) / 400
    time_sig_1 = (features[3] - 3) / 8
    time_sig_2 = (features[4] - 3) / 8
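    # Note: num_sig_changes is computed above but left out of the returned
    # vector, so each sample has four features.
    return [tempo, resolution, time_sig_1, time_sig_2]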


def get_features(path):
    """
    This function extracts the features from a midi file when given its path.
    
    @input path: The path to the midi file.
    @type path: String
    
    @return: The extracted features.
    @rtype: List of float
    """
    try:
        # Test for Corrupted Midi Files
        with warnings.catch_warnings():
            warnings.simplefilter("error")
            file = pretty_midi.PrettyMIDI(path)
            
            tempo = file.estimate_tempo()
            num_sig_changes = len(file.time_signature_changes)
            resolution = file.resolution
            ts_changes = file.time_signature_changes
            ts_1 = 4
            ts_2 = 4
            if len(ts_changes) > 0:
                ts_1 = ts_changes[0].numerator
                ts_2 = ts_changes[0].denominator
            return normalize_features([tempo, num_sig_changes, resolution, ts_1, ts_2])
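    except Exception:
        # Skip files that pretty_midi cannot parse or that raise warnings.
        # Catching Exception (not a bare except) keeps Ctrl-C working.
        return None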


def extract_midi_features(path_df):
    """
    This function takes in the path DataFrame, then for each midi file, it extracts certain
    features, maps the genre to a number and concatenates these to a large design matrix to return.
    
    @input path_df: A dataframe with paths to midi files, as well as their corresponding matched genre.
    @type path_df: pandas.DataFrame
    
    @return: A matrix of features along with label.
    @rtype: numpy.ndarray of float
    """
    all_features = []
    for index, row in path_df.iterrows():
        features = get_features(row.Path)
        genre = label_dict[row.Genre]
        if features is not None:
            features.append(genre)
            all_features.append(features)
    return np.array(all_features)

labeled_features = extract_midi_features(matched_midi_df)
print(labeled_features)

[[ 0.31833658 -0.35        0.125       0.125      11.        ]
 [ 0.32477658  0.55        0.125       0.125      11.        ]
 [ 0.0560317  -0.53        0.125       0.125      11.        ]
 ...
 [ 0.2000007  -0.17        0.125       0.125      11.        ]
 [ 0.24371827  0.31        0.125       0.125       3.        ]
 [ 0.22364894  0.31        0.125       0.125      11.        ]]
CPU times: user 51min 10s, sys: 1min 10s, total: 52min 21s
Wall time: 53min 3s
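
To read the printed rows, it can help to undo the normalization by hand. The sketch below copies the offsets and scales from `normalize_features` and applies them in both directions; the helper names `normalize` and `denormalize` are made up for this example. Under these constants, the first printed row corresponds to a tempo of roughly 245.5 and a resolution of 120, with a 4/4 time signature.

```python
# (offset, scale) pairs copied from normalize_features
TEMPO = (150, 300)
RESOLUTION = (260, 400)
TIME_SIG = (3, 8)


def normalize(value, offset, scale):
    """Shift and scale a raw feature value."""
    return (value - offset) / scale


def denormalize(value, offset, scale):
    """Invert normalize to recover the raw feature value."""
    return value * scale + offset


# A 4/4 time signature maps both parts to (4 - 3) / 8
print(normalize(4, *TIME_SIG))

# Reading the first printed row backwards
print(round(denormalize(0.31833658, *TEMPO), 2))
print(round(denormalize(-0.35, *RESOLUTION)))
```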


<a id="step-5"></a>

## Step 5: Partition Dataset and Format Features
In this step, we will partition the data set so that 60% of it is used for training the models, 20% is used for
validation (to tune the hyperparameters of the neural network), and the remaining 20% is used as the test
set to report our accuracy.

When writing the code to partition, it is a good idea to randomly permute the whole data set first, so that
each partition is a random sample of the data.

After splitting the data, we also need to separate the features from the labels, and write a
function to encode the labels using one-hot encoding for multi-class classification with neural networks.

In [5]:
# Shuffle Entire Dataset to Make Random
labeled_features = np.random.permutation(labeled_features)

# Partition into 3 Sets
num = len(labeled_features)
num_training = int(num * 0.6)
num_validation = int(num * 0.8)
training_data = labeled_features[:num_training]
validation_data = labeled_features[num_training:num_validation]
test_data = labeled_features[num_validation:]

# Separate Features from Labels
num_cols = training_data.shape[1] - 1
training_features = training_data[:, :num_cols]
validation_features = validation_data[:, :num_cols]
test_features = test_data[:, :num_cols]

# Format Features for Multi-class Classification
num_classes = len(label_list)
training_labels = training_data[:, num_cols].astype(int)
validation_labels = validation_data[:, num_cols].astype(int)
test_labels = test_data[:, num_cols].astype(int)

# Function for One-Hot Encoding
def one_hot(labels):
    """
    This function encodes the labels using one-hot encoding. The number of
    genres/classes is taken from the global variable num_classes.
    
    @input labels: The genre labels to encode.
    @type labels: numpy.ndarray of int
    
    @return: The one-hot encoding of the labels.
    @rtype: numpy.ndarray of int
    """
    return np.eye(num_classes)[labels].astype(int)

# Print to Check Dimensions and to Visualize
print(test_features[:10])
print(test_labels[:10])
print(one_hot(test_labels)[:10])

[[ 0.15499788 -0.35        0.125       0.125     ]
 [ 0.07662544 -0.17        0.125       0.125     ]
 [ 0.11022558 -0.53        0.125       0.125     ]
 [ 0.1958105  -0.41        0.125       0.125     ]
 [ 0.15969576 -0.35        0.125       0.125     ]
 [ 0.09223524  0.55        0.125       0.125     ]
 [ 0.28787879 -0.05        0.125       0.125     ]
 [ 0.05369878  0.31        0.125       0.125     ]
 [ 0.1066389   0.31        0.125       0.125     ]
 [ 0.36558283  0.31        0.125       0.125     ]]
[11 11 11 11 11  6 11 11  3 11]
[[0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]
 [0 0 0 1 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 1 0]]
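
The shuffle above uses NumPy's global random state, so the partitions differ on every run. If a repeatable split is wanted, one option is to shuffle row indices with a seeded generator. This is a sketch only; the function name `split_indices` and the seed value are choices made for this example.

```python
import numpy as np


def split_indices(num_rows, seed=0):
    """Return shuffled row indices split 60/20/20 into train/validation/test."""
    rng = np.random.default_rng(seed)  # seeded generator for repeatable shuffles
    order = rng.permutation(num_rows)
    num_training = int(num_rows * 0.6)
    num_validation = int(num_rows * 0.8)
    return (order[:num_training],
            order[num_training:num_validation],
            order[num_validation:])


train_idx, val_idx, test_idx = split_indices(10)
print(len(train_idx), len(val_idx), len(test_idx))
```

The index arrays can then be used to select rows, for example `labeled_features[train_idx]`.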


<a id="step-6"></a>

## Step 6: Construct Models to Fit Labeled Features

In this step we will use `scikit-learn` to fit our data to neural networks with different
configurations and to an SVM. `scikit-learn` has a Multi-layer Perceptron model as well as
a Support Vector Classifier model that we can use for supervised learning. We will use the
validation data set to compare the configurations and return the classifier that performed
best on it.

In [6]:
def train_model(t_features, t_labels, v_features, v_labels):
    """
    This function trains several neural network configurations and an SVM, and keeps the best one.
    
    @input t_features: The training features.
    @type t_features: numpy.ndarray of float
    @input t_labels: The training labels.
    @type t_labels: numpy.ndarray of int
    @input v_features: The validation features.
    @type v_features: numpy.ndarray of float
    @input v_labels: The validation labels.
    @type v_labels: numpy.ndarray of int
    
    @return: The classifier that achieved the best validation accuracy.
    @rtype: sklearn.neural_network.MLPClassifier or sklearn.svm.SVC
    """
    # Neural Network and SVM Configurations
    clf_1 = MLPClassifier(solver='adam', alpha=1e-4, hidden_layer_sizes=(5,), random_state=1)
    clf_2 = MLPClassifier(solver='adam', alpha=1e-4, hidden_layer_sizes=(5, 5), random_state=1)
    clf_3 = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(10, 10), random_state=1)
    clf_4 = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(100, 100), random_state=1)
    clf_svm = SVC()
    
    # Keep Track of the Best Model
    best_clf = None
    best_accuracy = 0
    
    # One-hot labels are only needed for the neural networks
    t_labels_hot = one_hot(t_labels)
    v_labels_hot = one_hot(v_labels)

    # Test the Accuracies of the Models and Get Best
    for clf in [clf_1, clf_2, clf_3, clf_4, clf_svm]:
        is_svm = isinstance(clf, SVC)
        if is_svm:
            clf = clf.fit(t_features, t_labels)
        else:
            clf = clf.fit(t_features, t_labels_hot)
        predictions = clf.predict(v_features)
        count = 0
        for i in range(len(v_labels)):
            if is_svm:
                if v_labels[i] == predictions[i]:
                    count += 1
            else:
                if np.array_equal(v_labels_hot[i], predictions[i]):
                    count += 1
        accuracy = count / len(v_labels)
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_clf = clf

    print("Best Accuracy:", best_accuracy)
    return best_clf

classifier = train_model(training_features, training_labels, validation_features, validation_labels)

Best Accuracy: 0.7511980830670927
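
The counting loops treat the two kinds of model differently: the SVM predicts class indices, while the neural networks trained on one-hot labels predict one-hot rows that must match the label row exactly. The same comparison can be written without explicit loops. This is a sketch with a made-up helper name `accuracy` and small hand-written inputs, not a replacement for the function above.

```python
import numpy as np


def accuracy(predictions, labels, num_classes):
    """
    Fraction of rows predicted correctly. `predictions` may be a 1-D array of
    class indices or a 2-D one-hot array; `labels` holds class indices.
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    if predictions.ndim == 2:
        # A one-hot row only counts if it matches the label row exactly
        labels_hot = np.eye(num_classes, dtype=int)[labels]
        correct = np.all(predictions == labels_hot, axis=1)
    else:
        correct = predictions == labels
    return float(np.mean(correct))


# Index predictions: two of three are right
print(accuracy([0, 2, 1], [0, 1, 1], num_classes=3))
# One-hot predictions: the all-zero row counts as wrong
print(accuracy([[1, 0, 0], [0, 0, 0], [0, 1, 0]], [0, 1, 1], num_classes=3))
```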


<a id="step-7"></a>

## Step 7: Calculate Test Accuracy
In this step we will calculate the test accuracy of our trained model, using the
test data set that we created in [step 5](#step-5). This gives an unbiased estimate
of performance, since our classification model has never seen this test set before.

In [7]:
def calculate_accuracy(clf, t_features, t_labels):
    """
    This function takes a trained model as well as the test features and its
    corresponding labels, and reports the accuracy of the model.
    
    @input clf: The trained classifier.
    @type clf: sklearn.neural_network.MLPClassifier or sklearn.svm.SVC
    @input t_features: The features from the test set.
    @type t_features: numpy.ndarray of float
    @input t_labels: The labels of the test set features.
    @type t_labels: numpy.ndarray of int
    
    @return: The accuracy.
    @rtype: float
    """
    count = 0
    predictions = clf.predict(t_features)
    t_labels_hot = one_hot(t_labels)
    for i in range(len(t_features)):
        if (type(clf) == SVC):
            if t_labels[i] == predictions[i]:
                count += 1
        else:
            if np.array_equal(t_labels_hot[i], predictions[i]):
                count += 1
    return count / len(t_features)

# Print the Test Accuracy
print(calculate_accuracy(classifier, test_features, test_labels))

0.7469061876247505


<a id="step-8"></a>

## Step 8: Make a Prediction on a Midi File

In this step we will make a prediction on a midi file and print out the genre that
the classifier predicts!

In [8]:
def make_prediction(clf, midi_path):
    """
    This function uses the classifier to predict the genre of a midi file.
    
    @input clf: The trained classifier.
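    @type clf: sklearn.neural_network.MLPClassifier or sklearn.svm.SVC
    @input midi_path: The path to the midi file that we are trying to classify.
    @type midi_path: String
    
    @return: The predicted genre of the midi file, or None if the file
             could not be read.
    @rtype: String
    """
    features = get_features(midi_path)
    if features is None:
        # Corrupted or unreadable midi file
        return None
    if isinstance(clf, SVC):
        # The SVM predicts the class index directly
        prediction_ind = int(clf.predict([features])[0])
    else:
        # The neural network was trained on one-hot labels, so pick the genre
        # with the highest predicted probability
        prediction_ind = int(np.argmax(clf.predict_proba([features])[0]))
    prediction = label_list[prediction_ind]
    return prediction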
    
# Make a Prediction
test_midi_path = "lmd_matched/B/F/E/TRBFELB128F426BFF2/289270d85c81802d912c9907c645dc2d.mid"
print(make_prediction(classifier, test_midi_path))
    

Pop_Rock
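
A classifier trained on one-hot rows may return a row that contains no 1 at all when it is not confident in any genre, so it is worth decoding such rows defensively. Below is a minimal sketch; the helper `decode_prediction` and the three-genre list are made up for illustration.

```python
def decode_prediction(one_hot_row, labels):
    """
    Return the label for the first 1 in one_hot_row, or None if the row
    contains no 1.
    """
    for index, flag in enumerate(one_hot_row):
        if flag == 1:
            return labels[index]
    return None


# Hypothetical three-genre label list for illustration
genres = ["Jazz", "Pop_Rock", "Rap"]
print(decode_prediction([0, 1, 0], genres))
print(decode_prediction([0, 0, 0], genres))
```

Returning `None` lets the caller decide what to do with an undecided prediction instead of stopping with an error.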


<a id="next-steps"></a>

## Analysis, Next Steps and Additional Information

For this classification, we achieved a test accuracy of about 75%, which is not very high. However, if we
use the `calculate_accuracy` function to check our training accuracy, we will see that it is fairly
similar to our test accuracy, which means that we are not over-fitting our data. Therefore, one
reason for our low accuracy may be that our features are not relevant enough to genre prediction.
For example, different genres of music may share similar tempos, or the same genre might have a wide
range of possible tempos. Another possible reason could be that we have messy data, and that some of
the labels of the midi files are wrong. We did, however, walk through the entire pipeline of parsing the
raw midi data, matching the files with genres, training different machine learning models on the
parsed features and making a prediction on a midi file!

Given our analysis, some of the next things that we can do are:
* Take a deeper look into music theory and decide which features of music are more relevant to a certain genre.
* Analyze the data to see if some extracted features do not differ a lot between different genres.
* Listen to some matched midi files and make a classification by ear to see if it matches the label.
* Use different machine learning models to see if they perform better.


It might be useful to look up some additional information:
* [F1 score](https://en.wikipedia.org/wiki/F1_score) (This might be useful since our classes are imbalanced: we have a large number of some labels and far fewer of others.)
* [scikit-learn classifiers](http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) might help us choose different classifiers that could possibly give better performance.
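
One simple way to put an accuracy number in context when some labels dominate is to compare it with the accuracy of always predicting the most common label. The sketch below uses the ten test labels printed in step 5 as a small sample; the helper name `majority_baseline_accuracy` is made up for this example, and ten labels are far too few to draw conclusions about the full data set.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    _label, most_common_count = counts.most_common(1)[0]
    return most_common_count / len(labels)


# The first ten test labels printed in step 5 (11 is Pop_Rock in that run)
sample_labels = [11, 11, 11, 11, 11, 6, 11, 11, 3, 11]
print(majority_baseline_accuracy(sample_labels))
```

Running the same function on the full `test_labels` array would show how much of the reported accuracy a constant prediction already achieves.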

<a id="credits"></a>

## Author and Credits
The author of this tutorial is Sander Shi, a Computer Science undergraduate at Carnegie Mellon University. This
tutorial was created as a class project for Practical Data Science (15-388) taught by Zico Kolter during the
Spring of 2018.

The midi dataset that I used in this tutorial was created by Colin Raffel, and the website for the dataset can
be found [here](http://colinraffel.com/projects/lmd). The Python module called `pretty_midi` that I used in
this tutorial was also created by Colin Raffel, and the link to it can be found
[here](http://craffel.github.io/pretty-midi/).