# CCRMA MIR Workshop: MedleyDB and Neural Networks
Elena Georgieva, Iran Roman

In this assignment you will test different ML techniques to classify instruments. This assignment uses a large dataset (~8 GB) which you will download separately: *Medley-solos-DB*:

<blockquote>
V. Lostanlen, C.E. Cella. Deep convolutional networks on the pitch spiral for musical instrument recognition. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2016.
</blockquote>

Refer to https://scikit-learn.org/ for implementation instructions and tutorials.

Some of this notebook was developed by Dirk Vander Wilt at NYU. Thanks!

Let's start with imports:

In [2]:
import numpy as np
import pandas as pd
import librosa
from librosa import feature
from sklearn import neighbors
from sklearn import neural_network
from sklearn.metrics import f1_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt


## Part 1: Download Dataset and Metadata

The data is a folder containing wav audio files, and a separate .csv file with metadata. You can download both here:

https://zenodo.org/record/3464194#.X4G_oi2z3kJ

Place both the folder and the csv file in the same directory as your `CCRMA_MIR_MedleyDB.ipynb` file, so that the folder structure is as follows:

`
 
 <--   CCRMA_MIR_MedleyDB.ipynb

 <--   Medley-solos-DB_metadata.csv

 <--   Medley-solos-DB

      <--   *.wav files
`

The audio files contain recordings of 8 different instruments, which have already been labeled and separated into training, validation, and test sets. All audio files are the same length, and there are many example files for each instrument.

Each audio file has a unique id number associated with it ('uuid4'). This id is important when extracting the audio data and making sure that the file has the correct label, as referenced in the csv file. 
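As a rough illustration of how a `uuid4` ties an audio file to its label, the sketch below builds a tiny stand-in table and looks up one row. The column names `instrument`, `instrument_id`, and `song_id`, the function name `toy_file_name`, and the uuid values are assumptions made up for the example, so check them against the real csv file. The file-name pattern mirrors the one used by the provided helper function.

```python
import pandas as pd

# Toy stand-in for the metadata table; the uuid4 values here are made up.
toy_metadata = pd.DataFrame({
    "subset": ["training", "test"],
    "instrument": ["clarinet", "piano"],
    "instrument_id": [0, 4],
    "song_id": [10, 20],
    "uuid4": ["aaaa-0001", "bbbb-0002"],
})

def toy_file_name(uuid, dataset, path="Medley-solos-DB/"):
    # Select the single row whose uuid4 matches, then read its fields by name.
    row = dataset.loc[dataset["uuid4"] == uuid].iloc[0]
    name = f"{path}Medley-solos-DB_{row['subset']}-{row['instrument_id']}_{row['uuid4']}.wav"
    return name, int(row["instrument_id"])

file_name, label = toy_file_name("bbbb-0002", toy_metadata)
print(file_name, label)
```

Reading fields by column name rather than by position is a design choice for readability; the provided helper uses positional indices instead.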


**1. Load the .csv file into a DataFrame called `Medley_Data`. Display the data.** 

In [5]:
# Your Code Here

**Next, let's go and listen to a few .wav files. Is it easy for you to tell what instrument is performing in each one?**

The list of instrument classes should be:

0. clarinet

1. distorted electric guitar

2. female singer

3. flute

4. piano

5. tenor saxophone

6. trumpet

7. violin

### Helper Functions: `get_file_name_and_label()` and `get_ids()`

The following helper functions have been provided for you.


**1. Print out how many tracks are in each of the training, validation, and test sets.**

In [6]:
def get_file_name_and_label(uuid, path='Medley-solos-DB/', dataset=Medley_Data):
    # Returns the full file path and the instrument label for a uuid
    rd = dataset.loc[dataset['uuid4'] == uuid]
    # Columns 0, 2, and 4 of the matching row hold the subset, the label, and the uuid
    subset, label, uuid4 = rd.values[0, 0], rd.values[0, 2], rd.values[0, 4]
    file = path + 'Medley-solos-DB' + '_' + str(subset) + '-' + str(label) + '_' + uuid4 + '.wav'
    return file, label

def get_ids(subset, path='Medley-solos-DB/', dataset=Medley_Data):
    # Returns a NumPy array with the uuid of every file in the given subset
    # ('training', 'validation', or 'test'), or np.array([0]) if the subset is empty
    rd = dataset.loc[dataset['subset'] == subset]
    if len(rd.index) < 1:
        return np.array([0])
    # Column 4 holds the uuid; take the whole column instead of appending row by row
    return rd.iloc[:, 4].to_numpy(dtype=str)


# Divides up file names into training, validation, and test sets
tracks_train =  get_ids('training')
tracks_validate = get_ids('validation')
tracks_test = get_ids('test')

# Your code Here

## Part 1a: Compute Features

Create a function `compute_features()` such that the input is one audio file and the output is a single feature vector. This function should do the following:
1. Load audio into a sample array.
2. Compute the MFCCs of the input audio, and remove the first coefficient.
3. Compute the summary statistics of the MFCCs over time:
    1. Find the mean and standard deviation for each feature (2 values for each feature)
    2. Stack these statistics into a single 1-d vector of size (2*(n_mfcc-1))
4. Return the 1-d vector.
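The summary-statistics step (step 3) can be sketched on its own, without loading any audio. The example below uses a random matrix as a stand-in for real MFCCs and assumes the layout is (n_mfcc, n_frames), with one row per coefficient. The function name `summarize_mfccs` is hypothetical, and this is only one possible way to arrange the statistics.

```python
import numpy as np

def summarize_mfccs(mfccs):
    # mfccs is assumed to have shape (n_mfcc, n_frames):
    # one row per coefficient, one column per time frame.
    trimmed = mfccs[1:, :]            # drop the first coefficient
    means = trimmed.mean(axis=1)      # average over time for each coefficient
    stds = trimmed.std(axis=1)        # spread over time for each coefficient
    return np.concatenate([means, stds])

rng = np.random.default_rng(0)
fake_mfccs = rng.normal(size=(20, 130))   # random stand-in, not real audio features
vector = summarize_mfccs(fake_mfccs)
print(vector.shape)   # (38,)
```

With `n_mfcc=20`, dropping one coefficient leaves 19, and two statistics per coefficient give the 38 values that `create_feature_set()` expects.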

In [4]:
def compute_features(audiofile, n_fft=2048, hop_length=512, n_mels=128, n_mfcc=20):
    ## Your Code Here
    
    return ___


## Part 1b: Create Feature Set - Provided

Provided is a function `create_feature_set()` where the input is an array of uuids and the output is a feature matrix with an accompanying vector of class labels. The features are not normalized at this stage; normalization happens in Part 2b.
1. We iterate through all audio files in a list of uuids. The training, test, and validation lists have been created for you. For each uuid we:
    1. Use `get_file_name_and_label()` to retrieve the audio file name and associated label.
    2. Use `compute_features()` to get the 1-d vector for that audio file.
    3. Append the feature vector and label to their respective arrays, and continue to the next file.
2. When finished, we output 2 numpy arrays: the feature matrix (n_samples, 2*(n_mfcc-1)) and the label vector (n_samples,)

In [5]:
def create_feature_set(id_list):
    nTracks = len(id_list)
    
    features = np.zeros((nTracks, 38)) 
    labels = np.zeros(nTracks)
    counter = 0
    for id1 in id_list:
        id1_filename, id1_label = get_file_name_and_label(id1)
        id1_features = compute_features(id1_filename)
        features[counter,:] = id1_features
        labels[counter] = id1_label
        counter +=1
    
    return features, labels

## Part 2a: Get Mean and Standard Deviation

**1. Create a function `get_stats()` which gets the mean and standard deviation for each feature in the input matrix.**


In [6]:
def get_stats(____):
    # Your Code Here
    return ___, ___


### Getting Everything Ready

The code in the following cell has been written for you. When all is well, run the code to compute features and labels for the 3 data sets in Medley-solos-DB.

Note: this code will take a while to run! It might be good to test on a small subset of the data, like `tracks_train[0:500]`, just to make sure everything is working.


In [9]:
load_saved_tests = False # Change this to True if you want to load previously computed features

if not load_saved_tests:
    test_set, test_labels = create_feature_set(tracks_test)
    print("Test Set: " + str(test_set.shape))
    train_set, train_labels = create_feature_set(tracks_train)
    print("Training Set: " + str(train_set.shape))
    validate_set, validate_labels = create_feature_set(tracks_validate)
    print("Validation Set: " + str(validate_set.shape))
    np.savetxt('test_set.csv', test_set, delimiter=',')
    np.savetxt('test_labels.csv', test_labels, delimiter=',')
    np.savetxt('train_set.csv', train_set, delimiter=',')
    np.savetxt('train_labels.csv', train_labels, delimiter=',')
    np.savetxt('validate_set.csv', validate_set, delimiter=',')
    np.savetxt('validate_labels.csv', validate_labels, delimiter=',')
else:
    test_set = np.loadtxt('test_set.csv',delimiter=',')
    test_labels = np.loadtxt('test_labels.csv',delimiter=',')
    train_set = np.loadtxt('train_set.csv',delimiter=',')
    train_labels = np.loadtxt('train_labels.csv',delimiter=',')
    validate_set = np.loadtxt('validate_set.csv',delimiter=',')
    validate_labels = np.loadtxt('validate_labels.csv',delimiter=',')
    print("Test Set: " + str(test_set.shape))
    print("Training Set: " + str(train_set.shape))
    print("Validation Set: " + str(validate_set.shape))
    

Test Set: (12236, 38)
Training Set: (5841, 38)
Validation Set: (3494, 38)


## Part 2b: Normalize Feature Sets

**1. Using `get_stats()`, find the mean and standard deviation of each feature in the training set. Then use those statistics to normalize all 3 data sets, so that the training set has a mean of 0 and a standard deviation of 1.**

In [10]:
# Your Code Here
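A minimal sketch of this kind of normalization on toy data, assuming the statistics are taken per feature (per column). The array names are made up for the example. Because the statistics come from the training set only, the training set ends up with exactly zero mean and unit standard deviation, while the other sets end up only approximately so.

```python
import numpy as np

rng = np.random.default_rng(1)
toy_train = rng.normal(loc=5.0, scale=3.0, size=(200, 4))   # toy training features
toy_test = rng.normal(loc=5.0, scale=3.0, size=(50, 4))     # toy held-out features

# Statistics are computed on the training set only, one value per column.
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)

# The same training statistics are applied to every set.
toy_train_norm = (toy_train - train_mean) / train_std
toy_test_norm = (toy_test - train_mean) / train_std

print(toy_train_norm.mean(axis=0), toy_train_norm.std(axis=0))
print(toy_test_norm.mean(axis=0), toy_test_norm.std(axis=0))
```

Reusing the training statistics keeps information about the validation and test sets out of the preprocessing.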

## Part 3: k-Nearest Neighbor

Using the normalized feature sets from Part 2, run a kNN classification experiment:

- Use `sklearn` entirely
- Run tests on the validation set with k = 1, 5, 20, and 50. What worked best?
- When you decide on the best settings, run the experiment on the test set and output the f-measure and a confusion matrix. Hint: you can use the functions `confusion_matrix()` and `f1_score()`.
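The workflow in the list above might look roughly like the following sketch, which runs on synthetic 2-D clusters rather than the Medley-solos-DB features. The helper `make_toy_split` and all data here are made up for illustration. Note that `f1_score()` needs an explicit `average` argument for multiclass labels; `"micro"` is used here as one option, not necessarily the one the assignment intends.

```python
import numpy as np
from sklearn import neighbors
from sklearn.metrics import confusion_matrix, f1_score

rng = np.random.default_rng(2)

def make_toy_split(n_per_class):
    # Three Gaussian clusters in 2-D, one per class (synthetic data).
    centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    X = np.vstack([c + rng.normal(size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_train, y_train = make_toy_split(100)
X_val, y_val = make_toy_split(40)

# Fit one classifier per candidate k and score each on the validation split.
scores = {}
for k in (1, 5, 20, 50):
    knn = neighbors.KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = f1_score(y_val, knn.predict(X_val), average="micro")

# Refit with the best-scoring k and inspect its confusion matrix.
best_k = max(scores, key=scores.get)
best_knn = neighbors.KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
cm = confusion_matrix(y_val, best_knn.predict(X_val))
print(scores, best_k)
print(cm)
```

In the assignment itself, the final confusion matrix and f-measure would be computed on the test set, after k has been chosen on the validation set.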


 Now, we test on the Testing set: 

confusion matrix:
 [[ 152   27    9   34  280   19    4  207]
 [   0  814    1    0   71    8    0   61]
 [   0    5  696    0   57  118    0  266]
 [  75   85   92  402 1293   17   83 1120]
 [   0   31    2    1 2558    0    0   17]
 [   0   89    6    0  111   25   11   83]
 [  15    1    1    5   33   17  222  112]
 [  14  146    9   31  604    4    2 2090]]
f1 score 0.5687316116377902
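The score above can be reproduced from the confusion matrix itself. For single-label multiclass problems, micro-averaged F1 equals overall accuracy, which is the sum of the diagonal divided by the total count. The sketch assumes the matrix follows the `sklearn` convention (rows are true classes, columns are predicted classes). That the score was computed with `average='micro'` is an inference from the numbers matching, not something stated in the output.

```python
import numpy as np

# kNN confusion matrix copied from the test-set output above.
knn_cm = np.array([
    [152,  27,   9,  34,  280,  19,   4,  207],
    [  0, 814,   1,   0,   71,   8,   0,   61],
    [  0,   5, 696,   0,   57, 118,   0,  266],
    [ 75,  85,  92, 402, 1293,  17,  83, 1120],
    [  0,  31,   2,   1, 2558,   0,   0,   17],
    [  0,  89,   6,   0,  111,  25,  11,   83],
    [ 15,   1,   1,   5,   33,  17, 222,  112],
    [ 14, 146,   9,  31,  604,   4,   2, 2090],
])

correct = np.trace(knn_cm)    # correctly classified tracks (diagonal)
total = knn_cm.sum()          # all test tracks
micro_f1 = correct / total    # accuracy, equal to micro-averaged F1 here

# Per-class recall: correct predictions divided by the number of true examples.
per_class_recall = np.diag(knn_cm) / knn_cm.sum(axis=1)

print(correct, total, micro_f1)
print(np.round(per_class_recall, 3))
```

The per-class recall values are a convenient starting point for comparing instrument classes, since a single overall score hides large differences between them.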


## Part 4: Multi-Layered Perceptron (Neural Network)

Multilayer Perceptrons, or MLPs for short, are the classic type of neural network.

They consist of one or more layers of neurons. Data is fed to the input layer, one or more hidden layers may provide levels of abstraction, and predictions are made at the output layer.
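To make the layer structure concrete, here is a minimal NumPy sketch of a forward pass through a network with one hidden layer. The weights are random and untrained, and the ReLU and softmax choices are assumptions for illustration rather than a description of any particular library's internals. The sizes (38 inputs, 8 outputs) echo this assignment's features and classes.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
n_features, n_hidden, n_classes = 38, 5, 8

# Random, untrained weights and zero biases for each layer.
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)

X = rng.normal(size=(4, n_features))   # 4 toy input vectors
hidden = relu(X @ W1 + b1)             # input layer -> hidden layer
probs = softmax(hidden @ W2 + b2)      # hidden layer -> output class probabilities
predicted = probs.argmax(axis=1)       # most probable class per input
print(probs.shape, predicted)
```

Training adjusts `W1`, `b1`, `W2`, and `b2` so that the output probabilities match the labels; that part is handled by the library.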

Using the same data as above, run the same test using the MLP classifier. Did it perform better?

- Use `sklearn` entirely
- Run tests on the validation set to experiment with the number of iterations and size and number of hidden layers.
- Initially, try setting `max_iter=100` and `hidden_layer_sizes=(5,2)` (meaning 2 hidden layers of sizes 5 and 2).
- When you decide on the best settings (best f-measure), run the experiment on the test set and output the f-measure and a confusion matrix.
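As a starting point, the initial settings above could be tried like this on synthetic data. The clusters, the helper `make_toy_split`, and the fixed `random_state` are assumptions for the example. With only 100 iterations, `sklearn` may warn that the optimizer has not converged, which is itself a hint to experiment with `max_iter`.

```python
import numpy as np
from sklearn import neural_network
from sklearn.metrics import f1_score

rng = np.random.default_rng(4)

def make_toy_split(n_per_class):
    # Three Gaussian clusters in 2-D, one per class (synthetic data).
    centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    X = np.vstack([c + rng.normal(size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(3), n_per_class)
    return X, y

X_train, y_train = make_toy_split(100)
X_val, y_val = make_toy_split(40)

# Two hidden layers with 5 and 2 neurons, trained for at most 100 iterations.
mlp = neural_network.MLPClassifier(hidden_layer_sizes=(5, 2), max_iter=100, random_state=0)
mlp.fit(X_train, y_train)

val_f1 = f1_score(y_val, mlp.predict(X_val), average="micro")
layer_shapes = [w.shape for w in mlp.coefs_]   # one weight matrix per layer transition
print(val_f1, layer_shapes)
```

The weight shapes show how `hidden_layer_sizes=(5, 2)` maps onto the network: 2 inputs to 5 neurons, 5 to 2 neurons, and 2 to the 3 output classes.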



 Now, we test on the Testing set: 

confusion matrix:
 [[ 283   25    9   22  241    2    9  141]
 [   0  842    1    0   80   14    0   18]
 [   0    1  787    0   30    9    1  314]
 [ 427  133  201  628 1102    1  249  426]
 [   0   12    0    0 2596    0    0    1]
 [   0  109    0    0  116   30   14   56]
 [  17    1    0    3   37    3  325   20]
 [   9   26    0   21   87    0    2 2755]]
f1 score 0.6739130434782609


## Part 5: Discussion

1. Which algorithm performed better?
2. Which instrument class has the best performance, and which has the worst?
3. Listen to the audio for examples the classifier got wrong. What do they have in common?

## Bonus

What other algorithms might be interesting to try?

In [13]:
## Your Code Here