# SI 670: Applied Machine Learning Final Project
## Music Genre Classification
Matt Whitehead (mwwhite)

## Spectrogram Extraction

To prepare the data for classification with a neural network, I extracted a mel spectrogram for each song in the data set. Although each song is a 30 second sample, the signal arrays differ slightly in length (660,000 to 675,808 samples), so the extracted spectrograms have slightly different shapes. My initial thought was to pad the shorter signals to match the length of the longest one, but this caused additional problems. I instead followed the example in this [GitHub repository](https://github.com/Hguimaraes/gtzan.keras/blob/master/nbs/classification_cnn_vgg16.ipynb) and truncated every signal to the length of the shortest one (660,000 samples). This resolved the tensor shape problems and did not seem to compromise model accuracy. I then split each song into windows covering 10% of the signal (about 3 seconds at librosa's default sampling rate of 22,050 Hz) with 50% overlap. This greatly improved model accuracy and resulted in 19 splits for each song.
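
The window arithmetic can be checked with a short sketch. It assumes a signal already truncated to 660,000 samples and uses a zero-filled array as a stand-in for real audio. The helper name `split_into_windows` is hypothetical and not part of the project code.

```python
import numpy as np

def split_into_windows(signal, window_frac=0.1, overlap=0.5):
    """Split a 1-D signal into equal-length, overlapping windows."""
    n = signal.shape[0]
    chunk = int(n * window_frac)            # samples per window
    offset = int(chunk * (1.0 - overlap))   # hop between window starts
    # Stop at the last start index that still yields a full-length window
    return [signal[i:i + chunk] for i in range(0, n - chunk + 1, offset)]

# Zero-filled stand-in for a truncated song (assumption: 660,000 samples)
dummy_song = np.zeros(660000, dtype=np.float32)
windows = split_into_windows(dummy_song)

print(len(windows))          # 19 windows per song
print(windows[0].shape[0])   # 66000 samples per window
```

At 22,050 samples per second, 66,000 samples is just under 3 seconds, and a hop of 33,000 samples gives the 50% overlap.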

In [1]:
import pandas as pd
import librosa
import numpy as np
from os import listdir

In [2]:
def extract_specs(genre):
    # initialize lists
    temp_x = []
    temp_y = []
    
    # step through each song in the directory
    for file in listdir('genres/' + genre):
        song, sr = librosa.load('genres/' + genre + '/' + file)
        song = song[:660000]
            
        # split the song into overlapping chunks
        # ref: https://github.com/Hguimaraes/gtzan.keras/blob/master/nbs/classification_cnn_2d.ipynb
        xshape = song.shape[0]
        chunk = int(xshape*0.1)          # window length: 10% of the signal
        offset = int(chunk*(1.-0.5))     # hop size: 50% overlap between windows
        split_song = [song[i:i+chunk] for i in range(0, xshape - chunk + offset, offset)]
        
        # output the spectrogram and genre
        for s in split_song:
            # pass the signal by keyword; recent librosa releases no longer accept it positionally
            split_spec = librosa.feature.melspectrogram(y=s, sr=sr)
            temp_x.append(split_spec)
            temp_y.append(genre)
    
    return (temp_x, temp_y)
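
For reference, the shape of each window's spectrogram can be worked out without running librosa. The sketch below assumes librosa's default settings for `melspectrogram` (`n_mels=128`, `hop_length=512`, centered frames) and a 66,000-sample window. The helper name `melspec_shape` is hypothetical.

```python
def melspec_shape(n_samples, n_mels=128, hop_length=512):
    """Expected (mel bands, frames) shape, assuming centered frames."""
    # With centering, one frame starts at every hop, plus the frame at sample 0
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)

print(melspec_shape(66000))  # (128, 129)
```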

In [3]:
X = []
y = []

genres = ['pop','blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'reggae', 'rock']

for i in genres:
    results = extract_specs(i)
    X.append(results[0])
    y.append(results[1])

In [4]:
# signal lengths (in samples) of the shortest and longest songs in the data set
smallest_song = 660000
largest_song = 675808

In [5]:
# stack the per-genre lists into single arrays (one label per spectrogram)
final_X = np.concatenate(X)
final_y = np.concatenate(y)

In [6]:
np.save('X', final_X)
np.save('y', final_y)
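
Note that `np.save` appends the `.npy` extension, so the calls above write `X.npy` and `y.npy`. Below is a minimal sketch of the round trip, using small random stand-in arrays and a temporary directory rather than the real data. All names in it are illustrative.

```python
import os
import tempfile
import numpy as np

# Stand-in arrays: 4 fake spectrograms of shape (128, 129) with string labels
demo_X = np.random.rand(4, 128, 129).astype(np.float32)
demo_y = np.array(['pop', 'pop', 'jazz', 'jazz'])

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, 'X'), demo_X)   # written as X.npy
    np.save(os.path.join(tmp, 'y'), demo_y)   # written as y.npy

    loaded_X = np.load(os.path.join(tmp, 'X.npy'))
    loaded_y = np.load(os.path.join(tmp, 'y.npy'))

# One label per spectrogram, and the data survives the round trip
assert loaded_X.shape[0] == loaded_y.shape[0]
assert np.array_equal(loaded_X, demo_X)
```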