# Genre Classification
**Liam O'Driscoll**

This notebook uses a feedforward neural network to predict the genre of songs.

**First we begin by importing the necessary packages**

In [1]:
import IPython.display as ipd
import librosa
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
from PIL import Image
import pathlib
import csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
import keras
import tensorflow as tf
import warnings
warnings.filterwarnings('ignore')

2022-12-24 14:54:43.388378: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


# Part 1 - Creating the Dataset

**This project uses the GTZAN dataset, available here: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification/discussion. For readers familiar with ML, GTZAN plays roughly the same role for audio genre classification that MNIST plays for image classification.**

In [2]:
data = pd.read_csv('Data/features_3_sec.csv')

**Drop the filename and length columns**

In [3]:
data.drop(['filename', 'length'], inplace=True, axis=1)

**Next we want to create our training and test data from the dataset**

In [4]:
genre_list = data.iloc[:, -1]
encoder = LabelEncoder()
y = encoder.fit_transform(genre_list)

scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype = float))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1)
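One detail worth noting about the cell above: the scaler is fitted on every row before the split, so the test rows contribute to the mean and variance used for scaling. A common variation is to fit on the training rows only. The sketch below shows that arithmetic in plain Python on made-up numbers; the helper names `fit_standardizer` and `standardize` are hypothetical and not part of scikit-learn. In the notebook, the equivalent would be to split first, then call `scaler.fit_transform(X_train)` and `scaler.transform(X_test)`.

```python
import statistics

def fit_standardizer(rows):
    """Return per-column means and standard deviations from the given rows only."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation; a zero spread is replaced by 1.0
    # so that constant columns do not cause a division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply z-scoring with statistics that were fitted elsewhere."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Toy data (assumption: two features, three training rows, one test row)
train_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test_rows = [[4.0, 40.0]]

means, stds = fit_standardizer(train_rows)            # statistics from training rows only
train_scaled = standardize(train_rows, means, stds)
test_scaled = standardize(test_rows, means, stds)     # reuse the training statistics
```

Whether this changes the reported accuracy noticeably is not something the notebook measures; it mainly keeps the test set untouched by preprocessing.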

# Part 2 - Building the Feed Forward Network

The input has 57 features, and we classify each clip into one of 10 genres. We use a simple feedforward network with an input layer of size 57, ReLU activations on the hidden layers, and a softmax output layer. Dropout layers and L2 regularization are used to reduce overfitting.

In [5]:
from keras.layers import Input, Dense, Dropout
from keras.models import Sequential

model = Sequential()

model.add(Input(shape=(57,)))
model.add(Dense(512, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(256, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dense(10, activation='softmax'))
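To get a feel for the size of the network defined above, the number of trainable parameters can be worked out by hand from the layer widths. This is a small sketch in plain Python (the helper `dense_params` is a hypothetical name); each dense layer has one weight per input/output pair plus one bias per output, and dropout layers add no parameters.

```python
# Layer widths taken from the model above: input, four hidden layers, output
layer_sizes = [57, 512, 256, 128, 32, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [29696, 131328, 32896, 4128, 330]
print(total)      # 198378
```

Calling `model.summary()` in the notebook should report the same per-layer counts.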




**Next we train our model on the dataset using a validation split of 0.1**

In [6]:
from keras import optimizers

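# `lr` is a deprecated alias in recent Keras versions; `learning_rate` is the current name
adam = optimizers.Adam(learning_rate=1e-4)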

model.compile(optimizer=adam,
             loss="sparse_categorical_crossentropy",
             metrics=["accuracy"])

hist = model.fit(X_train, y_train, 
                 epochs=100, 
                 batch_size=32, 
                 validation_split=0.1)

Epoch 1/100
...
Epoch 78
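For readers who want to see what the `sparse_categorical_crossentropy` loss used above measures, here is a minimal sketch in plain Python. It assumes the usual definitions: softmax turns raw scores into probabilities, and the loss for one sample is the negative log of the probability assigned to the true class, given as an integer label. The function names and the example scores are illustrative only.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample whose true class is the integer index `label`."""
    return -math.log(probs[label])

probs = softmax([2.0, 1.0, 0.1])                 # made-up scores for three classes
loss = sparse_categorical_crossentropy(probs, 0)

# A model that guesses uniformly over 10 genres has loss ln(10), about 2.30
uniform_loss = sparse_categorical_crossentropy([0.1] * 10, 3)
```

Note that the loss Keras reports during training also includes the L2 penalty terms from the regularized layers, so it will not match this per-sample value exactly.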

**Looks like our model is learning well: our validation accuracy is around 0.90! Let's see how well it performs on the test set.**

In [7]:
test_loss, test_acc  = model.evaluate(X_test, y_test, batch_size=32)
print("The test Loss is :",test_loss)
print("\nThe Best test Accuracy is :",test_acc*100)

The test Loss is : 0.5352693796157837

The Best test Accuracy is : 90.2902901172638


**Impressive results: around 0.90 accuracy on the test set.**

# Part 3 - Prediction

In [8]:
# Helper function used to get audio features from wav file

def getFeatures(wav):
    r = []
    y, sr = librosa.load(wav)

    chroma_stft = librosa.feature.chroma_stft(y=y, sr=sr)
    r.append(np.mean(chroma_stft))
    r.append(np.var(chroma_stft))
    rmse = librosa.feature.rms(y=y)
    r.append(np.mean(rmse))
    r.append(np.var(rmse))
    spec_cent = librosa.feature.spectral_centroid(y=y, sr=sr)
    r.append(np.mean(spec_cent))
    r.append(np.var(spec_cent))
    spec_bw = librosa.feature.spectral_bandwidth(y=y, sr=sr)
    r.append(np.mean(spec_bw))
    r.append(np.var(spec_bw))
    rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)
    r.append(np.mean(rolloff))
    r.append(np.var(rolloff))
    zcr = librosa.feature.zero_crossing_rate(y)
    r.append(np.mean(zcr))
    r.append(np.var(zcr))
    harmony, perceptr = librosa.effects.hpss(y)
    r.append(np.mean(harmony))
    r.append(np.var(harmony))
    r.append(np.mean(perceptr))
    r.append(np.var(perceptr))
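    tempo, _ = librosa.beat.beat_track(y=y, sr = sr)
    # Newer librosa versions may return tempo as a 1-element array; store a plain float
    r.append(float(np.atleast_1d(tempo)[0]))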

    mfcc = librosa.feature.mfcc(y=y, sr=sr)
    for e in mfcc:
        r.append(np.mean(e))
        r.append(np.var(e))

    return np.array(r)
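The order in which `getFeatures` appends values matters, because the vector has to line up with the 57 columns the network was trained on. The sketch below rebuilds that layout as a list of names, following the append order in the function above. The name strings themselves are illustrative assumptions, and the count of 20 MFCCs assumes librosa's default `n_mfcc`.

```python
# Features summarized by a mean and a variance, in the order getFeatures appends them
summary_features = [
    "chroma_stft", "rms", "spectral_centroid", "spectral_bandwidth",
    "rolloff", "zero_crossing_rate", "harmony", "perceptr",
]

names = []
for feature in summary_features:
    names += [f"{feature}_mean", f"{feature}_var"]

names.append("tempo")  # a single scalar, no variance

# One mean and one variance per MFCC coefficient (assumed default: 20 coefficients)
for i in range(1, 21):
    names += [f"mfcc{i}_mean", f"mfcc{i}_var"]

print(len(names))  # 57
```

The total is 8 × 2 + 1 + 20 × 2 = 57, which matches the input size of the network.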




In [9]:
#define a helper function to print label of prediction

genres = ['blues', 'classical', 'country', 'disco', 'hiphop', 'jazz', 'metal', 'pop', 'reggae', 'rock']

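def predictGenre(feature_arr, model, genres):
    # Scale the single feature vector with the scaler fitted in Part 1,
    # then return the genre with the highest predicted probability.
    scaled = scaler.transform(feature_arr.reshape(1, -1))
    return genres[np.argmax(model.predict(scaled))]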
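The hard-coded `genres` list only works because its order matches the integer labels used in training: `LabelEncoder` assigns indices to the sorted unique labels, and the list above is in alphabetical order. The sketch below illustrates that mapping in plain Python; the label sample and the helper `label_from_probs` are made up for the example. In the notebook, `encoder.inverse_transform` could be used instead of a separate list.

```python
genres = ['blues', 'classical', 'country', 'disco', 'hiphop',
          'jazz', 'metal', 'pop', 'reggae', 'rock']

# Labels in arbitrary file order; indices follow the sorted unique labels
labels = ['rock', 'blues', 'pop', 'blues', 'jazz']
classes = sorted(set(labels))  # ['blues', 'jazz', 'pop', 'rock']

def label_from_probs(probs, classes):
    """Return the class name at the position of the largest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best]

# The hard-coded list is already sorted, so index i means the same genre
# here as it does in the encoded training labels.
is_sorted = genres == sorted(genres)
predicted = label_from_probs([0.1, 0.7, 0.2, 0.0], classes)
```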

In [10]:
from pydub import AudioSegment

#create helper function to split wav file into 3 second clips to feed into our network

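def splitWav(filename, step_size):
    # step_size is the clip length in milliseconds (3000 = 3 seconds)
    os.makedirs('./Temp', exist_ok=True)  # make sure the output folder exists
    start = 0
    end = step_size
    duration_ms = librosa.get_duration(filename=filename) * 1000
    fullAudio = AudioSegment.from_wav(filename)

    i = 0
    # Export consecutive clips; audio past the last full window is skipped
    while end < duration_ms:
        clip = fullAudio[start:end]
        clip.export(f'./Temp/clip_{i}.wav', format="wav")
        i += 1
        start += step_size
        end += step_size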
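The windowing logic in `splitWav` can be checked without any audio files. This sketch reproduces the same loop on plain integers and returns the (start, end) boundaries in milliseconds; `clip_windows` is a hypothetical helper written only for illustration.

```python
def clip_windows(duration_ms, step_ms):
    """Return the (start, end) boundaries produced by the loop in splitWav."""
    windows = []
    start, end = 0, step_ms
    while end < duration_ms:  # same strict comparison as splitWav
        windows.append((start, end))
        start += step_ms
        end += step_ms
    return windows

print(clip_windows(10_000, 3000))  # [(0, 3000), (3000, 6000), (6000, 9000)]
print(clip_windows(9_000, 3000))   # [(0, 3000), (3000, 6000)]
print(clip_windows(2_000, 3000))   # []
```

Two consequences follow from the strict `<`: a final window that ends exactly at the end of the file is not exported, and a file shorter than one step yields no clips at all, in which case the majority vote later on has nothing to work with.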

In [11]:
import os
import glob

# Main function used to predict genre. It takes a wav file of any length,
# splits it into 3 second clips, classifies each clip, then classifies
# the entire song by selecting the genre with the maximum number of
# occurrences

def predictSongGenre(filename):
    splitWav(filename, 3000)

    clips = []

    for _,_,filenames in os.walk('./Temp'):
        for f in filenames:
            clips.append(getFeatures('./Temp/' + f))

    clips = np.array(clips)

    gens = []

    for c in clips:
        gens.append(predictGenre(c, model, genres))


    files = glob.glob('./Temp/*')
    for f in files:
        if 'temp' not in f:
            os.remove(f)

    print(max(gens,key=gens.count))
    return max(gens,key=gens.count)
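The final step of `predictSongGenre` is a majority vote over the per-clip predictions. A possible alternative formulation, sketched below with `collections.Counter`, counts each genre once and also reports what share of the clips agreed. The function name `majority_genre` and the sample predictions are assumptions for illustration.

```python
from collections import Counter

def majority_genre(predictions):
    """Return the most frequent genre and the fraction of clips that voted for it."""
    if not predictions:
        raise ValueError("no clip predictions to vote on")
    counts = Counter(predictions)
    genre, votes = counts.most_common(1)[0]
    return genre, votes / len(predictions)

winner, share = majority_genre(["metal", "rock", "metal", "disco"])
print(winner, share)  # metal 0.5
```

A low share suggests the clips disagreed with one another, which can be a useful signal that the song-level prediction is uncertain.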


# Let's take a look at some songs

I picked the most popular song from each of the genres listed above. Let's see how well our model performs on songs outside the dataset.

In [14]:
predictSongGenre('./Songs/hiphop.wav')

hiphop


'hiphop'

In [15]:
predictSongGenre('./Songs/blues.wav')

metal


'metal'

In [16]:
predictSongGenre('./Songs/classical.wav')

classical


'classical'

In [17]:
predictSongGenre('./Songs/country.wav')

reggae


'reggae'

In [18]:
predictSongGenre('./Songs/disco.wav')

metal


'metal'

In [19]:
predictSongGenre('./Songs/jazz.wav')

disco


'disco'

In [20]:
predictSongGenre('./Songs/metal.wav')

metal


'metal'

In [21]:
predictSongGenre('./Songs/pop.wav')

disco


'disco'

In [22]:
predictSongGenre('./Songs/reggae.wav')

reggae


'reggae'

In [23]:
predictSongGenre('./Songs/rock.wav')

metal


'metal'
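Tallying the ten runs above gives a rough picture of how the model does on songs outside the dataset. The sketch below assumes that each file name reflects the song's true genre and copies the predictions from the outputs above.

```python
from collections import Counter

# Expected genre (from the file name) -> genre printed by predictSongGenre above
results = {
    "hiphop": "hiphop",
    "blues": "metal",
    "classical": "classical",
    "country": "reggae",
    "disco": "metal",
    "jazz": "disco",
    "metal": "metal",
    "pop": "disco",
    "reggae": "reggae",
    "rock": "metal",
}

correct = [genre for genre, predicted in results.items() if genre == predicted]
accuracy = len(correct) / len(results)
predicted_counts = Counter(results.values())

print(correct)                          # ['hiphop', 'classical', 'metal', 'reggae']
print(accuracy)                         # 0.4
print(predicted_counts.most_common(1))  # [('metal', 4)]
```

Four of the ten songs are labeled correctly, well below the accuracy measured on the test set, and "metal" is predicted four times. Ten songs is a very small sample, so this is only an indication.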

# Try it for yourself!

Add WAV files (the files must be in WAV format) to the ./Songs/ directory and call predictSongGenre with the relative path. Note that each song should belong to one of the genres listed above.

In [None]:
# YOUR CODE HERE

# predictSongGenre('/path/to/file')