## Model Development
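A convolutional neural network (CNN) is implemented using Keras, the high-level API bundled with TensorFlow. CNNs are artificial neural networks well suited to image classification because their convolutional layers learn local spatial patterns; here the images are spectrograms of the audio tracks, and the classes are genres.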
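
To make the convolution operation concrete, the following is a minimal pure-Python sketch of a single-channel "valid" 2D convolution (strictly a cross-correlation, which is what deep learning frameworks compute). The function name `conv2d_valid` and the toy inputs are illustrative assumptions and are not part of the notebook; Keras performs the same sliding-window computation over many channels and filters at once.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            # Weighted sum of the window whose top-left corner is (i*stride, j*stride)
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


toy_image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
toy_kernel = [[1, 0],
              [0, -1]]  # responds to differences along the diagonal
feature_map = conv2d_valid(toy_image, toy_kernel)
print(feature_map)
```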

The genre labels were extracted from the metadata of each .mp3 file using the mutagen Python module.
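
The extraction step is not shown in this notebook. As a rough sketch, and assuming the tags are read through mutagen's "easy" interface (`mutagen.File(path, easy=True)`), it might look like the following. The helper names `tags_to_row` and `read_track` are hypothetical, and the column names simply mirror the CSV shown below.

```python
import os


def tags_to_row(track_id, tags, duration):
    """Flatten an easy-tag mapping (each value is a list of strings) into one row."""
    def first(key):
        values = tags.get(key) or [""]  # a missing tag becomes an empty string
        return values[0]

    return {
        "trackID": track_id,
        "title": first("title"),
        "genre": first("genre"),
        "duration": duration,
    }


def read_track(path):
    """Read title, genre and duration (in seconds) from one .mp3 file."""
    import mutagen  # third-party package, imported here so tags_to_row has no dependency

    audio = mutagen.File(path, easy=True)  # easy=True exposes tags under names like "genre"
    if audio is None:
        raise ValueError(f"Unsupported or unreadable audio file: {path}")
    return tags_to_row(os.path.basename(path), audio.tags or {}, audio.info.length)
```

Calling `read_track` on every file and collecting the rows would give a table with the same columns as `metadata.csv`; tracks without a genre tag end up with an empty genre field.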

In [1]:
import pandas as pd
import numpy as np

csv_path = "metadata.csv"
df = pd.read_csv(csv_path)
df

Unnamed: 0,trackID,title,genre,duration
0,000002.mp3,Food,Hip-Hop,29.988571
1,000005.mp3,This World,Hip-Hop,30.014694
2,000010.mp3,Freeway,Pop,29.988571
3,000140.mp3,Queen Of The Wires,Folk,29.988571
4,000141.mp3,Ohio,Folk,29.988571
...,...,...,...,...
7995,154308.mp3,MIA,Electronic,29.988571
7996,154309.mp3,A1 Symphony,Electronic,29.988571
7997,154413.mp3,Do Easy,,30.014694
7998,154414.mp3,Dead Can Dance (uncensored),,29.988571
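
The last rows of the table have an empty genre field, so those tracks cannot serve as labeled training examples. The sketch below shows one way to separate labeled from unlabeled tracks and count examples per genre. It uses only the standard library and a few rows copied from the output above, so the `sample` string is an illustrative stand-in for `metadata.csv`; in the notebook itself the equivalent check on the DataFrame would be `df['genre'].isna()`.

```python
import csv
import io
from collections import Counter

# A handful of rows copied from the table above (index column omitted)
sample = """trackID,title,genre,duration
000002.mp3,Food,Hip-Hop,29.988571
000005.mp3,This World,Hip-Hop,30.014694
000010.mp3,Freeway,Pop,29.988571
000140.mp3,Queen Of The Wires,Folk,29.988571
154413.mp3,Do Easy,,30.014694
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Keep only tracks that carry a genre label
labeled = [r for r in rows if r["genre"].strip()]
unlabeled_ids = [r["trackID"] for r in rows if not r["genre"].strip()]

# Class balance among the labeled tracks
genre_counts = Counter(r["genre"] for r in labeled)

print("unlabeled:", unlabeled_ids)
print("per-genre counts:", dict(genre_counts))
```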


## Building the model

In [2]:
# Perform image augmentation to create additional spectrograms (worth trying with and without)
# Ref: https://www.kdnuggets.com/2020/02/audio-data-analysis-deep-learning-python-part-2.html
# Ref: https://keras.io/api/preprocessing/image/
# Load Libraries
import pandas as pd
import numpy as np
from numpy import argmax
import matplotlib.pyplot as plt
%matplotlib inline
import librosa
import librosa.display
import IPython.display
import random
import warnings
import os
from PIL import Image
import pathlib
import csv
from sklearn.model_selection import train_test_split
import tensorflow as tf # TensorFlow 2.x bundles Keras as tf.keras (including tf.keras.layers)

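train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255, # scale pixel values from [0, 255] to [0, 1]
    shear_range=0.2, # random shear transformation
    zoom_range=0.2, # random zoom between 0.8x and 1.2x
    horizontal_flip=True) # mirrors images left to right; if time runs along the horizontal axis of the spectrogram this reverses time, so compare results with and without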
val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

# Flow images from a directory
training_set = train_datagen.flow_from_directory(
    './training_data_fma/train',
    target_size=(64,64), # resize every image to 64x64 (the Keras default is (256, 256))
    batch_size=47, # default is 32; 47 evenly divides the 4606 files in the train directory (98 batches)
    class_mode='categorical', # one-hot labels, inferred from the class subdirectory names
    shuffle=True) # shuffle training images; with shuffle=False the batches arrive grouped by class

val_set = val_datagen.flow_from_directory(
    './training_data_fma/val',
    target_size=(64,64), # must match the target_size used for the training set
    batch_size=47,
    class_mode='categorical',
    shuffle=False)

# Create a CNN
model = tf.keras.Sequential() # groups a linear stack of layers
input_shape = (64, 64, 3) # height, width, channels; passed to the first layer so the model knows its input shape
# 1st hidden layer: convolution -> average pooling -> ReLU
model.add(tf.keras.layers.Conv2D(32, (3, 3), strides=(2, 2), input_shape=input_shape)) # produces a tensor of feature maps
model.add(tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2))) # average pooling for spatial data
model.add(tf.keras.layers.Activation('relu')) # rectified linear unit activation function
# 2nd hidden layer
model.add(tf.keras.layers.Conv2D(64, (3, 3), padding="same"))
model.add(tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)))
model.add(tf.keras.layers.Activation('relu'))
# 3rd hidden layer
model.add(tf.keras.layers.Conv2D(64, (3, 3), padding="same"))
model.add(tf.keras.layers.AveragePooling2D((2, 2), strides=(2, 2)))
model.add(tf.keras.layers.Activation('relu'))
# Flatten the feature maps into a single vector
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dropout(rate=0.5))
# Fully connected layer
model.add(tf.keras.layers.Dense(64))
model.add(tf.keras.layers.Activation('relu'))
model.add(tf.keras.layers.Dropout(rate=0.5))
# Output layer: one unit per class (8 genres)
model.add(tf.keras.layers.Dense(8))
model.add(tf.keras.layers.Activation('softmax')) # activation function for output layer on multi-class classifications
model.summary()

Found 4606 images belonging to 8 classes.
Found 1153 images belonging to 8 classes.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 31, 31, 32)        896       
_________________________________________________________________
average_pooling2d (AveragePo (None, 15, 15, 32)        0         
_________________________________________________________________
activation (Activation)      (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 15, 15, 64)        18496     
_________________________________________________________________
average_pooling2d_1 (Average (None, 7, 7, 64)          0         
_________________________________________________________________
activation_1 (Activation)    (None, 7, 7, 64)          0         
______________________________________
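
The output shapes and parameter counts in the (truncated) summary above can be checked by hand. The sketch below applies the standard size formulas for "valid" and "same" padding and the usual parameter count for a convolution with bias. The helper names `output_size` and `conv_params` are illustrative and not part of Keras.

```python
def output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution or pooling layer."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no padding


def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a Conv2D layer: one kernel plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


conv1 = output_size(64, kernel=3, stride=2)                      # conv2d
pool1 = output_size(conv1, kernel=2, stride=2)                   # average_pooling2d
conv2 = output_size(pool1, kernel=3, stride=1, padding="same")   # conv2d_1
pool2 = output_size(conv2, kernel=2, stride=2)                   # average_pooling2d_1

print(conv1, pool1, conv2, pool2)                          # 31 15 15 7
print(conv_params(3, 3, 3, 32), conv_params(3, 3, 32, 64))  # 896 18496
```

These values match the `Output Shape` and `Param #` columns for the layers that are visible in the summary.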

In [3]:
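# Compile the model with a stochastic gradient descent (SGD) optimizer
# Ref 2: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD
# Values follow the tutorial defaults. `epochs` is used here only to derive the decay rate;
# the number of training epochs is set in the fit call below.
epochs = 200
batch_size = 8 # not used below: the generators above already batch the data (batch_size=47)
learning_rate = 0.01
decay_rate = learning_rate / epochs
momentum = 0.9
# `decay` is accepted by SGD in earlier TensorFlow 2.x releases; TensorFlow 2.11 and later
# expect tf.keras.optimizers.legacy.SGD or a learning-rate schedule instead.
sgd = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=momentum, decay=decay_rate, nesterov=False)
# Pass the configured optimizer object; the string "sgd" would create a default SGD and ignore the settings above
model.compile(optimizer=sgd, loss="categorical_crossentropy", metrics=['accuracy'])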
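
To see what the `decay` setting does to the learning rate, here is a small sketch. It assumes the time-based rule used by the classic Keras optimizers, `lr / (1 + decay * iterations)`, where `iterations` counts batch updates rather than epochs. The function name `decayed_learning_rate` and the sampled iteration counts are illustrative.

```python
def decayed_learning_rate(initial_lr, decay, iterations):
    """Time-based decay: the rate shrinks as the number of batch updates grows."""
    return initial_lr / (1.0 + decay * iterations)


learning_rate = 0.01
decay_rate = learning_rate / 200  # same derivation as in the cell above

# Effective learning rate after a given number of batch updates
for iterations in (0, 100, 5000, 20000):
    lr = decayed_learning_rate(learning_rate, decay_rate, iterations)
    print(f"after {iterations:>5} updates: lr = {lr:.6f}")
```

With these values the learning rate halves only after about 20,000 updates, so the decay is very gentle over a run of a few thousand batches.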

In [None]:
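# fit the model
# time it
# Note: fit_generator is deprecated from TensorFlow 2.1 onward, where model.fit accepts generators directly
model.fit_generator(
    training_set,
    steps_per_epoch=len(training_set), # one full pass over the 4606 training images per epoch
    epochs=50,
    validation_data=val_set,
    validation_steps=len(val_set)) # one full pass over the 1153 validation images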



Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
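
The `steps_per_epoch` and `validation_steps` arguments count batches, not images. Given the batch size of 47 and the image counts reported by `flow_from_directory`, the number of batches needed to cover each set exactly once can be worked out as below. The helper name `batches_per_pass` is illustrative.

```python
import math


def batches_per_pass(num_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


train_steps = batches_per_pass(4606, 47)  # 4606 / 47 divides evenly
val_steps = batches_per_pass(1153, 47)    # 24 full batches plus one partial batch

print(train_steps, val_steps)  # 98 25
```

A step count below these values skips part of the data in each pass, while a larger one asks the generator for more batches than a single pass provides.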