## 2.0 Genre Classification

This notebook continues with supervised learning, using the generated spectrograms for genre classification and prediction.

### Table of Contents
[2.1. Setup](#1.)<br>
[2.1.1 Loading libraries](#1.1)<br>
[2.1.2 Setting spectrogram directories](#1.2)<br>
[2.1.3 Splitting into train, validation, and test sets](#1.3)<br>

[2.2. Exploratory Data Analysis](#2.)<br>
[2.2.1 EDA on train set](#2.1)<br>
[2.2.2 TBA](#2.2)<br>

[2.3. Convolutional Neural Network](#3.)<br>
[2.3.1 CNN setup](#3.1)<br>
[2.3.2 Baseline CNN](#3.2)<br>
[2.3.3 EfficientNet scaling of CNN](#3.3)<br>

### 2.1. Setup <a class="anchor" id="1."></a>

#### 2.1.1 Loading libraries <a class="anchor" id="1.1"></a>

In [5]:
import os
import numpy as np
import pandas as pd
import random
import librosa
import librosa.display
import matplotlib.pyplot as plt
%matplotlib inline

#### 2.1.2 Setting spectrogram directories <a class="anchor" id="1.2"></a>

In [27]:
# Different sets of spectrograms
spect_dict = {'full': 'spect_full', 'subsample_10s': 'spect_subsample_10s'}
# Select the relevant spectrogram set 
spect_type = spect_dict['subsample_10s']
print("Type of spectrograms used for the analysis: {}".format(spect_type))
# Set the directory for the spectrograms
spect_dir = f'./data/{spect_type}'
print("Directory of spectrograms: {}".format(spect_dir))

Type of spectrograms used for the analysis: spect_subsample_10s
Directory of spectrograms: ./data/spect_subsample_10s
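Before going further, it can help to confirm that the selected directory exists and to see what kinds of files it holds. The sketch below is one possible check; `summarize_spect_dir` is a hypothetical helper (not part of the notebook), and it makes no assumption about the file format the spectrograms were saved in.

```python
from collections import Counter
from pathlib import Path


def summarize_spect_dir(spect_dir):
    """Count the files under spect_dir (recursively), grouped by file extension."""
    root = Path(spect_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Spectrogram directory not found: {root}")
    # Lower-case the suffix so that e.g. '.PNG' and '.png' are counted together
    return Counter(p.suffix.lower() for p in root.rglob('*') if p.is_file())


# Path taken from the cell above; guarded so the sketch does nothing if the data is absent
spect_dir = './data/spect_subsample_10s'
if Path(spect_dir).is_dir():
    print(summarize_spect_dir(spect_dir))
```

A count that differs from the number of rows in `tracks.csv` would suggest that some spectrograms are missing or that stray files are present.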


#### 2.1.3 Splitting into train, validation, and test sets <a class="anchor" id="1.3"></a>

In [17]:
# Read in the tracks.csv
tracks = pd.read_csv('tracks.csv', index_col=0, header=[0,1])
print("Shape of 'tracks' dataframe: {}".format(tracks.shape))
tracks.head()

Shape of 'tracks' dataframe: (7994, 9)


| track_id | (set, split) | (artist, id) | (artist, favorites) | (track, date_created) | (track, genre_top) | (track, genres) | (track, favorites) | (track, interest) | (track, listens) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | training | 1 | 9 | 2008-11-26 01:48:12 | Hip-Hop | [21] | 2 | 4656 | 1293 |
| 5 | training | 1 | 9 | 2008-11-26 01:48:20 | Hip-Hop | [21] | 6 | 1933 | 1151 |
| 10 | training | 6 | 74 | 2008-11-25 17:49:06 | Pop | [10] | 178 | 54881 | 50135 |
| 140 | training | 54 | 11 | 2008-11-26 01:44:07 | Folk | [17] | 5 | 1593 | 1299 |
| 141 | training | 54 | 11 | 2008-11-26 01:44:10 | Folk | [17] | 1 | 839 | 725 |
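Because the file is read with `header=[0,1]`, the columns form a two-level `MultiIndex`, so each column is addressed by a tuple such as `('set', 'split')`. The sketch below illustrates this on a toy frame; `toy` is a hypothetical stand-in built from three rows of the preview, not the real `tracks` dataframe.

```python
import pandas as pd

# Toy frame with the same two-level column layout as `tracks` (subset of columns)
columns = pd.MultiIndex.from_tuples(
    [('set', 'split'), ('track', 'genre_top'), ('track', 'listens')])
toy = pd.DataFrame(
    [['training', 'Hip-Hop', 1293],
     ['training', 'Pop', 50135],
     ['training', 'Folk', 1299]],
    index=pd.Index([2, 10, 140], name='track_id'),
    columns=columns,
)

# A full (level 0, level 1) tuple selects a single column as a Series
split = toy[('set', 'split')]

# A level-0 key alone selects every column in that group as a DataFrame
track_cols = toy['track']

print(split.tolist())
print(list(track_cols.columns))
```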


In [24]:
# Split into sets of train, validation, test
train_df = tracks[tracks[('set','split')] == 'training']
val_df = tracks[tracks[('set','split')] == 'validation']
test_df = tracks[tracks[('set','split')] == 'test']

# Getting arrays of track ids for the train, validation, test sets
train_tids = train_df.index
val_tids = val_df.index
test_tids = test_df.index

print("No. of samples for train, validation, test are: {}, {}, {}".format(
    len(train_df), len(val_df), len(test_df)))

print("Percentage splits for train, validation, test are: {:.0f}%, {:.0f}%, {:.0f}%".format(
    len(train_df)/len(tracks)*100, len(val_df)/len(tracks)*100, len(test_df)/len(tracks)*100))

No. of samples for train, validation, test are: 6394, 800, 800
Percentage splits for train, validation, test are: 80%, 10%, 10%
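Since the split comes from a predefined column, a quick sanity check is to confirm that the three sets of track ids do not overlap and that together they cover every track. The following is a minimal sketch; `check_partition` is a hypothetical helper, and the toy ids stand in for `tracks.index`, `train_tids`, `val_tids`, and `test_tids`.

```python
def check_partition(all_ids, **splits):
    """Verify that the named splits are pairwise disjoint and together cover all_ids."""
    seen = set()
    for name, ids in splits.items():
        ids = set(ids)
        overlap = seen & ids
        if overlap:
            raise ValueError(f"{name} shares {len(overlap)} id(s) with an earlier split")
        seen |= ids
    missing = set(all_ids) - seen
    if missing:
        raise ValueError(f"{len(missing)} id(s) are not assigned to any split")
    return True


# Toy track ids; in the notebook the arguments would be the pandas Index objects
print(check_partition([2, 5, 10, 140, 141], train=[2, 5, 10], val=[140], test=[141]))
```

In the notebook itself the call would look like `check_partition(tracks.index, train=train_tids, val=val_tids, test=test_tids)`.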


In [28]:
# List the unique top-level genres, in order of first appearance
genre_list = list(tracks[('track','genre_top')].unique())
genre_list

['Hip-Hop',
 'Pop',
 'Folk',
 'Experimental',
 'Rock',
 'International',
 'Electronic',
 'Instrumental']
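A classifier will typically need these genre names as integer class labels. The sketch below shows one way to build that mapping, assuming a simple alphabetical ordering; `genre_to_idx`, `idx_to_genre`, and `encode_genres` are hypothetical names, not part of the notebook.

```python
# Genre names copied from the output above
genre_list = ['Hip-Hop', 'Pop', 'Folk', 'Experimental', 'Rock',
              'International', 'Electronic', 'Instrumental']

# Sort first so the mapping does not depend on the row order of tracks.csv
genre_to_idx = {genre: i for i, genre in enumerate(sorted(genre_list))}
idx_to_genre = {i: genre for genre, i in genre_to_idx.items()}


def encode_genres(genres):
    """Map an iterable of genre names to their integer class labels."""
    return [genre_to_idx[g] for g in genres]


print(encode_genres(['Hip-Hop', 'Pop', 'Folk']))
```

Keeping `idx_to_genre` alongside the forward mapping makes it straightforward to turn predicted class indices back into genre names.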

### 2.2. Exploratory Data Analysis <a class="anchor" id="2."></a>


#### 2.2.1 EDA on train set <a class="anchor" id="2.1"></a>

#### 2.2.2 TBA <a class="anchor" id="2.2"></a>

### 2.3. Convolutional Neural Network <a class="anchor" id="3."></a>

#### 2.3.1 CNN setup <a class="anchor" id="3.1"></a>

#### 2.3.2 Baseline CNN <a class="anchor" id="3.2"></a>

#### 2.3.3 EfficientNet scaling of CNN <a class="anchor" id="3.3"></a>