# Welcome to my first audio classification notebook
In this notebook we will classify music by genre using the GTZAN dataset, a widely used benchmark dataset for music genre classification. The dataset covers 10 genres (blues, classical, pop, and so on). It consists of .wav files, and each genre contains 100 of them.
# Let's start by loading the dataset

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        #print(os.path.join(dirname, filename))
        print(filename)
# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Next we import some common ML libraries and some Python audio libraries.
Here soundfile, librosa and librosa.display are the Python audio libraries.
Both librosa and soundfile can read .wav files. librosa.display helps us visualize a .wav file as a waveform, and IPython.display.Audio lets us listen to the audio file.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
import librosa
import librosa.display
from IPython.display import Audio
from tqdm import tqdm 
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D, Dropout, LSTM, Bidirectional, GRU, BatchNormalization, LeakyReLU
from keras.utils import to_categorical
import os
import math
import json
import random

In [None]:
# Let's hear one of the Blues file
Audio('../input/gtzan-dataset-music-genre-classification/Data/genres_original/blues/blues.00000.wav')

# Let's visualize the same blues audio file as a waveform
# To do that we must first load the audio file

**Loading an audio file returns two values: the signal (a NumPy array) and sr (the sampling rate).
The sampling rate is the number of samples taken from the waveform per second, so at 22050 Hz there are 22050 samples for every second of audio. Since our file is about 30 seconds long, loading it with sr=22050 yields a signal array of length roughly 22050*30.**
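
The arithmetic above can be checked without any audio file. The sketch below builds a synthetic 30-second tone with NumPy (the 440 Hz tone and the variable names are illustrative assumptions, not part of the dataset) and confirms that the array length equals the sampling rate times the duration.

```python
import numpy as np

sr = 22050      # sampling rate in Hz (samples per second)
duration = 30   # clip length in seconds

# Time stamp of every sample: 0, 1/sr, 2/sr, ...
t = np.arange(sr * duration) / sr

# Hypothetical stand-in for a loaded audio signal: a 440 Hz sine tone
signal = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

num_samples = len(signal)
recovered_duration = num_samples / sr

print(num_samples)         # 661500
print(recovered_duration)  # 30.0
```

A real GTZAN clip can be slightly longer than this (the code further down notes a length of 661794 samples), which is why the duration printed for the blues file is a little over 30 seconds.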

In [None]:
signal, sr = librosa.load('../input/gtzan-dataset-music-genre-classification/Data/genres_original/blues/blues.00000.wav', sr=22050)
print('Length of Signal is => ', len(signal))
print('Sampling Rate => ', sr)
print('Duration of the audio file => ', len(signal)/sr)
# Plot the loaded signal as a waveform
# (librosa.display.waveplot was removed in librosa 0.10; waveshow is its replacement)
librosa.display.waveshow(signal, sr=sr)

The above waveform is an amplitude vs. time plot.

The X-axis shows time and the Y-axis shows amplitude (this is the time-domain representation of the signal).

# Next, we will define some constants that we will reuse later.
**We will use a sample rate of 22050 Hz, a common choice for music analysis (it is also librosa's default).
We also fix the duration so that every track yields a signal array of the same length.
Since we are low on training data, we will split each 30-second wav file into shorter segments to get more training examples. The number of segments per file, num_segments, will be one of the parameters of our function.**
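
To make the segment arithmetic concrete, here is a small sketch using the values that appear in this notebook (num_segments=10 and hop_length=1024 are the values used by save_mfcc below; the names `samples_per_segment`, `frames_per_segment` and `bounds` are only for illustration).

```python
import math

SAMPLE_RATE = 22050
DURATION = 30                       # seconds per track
SAMPLES_PER_TRACK = SAMPLE_RATE * DURATION

num_segments = 10                   # segments per track
hop_length = 1024                   # samples between successive MFCC frames

samples_per_segment = SAMPLES_PER_TRACK // num_segments
segment_seconds = samples_per_segment / SAMPLE_RATE
frames_per_segment = math.ceil(samples_per_segment / hop_length)

# (start, finish) sample indices of every segment; finish is exclusive
bounds = [(s * samples_per_segment, (s + 1) * samples_per_segment)
          for s in range(num_segments)]

print(samples_per_segment)    # 66150
print(segment_seconds)        # 3.0
print(frames_per_segment)     # 65
print(bounds[0], bounds[-1])  # (0, 66150) (595350, 661500)
```

So each 30-second track becomes ten 3-second examples, and each example is expected to produce 65 MFCC frames.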

In [None]:
DATASET_PATH = '/kaggle/input'
JSON_PATH = './myjson.json'
SAMPLE_RATE = sr =  22050
DURATION = 30 #measured in seconds 
SAMPLES_PER_TRACK = SAMPLE_RATE*DURATION

# Extracting MFCCs
Next, we will define a high-level function that extracts MFCCs (Mel-Frequency Cepstral Coefficients). These MFCCs are the input features that we will pass to our model.
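
For intuition about what an MFCC is, the sketch below walks through the usual steps (framing, power spectrum, mel filterbank, log, DCT) in plain NumPy. It is a simplified illustration and not librosa's implementation: the HTK-style mel formula, the default of 40 mel bands, n_fft=2048 and the absence of padding are all assumptions made here, so the numbers will not match librosa.feature.mfcc.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fbank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fbank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def simple_mfcc(signal, sr, n_mfcc=13, n_fft=2048, hop_length=1024, n_mels=40):
    # 1. Slice the signal into overlapping frames and apply a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # 2. Power spectrum of every frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 3. Energy in each mel band, then log compression
    log_mel = np.log(power @ mel_filterbank(sr, n_fft, n_mels).T + 1e-10)
    # 4. DCT-II across the mel bands; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ basis.T  # shape: (n_frames, n_mfcc)

# Hypothetical 3-second test tone instead of a real GTZAN segment
sr = 22050
t = np.arange(3 * sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
mfcc_demo = simple_mfcc(tone, sr)
print(mfcc_demo.shape)  # (63, 13)
```

The sketch yields 63 frames for a 3-second segment because it does not pad the signal. librosa pads each end by default (center=True), which is why the function below expects 65 frames for the same segment length.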

In [None]:
def save_mfcc(dataset_path, json_path, n_mfcc=13, n_fft=4084, hop_length=1024, num_segments=10):
    #dictionary to store data
    data = {
        'mapping' : [],
        'mfcc' : [],
        'labels' : []
    }
    
    count = 0 # To keep track of our progress
    num_samples_per_segment = int(SAMPLES_PER_TRACK / num_segments) 
    expected_num_mfcc_vectors_per_segment = math.ceil(num_samples_per_segment / hop_length)
    
    #Loop through all the genres
    for i, (dirpath, dirnames, filenames) in enumerate(os.walk(dataset_path)):
        
        #ensure that we're not at the root level
        #(compare paths directly instead of relying on a substring test)
        if dirpath != dataset_path:

            #save the semantic label
            dirpath_components = dirpath.split('/')
            semantic_label = dirpath_components[-1]
            data['mapping'].append(semantic_label)
            print('\nProcessing {}'.format(semantic_label))
            
            #process files for a specific genre 
            for f in filenames:
                if f.endswith('.wav') and f != 'jazz.00054.wav': # Since file jazz.00054.wav is an empty file
                    
                    file_path = os.path.join(dirpath,f)
                    
                    #loading the audio file
                    # we are using the soundfile library since it is faster than librosa
                    # sf.read does not resample: sr is the file's own sample rate (22050 Hz for GTZAN)
                    signal, sr = sf.read(file_path) # len(signal) = 661794
                    #print(signal,sr)
                    #process segments extracting mfcc and storing data
                    for s in range(num_segments): 
                        # With num_segments=10, every 30-second file is divided into 10 segments of 3 seconds each
                        # start_sample is the index of the first sample of the current segment
                        # finish_sample is one past the index of its last sample (the slice end is exclusive)
                        # Python slicing then extracts that 3-second segment from the full 30-second signal
                        start_sample = num_samples_per_segment * s   
                        finish_sample = num_samples_per_segment + start_sample
                        
                        # Next, we pass each segment to librosa to extract its MFCCs. The parameter n_mfcc sets the number of
                        # coefficients to extract; it is usually set between 13 and 40. The other parameters, n_fft and
                        # hop_length, are individual topics of discussion and will be covered in later notebooks.
                        # The signal is passed as the keyword argument y, which recent librosa versions require.
                        mfcc = librosa.feature.mfcc(y = signal[start_sample : finish_sample],
                                                   sr = sr,
                                                   n_fft = n_fft,
                                                   n_mfcc = n_mfcc,
                                                   hop_length = hop_length)

                        mfcc = mfcc.T
                        # store mfcc for segment if it has the expected length
                        if len(mfcc) == expected_num_mfcc_vectors_per_segment:
                            print(mfcc.shape)
                            data['mfcc'].append(mfcc.tolist())
                            data['labels'].append(i)
                            print('Processing {}, segment:{}'.format(file_path, s))
                            count += 1
                            print(count)
    with open(json_path, 'w') as fp:
        json.dump(data, fp, indent=4)

In [None]:
# Let's run the above function 
save_mfcc(DATASET_PATH, JSON_PATH, num_segments=10)

In [None]:
# loading the saved Json file
def load_data(path):
    with open(path, 'r') as fp:
        data = json.load(fp)
        
    #Convert lists into numpy arrays
    inputs = data['mfcc']
    targets = data['labels'] 
    return np.array(inputs), np.array(targets)
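
As a self-contained illustration of this save/load round trip, the sketch below writes a tiny made-up dataset to a temporary JSON file and reads it back (the array sizes, the `demo.json` file name and the `*_demo` variable names are assumptions for the example only).

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical miniature dataset: 4 segments, 5 frames each, 13 coefficients
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(4, 5, 13))
labels = np.array([0, 0, 1, 1])

# json cannot serialize NumPy arrays, so convert them to nested lists first
payload = {'mfcc': mfcc.tolist(), 'labels': labels.tolist()}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.json')
    with open(path, 'w') as fp:
        json.dump(payload, fp)
    with open(path, 'r') as fp:
        loaded = json.load(fp)

# Convert the nested lists back into arrays, as load_data does
inputs_demo = np.array(loaded['mfcc'])
targets_demo = np.array(loaded['labels'])

print(inputs_demo.shape, targets_demo.shape)  # (4, 5, 13) (4,)
```

Converting back to a regular 3-D array only works when every segment has the same number of frames, which is why save_mfcc stores a segment only if it has the expected length.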

In [None]:
inputs, targets = load_data('./myjson.json')

In [None]:
print(inputs)

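Below is the shape of the inputs obtained.

9986 is the total number of segments that were kept across all files.

(65, 13) is the shape of the MFCC matrix for each segment: 65 frames (ceil(66150 / 1024), determined by the segment length and hop_length) by 13 coefficients (determined by the n_mfcc parameter).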


In [None]:
inputs.shape

In [None]:
np.unique(targets, return_counts=True)

In [None]:
# Converting labels from 15-24 to 0-9 by subtracting the smallest label
targets = targets - targets.min()
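
The same remapping can be tried on a small made-up label array. The sketch below shows the subtraction approach next to an alternative based on np.unique, which is an assumption-free choice when the raw labels might have gaps (the `raw` array here is hypothetical).

```python
import numpy as np

# Hypothetical raw labels in the 15..24 range produced by enumerate(os.walk(...))
raw = np.array([15, 15, 17, 24, 20, 17])

# Option 1: subtract the minimum (gives 0..9 only if the labels are contiguous)
shifted = raw - raw.min()

# Option 2: return_inverse maps each distinct value to 0..K-1,
# which also works when the original labels have gaps
classes, remapped = np.unique(raw, return_inverse=True)

print(shifted)   # [0 0 2 9 5 2]
print(remapped)  # [0 0 1 3 2 1]
print(classes)   # [15 17 20 24]
```

For this notebook the labels are contiguous, so both options give the same result on the real targets.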

In [None]:
np.unique(targets, return_counts=True)

In [None]:
# If you want to apply a convolutional NN, then remove the comment from the line below
#inputs = np.reshape(inputs, (inputs.shape[0], inputs.shape[1], inputs.shape[2], 1))
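
To see what that reshape does, here is a minimal sketch on a dummy array with the same layout as `inputs` (the batch size of 8 is arbitrary). Conv2D layers expect a trailing channel axis, so a size-1 axis is appended.

```python
import numpy as np

# Hypothetical batch with the same layout as `inputs`: (segments, frames, n_mfcc)
batch = np.zeros((8, 65, 13), dtype=np.float32)

# Append a trailing axis of size 1 to act as the image "channel"
batch_cnn = batch[..., np.newaxis]

# Equivalent to the np.reshape call in the commented-out line
batch_cnn_reshaped = np.reshape(batch, (batch.shape[0], batch.shape[1], batch.shape[2], 1))

print(batch.shape)      # (8, 65, 13)
print(batch_cnn.shape)  # (8, 65, 13, 1)
```

The recurrent (GRU) model used below takes the 3-D array directly, so the extra axis is only needed for the convolutional variant.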

In [None]:
inputs_train, inputs_test, targets_train, targets_test = train_test_split(inputs, targets, test_size=0.25)

In [None]:
# Adding noise: perturb each training example with uniform random values in [0, 1)
for i in range(inputs_train.shape[0]):
    s = np.random.rand(inputs_train.shape[1], inputs_train.shape[2])
    inputs_train[i] = inputs_train[i] + s
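
One thing to be aware of is that np.random.rand draws from [0, 1), so the added noise has a mean of about 0.5 and shifts every feature upward. The sketch below contrasts it with zero-mean Gaussian noise as a possible alternative. This is a suggestion rather than what the notebook does, and the noise level `noise_std` and the dummy `features` array are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for inputs_train: (segments, frames, n_mfcc)
features = rng.normal(size=(6, 65, 13))

# Uniform noise in [0, 1), as in the cell above: its mean is roughly 0.5
uniform_noise = rng.random(size=features.shape)

# Zero-mean Gaussian noise with an assumed standard deviation
noise_std = 0.1
gaussian_noise = rng.normal(loc=0.0, scale=noise_std, size=features.shape)

# Vectorized: one addition for the whole array instead of a Python loop
augmented = features + gaussian_noise

print(uniform_noise.mean())   # close to 0.5
print(gaussian_noise.mean())  # close to 0.0
print(augmented.shape)        # (6, 65, 13)
```

Whether the noise level should be tied to the scale of the MFCC values is a tuning question that this sketch leaves open.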


In [None]:
model = Sequential()

model.add(GRU(100, return_sequences=True, input_shape=(inputs.shape[1], inputs.shape[2])))
model.add(GRU(500, return_sequences=True))
model.add(GRU(1000))
model.add(LeakyReLU())
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(100))
model.add(LeakyReLU())
model.add(Dense(10, activation='softmax'))

model.compile(optimizer=tf.keras.optimizers.Adam(),
             loss = 'sparse_categorical_crossentropy',
             metrics=['accuracy'])


In [None]:
model.summary()

In [None]:
history = model.fit(inputs_train, targets_train,
          validation_data=(inputs_test, targets_test),
          epochs = 50,
          batch_size=100)

In [None]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()