# Exploratory Data Analysis Notebook

# EDA Description

The primary goal of this notebook is to explore the Freesound Audio training and test datasets and to summarize their key characteristics using statistical graphs and other data visualization methods.

# Importing Packages

In the code cells below, we import the packages needed to explore the data and to build the best model we can. The most important packages for this project are Librosa, IPython, and TensorFlow.

In [None]:
!pip install -U efficientnet -qq

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import math
import wave

import os
import cv2

import IPython.display as ipd 

import librosa 
import librosa.display

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

import tensorflow as tf
from tensorflow import keras

from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import *
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import backend as K

import efficientnet.tfkeras as efn

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 

# Loading DataFrame

In this section, we load the Freesound Audio training dataset into the Kaggle notebook so that we can explore it in more detail.

In [None]:
train_path = '../input/freesound-audio-tagging/audio_train/'

print(len(os.listdir(train_path)))

In [None]:
train = pd.read_csv("../input/freesound-audio-tagging/train.csv")

print('The shape of the training data is: ', train.shape)

From the output above, we can see that the training dataset contains 9473 audio files. The shape of the training data is 9473 by 3, where 3 is the number of columns in the dataset: **fname**, **label**, and **manually_verified**. We can view these columns in the cell below.

In [None]:
train.head()

Looking at the preview of the dataset, we can see that the fname column contains the name of each audio file, stored in WAV format. The label column contains the category assigned to each audio file. The manually_verified column contains either a 0 or a 1 for each file. This binary flag tells us whether the label for that file has been manually checked: a 0 means the label is not verified, while a 1 means the label is verified.

# Unique Labels

In [None]:
uniq_labels = train.label.unique()
print('There are a total of', len(uniq_labels), 'unique labels.\n')
print(uniq_labels)

From the output above, we can see that there are 41 unique labels in the Freesound Audio dataset. Several are musical instruments, such as trumpet, cello, and flute. Some relate to animals, such as a dog's bark, a cowbell, and a cat's meow. Others are everyday real-world sounds like applause, knock, and keys_jangling.

# Label Distribution

In this section, we look at the total number of verified and non-verified audio files, using a few different data visualization methods.

The first diagram below is a bar chart comparing the number of verified (shown as 1) and non-verified (shown as 0) labels across all audio files. Between 3500 and 3900 audio files have verified labels (shown in green), while between 5600 and 5900 audio files have labels that are not verified (shown in blue).

In [None]:
# Bars follow value_counts() order: 0 (not verified, blue) first, then 1 (verified, green)
train.manually_verified.value_counts().plot(kind='bar', xlabel='manually_verified', ylabel='Count', 
                                     color=['#1E90FF', '#00C957'], edgecolor='black');

In the cells below, I created a pie chart to compare the verified and non-verified labels again, this time as percentages of the whole dataset.

As the pie chart shows, about 39 percent of the audio files have verified labels (shown in blue), while about 61 percent have labels that were not verified (shown in red).

In [None]:
print((train.manually_verified.value_counts() /len(train)).to_frame().T)
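The same shares can also be computed with `value_counts(normalize=True)`. The sketch below uses a small made-up DataFrame (the file names and labels are placeholders, not rows from `train.csv`) and looks the shares up by flag value rather than by position, so the result does not depend on which class is more frequent.

```python
import pandas as pd

# Toy stand-in for train.csv; the file names and labels here are made up.
toy = pd.DataFrame({
    "fname": ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"],
    "label": ["Cello", "Knock", "Cello", "Bark", "Knock"],
    "manually_verified": [1, 0, 0, 1, 0],
})

# normalize=True returns fractions instead of raw counts.
shares = toy["manually_verified"].value_counts(normalize=True)

# Index by the flag value (0 or 1), not by row position.
not_verified_share = shares.loc[0]
verified_share = shares.loc[1]
print(f"not verified: {not_verified_share:.0%}, verified: {verified_share:.0%}")
```

Looking values up by flag is also a safer way to pair counts with the 'Not Verified' and 'Verified' names than relying on the sort order of `value_counts()`.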

In [None]:
labels_count = train.manually_verified.value_counts()

# Create the figure first so the pie is drawn on it at the requested size
plt.figure(figsize=(16,16))
plt.pie(labels_count, labels=['Not Verified', 'Verified'], startangle=180, 
        autopct='%1.1f', colors=['#EE2C2C','#009ACD'], shadow=True)
plt.show()

# Audio Samples Per Category

Next, we create a graph that shows the number of audio files in each category.

The graph shows that the smallest category contains about 94 audio samples, while the largest contains 300.

Several categories reach this maximum of 300 samples.

The orange segments show the files whose labels are verified, while the blue segments show the files whose labels are not verified. Here too, we can see that a large number of audio files have labels that are not verified.

In [None]:
category_group = train.groupby(['label', 'manually_verified']).count()
plot = category_group.unstack().reindex(category_group.unstack().sum(axis=1).sort_values().index)\
          .plot(kind='bar', stacked=True, title="Number of Audio Samples per Category", figsize=(16,10))
plot.set_xlabel("Category")
plot.set_ylabel("Number of Samples");
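To read the minimum and maximum off as numbers instead of estimating them from the chart, the grouped counts can be summed per label. This is a minimal sketch on a made-up DataFrame. Using `size()` with `unstack(fill_value=0)` is my own variation on the `count()` call above, chosen so that missing label and flag combinations show up as 0 instead of NaN.

```python
import pandas as pd

# Toy stand-in for the training table; the rows are made up.
toy = pd.DataFrame({
    "label": ["Cello", "Cello", "Cello", "Knock", "Knock", "Bark"],
    "manually_verified": [1, 1, 0, 0, 0, 1],
})

# One row per label, one column per verification flag; missing pairs become 0.
per_label = (
    toy.groupby(["label", "manually_verified"])
       .size()
       .unstack(fill_value=0)
)

# Total clips per label, sorted ascending like the bars in the chart.
totals = per_label.sum(axis=1).sort_values()
print(per_label)
print("smallest:", totals.index[0], totals.iloc[0])
print("largest:", totals.index[-1], totals.iloc[-1])
```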

# Exploring Samples

In this section, I have included a few audio samples from the dataset. They give us some examples of what these recordings sound like and how the audio signal is measured.

I will also show how each audio file is converted into what is called a spectrogram.

A spectrogram is a visual representation of the spectrum of frequencies of a signal as it varies with time. Put simply, it is another way of showing the strength of a signal over time at different frequencies, but in the form of an image. For this project, the idea is to turn the audio classification problem into an image classification problem.
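To make the idea concrete, here is a minimal NumPy-only sketch of a plain (non-mel) power spectrogram. It is not how librosa computes its mel spectrogram internally. The sample rate, tone frequency, window length, and hop size are all values I picked for illustration.

```python
import numpy as np

sr = 8000                               # sample rate in Hz (chosen for this toy example)
t = np.arange(sr) / sr                  # one second of time stamps
tone = np.sin(2 * np.pi * 1000 * t)     # a pure 1000 Hz tone

n_fft, hop = 256, 128
window = np.hanning(n_fft)

# Slice the signal into overlapping frames and take the FFT of each one.
starts = range(0, len(tone) - n_fft + 1, hop)
frames = np.stack([tone[s:s + n_fft] * window for s in starts])
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # shape: (time, frequency)

spectrogram = power.T                   # rows = frequency bins, columns = time frames
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)
peak_hz = freqs[spectrogram.mean(axis=1).argmax()]
print(spectrogram.shape, peak_hz)
```

Each column of `spectrogram` describes one short slice of time and each row one frequency band, so a steady tone appears as a single bright horizontal line at its frequency.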

### Sample 1

In [None]:
gunshot = '../input/freesound-audio-tagging/audio_train/0048fd00.wav'
ipd.Audio(gunshot)

In [None]:
signal, sr = librosa.load(gunshot)
print(type(signal))
print(type(sr))

In [None]:
print(signal.shape)
print(sr)
print(len(signal) / sr)
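The last line above divides the number of samples by the sample rate to get the clip length in seconds. Note that `librosa.load` resamples to 22050 Hz by default, so the printed `sr` is the resampled rate, not necessarily the rate stored in the file. As a hedged sketch, the standard-library `wave` module can read the native rate and length straight from the header. The in-memory file below is synthetic, so the example runs without the dataset.

```python
import io
import wave

# Build a tiny in-memory WAV file: 1.5 seconds of silence, mono, 16-bit,
# 44100 Hz. All of these values are chosen for illustration only.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(44100)
    writer.writeframes(b"\x00\x00" * 66150)

# Read the header back: frame count and native sample rate.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    n_frames = reader.getnframes()
    native_sr = reader.getframerate()

duration = n_frames / native_sr
print(n_frames, native_sr, duration)
```

The duration in seconds is the same whichever sample rate is used, as long as the sample count and the rate come from the same version of the signal.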

In [None]:
plt.figure(figsize = [12,3])
plt.subplot(2,1,1)
plt.plot(signal)
plt.subplot(2,1,2)
interval = range(2000, 3000)
plt.plot(interval, signal[interval])
plt.tight_layout()
plt.show()

In [None]:
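x1 = librosa.feature.melspectrogram(y=signal, sr=sr)   # mel-scaled power spectrogram
x2 = librosa.power_to_db(x1, ref=np.max)               # decibels relative to the loudest bin

print(x2.shape)

# The rows are mel bands, so the frequency axis is labelled on the mel scale
librosa.display.specshow(x2, sr=sr, x_axis='time', y_axis='mel')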
plt.colorbar()
plt.show()
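The shape printed above can be predicted from the clip length. With librosa's default settings (128 mel bands, a hop length of 512 samples, and centered frames), the spectrogram has 128 rows and `1 + n_samples // 512` columns. The helper below is a hypothetical name of my own, not a librosa function.

```python
def mel_shape(n_samples, n_mels=128, hop_length=512):
    """Expected (rows, columns) of a centered mel spectrogram."""
    return n_mels, 1 + n_samples // hop_length

# A 2-second clip at 22050 Hz has 44100 samples.
two_second_shape = mel_shape(2 * 22050)
print(two_second_shape)
```

For a 2-second clip at 22050 Hz this gives (128, 87), which matches the `IMG_SIZE` used in the Data Generator section further down.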

### Sample 2

In [None]:
cello = '../input/freesound-audio-tagging/audio_train/0091fc7f.wav'
ipd.Audio(cello)

In [None]:
signal, sr = librosa.load(cello)
print(type(signal))
print(type(sr))

In [None]:
print(signal.shape)
print(sr)
print(len(signal) / sr)

In [None]:
plt.figure(figsize = [12,3])
plt.subplot(2,1,1)
plt.plot(signal)
plt.subplot(2,1,2)
interval = range(2000, 3000)
plt.plot(interval, signal[interval])
plt.tight_layout()
plt.show()

In [None]:
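x1 = librosa.feature.melspectrogram(y=signal, sr=sr)   # mel-scaled power spectrogram
x2 = librosa.power_to_db(x1, ref=np.max)               # decibels relative to the loudest bin

print(x2.shape)

# The rows are mel bands, so the frequency axis is labelled on the mel scale
librosa.display.specshow(x2, sr=sr, x_axis='time', y_axis='mel')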
plt.colorbar()
plt.show()

# Label Encoder

In [None]:
labels = np.unique(train.label.values)
label_encoder = {label:i for i, label in enumerate(labels)}
print(label_encoder['Cello'])
print(label_encoder['Gunshot_or_gunfire'])
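The encoder above maps each label string to an integer class index. To turn a predicted index back into a label, the dictionary can be inverted. The sketch below uses three labels as a small stand-in for all 41. `label_decoder` is a name I am introducing here, not something defined elsewhere in the notebook.

```python
import numpy as np

# Three labels from the dataset, used here as a small stand-in for all 41.
labels = np.unique(["Knock", "Cello", "Bark", "Cello"])
label_encoder = {label: i for i, label in enumerate(labels)}

# Inverse mapping: integer class index back to the label string.
label_decoder = {i: label for label, i in label_encoder.items()}

encoded = [label_encoder[name] for name in ["Cello", "Bark"]]
decoded = [str(label_decoder[i]) for i in encoded]
print(encoded, decoded)
```

Because `np.unique` returns the labels in sorted order, the indices are stable across runs as long as the set of labels does not change.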

# Displaying Multiple Spectrogram Images

To get an idea of what a spectrogram looks like for several audio labels, I created a grid that displays the spectrograms of 20 randomly sampled audio files. A different set of files is shown each time the cell is run.

Notice that some of the clips below show a clear pattern in the signal. Take a knock, for instance: each time a knock occurs in the audio, the spectrogram picks up that signal. The louder the sound, the lighter the color, much like a heat map.
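The link between loudness and color comes from the decibel conversion applied before plotting. The following is a simplified NumPy restatement of what `librosa.power_to_db(S, ref=np.max)` does with its default settings, as I understand them. It is a sketch for intuition, not librosa's actual code, and the function name is my own.

```python
import numpy as np

def to_db_relative_to_max(power, amin=1e-10, top_db=80.0):
    """Convert power values to decibels relative to the loudest bin."""
    power = np.maximum(power, amin)             # avoid log(0)
    db = 10.0 * np.log10(power / power.max())   # loudest bin becomes 0 dB
    return np.maximum(db, db.max() - top_db)    # clip anything quieter than -80 dB

power = np.array([[1.0, 0.1],
                  [0.001, 0.0]])
db = to_db_relative_to_max(power)
print(db)
```

The loudest bin sits at 0 dB and every factor of ten in power lowers the value by 10 dB, so the colors in the plot follow a logarithmic loudness scale.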

In [None]:
sample = train.sample(20)

plt.figure(figsize=[20,9])

for i in range(20):
    fname = train_path + sample.fname.iloc[i]
    clip, sr = librosa.load(fname, sr=44100)
    S1 = librosa.feature.melspectrogram(y=clip, sr=44100) 
    S2 = librosa.power_to_db(S1, ref=np.max)                
    
    plt.subplot(5, 4, i+1)
    librosa.display.specshow(S2)
    plt.title(f'{sample.label.iloc[i]} - {S2.shape[:2]} - {sample.fname.iloc[i]} ')

plt.tight_layout()
plt.show()

# Data Generator

In [None]:
SPEC_PATH = '../input/freesound-melpec-128-512-2sec/spectrograms'
IMG_SIZE = (128,87)

class DataGenerator(keras.utils.Sequence):
    
    def __init__(self, df, batch_size=32, shuffle=True, is_train=True):
        self.df = df
        self.n = len(df)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.is_train = is_train
        self.on_epoch_end()
        
    def on_epoch_end(self):
        self.indices = np.arange(self.n)
        if self.shuffle:
            np.random.shuffle(self.indices)   
    
    def __len__(self):
        
        return math.ceil( self.n / self.batch_size )
    
    def __getitem__(self, batch_index):
        
        start = batch_index * self.batch_size
        end = (batch_index + 1) * self.batch_size
        
        indices = self.indices[start:end]
        
        return self.__data_generation(indices)
    
    def __data_generation(self, batch_indices):
        batch_size = len(batch_indices)
        
        X = np.zeros(shape=(batch_size, IMG_SIZE[0], IMG_SIZE[1], 3))
        y = np.zeros(batch_size)
        
        for i, idx in enumerate(batch_indices):
            FILE = self.df.fname.values[idx]
            LABEL = self.df.label.values[idx]
            
            SET = 'train_spec' if self.is_train else 'test_spec'
            path = f'{SPEC_PATH}/{SET}/{FILE[:-4]}.npy'

            try:
                data_array = np.load(path)
                resized = cv2.resize(data_array, (IMG_SIZE[1], IMG_SIZE[0]))
                
                for j in range(3):
                    X[i,:,:,j] = resized 
                
            except Exception as err:
                # Leave this sample as zeros, but report which file failed and why
                print(f'skipped {path}: {err}')
                
            if self.is_train:
                y[i] = label_encoder[LABEL]
        if self.is_train:    
            return X, y
        return X
    
GENERATOR_TEST = True

if GENERATOR_TEST:
    temp_gen = DataGenerator(train, batch_size=8, shuffle=False)
    X, y = temp_gen[0]

    print(X.shape)
    print(y)
    
    librosa.display.specshow(X[0, :, :, 0])
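The loop in `__data_generation` copies the single-channel spectrogram into each of the three channels, because the ImageNet-pretrained backbone expects 3-channel input. As a small sketch, the same result can be obtained without a loop by using `np.repeat`. The tiny array here is a stand-in for one resized spectrogram.

```python
import numpy as np

gray = np.arange(6, dtype=np.float32).reshape(2, 3)   # stand-in for one resized spectrogram

# Loop version, as in the generator above.
looped = np.zeros((2, 3, 3), dtype=np.float32)
for j in range(3):
    looped[:, :, j] = gray

# Vectorized version: add a channel axis, then repeat it three times.
stacked = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

print(stacked.shape, np.array_equal(looped, stacked))
```

Also note that `cv2.resize` takes its target size as (width, height), which is why the generator passes `(IMG_SIZE[1], IMG_SIZE[0])` instead of `IMG_SIZE` itself.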

In [None]:
train_df, valid_df = train_test_split(train, test_size=0.2, random_state=1, stratify=train.label)

print(train_df.shape)
print(valid_df.shape)

In [None]:
train_loader = DataGenerator(train_df, batch_size=64, shuffle=True)
valid_loader = DataGenerator(valid_df, batch_size=64, shuffle=False)

In [None]:
TR_STEPS = len(train_loader)
VA_STEPS = len(valid_loader)

print(TR_STEPS)
print(VA_STEPS)
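The step counts printed above follow from the `__len__` method of the generator, which rounds the number of rows divided by the batch size up to the next integer. A rough sketch of the arithmetic, assuming the 80/20 split rounds the validation share up (I believe this is how `train_test_split` behaves, but I have not checked it against the printed shapes):

```python
import math

n_rows = 9473                        # rows in train.csv, as reported earlier
n_valid = math.ceil(0.2 * n_rows)    # assumption: the validation share is rounded up
n_train = n_rows - n_valid
batch_size = 64

train_steps = math.ceil(n_train / batch_size)
valid_steps = math.ceil(n_valid / batch_size)

# The final batch of an epoch is usually smaller than batch_size.
last_train_batch = n_train - (train_steps - 1) * batch_size
print(n_train, n_valid, train_steps, valid_steps, last_train_batch)
```

Rounding up means the last batch of each epoch is shorter than the rest, which the generator handles because it sizes `X` and `y` from the number of indices it receives.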

# CNN Model

In [None]:
ENB1_model = efn.EfficientNetB1(input_shape=(128,87,3), include_top=False, weights='imagenet')
ENB1_model.trainable = True

In [None]:
cnn = Sequential([
    ENB1_model,
    
    Flatten(),
    
    Dense(64, activation='relu'),
    Dropout(0.45),
    
    Dense(32, activation='relu'),
    Dropout(0.45),
    
    Dense(41, activation='softmax')
])

cnn.summary()
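To help read the summary, here is a back-of-the-envelope estimate of how many features reach the first `Dense` layer. It rests on two assumptions of mine that the output of `cnn.summary()` should confirm or correct: that the backbone halves the spatial size five times, rounding up each time, and that it outputs 1280 channels.

```python
import math

def downsample(size, n_stride2_stages=5):
    """Spatial size after repeated stride-2 stages that round up."""
    for _ in range(n_stride2_stages):
        size = math.ceil(size / 2)
    return size

# Assumed backbone output: 128 x 87 input reduced by 32x, with 1280 channels.
height, width, channels = downsample(128), downsample(87), 1280
flat_features = height * width * channels
first_dense_params = flat_features * 64 + 64   # weights plus biases of Dense(64)
print((height, width, channels), flat_features, first_dense_params)
```

If the assumptions hold, most of the parameters added on top of the backbone sit in the first `Dense` layer, because `Flatten` keeps every spatial position as a separate feature.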

# Train Network

## Training Run 1

In [None]:
opt = tf.keras.optimizers.Adam(0.001)
cnn.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
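The loss is the sparse variant of categorical cross-entropy because the generator yields integer class indices, not one-hot vectors. As a minimal NumPy sketch of what this loss measures (a restatement for intuition, not Keras's implementation), it is the mean negative log-probability that the model assigns to the true class.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean negative log-probability assigned to the true class index."""
    picked = probs[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

# Two made-up predictions over three classes, with their true class indices.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
y_true = np.array([0, 2])
loss = sparse_categorical_crossentropy(y_true, probs)

# Reference point: a model that guesses uniformly over 41 classes.
uniform_baseline = float(np.log(41))
print(round(loss, 4), round(uniform_baseline, 4))
```

The uniform baseline is a useful reference when reading the training curves: a loss near ln(41) means the model is doing no better than guessing.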

In [None]:
%%time 

h1 = cnn.fit(train_loader, steps_per_epoch = TR_STEPS, epochs = 20, validation_data = valid_loader, 
             validation_steps = VA_STEPS, verbose = 1)

In [None]:
def merge_history(hlist):
    history = {}
    for k in hlist[0].history.keys():
        history[k] = sum([h.history[k] for h in hlist], [])
    return history
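To show what `merge_history` returns, here is a small usage sketch. The `SimpleNamespace` objects are hypothetical stand-ins for the `History` objects returned by `fit`, and the numbers are made up.

```python
from types import SimpleNamespace

def merge_history(hlist):
    history = {}
    for k in hlist[0].history.keys():
        # sum(..., []) concatenates the per-run lists in order
        history[k] = sum([h.history[k] for h in hlist], [])
    return history

# Hypothetical stand-ins for the objects returned by two fit() calls.
run1 = SimpleNamespace(history={"loss": [2.0, 1.5], "val_loss": [2.1, 1.7]})
run2 = SimpleNamespace(history={"loss": [1.2], "val_loss": [1.6]})

merged = merge_history([run1, run2])
print(merged)
```

The merged dictionary has one list per metric, covering all epochs of all runs in order, so later runs can be plotted as a continuation of earlier ones.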

In [None]:
def vis_training(h, start=1):
    epoch_range = range(start, len(h['loss'])+1)
    s = slice(start-1, None)

    plt.figure(figsize=[14,4])

    n = int(len(h.keys()) / 2)

    for i in range(n):
        k = list(h.keys())[i]
        plt.subplot(1,n,i+1)
        plt.plot(epoch_range, h[k][s], label='Training')
        plt.plot(epoch_range, h['val_' + k][s], label='Validation')
        plt.xlabel('Epoch'); plt.ylabel(k); plt.title(k)
        plt.grid()
        plt.legend()

    plt.tight_layout()
    plt.show()

In [None]:
history = merge_history([h1])
vis_training(history)

## Training Run 2

In [None]:
K.set_value(cnn.optimizer.learning_rate, 0.0001)

In [None]:
%%time 

h2 = cnn.fit(train_loader, steps_per_epoch = TR_STEPS, epochs = 10, validation_data = valid_loader, 
             validation_steps = VA_STEPS, verbose = 1)

In [None]:
history = merge_history([h1, h2])
vis_training(history, start=10)
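The `start` argument drops the early epochs so that the later, flatter part of the curves is easier to read. A small sketch of the indexing inside `vis_training`, using a stand-in list in place of a real metric:

```python
n_epochs = 30           # 20 epochs from run 1 plus 10 from run 2
start = 10

epoch_range = range(start, n_epochs + 1)    # x-axis values: 10, 11, ..., 30
s = slice(start - 1, None)                  # matching positions in the history lists

values = list(range(1, n_epochs + 1))       # stand-in for a metric, one entry per epoch
plotted = values[s]
print(len(epoch_range), plotted[0], plotted[-1])
```

Epoch numbers are 1-based while list positions are 0-based, which is why the slice starts at `start - 1`.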

# Save Model

In [None]:
cnn.save('Freesound_Audio_EfficientNet_B1_v01.h5')