# Group 26 - Project CaLLaR

### Love isn't only reserved between humans! 

**Project Description:** The theme of our Machine Learning project is Love. When we first heard that theme, we thought of love between humans. After giving it some thought, however, we asked ourselves: why not make a project that focuses on the love between humans and pets? After all, love isn't only reserved between humans!

For this project, we used Deep Learning to train our models. We also used Mel-Frequency Cepstral Coefficients (MFCCs), computed with Librosa, to extract features from each audio clip. We hope this project proves useful to cat lovers who want to know more about what their cat is saying!


### Packages used

 > Importing useful packages
 
 > Installing librosa for audio analysis
 
 > Installing tensorflow and keras to do Deep Learning

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import os
import numpy as np
import IPython.display as ipd
import librosa
import librosa.display
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout,Activation,Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from datetime import datetime
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tqdm import tqdm

### Reading in metadata of dataset

In [2]:
metadata = pd.read_csv(r'ML project\data\metadata.csv')  # raw string keeps the backslashes literal
metadata.head()

Unnamed: 0,slice_file_name,classID,class
0,B_ANI01_MC_FN_SIM01_101.wav,1,brushing
1,B_ANI01_MC_FN_SIM01_102.wav,1,brushing
2,B_ANI01_MC_FN_SIM01_103.wav,1,brushing
3,B_ANI01_MC_FN_SIM01_301.wav,1,brushing
4,B_ANI01_MC_FN_SIM01_302.wav,1,brushing


#### Checking out if dataset is imbalanced

In [3]:
## check whether the dataset is imbalanced
metadata['class'].value_counts()

isolation           221
brushing            127
waiting_for_food     92
Name: class, dtype: int64
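
The counts above show a moderate imbalance, with isolation roughly twice as frequent as the other two classes. As a rough sketch (not part of the original notebook), the majority-class baseline and a set of inverse-frequency class weights can be worked out by hand. The weighting formula `n_samples / (n_classes * count)` is an assumption borrowed from the common "balanced" heuristic.

```python
# Class counts copied from the value_counts() output above.
counts = {"isolation": 221, "brushing": 127, "waiting_for_food": 92}

n_samples = sum(counts.values())   # 440 clips in total
n_classes = len(counts)

# Accuracy of a classifier that always predicts the most frequent class.
majority_baseline = max(counts.values()) / n_samples

# "Balanced" heuristic (an assumption, not used in the notebook):
# rarer classes receive proportionally larger weights.
class_weights = {name: n_samples / (n_classes * c) for name, c in counts.items()}

print(f"majority baseline: {majority_baseline:.3f}")
for name, weight in class_weights.items():
    print(f"{name:>17}: weight {weight:.3f}")
```

Weights of this kind could be passed to Keras through the `class_weight` argument of `fit`, keyed by integer class index rather than by name.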

### Using Mel-Frequency Cepstral Coefficients (MFCC) from librosa package to extract features of audio file

#### Setting the path to the folder that contains the audio files and loading the metadata

In [4]:
audio_data_path = r'ML project\data\dataset'
metadata = pd.read_csv(r'ML project\data\metadata.csv')
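
Windows-style paths written as ordinary string literals are fragile, because a backslash followed by certain letters (for example `\t` or `\n`) is read as an escape sequence. A minimal sketch of building the same paths portably with `pathlib` follows; the variable names `base_dir`, `metadata_path`, `audio_dir` and `example_file` are illustrative and do not appear in the notebook.

```python
from pathlib import Path

# Hypothetical names; the folder layout mirrors the paths used in this notebook.
base_dir = Path("ML project") / "data"
metadata_path = base_dir / "metadata.csv"
audio_dir = base_dir / "dataset"

# Joining a file name from the metadata table onto the audio folder.
example_file = audio_dir / "B_ANI01_MC_FN_SIM01_101.wav"

print(metadata_path)
print(example_file.name, example_file.suffix)
```

Both `pd.read_csv` and `librosa.load` accept `Path` objects, so paths built this way can be passed on without converting them to strings.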

#### Function to extract features from an audio file (40 MFCC features per file)

In [46]:
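def features_extractor(file_name):
    # Load the clip; 'kaiser_fast' trades a little resampling quality for speed
    audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
    # MFCC matrix of shape (40, n_frames)
    mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
    # Average over the time frames to get one 40-dimensional vector per clip
    mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)

    return mfccs_scaled_features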
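
The averaging step in the extractor is what turns a variable-length clip into a fixed-length input. The sketch below illustrates only that reduction, using random numbers in place of a real MFCC matrix; the frame count of 87 is an arbitrary assumption.

```python
import numpy as np

# Stand-in for the librosa.feature.mfcc output: shape (n_mfcc, n_frames).
# Random numbers replace real audio here; only the shapes matter.
rng = np.random.default_rng(0)
n_mfcc, n_frames = 40, 87
mfcc_matrix = rng.normal(size=(n_mfcc, n_frames))

# Same reduction as in features_extractor: average each coefficient over time.
pooled = np.mean(mfcc_matrix.T, axis=0)

# Equivalent formulation without the transpose.
pooled_alt = mfcc_matrix.mean(axis=1)

print(pooled.shape)                      # (40,)
print(np.allclose(pooled, pooled_alt))   # True
```

Clips of different durations produce a different number of frames but always the same 40 values, at the cost of discarding how the sound evolves over time.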

#### Iterating through all audio files to extract the features and class label of each

In [47]:
#Now we iterate through every audio file and extract features using MFCC
extracted_features=[]
for index_num, row in tqdm (metadata.iterrows()):
    file_name= os.path.join(os.path.abspath(audio_data_path),str(row["slice_file_name"]))
    final_class_label=row["class"]
    data=features_extractor(file_name)
    extracted_features.append([data,final_class_label])

440it [00:10, 40.21it/s]


#### Converting extracted_features to Pandas dataframe

In [48]:
extracted_features_df=pd.DataFrame(extracted_features,columns=['feature','class'])
extracted_features_df.head()

Unnamed: 0,feature,class
0,"[-396.81778, 136.41545, -74.92035, -11.478852,...",brushing
1,"[-542.6039, 158.09341, -73.43401, -11.935203, ...",brushing
2,"[-517.51764, 142.89006, -69.55315, -7.46689, 3...",brushing
3,"[-476.4709, 112.17069, -65.29729, -10.651881, ...",brushing
4,"[-511.12573, 135.90286, -62.427002, -11.154292...",brushing


#### Splitting the dataset into independent (X) and dependent (y) variables

In [49]:
X=np.array(extracted_features_df['feature'].tolist())
y=np.array(extracted_features_df['class'].tolist())

#### Label encoding

In [50]:
labelencoder = LabelEncoder()
y=to_categorical(labelencoder.fit_transform(y))
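
To make the encoding step concrete, here is a NumPy-only sketch of what the `LabelEncoder` plus `to_categorical` combination produces: class names sorted alphabetically, mapped to integer codes, then expanded to one-hot rows. The four-element `labels` array is a toy input, not data from the notebook.

```python
import numpy as np

# Toy labels using the three class names from the metadata.
labels = np.array(["brushing", "isolation", "waiting_for_food", "isolation"])

# np.unique sorts the class names and returns an integer code per sample,
# which mirrors what LabelEncoder.fit_transform produces.
classes, codes = np.unique(labels, return_inverse=True)

# One-hot rows, the same layout that to_categorical returns.
one_hot = np.eye(len(classes), dtype="float32")[codes]

print(classes)
print(codes)     # [0 1 2 1]
print(one_hot)
```

The column order of the one-hot matrix therefore follows the alphabetical class order, which matters when reading the softmax output later.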

### Train test split

In [51]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=0)
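
The sizes that result from this split can be checked with a little arithmetic. The sketch assumes the 440 rows counted earlier and the batch size of 32 set in the training cell further down.

```python
import math

n_samples = 440        # rows in the metadata table
test_size = 0.2
batch_size = 32        # value used later in the training cell

# For a float test_size, train_test_split rounds the test share up.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Number of gradient steps per epoch.
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)   # 352 88 11
```

The 11 steps agree with the `1/11` progress counter in the training log.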

#### Initialising the number of output layer nodes based on number of classes

In [52]:
## Number of classes
num_labels=y.shape[1]

## Adding layers to the model

In [53]:
model=Sequential()
###first layer
model.add(Dense(100,input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))

### second layer
model.add(Dense(200))
model.add(Activation('relu'))
model.add(Dropout(0.5))

### third layer
model.add(Dense(100))
model.add(Activation('relu'))
model.add(Dropout(0.5))

### last layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

#### Model summary

In [54]:
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_20 (Dense)             (None, 100)               4100      
_________________________________________________________________
activation_20 (Activation)   (None, 100)               0         
_________________________________________________________________
dropout_15 (Dropout)         (None, 100)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 200)               20200     
_________________________________________________________________
activation_21 (Activation)   (None, 200)               0         
_________________________________________________________________
dropout_16 (Dropout)         (None, 200)               0         
_________________________________________________________________
dense_22 (Dense)             (None, 100)              
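
The parameter counts of the `Dense` layers can be verified by hand: each layer has one weight per input-output pair plus one bias per output unit. In the sketch below the first two values match the summary above; the last two are computed from the layer definitions (not read from the truncated output) and assume `num_labels = 3`. The helper `dense_params` is illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Layer widths from the model definition; 3 outputs assumes the three
# classes listed in the metadata (num_labels = 3).
widths = [40, 100, 200, 100, 3]

per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
print(per_layer)        # [4100, 20200, 20100, 303]
print(sum(per_layer))   # 44703
```

`Activation` and `Dropout` layers add no trainable parameters, which is why they show 0 in the summary.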

#### Compiling the model

In [55]:
model.compile(loss='categorical_crossentropy', metrics=['accuracy'],optimizer='adam')
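
For readers unfamiliar with the loss chosen here, the following NumPy sketch shows how softmax probabilities and categorical cross-entropy are computed for one-hot targets. It is a simplified illustration, not the Keras implementation; the logits and the clipping constant `eps` are assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean over samples of -sum_k y_k * log(p_k).
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Toy logits for two samples and three classes (illustrative values).
logits = np.array([[2.0, 0.5, 0.1],
                   [0.0, 0.0, 0.0]])
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))   # each row sums to 1
print(loss)
```

For reference, a uniform guess over three classes (the second toy sample) gives a per-sample loss of ln 3, about 1.0986.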

### Training the model

In [70]:
## Training the model
## features = 40
num_epochs= 100
num_batch_size=32

checkpointer=ModelCheckpoint(filepath='save_models/audio_classification.hdf5',verbose=1,save_best_only=True)
start=datetime.now()

model.fit(X_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(X_test,y_test), callbacks= [checkpointer])


duration = datetime.now()- start
print("Training completed in time: ", duration)

Epoch 1/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5775 - accuracy: 0.7500
Epoch 00001: val_loss improved from inf to 1.97766, saving model to save_models\audio_classification.hdf5
Epoch 2/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4743 - accuracy: 0.7500
Epoch 00002: val_loss did not improve from 1.97766
Epoch 3/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6733 - accuracy: 0.7188
Epoch 00003: val_loss did not improve from 1.97766
Epoch 4/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5879 - accuracy: 0.7500
Epoch 00004: val_loss improved from 1.97766 to 1.87576, saving model to save_models\audio_classification.hdf5
Epoch 5/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5567 - accuracy: 0.6875
Epoch 00005: val_loss did not improve from 1.87576
Epoch 6/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6637 - accuracy: 0.6562
Epoch 00006: val_loss did not improve from 1.87576
Epoch 7/10

Epoch 30/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4824 - accuracy: 0.7500
Epoch 00030: val_loss did not improve from 1.12287
Epoch 31/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6220 - accuracy: 0.6250
Epoch 00031: val_loss did not improve from 1.12287
Epoch 32/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4785 - accuracy: 0.7188
Epoch 00032: val_loss did not improve from 1.12287
Epoch 33/100
 1/11 [=>............................] - ETA: 0s - loss: 0.8186 - accuracy: 0.5938
Epoch 00033: val_loss did not improve from 1.12287
Epoch 34/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5447 - accuracy: 0.6250
Epoch 00034: val_loss did not improve from 1.12287
Epoch 35/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6872 - accuracy: 0.6562
Epoch 00035: val_loss did not improve from 1.12287
Epoch 36/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5993 - accuracy: 0.7188
Epoch 00036: val_loss 

Epoch 60/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4361 - accuracy: 0.8125
Epoch 00060: val_loss did not improve from 1.12287
Epoch 61/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5132 - accuracy: 0.7500
Epoch 00061: val_loss did not improve from 1.12287
Epoch 62/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4317 - accuracy: 0.7812
Epoch 00062: val_loss did not improve from 1.12287
Epoch 63/100
 1/11 [=>............................] - ETA: 0s - loss: 0.7147 - accuracy: 0.6875
Epoch 00063: val_loss did not improve from 1.12287
Epoch 64/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6736 - accuracy: 0.6875
Epoch 00064: val_loss did not improve from 1.12287
Epoch 65/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4619 - accuracy: 0.8125
Epoch 00065: val_loss did not improve from 1.12287
Epoch 66/100
 1/11 [=>............................] - ETA: 0s - loss: 0.4540 - accuracy: 0.8125
Epoch 00066: val_loss 

Epoch 90/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5408 - accuracy: 0.7188
Epoch 00090: val_loss did not improve from 1.12287
Epoch 91/100
 1/11 [=>............................] - ETA: 0s - loss: 0.6263 - accuracy: 0.7500
Epoch 00091: val_loss did not improve from 1.12287
Epoch 92/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5358 - accuracy: 0.7812
Epoch 00092: val_loss did not improve from 1.12287
Epoch 93/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5258 - accuracy: 0.8125
Epoch 00093: val_loss improved from 1.12287 to 1.10689, saving model to save_models\audio_classification.hdf5
Epoch 94/100
 1/11 [=>............................] - ETA: 0s - loss: 0.7292 - accuracy: 0.6875
Epoch 00094: val_loss did not improve from 1.10689
Epoch 95/100
 1/11 [=>............................] - ETA: 0s - loss: 0.5968 - accuracy: 0.6875
Epoch 00095: val_loss did not improve from 1.10689
Epoch 96/100
 1/11 [=>............................] - ETA: 
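
The "improved" and "did not improve" messages in the log come from the `save_best_only=True` checkpoint, which writes the model only when the monitored validation loss reaches a new minimum. A plain-Python sketch of that rule follows; the function `epochs_that_save` is hypothetical, and only the epoch 1 and epoch 4 values are taken from the log.

```python
import math

def epochs_that_save(val_losses):
    """Return the 1-based epochs at which a save_best_only checkpoint would write."""
    best = math.inf
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:      # strict improvement on the monitored value
            best = loss
            saved.append(epoch)
    return saved

# Epochs 1 and 4 use the val_loss values printed in the log above;
# the other numbers are made up for illustration.
history = [1.97766, 2.10, 2.05, 1.87576, 1.90, 1.95]
print(epochs_that_save(history))   # [1, 4]
```

Because only the best epoch is kept on disk, the saved file can differ from the in-memory model that exists after the final epoch.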

## Accuracy of prediction

In [71]:
# evaluate returns [loss, accuracy]; index 1 is the accuracy
test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(test_accuracy[1])

0.7272727489471436
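
The reported accuracy can be translated back into a count of correct predictions. The sketch assumes the test set holds 88 clips (20% of 440) and adds a rough binomial standard error as a reminder of how small that test set is; the standard-error estimate is an addition for illustration, not something computed in the notebook.

```python
import math

n_test = 88                          # 20% of the 440 clips
reported_accuracy = 0.7272727489471436

# Closest whole number of correct predictions.
n_correct = round(reported_accuracy * n_test)
print(n_correct, n_test - n_correct)   # 64 24

# 64/88 reproduces the reported value up to float32 rounding.
print(abs(n_correct / n_test - reported_accuracy) < 1e-6)   # True

# Rough binomial standard error of the accuracy estimate.
p = n_correct / n_test
std_err = math.sqrt(p * (1 - p) / n_test)
print(f"accuracy = {p:.3f} +/- {std_err:.3f}")
```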


## Test out the model

In [101]:
## Test it out
filename = r"ML project\F_MAG01_EU_FN_FED01_101.wav"
audio, sample_rate = librosa.load(filename, res_type='kaiser_fast')
mfccs_features = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
mfccs_scaled_features = np.mean(mfccs_features.T, axis=0)

# The model expects a batch, so reshape to (1, 40)
mfccs_scaled_features = mfccs_scaled_features.reshape(1, -1)
# predict_classes was removed in TensorFlow 2.6; take the argmax of the softmax output instead
predicted_label = np.argmax(model.predict(mfccs_scaled_features), axis=-1)
prediction_class = labelencoder.inverse_transform(predicted_label)
prediction_class

array(['waiting_for_food'], dtype='<U16')
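
The final decoding step, going from softmax probabilities to a class name, can be sketched with NumPy alone. The probability vector below is hypothetical, and the class order assumes the alphabetical sorting that `LabelEncoder` applies.

```python
import numpy as np

# Class order assumes alphabetical sorting, as LabelEncoder does.
classes = np.array(["brushing", "isolation", "waiting_for_food"])

# Hypothetical softmax output for a single clip, shape (1, 3).
probs = np.array([[0.10, 0.25, 0.65]])

predicted_index = np.argmax(probs, axis=-1)   # index of the largest probability
predicted_label = classes[predicted_index]

print(predicted_index)   # [2]
print(predicted_label)   # ['waiting_for_food']
```

Inspecting the full probability row, rather than only the argmax, also shows how confident the model is in a given prediction.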