# ECS7020P mini-project submission (Advanced Solution)

The mini-project consists of two components:


1.   **Basic solution** [6 marks]: Using the MLEnd Hums and Whistles dataset, build a machine learning pipeline that takes as an input a Potter or a StarWars audio segment and predicts its song label (either Harry or StarWars).
2.   **Advanced solution** [10 marks]: There are two options. (i) Formulate a machine learning problem that can be attempted using the MLEnd Hums and Whistles dataset and build a solution model. (ii) Create a product that uses the functionality provided by a model trained on the MLEnd Hums and Whistles dataset (e.g. a simple App that predicts the label Harry or StarWars when you hum to it).  

The submission will consist of two Jupyter notebooks, one for the basic solution and another one for advanced solution. Please **name each notebook**:

* ECS7020P_miniproject_basic.ipynb
* ECS7020P_miniproject_advanced.ipynb

then **zipped and submitted toghether**.

Each uploaded notebook should consist of: 

*   **Text cells**, describing concisely each step and results.
*   **Code cells**, implementing each step.
*   **Output cells**, i.e. the output from each code cell.

and **should have the structure** indicated below. Notebooks will not be run, please make sure that the output cells are saved.

How will we evaluate your submission?

*   Conciseness in your writing (10%).
*   Correctness in your methodology (30%).
*   Correctness in your analysis and conclusions (30%).
*   Completeness (10%).
*   Originality (10%).
*   Efforts to try something new (10%).

Suggestion: Why don't you use **GitHub** to manage your project? GitHub can be used as a presentation card that showcases what you have done and gives evidence of your data science skills, knowledge and experience. 

Each notebook should be structured into the following 9 sections:


# 1 Author

**Student Name**:  Faysal Ali
**Student ID**:  180135787



# 2 Problem formulation

**Describe the machine learning problem that you want to solve and explain what's interesting about it.**

In this Mini-project, the MLEnds Hums and Whistles dataset was used to predict the label of a song from the an audio file that contained either a hum or whistle. The two sets of files used for the basic solution of this project contained a small section of audio from the theme songs of The Great Showman and Mamma Mia. The interesting part of this machine learning problem was the use of certain features in the audio file that allowed the model to easily distinguish what theme song the audio file belonged to.

# 3 Machine Learning pipeline

**Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and intermediate data moving from one stage to the next. Note that your pipeline does not need to include all the stages.**

The Machine learning pipeline that was used for this project consisted of using the inputs (Hum and Whistle audio files for The Great Showman and Mamma Mia) and classifying the song label as Showman or Mamma as the output. In order for the model to function without any errors or have data missing, the raw data was cleaned and the file names were corrected. The dataset that was used was part 1 folders for both Showman and Mamma. The overall dataset that was used for this model included 428 audio files. Each of the files followed a certain structure when it came to file_id and Interpretation number, which allowed a dataframe to be created using the audio files. The dataframe created was used to extract the information that was required by the model.


# 4 Transformation stage

**Describe any transformations, such as feature extraction. Identify input and output.**

The feature extraction that was used for this machine learning problem was Pitch. The pitch was used to distinguish between the different audio files that correspond to specific song labels.Since the audio file contained a large number of dimensions, using the pitch as the feature extraction allowed it to be reduced to 5 dimension:
1.   Power.
2.   Pitch mean.
3.   Pitch standard deviation.
4.   Fraction of voiced region.

These 5 features were then used as predictors. 



# 5 Modelling

**Describe the ML models that you will implement.** 

A support vector machine (SVM) model was used to predict and distinguish whether the audio file was Showman or Mamma. The dataset was then split into a training set and a validation set, with the accuracy of the training and validation sets compared.

# 6 Methodology

**Describe how you will train and validate your models, how model performance is assesssed (i.e. accuracy, confusion matrix, etc)**

In order to train the SVM model a classifier is created and used to class the data. The SVM model takes two arrays as an input: an array X of the shape holding the training set and array y holding the class labels. The classes are then predicted using the test set, with the training set and validation set accuracies then compared.

# 7 Dataset

**Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.**

In [None]:
from google.colab import drive

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os, sys, re, pickle, glob
import urllib.request
import zipfile

import IPython.display as ipd
from tqdm import tqdm
import librosa

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
path = '/content/drive/MyDrive/Data/Showman_1'
os.listdir(path)

['S99_hum_1_[Showman].wav',
 'S36_hum_3_Showman.wav',
 'S75_hum_1_Showman.wav',
 'S3_hum_4_Showman.wav',
 'S23_hum_4_Showman.wav',
 'S91_hum_1_Showman.wav',
 'S106_hum_3_Showman.WAV',
 'S44_hum_1_Showman.wav',
 'S8_hum_1_[Showman].wav',
 'S77_whistle_2_Showman.wav',
 'S26_hum_4_Showman.wav',
 'S94_hum_1_Showman.wav',
 'S45_whistle_2_Showman.wav',
 'S71_hum_4_Showman.wav',
 'S59_hum_4_Showman.wav',
 'S17_hum_4_Showman.wav',
 'S6_whisle_1_Showman.wav',
 'S2_whistle_2_Showman.wav',
 'S40_hum_1_showman.wav',
 'S74_hum_4_Showman.wav',
 'S85_hum_3_Showman.wav',
 'S16_hum_1_Showman.wav',
 'S12_hum_4_Showman.wav',
 'S82_hum_1_[Showman].wav',
 'S2_hum_1_Showman.wav',
 'S82_hum_2_[Showman].wav',
 'S58_hum_4_Showman.wav',
 'S103_whistle_2_Showman.wav',
 'S65_hum_3_Showman.wav',
 'S29_whistle_2_Showman.wav',
 'S68_hum_1_Showman.wav',
 'S22_hum_4_Showman.wav',
 'S90_hum_1_Showman.wav',
 'S8_hum_2_[Showman].wav',
 'S89_whistle_2_Showman.wav',
 'S99_hum_2_[Showman].wav',
 'S13_hum_4_Showman.wav',
 'S

In [None]:
sample_path = '/content/drive/MyDrive/Data/Showman_1/*.wav'
files_showman = glob.glob(sample_path)
len(files_showman)
for _ in range(5):
  n1 = np.random.randint(210)
  display(ipd.Audio(files_showman[n1]))

Output hidden; open in https://colab.research.google.com to view.

In [None]:
show_table = [] 

for file in files_showman:
  file_name = file.split('/')[-1]
  participant_ID = file.split('/')[-1].split('_')[0]
  interpretation_type = file.split('/')[-1].split('_')[1]
  #interpretation_number = file.split('/')[-1].split('_')[2]
  #song = file.split('/')[-1].split('_')[3].split('.')[0]
  show_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])

show_table

[['S99_hum_1_[Showman].wav', 'S99', 'hum', 'Showman.wav', '[Showman]'],
 ['S36_hum_3_Showman.wav', 'S36', 'hum', 'Showman.wav', '[Showman]'],
 ['S75_hum_1_Showman.wav', 'S75', 'hum', 'Showman.wav', '[Showman]'],
 ['S3_hum_4_Showman.wav', 'S3', 'hum', 'Showman.wav', '[Showman]'],
 ['S23_hum_4_Showman.wav', 'S23', 'hum', 'Showman.wav', '[Showman]'],
 ['S91_hum_1_Showman.wav', 'S91', 'hum', 'Showman.wav', '[Showman]'],
 ['S44_hum_1_Showman.wav', 'S44', 'hum', 'Showman.wav', '[Showman]'],
 ['S8_hum_1_[Showman].wav', 'S8', 'hum', 'Showman.wav', '[Showman]'],
 ['S77_whistle_2_Showman.wav', 'S77', 'whistle', 'Showman.wav', '[Showman]'],
 ['S26_hum_4_Showman.wav', 'S26', 'hum', 'Showman.wav', '[Showman]'],
 ['S94_hum_1_Showman.wav', 'S94', 'hum', 'Showman.wav', '[Showman]'],
 ['S45_whistle_2_Showman.wav', 'S45', 'whistle', 'Showman.wav', '[Showman]'],
 ['S71_hum_4_Showman.wav', 'S71', 'hum', 'Showman.wav', '[Showman]'],
 ['S59_hum_4_Showman.wav', 'S59', 'hum', 'Showman.wav', '[Showman]'],
 ['S

In [None]:
show_df = pd.DataFrame(show_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
show_df

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S99_hum_1_[Showman].wav,S99,hum,S99,hum
S36_hum_3_Showman.wav,S36,hum,S36,hum
S75_hum_1_Showman.wav,S75,hum,S75,hum
S3_hum_4_Showman.wav,S3,hum,S3,hum
S23_hum_4_Showman.wav,S23,hum,S23,hum
...,...,...,...,...
S17_hum_3_Showman.wav,S17,hum,S17,hum
S36_hum_4_Showman.wav,S36,hum,S36,hum
S66_whistle_2_Showman.wav,S66,whistle,S66,whistle
S32_hum_1_Showman.wav,S32,hum,S32,hum


In [None]:
path1 = '/content/drive/MyDrive/Data/Mamma_1'
os.listdir(path1)

['S79_hum_3_[Mamma].wav',
 'S5_hum_4_mummamia.wav',
 'S91_hum_2_Mamma.wav',
 'S5_hum_3_mummamia.wav',
 'S76_whistle_2_Mamma.wav',
 'S69_hum_2_Mamma.wav',
 'S3_hum_2_Mamma.wav',
 'S29_whistle_2_[Mamma].wav',
 'S54_hum_2_Mamma.wav',
 'S77_whistle_2_Mamma.wav',
 'S70_hum_2_Mamma.wav',
 'S87_hum_2_Mamma.wav',
 'S21_hum_2_Mamma.wav',
 'S42_Hum1_Mamma.wav',
 'S75_whistle_2_Mamma.wav',
 'S28_hum_3_Mamma.wav',
 'S66_hum_2_Mamma.wav',
 'S13_hum_2_Mamma.wav',
 'S97_hum_3_Mamma.wav',
 'S31_hum_3_Mamma.wav',
 'S6_whisle_1_Mamma.wav',
 'S61_whistle_1_[Mama].wav',
 'S87_whistle_2_Mamma.wav',
 'S37_whistle_2_[Mamma].wav',
 'S38_hum_2_Mamma.wav',
 'S71_hum_3_Mamma.wav',
 'S104_Hum_3_Mamma.wav',
 'S36_hum_3_Mamma.wav',
 'S72_whistle_2_Mamma.wav',
 'S78_hum_2_Mamma.wav',
 'S45_hum_2_Mamma.wav',
 'S14_hum_2_Mamma.wav',
 'S12_hum_3_Mamma.wav',
 'S80_whistle_2_Mamma.wav',
 'S26_hum_2_Mamma.wav',
 'S80_hum_2_Mamma.wav',
 'S114_hum_2_mamma.wav',
 'S102_hum_2_Mamma.wav',
 'S68_whistle_2_Mamma.wav',
 'S39_whis

In [None]:
sample_path1 = '/content/drive/MyDrive/Data/Mamma_1/*.wav'
files_mamma = glob.glob(sample_path1)
len(files_mamma)
for _ in range(5):
  n1 = np.random.randint(210)
  display(ipd.Audio(files_mamma[n1]))

Output hidden; open in https://colab.research.google.com to view.

In [None]:
mam_table = [] 

for file in files_mamma:
  file_name = file.split('/')[-1]
  participant_ID = file.split('/')[-1].split('_')[0]
  interpretation_type = file.split('/')[-1].split('_')[1]
  #interpretation_number = file.split('/')[-1].split('_')[2]
  #song = file.split('/')[-1].split('_')[3].split('.')[0]
  mam_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])

mam_table

[['S79_hum_3_[Mamma].wav', 'S79', 'hum', 'Mamma.wav', 'Mamma'],
 ['S5_hum_4_mummamia.wav', 'S5', 'hum', 'Mamma.wav', 'Mamma'],
 ['S91_hum_2_Mamma.wav', 'S91', 'hum', 'Mamma.wav', 'Mamma'],
 ['S5_hum_3_mummamia.wav', 'S5', 'hum', 'Mamma.wav', 'Mamma'],
 ['S76_whistle_2_Mamma.wav', 'S76', 'whistle', 'Mamma.wav', 'Mamma'],
 ['S69_hum_2_Mamma.wav', 'S69', 'hum', 'Mamma.wav', 'Mamma'],
 ['S3_hum_2_Mamma.wav', 'S3', 'hum', 'Mamma.wav', 'Mamma'],
 ['S29_whistle_2_[Mamma].wav', 'S29', 'whistle', 'Mamma.wav', 'Mamma'],
 ['S54_hum_2_Mamma.wav', 'S54', 'hum', 'Mamma.wav', 'Mamma'],
 ['S77_whistle_2_Mamma.wav', 'S77', 'whistle', 'Mamma.wav', 'Mamma'],
 ['S70_hum_2_Mamma.wav', 'S70', 'hum', 'Mamma.wav', 'Mamma'],
 ['S87_hum_2_Mamma.wav', 'S87', 'hum', 'Mamma.wav', 'Mamma'],
 ['S21_hum_2_Mamma.wav', 'S21', 'hum', 'Mamma.wav', 'Mamma'],
 ['S42_Hum1_Mamma.wav', 'S42', 'Hum1', 'Mamma.wav', 'Mamma'],
 ['S75_whistle_2_Mamma.wav', 'S75', 'whistle', 'Mamma.wav', 'Mamma'],
 ['S28_hum_3_Mamma.wav', 'S28', 'h

In [None]:
mam_df = pd.DataFrame(mam_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
mam_df

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S79_hum_3_[Mamma].wav,S79,hum,Mamma.wav,Mamma
S5_hum_4_mummamia.wav,S5,hum,Mamma.wav,Mamma
S91_hum_2_Mamma.wav,S91,hum,Mamma.wav,Mamma
S5_hum_3_mummamia.wav,S5,hum,Mamma.wav,Mamma
S76_whistle_2_Mamma.wav,S76,whistle,Mamma.wav,Mamma
...,...,...,...,...
S119_hum_2_Mamma.wav,S119,hum,Mamma.wav,Mamma
S99_hum_1_[Mamma].wav,S99,hum,Mamma.wav,Mamma
S110_hum_3_Mamma.wav,S110,hum,Mamma.wav,Mamma
S16_whistle_2_Mamma.wav,S16,whistle,Mamma.wav,Mamma


In [53]:
MLENDHWdf = pd.concat([mam_df,show_df])
MLENDHWdf

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S79_hum_3_[Mamma].wav,S79,hum,Mamma.wav,Mamma
S5_hum_4_mummamia.wav,S5,hum,Mamma.wav,Mamma
S91_hum_2_Mamma.wav,S91,hum,Mamma.wav,Mamma
S5_hum_3_mummamia.wav,S5,hum,Mamma.wav,Mamma
S76_whistle_2_Mamma.wav,S76,whistle,Mamma.wav,Mamma
...,...,...,...,...
S17_hum_3_Showman.wav,S17,hum,S17,hum
S36_hum_4_Showman.wav,S36,hum,S36,hum
S66_whistle_2_Showman.wav,S66,whistle,S66,whistle
S32_hum_1_Showman.wav,S32,hum,S32,hum


In [54]:
files = files_showman + files_mamma

In [55]:
n=0
fs = None 
x, fs = librosa.load(files[n],sr=fs)
t = np.arange(len(x))/fs
plt.plot(t,x)
plt.xlabel('time (sec)')
plt.ylabel('amplitude')
plt.show()
display(ipd.Audio(files[n]))

n=0
x, fs = librosa.load(files[n],sr=fs)
print('This audio signal has', len(x), 'samples')

Output hidden; open in https://colab.research.google.com to view.

# 8 Results

**Carry out your experiments here, explain your results.**

In [59]:
def getPitch(x,fs,winLen=0.02):
  #winLen = 0.02 
  p = winLen*fs
  frame_length = int(2**int(p-1).bit_length())
  hop_length = frame_length//2
  f0, voiced_flag, voiced_probs = librosa.pyin(y=x, fmin=80, fmax=450, sr=fs,
                                                 frame_length=frame_length,hop_length=hop_length)
  return f0,voiced_flag

def getXy(files,labels_file, scale_audio=False, onlySingleDigit=False):
  X,y =[],[]
  for file in tqdm(files):
    fileID = file.split('/')[-1]
    file_name = file.split('/')[-1]
    #print(file_name)
    #print(labels_file.loc[fileID]['interpretation'])
    #print(labels_file.loc[fileID]['interpretation']=='hum')
    #yi = list(labels_file.loc[fileID]['interpretation'])[0]=='hum'
    yi = labels_file.loc[fileID]['song']=='Mamma'

    fs = None # if None, fs would be 22050
    x, fs = librosa.load(file,sr=fs)
    if scale_audio: x = x/np.max(np.abs(x))
    f0, voiced_flag = getPitch(x,fs,winLen=0.02)
      
    power = np.sum(x**2)/len(x)
    pitch_mean = np.nanmean(f0) if np.mean(np.isnan(f0))<1 else 0
    pitch_std  = np.nanstd(f0) if np.mean(np.isnan(f0))<1 else 0
    voiced_fr = np.mean(voiced_flag)

    xi = [power,pitch_mean,pitch_std,voiced_fr]
    X.append(xi)
    y.append(yi)
    
  return np.array(X),np.array(y)


In [60]:
X,y = getXy(files, labels_file= MLENDHWdf, scale_audio=True, onlySingleDigit=True)

100%|██████████| 424/424 [20:31<00:00,  2.90s/it]


In [61]:
print('The shape of X is', X.shape) 
print('The shape of y is', y.shape)
print('The labels vector is', y)

The shape of X is (424, 4)
The shape of y is (424,)
The labels vector is [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False Fa

In [62]:
print(' The number of hum recordings is ', np.count_nonzero(y))
print(' The number of whistle recordings is ', y.size - np.count_nonzero(y))

 The number of hum recordings is  217
 The number of whistle recordings is  207


In [63]:
from sklearn import svm
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
X_train.shape, X_val.shape, y_train.shape, y_val.shape

((296, 4), (128, 4), (296,), (128,))

In [64]:
model  = svm.SVC(C=1)
model.fit (X_train,y_train)

yt_p = model.predict(X_train)
yv_p = model.predict(X_val)

print('Training Accuracy', np.mean(yt_p==y_train))
print('Validation  Accuracy', np.mean(yv_p==y_val))
print('The support vectors are', model.support_vectors_.shape)

Training Accuracy 0.5540540540540541
Validation  Accuracy 0.4765625
The support vectors are (276, 4)


Looking at the results produced it can be concluded that the label of the audio file can be identified using this model from the initial raw audio files.The feature extraction (Pitch) that was used allowed the model to determine whether or not the audio file was labelled Mamma by providing a True or False response. The accuracy of the training set produced a value of 0.55 and the accuracy of the validation set produced a value of 0.48 . This would indicate that the model fits well. However, the accuracy for both the training and validation sets were quite low, which could mean that the model might not have been able to identify some of the audio files correctly based on their pitch.

# 9 Conclusions

In conclusion, this mini-project showed that using specific feature extractions such as pitch, machine learning problems that require classifications can be distinguished using models such as SVM.