# ECS7020P mini-project submission

The mini-project consists of two components:


1.   **Basic solution** [6 marks]: Using the MLEnd Hums and Whistles dataset, build a machine learning pipeline that takes as an input a Potter or a StarWars audio segment and predicts its song label (either Harry or StarWars).
2.   **Advanced solution** [10 marks]: There are two options. (i) Formulate a machine learning problem that can be attempted using the MLEnd Hums and Whistles dataset and build a solution model. (ii) Create a product that uses the functionality provided by a model trained on the MLEnd Hums and Whistles dataset (e.g. a simple App that predicts the label Harry or StarWars when you hum to it).  

The submission will consist of two Jupyter notebooks, one for the basic solution and another one for advanced solution. Please **name each notebook**:

* ECS7020P_miniproject_basic.ipynb
* ECS7020P_miniproject_advanced.ipynb

then **zipped and submitted toghether**.

Each uploaded notebook should consist of: 

*   **Text cells**, describing concisely each step and results.
*   **Code cells**, implementing each step.
*   **Output cells**, i.e. the output from each code cell.

and **should have the structure** indicated below. Notebooks will not be run, please make sure that the output cells are saved.

How will we evaluate your submission?

*   Conciseness in your writing (10%).
*   Correctness in your methodology (30%).
*   Correctness in your analysis and conclusions (30%).
*   Completeness (10%).
*   Originality (10%).
*   Efforts to try something new (10%).

Suggestion: Why don't you use **GitHub** to manage your project? GitHub can be used as a presentation card that showcases what you have done and gives evidence of your data science skills, knowledge and experience. 

Each notebook should be structured into the following 9 sections:


# 1 Author

**Student Name**:  Faysal Ali
**Student ID**:  180135787



# 2 Problem formulation

**Describe the machine learning problem that you want to solve and explain what's interesting about it.**


In this Mini-project, the MLEnds Hums and Whistles dataset was used to predict the label of a song from the an audio file that contained either a hum or whistle. The two sets of files used for the basic solution of this project contained a small section of audio from the theme songs of Harry Potter and Starwars. The interesting part of this machine learning problem was the use of certain features in the audio file that allowed the model to easily distinguish what theme song the audio file belonged to.

# 3 Machine Learning pipeline

**Describe your ML pipeline. Clearly identify its input and output, any intermediate stages (for instance, transformation -> models), and intermediate data moving from one stage to the next. Note that your pipeline does not need to include all the stages.**

The Machine learning pipeline that was used for this project consisted of using the inputs (Hum and Whistle audio files for Harry Potter and StarWars) and classifying the song label as Potter or StarWars as the output. In order for the model to function without any errors or have data missing, the raw data was cleaned and the file names were corrected. The dataset that was used was part 1 folders for both Potter and StarWars. The overall dataset that was used for this model included 416 audio files. Each of the files followed a certain structure when it came to file_id and Interpretation number, which allowed a dataframe to be created using the audio files. The dataframe created was used to extract the information that was required by the model.


# 4 Transformation stage

**Describe any transformations, such as feature extraction. Identify input and output.**

The feature extraction that was used for this machine learning problem was Pitch. The pitch was used to distinguish between the different audio files that correspond to specific song labels.Since the audio file contained a large number of dimensions, using the pitch as the feature extraction allowed it to be reduced to 5 dimension:
1.   Power.
2.   Pitch mean.
3.   Pitch standard deviation.
4.   Fraction of voiced region.

These 5 features were then used as predictors. 



# 5 Modelling

**Describe the ML models that you will implement.** 

A support vector machine (SVM) model was used to predict and distinguish whether the audio file was Showman or Mamma. The dataset was then split into a training set and a validation set, with the accuracy of the training and validation sets compared.

# 6 Methodology

**Describe how you will train and validate your models, how model performance is assesssed (i.e. accuracy, confusion matrix, etc)**

In order to train the SVM model a classifier is created and used to class the data. The SVM model takes two arrays as an input: an array X of the shape holding the training set and array y holding the class labels. The classes are then predicted using the test set, with the training set and validation set accuracies then compared.

# 7 Dataset

Describe the dataset that you will use to create your models and validate them. If you need to preprocess it, do it here. Include visualisations too. You can visualise raw data samples or extracted features.


In [217]:
from google.colab import drive

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import os, sys, re, pickle, glob
import urllib.request
import zipfile

import IPython.display as ipd
from tqdm import tqdm
import librosa

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [218]:
def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
          

In [254]:
path = '/content/drive/MyDrive/Data/Potter_1'
os.listdir(path)

['S77_whistle_2_Potter.wav',
 'S36_hum_1_Potter.wav',
 'S54_hum_2_Potter.wav',
 'S35_hum_1_[Potter].wav',
 'S31_hum_2_Potter.wav',
 'S53_hum_1_Potter.wav',
 'S114_hum_2_potter.wav',
 'S93_hum_2_Potter.wav',
 'S17_hum_1_Potter.wav',
 'S94_hum_1_Potter.wav',
 'S72_whistle_2_Potter.wav',
 'S22_hum_1_Potter.wav',
 'S87_hum_2_Potter.wav',
 'S7_hum_1_Potter.wav',
 'S100_hum_2_Potter.wav',
 'S2_whistle_2_Potter.wav',
 'S58_hum_1_Potter.wav',
 'S45_whistle_2_Potter.wav',
 'S75_hum_2_Potter.wav',
 'S99_whistle_1_[Potter].wav',
 'S25_hum_2_Potter.wav',
 'S40_whistle_2_potter.wav',
 'S39_whistle_2_Potter.wav',
 'S40_hum_2_potter.wav',
 'S26_hum_1_Potter.wav',
 'S60_whistle_1_[Potter].wav',
 'S52_whistle_2_Potter.wav',
 'S44_hum_2_Potter.wav',
 'S47_hum_1_potter.wav',
 'S78_whistle_2_Potter.wav',
 'S8_hum_1_[Potter].wav',
 'S62_hum_1_Potter.wav',
 'S79_hum_4_[Potter].wav',
 'S84_hum_1_Potter.wav',
 'S4_hum_2_Potter.wav',
 'S57_whistle_2_Potter.wav',
 'S103_whistle_2_Potter.wav',
 'S21_hum_2_Potter

In [306]:

sample_path = '/content/drive/MyDrive/Data/Potter_1/*.wav'
files_potter = glob.glob(sample_path)
len(files_potter)
for _ in range(5):
  n = np.random.randint(210)
  display(ipd.Audio(files[n]))

Output hidden; open in https://colab.research.google.com to view.

In [339]:
pot_table = [] 

for file in files_potter:
  file_name = file.split('/')[-1]
  participant_ID = file.split('/')[-1].split('_')[0]
  interpretation_type = file.split('/')[-1].split('_')[1]
  #interpretation_number = file.split('/')[-1].split('_')[2]
  #song = file.split('/')[-1].split('_')[3].split('.')[0]
  pot_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])

pot_table



[['S77_whistle_2_Potter.wav', 'S77', 'whistle', '2', 'Potter'],
 ['S36_hum_1_Potter.wav', 'S36', 'hum', '2', 'Potter'],
 ['S54_hum_2_Potter.wav', 'S54', 'hum', '2', 'Potter'],
 ['S35_hum_1_[Potter].wav', 'S35', 'hum', '2', 'Potter'],
 ['S31_hum_2_Potter.wav', 'S31', 'hum', '2', 'Potter'],
 ['S53_hum_1_Potter.wav', 'S53', 'hum', '2', 'Potter'],
 ['S114_hum_2_potter.wav', 'S114', 'hum', '2', 'Potter'],
 ['S93_hum_2_Potter.wav', 'S93', 'hum', '2', 'Potter'],
 ['S17_hum_1_Potter.wav', 'S17', 'hum', '2', 'Potter'],
 ['S94_hum_1_Potter.wav', 'S94', 'hum', '2', 'Potter'],
 ['S72_whistle_2_Potter.wav', 'S72', 'whistle', '2', 'Potter'],
 ['S22_hum_1_Potter.wav', 'S22', 'hum', '2', 'Potter'],
 ['S87_hum_2_Potter.wav', 'S87', 'hum', '2', 'Potter'],
 ['S7_hum_1_Potter.wav', 'S7', 'hum', '2', 'Potter'],
 ['S100_hum_2_Potter.wav', 'S100', 'hum', '2', 'Potter'],
 ['S2_whistle_2_Potter.wav', 'S2', 'whistle', '2', 'Potter'],
 ['S58_hum_1_Potter.wav', 'S58', 'hum', '2', 'Potter'],
 ['S45_whistle_2_Potte

In [340]:
potter_df = pd.DataFrame(pot_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
potter_df

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S77_whistle_2_Potter.wav,S77,whistle,2,Potter
S36_hum_1_Potter.wav,S36,hum,2,Potter
S54_hum_2_Potter.wav,S54,hum,2,Potter
S35_hum_1_[Potter].wav,S35,hum,2,Potter
S31_hum_2_Potter.wav,S31,hum,2,Potter
...,...,...,...,...
S80_hum_2_Potter.wav,S80,hum,2,Potter
S58_hum_2_Potter.wav,S58,hum,2,Potter
S66_hum_2_Potter.wav,S66,hum,2,Potter
S107_hum_2_Potter.wav,S107,hum,2,Potter


In [341]:
path1 = '/content/drive/MyDrive/Data/StarWars_1'
os.listdir(path1)

['S86_hum_3_StarWars.wav',
 'S23_hum_4_StarWars.wav',
 'S86_hum_4_StarWars.wav',
 'S23_hum_3_StarWars.wav',
 'S61_hum_1_StarWars.wav',
 'S2_whistle_2_StarWars.wav',
 'S43_hum_2_StarWars.wav',
 'S66_whistle_2_StarWars.wav',
 'S88_hum_2_StarWars.wav',
 'S91_whistle_2_StarWars.wav',
 'S50_whistle_2_StarWars.wav',
 'S104_hum_3_StarWars.wav',
 'S50_hum_2_StarWars.wav',
 'S70_hum_4_StarWars.wav',
 'S13_hum_4_StarWars.wav',
 'S57_whistle_2_StarWars.wav',
 'S70_hum_3_StarWars.wav',
 'S13_hum_3_StarWars.wav',
 'S73_hum_3_StarWars.wav',
 'S10_hum_3_StarWars.wav',
 'S73_hum_4_StarWars.wav',
 'S10_hum_4_StarWars.wav',
 'S109_hum_3_StarWars.wav',
 'S96_hum_2_StarWars.wav',
 'S104_hum_4_StarWars.wav',
 'S1_whistle_2_StarWars.wav',
 'S109_hum_4_StarWars.wav',
 'S3_hum_3_StarWars.wav',
 'S63_hum_3_StarWars.wav',
 'S40_hum_2_StarWars.wav',
 'S114_hum_3_StarWars.wav',
 'S54_whistle_2_StarWars.wav',
 'S114_hum_4_StarWars.wav',
 'S30_hum_4_StarWars.wav',
 'S6_whistle_2_StarWars.wav',
 'S53_hum_3_StarWars.

In [342]:
sample_path1 = '/content/drive/MyDrive/Data/StarWars_1/*.wav'
files_starwars = glob.glob(sample_path1)
len(files_starwars)

208

In [344]:
star_table = [] 

for file in files_starwars:
  file_name = file.split('/')[-1]
  participant_ID = file.split('/')[-1].split('_')[0]
  interpretation_type = file.split('/')[-1].split('_')[1]
  interpretation_number = file.split('/')[-1].split('_')[2]
  song = file.split('/')[-1].split('_')[3].split('.')[0]
  star_table.append([file_name,participant_ID,interpretation_type,interpretation_number, song])

star_table

[['S86_hum_3_StarWars.wav', 'S86', 'hum', '3', 'StarWars'],
 ['S23_hum_4_StarWars.wav', 'S23', 'hum', '4', 'StarWars'],
 ['S86_hum_4_StarWars.wav', 'S86', 'hum', '4', 'StarWars'],
 ['S23_hum_3_StarWars.wav', 'S23', 'hum', '3', 'StarWars'],
 ['S61_hum_1_StarWars.wav', 'S61', 'hum', '1', 'StarWars'],
 ['S2_whistle_2_StarWars.wav', 'S2', 'whistle', '2', 'StarWars'],
 ['S43_hum_2_StarWars.wav', 'S43', 'hum', '2', 'StarWars'],
 ['S66_whistle_2_StarWars.wav', 'S66', 'whistle', '2', 'StarWars'],
 ['S88_hum_2_StarWars.wav', 'S88', 'hum', '2', 'StarWars'],
 ['S91_whistle_2_StarWars.wav', 'S91', 'whistle', '2', 'StarWars'],
 ['S50_whistle_2_StarWars.wav', 'S50', 'whistle', '2', 'StarWars'],
 ['S104_hum_3_StarWars.wav', 'S104', 'hum', '3', 'StarWars'],
 ['S50_hum_2_StarWars.wav', 'S50', 'hum', '2', 'StarWars'],
 ['S70_hum_4_StarWars.wav', 'S70', 'hum', '4', 'StarWars'],
 ['S13_hum_4_StarWars.wav', 'S13', 'hum', '4', 'StarWars'],
 ['S57_whistle_2_StarWars.wav', 'S57', 'whistle', '2', 'StarWars'],


In [345]:
starwars_df = pd.DataFrame(star_table,columns=['file_id','participant','interpretation','number','song']).set_index('file_id') 
starwars_df

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S86_hum_3_StarWars.wav,S86,hum,3,StarWars
S23_hum_4_StarWars.wav,S23,hum,4,StarWars
S86_hum_4_StarWars.wav,S86,hum,4,StarWars
S23_hum_3_StarWars.wav,S23,hum,3,StarWars
S61_hum_1_StarWars.wav,S61,hum,1,StarWars
...,...,...,...,...
S12_hum_3_StarWars.wav,S12,hum,3,StarWars
S71_hum_3_StarWars.wav,S71,hum,3,StarWars
S6_whistle_1_StarWars.wav,S6,whistle,1,StarWars
S48_whistle_2_StarWars.wav,S48,whistle,2,StarWars


In [346]:
MLEndHWdf = pd.concat([potter_df,starwars_df])
MLEndHWdf 

Unnamed: 0_level_0,participant,interpretation,number,song
file_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
S77_whistle_2_Potter.wav,S77,whistle,2,Potter
S36_hum_1_Potter.wav,S36,hum,2,Potter
S54_hum_2_Potter.wav,S54,hum,2,Potter
S35_hum_1_[Potter].wav,S35,hum,2,Potter
S31_hum_2_Potter.wav,S31,hum,2,Potter
...,...,...,...,...
S12_hum_3_StarWars.wav,S12,hum,3,StarWars
S71_hum_3_StarWars.wav,S71,hum,3,StarWars
S6_whistle_1_StarWars.wav,S6,whistle,1,StarWars
S48_whistle_2_StarWars.wav,S48,whistle,2,StarWars


In [347]:
files = files_potter + files_starwars

In [348]:
n=0
fs = None # Sampling frequency. If None, fs would be 22050
x, fs = librosa.load(files[n],sr=fs)
t = np.arange(len(x))/fs
plt.plot(t,x)
plt.xlabel('time (sec)')
plt.ylabel('amplitude')
plt.show()
display(ipd.Audio(files[n]))

n=0
x, fs = librosa.load(files[n],sr=fs)
print('This audio signal has', len(x), 'samples')


Output hidden; open in https://colab.research.google.com to view.

# 8 Results 

**Carry out your experiments here, explain your results.**


In [349]:
def getPitch(x,fs,winLen=0.02):
  #winLen = 0.02 
  p = winLen*fs
  frame_length = int(2**int(p-1).bit_length())
  hop_length = frame_length//2
  f0, voiced_flag, voiced_probs = librosa.pyin(y=x, fmin=80, fmax=450, sr=fs,
                                                 frame_length=frame_length,hop_length=hop_length)
  return f0,voiced_flag

In [350]:
def getXy(files,labels_file, scale_audio=False, onlySingleDigit=False):
  X,y =[],[]
  for file in tqdm(files):
    fileID = file.split('/')[-1]
    file_name = file.split('/')[-1]
    #print(file_name)
    #print(labels_file.loc[fileID]['interpretation'])
    #print(labels_file.loc[fileID]['interpretation']=='hum')
    #yi = list(labels_file.loc[fileID]['interpretation'])[0]=='hum'
    yi = labels_file.loc[fileID]['song']=='Potter'

    fs = None # if None, fs would be 22050
    x, fs = librosa.load(file,sr=fs)
    if scale_audio: x = x/np.max(np.abs(x))
    f0, voiced_flag = getPitch(x,fs,winLen=0.02)
      
    power = np.sum(x**2)/len(x)
    pitch_mean = np.nanmean(f0) if np.mean(np.isnan(f0))<1 else 0
    pitch_std  = np.nanstd(f0) if np.mean(np.isnan(f0))<1 else 0
    voiced_fr = np.mean(voiced_flag)

    xi = [power,pitch_mean,pitch_std,voiced_fr]
    X.append(xi)
    y.append(yi)

  return np.array(X),np.array(y)

In [351]:
X,y = getXy(files, labels_file=MLEndHWdf, scale_audio=True, onlySingleDigit=True)

100%|██████████| 414/414 [19:41<00:00,  2.85s/it]


In [352]:
print('The shape of X is', X.shape) 
print('The shape of y is', y.shape)
print('The labels vector is', y)

The shape of X is (414, 4)
The shape of y is (414,)
The labels vector is [ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  T

In [353]:
print(' The number of hum recordings is ', np.count_nonzero(y))
print(' The number of whistle recordings is ', y.size - np.count_nonzero(y))

 The number of hum recordings is  206
 The number of whistle recordings is  208


In [354]:
from sklearn import svm
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)
X_train.shape, X_val.shape, y_train.shape, y_val.shape

((289, 4), (125, 4), (289,), (125,))

In [355]:
model  = svm.SVC(C=1)
model.fit (X_train,y_train)

yt_p = model.predict(X_train)
yv_p = model.predict(X_val)

print('Training Accuracy', np.mean(yt_p==y_train))
print('Validation  Accuracy', np.mean(yv_p==y_val))
print('The support vectors are', model.support_vectors_.shape)

Training Accuracy 0.5536332179930796
Validation  Accuracy 0.504
The support vectors are (277, 4)


Looking at the results produced it can be concluded that the label of the audio file can be identified using this model from the initial raw audio files.The feature extraction (Pitch) that was used allowed the model to determine whether or not the audio file was labelled Potter by providing a True or False response. The accuracy of the training set produced a value of 0.55 and the accuracy of the validation set produced a value of 0.50. This would indicate that the model fits well. However, the accuracy for both the training and validation sets were quite low, which could mean that the model might not have been able to identify some of the audio files correctly based on their pitch.

# 9 Conclusions

In conclusion, this mini-project showed that using specific feature extractions such as pitch, machine learning problems that require classifications can be distinguished using models such as SVM.