
# EmoDB

## Importing packages

In [20]:
!pip install speechbrain
!pip install transformers
!git clone https://github.com/GasserElbanna/serab-byols.git
!python3 -m pip install -e ./serab-byols

!pip install tqdm==4.60.0
!pip install opensmile


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
fatal: destination path 'serab-byols' already exists and is not an empty directory.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/serab-byols
Installing collected packages: serab-byols
  Attempting uninstall: serab-byols
    Found existing installation: serab-byols 0.0.0
    Can't uninstall 'serab-byols'. No files were found to uninstall.
  Running setup.py develop for serab-byols
Successfully installed serab-byols-0.0.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
import warnings
from random import sample

from tqdm import tqdm
import serab_byols
import opensmile
from transformers import Wav2Vec2Model, HubertModel
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Shared scaler used by speaker_normalisation
scaler = StandardScaler()

In [None]:
import numpy as np
import soundfile as sf
import librosa
import os
from glob import glob
import torch

In [None]:
! pip install -q kaggle

from google.colab import files
files.upload()

# Create the Kaggle config directory (no error if it already exists) and install the API token
! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

Saving kaggle.json to kaggle.json


# Phase 1 Functions: Loading and resampling audio files

In [None]:
# Defining function for loading and resampling audio files

def load_audio_files(audio_files, resampling_frequency=16000, audio_list=None):
    '''
    Loads and resamples audio files 
    
    Parameters
    ------------
    audio_files: list of strings
        The paths of the wav files
    resampling_frequency: integer
        The frequency to which all audio files are resampled
    audio_list: list of torch tensors, optional
        An existing list of audio tensors to append to; a new list is created if None

    Returns
    ------------
    audio_list: list of torch arrays
        A list of torch arrays, one array for each audio file
        
    '''

    # Making audio_list
    if audio_list is None:
      audio_list = []

    # Resampling
    for audio in audio_files:
        signal, fs = librosa.load(audio, sr=resampling_frequency)
        audio_list.append(torch.from_numpy(signal))
        
    return audio_list
        

# Phase 2 Functions: Embedding Extraction

## Audio Embeddings Extraction

In [None]:
def audio_embeddings_model(model_name):
  '''
  Creates the model used for embedding extraction
  
  Parameters
  ------------
  model_name: string
      The model to use: 'wav2vec2', 'hubert', 'hybrid_byols', 'compare' or 'egemaps'

  Returns
  ------------
  model: object
      The loaded model (or openSMILE feature extractor) for the given model_name

  '''
  if model_name=='wav2vec2':
    model_hub = 'facebook/wav2vec2-large-960h-lv60-self'
    model = Wav2Vec2Model.from_pretrained(model_hub)
  elif model_name=='hubert':
    model_hub = 'facebook/hubert-xlarge-ll60k'
    model = HubertModel.from_pretrained(model_hub)
  elif model_name=='hybrid_byols':
    model_name = 'cvt'
    checkpoint_path = "serab-byols/checkpoints/cvt_s1-d1-e64_s2-d1-e256_s3-d1-e512_BYOLAs64x96-osandbyolaloss6373-e100-bs256-lr0003-rs42.pth"
    model = serab_byols.load_model(checkpoint_path, model_name)
  elif model_name=='compare':
    model = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
  elif model_name=='egemaps':
    model = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )
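  else:
    # Fail early instead of hitting an unbound 'model' variable below
    raise ValueError('Unknown model_name: {}'.format(model_name))
  return model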


def audio_embeddings(audio_list, model_name, model, sampling_rate=16000):
  '''
  Extracts one embedding per audio file with the given model
  
  Parameters
  ------------
  audio_list: list of torch tensors
      A list of tensors, one for each audio file
  model_name: string
      The model to use: 'wav2vec2', 'hubert', 'hybrid_byols', 'compare' or 'egemaps'
  model: object
      The model generated by audio_embeddings_model function
  sampling_rate: int
      The sampling rate of the audio in Hz, passed to openSMILE for 'compare' and 'egemaps'

  Returns
  ------------
  embeddings_array: torch tensor
      The tensor containing the embeddings of all audio files, dimension (number of audio files × n_feats),
      where n_feats is 6373 for 'compare' and 88 for 'egemaps'
      
  '''
  if model_name=='hybrid_byols':
    embeddings_array = serab_byols.get_scene_embeddings(audio_list, model)
  else:
    embeddings_list = []
    for i in tqdm(range(len(audio_list))):
      if model_name=='wav2vec2' or model_name=='hubert':
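        # Inference only: disable gradient tracking so the embeddings can be converted to NumPy later
        with torch.no_grad():
          embeddings = model(audio_list[i].reshape(1,-1)).last_hidden_state.mean(1)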
        embeddings_list.append(embeddings[0])
      elif model_name=='compare' or model_name=='egemaps':
        embeddings = model.process_signal(audio_list[i], sampling_rate)
        embeddings_list.append(torch.tensor(embeddings.values[0], dtype=torch.float32))
    embeddings_array = torch.stack(embeddings_list)
  return embeddings_array
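
The `last_hidden_state.mean(1)` step above averages the frame-level features over the time axis, so each clip ends up with one fixed-length vector regardless of its duration. A minimal NumPy sketch of that pooling, using a made-up array in place of real model output (the name `mean_pool` and the toy shapes are assumptions for illustration):

```python
import numpy as np

def mean_pool(hidden_states):
    """Average frame-level features over the time axis.

    hidden_states: array of shape (batch, frames, features)
    returns: array of shape (batch, features)
    """
    return hidden_states.mean(axis=1)

# Hypothetical frame-level output: 1 clip, 4 frames, 3 features per frame
frames = np.arange(12, dtype=np.float32).reshape(1, 4, 3)
pooled = mean_pool(frames)

print(pooled.shape)  # (1, 3)
print(pooled)        # [[4.5 5.5 6.5]]
```
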


# Phase 3 Functions: Downstream Task - Speech Emotion Recognition

## Speaker normalisation

In [None]:

def speaker_normalisation(embeddings_array, speakers):
  '''
  Normalises embeddings_array for each speaker
  
  Parameters
  ------------
  embeddings_array: torch tensor
      The tensor of embeddings, one row for each audio file
  speakers: list of integers
      The list of speakers

  Returns
  ------------
  embeddings_array: torch tensor
      The tensor containing normalised embeddings (the input tensor is modified in place)
      
  '''
  speaker_ids = set(speakers)
  for speaker_id in speaker_ids:
    speaker_embeddings_indices = np.where(np.array(speakers)==speaker_id)[0]
    speaker_embeddings = embeddings_array[speaker_embeddings_indices,:]
    normalised_speaker_embeddings = scaler.fit_transform(speaker_embeddings)
    embeddings_array[speaker_embeddings_indices] = torch.tensor(normalised_speaker_embeddings).float()
  return embeddings_array
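
Per-speaker normalisation amounts to z-scoring each feature within the rows that belong to one speaker. The sketch below shows the same idea in plain NumPy, returning a copy rather than overwriting the input; the function name `normalise_per_speaker`, the `eps` threshold and the toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np

def normalise_per_speaker(embeddings, speakers, eps=1e-8):
    """Return a z-scored copy of `embeddings`, standardised per speaker."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    speakers = np.asarray(speakers)
    normalised = np.empty_like(embeddings)
    for speaker_id in np.unique(speakers):
        rows = speakers == speaker_id
        mean = embeddings[rows].mean(axis=0)
        std = embeddings[rows].std(axis=0)  # population std (ddof=0)
        # Leave constant features unscaled to avoid dividing by zero
        std = np.where(std < eps, 1.0, std)
        normalised[rows] = (embeddings[rows] - mean) / std
    return normalised

# Toy data: 2 speakers with 3 utterances each, 4 features
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(loc=5.0, scale=2.0, size=(6, 4))
toy_speakers = [3, 3, 3, 8, 8, 8]
toy_normalised = normalise_per_speaker(toy_embeddings, toy_speakers)

# Each speaker's features now have (approximately) zero mean
print(toy_normalised[:3].mean(axis=0))
```

Returning a copy keeps the raw embeddings available for later comparison, at the cost of extra memory.
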


## Dividing into Training and Test sets

In [None]:
def split_train_test(normalised_embeddings_array, labels, speakers, test_size = 0.30):
  '''
  Splits into training and testing set with different speakers
  
  Parameters
  ------------
  normalised_embeddings_array: torch tensor
      The tensor containing normalised embeddings 
  labels: list of strings
      The list of emotions corresponding to audio files
  speakers: list of integers 
      The list of speakers

  Returns
  ------------
  X_train: torch tensor
    The normalised embeddings that will be used for training
  X_test: torch tensor
    The normalised embeddings that will be used for testing
  y_train: list of strings
    The labels that will be used for training
  y_test: list of strings
    The labels that will be used for testing
  '''

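  # EmoDB has 10 speakers; sort them so that sample() receives a sequence
  # (random.sample no longer accepts sets in recent Python versions)
  all_speakers = sorted(set(speakers))
  # With test_size = 0.30 this picks 3 of the 10 speakers
  test_speakers = sample(all_speakers, int(test_size*len(all_speakers)))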

  test_speakers_indices = []
  train_speakers_indices = []

  for speaker in all_speakers:
    if speaker in test_speakers:
      speaker_indices = np.where(np.array(speakers)==speaker)[0]
      test_speakers_indices.extend(speaker_indices)
    else:
      speaker_indices = np.where(np.array(speakers)==speaker)[0]
      train_speakers_indices.extend(speaker_indices)

  X_train = normalised_embeddings_array[train_speakers_indices]
  X_test = normalised_embeddings_array[test_speakers_indices]

  # Collect the labels in the same order as the selected rows
  y_train = [labels[index] for index in train_speakers_indices]
  y_test = [labels[index] for index in test_speakers_indices]


  return X_train, X_test, y_train, y_test
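
A speaker-independent split can also be expressed as a pair of index lists drawn with a fixed seed, which makes the split reproducible. This is a sketch under stated assumptions: `split_speakers` and its `seed` parameter are hypothetical names, and the toy speaker list only mimics the 10 EmoDB speaker IDs.

```python
import random

def split_speakers(speakers, test_size=0.30, seed=0):
    """Return (train_indices, test_indices) with no speaker in both sets."""
    unique_speakers = sorted(set(speakers))
    n_test = int(test_size * len(unique_speakers))
    rng = random.Random(seed)  # local generator, so the split is repeatable
    test_speakers = set(rng.sample(unique_speakers, n_test))
    train_idx = [i for i, s in enumerate(speakers) if s not in test_speakers]
    test_idx = [i for i, s in enumerate(speakers) if s in test_speakers]
    return train_idx, test_idx

# Toy example: 10 speakers, 2 utterances each
toy_speakers = [3, 8, 9, 10, 11, 12, 13, 14, 15, 16] * 2
train_idx, test_idx = split_speakers(toy_speakers, test_size=0.30, seed=42)

print(len(train_idx), len(test_idx))  # 14 6
```

Computing the indices once and applying them to every embeddings array keeps the test speakers identical across feature sets, which makes the per-model scores directly comparable.
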


# EmoDB

# Phase 1

In [None]:
# Phase_1
# Load dataset
! kaggle datasets download -d piyushagni5/berlin-database-of-emotional-speech-emodb
! unzip berlin-database-of-emotional-speech-emodb.zip

# Resample dataset
audio_files_emo = glob(os.path.join('/content/wav','*.wav'))
audio_list_emo= load_audio_files(audio_files_emo, resampling_frequency=16000)


# Verify phase_1
print()
print('number of audio files: {}'.format(len(audio_list_emo)))
print(audio_list_emo[0].shape)


Downloading berlin-database-of-emotional-speech-emodb.zip to /content
 87% 33.0M/38.0M [00:00<00:00, 121MB/s] 
100% 38.0M/38.0M [00:00<00:00, 126MB/s]
Archive:  berlin-database-of-emotional-speech-emodb.zip
  inflating: wav/03a01Fa.wav         
  inflating: wav/03a01Nc.wav         
  inflating: wav/03a01Wa.wav         
  inflating: wav/03a02Fc.wav         
  inflating: wav/03a02Nc.wav         
  inflating: wav/03a02Ta.wav         
  inflating: wav/03a02Wb.wav         
  inflating: wav/03a02Wc.wav         
  inflating: wav/03a04Ad.wav         
  inflating: wav/03a04Fd.wav         
  inflating: wav/03a04Lc.wav         
  inflating: wav/03a04Nc.wav         
  inflating: wav/03a04Ta.wav         
  inflating: wav/03a04Wc.wav         
  inflating: wav/03a05Aa.wav         
  inflating: wav/03a05Fc.wav         
  inflating: wav/03a05Nd.wav         
  inflating: wav/03a05Tc.wav         
  inflating: wav/03a05Wa.wav         
  inflating: wav/03a05Wb.wav         
  inflating: wav/03a07Fa.wav     

# Phase 2

In [None]:
# Phase_2

# Wav2vec
# model = audio_embeddings_model(model_name='wav2vec2')
# embeddings_array_wav2vec = audio_embeddings(audio_list_emo[:50], model_name='wav2vec2', model=model)

# Hubert
# model = audio_embeddings_model(model_name='hubert')
# embeddings_array_hubert = audio_embeddings(audio_list_emo[:50], model_name='hubert', model=model)

# Hybrid BYOLS
model = audio_embeddings_model(model_name='hybrid_byols')
embeddings_array_byols = audio_embeddings(audio_list_emo, model_name='hybrid_byols', model=model)

# EmoDB compare
model = audio_embeddings_model(model_name='compare')
embeddings_array_compare = audio_embeddings(audio_list_emo, model_name='compare', model=model)

# EmoDB egemaps
model = audio_embeddings_model(model_name='egemaps')
embeddings_array_egemaps = audio_embeddings(audio_list_emo, model_name='egemaps', model=model)

# ---------------------------------------------------------------------------------------------------

# Verify Phase_2
models = ['byols', 'compare', 'egemaps']
embeddings_arrays = [embeddings_array_byols, embeddings_array_compare, embeddings_array_egemaps]

for i in range(len(models)):
  print()
  print()
  print('MODEL: {}'.format(models[i]))
  print()
  print('The shape of the embeddings array is {}'.format(embeddings_arrays[i].shape))
  print('The embeddings array is: ')
  print((embeddings_arrays[i]))


Generating Embeddings...: 100%|██████████| 535/535 [00:32<00:00, 16.29it/s]
100%|██████████| 535/535 [01:03<00:00,  8.43it/s]
100%|██████████| 535/535 [00:59<00:00,  8.99it/s]



MODEL: byols

The shape of the embeddings array is torch.Size([535, 2048])
The embeddings array is: 
tensor([[ 4.7838,  5.1332,  1.4826,  ...,  6.5955,  1.3091,  4.0811],
        [ 2.1415,  4.3838, -0.1100,  ...,  4.6903,  0.9929,  4.6195],
        [ 5.1235,  6.4362,  1.3371,  ...,  4.0452, -0.5890,  3.3637],
        ...,
        [ 4.6546,  3.6335,  1.4479,  ...,  4.3445,  0.4926,  3.9791],
        [ 5.6109,  4.9422,  2.1563,  ...,  4.2048,  0.1288,  3.7671],
        [ 5.6137,  5.7598,  0.7077,  ...,  5.4536,  0.5266,  4.1597]])


MODEL: compare

The shape of the embeddings array is torch.Size([535, 6373])
The embeddings array is: 
tensor([[3.6113e+00, 1.3018e-01, 9.9408e-01,  ..., 7.0783e+01, 1.2688e+02,
         6.3207e+01],
        [3.3879e+00, 1.6026e-01, 9.9359e-01,  ..., 4.9951e+01, 1.0987e+02,
         5.0209e+01],
        [3.6579e+00, 1.2462e-01, 9.1185e-03,  ..., 5.8800e+01, 1.1421e+02,
         5.7545e+01],
        ...,
        [3.5641e+00, 3.2704e-01, 7.0440e-01,  ..., 5.6




# Phase 3


### Speaker normalisation

In [None]:
# Phase_3: Speaker normalisation

speakers = []
labels = []

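# EmoDB file names, e.g. '03a01Fa.wav': characters 0-1 are the speaker ID,
# 2-4 the text code, 5 the emotion letter and 6 the version letter.
# Note that [5:7] keeps the version letter, so 'Fa' and 'Fb' count as different labels;
# file_name[5] alone would give emotion-only labels.
for audio_file in audio_files_emo:
  file_name = os.path.basename(audio_file)
  speakers.append(int(file_name[:2]))
  labels.append(file_name[5:7])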


# Verify speakers and labels array
print('Speakers:')
print(speakers)
print('Labels:')
print(labels)

# -------------------------------------------------------------------------------------------------------------------------

# Normalised arrays
# normalised_embeddings_wav2vec = speaker_normalisation(embeddings_array_wav2vec, speakers)
# normalised_embeddings_hubert = speaker_normalisation(embeddings_array_hubert, speakers)
normalised_embeddings_byols = speaker_normalisation(embeddings_array_byols, speakers)
normalised_embeddings_compare= speaker_normalisation(embeddings_array_compare, speakers)
normalised_embeddings_egemaps = speaker_normalisation(embeddings_array_egemaps, speakers)


# Verifying normalised_embeddings_arrays
normalised_embeddings_arrays = [normalised_embeddings_byols, normalised_embeddings_compare, normalised_embeddings_egemaps]

for i in range(len(models)):
  print()
  print()
  print('MODEL: {}'.format(models[i]))
  print()
  print('The shape of the normalised embeddings array is: {}'.format(normalised_embeddings_arrays[i].shape))
  print('Normalised Embeddings Array:')
  print((normalised_embeddings_arrays[i]))
  print()
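  # The arrays were normalised in place above, so no second normalisation pass is needed
  columnwise_mean = torch.mean(normalised_embeddings_arrays[i], 0)
  print('Columnwise_mean:')
  print(columnwise_mean)
  # Compare absolute values so that large negative means are not counted as close to zero
  if torch.all(columnwise_mean.abs() < 10**(-6)):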
    print('All means are less than 10**-6')
  else:
    print('All means are NOT less than 10**-6')


Speakers:
[8, 11, 9, 14, 16, 13, 13, 16, 16, 9, 9, 16, 15, 12, 9, 8, 16, 15, 15, 16, 11, 9, 12, 10, 11, 12, 11, 14, 8, 3, 9, 13, 13, 16, 16, 12, 16, 11, 16, 10, 9, 13, 15, 3, 13, 3, 9, 11, 10, 15, 15, 3, 9, 3, 9, 10, 13, 10, 8, 16, 15, 14, 15, 14, 14, 13, 14, 15, 3, 8, 14, 11, 8, 14, 11, 14, 16, 3, 11, 11, 3, 16, 14, 9, 3, 9, 9, 10, 3, 11, 13, 10, 11, 16, 13, 14, 15, 13, 14, 11, 11, 8, 15, 3, 16, 8, 13, 16, 8, 11, 3, 3, 8, 14, 9, 15, 10, 15, 3, 3, 11, 10, 13, 15, 11, 16, 9, 14, 13, 13, 16, 13, 16, 15, 10, 14, 8, 13, 16, 14, 8, 8, 12, 8, 14, 3, 11, 14, 3, 15, 11, 8, 8, 15, 14, 15, 14, 11, 14, 8, 11, 3, 16, 3, 14, 12, 16, 15, 9, 13, 9, 10, 16, 3, 15, 10, 16, 8, 16, 16, 12, 8, 14, 3, 15, 13, 9, 14, 11, 15, 8, 8, 15, 15, 3, 15, 16, 10, 16, 14, 13, 14, 13, 3, 10, 14, 14, 8, 10, 11, 16, 3, 15, 15, 8, 12, 9, 16, 11, 16, 9, 16, 15, 10, 9, 3, 14, 14, 13, 14, 8, 16, 3, 11, 9, 10, 16, 11, 3, 13, 15, 8, 10, 3, 10, 16, 3, 12, 15, 14, 13, 15, 16, 13, 16, 16, 8, 10, 9, 15, 13, 9, 13, 14, 10, 14, 13, 
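
EmoDB encodes the metadata of each recording in its file name: two digits for the speaker, three characters for the spoken text, one letter for the emotion (the German initial) and one letter for the version. The following sketch parses a name into those parts; `parse_emodb_name` and `EMOTION_CODES` are hypothetical helpers introduced here for illustration.

```python
import os

# Emotion letters used in EmoDB file names (German initials)
EMOTION_CODES = {
    'W': 'anger',      # Wut
    'L': 'boredom',    # Langeweile
    'E': 'disgust',    # Ekel
    'A': 'fear',       # Angst
    'F': 'happiness',  # Freude
    'T': 'sadness',    # Trauer
    'N': 'neutral',
}

def parse_emodb_name(path):
    """Split an EmoDB file name into (speaker, text_code, emotion, version)."""
    name = os.path.basename(path)
    speaker = int(name[:2])
    text_code = name[2:5]
    emotion = EMOTION_CODES[name[5]]
    version = name[6]
    return speaker, text_code, emotion, version

print(parse_emodb_name('/content/wav/03a01Fa.wav'))
# (3, 'a01', 'happiness', 'a')
```

Using only the emotion letter as the label yields seven classes, whereas emotion plus version letter splits each emotion into several smaller classes.
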

### Train Test splitting

In [None]:
# Phase_3: Train Test splitting

# X_train_wav2vec, X_test_wav2vec, y_train_wav2vec, y_test_wav2vec = split_train_test(normalised_embeddings_wav2vec, labels, speakers, test_size = 0.30)
# X_train_hubert, X_test_hubert, y_train_hubert, y_test_hubert = split_train_test(normalised_embeddings_hubert, labels, speakers, test_size = 0.30)
X_train_byols, X_test_byols, y_train_byols, y_test_byols = split_train_test(normalised_embeddings_byols, labels, speakers, test_size = 0.30)
X_train_compare, X_test_compare, y_train_compare, y_test_compare = split_train_test(normalised_embeddings_compare, labels, speakers, test_size = 0.30)
X_train_egemaps, X_test_egemaps, y_train_egemaps, y_test_egemaps = split_train_test(normalised_embeddings_egemaps, labels, speakers, test_size = 0.30)

X_trains = [X_train_byols, X_train_compare, X_train_egemaps]
X_tests = [X_test_byols, X_test_compare, X_test_egemaps]
y_trains = [y_train_byols, y_train_compare, y_train_egemaps]
y_tests = [y_test_byols, y_test_compare, y_test_egemaps]

# Verify
for i in range(len(models)):
  print()
  print()
  print('MODEL: {}'.format(models[i]))
  print()
  print('The shape of X_train is: {}'.format(X_trains[i].shape))
  print('X_train')
  print(X_trains[i])
  print()
  print('The shape of X_test is: {}'.format(X_tests[i].shape))
  print('X_test')
  print(X_tests[i])
  print()
  print('The length of y_train is: {}'.format(len(y_trains[i])))
  print('y_train')
  print(y_trains[i])
  print()
  print('The length of y_test is: {}'.format(len(y_tests[i])))
  print('y_test')
  print(y_tests[i])




MODEL: byols

The shape of X_train is: torch.Size([369, 2048])
X_train
tensor([[-0.7266, -0.2330,  0.0496,  ...,  2.7701,  1.4310,  0.8477],
        [ 0.6627, -0.6566,  0.5286,  ...,  0.4009, -1.1438, -0.2275],
        [ 0.9579, -0.8962, -1.1898,  ..., -2.1374, -1.0756, -0.2003],
        ...,
        [ 0.9588, -0.9793, -0.5004,  ...,  0.8546,  0.7931,  0.5545],
        [-0.5254,  0.1002, -0.4655,  ..., -0.3963,  0.9170,  1.1318],
        [ 0.3331,  1.6261,  1.0740,  ..., -0.6312,  0.5772,  0.7013]])

The shape of X_test is: torch.Size([166, 2048])
X_test
tensor([[-0.4564, -1.7515, -1.2570,  ...,  0.0391, -0.0865,  1.8213],
        [ 0.0677, -1.3450,  0.1907,  ..., -0.6493, -0.7612, -0.0036],
        [ 0.2616,  0.3525,  1.3705,  ...,  0.1721,  0.2444,  0.1833],
        ...,
        [-0.3465, -0.7112,  0.3698,  ...,  0.8286,  0.1961,  0.2886],
        [-0.5446, -0.3478, -0.2528,  ...,  2.0745, -0.5076,  0.3372],
        [ 0.6637, -0.2886,  1.4136,  ..., -0.1432,  0.4254,  0.4458]])

Th

## 1. Logistic Regression

Defining functions for hyperparameter tuning:

In [None]:

def get_hyperparams(X_train, X_test, y_train, y_test):
  logreg = LogisticRegression()
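  # 'newton-cg' and 'lbfgs' support only the l2 penalty, so l1 is searched with 'liblinear' alone
  C_values = np.logspace(-4,2,7)
  parameters = [{'penalty' : ['l2'], 'C': C_values, 'solver': ['newton-cg', 'lbfgs', 'liblinear']},
                {'penalty' : ['l1'], 'C': C_values, 'solver': ['liblinear']}]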
  grid = GridSearchCV(logreg, param_grid = parameters, cv=5)                     
  grid.fit(X_train,y_train)
  print('Accuracy :',grid.best_score_)
  print('Best Parameters: {}'.format(grid.best_params_))
  print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
  return grid.best_params_
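
A full Cartesian product of the two penalties and three solvers contains pairings that `LogisticRegression` rejects, because `newton-cg` and `lbfgs` do not support the l1 penalty. The sketch below counts how many grid candidates are usable; the `SUPPORTED` table is an assumption summarising the scikit-learn documentation for the penalties considered here.

```python
from itertools import product

penalties = ['l1', 'l2']
solvers = ['newton-cg', 'lbfgs', 'liblinear']
n_C = 7  # np.logspace(-4, 2, 7) yields 7 values of C

# Penalties (among those searched) that each solver accepts
SUPPORTED = {
    'newton-cg': {'l2'},
    'lbfgs': {'l2'},
    'liblinear': {'l1', 'l2'},
}

pairs = list(product(penalties, solvers))
valid = [(p, s) for p, s in pairs if p in SUPPORTED[s]]
invalid = [(p, s) for p, s in pairs if p not in SUPPORTED[s]]

print(len(valid) * n_C, 'usable candidates')        # 28 usable candidates
print(len(invalid) * n_C, 'candidates that cannot be fitted')  # 14 candidates that cannot be fitted
```
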


Getting best hyperparameters and checking accuracy of the model:

In [None]:

for i in range(len(models)):
  print()
  print('MODEL: {}'.format(models[i]))
  hyperparams = get_hyperparams(X_trains[i], X_tests[i], y_trains[i], y_tests[i])
  


MODEL: byols
Accuracy : 0.3089966679007775
Best Parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'newton-cg'}
Accuracy on test_set: 0.2469879518072289

MODEL: compare
Accuracy : 0.28157894736842104
Best Parameters: {'C': 0.01, 'penalty': 'l2', 'solver': 'newton-cg'}
Accuracy on test_set: 0.1935483870967742

MODEL: egemaps
Accuracy : 0.2763902763902764
Best Parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.18243243243243243


## 2. Support Vector Machines

Hyperparameter Tuning:

In [None]:

def get_hyperparams_svm(X_train, X_test, y_train, y_test):
  svm = SVC()
  parameters = {'C': np.logspace(-2,3,6), 'gamma': np.logspace(-5,2,8), 'degree':[1], 'kernel':['rbf','poly','sigmoid','linear']}
  grid = GridSearchCV(svm, param_grid = parameters, cv=5)                     
  grid.fit(X_train, y_train)
  print('Accuracy:',grid.best_score_)
  print('Best Parameters {}'.format(grid.best_params_))
  print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
  return grid.best_params_


Getting best hyperparameters and checking accuracy:

In [None]:

for i in range(len(models)):
  print()
  print('MODEL: {}'.format(models[i]))
  hyperparams = get_hyperparams_svm(X_trains[i], X_tests[i], y_trains[i], y_tests[i])



MODEL: byols
Accuracy: 0.3089966679007775
Best Parameters {'C': 10.0, 'degree': 1, 'gamma': 0.0001, 'kernel': 'sigmoid'}
Accuracy on test_set: 0.24096385542168675

MODEL: compare
Accuracy: 0.29473684210526313
Best Parameters {'C': 100.0, 'degree': 1, 'gamma': 0.0001, 'kernel': 'sigmoid'}
Accuracy on test_set: 0.25806451612903225

MODEL: egemaps
Accuracy: 0.2921078921078921
Best Parameters {'C': 0.01, 'degree': 1, 'gamma': 1e-05, 'kernel': 'linear'}
Accuracy on test_set: 0.19594594594594594


## 3. Random Forest Regression

Defining functions for hyperparameter tuning:

In [21]:

def get_hyperparams_rfr(X_train, X_test, y_train, y_test):
  le = preprocessing.LabelEncoder()
  le.fit(labels)
  y_train = le.transform(y_train)
  y_test = le.transform(y_test)

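  # A regressor treats the encoded labels as continuous values, so the scores
  # printed below as 'Accuracy' are R^2 values, not classification accuracy
  rfr = RandomForestRegressor()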
  parameters = {'n_estimators' : [50,100,200], 'max_features' : ['auto', 'log2', 'sqrt'], 'bootstrap' : [True, False]}
  grid = GridSearchCV(rfr, param_grid = parameters, cv = 5)                     
  grid.fit(X_train, y_train)
  print('Accuracy:',grid.best_score_)
  print('Best Parameters {}'.format(grid.best_params_))
  print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
  return grid.best_params_
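
For a regressor, `score` and the default `GridSearchCV` scoring report the coefficient of determination R², which is not comparable to the accuracies of the classifiers above. The sketch below computes both quantities by hand on made-up predictions to show how far apart they can be; the helper names `r2` and `accuracy` and the toy values are assumptions for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Fraction of exactly matching class labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

# Hypothetical encoded labels and regressor outputs, each off by 0.6
y_true = np.array([0, 1, 2, 3, 4, 5])
y_pred = y_true + 0.6

r2_value = r2(y_true, y_pred)
# Round the continuous outputs to the nearest class index before comparing
acc_value = accuracy(y_true, np.rint(y_pred).astype(int))

print(round(r2_value, 3))  # 0.877
print(acc_value)           # 0.0
```

Here every rounded prediction lands on the wrong class, yet R² stays high because the predictions are numerically close to the encoded labels.
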


Getting best hyperparameters and checking accuracy of the model:

In [23]:

for i in range(0, len(models), 2):
  print()
  print('MODEL: {}'.format(models[i]))
  hyperparams = get_hyperparams_rfr(X_trains[i], X_tests[i], y_trains[i], y_tests[i])
  


MODEL: byols
Accuracy: 0.41201278530927415
Best Parameters {'bootstrap': False, 'max_features': 'sqrt', 'n_estimators': 100}
Accuracy on test_set: 0.5508525826114563

MODEL: egemaps
Accuracy: 0.6096095153352803
Best Parameters {'bootstrap': True, 'max_features': 'auto', 'n_estimators': 100}
Accuracy on test_set: 0.532827692151189


# Improving Accuracy



### 1. Training on all the embeddings after speaker normalisation (without splitting into training and test sets)


In [25]:

X_trains = [normalised_embeddings_arrays[0], normalised_embeddings_arrays[1], normalised_embeddings_arrays[2]]
y_trains = [labels, labels, labels]

for i in range(0, len(models), 2):
  print()
  print()
  print('MODEL: {}'.format(models[i]))
  print('1. Logistic Regression:')
  get_hyperparams(X_trains[i], X_trains[i], y_trains[i], y_trains[i])
  print('2. Support Vector Machines:')
  get_hyperparams_svm(X_trains[i], X_trains[i], y_trains[i], y_trains[i])
  print('3. Random Forest:')
  get_hyperparams_rfr(X_trains[i], X_trains[i], y_trains[i], y_trains[i])



MODEL: byols
1. Logistic Regression:
Accuracy : 0.302803738317757
Best Parameters: {'C': 0.001, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.9775700934579439
2. Support Vector Machines:
Accuracy: 0.308411214953271
Best Parameters {'C': 0.01, 'degree': 1, 'gamma': 0.1, 'kernel': 'poly'}
Accuracy on test_set: 0.9757009345794393
3. Random Forest:
Accuracy: 0.5541684524885591
Best Parameters {'bootstrap': True, 'max_features': 'auto', 'n_estimators': 100}
Accuracy on test_set: 0.9398802322093553


MODEL: egemaps
1. Logistic Regression:
Accuracy : 0.27850467289719627
Best Parameters: {'C': 1.0, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.9121495327102803
2. Support Vector Machines:
Accuracy: 0.2953271028037383
Best Parameters {'C': 0.01, 'degree': 1, 'gamma': 10.0, 'kernel': 'poly'}
Accuracy on test_set: 0.8710280373831776
3. Random Forest:
Accuracy: 0.6511260030077498
Best Parameters {'bootstrap': False, 'max_features': 'sqrt', 'n_estimators': 200}
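
In the run above the model is scored on the same rows it was fitted on, so the value printed as "Accuracy on test_set" measures how well the training data is reproduced rather than how well the model generalises. The sketch below illustrates the effect with a hand-written 1-nearest-neighbour classifier on random labels that carry no signal; all names and data in it are assumptions for illustration.

```python
import numpy as np

def nearest_neighbour_predict(X_train, y_train, X_query):
    """Predict the label of the closest training row for each query row."""
    # Squared Euclidean distance between every query row and every training row
    d = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d.argmin(axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 7, size=200)  # random labels: nothing to learn by construction

X_fit, y_fit = X[:150], y[:150]
X_held, y_held = X[150:], y[150:]

# Scoring on the fitted rows: each row is its own nearest neighbour
train_acc = np.mean(nearest_neighbour_predict(X_fit, y_fit, X_fit) == y_fit)
# Scoring on unseen rows: close to chance level for 7 classes
held_acc = np.mean(nearest_neighbour_predict(X_fit, y_fit, X_held) == y_held)

print('accuracy on fitted rows:  ', train_acc)
print('accuracy on held-out rows:', held_acc)
```

The cross-validated score (`best_score_`) is therefore the more informative number in this setting.
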
