
# EmoDB

## Importing packages

In [20]:
!pip install speechbrain
!pip install transformers
!git clone https://github.com/GasserElbanna/serab-byols.git
!python3 -m pip install -e ./serab-byols

!pip install tqdm==4.60.0
!pip install opensmile


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
fatal: destination path 'serab-byols' already exists and is not an empty directory.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Obtaining file:///content/serab-byols
Installing collected packages: serab-byols
  Attempting uninstall: serab-byols
    Found existing installation: serab-byols 0.0.0
    Can't uninstall 'serab-byols'. No files were found to uninstall.
  Running setup.py develop for serab-byols
Successfully installed serab-byols-0.0.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [80]:
#FEEDBACK: organizing your packages is usually a good practice
import os
import numpy as np
from tqdm import tqdm
from glob import glob
from random import sample

import librosa
import soundfile as sf

import torch
# import opensmile  # uncomment to use the 'compare' and 'egemaps' feature sets
import serab_byols
from transformers import Wav2Vec2Model, HubertModel

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import GridSearchCV, train_test_split

import warnings
warnings.filterwarnings('ignore')

In [None]:
! pip install -q kaggle

from google.colab import files
files.upload()

# Create the Kaggle config directory and install the uploaded API token
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json

# Phase 1 Functions: Loading and resampling audio files

In [2]:
# Defining function for loading and resampling audio files
def load_audio_files(audio_files, resampling_frequency=16000, audio_list=None):
    '''
    Loads and resamples audio files

    Parameters
    ------------
    audio_files: List
        The paths of the wav files
    resampling_frequency: int
        The sampling rate (in Hz) to which all audio files are resampled
    audio_list: List
        An existing list of torch tensors to which the loaded audio is appended; a new list is created if None

    Returns
    ------------
    audio_list: List
        A list of torch tensors, one tensor for each audio file

    '''

    # Making audio_list
    if audio_list is None:
        audio_list = []

    # Resampling
    for audio in tqdm(audio_files):
        signal, fs = librosa.load(audio, sr=resampling_frequency)
        audio_list.append(torch.from_numpy(signal))
        
    return audio_list

# Phase 2 Functions: Embedding Extraction

## Audio Embeddings Extraction

In [39]:
def audio_embeddings_model(model_name):
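    '''
      Loads the model used for embedding extraction

      Parameters
      ------------
      model_name: string
          The model to use: 'wav2vec2', 'hubert', 'hybrid_byols', 'compare' or 'egemaps'

      Returns
      ------------
      model: object
          The pretrained model, or the openSMILE feature extractor for 'compare' and 'egemaps'

    '''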
    if model_name=='wav2vec2':
        model_hub = 'facebook/wav2vec2-large-960h-lv60-self'
        model = Wav2Vec2Model.from_pretrained(model_hub, cache_dir='/om2/user/gelbanna/huggingface/')
    elif model_name=='hubert':
        model_hub = 'facebook/hubert-xlarge-ll60k'
        model = HubertModel.from_pretrained(model_hub, cache_dir='/om2/user/gelbanna/huggingface/')
    elif model_name=='hybrid_byols':
        model_name = 'cvt'
        checkpoint_path = "/om2/user/gelbanna/serab-byols/checkpoints/cvt_s1-d1-e64_s2-d1-e256_s3-d1-e512_BYOLAs64x96-osandbyolaloss6373-e100-bs256-lr0003-rs42.pth"
        model = serab_byols.load_model(checkpoint_path, model_name)
    elif model_name=='compare':
        model = opensmile.Smile(
            feature_set=opensmile.FeatureSet.ComParE_2016,
            feature_level=opensmile.FeatureLevel.Functionals,
        )
    elif model_name=='egemaps':
        model = opensmile.Smile(
            feature_set=opensmile.FeatureSet.eGeMAPSv02,
            feature_level=opensmile.FeatureLevel.Functionals,
        )
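    else:
        # Fail early instead of hitting an undefined `model` below
        raise ValueError('Unknown model_name: {}'.format(model_name))
    return model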


def audio_embeddings(audio_list, model_name, model, sampling_rate=16000):
    '''
      Extracts one embedding vector per audio file

      Parameters
      ------------
      audio_list: list of tensors
          A list of torch tensors, one tensor for each audio file
      model_name: string
          The model to use: 'wav2vec2', 'hubert', 'hybrid_byols', 'compare' or 'egemaps'
      model: object
          The model generated by audio_embeddings_model function
      sampling_rate: int
          The sampling rate of the audio in Hz, only used by the openSMILE feature sets ('compare' and 'egemaps')

      Returns
      ------------
      embeddings_array: torch tensor
          The tensor containing the embeddings of all audio files, dimension (number of audio files × n_feats);
          n_feats is 6373 for 'compare' and 88 for 'egemaps'

    '''
    if model_name=='hybrid_byols':
        embeddings_array = serab_byols.get_scene_embeddings(audio_list, model).detach().cpu()
    else:
        embeddings_list = []
        #FEEDBACK: iterate across elements of the list instead of indices
        for audio in tqdm(audio_list):
            if model_name=='wav2vec2' or model_name=='hubert':
                #FEEDBACK: use unsqueeze to expand tensor dim
                embeddings = model(audio.unsqueeze(0).to('cuda')).last_hidden_state.mean(1)
                embeddings_list.append(embeddings.squeeze(0).detach().cpu())
            elif model_name=='compare' or model_name=='egemaps':
                # Use the loop variable (the index `i` no longer exists) and pass a NumPy array to openSMILE
                embeddings = model.process_signal(audio.numpy(), sampling_rate)
                embeddings_list.append(torch.tensor(embeddings.values[0], dtype=torch.float32))
        embeddings_array = torch.stack(embeddings_list)
    return embeddings_array
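
For `wav2vec2` and `hubert`, the function above turns the frame-level output `last_hidden_state` into one vector per file by averaging over the time axis (`.mean(1)`) and then dropping the batch dimension. The sketch below illustrates only that pooling step with NumPy; the random array stands in for real model output, and the shape `(1, 49, 1024)` is an assumption chosen for illustration.

```python
import numpy as np

# Stand-in for model(...).last_hidden_state with shape (batch=1, frames=49, features=1024).
# The values are random; only the shapes matter for this illustration.
rng = np.random.default_rng(0)
last_hidden_state = rng.standard_normal((1, 49, 1024))

# Average over the time axis (axis 1), as .mean(1) does in audio_embeddings.
pooled = last_hidden_state.mean(axis=1)   # shape (1, 1024)

# Drop the batch dimension, as .squeeze(0) does.
embedding = pooled.squeeze(0)             # shape (1024,)

print(pooled.shape, embedding.shape)
```

Because of this averaging, every file yields a vector of the same length regardless of its duration, which is what allows `torch.stack` to build a single array.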

# Phase 3 Functions: Downstream Task - Speech Emotion Recognition

## Speaker normalisation

In [36]:
def speaker_normalisation(embeddings_array, speakers):
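    '''
      Standardises embeddings_array separately for each speaker
      (zero mean and unit variance per feature within each speaker)

      Note: embeddings_array is modified in place and also returned.

      Parameters
      ------------
      embeddings_array: torch tensor
          The tensor of embeddings, one row for each audio file
      speakers: List
          The list of speakers, one entry for each audio file

      Returns
      ------------
      embeddings_array: torch tensor
          The tensor containing normalised embeddings

    '''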
    speaker_ids = set(speakers)
    scaler = StandardScaler()
    for speaker_id in speaker_ids:
        speaker_embeddings_indices = np.where(np.array(speakers)==speaker_id)[0]
        speaker_embeddings = embeddings_array[speaker_embeddings_indices,:]
        normalised_speaker_embeddings = scaler.fit_transform(speaker_embeddings)
        embeddings_array[speaker_embeddings_indices] = torch.tensor(normalised_speaker_embeddings).float()
    return embeddings_array
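
As a rough illustration of what per-speaker normalisation does, the sketch below z-scores each speaker's rows with plain NumPy on toy data. The function name `standardise_per_speaker` and the toy values are hypothetical; it returns a copy instead of modifying its input, and it is meant to mirror the behaviour of `StandardScaler` (population standard deviation, constant features left unscaled), not to replace the function above.

```python
import numpy as np

def standardise_per_speaker(embeddings, speakers):
    """Return a copy of embeddings with each speaker's rows z-scored per feature."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    speakers = np.asarray(speakers)
    out = np.empty_like(embeddings)
    for speaker_id in np.unique(speakers):
        rows = speakers == speaker_id
        mean = embeddings[rows].mean(axis=0)
        std = embeddings[rows].std(axis=0)   # population std (ddof=0)
        std[std == 0] = 1.0                  # leave constant features unscaled
        out[rows] = (embeddings[rows] - mean) / std
    return out

# Toy data: two hypothetical speakers with different offsets and scales.
rng = np.random.default_rng(0)
toy_embeddings = np.vstack([rng.normal(5.0, 2.0, size=(4, 3)),
                            rng.normal(-1.0, 0.5, size=(6, 3))])
toy_speakers = ['03'] * 4 + ['08'] * 6

normalised = standardise_per_speaker(toy_embeddings, toy_speakers)
print(normalised[:4].mean(axis=0))   # per-feature mean for speaker '03'
print(normalised[4:].std(axis=0))    # per-feature std for speaker '08'
```

After normalisation each speaker's features have mean close to 0 and standard deviation close to 1, so differences between speakers (voice, recording level) no longer dominate the embedding space.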

## Dividing into Training and Test sets

In [69]:
def split_train_test(normalised_embeddings_array, labels, speakers, test_size = 0.30):
    '''
    Splits into training and testing set with different speakers

    Parameters
    ------------
    normalised_embeddings_array: torch tensor
      The tensor containing normalised embeddings
    labels: list of strings
      The list of emotions corresponding to audio files
    speakers: list of speaker IDs
      The list of speakers corresponding to audio files
    test_size: float
      The probability with which each speaker is assigned to the test set

    Returns
    ------------
    X_train: torch tensor
      The normalised embeddings that will be used for training
    X_test: torch tensor
      The normalised embeddings that will be used for testing
    y_train: list of strings
      The labels that will be used for training
    y_test: list of strings
      The labels that will be used for testing
    '''
    np.random.seed(42)
    # 10 speakers in this dataset
    all_speakers = np.unique(speakers)
    # Each speaker is drawn into the test set with probability test_size, so the number
    # of test speakers is not fixed; with seed 42 it is 3 of the 10 total speakers
    test_speakers = np.random.rand(len(all_speakers)) < test_size
    test_speakers = all_speakers[test_speakers]
    print(test_speakers)
    test_speakers_indices = []
    train_speakers_indices = []

    for speaker in all_speakers:
        if speaker in test_speakers:
            speaker_indices = np.where(np.array(speakers)==speaker)[0]
            test_speakers_indices.extend(speaker_indices)
        else:
            speaker_indices = np.where(np.array(speakers)==speaker)[0]
            train_speakers_indices.extend(speaker_indices)

    X_train = normalised_embeddings_array[train_speakers_indices]
    X_test = normalised_embeddings_array[test_speakers_indices]

    y_train = [0 for i in range(len(train_speakers_indices))]
    y_test = [0 for i in range(len(test_speakers_indices))]

    for i,index in enumerate(train_speakers_indices):
        y_train[i] = labels[index]
    for i,index in enumerate(test_speakers_indices):
        y_test[i] = labels[index]


    return X_train, X_test, y_train, y_test
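
The speaker selection in `split_train_test` is a random mask: one uniform draw per speaker, compared against `test_size`. The sketch below isolates that step. The list of speaker IDs is an assumption about EmoDB (the ten IDs in sorted order, as `np.unique` would return them); everything else follows the function above.

```python
import numpy as np

# Speaker IDs assumed for EmoDB, in the sorted order np.unique would produce.
all_speakers = np.array(['03', '08', '09', '10', '11', '12', '13', '14', '15', '16'])
test_size = 0.30

np.random.seed(42)
draws = np.random.rand(len(all_speakers))   # one uniform draw per speaker
mask = draws < test_size                    # True means the speaker goes to the test set
test_speakers = all_speakers[mask]

print(np.round(draws, 3))
print(test_speakers)
```

Since every speaker is drawn independently, `test_size` is only the expected fraction of test speakers. A different seed can select more or fewer speakers, and in principle none at all.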


In [None]:
def modified_split_train_test(normalised_embeddings_array, labels, speakers, test_size = 0.30):
    '''
    Splits into training and testing set with different speakers

    Parameters
    ------------
    normalised_embeddings_array: torch tensor
      The tensor containing normalised embeddings
    labels: list of strings
      The list of emotions corresponding to audio files
    speakers: list of speaker IDs
      The list of speakers corresponding to audio files

    Returns
    ------------
    X_train: torch tensor
      The normalised embeddings that will be used for training
    X_test: torch tensor
      The normalised embeddings that will be used for testing
    y_train: list of strings
      The labels that will be used for training
    y_test: list of strings
      The labels that will be used for testing
    '''
    np.random.seed(42)
    #get all speakers
    all_speakers = np.unique(speakers)
    # pick randomly the test speakers
    test_speakers_indices = np.random.rand(len(all_speakers)) < test_size
    test_speakers = all_speakers[test_speakers_indices]
    print(test_speakers)
    
    test_speakers_indices = []
    train_speakers_indices = []

    for speaker in all_speakers:
        if speaker in test_speakers:
            speaker_indices = np.where(np.array(speakers)==speaker)[0]
            test_speakers_indices.extend(speaker_indices)
        else:
            speaker_indices = np.where(np.array(speakers)==speaker)[0]
            train_speakers_indices.extend(speaker_indices)

    X_train = normalised_embeddings_array[train_speakers_indices]
    X_test = normalised_embeddings_array[test_speakers_indices]

    y_train = [0 for i in range(len(train_speakers_indices))]
    y_test = [0 for i in range(len(test_speakers_indices))]

    for i,index in enumerate(train_speakers_indices):
        y_train[i] = labels[index]
    for i,index in enumerate(test_speakers_indices):
        y_test[i] = labels[index]


    return X_train, X_test, y_train, y_test


# EmoDB

# Phase 1: Loading Dataset/Preprocessing Audios/Extracting Metadata

In [None]:
# Phase_1
# Load dataset
! kaggle datasets download -d piyushagni5/berlin-database-of-emotional-speech-emodb
! unzip berlin-database-of-emotional-speech-emodb.zip

# Resample dataset
audio_files_emo = glob(os.path.join('/content/wav','*.wav'))
audio_list_emo= load_audio_files(audio_files_emo, resampling_frequency=16000)


# Verify phase_1
print()
print('number of audio files: {}'.format(len(audio_list_emo)))
print(audio_list_emo[0].shape)


In [32]:
# Resample dataset
audio_files_emo = glob('/om2/user/gelbanna/datasets/emodb/wav/*.wav')
audio_list_emo= load_audio_files(audio_files_emo, resampling_frequency=16000)
labels = np.array(list(map(lambda x: os.path.basename(x).split('.')[0][-2], audio_files_emo)))
speakers = np.array(list(map(lambda x: os.path.basename(x)[:2], audio_files_emo)))
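
The two `map` calls above rely on the EmoDB file-naming scheme: the first two characters of the file name are the speaker ID and the second-to-last character of the stem is the emotion code. A small sketch of that parsing follows; the helper name `parse_emodb_filename`, the example path, and the code-to-emotion table (German initials, as commonly documented for EmoDB) are assumptions added for illustration.

```python
import os

# Emotion codes as commonly documented for EmoDB (initials of the German words).
EMOTION_CODES = {
    'W': 'anger', 'L': 'boredom', 'E': 'disgust', 'A': 'fear',
    'F': 'happiness', 'T': 'sadness', 'N': 'neutral',
}

def parse_emodb_filename(path):
    """Split an EmoDB file name into speaker ID, emotion code and emotion name."""
    stem = os.path.basename(path).split('.')[0]   # e.g. '03a01Fa'
    speaker = stem[:2]                            # first two characters
    emotion_code = stem[-2]                       # second-to-last character
    return speaker, emotion_code, EMOTION_CODES.get(emotion_code, 'unknown')

print(parse_emodb_filename('/content/wav/03a01Fa.wav'))   # ('03', 'F', 'happiness')
```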

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 535/535 [00:00<00:00, 1294.88it/s]


In [33]:
print(f'Number of classes: {np.unique(labels).shape[0]}')
print(f'Number of speakers: {np.unique(speakers).shape[0]}')

Number of classes: 7
Number of speakers: 10


# Phase 2: Feature Extraction

In [48]:
# Phase_2

# Wav2vec
model = audio_embeddings_model(model_name='wav2vec2')
model.to('cuda')
embeddings_array_wav2vec = audio_embeddings(audio_list_emo, model_name='wav2vec2', model=model)

# Hubert
model = audio_embeddings_model(model_name='hubert')
model.to('cuda')
embeddings_array_hubert = audio_embeddings(audio_list_emo, model_name='hubert', model=model)

# Hybrid BYOLS
model = audio_embeddings_model(model_name='hybrid_byols')
model.to('cuda')
embeddings_array_byols = audio_embeddings(audio_list_emo, model_name='hybrid_byols', model=model)

# # EmoDB compare
# model = audio_embeddings_model(model_name='compare')
# embeddings_array_compare = audio_embeddings(audio_list_emo, model_name='compare', model=model)

# # EmoDB egemaps
# model = audio_embeddings_model(model_name='egemaps')
# embeddings_array_egemaps = audio_embeddings(audio_list_emo, model_name='egemaps', model=model)

# # ---------------------------------------------------------------------------------------------------

# # Verify Phase_2
models = ['wav2vec', 'hubert', 'byols']
# embeddings_arrays = [embeddings_array_byols, embeddings_array_compare, embeddings_array_egemaps]

# for i in range(len(models)):
#   print()
#   print()
#   print('MODEL: {}'.format(models[i]))
#   print()
#   print('The shape of the embeddings array is {}'.format(embeddings_arrays[i].shape))
#   print('The embeddings array is: ')
#   print((embeddings_arrays[i]))


Some weights of the model checkpoint at facebook/wav2vec2-large-960h-lv60-self were not used when initializing Wav2Vec2Model: ['lm_head.bias', 'lm_head.weight']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2Model were not initialized from the model checkpoint at facebook/wav2vec2-large-960h-lv60-self and are newly initialized: ['wav2vec2.masked_spec_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

In [20]:
np.save('wav2vec2_emodb_embeddings.npy', embeddings_array_wav2vec)
np.save('hubert_emodb_embeddings.npy', embeddings_array_hubert)

# Phase 3: Emotions Classification

### Speaker normalisation

In [41]:
# Phase_3: Speaker normalisation



# -------------------------------------------------------------------------------------------------------------------------

# Normalised arrays
normalised_embeddings_wav2vec = speaker_normalisation(embeddings_array_wav2vec, speakers)
normalised_embeddings_hubert = speaker_normalisation(embeddings_array_hubert, speakers)
normalised_embeddings_byols = speaker_normalisation(embeddings_array_byols, speakers)
# normalised_embeddings_compare= speaker_normalisation(embeddings_array_compare, speakers)
# normalised_embeddings_egemaps = speaker_normalisation(embeddings_array_egemaps, speakers)


# # Verifying normalised_embeddings_arrays
# normalised_embeddings_arrays = [normalised_embeddings_byols, normalised_embeddings_compare, normalised_embeddings_egemaps]

# for i in range(len(models)):
#   print()
#   print()
#   print('MODEL: {}'.format(models[i]))
#   print()
#   print('The shape of the normalised embeddings array is: {}'.format(normalised_embeddings_arrays[i].shape))
#   print('Normalised Embeddings Array:')
#   print((normalised_embeddings_arrays[i]))
#   print()
#   columnwise_mean = torch.mean(speaker_normalisation(embeddings_arrays[i], speakers), 0)
#   print('Columnwise_mean:')
#   print(columnwise_mean)
#   if torch.all(columnwise_mean < 10**(-6)):
#     print('All means are less than 10**-6')
#   else:
#     print('All means are NOT less than 10**-6')


In [58]:
import pandas as pd

embeddings = pd.DataFrame(normalised_embeddings_wav2vec)
df = pd.DataFrame({'Speaker': speakers, 'Label': labels})
df = pd.concat([df, embeddings], axis=1)
df.loc[df.Speaker == '08'].describe()

statistic,0,1,2,3,4,5,6,7,8,9,...,1014,1015,1016,1017,1018,1019,1020,1021,1022,1023
count,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0,...,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0,58.0
mean,-1.130433e-08,-7.450581e-09,5.138331e-10,-1.027666e-08,1.523836e-08,2.569166e-08,-1.027666e-09,-5.587935e-09,2.569166e-09,1.541499e-09,...,-7.193664e-09,-1.027666e-09,1.027666e-08,-1.644266e-08,-1.027666e-09,7.964414e-09,-1.027666e-08,-7.193664e-09,2.055333e-09,-3.339915e-09
std,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,...,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734,1.008734
min,-2.225183,-2.284904,-1.352574,-1.364579,-1.145783,-2.153695,-3.578148,-1.773826,-1.089894,-1.387864,...,-3.468272,-1.38282,-2.818651,-2.140071,-2.643727,-3.125862,-1.93904,-1.897121,-2.019144,-3.166881
25%,-0.6559227,-0.7229099,-0.7759885,-0.5295486,-0.6304909,-0.6809228,-0.3592648,-0.8331828,-0.6033129,-0.5477249,...,-0.1617871,-0.5668021,-0.2753242,-0.6936557,-0.3147959,-0.002680866,-0.6307088,-0.7623295,-0.6184093,0.0483322
50%,-0.1330399,0.02694504,-0.2323687,-0.1623333,-0.3341815,-0.1054178,0.2278525,0.001392073,-0.3323258,-0.2381661,...,0.2108566,-0.2854047,0.2788046,-0.1695397,0.1734242,0.2351782,-0.01582271,-0.1328953,0.1085909,0.3305354
75%,0.6217686,0.8228316,0.3601648,0.1138228,0.120829,0.5951509,0.608322,0.5735645,0.1156738,0.1532558,...,0.7016827,0.1468292,0.6567787,0.7182398,0.6205806,0.6476696,0.6834063,0.6290811,0.53928,0.5624104
max,2.902808,1.954161,2.671592,3.384115,3.204188,2.619911,2.091192,2.568441,3.062953,3.430804,...,1.324081,3.295449,1.36076,2.429671,1.735096,1.077066,2.560689,2.975641,3.258163,0.9751491
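
The `std` row shows 1.008734 instead of exactly 1 because `StandardScaler` divides by the population standard deviation (`ddof=0`), while `describe()` reports the sample standard deviation (`ddof=1`). With 58 clips for this speaker the two differ by a factor of sqrt(58/57), which a short check confirms:

```python
import math

n = 58                     # number of clips for speaker '08' (the count row above)
population_std = 1.0       # what StandardScaler enforces (ddof=0)

# Sample standard deviation (ddof=1) of the same data, as reported by describe().
sample_std = population_std * math.sqrt(n / (n - 1))

print(round(sample_std, 6))
```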


### Train Test splitting

In [None]:
models_dict = {'BYOL_S': X_train_byols, }

In [70]:
# Phase_3: Train Test splitting

X_train_wav2vec, X_test_wav2vec, y_train_wav2vec, y_test_wav2vec = split_train_test(normalised_embeddings_wav2vec, labels, speakers, test_size = 0.30)
X_train_hubert, X_test_hubert, y_train_hubert, y_test_hubert = split_train_test(normalised_embeddings_hubert, labels, speakers, test_size = 0.30)
X_train_byols, X_test_byols, y_train_byols, y_test_byols = split_train_test(normalised_embeddings_byols, labels, speakers, test_size = 0.30)
# X_train_compare, X_test_compare, y_train_compare, y_test_compare = split_train_test(normalised_embeddings_compare, labels, speakers, test_size = 0.30)
# X_train_egemaps, X_test_egemaps, y_train_egemaps, y_test_egemaps = split_train_test(normalised_embeddings_egemaps, labels, speakers, test_size = 0.30)

X_trains = [X_train_wav2vec, X_train_hubert, X_train_byols]
X_tests = [X_test_wav2vec, X_test_hubert, X_test_byols]
y_trains = [y_train_wav2vec, y_train_hubert, y_train_byols]
y_tests = [y_test_wav2vec, y_test_hubert, y_test_byols]

# # Verify
# for i in range(len(models)):
#   print()
#   print()
#   print('MODEL: {}'.format(models[i]))
#   print()
#   print('The shape of X_train is: {}'.format(X_trains[i].shape))
#   print('X_train')
#   print(X_trains[i])
#   print()
#   print('The shape of X_test is: {}'.format(X_tests[i].shape))
#   print('X_test')
#   print(X_tests[i])
#   print()
#   print('The length of y_train is: {}'.format(len(y_trains[i])))
#   print('y_train')
#   print(y_trains[i])
#   print()
#   print('The length of y_test is: {}'.format(len(y_tests[i])))
#   print('y_test')
#   print(y_tests[i])


['11' '12' '13']
['11' '12' '13']
['11' '12' '13']


## 1. Logistic Regression

Defining functions for hyperparameter tuning:

In [71]:

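def get_hyperparams(X_train, X_test, y_train, y_test):
    logreg = LogisticRegression()
    # Among these solvers only 'liblinear' supports the 'l1' penalty,
    # so the grid is split to avoid invalid solver/penalty combinations
    C_values = np.logspace(-4,2,7)
    parameters = [{'penalty' : ['l1','l2'], 'C': C_values, 'solver': ['liblinear']},
                  {'penalty' : ['l2'], 'C': C_values, 'solver': ['newton-cg', 'lbfgs']}]
    # scoring='recall_macro': the values printed as "Accuracy" below are
    # macro-averaged recall (unweighted mean of per-class recall), not plain accuracy
    grid = GridSearchCV(logreg, param_grid = parameters, cv=5, scoring='recall_macro')
    grid.fit(X_train,y_train)
    print('Accuracy :',grid.best_score_)
    print('Best Parameters: {}'.format(grid.best_params_))
    print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
    return grid.best_params_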


Getting best hyperparameters and checking accuracy of the model:

In [72]:
for i in range(len(models)):
    print()
    print('MODEL: {}'.format(models[i]))
    hyperparams = get_hyperparams(X_trains[i], X_tests[i], y_trains[i], y_tests[i])


MODEL: wav2vec
Accuracy : 0.76695852184574
Best Parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.7697487295313382

MODEL: hubert
Accuracy : 0.9218635332921048
Best Parameters: {'C': 0.1, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.8796657793552204

MODEL: byols
Accuracy : 0.9020634920634922
Best Parameters: {'C': 0.01, 'penalty': 'l2', 'solver': 'liblinear'}
Accuracy on test_set: 0.769899975800597


## 2. Support Vector Machines

Hyperparameter Tuning:

In [73]:
def get_hyperparams_svm(X_train, X_test, y_train, y_test):
    svm = SVC()
    parameters = {'C': np.logspace(-2,3,6), 'gamma': np.logspace(-5,2,8), 'degree':[1], 'kernel':['rbf','poly','sigmoid','linear']}
    grid = GridSearchCV(svm, param_grid = parameters, cv=5)                     
    grid.fit(X_train, y_train)
    print('Accuracy:',grid.best_score_)
    print('Best Parameters {}'.format(grid.best_params_))
    print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
    return grid.best_params_

Getting best hyperparameters and checking accuracy:

In [74]:
for i in range(len(models)):
    print()
    print('MODEL: {}'.format(models[i]))
    hyperparams = get_hyperparams_svm(X_trains[i], X_tests[i], y_trains[i], y_tests[i])


MODEL: wav2vec
Accuracy: 0.6903622693096377
Best Parameters {'C': 100.0, 'degree': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
Accuracy on test_set: 0.7350993377483444

MODEL: hubert
Accuracy: 0.924606971975393
Best Parameters {'C': 100.0, 'degree': 1, 'gamma': 0.0001, 'kernel': 'rbf'}
Accuracy on test_set: 0.9271523178807947

MODEL: byols
Accuracy: 0.8878673957621326
Best Parameters {'C': 1.0, 'degree': 1, 'gamma': 0.001, 'kernel': 'sigmoid'}
Accuracy on test_set: 0.8609271523178808


## 3. Random Forest Regression

Defining functions for hyperparameter tuning:

In [84]:
def get_hyperparams_rfr(X_train, X_test, y_train, y_test):
    le = LabelEncoder()
    # le.fit(labels)
    y_train = le.fit_transform(y_train)
    # Reuse the encoding fitted on the training labels so both sets share one mapping
    y_test = le.transform(y_test)

    rfr = RandomForestRegressor()
    parameters = {'n_estimators' : [50,100,200], 'max_features' : ['auto', 'log2', 'sqrt'], 'bootstrap' : [True, False]}
    # The default score of a regressor is R^2, so the values printed as
    # "Accuracy" below are R^2 scores, not classification accuracy
    grid = GridSearchCV(rfr, param_grid = parameters, cv = 5)
    grid.fit(X_train, y_train)
    print('Accuracy:',grid.best_score_)
    print('Best Parameters {}'.format(grid.best_params_))
    print('Accuracy on test_set: {}'.format(grid.score(X_test, y_test)))
    return grid.best_params_

Getting best hyperparameters and checking accuracy of the model:

In [85]:
for i in range(len(models)):
    print()
    print('MODEL: {}'.format(models[i]))
    hyperparams = get_hyperparams_rfr(X_trains[i], X_tests[i], y_trains[i], y_tests[i])  


MODEL: wav2vec
Accuracy: 0.10358126457749048
Best Parameters {'bootstrap': True, 'max_features': 'auto', 'n_estimators': 50}
Accuracy on test_set: 0.0750304361744698

MODEL: hubert
Accuracy: 0.4176771212522422
Best Parameters {'bootstrap': False, 'max_features': 'sqrt', 'n_estimators': 200}
Accuracy on test_set: 0.4825711051920767

MODEL: byols
Accuracy: 0.5073069256859946
Best Parameters {'bootstrap': True, 'max_features': 'auto', 'n_estimators': 100}
Accuracy on test_set: 0.4565980402160863
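
The values above come from `RandomForestRegressor`, whose default score is the coefficient of determination R², so they should not be read as classification accuracy. The sketch below computes both quantities by hand on toy integer-encoded labels to show how they can diverge; the helper functions and the toy predictions are assumptions made for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def accuracy_after_rounding(y_true, y_pred):
    """Accuracy when each regression output is rounded to the nearest class code."""
    return sum(t == round(p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Seven classes encoded as 0..6, as LabelEncoder would do.
y_true = [0, 1, 2, 3, 4, 5, 6]

# Case 1: every prediction is slightly off but rounds to the right class.
slightly_off = [0.4, 1.4, 2.4, 3.4, 4.4, 5.4, 6.4]
# Case 2: six predictions are exact, one confuses class 0 with class 6.
one_confusion = [6, 1, 2, 3, 4, 5, 6]

for name, y_pred in [('slightly_off', slightly_off), ('one_confusion', one_confusion)]:
    print(name, round(r2_score(y_true, y_pred), 3), round(accuracy_after_rounding(y_true, y_pred), 3))
```

In the second case a single confusion between the classes with the smallest and largest codes makes R² negative even though six of seven predictions are correct, because the integer codes impose an arbitrary order on the emotions. A classifier such as `RandomForestClassifier` avoids this issue.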
