# GMM-UBM Modelling

![title](docs/img/GMM-UBM.png)


To extract unbiased measures of convergence, we used a data-driven, text-independent, automatic speaker identification technique based on a Gaussian Mixture Model (GMM) adapted from a Universal Background Model (UBM). The Gaussian components model the underlying broad phonetic features (captured here by MFCCs) that characterize a speaker's voice, and the approach rests on a well-understood statistical model.

A 32-component UBM was trained on the pooled Solo_Pre speech data of all participants. Individual speaker-dependent models were then obtained via maximum a posteriori (MAP) adaptation of the UBM to the Solo_Pre speech data of each speaker separately. The GMM-UBM has several hyperparameters (for example, the number of Gaussian components and the MAP regulation factor), and their settings can affect the performance of the speaker-dependent models.
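
For intuition, mean-only MAP adaptation can be sketched in NumPy as follows. This is a minimal illustration of the standard relevance-MAP update on toy data, not sidekit's implementation; the function name `map_adapt_means` and the toy values are assumptions, and `r` plays the role of the regulation factor.

```python
import numpy as np

def map_adapt_means(ubm_means, ubm_vars, ubm_weights, frames, r=3.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM (illustrative sketch)."""
    # log N(x | mu_k, diag(var_k)) for every frame and component -> shape (T, K)
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2 * np.pi * ubm_vars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss
    # normalise to get the posterior probability of each component per frame
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)

    n_k = post.sum(axis=0)                          # zero-order statistics, shape (K,)
    first = post.T @ frames                         # first-order statistics, shape (K, D)
    e_k = first / np.maximum(n_k, 1e-10)[:, None]   # per-component data mean
    alpha = (n_k / (n_k + r))[:, None]              # adaptation coefficient in [0, 1)
    # components that saw a lot of data move towards it; the rest stay near the UBM
    return alpha * e_k + (1.0 - alpha) * ubm_means

# toy example (assumed values): a 2-component UBM and frames clustered near (1, 1)
rng = np.random.default_rng(0)
ubm_means = np.array([[0.0, 0.0], [5.0, 5.0]])
ubm_vars = np.ones((2, 2))
ubm_weights = np.array([0.5, 0.5])
frames = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(200, 2))
speaker_means = map_adapt_means(ubm_means, ubm_vars, ubm_weights, frames, r=3.0)
```

A larger `r` keeps the adapted means closer to the UBM, which is why this value is worth tuning on held-out data.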

A cross-validation technique was used to choose the optimal hyperparameter settings. Solo_Post speech data were used as the validation set, and each speaker-dependent model's performance was verified against the UBM.
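
The selection step can be sketched as a small grid search. Everything here is hypothetical scaffolding: `score_fn` stands in for the training and scoring cells of the notebook, the candidate grids are made up, and identification accuracy is only one possible validation metric (the text above does not say which metric was used).

```python
import itertools
import numpy as np

def identification_accuracy(scoremat, tar):
    """Fraction of segments whose best-scoring model is the true speaker.

    scoremat and tar both have shape (n_models, n_segments); tar is boolean.
    """
    best_model = np.argmax(scoremat, axis=0)
    return float(np.mean(tar[best_model, np.arange(scoremat.shape[1])]))

def grid_search(score_fn, tar, n_components_grid, regulation_grid):
    """Return (best_accuracy, best_n_components, best_regulation_factor)."""
    best = None
    for n_comp, reg in itertools.product(n_components_grid, regulation_grid):
        acc = identification_accuracy(score_fn(n_comp, reg), tar)
        if best is None or acc > best[0]:
            best = (acc, n_comp, reg)
    return best

# toy usage: two speakers, two validation segments, one per speaker
tar = np.array([[True, False], [False, True]])

def fake_score_fn(n_comp, reg):
    # stand-in scorer: pretend only the (32, 3) setting separates the speakers
    good = np.where(tar, 1.0, -1.0)
    return good if (n_comp, reg) == (32, 3) else -good

best = grid_search(fake_score_fn, tar, [16, 32, 64], [3, 10])
```

In practice `score_fn` would retrain the UBM and speaker models for each setting and return the validation score matrix.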

## Run the notebook

In [22]:
# init
import numpy as np
import os
import sidekit
import warnings
warnings.filterwarnings('ignore')
import gmm_scoring_singleThread
from sidekit.bosaris import Scores
from IPython.display import clear_output

# Read from the wav folder, extract MFCCs, and save them in the audio_features folder.
# Train the UBM and the speaker models, and save them in the data folder.
# Finally, compute the prediction scores and save them in the data folder.
# Use HDFCompass.exe (https://support.hdfgroup.org/projects/compass/download.html)
# to inspect the contents of the .h5 files and modify accordingly.

In [23]:
# load wav files, extract features, and save them to the feature folder
audioDir = 'wav'
featureDir = 'audio_features'

speakers = np.array(["elvira", "marion"])


# list the wav files (ignoring anything else in the folder) and strip the extension
fileList = [f[:-len(".wav")] for f in os.listdir(audioDir) if f.endswith(".wav")]
for name in fileList:
    print(name)



# feature extraction configuration (read from fileList and save mfcc features in 
# audio_features folder)
extractor = sidekit.FeaturesExtractor(audio_filename_structure=audioDir+"/{}.wav",
                                      feature_filename_structure="./audio_features/{}.h5",
                                      sampling_frequency=22050,
                                      lower_frequency=0,
                                      higher_frequency=6955.4976,
                                      filter_bank="log",
                                      filter_bank_size=40,
                                      window_size=0.025,
                                      shift=0.01,
                                      ceps_number=6,
                                      pre_emphasis=0.97,
                                      save_param=["energy", "cep"],
                                      keep_all_features=True)


# remove any stale feature file before extracting, so each run starts clean
for name in fileList:
    feature_path = os.path.join(featureDir, name + '.h5')
    try:
        os.remove(feature_path)
    except OSError:
        pass
    extractor.save(name)
    
clear_output()

### GMM-UBM (Training)

File labels for training, validation, and testing:
train = `*-pre`,
validation = `*-post`,
test = `*-duet`
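
Assuming file names follow a `<speaker>-<condition>` pattern (for example a hypothetical `elvira-pre.wav`), the split into training, validation, and test sets could be sketched as below. The helper name `split_by_condition` and the example file names are illustrative and not part of the notebook.

```python
from collections import defaultdict

# condition prefix -> role in the pipeline (taken from the labels listed above)
ROLES = {"pre": "train", "post": "validation", "duet": "test"}

def split_by_condition(names):
    """Group '<speaker>-<condition>' names by role; skip unrecognised names."""
    groups = defaultdict(list)
    for name in names:
        speaker, sep, condition = name.partition("-")
        if not sep:
            continue  # no '-' separator, so not a labelled recording
        for prefix, role in ROLES.items():
            if condition.startswith(prefix):
                groups[role].append((speaker, name))
                break
    return dict(groups)

# hypothetical file names (extension already stripped)
example = ["elvira-pre", "marion-pre", "elvira-post", "marion-duet", "notes"]
groups = split_by_condition(example)
```

Matching on the whole prefix is a little more robust than testing a single character of the condition label, at the cost of assuming the labels are spelled exactly `pre`, `post`, and `duet`.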

In [24]:
# create the list of files for UBM training and save it in data/UBM_List.txt
ubmList =  []
for i in range(0,len(fileList)):
    a = fileList[i].replace('.h5', '')
    b = fileList[i].split('-')
    if b[1][1]== 'r':#pre or train
        ubmList.append(a)
        
with open('data/UBM_List.txt','w') as of:
    of.write("\n".join(ubmList))   

In [25]:
# create the training files and convert them to the sidekit file format (i.e. IdMap)
train_subjects_models = []
train_audio_files = []
for i in range(0,len(fileList)):
    a = fileList[i].replace('.h5', '')
    b = fileList[i].split('-')
    if b[1][1]== 'r':#pre or train
        train_subjects_models.append(b[0])
        train_audio_files.append(a)
        
    
# Create and fill the IdMap
train_idmap = sidekit.IdMap()
train_idmap.leftids = np.asarray(train_subjects_models)
train_idmap.rightids = np.asarray(train_audio_files)
train_idmap.start = np.empty(train_idmap.rightids.shape, '|O')
train_idmap.stop = np.empty(train_idmap.rightids.shape, '|O')
train_idmap.validate()
train_idmap.write('data/train.h5')

In [26]:
# create the validation files and convert them to the sidekit file formats (i.e. Ndx, Key)
validate_audio_files = []
validate_subjects_models = []
for i in range(0,len(fileList)):
    a = fileList[i].replace('.h5', '')
    b = fileList[i].split('-')
    if b[1][1]== 'o':#post
        validate_subjects_models.append(b[0])  
        validate_audio_files.append(a)
        
        
ndx = sidekit.Ndx()
ndx.modelset = speakers
ndx.segset = np.array(validate_audio_files)
ndx.trialmask = np.ones((len(speakers),len(ndx.segset)), dtype='bool')
ndx.validate()           
ndx.write('data/validation.h5')

segments = [i.split('-', 1)[0] for i in ndx.segset]
seg_index = np.arange(0,len(ndx.segset))

key = sidekit.Key()
key.modelset = ndx.modelset
key.segset = ndx.segset
key.tar = np.zeros((len(speakers),len(ndx.segset)), dtype='bool')
key.non = np.zeros((len(speakers),len(ndx.segset)), dtype='bool')
for s in range(0,len(key.modelset)):
    a = [i for i, x in enumerate(segments) if x == key.modelset[s]]
    a = np.asarray(a)
    key.tar[s, a] = True
    b = np.delete(seg_index, a)    
    key.non[s, b] = True

key.validate()
key.write('data/validation_key.h5')    
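
The target and non-target masks built in the loop above can also be expressed with NumPy broadcasting. This is a sketch on toy data; the segment names are hypothetical, and the resulting arrays would be assigned to `key.tar` and `key.non` in the same way.

```python
import numpy as np

speakers = np.array(["elvira", "marion"])
# hypothetical validation segments named '<speaker>-post...'
segset = np.array(["elvira-post1", "marion-post1", "elvira-post2"])

# speaker label of each segment (text before the first '-')
seg_speakers = np.array([name.split("-", 1)[0] for name in segset])

# tar[m, s] is True when segment s was spoken by the speaker of model m
tar = speakers[:, None] == seg_speakers[None, :]
# every remaining (model, segment) pair is a non-target trial
non = ~tar
```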


In [27]:
# Create a FeaturesServer to load features and feed the other methods
features_server = sidekit.FeaturesServer(features_extractor=None,
                                         feature_filename_structure="./audio_features/{}.h5",
                                         sources=None,
                                         dataset_list=["energy", "cep"],
                                         mask="[0-5]",
                                         feat_norm="cmvn",
                                         global_cmvn=None,
                                         dct_pca=False,
                                         dct_pca_config=None,
                                         sdc=False,
                                         sdc_config=None,
                                         delta=True,
                                         double_delta=True,
                                         delta_filter=None,
                                         context=None,
                                         traps_dct_nb=None,
                                         rasta=True,
                                         keep_all_features=True)
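
The `feature_size=18` passed to the `StatServer` further down appears to follow from this configuration: the mask `[0-5]` keeps 6 coefficients, and the delta and double-delta features triple that to 18. The sketch below illustrates the stacking, using `np.gradient` as a simple stand-in for the time derivative; sidekit computes deltas with its own filter, so the values will differ even though the shapes match.

```python
import numpy as np

def add_deltas(static):
    """Append first and second time-derivatives to a (frames, dims) feature matrix."""
    delta = np.gradient(static, axis=0)         # first derivative along time
    double_delta = np.gradient(delta, axis=0)   # second derivative along time
    return np.hstack([static, delta, double_delta])

# toy stand-in for 100 frames of the 6 masked coefficients
static = np.random.default_rng(0).normal(size=(100, 6))
features = add_deltas(static)
```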

In [28]:
# load train, validation, validation_key, UBM files for GMM-UBM training
train_idmap = sidekit.IdMap('data/train.h5')
validation_ndx = sidekit.Ndx('data/validation.h5')
key = sidekit.Key('data/validation_key.h5')
with open('data/UBM_List.txt') as inputFile:
    ubmList = inputFile.read().split('\n')

### GMM-UBM (Testing)

In [29]:
# hyperparameters
distribNb = 32  # number of GMM components
regulation_factor = 3  # MAP regulation factor

In [30]:
# train UBM
ubm = sidekit.Mixture()
llk = ubm.EM_split(features_server, ubmList, distribNb, save_partial=False)
ubm.write('data/ubm_model.h5')
#ubm.read('data/ubm_model.h5')


In [31]:
# Workaround: the training step above sometimes hangs because sidekit uses
# multiprocessing (a proper way to kill all spawned Python processes is still needed).
# Reload the saved UBM from disk instead.
ubm = sidekit.Mixture()
ubm.read('data/ubm_model.h5')

In [32]:
# train speaker model (Adapt the GMM speaker models from the UBM via a MAP adaptation)
enroll_stat = sidekit.StatServer(train_idmap,distrib_nb=distribNb,feature_size=18)
enroll_stat.accumulate_stat(ubm=ubm,feature_server=features_server, seg_indices=range(enroll_stat.segset.shape[0]))
enroll_stat.write('data/stat_enroll.h5')

# MAP-adapt the UBM means to each speaker's enrollment statistics
enroll_sv = enroll_stat.adapt_mean_map_multisession(ubm, regulation_factor)
enroll_sv.write('data/sv_enroll.h5')
#enroll_sv.read('data/sv_enroll.h5')


In [33]:
# score the validation trials and save the result in data/scores.h5
s = np.zeros(validation_ndx.trialmask.shape)
los = np.array_split(np.arange(validation_ndx.segset.shape[0]),1)
for idx in los:
    gmm_scoring_singleThread.gmm_scoring_singleThread(ubm, enroll_sv, validation_ndx, features_server, s,idx)


score = Scores()
score.scoremat = s
score.modelset = validation_ndx.modelset
score.segset = validation_ndx.segset
score.scoremask = validation_ndx.trialmask
score.write('data/scores.h5')   

clear_output()
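
Each entry of the score matrix is, conceptually, a log-likelihood ratio between a speaker model and the UBM, averaged over the frames of a segment. The sketch below shows that computation for diagonal-covariance GMMs in plain NumPy. It is an assumption about what `gmm_scoring_singleThread` computes (its source is not shown here), and the function names and toy values are hypothetical.

```python
import numpy as np

def gmm_mean_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]              # shape (T, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    # log-sum-exp over components, then average over frames
    return float(np.mean(np.logaddexp.reduce(np.log(weights) + log_gauss, axis=1)))

def llr_score(frames, speaker_means, ubm_weights, ubm_means, ubm_vars):
    """Log-likelihood ratio of a mean-adapted speaker model against the UBM."""
    speaker = gmm_mean_loglik(frames, ubm_weights, speaker_means, ubm_vars)
    background = gmm_mean_loglik(frames, ubm_weights, ubm_means, ubm_vars)
    return speaker - background

# toy example (assumed values): the speaker model shifts one UBM component
rng = np.random.default_rng(1)
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[0.0, 0.0], [4.0, 4.0]])
ubm_vars = np.ones((2, 2))
speaker_means = ubm_means + np.array([[1.0, 1.0], [0.0, 0.0]])

target_frames = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(300, 2))
impostor_frames = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(300, 2))
target_score = llr_score(target_frames, speaker_means, ubm_weights, ubm_means, ubm_vars)
impostor_score = llr_score(impostor_frames, speaker_means, ubm_weights, ubm_means, ubm_vars)
```

A positive score means the segment is better explained by the speaker model than by the UBM; a negative score means the opposite.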