kongkip/SpeakerRecognition

SpeakerRecognition

This project is a complete example of speaker recognition (classification) on an Android device using deep learning. The model tries to distinguish the voices of five prominent leaders: Benjamin Netanyahu, Jens Stoltenberg, Julia Gillard, Margaret Thatcher, and Nelson Mandela. I used this TensorFlow Example as a starter.

Dataset

The dataset can be downloaded from Kaggle. It contains speeches by the leaders listed above; each audio clip is one second long and PCM-encoded at a 16,000 Hz sample rate.
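As a quick sanity check on the clip format described above, a WAV file's sample rate and duration can be read with Python's standard `wave` module. This is only a sketch: the helper `describe_wav` is hypothetical, and the in-memory clip is a synthetic stand-in (16-bit mono silence, both assumptions) rather than a file from the dataset.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate, n_samples, duration_seconds) for a PCM WAV source."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        n_samples = wav.getnframes()
    return rate, n_samples, n_samples / rate


# Synthetic stand-in for a dataset clip: one second of silence at 16 kHz.
# 16-bit samples and a single channel are assumptions, not stated by the dataset.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

rate, n_samples, duration = describe_wav(buffer)
print(rate, n_samples, duration)
```

Passing a file path instead of the buffer should work the same way for a real clip.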

The Model Architecture

The model combines audio feature extraction with a neural network. The first layer of the model does the audio processing: it computes spectrograms or mel spectrograms using TensorFlow Keras layers. Building the audio processing out of Keras layers makes it easier to convert the model to TFLite, since the model is deployed to Android, and lets the features be computed on the GPU during training. The audio processing layers are implemented in a library called Spela.
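To make the idea of audio processing inside a Keras layer concrete, here is a minimal sketch of a layer that turns raw waveforms into magnitude spectrograms with `tf.signal.stft`. It is not Spela's API: the class name `MagnitudeSpectrogram` and the frame settings (256-sample frames, 128-sample hop) are assumptions chosen for illustration.

```python
import tensorflow as tf


class MagnitudeSpectrogram(tf.keras.layers.Layer):
    """Hypothetical layer: raw waveform -> STFT magnitude spectrogram."""

    def __init__(self, frame_length=256, frame_step=128, **kwargs):
        super().__init__(**kwargs)
        # Assumed frame settings; the project's actual values are not given here.
        self.frame_length = frame_length
        self.frame_step = frame_step

    def call(self, waveform):
        # waveform has shape (batch, samples) and dtype float32.
        stft = tf.signal.stft(
            waveform,
            frame_length=self.frame_length,
            frame_step=self.frame_step,
        )
        # Keep only the magnitude of the complex STFT.
        return tf.abs(stft)


# Two random one-second clips at 16 kHz stand in for real audio.
waveforms = tf.random.normal([2, 16000])
spectrograms = MagnitudeSpectrogram()(waveforms)
print(spectrograms.shape)  # (batch, frames, frequency bins)
```

Because the transform is an ordinary layer, it runs on whatever device the rest of the model runs on and travels with the model when it is saved or converted.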

The architecture is a 2D convolutional layer followed by a max pooling layer and then a flattening layer. The last layer is a Dense layer with a softmax activation.
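The layer sequence described above could be sketched in Keras roughly as follows. The input shape, filter count, kernel size, and ReLU activation are all assumptions made for illustration; only the layer order and the five-way softmax output come from the description.

```python
import tensorflow as tf

NUM_SPEAKERS = 5  # the five leaders listed in the introduction
INPUT_SHAPE = (124, 129, 1)  # assumed spectrogram shape: (frames, bins, channels)

model = tf.keras.Sequential([
    tf.keras.Input(shape=INPUT_SHAPE),
    # Filter count, kernel size, and activation are illustrative guesses.
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_SPEAKERS, activation="softmax"),
])

# A single all-zero input yields one probability per speaker.
probs = model(tf.zeros([1, *INPUT_SHAPE]))
print(model.output_shape)
```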

Training

The model is trained by running the following script:

python train.py -data_dirs 16000_pcm_speeches

where 16000_pcm_speeches is the dataset directory. During training, model accuracy is computed, and a confusion matrix is displayed at the end of each epoch.
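For readers unfamiliar with confusion matrices, the sketch below shows how one can be built from true and predicted class indices, and how accuracy falls out of its diagonal. The labels are toy values invented for the example, not results from this project, and the helper `confusion_matrix` is hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
# Correct predictions sit on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)
```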

The confusion matrix looks like

Training creates a checkpoint folder and saves the best model weights there.
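One common way to keep only the best weights in Keras is the `ModelCheckpoint` callback with `save_best_only=True`. The sketch below assumes that approach; the monitored metric (`val_accuracy`), the file name, the tiny stand-in model, and the random data are all illustrative and may differ from what train.py does.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny stand-in model and random data, used only to exercise the callback.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

checkpoint_dir = tempfile.mkdtemp()  # hypothetical checkpoint folder
weights_path = os.path.join(checkpoint_dir, "best.weights.h5")

# Overwrite the file only when validation accuracy improves (assumed metric).
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    weights_path,
    monitor="val_accuracy",
    save_best_only=True,
    save_weights_only=True,
)

x = np.random.rand(40, 8).astype("float32")
y = np.random.randint(0, 5, size=40)
model.fit(x, y, validation_split=0.25, epochs=2, verbose=0, callbacks=[checkpoint])
print(os.path.exists(weights_path))
```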

Model Conversion

The saved weights are loaded and the model is converted to TFLite using the following script:

python reconstruct_model_and_convert.py -checkpoint_dir ml/checkpoints/spectrogram_model/20200110-124824/
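The general pattern behind such a script (rebuild the architecture, load the saved weights, convert) might look like the sketch below. It is an assumption about the approach, not the contents of reconstruct_model_and_convert.py: `build_model` is a hypothetical stand-in with guessed layer settings, the audio processing layer is omitted, and the weights file is created on the spot in a temporary directory instead of coming from a real training run.

```python
import os
import tempfile

import tensorflow as tf


def build_model():
    """Hypothetical stand-in for the project's model definition."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(124, 129, 1)),  # assumed spectrogram shape
        tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])


work_dir = tempfile.mkdtemp()
weights_path = os.path.join(work_dir, "best.weights.h5")
# Stand-in for the weights file a training run would have produced.
build_model().save_weights(weights_path)

# Rebuild the same architecture and restore the saved weights.
model = build_model()
model.load_weights(weights_path)

# Convert the Keras model to a TFLite flatbuffer and write it to disk.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

tflite_path = os.path.join(work_dir, "model.tflite")
with open(tflite_path, "wb") as f:
    f.write(tflite_bytes)
print(len(tflite_bytes))
```

The resulting .tflite file is what an Android app would typically bundle and load through the TFLite interpreter.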

Android Demo

The Android app is written in Kotlin.
