VoiceID: An End-to-End Text-Independent Speaker Verification System

Introduction:

VoiceID is an end-to-end speaker verification system that confirms a speaker's identity by comparing the representation of an incoming test utterance (command) with a learned set of speaker-dependent enrollment representations.
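As a rough illustration of that comparison (a sketch of the general idea, not the scoring code in this repository): embed each enrollment phrase, average the embeddings into a speaker centroid, and accept the test utterance if its cosine similarity to the centroid clears a threshold. The embedding step and the threshold value below are placeholder assumptions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_embeddings, test_embedding, threshold=0.7):
    """Accept the test utterance if it is close enough to the centroid
    of the speaker's enrollment embeddings.

    enroll_embeddings : list of 1-D numpy arrays, one per enrollment phrase
    test_embedding    : 1-D numpy array for the incoming utterance
    threshold         : hypothetical decision threshold; in practice it is
                        tuned on a development set (e.g. at the EER point)
    """
    centroid = np.mean(enroll_embeddings, axis=0)
    score = cosine_similarity(centroid, test_embedding)
    return score >= threshold, score
```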

LSTM baseline:

End-to-End:

Dataset:

In this project, we introduce a new limited-vocabulary spoken-commands dataset, which we named 'PS60k'. PS60k contains a total of 60K utterances of the 'Hey Siri' and 'Hey Portal' spoken commands. We recorded 60 speakers, 20 each from China, India, and the United States of America. Each speaker speaks 20 different utterances (10 'Hey Siri' commands and 10 'Hey Portal' commands). These 1,200 original recordings are mixed with 10 different types of noise at 5 SNR levels (-5, 0, 10, 15, and 25 dB), yielding 1,200 × 10 × 5 = 60,000 utterances. The PS60k dataset is available on request. To request the dataset, please contact Piyush Vyas at piyush@iu.edu.
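For concreteness, the mixing step can be pictured as follows. This is a minimal sketch of adding a noise recording to a clean utterance at a target SNR; the function and its details are our illustration, not the exact code in gen_data.py:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a noise signal into a speech signal at a target SNR (in dB).

    speech, noise : 1-D float numpy arrays at the same sample rate
    snr_db        : desired speech-to-noise ratio in decibels
    """
    # Loop or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Applying this once per (utterance, noise type, SNR level) combination yields the 1,200 × 10 × 5 = 60,000 files of PS60k.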

Code distribution:

The VoiceID repository contains three Python files:

  1. gen_data.py: Generates the PS60k dataset from the original speaker utterances and the 10 noise recordings (see the mixing sketch above).
  2. baseline.py: Complete source code for the baseline LSTM system, including training and testing of the baseline model (a rough sketch of such an encoder follows this list).
  3. e2e.py: Complete source code for the proposed end-to-end system, including training and testing of the end-to-end model.
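For orientation, here is a hypothetical sketch of what an LSTM speaker encoder like the one in baseline.py can look like; the feature type, layer sizes, and embedding dimension below are assumptions, not the repository's actual architecture:

```python
import torch
import torch.nn as nn

class LSTMSpeakerEncoder(nn.Module):
    """Maps a sequence of acoustic frames to a fixed-size speaker embedding."""

    def __init__(self, n_mels=40, hidden=256, n_layers=3, emb_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, n_layers, batch_first=True)
        self.proj = nn.Linear(hidden, emb_dim)

    def forward(self, x):
        # x: (batch, frames, n_mels) log-mel features (assumed input)
        _, (h, _) = self.lstm(x)
        emb = self.proj(h[-1])  # final hidden state of the last layer
        return emb / emb.norm(dim=1, keepdim=True)  # L2-normalized embedding
```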

Basic requirements:

  1. python >= 3.7.4
  2. librosa
  3. libsndfile
  4. audioread
  5. scikit-learn
  6. pytorch == 1.3.0
  7. cudatoolkit == 10.1
  8. numpy

Note: Training the model requires at least 16 GB of GPU memory. The code supports multi-GPU training, but all GPUs must be on the same node.
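Before starting a long training run, it is worth confirming that the environment matches these requirements. A minimal sanity check (the 16 GB figure comes from the note above; the script itself is our suggestion, not part of the repository):

```python
import torch

print("PyTorch:", torch.__version__)            # expected: 1.3.0
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    mem_gb = props.total_memory / 1024 ** 3
    print(f"GPU {i}: {props.name}, {mem_gb:.1f} GB")
    if mem_gb < 16:
        print(f"  warning: GPU {i} has less than the recommended 16 GB")
```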

How to run:

  1. Generate the data:
    i. Clone the VoiceID repository.
    ii. Request the original dataset from the author.
    iii. Download and store the dataset inside the cloned repository, at the same level as the source code.
    iv. Make a new directory named PS60k.
    v. Run gen_data.py to generate the PS60k dataset.
  2. Train and test the baseline LSTM model:
    i. Once the PS60k dataset is ready, run baseline.py to train and test the baseline LSTM model.
  3. Train and test the proposed end-to-end model:
    i. Once the PS60k dataset is ready, run e2e.py to train and test the proposed end-to-end model.
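Both scripts also evaluate the trained model. The repository's exact reporting is not shown here, but speaker verification systems are conventionally scored by the equal error rate (EER); a standard computation using scikit-learn (already in the requirements) is sketched below:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the operating point where the false-acceptance rate
    equals the false-rejection rate.

    labels : 1 for same-speaker (target) trials, 0 for impostor trials
    scores : higher score = more likely the same speaker
    """
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # closest crossing point
    return (fpr[idx] + fnr[idx]) / 2.0

# Example with made-up trial scores and ground-truth labels:
labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.91, 0.84, 0.62, 0.70, 0.35, 0.20])
print(f"EER: {equal_error_rate(labels, scores):.2%}")
```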

References:

  1. E. Marchi, S. Shum, K. Hwang, S. Kajarekar, S. Sigtia, H. Richards, R. Haynes, Y. Kim, and J. Bridle, "Generalised discriminative transform via curriculum learning for speaker recognition," in Proc. ICASSP 2018.
  2. J.-W. Jung, H.-S. Heo, J.-H. Kim, H.-J. Shim, and H.-J. Yu, "RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification," in Proc. INTERSPEECH 2019.
