Generic-Speaker-Verificator

An updated version of this repository is available at: https://github.com/julik43/Online-Identification-of-New-Speakers

To use this model you need to:

  1. Download a database

    For this work the database used was LibriSpeech, available at: http://www.openslr.org/12/

  2. Prepare the input lists needed to run any of the models

    You need the following list files (a minimal sketch for generating them is shown after this list):

    • train_speakers.txt --> a list of audio files for training, with the proper path

    • valid_speakers.txt --> a list of audio files for validation, with the proper path

    • test_speakers.txt --> a list of audio files for testing, with the proper path
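
As an illustration only (not the repository's own tooling), such a list can be built by walking the database folder and writing one audio path per line. The LibriSpeech split folder names below are assumptions about where the data was unpacked.

```python
# Illustration only: build a speaker list file with one audio path per line.
import os

def write_speaker_list(root_dir, out_file, extension=".flac"):
    """Walk root_dir and write every matching audio file path to out_file."""
    with open(out_file, "w") as f:
        for dirpath, _, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                if name.endswith(extension):
                    f.write(os.path.join(dirpath, name) + "\n")

# The split directories below are assumptions, not paths fixed by the repository.
write_speaker_list("LibriSpeech/train-clean-100", "train_speakers.txt")
write_speaker_list("LibriSpeech/dev-clean", "valid_speakers.txt")
write_speaker_list("LibriSpeech/test-clean", "test_speakers.txt")
```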

To replicate the reported results

Update the paths in these list files so they point to the database you are going to use, LibriSpeech or VoxCeleb.

For this project, the VoxCeleb database was converted from m4a format to flac format using the bash script "change_m4a_to_flac.sh".

Note: this script locates the audio files at the third level of folders below the given path (a Python sketch of the same idea is shown below).
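
The actual conversion script in the repository is bash; the sketch below is only a Python rendering of the same idea, assuming ffmpeg is installed, and mirrors the note above by looking for audios three folder levels below the given path.

```python
# Illustration only (the repository's change_m4a_to_flac.sh is a bash script):
# convert every .m4a found three folder levels below root_dir to .flac,
# assuming ffmpeg is available on the PATH.
import glob
import os
import subprocess

def convert_m4a_to_flac(root_dir):
    pattern = os.path.join(root_dir, "*", "*", "*", "*.m4a")  # third level of folders
    for m4a_path in glob.glob(pattern):
        flac_path = os.path.splitext(m4a_path)[0] + ".flac"
        subprocess.run(["ffmpeg", "-y", "-i", m4a_path, flac_path], check=True)

convert_m4a_to_flac("voxceleb")  # the root folder name is an assumption
```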

ResNet trained with 32 frequencies and LibriSpeech

  1. Remember to update the paths of the audio files you are working with in the train, validation, and test lists.

  2. Be sure that the parameters of resnet_initialization.py are configured as follows (a short sketch of how the spectrogram parameters fit together is shown after this parameter list):

FILE_TRAIN = 'librispeech_train_speakers_initialization'

FILE_VALID = 'librispeech_valid_speakers.txt'

FILE_TEST = 'librispeech_test_speakers.txt'

FILE_LIST = 'librispeech_list_train_speakers'

permited_speakers = 100

WINDOW = 1*16000

MS = 1.0/16000

NPERSEG = int(0.025/MS)

NOVERLAP = int(0.015/MS)

NFFT = NPERSEG

SIZE_FFT = 32

TOTAL_DATA_TRAIN = 12277*2

TOTAL_DATA_VALID = TOTAL_DATA_TRAIN*0.1

TOTAL_DATA_TEST = TOTAL_DATA_TRAIN*0.1

VAD = 0.05
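
Reading these values at 16 kHz: WINDOW is one second of audio, NPERSEG is 400 samples (a 25 ms analysis window), NOVERLAP is 240 samples (15 ms of overlap, i.e. a 10 ms hop), and SIZE_FFT keeps 32 frequencies. The sketch below shows one way these parameters could feed a scipy spectrogram; it is an illustration, not the repository's own feature-extraction code.

```python
# Illustration of the spectrogram parameters above (not the repository's code).
import numpy as np
from scipy import signal

FS = 16000                       # sampling rate implied by MS = 1.0/16000
WINDOW = 1 * FS                  # one second of audio
NPERSEG = int(0.025 * FS)        # 400 samples = 25 ms analysis window
NOVERLAP = int(0.015 * FS)       # 240 samples = 15 ms overlap (10 ms hop)
NFFT = NPERSEG
SIZE_FFT = 32

audio = np.random.randn(WINDOW)  # stand-in for one second of speech
f, t, Sxx = signal.spectrogram(audio, fs=FS, nperseg=NPERSEG,
                               noverlap=NOVERLAP, nfft=NFFT)
# One possible reading of SIZE_FFT: keep the lowest 32 frequency bins
# (the repository may instead rescale the full spectrogram to 32 bins).
Sxx_32 = Sxx[:SIZE_FFT, :]
print(Sxx_32.shape)              # (32, number_of_time_frames)
```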

  3. Be sure that the parameters of resnet.py are configured as follows:

FILE_TRAIN = 'librispeech_train_speakers'

FILE_VALID = 'librispeech_valid_speakers.txt'

FILE_TEST = 'librispeech_test_speakers.txt'

FILE_LIST = 'librispeech_list_train_speakers'

permited_speakers = 100

WINDOW = 1*16000

MS = 1.0/16000

NPERSEG = int(0.025/MS)

NOVERLAP = int(0.015/MS)

NFFT = NPERSEG

SIZE_FFT = 32

TOTAL_DATA_TRAIN = 800000

TOTAL_DATA_VALID = TOTAL_DATA_TRAIN*0.1

TOTAL_DATA_TEST = TOTAL_DATA_TRAIN*0.1

VAD = 0.05

Configure the saver with the correct path to the final weights from the ResNet initialization run:

saver.restore(sess, "resnet_initialization.py_0.01_10/final_weights.ckpt")
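
The line above is the key call. As a reminder of the surrounding TensorFlow 1.x save/restore pattern, here is a self-contained sketch with a stand-in variable; it is not the repository's graph, which lives in resnet.py.

```python
# Sketch of the TensorFlow 1.x save/restore pattern (stand-in graph only;
# the real weights are produced by resnet_initialization.py).
import os
import tensorflow as tf

ckpt_path = "resnet_initialization.py_0.01_10/final_weights.ckpt"
os.makedirs(os.path.dirname(ckpt_path), exist_ok=True)

w = tf.get_variable("w", shape=[2, 2])    # stand-in for the network weights
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, ckpt_path)           # resnet_initialization.py writes this
    saver.restore(sess, ckpt_path)        # resnet.py then restores it
```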

  4. Run the models.

4.1. Run the resnet initialization model --> python resnet_initialization.py

4.2. Run the resnet model --> python resnet.py --learning_rate 0.01

Note: Remember to configure the path of the saver

4.3. Run the resnet model with a smaller learning rate --> python resnet.py --learning_rate 0.001

Note: Remember to configure the path of the saver with the weights of the model from step 4.2.
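
resnet.py is driven by a --learning_rate command-line flag; the snippet below is only a generic argparse illustration of how such a flag is usually wired, and may differ from the repository's own argument handling.

```python
# Generic argparse illustration of a --learning_rate flag (may differ from
# the repository's own argument parsing).
import argparse

parser = argparse.ArgumentParser(description="Train the ResNet speaker model.")
parser.add_argument("--learning_rate", type=float, default=0.01,
                    help="Learning rate used by the optimizer.")
args = parser.parse_args()
print("Training with learning rate:", args.learning_rate)
```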

Results:

train: 0.95858375

validation: 0.93198125

test: 0.92392500

Module VGG trained with 32 frequencies and LibriSpeech

  1. Remember to update the paths of the audio files you are working with in the train, validation, and test lists.

  2. Be sure that the parameters of module_vgg.py are configured as follows:

FILE_TRAIN = 'librispeech_train_speakers'

FILE_VALID = 'librispeech_valid_speakers.txt'

FILE_TEST = 'librispeech_test_speakers.txt'

WINDOW = 1*16000

MS = 1.0/16000

NPERSEG = int(0.025/MS)

NOVERLAP = int(0.015/MS)

NFFT = NPERSEG

SIZE_FFT = 32

TOTAL_DATA_TRAIN = 800000

TOTAL_DATA_VALID = TOTAL_DATA_TRAIN*0.1

TOTAL_DATA_TEST = TOTAL_DATA_TRAIN*0.1

VAD = 0.05

  3. Run the model.

3.1. Run the module VGG model --> python module_vgg.py

Results:

train: 0.92258267

validation: 0.89274333

test: 0.92220000

Module VGG trained with 256 frequencies and LibriSpeech

  1. Remember to update the paths of the audio files you are working with in the train, validation, and test lists.

  2. Be sure that the parameters of module_vgg.py are configured as follows:

FILE_TRAIN = 'librispeech_train_speakers'

FILE_VALID = 'librispeech_valid_speakers.txt'

FILE_TEST = 'librispeech_test_speakers.txt'

WINDOW = 1*16000

MS = 1.0/16000

NPERSEG = int(0.025/MS)

NOVERLAP = int(0.015/MS)

NFFT = NPERSEG

SIZE_FFT = 256

TOTAL_DATA_TRAIN = 800000

TOTAL_DATA_VALID = TOTAL_DATA_TRAIN*0.1

TOTAL_DATA_TEST = TOTAL_DATA_TRAIN*0.1

VAD = 0.05

  3. Run the model.

3.1. Run the module VGG model --> python module_vgg.py

Results:

train: 0.93572700

validation: 0.89579833

test: 0.91178750

Module VGG trained with 32 frequencies and VoxCeleb

  1. Remember to update the paths of the audio files you are working with in the train, validation, and test lists.

  2. Be sure that the parameters of module_vgg.py are configured as follows:

FILE_TRAIN = 'voxceleb_train_speakers'

FILE_VALID = 'voxceleb_valid_speakers.txt'

FILE_TEST = 'voxceleb_test_speakers.txt'

WINDOW = 1*16000

MS = 1.0/16000

NPERSEG = int(0.025/MS)

NOVERLAP = int(0.015/MS)

NFFT = NPERSEG

SIZE_FFT = 32

TOTAL_DATA_TRAIN = 65642722

TOTAL_DATA_VALID = 3623722

TOTAL_DATA_TEST = 3623722

VAD = 0.08

  3. Run the model.

3.1. Run the module VGG model --> python module_vgg.py --num_epochs 8

Results:

train: 0.87228221

validation: 0.84642973
