mohamad-hasan-sohan-ajini/VoxSRC-20

The VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

Implementations of several loss functions and models for training speaker recognition models on the VoxCeleb dataset.

Data Preparation

Dev csv

Extract the data and create a csv-formatted file with one line per utterance, giving the speaker ID followed by the full path to the wav file:

ID0 /full/path/to/wav
ID1 /full/path/to/wav
...
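As an illustration, the dev csv could be generated with a small script along the following lines. This is a minimal sketch, not part of the repo: the function name `build_dev_csv` is hypothetical, and it assumes that the first directory level below the data root names the speaker (as in the usual VoxCeleb layout), that fields are separated by a single space as shown above, and that paths contain no spaces.

```python
from pathlib import Path


def build_dev_csv(data_root, csv_path):
    """Write one 'SPEAKER_ID /full/path/to/wav' line per wav file under data_root.

    Hypothetical helper; returns the number of lines written.
    """
    root = Path(data_root).resolve()
    count = 0
    with open(csv_path, "w") as f:
        for wav in sorted(root.rglob("*.wav")):
            parts = wav.relative_to(root).parts
            if len(parts) < 2:
                # A wav file directly under the root has no speaker directory.
                continue
            # Assumption: the first directory below the root is the speaker ID.
            speaker_id = parts[0]
            f.write(f"{speaker_id} {wav}\n")
            count += 1
    return count
```

Calling `build_dev_csv("/path/to/voxceleb/wav", "dev.csv")` would then produce a file that can be passed to `--csv-path`, provided the assumptions above match your data layout.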

Eval csv

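The eval csv lists one trial per line: a label (1 for a same-speaker pair, 0 for a different-speaker pair) followed by the full paths of the two wav files: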

1 /full/path/to/spk_x/wav1 /full/path/to/spk_x/wav2
0 /full/path/to/spk_x/wav1 /full/path/to/spk_y/wav1
...
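For reference, trials in this format can be read back with a few lines of Python. The sketch below is illustrative only: `read_eval_csv` is a hypothetical name, not a function from this repo, and it assumes whitespace-separated fields and paths without spaces.

```python
def read_eval_csv(csv_path):
    """Return a list of (label, wav_a, wav_b) trials from an eval csv.

    Hypothetical helper; label is 1 for same-speaker pairs and 0 otherwise.
    """
    trials = []
    with open(csv_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            label, wav_a, wav_b = line.split(maxsplit=2)
            trials.append((int(label), wav_a, wav_b))
    return trials
```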

Train With Your Custom Setup

The code is modular: you can combine any trunk model with any pooling layer, then train the network with the criterion of your choice:

python3.8 trainer.py --csv-path /path/to/csv --trunk-net resnet --lr 0.003 --batch-size 64 --polling-net tap --criterion cosface --m 0.1 --s 20 --criterion-lr 0.001

See opts.py for the full list of options.
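To give a feel for what the `--m` and `--s` values in the command above control, here is a minimal sketch of the CosFace (large margin cosine) logit computation. It assumes that `--m` is the additive cosine margin and `--s` the scale, which is the standard CosFace formulation but is not confirmed by this README. The function names are hypothetical, and the code is plain Python for readability; the repo's own criterion may differ in detail.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosface_logits(embedding, class_weights, target, m=0.1, s=20.0):
    """Return s * (cos(theta_j) - m * [j == target]) for every class j."""
    e = _normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        cos = sum(a * b for a, b in zip(e, _normalize(weight)))
        if idx == target:
            cos -= m  # the margin is applied to the true class only
        logits.append(s * cos)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Example: a 2-D embedding aligned with class 0 out of two classes.
logits = cosface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], target=0)
loss = cross_entropy(logits, target=0)
```

Because the margin lowers the true-class logit, the loss stays higher than plain scaled-cosine softmax for the same embedding, which pushes the network toward larger angular separation between speakers.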

Trunk Models

resnet34
resnet34se
TDS

Pooling Layers

tap
sap

Criterions

cosface
psge2e*
prototypical

*psge2e (pseudo GE2E loss): Unlike the original GE2E loss, it learns the speaker representations.

This repo is under heavy construction, and the lists above will grow.

TODO

  • Add feature backend option: [librosa | torchaudio]
  • Add feature backend option: [CPU | GPU]
