VOICE SEPARATION

Paper: Improving Voice Separation by Incorporating End-To-End Speech Recognition

Samples

ConvTasNet	Oracle(With Estimated Input)	Oracle(With Target Input)
Target1	Target1	Target1
Target2	Target2	Target2
Estimated1	Estimated1	Estimated1
Estimated2	Estimated2	Estimated2
Mixture	Mixture	Mixture

Trained Model

ConvTasNet_Model
Oracle_Model
ASR_Model

Downloading the dataset

Install the packages axel, youtube-dl, and parallel with the following command:

apt-get install axel youtube-dl parallel

Install the Python dependencies listed in requirements.txt:

pip install -r requirements.txt

Download the CSV files containing the YouTube IDs of the videos:

Train CSV
Test CSV

Run the shell script getDataset.sh from the preprocessing directory:

cd preprocessing
njobs=<num-parallel-download-threads> numdownload=<num-files-to-download> ./getDataset.sh <path-to-csv-file> <output-directory-mp3> <output-directory-wav>

For example:

njobs=20 numdownload=1000000 ./getDataset.sh avspeech_test.csv test_mp3 test_wav
njobs=20 numdownload=1000000 ./getDataset.sh avspeech_train.csv train_mp3 train_wav

If you do not want to download the entire dataset, set numdownload to the number of audio files you want to download.
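If you prefer to drive the download from Python, the call above can be sketched as follows. The only facts taken from this README are that njobs and numdownload are passed as environment variables and that the script takes the CSV file, the mp3 output directory, and the wav output directory as positional arguments. The helper name build_get_dataset_call and its default values are hypothetical.

```python
import os


def build_get_dataset_call(csv_path, mp3_dir, wav_dir, njobs=20, numdownload=1000):
    """Build the command and environment for getDataset.sh (hypothetical helper).

    njobs and numdownload are passed as environment variables, mirroring
    the shell invocation shown in this README.
    """
    env = dict(os.environ, njobs=str(njobs), numdownload=str(numdownload))
    cmd = ["./getDataset.sh", csv_path, mp3_dir, wav_dir]
    return cmd, env


# Small trial run: 100 files with 4 parallel downloads.
cmd, env = build_get_dataset_call(
    "avspeech_test.csv", "test_mp3", "test_wav", njobs=4, numdownload=100
)

# To actually start the download, run this from inside the preprocessing
# directory:
#   import subprocess
#   subprocess.run(cmd, env=env, check=True)
```

Starting with a small numdownload is a cheap way to check that axel, youtube-dl, and parallel are installed correctly before committing to a long download.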

ConvTasNet Training

Inside the ConvTasNet directory, set the following variables in config.py:

Set config.dataSetPath['train'] -> Absolute path to your train_wav folder
Set config.dataSetPath['test'] -> Absolute path to your test_wav folder
Set config.basePath -> '<Path-To-Store-Experiment-Data>/'+str(datetime.now())
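As an illustration, the three settings might look like this in config.py. This is a minimal sketch that assumes config.py exposes them as module-level names. The /data/... locations are placeholders, not paths used by this repository.

```python
import os
from datetime import datetime

# Placeholder locations (assumptions); replace with your own absolute paths.
DATA_ROOT = "/data/avspeech"
EXPERIMENT_ROOT = "/data/experiments"

# Absolute paths to the wav folders produced by getDataset.sh.
dataSetPath = {
    "train": os.path.join(DATA_ROOT, "train_wav"),
    "test": os.path.join(DATA_ROOT, "test_wav"),
}

# One directory per run, named after the time the run started.
basePath = EXPERIMENT_ROOT + "/" + str(datetime.now())
```

Note that str(datetime.now()) contains a space and colons, so the resulting directory name needs quoting when used in a shell.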

Training:

cd ConvTasNet
python main.py train

Testing:

cd ConvTasNet
python main.py test --modelpath <path-to-trained-model>

ASR

Inside the ETESpeechRecognition directory, set the following variables in config.py:

Set config.path_to_download -> Absolute path where you want to download the LibriSpeech dataset
Set config.base_model_path -> Absolute path where you want to save the trained model
Set config.cache_dir -> Absolute path where you want to save the unigram model, etc.
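All three settings are expected to be absolute paths, so a quick check before starting a long download or training run can save time. The sketch below uses placeholder values, and check_absolute is a hypothetical helper that is not part of this repository.

```python
import os

# Placeholder values (assumptions); replace with your own absolute paths.
path_to_download = "/data/librispeech"
base_model_path = "/data/asr/models"
cache_dir = "/data/asr/cache"


def check_absolute(**settings):
    """Return the names of the settings whose values are not absolute paths."""
    return [name for name, value in settings.items() if not os.path.isabs(value)]


problems = check_absolute(
    path_to_download=path_to_download,
    base_model_path=base_model_path,
    cache_dir=cache_dir,
)
if problems:
    raise ValueError("These settings must be absolute paths: " + ", ".join(problems))
```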

Download the dataset

cd ETESpeechRecognition
python downloadDataSet.py

Pre-process

python main.py genunigram

Training:

python main.py train

Testing:

python main.py test

Oracle Training

Coming Soon

Iterative Training

Coming Soon

Results

Automatic Speech Recognition

CER	CTC Loss	Attention Loss	Avg Loss
0.5668	78.1625	49.1855	57.8786
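The README does not say how Avg Loss is computed. The reported numbers are, however, consistent with a weighted sum of 0.3 times the CTC loss plus 0.7 times the attention loss. The weight below is inferred from the table alone and should be treated as an assumption, not as the documented training setting.

```python
# Values copied from the table above.
ctc_loss = 78.1625
attention_loss = 49.1855
reported_avg_loss = 57.8786

# Inferred from the reported numbers (assumption, not stated in the README).
ctc_weight = 0.3

combined = ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
difference = abs(combined - reported_avg_loss)
print(f"combined={combined:.4f} reported={reported_avg_loss:.4f} diff={difference:.2e}")
```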

Speech Separation

Method	SI-SNR
ConvTasNet	9.699
Oracle	13.483
Iterative	10.781
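For reference, SI-SNR (scale-invariant signal-to-noise ratio, conventionally reported in dB) can be sketched as below. This follows the standard definition: both signals are made zero-mean, the estimate is projected onto the target, and the energy of the projection is compared with the energy of the residual. It is a dependency-free illustration, and the implementation in this repository may differ in details such as the eps value.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("estimate and target must be non-empty and equally long")

    # Remove the mean from both signals.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target.
    scale = sum(a * b for a, b in zip(e, t)) / (sum(b * b for b in t) + eps)
    projection = [scale * b for b in t]

    # Everything that is not explained by the target counts as noise.
    noise = [a - p for a, p in zip(e, projection)]

    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Toy signals: a sine wave and a slightly corrupted copy of it.
target = [math.sin(0.1 * i) for i in range(200)]
estimate = [s + 0.1 * math.cos(0.37 * i) for i, s in enumerate(target)]

score = si_snr(estimate, target)
score_rescaled = si_snr([3.0 * x for x in estimate], target)
```

Because of the projection step, rescaling the estimate leaves the score essentially unchanged, which is why score and score_rescaled agree up to the effect of eps.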

Plots

Credits

The code for downloading the AVSpeech dataset is adapted from the repository https://github.com/changil/avspeech-downloader. It was modified to download only mp3 audio and extended with some additional features.

The code for training the ASR model is adapted from the repository https://github.com/mayank-git-hub/ETE-Speech-Recognition.

pragyak412/Improving-Voice-Separation-by-Incorporating-End-To-End-Speech-Recognition
