
Joint phoneme alignment and text-informed speech separation on highly corrupted speech

Here you find the code to reproduce the experiments of the paper "Joint phoneme alignment and text-informed speech separation on highly corrupted speech" by Kilian Schulze-Forster, Clement S. J. Doire, Gaël Richard, and Roland Badeau, accepted at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.

The paper and audio examples are available here

Download

Clone the repository to your machine:

git clone https://github.com/schufo/tisms.git

Make sure that your working directory is tisms/ for all steps described below.

Virtual Environment

The project was developed in a conda environment with Python 3.6. You can create one with the following command:

conda create -n tisms_env python=3.6

Activate the environment:

source activate tisms_env

Then install PyTorch. Version 1.1.0 was used for this project; later versions should work as well, but they have not been tested.

conda install pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=9.0 -c pytorch
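You can verify the installation with a quick check:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"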

Then you can run the following command to install all other required packages with pip:

pip install -r requirements.txt

Data Preprocessing

At the bottom of the script 01_musdb_pre_processing.py, enter the correct paths to your MUSDB dataset and to the directory where you want to save the preprocessed MUSDB data (see the sketch after the commands below). Then, run the following two commands:

python 01_musdb_pre_processing.py

python 02_make_timit_phoneme_vocabulary.py
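As a rough sketch, the path settings at the bottom of 01_musdb_pre_processing.py could look like the following (the variable names here are hypothetical; use whatever names the script actually defines):

if __name__ == '__main__':
    # hypothetical variable names, only to illustrate what needs to be set
    path_to_musdb = '/path/to/MUSDB18'               # root of the MUSDB dataset
    path_to_output = '/path/to/preprocessed_musdb'   # where the preprocessed data is saved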

In the folder 'data' you find three Python files containing the dataset classes for training, validation, and testing. Enter the correct paths to your TIMIT dataset and to the preprocessed MUSDB data at the top of all three files, as illustrated below.
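For illustration, the path settings at the top of each of the three files could look like this (again, the constant names are hypothetical; keep the names the files actually use):

# hypothetical constant names, only for illustration
path_to_timit = '/path/to/TIMIT'
path_to_preprocessed_musdb = '/path/to/preprocessed_musdb'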

Training

To train the Baseline (BL) model, run the following from the command line:

python 03_train_BL.py

With the following commands you can train the three versions of the text-informed models:

python 04_train_informed_models.py with 'tag="V1"'

python 04_train_informed_models.py with 'tag="V2"' 'side_info_encoder_bidirectional=False'

python 04_train_informed_models.py with 'tag="V3"' 'model="InformedSeparatorWithSplitAttention"'
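The 'with' syntax comes from the experiment tracking package Sacred (see below): it overrides entries of the experiment's default configuration from the command line. A minimal, illustrative Sacred experiment showing how such overrides work (this is not the actual code or configuration of 04_train_informed_models.py):

from sacred import Experiment

ex = Experiment('informed_models')

@ex.config
def config():
    # default values; each can be overridden with 'with' on the command line
    tag = 'V1'
    side_info_encoder_bidirectional = True
    model = 'InformedSeparatorWithAttention'

@ex.automain
def main(tag, side_info_encoder_bidirectional, model):
    # Sacred injects the (possibly overridden) configuration values here
    print(tag, model, side_info_encoder_bidirectional)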

To train the model with Optimal Attention (OA) weights run:

python 05_train_OA.py

This project uses the experiment tracking package Sacred. Since the test scripts need to access the configuration files by the tag names assigned during training, we now need to run the following script, which copies each config file into a folder named after its tag:

python 06_copy_configs.py
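Conceptually, the script does something like the following (the directory names here are hypothetical; Sacred's FileStorageObserver stores each run's config.json in a numbered run directory):

import json
import os
import shutil

# 'training_logs' and 'configs' are hypothetical directory names
for run_dir in os.listdir('training_logs'):
    config_path = os.path.join('training_logs', run_dir, 'config.json')
    if not os.path.isfile(config_path):
        continue
    with open(config_path) as f:
        tag = json.load(f)['tag']
    os.makedirs(os.path.join('configs', tag), exist_ok=True)
    shutil.copy(config_path, os.path.join('configs', tag, 'config.json'))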

Evaluation

To evaluate the alignment provided by V1, V2, and V3, run the command below. To evaluate on clean speech, set the test SNR to 100; for corrupted speech, set it to -5. Set the tag parameter to the model you want to evaluate.

python 07_eval_alignment.py with 'test_snr=100' 'tag="V1"'

To evaluate the separation quality in terms of SDR, SAR, SIR, STOI, and PESQ, as well as PES and EPS, run the following script with the respective tags:

python 08_eval_separation.py with 'tag="BL"'

The evaluation scripts save JSON files with evaluation summaries in the evaluation folder for a quick preview. For more detailed analysis of the results, NumPy files with all scores for every test example and metric are also saved in the evaluation folder.
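For example, the per-example scores could be loaded and summarized like this (the file name below is hypothetical; check the evaluation folder for the actual naming scheme):

import numpy as np

# hypothetical file name, only for illustration
sdr_scores = np.load('evaluation/BL/sdr.npy')
print('median SDR: {:.2f} dB'.format(np.median(sdr_scores)))
print('mean SDR:   {:.2f} dB'.format(np.mean(sdr_scores)))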

Acknowledgment

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 765068.

Copyright notice

Copyright 2020 Kilian Schulze-Forster of Télécom Paris, Institut Polytechnique de Paris. All rights reserved.
