PAAPLoss

To be presented at ICASSP 2023.

Title: PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

Prerequisites

pip install -r requirements.txt

Datasets

Please follow https://github.com/microsoft/DNS-Challenge/tree/interspeech2020/master to download the DNS Interspeech 2020 dataset.
Edit paths in noisyspeech_synthesizer.cfg and run noisyspeech_synthesizer_multiprocessing.py to generate your train (and validation) data.

In most cases you can leave the other parameters in the .cfg unchanged for the train data, which yields 12,000 synthesized audio clips. To create a small set of validation data, change fileindex_end in the .cfg.
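If you prefer to derive the validation config from the train config programmatically, a minimal sketch using Python's standard configparser is shown below. The section name `noisy_speech` and the path keys in the example text are assumptions for illustration (only `fileindex_end` is named above), so check them against your copy of noisyspeech_synthesizer.cfg. Note that configparser drops comments when it writes a file back out.

```python
import configparser
import io


def override_cfg(cfg_text, section, overrides):
    """Return cfg_text with the given keys in `section` replaced by `overrides`."""
    # interpolation=None keeps '%' characters in paths from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(cfg_text)
    for key, value in overrides.items():
        parser.set(section, key, str(value))
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical stand-in for noisyspeech_synthesizer.cfg. The section name and
# the path keys are assumptions; only fileindex_end is named in this README.
example_cfg = """
[noisy_speech]
speech_dir: /path/to/clean
noisy_destination: /path/to/train/noisy
fileindex_end: None
"""

# Point the output at a validation folder and set a small fileindex_end.
validation_cfg = override_cfg(
    example_cfg,
    "noisy_speech",
    {"noisy_destination": "/path/to/valid/noisy", "fileindex_end": 99},
)
print(validation_cfg)
```

The value 99 is only a placeholder; pick whatever index gives you the validation set size you want.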

You can also change num_train_files in conf/ to adjust the number of train audio clips in use.
Edit the paths in conf/ so that they match the folders that contain your data.

Usage

(Optional) Train the Acoustic estimator (or use the pretrained ones).

Generating the acoustic features for the first time can be slow and may take up some disk space.
```
python train_est.py estimator=acoustic
```
Prepare the json list of the train/valid/test data.
```
bash make_dns.sh
```
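For reference, the sketch below shows one way such a JSON list could be built in Python. It assumes the list holds `[path, num_samples]` pairs, a convention used by the Demucs denoiser tooling; this README does not state the format that make_dns.sh produces, so treat the layout and the function names (`build_audio_list`, `write_audio_list`) as illustrative only.

```python
import json
import wave
from pathlib import Path


def build_audio_list(audio_dir):
    """Collect [path, num_samples] pairs for every .wav file under audio_dir."""
    entries = []
    for wav_path in sorted(Path(audio_dir).rglob("*.wav")):
        # The stdlib wave module reads standard PCM WAV files only.
        with wave.open(str(wav_path), "rb") as wav_file:
            entries.append([str(wav_path), wav_file.getnframes()])
    return entries


def write_audio_list(audio_dir, json_path):
    """Write the list for audio_dir to json_path and return the entries."""
    entries = build_audio_list(audio_dir)
    Path(json_path).write_text(json.dumps(entries, indent=2))
    return entries
```

Calling `write_audio_list("/path/to/train/noisy", "noisy.json")` would then write one list per folder; both paths here are placeholders.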
Fine-tune the enhancement model (only Demucs and FullSubNet are supported so far). The pretrained model checkpoints can be downloaded from the original authors' repositories.
```
python train.py finetune=demucs
```
or
```
python train.py finetune=fullsubnet
```
By default, training uses all of the available GPUs.
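To restrict training to a subset of GPUs, you can set the standard CUDA environment variable `CUDA_VISIBLE_DEVICES` before launching. The sketch below builds such an environment in Python; it assumes the training script discovers devices through CUDA in the usual way, and the helper name `gpu_limited_env` and the GPU indices are only examples.

```python
import os


def gpu_limited_env(gpu_ids, base_env=None):
    """Return a copy of the environment exposing only the given GPU indices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Example: expose only GPUs 0 and 1 to a fine-tuning run.
env = gpu_limited_env([0, 1])
command = ["python", "train.py", "finetune=demucs"]
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], " ".join(command))
```

Passing `env` to `subprocess.run(command, env=env)` launches the run with only those devices visible.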
This objective function can also be applied to an arbitrary model by using the pretrained acoustic estimator.
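As a rough illustration of attaching an estimator-based term to another model's loss, here is a framework-agnostic sketch. It is not the PAAPLoss formulation: it leaves out the phonetic alignment entirely, and `with_acoustic_term`, `toy_estimator`, and `weight` are hypothetical names introduced for this example. In practice the `estimator` argument would be the frozen pretrained acoustic estimator and the inputs would be tensors.

```python
def with_acoustic_term(base_loss, estimator, distance, weight=1.0):
    """Build a loss that adds an acoustic-parameter term to base_loss."""

    def loss(enhanced, clean):
        # Compare estimated acoustic parameters of enhanced and clean speech.
        acoustic_gap = distance(estimator(enhanced), estimator(clean))
        return base_loss(enhanced, clean) + weight * acoustic_gap

    return loss


# Toy stand-ins so the sketch runs without any deep learning framework.
def mean_abs_diff(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def toy_estimator(signal):
    # Hypothetical "acoustic parameters": the signal's mean and its peak.
    return [sum(signal) / len(signal), max(signal)]


loss_fn = with_acoustic_term(mean_abs_diff, toy_estimator, mean_abs_diff, weight=0.5)
print(loss_fn([0.0, 2.0], [0.0, 1.0]))
```

Keeping the estimator behind a plain callable means the enhancement model itself needs no changes; only its training loss is wrapped.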

Related Resources

More details about the official implementation of TAPLoss: A Temporal Acoustic Parameter Loss For Speech Enhancement can be found at https://github.com/YunyangZeng/TAPLoss.

Acknowledgement

Some of the model architectures are adapted from the original Demucs and FullSubNet repos. The phonetic aligner is adapted from Charsiu. Thanks to all the authors for open-sourcing their work!

Citation

Please cite our paper if you find our code or paper useful for your research!

@article{yang2023paaploss,
  title={PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement},
  author={Yang, Muqiao and Konan, Joseph and Bick, David and Zeng, Yunyang and Han, Shuo and Kumar, Anurag and Watanabe, Shinji and Raj, Bhiksha},
  journal={arXiv preprint arXiv:2302.08095},
  year={2023}
}
