[Towards Fully Quantized Neural Networks For Speech Enhancement (Interspeech 2023)](https://www.isca-archive.org/interspeech_2023/cohen23_interspeech.html)
Deep learning models have shown state-of-the-art results in speech enhancement. However, deploying such models on an eight-bit integer-only device is challenging. In this work, we analyze the gaps in deploying a vanilla quantization-aware training method for speech enhancement, revealing two significant observations. First, quantization mainly affects signals with a high input Signal-to-Noise Ratio (SNR). Second, quantizing the model's input and output causes major performance degradation. Based on our analysis, we propose Fully Quantized Speech Enhancement (FQSE), a new quantization-aware training method that closes these gaps and enables eight-bit integer-only quantization. FQSE introduces data augmentation to mitigate the quantization effect on high-SNR signals. Additionally, we add an input splitter and a residual quantization block to the model to overcome the input-output quantization error. We show that FQSE closes the performance gaps induced by eight-bit quantization.
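For context on the baseline that FQSE improves, here is a minimal sketch of what "vanilla" eight-bit quantization-aware training looks like with PyTorch's eager-mode `torch.ao.quantization` API. This is not the paper's code; the toy model and the placeholder training loop are illustrative assumptions only.

```python
# A minimal sketch of vanilla 8-bit QAT in PyTorch (illustrative, not FQSE).
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class ToyEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # fake-quantizes the input during QAT
        self.conv = nn.Conv1d(1, 1, kernel_size=9, padding=4)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # returns the output to float

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

model = ToyEnhancer().train()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")  # int8 weights/activations
tq.prepare_qat(model, inplace=True)                   # insert fake-quant modules

# ... run the usual training loop here ...

model.eval()
int8_model = tq.convert(model)  # materialize the integer-only model
```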
LibriMix is an open-source dataset for source separation in noisy environments. It is derived from LibriSpeech signals (clean subset) and WHAM! noise. It offers a free alternative to the WHAM! dataset and complements it, and it also enables cross-dataset experiments. Please refer to LibriMix for more information.
```bash
pip install -r requirements.txt
```
- Generate the LibriMix dataset according to LibriMix. Please use the Libri2Mix 16 kHz 'min' version of the dataset.
- Download the pre-trained model: https://huggingface.co/JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k/blob/main/pytorch_model.bin (a scripted alternative is sketched after this list).
- Edit the configuration file (YML) `configs/convtasnet_16k_fqse.yml`:
  - Set `work_dir`: `/home/user-name`
  - Set the dataset CSV folders generated by step 1 as follows:
    - `dataset->train_dir`: `/your-librimix-path/data/wav16k/min/train-360`
    - `dataset->valid_dir`: `/your-librimix-path/data/wav16k/min/dev`
  - Set the pre-trained model path in `training->pretrained`: `pytorch_model.bin`

  Note: the `convtasnet_16k_fqse.yml` configuration is our QAT 8-bit (FQSE) setup as in the paper. A quick sanity-check sketch for the edited config follows this list.
- Run `train.py`:

  ```bash
  python train.py -y configs/convtasnet_16k_fqse.yml
  ```
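If you prefer to script the model download rather than use the browser link above, a minimal sketch with the `huggingface_hub` client (an assumption; any download method works) is:

```python
# Optional: fetch the pre-trained checkpoint programmatically.
# Requires `pip install huggingface_hub`.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k",
    filename="pytorch_model.bin",
)
print("checkpoint saved to:", ckpt_path)  # point training->pretrained at this path
```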
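Before launching training, you can optionally sanity-check the edited config. The sketch below assumes the YAML nests keys the way the `dataset->train_dir` notation above suggests; adjust the key names if the actual file differs.

```python
# Hypothetical sanity check (not part of the repo): load the edited config
# and verify that the dataset folders and pre-trained checkpoint exist.
import os
import yaml  # pip install pyyaml

with open("configs/convtasnet_16k_fqse.yml") as f:
    cfg = yaml.safe_load(f)

for key in ("train_dir", "valid_dir"):
    path = cfg["dataset"][key]
    assert os.path.isdir(path), f"dataset->{key} does not exist: {path}"
assert os.path.isfile(cfg["training"]["pretrained"]), "pre-trained model not found"
print("work_dir:", cfg["work_dir"])
```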
- Edit the configuration file (YML) `configs/convtasnet_16k.yml`:
  - Set the dataset CSV folder generated by step 1 as follows:
    - `dataset->test_dir`: `/your-librimix-path/data/wav16k/min/test`
  - Set the model path in `model->model_path`: `trained_model.pth`
- Run `val.py`:

  ```bash
  python val.py -y configs/convtasnet_16k.yml
  ```
| Network | Float | Vanilla QAT 8-bit | FQSE 8-bit |
|---|---|---|---|
| ConvTasNet [1] | 14.74 | 14.42 | 14.77 |
Run `infer.py` on noisy speech:

```bash
python infer.py -y configs/convtasnet_16k_fqse.yml -a samples/speech/test_1spk_noisy_1.wav
```
Run `export.py`:

```bash
python export.py -y configs/convtasnet_16k_fqse.yml --torchscript --onnx
```
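Once exported, the ONNX model can be run with `onnxruntime`. The sketch below is an assumption-heavy example: the output file name `model.onnx`, the single-output assumption, and the `(batch, channel, time)` input layout are guesses; check `export.py` for the actual export location and tensor names.

```python
# Hypothetical example of running the exported ONNX model.
# Requires `pip install onnxruntime soundfile`.
import numpy as np
import onnxruntime as ort
import soundfile as sf

session = ort.InferenceSession("model.onnx")  # assumed export file name
wav, sr = sf.read("samples/speech/test_1spk_noisy_1.wav", dtype="float32")
inp = wav[np.newaxis, np.newaxis, :]          # assumed (batch, channel, time) layout
(enhanced,) = session.run(None, {session.get_inputs()[0].name: inp})  # assumes one output
sf.write("enhanced.wav", enhanced.squeeze(), sr)
```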
If you find this project useful in your research, please consider citing:
```bibtex
@inproceedings{cohen23_interspeech,
  author={Elad Cohen and Hai Victor Habi and Arnon Netzer},
  title={{Towards Fully Quantized Neural Networks For Speech Enhancement}},
  year={2023},
  booktitle={Proc. INTERSPEECH 2023},
  pages={181--185},
  doi={10.21437/Interspeech.2023-883}
}
```