TF-Codec: Latent-Domain Predictive Neural Speech Coding

Official implementation of the non-predictive version of TF-Codec, from the paper "Latent-Domain Predictive Neural Speech Coding".

Prerequisites

Python 3.10 and Conda (install Conda first if you do not already have it)
CUDA 12.5 (other versions may also work; make sure the CUDA version matches your PyTorch build)
PyTorch 2.5 (tested; other versions may also work)

Environment

conda create -n $YOUR_PY_ENV_NAME python=3.10
conda activate $YOUR_PY_ENV_NAME
pip install -r requirements.txt
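
After installation, a quick check can confirm that the interpreter and PyTorch/CUDA versions line up with the prerequisites. The following is a minimal sketch, not part of this repository: the helper name `describe_environment` is hypothetical, and it only reports versions rather than enforcing them.

```python
import sys


def describe_environment():
    """Collect the interpreter and (if installed) PyTorch/CUDA versions."""
    info = {"python": "%d.%d" % sys.version_info[:2]}
    try:
        import torch  # only importable once the requirements are installed
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    # CUDA version this PyTorch build was compiled against
    # (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    # Whether a usable CUDA device and driver are visible right now.
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `torch_cuda_build` differs from the CUDA toolkit you installed, that is the mismatch the prerequisites warn about.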

Pretrained models

Download our pretrained models and put them in the ./checkpoints folder. Each pretrained checkpoint (.ckpt) stores both the generator and the discriminator weights.

Training

Put your training and validation data (Multilingual_train.mdb and Multilingual_val.mdb, both in LMDB format) in the ./training_data folder.

Stage-1 without adversarial training:

python multiprocess_caller.py --nproc_per_node=4 --nnodes=1 --num_workers=2 --train_data_dir=training_data/Multilingual_train.mdb --val_data_dir=training_data/Multilingual_val.mdb --train_dir=job_tfcodec_stage1 --config=configs/tfcodec_config_train_stage1.yaml

Stage-2 finetuning with adversarial training, starting from the stage-1 checkpoint (./checkpoints/model_stage1.ckpt):

python multiprocess_caller.py --nproc_per_node=4 --nnodes=1 --num_workers=2 --train_data_dir=training_data/Multilingual_train.mdb --val_data_dir=training_data/Multilingual_val.mdb --train_dir=job_tfcodec_stage2 --config=configs/tfcodec_config_train_stage2.yaml --checkpoint_path=checkpoints/model_stage1.ckpt
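
The two commands above differ only in the stage number and the extra --checkpoint_path flag, so they can be assembled programmatically when scripting both stages. This is a minimal sketch under that assumption: `build_train_command` is a hypothetical helper, not part of the repository, and it uses only the flags and paths shown above.

```python
import shlex


def build_train_command(stage, nproc_per_node=4, checkpoint_path=None):
    """Return the argument list for training stage 1 or 2."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    if stage == 2 and checkpoint_path is None:
        raise ValueError("stage 2 finetunes from a stage-1 checkpoint")
    cmd = [
        "python", "multiprocess_caller.py",
        f"--nproc_per_node={nproc_per_node}",
        "--nnodes=1",
        "--num_workers=2",
        "--train_data_dir=training_data/Multilingual_train.mdb",
        "--val_data_dir=training_data/Multilingual_val.mdb",
        f"--train_dir=job_tfcodec_stage{stage}",
        f"--config=configs/tfcodec_config_train_stage{stage}.yaml",
    ]
    if checkpoint_path is not None:
        cmd.append(f"--checkpoint_path={checkpoint_path}")
    return cmd


if __name__ == "__main__":
    # Print both commands; pass the lists to subprocess.run to execute them.
    print(shlex.join(build_train_command(1)))
    print(shlex.join(build_train_command(
        2, checkpoint_path="checkpoints/model_stage1.ckpt")))
```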

Testing

Examples of testing the pretrained models:

python inf.py --audio_path=<input audio> --model_path=checkpoints/tfcodec_path/tfcodec_6k_514000.ckpt --config_path=configs/tfcodec_config_6k.yaml --output_path=<output audio>
python inf.py --audio_path=<input audio> --model_path=checkpoints/tfcodec_path/tfcodec_1k_545000.ckpt --config_path=configs/tfcodec_config_1k.yaml --output_path=<output audio>
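
Because only 16 kHz .wav input is supported, it may help to check a file before passing it to inf.py. The sketch below uses Python's standard `wave` module; `is_supported_wav` is a hypothetical helper, not part of the repository, and it checks only the container and sample rate, not other properties such as channel count.

```python
import wave


def is_supported_wav(path, expected_rate=16000):
    """Return True if `path` is a readable .wav file at `expected_rate` Hz."""
    try:
        with wave.open(path, "rb") as wav_file:
            return wav_file.getframerate() == expected_rate
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, or a format the stdlib reader cannot parse.
        return False


if __name__ == "__main__":
    import sys
    for audio_path in sys.argv[1:]:
        status = "ok" if is_supported_wav(audio_path) else "unsupported"
        print(f"{audio_path}: {status}")
```

Files at other sample rates would need to be resampled to 16 kHz with an external tool first.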

Currently, only 16 kHz speech in .wav format is supported. This version provides only the encoding, quantization (to token indices), and decoding modules. An external Huffman coding tool is needed to encode the quantized token indices into a bitstream.
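
To illustrate that last step, here is a minimal sketch of Huffman coding applied to a sequence of token indices. It is an assumption-laden illustration, not the entropy coder used in the paper: the function names and the example token values are hypothetical, and the code table is built from the sequence itself, so a real system would also need to share that table (or a fixed one) with the decoder.

```python
import heapq
from collections import Counter


def build_huffman_code(symbols):
    """Map each distinct symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Recover the symbol sequence from a bit string and its code table."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is correct
            out.append(inverse[current])
            current = ""
    return out


if __name__ == "__main__":
    tokens = [3, 3, 3, 3, 7, 7, 1, 3, 7, 3]  # hypothetical token indices
    table = build_huffman_code(tokens)
    bitstream = encode(tokens, table)
    assert decode(bitstream, table) == tokens
    print(table, len(bitstream), "bits")
```

Frequent indices receive shorter code words, which is why the bitstream can be shorter than a fixed-length encoding of the same indices.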

Citation

If you find this work useful for your research, please cite:

@article{Jiang2023tfcodec,
  title={Latent-Domain Predictive Neural Speech Coding},
  author={Xue Jiang and Xiulian Peng and Huaying Xue and Yuan Zhang and Yan Lu},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  volume={31},
  year={2023}
}

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
