GitHub - he-shuwei/NATSpeech: A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)

NATSpeech: A Non-Autoregressive Text-to-Speech Framework

This repo contains the official PyTorch implementations of:

PortaSpeech: Portable and High-Quality Generative Text-to-Speech (NeurIPS 2021)
Demo page | HuggingFace🤗 Demo
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (DiffSpeech) (AAAI 2022)
Demo page | Project page | HuggingFace🤗 Demo

Key Features

We implement the following features in this framework:

Data processing for non-autoregressive Text-to-Speech using the Montreal Forced Aligner (MFA).
Convenient and scalable framework for training and inference.
Simple but efficient random-access dataset implementation.
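A minimal sketch of the random-access idea follows. It assumes a layout where every item is pickled back to back into one `<prefix>.data` file and the byte offsets are saved to a separate `<prefix>.idx` file. Reading item i then costs one seek and one read instead of loading the whole set. The class names `IndexedWriter`/`IndexedReader` and this file layout are hypothetical; this is not necessarily how NATSpeech implements its dataset.

```python
import os
import pickle
import tempfile
from typing import Any, List


class IndexedWriter:
    """Hypothetical writer: appends pickled items to `<prefix>.data` and records offsets."""

    def __init__(self, prefix: str):
        self.prefix = prefix
        self.data_file = open(f"{prefix}.data", "wb")
        self.offsets: List[int] = [0]

    def add_item(self, item: Any) -> None:
        blob = pickle.dumps(item)
        self.data_file.write(blob)
        # The next item starts right after this one.
        self.offsets.append(self.offsets[-1] + len(blob))

    def finalize(self) -> None:
        self.data_file.close()
        # Item i occupies bytes offsets[i] .. offsets[i + 1] of the data file.
        with open(f"{self.prefix}.idx", "wb") as f:
            pickle.dump(self.offsets, f)


class IndexedReader:
    """Hypothetical reader: keeps only the offset table in memory."""

    def __init__(self, prefix: str):
        self.data_path = f"{prefix}.data"
        with open(f"{prefix}.idx", "rb") as f:
            self.offsets: List[int] = pickle.load(f)

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, i: int) -> Any:
        if not 0 <= i < len(self):
            raise IndexError(f"item {i} out of range")
        start, end = self.offsets[i], self.offsets[i + 1]
        # Open per read so forked DataLoader workers never share a file position.
        with open(self.data_path, "rb") as f:
            f.seek(start)
            return pickle.loads(f.read(end - start))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "train")
        writer = IndexedWriter(prefix)
        for n in range(3):
            writer.add_item({"item_name": f"utt{n}", "mel_frames": 100 * (n + 1)})
        writer.finalize()

        reader = IndexedReader(prefix)
        print(len(reader))  # 3
        print(reader[2])    # {'item_name': 'utt2', 'mel_frames': 300}
```

A PyTorch `Dataset` could wrap such a reader and delegate `__getitem__` to `reader[i]`, so only the offset table stays in memory. Opening the file on every read trades a little overhead for safety when several worker processes read at once.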

Install Dependencies

## Tested on Linux (Ubuntu 18.04).
## Install Python 3.6+ first (Anaconda recommended).

# make the repo root importable (run all commands from the repo root).
export PYTHONPATH=.
# build a virtual env (recommended).
python -m venv venv
source venv/bin/activate
# install requirements.
pip install -U pip
pip install Cython numpy==1.19.1
pip install torch==1.9.0 # torch >= 1.9.0 recommended
pip install -r requirements.txt
# audio tools: sox with MP3 support.
sudo apt install -y sox libsox-fmt-mp3
bash mfa_usr/install_mfa.sh # install the forced alignment tool (Montreal Forced Aligner)
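After installing, a quick check can confirm that the interpreter and PyTorch meet the versions listed above. The script below is a hypothetical helper, not part of the repo: it imports each package, reads its `__version__`, and compares only the numeric release part, so local suffixes such as `+cu111` are ignored.

```python
import importlib
import re
import sys
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.9.0+cu111' into (1, 9, 0); drops local and pre-release suffixes."""
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def meets_minimum(version: Optional[str], minimum: str) -> bool:
    """True if `version` is at least `minimum`, comparing release numbers only."""
    if version is None:
        return False
    found, required = parse_version(version), parse_version(minimum)
    width = max(len(found), len(required))
    # Pad with zeros so that "1.9" and "1.9.0" compare as equal.
    found += (0,) * (width - len(found))
    required += (0,) * (width - len(required))
    return found >= required


if __name__ == "__main__":
    checks = {
        "python": (sys.version.split()[0], "3.6"),
        "torch": (installed_version("torch"), "1.9.0"),
    }
    for name, (found, minimum) in checks.items():
        status = "ok" if meets_minimum(found, minimum) else "CHECK"
        print(f"{name:<6} found={found} required>={minimum} [{status}]")
```

Run it inside the activated venv, e.g. `python check_env.py` (`check_env.py` is a hypothetical file name). A `CHECK` line for `torch` means it is missing or older than 1.9.0.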

Documents

Citation

If you find this useful for your research, please cite the following papers:

PortaSpeech

@article{ren2021portaspeech,
  title={PortaSpeech: Portable and High-Quality Generative Text-to-Speech},
  author={Ren, Yi and Liu, Jinglin and Zhao, Zhou},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}

DiffSpeech

@article{liu2021diffsinger,
  title={DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism},
  author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2105.02446},
  volume={2},
  year={2021}
}

Acknowledgments

Our code is influenced by the following repos:

License and Agreement

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

