Persian FastPitch

Training FastPitch for the Persian language as a Persian text-to-speech (TTS) system. FastPitch is a fully parallel TTS model that generates mel-spectrograms from text. Because it is non-autoregressive, it synthesizes speech much faster than autoregressive models such as Tacotron. This implementation adapts NVIDIA's FastPitch to Persian: we clone NVIDIA's FastPitch, install its requirements, and then make the following changes:

Prepare Persian data: a set of audio files and a phoneme sequence for each file. We use phonemes instead of raw text because they are written in English (Latin) characters, and because they solve the problem that some vowels are not written in Persian script.
Edit fastpitch/data_function.py to fix an error that occurs in Google Colab (see this issue).
Edit cleaners.py in common/text/ to match the characters used in the phoneme transcriptions.
Edit scripts/train.sh and train.py to change the training parameters.
Edit scripts/inference_example.sh to change the inference parameters.
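
As a rough illustration of the cleaners.py change, the function below normalizes whitespace and drops any character outside a fixed symbol inventory. The name persian_phoneme_cleaners and the contents of ALLOWED_SYMBOLS are hypothetical: replace them with the exact characters used in your phoneme transcriptions. Case is deliberately preserved, on the assumption that a phoneme scheme may use upper- and lower-case letters for different sounds.

```python
import re
import string

# Hypothetical symbol inventory: Latin letters (both cases kept, since a
# phoneme scheme may use case to distinguish sounds) plus a few punctuation
# marks. Replace with the characters that actually occur in your transcriptions.
ALLOWED_SYMBOLS = set(string.ascii_letters) | set(" .,?!-'")

_WHITESPACE_RE = re.compile(r"\s+")


def persian_phoneme_cleaners(text: str) -> str:
    """Hypothetical cleaner for Latin-script Persian phoneme strings."""
    # Turn tabs, newlines and runs of spaces into single spaces.
    text = _WHITESPACE_RE.sub(" ", text)
    # Drop any symbol the model has no embedding for.
    text = "".join(ch for ch in text if ch in ALLOWED_SYMBOLS)
    # Removing symbols can leave double spaces, so collapse again and trim.
    return _WHITESPACE_RE.sub(" ", text).strip()
```

In NVIDIA's code base, cleaners are typically selected by function name, so keep the name you give this function consistent with whatever the training and inference scripts are configured to use.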

How to use

To use this implementation:

Clone this repository
Install the requirements in requirements.txt
Add your data: audio files go in wavs/, training and validation phoneme transcriptions go in filelists/, and test phoneme transcriptions go in phrases/, following the format of the files already there
Run the following command to extract pitch from your audio files and save it to wavs/pitch/:

python prepare_dataset.py \
     --wav-text-filelists filelists/audio_text_train.txt \
                          filelists/audio_text_val.txt \
     --n-workers 16 \
     --batch-size 1 \
     --dataset-path 'wavs/' \
     --extract-pitch \
     --f0-method pyin
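
Before running the pitch-extraction step (or after it fails partway), a quick sanity check on the filelists can save time. The sketch below assumes each line has the form audio_path|phoneme transcription, with audio_path relative to the directory passed as --dataset-path. That layout is an assumption based on NVIDIA's pipe-separated filelists, so adjust the field count if your files differ. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def check_filelist_lines(lines: Iterable[str], dataset_root: str) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs for 'audio_path|transcription' lines."""
    root = Path(dataset_root)
    problems: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("|")
        if len(fields) != 2:
            problems.append((lineno, f"expected 2 fields, got {len(fields)}"))
            continue
        audio, phonemes = fields
        # Audio paths are assumed to be relative to the --dataset-path directory.
        if not (root / audio).is_file():
            problems.append((lineno, f"missing audio file: {audio}"))
        if not phonemes.strip():
            problems.append((lineno, "empty transcription"))
    return problems


def check_filelist(filelist_path: str, dataset_root: str) -> List[Tuple[int, str]]:
    """Read a filelist from disk and check every line."""
    with open(filelist_path, encoding="utf-8") as f:
        return check_filelist_lines(f, dataset_root)
```

For example, check_filelist("filelists/audio_text_train.txt", "wavs/") returns an empty list when every line looks well formed.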

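Run the following commands to install the remaining dependencies (NVIDIA Apex and CMUdict). Note the cd .. so that scripts/download_cmudict.sh is run from this repository, not from apex/:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
bash scripts/download_cmudict.sh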

Train the model on your data with the following command. Checkpoint files are saved to output/:

bash scripts/train.sh
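
Training writes several checkpoints to output/, and you will usually want the most recent one, for example when setting the checkpoint path in scripts/inference_example.sh. The helper below assumes NVIDIA's naming scheme FastPitch_checkpoint_<epoch>.pt; check the file names in your output/ folder and adjust the pattern if they differ. It compares epochs numerically, so epoch 100 sorts after epoch 20 (plain string sorting would get this wrong).

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed file-name pattern; change it if your checkpoints are named differently.
CKPT_PATTERN = re.compile(r"FastPitch_checkpoint_(\d+)\.pt")


def pick_latest(paths: Iterable[str]) -> Optional[str]:
    """Return the path whose file name has the highest epoch number, or None."""
    best_path, best_epoch = None, -1
    for p in paths:
        match = CKPT_PATTERN.fullmatch(Path(p).name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = p, int(match.group(1))
    return best_path


def latest_checkpoint(output_dir: str = "output") -> Optional[str]:
    """Scan output_dir for checkpoints and return the newest one, or None."""
    return pick_latest(str(p) for p in Path(output_dir).glob("FastPitch_checkpoint_*.pt"))
```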

Download WaveGlow, the vocoder that converts the generated mel-spectrograms into audio:

bash scripts/download_waveglow.sh

Run the following command to synthesize the test file you put in phrases/ in step 3. The synthesized audio is saved to output/audio_test_file/:

bash scripts/inference_example.sh
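
To synthesize your own sentences, the test file in phrases/ must contain phoneme transcriptions in the format the inference script expects. The sketch below assumes the layout of NVIDIA's phrases/*.tsv files: a tab-separated file whose first line is a header naming the columns, with the input in a text column. The helper names are hypothetical, and the parser only illustrates how such a file is read back; it is not the repository's reader.

```python
from typing import Dict, List, Sequence


def format_phrases_tsv(phoneme_lines: Sequence[str]) -> str:
    """Build a one-column TSV (header 'text') from phoneme transcriptions."""
    for line in phoneme_lines:
        if "\t" in line or "\n" in line:
            raise ValueError(f"tabs and newlines are not allowed: {line!r}")
    return "\n".join(["text", *phoneme_lines]) + "\n"


def parse_phrases_tsv(content: str) -> Dict[str, List[str]]:
    """Read the TSV back: the header row names the columns."""
    lines = [l.strip() for l in content.splitlines() if l.strip()]
    if not lines:
        return {}
    columns = lines[0].split("\t")
    rows = [l.split("\t") for l in lines[1:]]
    # Assumes every row has as many fields as the header.
    return {c: [r[i] for r in rows] for i, c in enumerate(columns)}
```

For example, Path("phrases/my_test.tsv").write_text(format_phrases_tsv(["salAm", "xub hastid"]), encoding="utf-8") writes a hypothetical my_test.tsv; scripts/inference_example.sh must then be pointed at that file.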

Adibian/persian_fastpitch
