How to use

For the Korean version, see README-ko.md

Clone this repository

git clone https://github.com/ouor/vits.git

Choose cleaners

Set "text_cleaners" in config.json.
Initially, "text_cleaners" is set to 'korean_cleaners'. To use a different cleaner, also make the following changes:
Edit text/symbols.py
Remove unnecessary imports from text/cleaners.py
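
The config change can also be scripted. The following is a minimal sketch, not part of this repository: the helper name `set_text_cleaners` is hypothetical, and it assumes "text_cleaners" sits in the "data" section of config.json, as in upstream VITS configs. Check your own config.json before relying on it.

```python
import json
from pathlib import Path


def set_text_cleaners(config_path, cleaners):
    """Write the given cleaner names into the config file and return the updated dict."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: "text_cleaners" lives under "data", as in upstream VITS configs.
    config.setdefault("data", {})["text_cleaners"] = list(cleaners)
    # ensure_ascii=False keeps any non-ASCII values in the config readable.
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

For example, `set_text_cleaners("config.json", ["korean_cleaners"])` restores the default. All other keys in the file are left untouched.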

Create virtual environment

python -m venv .venv
.\.venv\Scripts\activate

Install pytorch

pip3 install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 --index-url https://download.pytorch.org/whl/cu117

Install requirements

pip install -r requirements.txt

If an error occurs while installing the requirements, install Visual Studio Build Tools and try again.

Build monotonic alignment search

cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
cd ..

Create datasets

Single speaker

"n_speakers" should be 0 in config.json

path/to/XXX.wav|transcript

Example

dataset/001.wav|こんにちは。

Multiple speakers

Speaker IDs should start from 0

path/to/XXX.wav|speaker id|transcript

Example

dataset/001.wav|0|こんにちは。
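
A malformed line is easier to catch before preprocessing than during training. The sketch below parses one filelist line in either of the two formats above. `Entry` and `parse_filelist_line` are hypothetical names for illustration, not functions from this repository.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    wav_path: str
    speaker_id: Optional[int]  # None for single-speaker filelists
    text: str


def parse_filelist_line(line: str, multi_speaker: bool) -> Entry:
    """Split one 'path|transcript' or 'path|speaker id|transcript' line."""
    # Limit the number of splits so a '|' inside the transcript stays in the text.
    parts = line.rstrip("\n").split("|", 2 if multi_speaker else 1)
    if multi_speaker:
        if len(parts) != 3:
            raise ValueError(f"expected path|speaker id|transcript, got: {line!r}")
        wav_path, speaker, text = parts
        speaker_id = int(speaker)  # raises ValueError if the id is not an integer
        if speaker_id < 0:
            raise ValueError(f"speaker ids start from 0, got: {speaker_id}")
        return Entry(wav_path, speaker_id, text)
    if len(parts) != 2:
        raise ValueError(f"expected path|transcript, got: {line!r}")
    return Entry(parts[0], None, parts[1])
```

Calling `parse_filelist_line("dataset/001.wav|0|こんにちは。", multi_speaker=True)` yields an entry with `speaker_id` 0. Passing a single-speaker line with `multi_speaker=True` raises `ValueError`.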

Preprocess

If you need to randomly pick entries from the full filelist:

python random_pick.py --filelist path/to/filelist.txt
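
This README does not describe what random_pick.py writes, so the following is only a stand-in sketch of one common use of random picking: holding out a few entries as a validation list. The function name `split_filelist` and its parameters are hypothetical and are not taken from the script.

```python
import random


def split_filelist(lines, val_count, seed=0):
    """Shuffle the entries and return (train_lines, val_lines)."""
    if not 0 <= val_count <= len(lines):
        raise ValueError("val_count must be between 0 and the number of lines")
    shuffled = list(lines)  # copy so the caller's list is not reordered
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[val_count:], shuffled[:val_count]
```

The two returned lists can then be written to the train and validation filelists that preprocess.py takes.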

# Single speaker
python preprocess.py --text_index 1 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners 'korean_cleaners'

# Multiple speakers
python preprocess.py --text_index 2 --filelists path/to/filelist_train.txt path/to/filelist_val.txt --text_cleaners 'korean_cleaners'

After preprocessing, set "cleaned_text" to true in config.json

Small Tips

Using a pretrained model is recommended (pretrained models are available on huggingface.co).
If you do not have enough VRAM (less than 40GB):
Do not train at 44100Hz; 22050Hz is good enough.
Keep each audio clip short (a maximum of 4 seconds per clip is recommended).
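
To see which clips break these tips, a dataset can be scanned for sample rate and duration. This is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only. `wav_info` and `find_long_clips` are hypothetical helpers, not part of this repository.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnframes() / rate


def find_long_clips(paths, max_seconds=4.0):
    """List the files whose duration exceeds max_seconds."""
    return [p for p in paths if wav_info(p)[1] > max_seconds]
```

Clips returned by `find_long_clips` are candidates for trimming or splitting. A sample rate other than 22050 from `wav_info` points to files that would need resampling to follow the tip above.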

Train

# Single speaker
python train.py -c <config> -m <folder>

# Multiple speakers
python train_ms.py -c <config> -m <folder>

To train from a pretrained model, place 'G_0.pth' and 'D_0.pth' in the destination folder before running the train command.
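
Downloaded checkpoints rarely have these exact file names, so they usually need to be copied and renamed. The sketch below does that. `place_pretrained` is a hypothetical helper, and the location of the destination folder (for example checkpoints/<folder>, going by the Tensorboard command in this README) is an assumption to verify against your setup.

```python
import shutil
from pathlib import Path


def place_pretrained(generator_ckpt, discriminator_ckpt, model_dir):
    """Copy pretrained checkpoints into model_dir as G_0.pth and D_0.pth."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    targets = []
    for source, name in ((generator_ckpt, "G_0.pth"), (discriminator_ckpt, "D_0.pth")):
        target = model_dir / name
        # copyfile overwrites an existing target, so run this before training starts.
        shutil.copyfile(source, target)
        targets.append(target)
    return targets
```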

Tensorboard

tensorboard --logdir checkpoints/<folder> --port 6006

Inference

Jupyter notebook

infer.ipynb

Gradio web app

python server.py --config_path path/to/config.json --model_path path/to/model.pth

Running in Docker

docker run -itd --gpus all --name "Container name" -e NVIDIA_DRIVER_CAPABILITIES=compute,utility -e NVIDIA_VISIBLE_DEVICES=all "Image name"
