
How to build a new voicedb? #22

Closed
uselessbug opened this issue Oct 5, 2020 · 3 comments
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@uselessbug

How can I build a new voicedb on my own? Or are there any documents? (I haven't found any so far.)

@taroushirani
Contributor

First of all, you need to prepare:

  1. singing voice data (any audio format that Python can handle);
  2. sheet music for the singing voice data (MusicXML format [1]);
  3. a label file that maps each phoneme to the time at which it is pronounced (HTS monophone label format).

Currently NNSVS uses pysinsy [2] (a Python wrapper for Sinsy [3], an HMM-based singing voice synthesis system) to convert MusicXML files to HTS full-context label files, and it supports Japanese only. The phonemes used in your HTS monophone label files must match those produced from the MusicXML files by pysinsy. If you want to use another language, you may need to find or write a converter like pysinsy. If you can prepare your own data as above, the recipes for the kiritan database or the PJS corpus may be helpful.

  1. https://www.musicxml.com/
  2. https://github.com/r9y9/pysinsy
  3. http://www.sinsy.jp/
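For anyone unfamiliar with the format, an HTS monophone label file is plain text with one `start end phoneme` triple per line, with times given as integers in units of 100 ns. A minimal sketch (a hypothetical helper, not part of NNSVS) that parses such a file:

```python
def parse_mono_label(text):
    """Return a list of (start_sec, end_sec, phoneme) tuples from an
    HTS monophone label, assuming 'start end phoneme' per line with
    times as integers in units of 100 nanoseconds."""
    segments = []
    for line in text.strip().splitlines():
        start, end, phoneme = line.split()
        segments.append((int(start) * 1e-7, int(end) * 1e-7, phoneme))
    return segments

# Tiny made-up fragment: silence, then the phonemes of "ka".
example = """\
0 2500000 sil
2500000 3500000 k
3500000 6000000 a
"""

for start, end, phoneme in parse_mono_label(example):
    print(f"{phoneme}: {start:.3f}-{end:.3f} s")
```

The phoneme strings in the third column are exactly what has to line up with the labels pysinsy derives from your MusicXML files.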

@uselessbug
Author

> If you want to use other language, you may need to find or write such converter as pysinsy. If you can prepare your own data as above, the recipe of kiritan database or PJS corpus may be helpful.

Thanks for replying! I've prepared the data as above, but how can I train the model on it?

@taroushirani
Contributor

taroushirani commented Oct 5, 2020

  1. Copy nnsvs/egs/pjs/00-svs-world to create a recipe for your own data:

cp -r egs/pjs egs/<name_of_your_voice_db>

  2. Edit egs/<name_of_your_voice_db>/00-svs-world/run.sh with your favorite editor. You need to change spk at line 6 to your name (or anything you like). If pretrained_expdir at line 15 is not empty, fine-tuning is done from kiritan_singing; if you don't want that, leave pretrained_expdir empty. You need to change the arguments of util/data_prep.py at line 69 from the directory of the PJS corpus to that of your own data. Between lines 73 and 77 of run.sh, the lists of files for training, validation, and evaluation are created, and you can choose songs for validation and evaluation from your own data as you like.

  3. Run your recipe like this:

bash run.sh --stage 0 --stop-stage 6
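As a concrete illustration, the variables touched in step 2 might end up roughly like this (a hypothetical sketch: "myvoice" is a made-up name, and the line numbers cited above refer to the PJS recipe as it was at the time of writing, so they may have shifted in your copy):

```shell
# Sketch of the edited part of egs/<name_of_your_voice_db>/00-svs-world/run.sh.
# "myvoice" is a placeholder; pick any name you like.
spk="myvoice"

# Leave empty to train from scratch; set it to a kiritan_singing exp
# directory if you want to fine-tune from a pretrained model instead.
pretrained_expdir=

# In the data-prep stage, the arguments of util/data_prep.py must point
# at your own corpus directory instead of the PJS corpus.
```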

If there is a mismatch between the monophone labels you prepared and the HTS full-context labels generated from your MusicXML files by pysinsy, your recipe will stop at stage 0. You need to resolve the conflict by editing the monophone label file or the MusicXML file. If your recipe runs to the end, the wav files for the songs you chose for validation and evaluation above are generated in egs/<name_of_your_voice_db>/00-svs-world/exp/<spk_of_your_voice_db>/synthesis/.
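To track down such a mismatch before running the recipe, a small checker can help. This is a hypothetical helper, not part of NNSVS; it assumes the full-context labels use the usual quinphone prefix `p1^p2-p3+p4=p5`, with the central phoneme sitting between `-` and `+`:

```python
import re

def mono_phonemes(text):
    """Phoneme sequence from an HTS monophone label (start end phoneme per line)."""
    return [line.split()[2] for line in text.strip().splitlines()]

def full_phonemes(text):
    """Central phonemes from HTS full-context labels, taken as the
    substring between '-' and '+' in the quinphone prefix."""
    return [re.search(r"-(.+?)\+", line.split()[-1]).group(1)
            for line in text.strip().splitlines()]

def first_mismatch(mono, full):
    """Index of the first differing phoneme, or None if the sequences agree."""
    for i, (a, b) in enumerate(zip(mono, full)):
        if a != b:
            return i
    return None if len(mono) == len(full) else min(len(mono), len(full))

# Tiny made-up example: three segments that line up correctly.
mono = "0 2500000 sil\n2500000 3500000 k\n3500000 6000000 a"
full = ("0 2500000 xx^xx-sil+k=a\n"
        "2500000 3500000 xx^sil-k+a=xx\n"
        "3500000 6000000 sil^k-a+xx=xx")
print(first_mismatch(mono_phonemes(mono), full_phonemes(full)))  # None -> no mismatch
```

A non-None result points at the first segment where the two label files disagree, which is where you would start editing the monophone label or the MusicXML.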

@r9y9 r9y9 added documentation Improvements or additions to documentation question Further information is requested labels Oct 10, 2020