Additional detail on using preprocess.py with gentle phoneme data #96
Comments
I've been able to figure it out by more careful reading of the code and just trying things (repeatedly 🙂). If it helps others, here is what I did:
for f in *.txt; do mv "$f" "p225_$f"; done
for f in *.lab; do mv "$f" "p225_$f"; done
for f in *.wav; do mv "$f" "p225_$f"; done
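The three rename loops above can also be done in one pass. Here is a hypothetical Python equivalent (the `p225_` prefix imitates VCTK's speaker-ID file naming, and `add_speaker_prefix` is a name made up for this sketch):

```python
from pathlib import Path

def add_speaker_prefix(directory, prefix="p225_",
                       suffixes=(".txt", ".lab", ".wav")):
    """Prefix every matching file name so it looks like a VCTK utterance."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in suffixes and not path.name.startswith(prefix):
            target = path.with_name(prefix + path.name)
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Skipping files that already carry the prefix makes the script safe to re-run.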
if not is_wav:
    # files = list(map(lambda s: open(s, "rb").read().decode("utf-8")[:-1], files))
    files = list(map(lambda s: open(s, "rb").read().decode("utf-8"), files))
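The commented-out line strips the last character of each transcript, presumably to drop a trailing newline; if a file happens to lack that newline, the slice silently eats a real character instead. A minimal illustration (the strings are hypothetical transcripts):

```python
# Why the `[:-1]` slice is fragile: it removes the last *character*,
# whether or not that character is a newline.
with_newline = "Hello world.\n"
without_newline = "Hello world."

assert with_newline[:-1] == "Hello world."    # trailing newline removed, fine
assert without_newline[:-1] == "Hello world"  # the final '.' is silently lost

# rstrip("\n") only removes a newline when one is actually present:
assert with_newline.rstrip("\n") == "Hello world."
assert without_newline.rstrip("\n") == "Hello world."
```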
python preprocess.py vctk "./datasets/neil2" "./datasets/processed_neil2" --preset=presets/deepvoice3_ljspeech.json
python train.py --preset=presets/deepvoice3_ljspeech.json --data-root=./datasets/processed_neil2 --restore-parts=checkpoints_intial_ljspeech/checkpoint_step000430000.pth

And, after waiting overnight, it worked!! 🎉 I've got a small issue with the synthesised audio, but I'll open a separate issue for that (if I don't figure out the cause). Two questions, @r9y9
You did a great job, but I think the better way to handle your own dataset is to follow https://github.com/r9y9/deepvoice3_pytorch#1-1-building-custom-dataset-using-json_meta. I just tried to build my own toy dataset, and I understand that it could be explained more carefully (took me more than ten minutes to figure out what exactly).

> ls -l
total 936
-rw-rw-r-- 1 ryuichi ryuichi 152 6月 30 22:32 LJ001-0001.txt
-rw-r--r-- 1 ryuichi ryuichi 425830 6月 30 22:29 LJ001-0001.wav
-rw-rw-r-- 1 ryuichi ryuichi 31 6月 30 22:32 LJ001-0002.txt
-rw-r--r-- 1 ryuichi ryuichi 83814 6月 30 22:29 LJ001-0002.wav
-rw-rw-r-- 1 ryuichi ryuichi 156 6月 30 22:32 LJ001-0003.txt
-rw-r--r-- 1 ryuichi ryuichi 426342 6月 30 22:29 LJ001-0003.wav
-rw-rw-r-- 1 ryuichi ryuichi 417 6月 30 22:45 alignment.json
> echo "{" > alignment.json; for f in $(find $PWD -type f -name "*.wav"); do g=${f/.wav/.txt}; echo \"${f}\": \"${g}\", >> alignment.json; done; echo "}" >> alignment.json
# remove last comma manually
> emacs -nw alignment.json
# check that we have correct json format
> cat alignment.json | jq .
{
"/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0002.wav": "/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0002.txt",
"/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0001.wav": "/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0001.txt",
"/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0003.wav": "/home/ryuichi/Dropbox/sp/deepvoice3_pytorch/foobar/LJ001-0003.txt"
}

After you have
Steps you described after
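The echo/find one-liner plus the manual comma fix above can also be replaced by a small script using the json module, which always emits valid JSON. A sketch, assuming each .wav has a same-named .txt transcript beside it (`write_alignment` is a made-up helper name):

```python
import json
from pathlib import Path

def write_alignment(dataset_dir, out_name="alignment.json"):
    """Map each *.wav to its neighbouring *.txt and dump valid JSON."""
    dataset_dir = Path(dataset_dir)
    mapping = {}
    for wav in sorted(dataset_dir.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.exists():  # skip wavs that have no transcript
            mapping[str(wav.resolve())] = str(txt.resolve())
    (dataset_dir / out_name).write_text(json.dumps(mapping, indent=2))
    return mapping
```

Because `json.dumps` handles the commas and quoting, no manual editing of the result is needed before the `jq .` sanity check.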
Yes!
Yes. See my previous comment.

if not is_wav:
    # files = list(map(lambda s: open(s, "rb").read().decode("utf-8")[:-1], files))
    files = list(map(lambda s: open(s, "rb").read().decode("utf-8"), files))

If this is a real issue (sorry, I forgot why I put
Thanks a lot for the update. I could be wrong, but I think your steps don't take the gentle processing into account. I'd already managed to get regular training working as per section 1.1, but I was trying to use the .lab files that gentle created (as I thought that better knowledge of the positions of the words within the .wav files might help with training), and was therefore trying to do the steps suggested by 1.2. I tried to follow section 1.2 but was struggling because I didn't know the structure to aim for with the files/folders. My steps above did seem to incorporate the gentle .lab files etc., but if there's a smarter way to handle that part (i.e. what is implied in 1.2), I'd be keen to discuss it.

Did you find using gentle helpful? It's a bit subjective, but I think my results were marginally improved. I'm now focusing on weeding out some bad-quality data that I found in my dataset, and then will do some more runs.
While I didn't prepare label files, see deepvoice3_pytorch/json_meta.py, lines 222 to 237 (at 271863f).
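For readers wondering what that referenced code deals with: one common layout for .lab alignment files is one `start end label` triple per line. The parser below is a hypothetical sketch of that layout only; the files produced by your gentle run may differ, so check the actual parsing in json_meta.py.

```python
def parse_lab(text):
    """Parse 'start end label' lines into (float, float, str) tuples.

    Assumes a simple whitespace-separated layout; real .lab files from a
    given aligner may use a different format or time unit.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            start, end, label = parts
            entries.append((float(start), float(end), label))
    return entries
```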
Hi. For the json format, in principle it should be like the format mentioned above. (Other formats, where a list of alignment candidates is placed instead of the transcript text, are also supported, because carpedm20's automatic alignment creates such a format, but the format mentioned above works well.)

For the effectiveness of gentle, you can refer to #78, where I directly compared the performance. In summary, the merlin-based vctk_preprocess didn't work at all on a 'dirty' dataset, while gentle did work and did improve the TTS performance. This works well when dealing with a dataset which includes inconsistent silences (e.g. breathing) during speech.

I totally agree that the doc needs some improvement. Any suggestions will be welcome, and I will try to improve it!! :)

+) I just discovered that the gentle alignment step can be covered just with alignment.json files (as they provide the path to each wav and the wav's transcript). Would it be better if the gentle alignment step supported json input, instead of txt and wav file patterns?
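Since alignment.json values can be either a transcript string or a list of alignment candidates (the carpedm20-style output mentioned above), downstream code has to normalize both shapes. A sketch; picking the first candidate is an assumption here, not necessarily what the repo does:

```python
def normalize_transcript(value):
    """Return a single transcript string from either accepted value shape."""
    if isinstance(value, str):
        return value
    if isinstance(value, list) and value:
        return value[0]  # assumption: the first candidate is the best one
    raise ValueError("unsupported alignment.json value: %r" % (value,))
```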
Hi @engiecat, it's great that you suggested gentle and it was included. 😀
@nmstoker
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @r9y9
Would you mind giving a few more details on what is needed to use preprocess.py as mentioned in the last step of the section on using custom data?
https://github.com/r9y9/deepvoice3_pytorch#1-2-preprocessing-custom-english-datasets-with-long-silence-based-on-vctk_preprocess
Initially I managed to train using custom data without using gentle, and the results were recognisably like my training data (recordings of my own voice) but I am hoping it will improve quality if I use gentle with the training data. I have managed to process the data with gentle_web_align.py, but I am unsure what parameters to use for preprocess.py now and also what format I need to put the files into.
Is there some similar format to the alignment.json file that should be created? And how would I incorporate the .lab files I got from gentle_web_align.py?
Sorry, I expect these may be obvious to you; I've been trying to figure it out from looking over the code, but to no avail! 😞
The project is really impressive - thank you for sharing your work!
Neil (@nmstoker)