Problems with noisy, loud audio? #58
Comments
No, everything you did seems OK to me. Could you upload the code to https://gist.github.com/ or wherever so that I can reproduce it?
This is the gist link: https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887. There are two files: nn.py is the training script for the DNN-based model and nn_synth.py is the test script.
Thank you. I can reproduce it, and here is the fix: at https://gist.github.com/attitudechunfeng/58a052f18f6aa24235000cc50618e887#file-nn_synth-py-L181, replace `wavfile.write('1.wav', rate=fs, data=waveform)` with `wavfile.write('1.wav', rate=fs, data=waveform.astype(np.int16))`. By the way, I noticed you save models as
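For context, a minimal sketch of why the cast matters (the file name and the random waveform below are placeholders, not taken from the gist): scipy.io.wavfile.write infers the output sample format from the array's dtype, so a float64 waveform whose samples span the int16 range is written as a float WAV and plays back as loud noise unless it is cast to int16 first.

```python
import numpy as np
from scipy.io import wavfile

fs = 16000
# Placeholder for the synthesized waveform: float64 samples in the int16 range.
waveform = (np.random.randn(fs) * 3000.0).clip(-32768, 32767)

# Writing the float64 array directly produces a float WAV that sounds like loud noise here;
# casting to int16 writes a standard 16-bit PCM file instead.
wavfile.write('1.wav', rate=fs, data=waveform.astype(np.int16))
```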
It works! I will also adopt your advice about saving the model. Many thanks!
You are welcome. Feel free to open new issues if you have any other problems.
Another question: if I want to synthesize arbitrary text, how do I generate the lab file? I compared an HTS full-context lab file with the one in the example. They're almost the same, except that each line in the example has a '[2-6]' at the end. How could I unify them?
Generating full-context labels (which requires a language-dependent text processor) is out of scope for this library. For that, you can follow Merlin's guide to generate labels: https://github.com/CSTR-Edinburgh/merlin/tree/master/egs/build_your_own_voice/s1#prepare-labels. See also CSTR-Edinburgh/merlin#28.
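For illustration only (this snippet is not from the thread, and the label path is hypothetical): the trailing '[2-6]' is the HMM state index that state-level alignment appends to each full-context line, so a quick way to tell the two formats apart is to check for that suffix.

```python
import re

def is_state_aligned(lab_path):
    """Return True if the first label line ends with an HTS state index like "[2]".."[6]"."""
    with open(lab_path) as f:
        first_line = f.readline().strip()
    return re.search(r"\[[2-6]\]$", first_line) is not None
```

Note that going the other way, from plain full-context labels to state-aligned ones, requires forced alignment (which is what the Merlin recipe above covers), not just string editing.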
I have run into another problem when trying to adapt the RNN and DNN models to my own dataset. First, I generate full-context labels using the hts-engine front-end tools, then normalize them into phone- and state-alignment files, and finally use the prepare_feature script to extract features. But during acoustic model training, errors occur. With the DNN part, it warns as follows: and for the RNN part, it warns about a padding size problem, even though the code is supposed to pad to the max utterance length. Any ideas about that?
Did you make sure that your indexing is valid?
Did you try setting a larger padded_length?
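A rough sketch of what that suggestion amounts to, assuming the tutorial-style nnmnkwii data pipeline (X_source below is a placeholder for your own file source, and your scripts may differ): padding every utterance to a fixed length breaks if the corpus contains an utterance longer than that length, so the padded length needs to cover the longest utterance in your data.

```python
from nnmnkwii.datasets import FileSourceDataset, PaddedFileSourceDataset

# X_source: your linguistic/acoustic feature source (placeholder name).
X = FileSourceDataset(X_source)

# Find the longest utterance in *your* data and pad everything to that length.
max_frames = max(len(x) for x in X)
X_padded = PaddedFileSourceDataset(X_source, max_frames)  # second argument is the padded length
```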
I've fixed this problem by increasing padded_length. However, the audio trained on my own data has poor quality. I didn't change the training parameters. Any advice for improving the quality? P.S.: my dataset is 16 kHz.
Hard to say much without seeing the code and data, but I would guess:
Also, the training parameters should be tuned for your data.
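As one illustration of what "tuned for your data" can mean at 16 kHz (a sketch, not from the thread; it assumes the pyworld/pysptk WORLD feature extraction used in the tutorials, and the variable names and dummy signal are placeholders): several analysis settings follow from the sampling rate, for example the mel-cepstrum warping coefficient.

```python
import numpy as np
import pyworld
import pysptk

fs = 16000            # your corpus sampling rate
frame_period = 5.0    # analysis frame period in ms
mgc_order = 59

# Dummy one-second signal standing in for a real utterance; pyworld expects float64 input.
x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs).astype(np.float64)

f0, timeaxis = pyworld.dio(x, fs, frame_period=frame_period)
f0 = pyworld.stonemask(x, f0, timeaxis, fs)
spectrogram = pyworld.cheaptrick(x, f0, timeaxis, fs)

# The warping coefficient depends on the sampling rate (roughly 0.41 at 16 kHz).
alpha = pysptk.util.mcepalpha(fs)
mgc = pysptk.sp2mc(spectrogram, order=mgc_order, alpha=alpha)
```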
Hi @r9y9, I followed the documentation tutorials for synthesizing audio. I tried both your DNN-based and RNN-based models, step by step, following the Python tutorials. However, after training the model, I get noisy, loud audio in the test stage. Can you give me some advice about this problem? In fact, I made a few small changes: I split the training and test phases into separate files, and both the model file and the parameters are saved locally. In the test stage, I just load those files from disk and go through the remaining test steps. Also, I don't use the test files from your tutorial; instead I randomly select a lab file from 'data/slt_arctic_full_data/label_state_align/'. Is there anything wrong with what I did?