How to generate with frame shift 12.5ms #4
Comments
Thanks for trying the code. Regarding "audio looks discontinuous", I am sorry that I cannot tell much from the spectrogram you showed. Could you please provide more information:
With the audio files, I can probably identify the issue. With the input features and trained models, I may be able to give a better answer. Thanks in advance.
Hi, thanks for your reply. This is the pitch sequence of a segment of a song, extracted with amfm_decompy.pYAAPT. You can see that it is discontinuous:
@Chunhui-Lu Thanks for the reply. It is good to know that. Yes, no F0 extractor is guaranteed to work in all cases. I remember that I used to use multiple F0 extractors and do voting ...
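For readers who want to try the voting idea mentioned above, here is a minimal sketch. It is not the maintainer's actual procedure, which is not described in this thread. It assumes every extractor returns an F0 track with the same number of frames and marks unvoiced frames as 0; the function name `vote_f0` and the `min_voiced` threshold are hypothetical.

```python
import numpy as np


def vote_f0(tracks, min_voiced=2):
    """Combine several F0 tracks by per-frame median voting.

    tracks: list of equal-length 1-D sequences in Hz, 0 meaning unvoiced
            (assumed convention, check what your extractors return).
    min_voiced: a frame is kept as voiced only if at least this many
                extractors report a non-zero F0 (hypothetical parameter).
    """
    f0 = np.stack([np.asarray(t, dtype=float) for t in tracks])  # (n_extractors, n_frames)
    voiced = f0 > 0
    n_voiced = voiced.sum(axis=0)

    # Hide unvoiced estimates so they do not drag the median toward zero.
    masked = np.where(voiced, f0, np.nan)

    out = np.zeros(f0.shape[1])
    keep = n_voiced >= min_voiced
    out[keep] = np.nanmedian(masked[:, keep], axis=0)
    return out


# Toy tracks from three imaginary extractors (values are made up).
track_a = [100.0, 0.0, 200.0, 0.0]
track_b = [102.0, 0.0, 400.0, 0.0]    # octave error in frame 2
track_c = [101.0, 150.0, 202.0, 0.0]  # spurious voiced frame 1
voted = vote_f0([track_a, track_b, track_c])
```

In this toy case the octave error in frame 2 and the isolated voiced estimate in frame 1 are both voted out. Real tracks would first need to be aligned to the same frame shift.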
Hi, I am running an experiment (cyc-noise-nsf-4) with a 12.5 ms frame shift and a 50 ms frame length (to match the Tacotron configuration).
I only modified input_reso = [200, 200] in config.py, and the corresponding arguments for extracting the mel spectrogram and F0.
However, the F0 of the synthesized audio looks discontinuous.
Can you help me?
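For reference, the arithmetic behind these settings can be sketched as follows. This assumes a 16 kHz sampling rate (inferred from 200 samples per 12.5 ms, not stated in the post) and assumes that `input_reso` holds the number of waveform samples per frame for each input feature; `ms_to_samples` is a hypothetical helper, not part of the project.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    samples = duration_ms * sample_rate / 1000.0
    if abs(samples - round(samples)) > 1e-9:
        # A fractional hop would make feature frames drift against the waveform.
        raise ValueError(
            f"{duration_ms} ms is not a whole number of samples at {sample_rate} Hz"
        )
    return int(round(samples))


SAMPLE_RATE = 16000  # assumption: inferred from 200 samples per 12.5 ms

frame_shift = ms_to_samples(12.5, SAMPLE_RATE)   # hop size in samples
frame_length = ms_to_samples(50.0, SAMPLE_RATE)  # window size in samples

# Assumed meaning: one up-sampling factor per input feature (mel, F0).
input_reso = [frame_shift, frame_shift]
```

The same frame shift has to be used when extracting the mel spectrogram and the F0, otherwise the two feature streams will not have matching frame counts.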