How to generate with frame shift 12.5ms #4

Closed
Chunhui-Lu opened this issue Dec 14, 2020 · 5 comments

Comments

@Chunhui-Lu

Hi, I am running an experiment (cyc-noise-nsf-4) with a 12.5 ms frame shift and a 50 ms frame length (to match the Tacotron configuration).
I only modified input_reso = [200, 200] in config.py, along with the corresponding arguments for extracting mel and F0.
However, the F0 of the synthesized audio looks discontinuous.
Can you help me?
[Screenshot 2020-12-14 135748: spectrogram of the synthesized audio]
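The value 200 in input_reso follows from the frame shift: assuming 16 kHz audio (which is consistent with 200 samples per 12.5 ms), a 12.5 ms shift is 200 samples and a 50 ms frame length is 800 samples. A minimal sketch of that arithmetic, assuming input_reso holds one upsampling factor (in waveform samples) per input feature stream, as the post implies; `ms_to_samples` is a hypothetical helper, not part of the project:

```python
# Hypothetical helper: convert frame settings in milliseconds to sample counts.
# Assumption: the waveform is sampled at 16 kHz; change SAMPLE_RATE otherwise.
SAMPLE_RATE = 16000

def ms_to_samples(ms: float, sample_rate: int = SAMPLE_RATE) -> int:
    samples = ms * sample_rate / 1000.0
    if not samples.is_integer():
        # A fractional hop would make feature frames drift against the waveform.
        raise ValueError(f"{ms} ms is not a whole number of samples at {sample_rate} Hz")
    return int(samples)

frame_shift = ms_to_samples(12.5)   # hop between consecutive feature frames
frame_length = ms_to_samples(50.0)  # analysis window length

# Assumption: each input stream (mel, F0) is upsampled by the hop size,
# so one feature frame spans frame_shift waveform samples.
input_reso = [frame_shift, frame_shift]
print(frame_shift, frame_length, input_reso)
```

The same helper makes it easy to check that the mel and F0 extraction arguments use the same hop as input_reso, since a mismatch there would misalign the features with the waveform.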

@TonyWangX
Member

TonyWangX commented Dec 14, 2020

Thanks for trying the code.

Regarding the "discontinuous" audio, I'm sorry, but I can't tell much from the spectrogram you showed. Could you please provide more information:

  1. What kind of discontinuity did you perceive? For example, is it a voicing error within a sound (i.e., a voiced sound turning unvoiced), or does it occur during the transition from one sound to another?

  2. How frequently does it happen? Only in this utterance, or in every generated utterance?

  3. If allowed, please attach a few audio samples here, or send the audio to me by email.

  4. If allowed, you may also send me the input feature files and the trained model (trained_network.pt, config.py, model.py).

With the audio, I can probably identify the issue. With the input features and trained model, I may be able to give a better answer.

Thanks in advance.

@TonyWangX
Member

@Chunhui-Lu

@Chunhui-Lu
Author

Hi, thanks for your reply.
I found that the pitch sequences I had extracted were discontinuous.
When I re-extracted the pitch sequences using pyworld, everything was OK.

Here is a pitch sequence from a segment of a song, extracted with amfm_decompy.pYAAPT. You can see that it is discontinuous:
219.178085
219.178085
219.178085
216.216217
213.333328
210.526321
207.792206
207.792206
207.792206
207.792206
207.792206
207.792206
205.128204
205.128204
205.128204
205.128204
205.128204
205.128204
207.792206
207.792206
207.792206
207.792206
207.792206
207.792206
207.792206
207.792206
207.792206
210.526321
210.526321
210.526321
213.333328
213.333328
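The values above are not arbitrary: each appears to equal 16000 divided by an integer (16000/73 ≈ 219.178, 16000/74 ≈ 216.216, and so on down to 16000/78 ≈ 205.128). In other words, the F0 seems to be quantized to whole-sample pitch periods at 16 kHz, so the contour can only hold a value or jump by roughly 22 to 24 cents at a time in this range. Whether this comes from pYAAPT itself or from the settings used is not confirmed here. A minimal sketch to check a track for this pattern, assuming 16 kHz audio and 0.0 for unvoiced frames; `is_lag_quantized` and `step_cents` are hypothetical helpers:

```python
import math
from typing import Sequence

def is_lag_quantized(f0_hz: Sequence[float], sample_rate: int, tol: float = 1e-3) -> bool:
    """Return True if every voiced F0 equals sample_rate / k for some integer k.

    Unvoiced frames (F0 <= 0) are skipped; tol is measured in samples of lag.
    """
    for f0 in f0_hz:
        if f0 <= 0:
            continue
        lag = sample_rate / f0  # pitch period in samples
        if abs(lag - round(lag)) > tol:
            return False
    return True

def step_cents(lag: int) -> float:
    """Pitch jump, in cents, from period `lag` to the next shorter period `lag - 1`."""
    return 1200.0 * math.log2(lag / (lag - 1))

SAMPLE_RATE = 16000  # assumption: 16 kHz audio, consistent with input_reso = [200, 200]
yaapt_values = [219.178085, 216.216217, 213.333328, 210.526321, 207.792206, 205.128204]

print(is_lag_quantized(yaapt_values, SAMPLE_RATE))
for f0 in yaapt_values:
    lag = round(SAMPLE_RATE / f0)
    print(f"{f0:.6f} Hz = {SAMPLE_RATE} / {lag}, next step up: {step_cents(lag):.1f} cents")
```

A track whose F0 values are not tied to integer lags would generally fail this check, which gives a quick way to compare the pYAAPT output against a re-extraction with another tool.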

@TonyWangX
Member

TonyWangX commented Dec 18, 2020

@Chunhui-Lu Thanks for the reply. It is good to know that.

Yes, no F0 extractor is guaranteed to work in all cases.
I'm afraid I can't help solve that :)

I remember that I used to run multiple F0 extractors and combine their outputs by voting...
I don't have a more informative suggestion than that.
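The exact voting scheme is not described here; one simple possibility is sketched below, assuming all F0 tracks have already been extracted (or resampled) onto the same 12.5 ms frame grid and use 0.0 for unvoiced frames. A frame is treated as voiced only if a strict majority of extractors call it voiced, and its F0 is then the median of the voiced estimates. `vote_f0` is a hypothetical helper, not part of the project:

```python
from statistics import median
from typing import List, Sequence

def vote_f0(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Fuse several F0 tracks frame by frame (0.0 = unvoiced).

    Voicing: strict majority vote across extractors.
    F0 value: median of the extractors that report the frame as voiced.
    """
    if not tracks:
        return []
    n_frames = min(len(t) for t in tracks)  # tolerate small length mismatches
    fused: List[float] = []
    for i in range(n_frames):
        voiced = [t[i] for t in tracks if t[i] > 0]
        if 2 * len(voiced) > len(tracks):
            fused.append(median(voiced))
        else:
            fused.append(0.0)
    return fused

# Toy tracks standing in for three different extractors (made-up numbers).
track_a = [0.0, 207.79, 210.53, 0.0]
track_b = [0.0, 208.1, 209.9, 150.0]
track_c = [120.0, 207.5, 210.2, 0.0]
print(vote_f0([track_a, track_b, track_c]))
```

The median is used rather than the mean so that a single extractor making an octave error in one frame does not pull the fused value far off.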

@kikirizki

Hi, did you retrain cyc-noise-nsf-4 with a 12.5 ms frame shift? If so, would you mind sharing the trained model? I am trying to test Tacotron 2 + cyc-noise-nsf-4.
