
Some confusion in your TensorBoard visualization #4

Closed
v-nhandt21 opened this issue Jul 15, 2020 · 2 comments

@v-nhandt21
When I train the FastPitch model from NVIDIA's source code, I get the same curves as yours.
My training set has 11239 samples and my validation set has 1000.
I have noticed that the training and validation curves diverge more and more as training goes on, which does not seem normal.
Does your model really output speech in the end? I am quite confused.
Thank you for sharing the code <3

@ming024 (Owner) commented Jul 22, 2020

I am not sure what you mean by "Does your model really output speech?"
The output of my FastSpeech2 implementation is a mel-spectrogram, which can be converted to wav files by vocoders such as WaveGlow and MelGAN.
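
For context, here is a minimal sketch of that vocoder step, assuming NVIDIA's pretrained WaveGlow checkpoint from torch.hub. The placeholder mel tensor and its shape are illustrative only; this is not this repository's synthesis script.

import torch

# Assumption: NVIDIA's published WaveGlow torch.hub entry point. Any
# pretrained vocoder with a mel -> waveform interface would work the same way.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                          'nvidia_waveglow')
waveglow = waveglow.remove_weightnorm(waveglow).to(device).eval()

# Placeholder mel-spectrogram of shape (batch, n_mels, frames). In practice
# this would be the post-net output of the FastSpeech2 model.
mel = torch.randn(1, 80, 200, device=device)

with torch.no_grad():
    audio = waveglow.infer(mel)  # (batch, samples) waveform tensor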

If you are asking why there is a large gap between the training and validation mel_loss and mel_postnet_loss curves, it is because in evaluate.py the model synthesizes mel-spectrograms without the ground-truth F0 and energy labels:

FastSpeech2/evaluate.py, lines 63 to 64 at commit 172d2ea:

mel_output, mel_postnet_output, duration_output, f0_output, energy_output = model(
    text, src_pos, mel_pos, max_len, D)

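To illustrate the mechanism, here is a runnable toy sketch. The class and names below are hypothetical, not the repository's code: a model is conditioned on the ground-truth F0 during training but falls back to its own F0 prediction during evaluation, which is why the validation mel loss sits above the training loss even when learning proceeds correctly.

import torch
import torch.nn as nn

class ToyVarianceModel(nn.Module):
    """Hypothetical stand-in for a FastSpeech2-style variance adaptor."""
    def __init__(self, dim=8):
        super().__init__()
        self.f0_predictor = nn.Linear(dim, 1)
        self.decoder = nn.Linear(dim + 1, dim)

    def forward(self, x, f0_target=None):
        f0_pred = self.f0_predictor(x)
        # Training-style call: condition the decoder on ground-truth F0.
        # Evaluation-style call: use the (imperfect) predicted F0 instead.
        f0 = f0_target if f0_target is not None else f0_pred
        mel = self.decoder(torch.cat([x, f0], dim=-1))
        return mel, f0_pred

model = ToyVarianceModel()
x = torch.randn(4, 8)
f0_gt = torch.randn(4, 1)

mel_train, _ = model(x, f0_target=f0_gt)  # teacher-forced conditioning
mel_eval, _ = model(x)                    # predicted-F0 conditioning
# Any error in f0_pred leaks into mel_eval, so the evaluation mel loss
# stays above the teacher-forced training loss.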
@ming024 (Owner) commented Sep 5, 2020

Closed #4.
