When I train the FastPitch model from NVIDIA's source code, I get loss curves that look the same as yours.
My training set has 11,239 samples and my validation set has 1,000.
I've noticed that the training and validation curves diverge more and more as training goes on, which seems unusual.
Do your models actually output speech? I'm quite confused.
Thank you for sharing the code <3
I am not sure what you mean by "Do your models actually output speech?"
The output of my FastSpeech2 implementation is a mel-spectrogram, which can be converted to waveforms by vocoders such as WaveGlow and MelGAN.
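For anyone wondering what that conversion step looks like, here is a minimal sketch using WaveGlow loaded from NVIDIA's torch.hub entry point. This is an assumption on my part: this repo's synthesis script may load a local checkpoint instead, and the `nvidia_waveglow` hub path, `remove_weightnorm`, and `infer` calls come from NVIDIA's DeepLearningExamples, not from this codebase.

```python
import torch
from scipy.io.wavfile import write

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load a pretrained WaveGlow vocoder (assumption: NVIDIA's torch.hub entry).
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                          'nvidia_waveglow', model_math='fp32')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to(device).eval()

# `mel` stands in for a FastSpeech2 output: a (1, 80, n_frames) log-mel
# tensor. A random tensor runs but will of course synthesize noise.
mel = torch.randn(1, 80, 500, device=device)

with torch.no_grad():
    audio = waveglow.infer(mel)  # (1, n_samples) float waveform in [-1, 1]

# LJSpeech-style sampling rate; scipy accepts float32 samples directly.
write('output.wav', 22050, audio[0].cpu().numpy())
```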
If you are asking why there is a large gap between the training and validation mel_loss and mel_postnet_loss curves, it is because in evaluate.py the model synthesizes mel-spectrograms without ground-truth F0 and energy labels.
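To make that teacher-forcing difference concrete, here is a hypothetical sketch of a variance adaptor with the training/evaluation switch. The module and its internals are illustrative stand-ins (the actual implementation bucketizes F0/energy into embedding tables), not the repo's real code:

```python
import torch
import torch.nn as nn

class VarianceAdaptor(nn.Module):
    """Hypothetical sketch; names and layers differ from the repo's modules."""
    def __init__(self, d_model=256):
        super().__init__()
        self.pitch_predictor = nn.Linear(d_model, 1)   # stand-in predictors
        self.energy_predictor = nn.Linear(d_model, 1)
        self.pitch_embedding = nn.Linear(1, d_model)   # stand-in embeddings
        self.energy_embedding = nn.Linear(1, d_model)

    def forward(self, x, f0_target=None, energy_target=None):
        f0_pred = self.pitch_predictor(x)
        energy_pred = self.energy_predictor(x)
        # Training: teacher-force with ground-truth F0/energy labels.
        # Validation/synthesis: no labels exist, so fall back to the model's
        # own predictions, which is why the validation mel losses sit above
        # the training ones.
        f0 = f0_target if f0_target is not None else f0_pred
        energy = energy_target if energy_target is not None else energy_pred
        return x + self.pitch_embedding(f0) + self.energy_embedding(energy)

adaptor = VarianceAdaptor()
x = torch.randn(2, 50, 256)                      # (batch, frames, d_model)
out_train = adaptor(x, f0_target=torch.randn(2, 50, 1),
                    energy_target=torch.randn(2, 50, 1))
out_eval = adaptor(x)                            # the path evaluate.py measures
```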