When I train the FastPitch model from NVIDIA's source code, I get loss curves that look the same as yours.
My training set has 11,239 samples and my validation set has 1,000.
I've noticed that the training and validation curves diverge more and more as training goes on, which seems unusual.
Do your models actually output speech? I'm quite confused.
Thank you for sharing the code <3
I am not sure what you mean by "Do your models actually output speech?"
The output of my FastSpeech2 implementation is a mel-spectrogram, which can be converted to waveforms by vocoders such as WaveGlow and MelGAN.
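For anyone wondering what that conversion step looks like, here is a minimal sketch using WaveGlow loaded from NVIDIA's torch.hub entry point. This is an assumption on my part: this repo's synthesis script may load a local checkpoint instead, and the `nvidia_waveglow` hub path, `remove_weightnorm`, and `infer` calls come from NVIDIA's DeepLearningExamples, not from this codebase.

```python
import torch
from scipy.io.wavfile import write

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load a pretrained WaveGlow vocoder (assumption: NVIDIA's torch.hub entry).
waveglow = torch.hub.load('NVIDIA/DeepLearningExamples:torchhub',
                          'nvidia_waveglow', model_math='fp32')
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to(device).eval()

# `mel` stands in for a FastSpeech2 output: a (1, 80, n_frames) log-mel
# tensor. A random tensor runs but will of course synthesize noise.
mel = torch.randn(1, 80, 500, device=device)

with torch.no_grad():
    audio = waveglow.infer(mel)  # (1, n_samples) float waveform in [-1, 1]

# LJSpeech-style sampling rate; scipy accepts float32 samples directly.
write('output.wav', 22050, audio[0].cpu().numpy())
```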
If you are asking why there is a large gap between the training and validation mel_loss and mel_postnet_loss curves, it is because in evaluate.py the model synthesizes mel-spectrograms without ground-truth F0 and energy labels.
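To make that teacher-forcing difference concrete, here is a hypothetical sketch of a variance adaptor with the training/evaluation switch. The module and its internals are illustrative stand-ins (the actual implementation bucketizes F0/energy into embedding tables), not the repo's real code:

```python
import torch
import torch.nn as nn

class VarianceAdaptor(nn.Module):
    """Hypothetical sketch; names and layers differ from the repo's modules."""
    def __init__(self, d_model=256):
        super().__init__()
        self.pitch_predictor = nn.Linear(d_model, 1)   # stand-in predictors
        self.energy_predictor = nn.Linear(d_model, 1)
        self.pitch_embedding = nn.Linear(1, d_model)   # stand-in embeddings
        self.energy_embedding = nn.Linear(1, d_model)

    def forward(self, x, f0_target=None, energy_target=None):
        f0_pred = self.pitch_predictor(x)
        energy_pred = self.energy_predictor(x)
        # Training: teacher-force with ground-truth F0/energy labels.
        # Validation/synthesis: no labels exist, so fall back to the model's
        # own predictions, which is why the validation mel losses sit above
        # the training ones.
        f0 = f0_target if f0_target is not None else f0_pred
        energy = energy_target if energy_target is not None else energy_pred
        return x + self.pitch_embedding(f0) + self.energy_embedding(energy)

adaptor = VarianceAdaptor()
x = torch.randn(2, 50, 256)                      # (batch, frames, d_model)
out_train = adaptor(x, f0_target=torch.randn(2, 50, 1),
                    energy_target=torch.randn(2, 50, 1))
out_eval = adaptor(x)                            # the path evaluate.py measures
```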