
My trained model is 4GB, how is that? #22

Closed
hdmjdp opened this issue Nov 14, 2018 · 11 comments

Comments

@hdmjdp

hdmjdp commented Nov 14, 2018

My trained model is 4GB. How is that?

@hcwu1993

Because the default config is huge: 12*8 (512 channels), i.e. 12 flows, each with an 8-layer WN at 512 channels.
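For reference, a quick way to check how much of a saved file is actually model weights is to load it and count parameters. A minimal sketch, assuming the file is a dict that stores the model under a "model" key (the path below is hypothetical):

```python
import torch

# Minimal sketch: estimate how much of a checkpoint file is model weights.
# The path and the "model" key are assumptions; adjust to your own checkpoint.
ckpt = torch.load("waveglow_checkpoint.pt", map_location="cpu")
model = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters, ~{n_params * 4 / 2**30:.2f} GB as FP32 weights")
```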

@RPrenger

RPrenger commented Nov 14, 2018

We haven't done much ablative analysis yet to see how few channels or how few layers we could get away with. A lot of the architecture decisions were made based on the early parts of the training curves, which seem to favor bigger models, but if smaller models were trained for 500k iterations they might sound essentially as good.
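To make the knobs concrete, these are the architecture hyperparameters such an ablation would shrink, written out as the "waveglow_config" block from config.json (rendered here as a Python dict). The key names mirror the repo's config; the smaller values are placeholders for an experiment, not a recommendation:

```python
# Hypothetical smaller-than-default WaveGlow architecture config.
# Key names follow the "waveglow_config" section of config.json; the reduced
# values are illustrative only and have not been validated for quality.
smaller_waveglow_config = {
    "n_mel_channels": 80,
    "n_flows": 6,            # default: 12 flows
    "n_group": 8,
    "n_early_every": 4,
    "n_early_size": 2,
    "WN_config": {
        "n_layers": 4,       # default: 8 dilated-conv layers per WN
        "n_channels": 256,   # default: 512 channels
        "kernel_size": 3,
    },
}
```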

@rafaelvalle
Contributor

@hcwu1993 Do you mean the trained model, or the checkpoint file that is saved during training and includes the optimizer states?

@hcwu1993

It should save the model parameters and structure information, according to the PyTorch docs.

@hcwu1993

By the way, the released model is 2GB, so is its config different from the paper? Also, I got an unusual result using this model: the F0 of the generated wav is lower than the natural one, and it sounds like a male voice.

@rafaelvalle
Contributor

@hcwu1993 Unlike the checkpoints saved during training, which include the optimizer states, the checkpoint we shared with the pretrained model only contains the model. Hence the difference in size.
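If a 4GB file is a training checkpoint, a smaller inference-only copy can be produced by re-saving just the model. A minimal sketch, assuming the checkpoint is a dict with "model" and "optimizer" entries (inspect its keys to see what your training script actually stores):

```python
import torch

# Minimal sketch: drop optimizer state (and other training bookkeeping) from a
# training checkpoint to get a smaller, inference-only file. The "model" and
# "optimizer" keys are assumptions; print ckpt.keys() to check your own layout.
ckpt = torch.load("waveglow_training_checkpoint", map_location="cpu")
print(ckpt.keys() if isinstance(ckpt, dict) else type(ckpt))

inference_ckpt = {"model": ckpt["model"]}  # omit "optimizer", "iteration", etc.
torch.save(inference_ckpt, "waveglow_model_only.pt")
```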

@hcwu1993

Thank you. I have another question: I use the command below to synthesize a wav, with the default sampling rate of 22050 Hz, but the result sounds like a male voice. Is there anything wrong with my command?
[screenshot of the synthesis command]

@rafaelvalle
Contributor

Your mel-spectrograms must have the same parameters (sampling_rate, filter_length, hop_length, win_length, mel_fmin, mel_fmax) as your model.

The pretrained model we share was trained with "mel_fmax": 8000.0.
We eventually updated the config.json file to have "mel_fmax": 8000.0 as the default: https://github.com/NVIDIA/waveglow/blob/master/config.json#L20

If you trained your model before this update, it is possible that your model was trained with librosa's default: "mel_fmax": sampling_rate/2.
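To illustrate the mismatch: librosa's default mel fmax is sampling_rate/2 (11025 Hz at 22050 Hz), while the shared model expects 8000 Hz, so the same audio ends up on a different mel filterbank. A small sketch (the 1024-point FFT and 80 mel bands are assumed to match the repo's config; verify against your own):

```python
import librosa

# Sketch: the mel filterbank used for feature extraction must match the one the
# model was trained with. With librosa's default fmax (sr/2 = 11025 Hz) the
# filters are spread over a wider band than with fmax = 8000 Hz, so the same
# mel bin indexes correspond to different frequencies.
mel_8k  = librosa.filters.mel(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0)
mel_def = librosa.filters.mel(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=None)  # fmax -> sr/2

print(mel_8k.shape, mel_def.shape)  # same shape (80, 513), different filter placement
print((mel_8k != mel_def).any())    # True
```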

@hdmjdp
Author

hdmjdp commented Nov 17, 2018

Did you train with batch_size=24 and FP16?

@rafaelvalle
Contributor

No, we trained with FP32.

@rafaelvalle
Contributor

Closing issue. Please re-open if needed.
