
My trained model is 4GB, how is that? #22

Closed
hdmjdp opened this issue Nov 14, 2018 · 11 comments

Comments

@hdmjdp

hdmjdp commented Nov 14, 2018

My trained model is 4GB. How is that?

@hcwu1993

Because the default config is huge: 12*8 (512 channels), i.e. 12 flows, each with an 8-layer WN at 512 channels.
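For reference, a quick way to check how much of a saved file is actually model weights is to load it and count parameters. A minimal sketch, assuming the file is a dict that stores the model under a "model" key (the path below is hypothetical):

```python
import torch

# Minimal sketch: estimate how much of a checkpoint file is model weights.
# The path and the "model" key are assumptions; adjust to your own checkpoint.
ckpt = torch.load("waveglow_checkpoint.pt", map_location="cpu")
model = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters, ~{n_params * 4 / 2**30:.2f} GB as FP32 weights")
```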

@RPrenger

RPrenger commented Nov 14, 2018

We haven't done much ablative analysis yet to see how few channels or how few layers we could get away with. A lot of the architecture decisions were made based on the early parts of the training curves, which seem to favor bigger models, but if smaller models were trained for 500k iterations they might sound essentially as good.
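To make the knobs concrete, these are the architecture hyperparameters such an ablation would shrink, written out as the "waveglow_config" block from config.json (rendered here as a Python dict). The key names mirror the repo's config; the smaller values are placeholders for an experiment, not a recommendation:

```python
# Hypothetical smaller-than-default WaveGlow architecture config.
# Key names follow the "waveglow_config" section of config.json; the reduced
# values are illustrative only and have not been validated for quality.
smaller_waveglow_config = {
    "n_mel_channels": 80,
    "n_flows": 6,            # default: 12 flows
    "n_group": 8,
    "n_early_every": 4,
    "n_early_size": 2,
    "WN_config": {
        "n_layers": 4,       # default: 8 dilated-conv layers per WN
        "n_channels": 256,   # default: 512 channels
        "kernel_size": 3,
    },
}
```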

@rafaelvalle
Contributor

@hcwu1993 Do you mean the trained model, or the checkpoint file that is saved during training and includes the optimizer states?

@hcwu1993

It should save the model parameters and structure information, according to the PyTorch docs.

@hcwu1993

By the way, the released model is 2GB, so is its config different from the paper? Also, I got an unusual result using this model: the F0 of the generated wav is lower than the natural one, and it sounds like a male voice.

@rafaelvalle
Contributor

@hcwu1993 Unlike the checkpoints saved during training, which include the optimizer states, the checkpoint we shared with the pretrained model only contains the model. Hence the difference in size.
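If a 4GB file is a training checkpoint, a smaller inference-only copy can be produced by re-saving just the model. A minimal sketch, assuming the checkpoint is a dict with "model" and "optimizer" entries (inspect its keys to see what your training script actually stores):

```python
import torch

# Minimal sketch: drop optimizer state (and other training bookkeeping) from a
# training checkpoint to get a smaller, inference-only file. The "model" and
# "optimizer" keys are assumptions; print ckpt.keys() to check your own layout.
ckpt = torch.load("waveglow_training_checkpoint", map_location="cpu")
print(ckpt.keys() if isinstance(ckpt, dict) else type(ckpt))

inference_ckpt = {"model": ckpt["model"]}  # omit "optimizer", "iteration", etc.
torch.save(inference_ckpt, "waveglow_model_only.pt")
```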

@hcwu1993

Thank you. I have another question: I use the command below to synthesize a wav, with the default sampling rate of 22050 Hz, but the result sounds like a male voice. Is there anything wrong with my command?
[screenshot of the synthesis command]

@rafaelvalle
Contributor

Your mel-spectrograms must have the same parameters (sampling_rate, filter_length, hop_length, win_length, mel_fmin, mel_fmax) as your model.

The pretrained model we share was trained with "mel_fmax": 8000.0.
We eventually updated the config.json file to have "mel_fmax": 8000.0 as the default: https://github.com/NVIDIA/waveglow/blob/master/config.json#L20

If you trained your model before this update, it is possible that your model was trained with librosa's default: "mel_fmax": sampling_rate/2.
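To illustrate the mismatch: librosa's default mel fmax is sampling_rate/2 (11025 Hz at 22050 Hz), while the shared model expects 8000 Hz, so the same audio ends up on a different mel filterbank. A small sketch (the 1024-point FFT and 80 mel bands are assumed to match the repo's config; verify against your own):

```python
import librosa

# Sketch: the mel filterbank used for feature extraction must match the one the
# model was trained with. With librosa's default fmax (sr/2 = 11025 Hz) the
# filters are spread over a wider band than with fmax = 8000 Hz, so the same
# mel bin indexes correspond to different frequencies.
mel_8k  = librosa.filters.mel(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=8000.0)
mel_def = librosa.filters.mel(sr=22050, n_fft=1024, n_mels=80, fmin=0.0, fmax=None)  # fmax -> sr/2

print(mel_8k.shape, mel_def.shape)  # same shape (80, 513), different filter placement
print((mel_8k != mel_def).any())    # True
```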

@hdmjdp
Author

hdmjdp commented Nov 17, 2018

Did you train with batch_size=24 and FP16?

@rafaelvalle
Contributor

No, we trained with FP32.

@rafaelvalle
Contributor

Closing issue. Please re-open if needed.
