Training model with a dataset #13

Closed
gcunhase opened this issue Sep 4, 2018 · 6 comments

Comments

gcunhase commented Sep 4, 2018

Any hints on how to use NB2 to train on a dataset (say, a directory with multiple audio files) and then use the trained model to generate one of those samples?

Thank you in advance

Owner

fatchord commented Sep 4, 2018

@gcunhase Yeah, it should be easy enough: you could try PyTorch's Dataset class. Whether the model will produce one of the samples is a gamble, though, since the model would be unconditioned and you wouldn't have much, if any, control over what it outputs.
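
As a rough illustration of the Dataset suggestion, here is a minimal sketch that serves fixed-length clips from a folder of audio files. The class name `AudioFolderDataset`, the `seq_len` default, and the assumption that every file is a non-empty 16-bit mono WAV are all hypothetical choices for this example, not something taken from the notebooks.

```python
import wave
from pathlib import Path

import torch
from torch.utils.data import Dataset


class AudioFolderDataset(Dataset):
    """Hypothetical dataset: one fixed-length clip per WAV file in a folder.

    Assumes every file is a non-empty, 16-bit, mono WAV.
    """

    def __init__(self, root, seq_len=1000):
        self.paths = sorted(Path(root).glob("*.wav"))
        self.seq_len = seq_len

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with wave.open(str(self.paths[idx]), "rb") as wav:
            frames = wav.readframes(wav.getnframes())
        # 16-bit PCM -> float tensor scaled to [-1, 1)
        audio = torch.frombuffer(bytearray(frames), dtype=torch.int16).float() / 32768.0
        n = audio.numel()
        if n <= self.seq_len:
            # Files shorter than seq_len are zero-padded on the right.
            return torch.nn.functional.pad(audio, (0, self.seq_len - n))
        # Otherwise pick a random window of seq_len samples.
        start = torch.randint(0, n - self.seq_len + 1, (1,)).item()
        return audio[start:start + self.seq_len]
```

Wrapping an instance in `torch.utils.data.DataLoader` (with `shuffle=True` and a batch size of your choosing) then yields batches of shape `(batch, seq_len)`. Any quantisation or feature extraction the model expects would still need to be added on top of this.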

Author

gcunhase commented Sep 4, 2018

@fatchord Thank you for your reply. Would it be possible to condition it somehow?

Owner

fatchord commented Sep 7, 2018

@gcunhase Have a look at notebooks 4a and 4b - there you'll see how I do it.

hyzhan commented Sep 10, 2018

Are there any sample audio clips? I am trying to train on the LJSpeech dataset with seq_len=8000, batch=12, and learning_rate=1e-3 (warm up for 4k steps, then decay); my loss is around 1.7.
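
For readers unfamiliar with this kind of schedule, a warm-up-then-decay learning rate could be sketched roughly as below. The comment does not say which decay was used, so the inverse-square-root decay and the function name `lr_at_step` are assumptions made purely for illustration.

```python
def lr_at_step(step, peak_lr=1e-3, warmup_steps=4000):
    """Linear warm-up to peak_lr, then inverse-square-root decay (assumed shape)."""
    step = max(step, 1)  # avoid a zero learning rate at step 0
    if step <= warmup_steps:
        # Ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * step / warmup_steps
    # After warm-up, decay proportionally to 1 / sqrt(step).
    return peak_lr * (warmup_steps / step) ** 0.5
```

With these defaults the rate peaks at 1e-3 on step 4000 and has halved again by step 16000.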

@fatchord
Owner

@hyzhan What model are you training? By the way, seq_len should be around 1000 for more efficient training. Also make sure it's cleanly divisible by your spectrogram hop length.
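
One way to satisfy the divisibility constraint is to round the target length down to a multiple of the hop length. This is only a sketch: the helper name `align_seq_len` and the hop length of 256 are assumptions, so substitute the hop length your spectrograms were actually computed with.

```python
def align_seq_len(target_len, hop_length):
    """Round target_len down to the nearest positive multiple of hop_length."""
    if hop_length <= 0:
        raise ValueError("hop_length must be positive")
    return max(hop_length, (target_len // hop_length) * hop_length)


hop_length = 256  # assumed value; use your own spectrogram hop length
seq_len = align_seq_len(1000, hop_length)
assert seq_len % hop_length == 0
print(seq_len)  # 768, i.e. 3 spectrogram frames per training sequence
```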

hyzhan commented Oct 21, 2018

Thanks for the reply. I have successfully trained the alternative model and it sounds good, thank you very much. The strange thing is that the alternative model converges on a dataset of about 5 hours, but on a dataset of about 30 hours the training loss is often jittery, and the generated audio has loud noise.
[Image: 30hour_train_loss]
[Image: 30hour_train_loss2]
The two plots above are from the larger datasets; the one below is from the smaller one:
[Image: 5hour_train_loss]
Could you please give me some advice?

fatchord closed this as completed Dec 4, 2018