
Starting an unconditional generation experiment #5

Closed
ladium493 opened this issue Oct 12, 2020 · 7 comments

@ladium493

For unconditional generation, is it enough to change
`y = self.dilated_conv(y) + conditioner` in model.py
to
`y = self.dilated_conv(y)`?

And how to generate samples?

@sharvil
Contributor

sharvil commented Oct 12, 2020

I haven't experimented too much with unconditional generation yet. You'll have to make the code changes yourself if you want to play with it before I get around to making the changes.

Besides removing the conditioning network, you'll also need to increase the receptive field size so that it covers the entire utterance. Specifically:

  • increase number of layers to 36
  • increase dilation cycle length to 12
  • increase diffusion steps to 200
  • increase residual channels to 256
  • use a linearly spaced noise schedule covering [1e-4, 2e-2]
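The changes above can be sketched as a set of overrides for the AttrDict-style hyperparameters in params.py. This is only a sketch under the assumption that your checkout uses the field names below; verify them against your copy of params.py before applying.

```python
import numpy as np

# Hypothetical overrides for params.py implementing the list above.
# Field names assume the repo's AttrDict layout; double-check your checkout.
unconditional_overrides = dict(
    residual_layers=36,          # more layers -> larger receptive field
    dilation_cycle_length=12,    # dilations cycle through 1, 2, ..., 2^11
    residual_channels=256,       # more model capacity
    # 200 diffusion steps, linearly spaced noise schedule over [1e-4, 2e-2]
    noise_schedule=np.linspace(1e-4, 2e-2, 200).tolist(),
)
```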

All of the changes I listed can be made in params.py and are tuned to the Speech Commands 0-9 dataset. Generating samples is pretty straightforward: in inference.py you'll just drop the spectrogram argument and pass in None to the model.
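To make "pass in None" concrete, here is a minimal NumPy sketch of the standard DDPM reverse loop with the spectrogram conditioner replaced by `None`. `model_fn` is a stand-in for the trained network's forward pass (signature assumed to be `model(audio, step, spectrogram)`); this is an illustration of the sampling recipe, not the repo's actual inference.py.

```python
import numpy as np

def sample_unconditional(model_fn, noise_schedule, length, rng=np.random):
    """Reverse diffusion starting from pure noise, with no conditioner."""
    beta = np.asarray(noise_schedule)
    alpha = 1.0 - beta
    alpha_cum = np.cumprod(alpha)

    audio = rng.randn(length)                   # start from Gaussian noise
    for t in range(len(beta) - 1, -1, -1):
        eps = model_fn(audio, t, None)          # spectrogram argument is None
        c1 = 1.0 / np.sqrt(alpha[t])
        c2 = beta[t] / np.sqrt(1.0 - alpha_cum[t])
        audio = c1 * (audio - c2 * eps)         # predicted denoised sample
        if t > 0:                               # add noise except at the last step
            sigma = np.sqrt((1.0 - alpha_cum[t - 1]) / (1.0 - alpha_cum[t]) * beta[t])
            audio += sigma * rng.randn(length)
    return audio
```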

Let me know if you have more questions. I'd love to hear how your experiment goes!

@ladium493
Author

ladium493 commented Oct 15, 2020

Finally got some results instead of just noise. I used a smaller dataset for training (up, down, left, right). T=20 results in some recognizable voices. Thanks for the help!

Also, what about padding the short samples instead of deleting them? If every sample in a minibatch is deleted, the training process terminates, which is annoying.
https://github.com/lmnt-com/diffwave/blob/master/src/diffwave/dataset.py#L56
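The padding idea can be sketched like this: instead of dropping clips shorter than the training crop (as the linked Collator code does), zero-pad them up to the crop length. `crop_len` stands in for the crop size the repo computes from its parameters; this is a hedged alternative, not the repo's actual code.

```python
import numpy as np

def pad_or_crop(audio, crop_len, rng=np.random):
    """Return a clip of exactly crop_len samples.

    Short clips are zero-padded at the end rather than discarded;
    long clips get a random crop, as in the original Collator.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if len(audio) < crop_len:
        return np.pad(audio, (0, crop_len - len(audio)))  # trailing zeros
    start = rng.randint(0, len(audio) - crop_len + 1)
    return audio[start:start + crop_len]
```

One design note: zero-padding keeps every record in the batch, at the cost of spending compute on silent samples, which is the efficiency trade-off mentioned in the reply below.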

@sharvil
Contributor

sharvil commented Oct 16, 2020

Great! Glad to hear you're getting something that resembles speech.

Padding is a valid way to handle short samples, though it is less computationally efficient. Another option, especially if you have a lot of short samples (which it sounds like you do), is to reduce the number of frames to train on (also in params.py).

@ladium493
Author

Yes, that works. My problems are solved. Thanks for your help!


@moiseshorta

Hi,

I'm very curious about how to implement unconditional generation for my experiments as well.

Do you have any code modifications showing how to do this?

Thanks so much!

@sharvil
Contributor

sharvil commented Oct 26, 2020

@moiseshorta, have you tried making the changes I described in #5 (comment)? That should be a good starting point for your experiments.
