About generation and input&output types #93
For the second question: it's just sampling from the categorical distribution conditioned on the previously generated samples. As for the first question, you should check that your incremental generation is correct; it can also happen if your dataset contains a lot of silence, in which case the model may overfit to the silent regions and end up generating only silence.
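A common mitigation for the silence problem is trimming silent regions before training. A minimal sketch of frame-level energy trimming is below; in practice you would more likely use `librosa.effects.trim` with a dB threshold, and the `threshold` and `frame_len` values here are arbitrary illustrations:

```python
import numpy as np

def trim_silence(wav, threshold=0.01, frame_len=256):
    """Drop leading and trailing frames whose peak amplitude is
    below `threshold`. Illustrative only; real pipelines usually
    trim on a dB scale."""
    n_frames = len(wav) // frame_len
    frames = wav[:n_frames * frame_len].reshape(n_frames, frame_len)
    # A frame is "active" if any sample in it exceeds the threshold.
    active = np.abs(frames).max(axis=1) > threshold
    if not active.any():
        return wav[:0]  # everything is silence
    first = np.argmax(active)                     # first active frame
    last = n_frames - np.argmax(active[::-1]) - 1  # last active frame
    return wav[first * frame_len:(last + 1) * frame_len]
```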
For the second question: yeah, I did not get why you choose a random sample instead of the one with the maximum value, which has the highest probability. For the first question: even though I do standard silence trimming on the LJSpeech dataset, I should check it again; otherwise I probably have an implementation error.
We want to sample from the generative model's output distribution; always choosing the most probable value (argmax) makes more sense in classification tasks. See https://deepmind.com/blog/wavenet-generative-model-raw-audio/
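The difference between the two decoding strategies can be sketched with a toy distribution (a 4-class stand-in for the model's 256-way softmax; the class count and probabilities are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy categorical distribution over 4 classes (stand-in for the
# model's softmax output over quantized amplitude levels).
p = np.array([0.1, 0.6, 0.2, 0.1])

# Greedy decoding: always picks the mode, so generation is
# deterministic and can easily collapse to a constant output.
greedy = int(np.argmax(p))

# Ancestral sampling: draws an index according to p, so lower-
# probability classes are still chosen proportionally often.
samples = rng.choice(len(p), size=1000, p=p)
```

Greedy decoding always returns class 1 here, while sampling picks class 1 only about 60% of the time, which is what keeps the generated waveform from freezing on a single value.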
Oh, I see, you are randomly sampling from the distribution, not directly taking the value with the maximum probability. Thanks!
Hey, thank you for your code; it is very helpful for understanding how the model works. I am implementing WaveNet from scratch and have some questions:
I give scalar inputs to the model and quantized targets for the cross-entropy loss. During generation, I mu-law decode the output into a scalar value in [-1, 1], append it to the generated audio, and feed that back into the model for the next sample. This works when I overfit a small dataset and then generate, and when I train on the whole dataset the loss decreases. However, when generating after ~700K steps without conditioning, the model only produces a constant value. What could be the problem?
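The mu-law round trip described above can be sketched as follows (a minimal NumPy version of the standard companding formulas; the function names are illustrative, not the repo's actual helpers):

```python
import numpy as np

MU = 255  # 8-bit mu-law, i.e. 256 quantization channels

def mulaw_encode(x, mu=MU):
    """Compand x in [-1, 1] and quantize to integer classes 0..mu."""
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def mulaw_decode(q, mu=MU):
    """Map integer classes 0..mu back to floats in [-1, 1]."""
    y = 2 * (q.astype(np.float64) / mu) - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu
```

Because the companding is logarithmic, small amplitudes (where speech spends most of its energy) get finer quantization steps, so the encode/decode round trip has low error near zero and somewhat larger error near the extremes.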
In your code, I see this part in incremental generation:
```python
x = F.softmax(x.view(B, -1), dim=1) if softmax else x.view(B, -1)
if quantize:
    sample = np.random.choice(
        np.arange(self.out_channels),
        p=x.view(-1).data.cpu().numpy())
```
I am not sure why you randomly choose a value here.