
Replicating Generating Sequences by Alex Graves Handwriting Section #1608

Closed

dragon271828 opened this issue Feb 1, 2016 · 6 comments

@dragon271828

I'm trying to replicate Alex Graves' paper: http://arxiv.org/pdf/1308.0850v5.pdf

For the handwriting-generation part, I'm having trouble defining the objective function as a function of y_true and y_pred. In the paper, y_true is a 3-tuple, and y_pred takes the form (e, {w_i, mu_i, sigma_i, rho_i}), where w_i, mu_i, sigma_i, and rho_i parameterize the Gaussian mixture and e is the probability of whether the pen is down or not.

First off, y_true and y_pred have different dimensions; is that allowed?

Secondly, the different elements of y_pred must be treated individually in the custom loss function. For instance, say there are two Gaussians in the mixture, so the dimension of y_pred is 9. The loss function then has to use each of these components and do something different with each of them, as shown on page 20 of the reference paper above:

e = y_pred[0]        # pen probability
w1 = y_pred[1]       # weight of mixture component 1
mu1 = y_pred[2]      # mean of component 1
sigma1 = y_pred[3]   # standard deviation of component 1
rho1 = y_pred[4]     # correlation of component 1
w2 = y_pred[5]       # weight of mixture component 2
mu2 = y_pred[6]      # mean of component 2
sigma2 = y_pred[7]   # standard deviation of component 2
rho2 = y_pred[8]     # correlation of component 2

Is splitting up the components of y_pred a permitted operation in a custom loss function? I've written an implementation of the function, but I seem to be getting NaNs for the loss. I'm not sure whether I'm doing something wrong or whether such operations are simply not supported in Keras or Theano.
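To make the question concrete, here is a minimal NumPy sketch (not Keras/Theano code, and all names are just illustrative) of the kind of loss I mean: split a packed y_pred like the one above and evaluate a two-component 1-D Gaussian mixture plus a Bernoulli pen term. The bivariate case in the paper would additionally use the rho correlation terms.

```python
import numpy as np

def mdn_loss(y_true, y_pred, eps=1e-8):
    """Negative log-likelihood for a 2-component 1-D Gaussian mixture
    plus a Bernoulli pen term. y_pred packs
    [e, w1, mu1, sigma1, rho1, w2, mu2, sigma2, rho2]; the rho slots
    are ignored because this 1-D sketch has no correlation term."""
    x, pen = y_true                 # observed value and pen state (0/1)
    e = y_pred[0]                   # pen probability
    w = y_pred[[1, 5]]              # mixture weights
    mu = y_pred[[2, 6]]             # component means
    sigma = y_pred[[3, 7]]          # component standard deviations

    # density of each Gaussian component at x
    dens = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    mix = np.sum(w * dens)          # weighted mixture density
    bern = e if pen == 1 else 1.0 - e
    return -np.log(mix + eps) - np.log(bern + eps)
```

The eps terms guard the logarithms against zero arguments, which is one common source of NaNs in this kind of loss.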

@rpinsler

rpinsler commented Feb 3, 2016

I've done something similar in #1061. Does that help?

@dragon271828

Thanks! This is incredibly helpful. I implemented a layer similar to yours; however, I'm now getting NaNs in the training loss after a couple of iterations. I noticed that you had the same issue. Would you mind sharing what kind of numerical optimization problems you ran into and how you solved them?

@rpinsler

rpinsler commented Feb 4, 2016

I can't remember exactly what the source of the problem was. I tried different things to avoid the numerical problems, e.g. a different optimizer, gradient clipping, and batch normalization. It's pretty stable now. Let me know if you can get it to work!
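For illustration, gradient clipping by global norm can be sketched in plain NumPy like this (Keras exposes the same idea through the optimizers' `clipnorm` and `clipvalue` arguments):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so that their joint L2 norm
    does not exceed max_norm; gradients already below the threshold
    pass through unchanged."""
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total <= max_norm:
        return grads
    return [g * (max_norm / total) for g in grads]
```

Clipping by global norm preserves the direction of the update, whereas per-element clipping (`clipvalue`) does not.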

@jrieke

jrieke commented Mar 20, 2016

I ran into the same issue; a few notes that might be helpful. I eventually fixed it by choosing a much smaller learning rate (with RMSprop; it's still quite prone to errors though, so don't expect too much). Gradient clipping had no effect for me (it only made things worse); I didn't try batch normalization. I also found that the parameter M (i.e. the number of Gaussian distributions per mixture) has quite an effect on the NaN issue, so try playing around with that as well.
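As a side note on M: in the paper's handwriting parameterization, the output layer has 6M + 1 units (M weights, 2M means, 2M standard deviations, M correlations, plus the pen bit), so changing M also changes the network's output size. A trivial helper makes that dependence explicit:

```python
def output_dim(M):
    """Output-layer size for an M-component bivariate Gaussian mixture:
    M weights + 2M means + 2M std-devs + M correlations + 1 pen bit."""
    return 6 * M + 1
```

For example, the M = 20 network in the paper has 121 output units.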

@el3ment

el3ment commented Apr 13, 2016

I'm also getting NaNs after a few hundred iterations.

This TensorFlow implementation didn't seem to have the same NaN issue: http://blog.otoro.net/2015/11/24/mixture-density-networks-with-tensorflow/

In trying to debug the objective function in @rpinsler's implementation, I can't see what is causing the NaNs, unless one of the sigmas strays too close to zero and causes a divide-by-zero, or perhaps the "sum" in return (alpha/sigma) * T.exp(-T.sum(T.sqr(mu-y_true),-1)/(2*sigma**2)) is misplaced. Did either of you solve the problem?
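For comparison, here's a NumPy sketch of a similar isotropic-mixture density evaluated in log space, with a sigma floor and the log-sum-exp trick; this avoids both the divide-by-zero and the underflow of the exp(...) term to 0 before the log is taken. Note that, unlike the snippet above, it includes the full normalization constant (sigma**D and the 2*pi factor), so it is not a drop-in replacement:

```python
import numpy as np

def log_mixture_density(y_true, alpha, mu, sigma, eps=1e-6):
    """Log-density of an isotropic Gaussian mixture, computed entirely
    in log space. alpha: (M,) weights, mu: (M, D) means,
    sigma: (M,) standard deviations."""
    sigma = np.maximum(sigma, eps)            # floor sigma away from zero
    sq = np.sum((mu - y_true) ** 2, axis=-1)  # squared distance per component
    D = mu.shape[-1]
    log_comp = (np.log(alpha + eps)
                - D * np.log(sigma)
                - 0.5 * D * np.log(2 * np.pi)
                - sq / (2 * sigma ** 2))
    m = np.max(log_comp)                      # log-sum-exp for stability
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

Training on the negative of this log-density, rather than taking the log of a possibly-underflowed density, is a standard way to keep mixture-density losses finite.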

@stale

stale bot commented May 23, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@stale stale bot closed this as completed Jun 23, 2017