Replicating Generating Sequences by Alex Graves Handwriting Section #1608
I've done something similar in #1061. Does that help?
Thanks! This is incredibly helpful. I implemented a layer similar to yours; however, I'm now getting NaNs in the training loss after a couple of iterations. I noticed that you had the same issue. Would you mind sharing what kind of numerical problems you ran into and how you solved them?
I can't remember exactly what the source of the problem was. I tried different things to avoid those numerical problems, e.g. another optimizer, gradient clipping, and batch normalization. Now it is pretty stable. Let me know if you can get it to work!
I ran into the same issue; a few notes that might be helpful. I eventually fixed it by choosing a much smaller learning rate (with RMSprop; it's still quite prone to errors, though, so don't expect too much). Gradient clipping had no effect for me (it only made things worse), and I didn't try batch normalization. I also found that the parameter …
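For anyone trying the same recipe, here is a minimal sketch of combining a small learning rate with gradient clipping in Keras. The values are purely illustrative, and `model` and `mdn_loss` are assumed to be defined elsewhere:

```python
from keras.optimizers import RMSprop

# Illustrative settings only: a conservative learning rate plus gradient
# clipping by global norm; both need tuning for your own data.
optimizer = RMSprop(lr=1e-4, clipnorm=1.0)
model.compile(loss=mdn_loss, optimizer=optimizer)
```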
I'm also getting NaNs after a few hundred iterations. This TensorFlow implementation, http://blog.otoro.net/2015/11/24/mixture-density-networks-with-tensorflow/, didn't seem to have the same NaN issue. In trying to debug the objective function in @rpinsler's implementation, I can't see what is causing the NaN, unless one of the sigmas strays too close to zero and causes a divide by zero, or perhaps the "sum" in …
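One common guard against that particular failure mode (not necessarily what either implementation does) is to keep sigma strictly positive and bounded away from zero before it enters the density, for example:

```python
from keras import backend as K

SIGMA_FLOOR = 1e-6  # illustrative lower bound, not a value from this thread

def positive_sigma(sigma_raw):
    # Exponentiating keeps sigma positive (as in Graves' parameterisation);
    # the small floor keeps a near-zero sigma from producing a divide by zero
    # or an infinite log-density.
    return K.exp(sigma_raw) + SIGMA_FLOOR
```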
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I'm trying to replicate Alex Graves' paper: http://arxiv.org/pdf/1308.0850v5.pdf
For the handwriting-generation part, I'm having trouble defining the objective function as a function of y_true and y_pred. In the paper, y_true takes the form of a 3-tuple, and y_pred takes the form (e, {w_i, mu_i, sigma_i, rho_i}), where w_i, mu_i, sigma_i, and rho_i parameterize the Gaussian mixture and e is the probability of the pen being lifted.
First off, y_true and y_pred have different dimensions; is that allowed?
Secondly, the different elements of y_pred must be treated individually in the custom loss function. For instance, say there are two Gaussians in the mixture, so that the dimension of y_pred is 9. The loss function then uses all of these individual components and does something different with each of them, as shown on page 20 of the reference paper above:
# y_pred has a leading batch axis inside a Keras loss, hence the [:, i] slicing
e = y_pred[:, 0]       # end-of-stroke (pen lift) probability
w1 = y_pred[:, 1]      # mixture weight, component 1
mu1 = y_pred[:, 2]     # mean, component 1
sigma1 = y_pred[:, 3]  # standard deviation, component 1
rho1 = y_pred[:, 4]    # correlation, component 1
w2 = y_pred[:, 5]      # mixture weight, component 2
mu2 = y_pred[:, 6]     # mean, component 2
sigma2 = y_pred[:, 7]  # standard deviation, component 2
rho2 = y_pred[:, 8]    # correlation, component 2
Is splitting up the components of y_pred a permitted operation in the custom loss function? I've written an implementation of the function but seem to be getting NaNs for the loss. I'm not sure whether I am doing something wrong or whether such operations are simply not allowed in Keras or Theano.
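For what it's worth, slicing y_pred inside a custom loss is fine: Keras only requires the loss to return a tensor built from y_true and y_pred using backend operations. Below is a rough, hypothetical sketch of such a loss for the two-component case above, simplified to a 1-D mixture (so the rho correlation terms from the paper are dropped) and written against the Keras backend API. The function name, the epsilon value, and the assumed y_true layout (offset, end-of-stroke flag) are placeholders, not anything from this thread:

```python
import numpy as np
from keras import backend as K

def mdn_loss_sketch(y_true, y_pred):
    # Assumed layouts (placeholders, matching the listing above):
    #   y_true[:, 0] = target offset, y_true[:, 1] = end-of-stroke flag
    #   y_pred[:, 0] = e logit, then (w, mu, sigma, rho) per component
    eps = 1e-8
    x, eos = y_true[:, 0], y_true[:, 1]

    # End-of-stroke probability, squashed into (0, 1).
    e = K.sigmoid(y_pred[:, 0])

    # Mixture weights: softmax over the two weight logits.
    w = K.softmax(K.concatenate([y_pred[:, 1:2], y_pred[:, 5:6]], axis=1))

    # Positive standard deviations, floored to avoid division by zero.
    mu1, sigma1 = y_pred[:, 2], K.exp(y_pred[:, 3]) + eps
    mu2, sigma2 = y_pred[:, 6], K.exp(y_pred[:, 7]) + eps

    def log_gauss(t, mu, sigma):
        # log N(t | mu, sigma) for a 1-D Gaussian.
        return (-K.log(sigma) - 0.5 * np.log(2 * np.pi)
                - 0.5 * K.square((t - mu) / sigma))

    # Combine the two components in log-space (log-sum-exp) for stability.
    log_c1 = K.log(w[:, 0] + eps) + log_gauss(x, mu1, sigma1)
    log_c2 = K.log(w[:, 1] + eps) + log_gauss(x, mu2, sigma2)
    m = K.maximum(log_c1, log_c2)
    log_mixture = m + K.log(K.exp(log_c1 - m) + K.exp(log_c2 - m))

    # Bernoulli log-likelihood for the end-of-stroke bit.
    log_eos = eos * K.log(e + eps) + (1 - eos) * K.log(1 - e + eps)

    return -K.mean(log_mixture + log_eos)
```

The numerical points that matter most here are that the component densities are combined in log-space via log-sum-exp and that the sigmas are floored; skipping either is a common way to end up with the NaNs discussed above.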