Momentum bug #4

Closed
mdenil opened this issue Oct 5, 2013 · 0 comments
mdenil commented Oct 5, 2013

Reported by Ian Goodfellow:

Hi Misha,
I think I found a bug in the momentum for your dropout demo. This came
up when someone suggested adding some code that was partially copied
from your demo to pylearn2.

The bug is with these lines:

for gparam_mom, gparam in zip(gparams_mom, gparams):
    updates[gparam_mom] = mom * gparam_mom + (1. - mom) * gparam

# ... and take a step along that direction
for param, gparam_mom in zip(classifier.params, gparams_mom):
    stepped_param = param - (1.-mom) * learning_rate * gparam_mom

There are two things I think are wrong here:

  1. When you update stepped_param, you want to use updates[gparam_mom]
    and not gparam_mom. gparam_mom is one time step too old. Only
    updates[gparam_mom] has been updated with the current gradient.
    gparam_mom won't contain the updated value until after the theano
    function finishes executing. (At first I thought you were doing
    Nesterov momentum, but that would need the gradient from t+1, not t-1)
  2. If you expand the recurrence for stepped_param, it doesn't match
    the formula in appendix A1 of Geoff's paper. It ends up multiplying by
    (1-mom)^2 instead of (1-mom). This probably makes you need a bigger
    learning rate to compensate, since 1-mom will be a small number.

Hope that helps,
Ian
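
The two points in the report can be checked numerically on a single scalar parameter. The sketch below is plain Python rather than Theano, and every name in it (`step_as_reported`, `step_fresh_velocity`, `run`) is hypothetical. `step_as_reported` mirrors the quoted lines. `step_fresh_velocity` is only one assumed way to address both points, by applying (1 - mom) once and stepping with the velocity just computed. It is not necessarily what commit 7311aea does.

```python
def step_as_reported(param, vel, grad, mom, lr):
    # Mirrors the quoted lines: the parameter step reads the velocity from
    # the previous call, and (1 - mom) is applied a second time.
    new_vel = mom * vel + (1.0 - mom) * grad
    new_param = param - (1.0 - mom) * lr * vel
    return new_param, new_vel


def step_fresh_velocity(param, vel, grad, mom, lr):
    # Assumed alternative: fold lr into the velocity, apply (1 - mom) once,
    # and step with the velocity that was just computed.
    new_vel = mom * vel - (1.0 - mom) * lr * grad
    new_param = param + new_vel
    return new_param, new_vel


def run(step, n_steps, grad=1.0, mom=0.9, lr=1.0):
    # Apply a constant gradient and record the parameter after every step.
    param, vel = 0.0, 0.0
    history = []
    for _ in range(n_steps):
        param, vel = step(param, vel, grad, mom, lr)
        history.append(param)
    return history


reported = run(step_as_reported, 200)
fresh = run(step_fresh_velocity, 200)

# First step: the reported update has not seen the gradient yet.
print(reported[0], fresh[0])
# Late steps: the per-step movement differs by roughly a factor of (1 - mom).
print(reported[-1] - reported[-2], fresh[-1] - fresh[-2])
```

With a constant gradient, the first printed pair illustrates point 1 (the reported update does not move on the first call), and the second pair illustrates point 2 (the extra (1 - mom) factor shrinks the effective learning rate).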

@mdenil mdenil closed this as completed in 7311aea Oct 15, 2013