Search direction #136
Conversation
updates = {}
direction = {}
for param, grad in gradients.iteritems():
    sqr_grad_sum = theano.shared(numpy.zeros_like(param.get_value()))
Use pylearn2.utils.sharedX so that this will respect theano.config.floatX. numpy.zeros_like will default to float64, so people who expect their shared variables to go on GPU will be disappointed.
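A minimal numpy-only sketch of the dtype pitfall described above. The `floatX` value is hard-coded here as an assumption; pylearn2's `sharedX` performs the equivalent cast internally using `theano.config.floatX`:

```python
import numpy

# numpy.zeros_like inherits the input's dtype; numpy arrays built from
# plain Python floats default to float64, so the shared variable would too.
param_value = numpy.zeros((3,))            # float64 by default
bad_init = numpy.zeros_like(param_value)   # stays float64

# sharedX-style fix: cast to the configured float type. 'float32' here is
# a stand-in for theano.config.floatX, which GPU users typically set.
floatX = 'float32'
good_init = numpy.asarray(numpy.zeros_like(param_value), dtype=floatX)
```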
I've committed some of the changes proposed and commented on the others. Thanks for the feedback!
I think there should not be an LRDecay class. When I said before that you might want to use LRDecay and Momentum at the same time, I didn't mean to hack LRDecay into the Momentum class. I meant that having learning rate decay implemented as a SearchDirection is a problem because you either have to define an interface for combining two search directions, or you have to hack learning rate decay into every SearchDirection that you might want to use with it. The existing interface makes it pretty easy to use arbitrary learning rate schedules. If we merge this PR then we're stuck with one decay rate formula, and if we want to make that more flexible, we need to implement the flexibility in every SearchDirection class that we want to use with learning rate decay.
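The point about arbitrary schedules can be illustrated with a toy callback in the style of pylearn2's train extensions. The names here (`on_monitor`, a plain `learning_rate` attribute) are simplified stand-ins for illustration, not the actual pylearn2 API:

```python
class OneOverTDecay(object):
    """Toy 1/t learning rate schedule applied from outside any
    SearchDirection; the search direction never needs to know about it."""
    def __init__(self, base_lr):
        self.base_lr = base_lr
        self.t = 0

    def on_monitor(self, algorithm):
        # called once per epoch; rescales the algorithm's learning rate
        self.t += 1
        algorithm.learning_rate = self.base_lr / self.t

class AlgorithmStub(object):
    """Stand-in for the training algorithm object."""
    learning_rate = 0.0

algo = AlgorithmStub()
decay = OneOverTDecay(base_lr=1.0)
for _ in range(4):
    decay.on_monitor(algo)
# after 4 epochs the learning rate is base_lr / 4
```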
I've made the changes. Concerning the implementation of the Momentum class, have you seen my comment?
Being able to chain different SearchDirections would make sense to me: we could simply pass the
Which comment about the Momentum class are you referring to? I don't think I've seen it. Which clone method are you talking about? Chaining SearchDirections involves some subtleties and I find it a bit hard to think through just off the top of my head. Under the original interface we discussed, a SearchDirection can safely assume that the optimization algorithm will move in the direction that it returns, and it can assume that the direction it is passed is the gradient. Chaining SearchDirections together breaks both assumptions. Have you thought through what the consequences of that would be for a few different use cases?
 if self.momentum is None:
     updates.update(dict(safe_zip(params, [param - learning_rate * \
-                lr_scalers.get(param, 1.) * grads[param]
+                lr_scalers.get(param, 1.) * direction[param]
                 for param in params])))
 else:
     for param in params:
Also, it looks like you didn't modify the case where there is momentum.
@goodfeli I was trying to define a simple "interface for combining two search directions". Chaining them is my proposal.
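A sketch of what such chaining could look like, assuming a single `get_updates`-style method (a hypothetical name, not from the PR). Note that each stage sees the previous stage's output rather than the raw gradient, which is exactly the broken assumption discussed above:

```python
class ScaleDirection(object):
    """Toy search direction: rescale every gradient entry."""
    def __init__(self, scale):
        self.scale = scale

    def get_updates(self, gradients):  # hypothetical interface method
        return {p: g * self.scale for p, g in gradients.items()}

class ChainedDirection(object):
    """Apply a list of search directions in order; each stage receives
    the previous stage's output instead of the true gradient."""
    def __init__(self, directions):
        self.directions = directions

    def get_updates(self, gradients):
        for d in self.directions:
            gradients = d.get_updates(gradients)
        return gradients

chain = ChainedDirection([ScaleDirection(0.5), ScaleDirection(0.1)])
out = chain.get_updates({'w': 1.0})  # 1.0 -> 0.5 -> 0.05
```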
Here's what I wrote about the Momentum subclass: I must have figured it out wrong. The equations I had were
(Once I fix the Momentum subclass, would it be better to remove the momentum option from the SGD class entirely?)
I've gotten fairly busy preparing for the AISTATS deadline next week. Do you mind if I wait to finish reviewing this until then?
No problem!
I've rebased my commit and finished my implementation of momentum. I've added lots of comments to justify the counter-intuitive way it is implemented, so hopefully everything will be clear for those who use it. I also removed the old way momentum was implemented.
@@ -551,6 +536,7 @@ def __call__(self, algorithm):
         algorithm.learning_rate.set_value(new_lr)


+# TODO: decide whether this class is still relevant with SearchDirection
Yes it is. This is what allows you to change the momentum over time, which is very important for getting good results. But it would have to be updated to point to the new momentum coefficient.
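For illustration, a momentum schedule in this adjustor style might look like the following sketch. The class and attribute names here are hypothetical stand-ins, not pylearn2's actual adjustor:

```python
class LinearMomentumRamp(object):
    """Hypothetical adjustor: linearly ramp the momentum coefficient
    from `start` to `final` over `n_epochs` epochs, then hold it."""
    def __init__(self, start, final, n_epochs):
        self.start = start
        self.final = final
        self.n_epochs = n_epochs
        self.epoch = 0

    def on_monitor(self, algorithm):
        self.epoch += 1
        frac = min(1.0, self.epoch / float(self.n_epochs))
        algorithm.momentum = self.start + frac * (self.final - self.start)

class AlgorithmStub(object):
    """Stand-in for the training algorithm object."""
    momentum = 0.0

algo = AlgorithmStub()
ramp = LinearMomentumRamp(start=0.5, final=0.9, n_epochs=4)
for _ in range(6):
    ramp.on_monitor(algo)  # saturates at `final` after n_epochs
```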
I'm still not convinced momentum should be a SearchDirection, for the following reasons:
Nicolas Boulanger may have something to say about point 6. He has
-- Yoshua
On Wed, Nov 14, 2012 at 11:29 AM, goodfeli notifications@github.com wrote:
Yes, I found that the following change of variable simplifies the implementation of Nesterov momentum. It is in the same form as regular momentum in the sense that both velocity and parameter updates depend only on the gradient at the current value of the parameters. In short:

regular momentum:
(1) v_t = mu * v_t-1 - lr * gradient_f(p_t-1)
(2) p_t = p_t-1 + v_t

Nesterov momentum:
(3) v_t = mu * v_t-1 - lr * gradient_f(p_t-1 + mu * v_t-1)
(4) p_t = p_t-1 + v_t

alternate formulation for Nesterov momentum, using the change of variable
(5) p'_t = p_t + mu * v_t
so that the gradient in (3) is evaluated at p'_t-1:
(6) v_t = mu * v_t-1 - lr * gradient_f(p'_t-1)
(7) p'_t = p'_t-1 + mu * v_t - lr * gradient_f(p'_t-1)
(8)      = p'_t-1 + mu * mu * v_t-1 - (1 + mu) * lr * gradient_f(p'_t-1)

Since (6) has the same form as (1), with Theano you can use (1) then either (2) or (7)/(8) to have both options.
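As a sanity check (not part of the original comment), the two formulations can be compared numerically on f(p) = 0.5 * p**2, whose gradient is simply p. The shifted parameter tracked by the alternate formulation is p'_t = p_t + mu * v_t:

```python
def grad(p):
    return p  # gradient of f(p) = 0.5 * p**2

lr, mu = 0.1, 0.9

# classical Nesterov momentum: gradient at the lookahead point p + mu * v,
# then step by the velocity
p, v = 1.0, 0.0
for _ in range(20):
    v = mu * v - lr * grad(p + mu * v)
    p = p + v

# alternate formulation: regular-momentum-style velocity update plus the
# modified parameter update, applied to shifted parameters P = p + mu * v
P, w = 1.0, 0.0  # P_0 = p_0 + mu * v_0 = p_0 because v_0 = 0
for _ in range(20):
    g = grad(P)
    w = mu * w - lr * g          # same form as regular momentum
    P = P + mu * w - lr * g      # modified parameter update

# the trajectories agree up to floating point noise
diff = abs(P - (p + mu * v))
```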
Thanks @boulanni. I will add this to the repo as a note for someone to implement.
…od anymore, also replaced theano.shared with sharedX
…ccept a shared variable as a learning rate
I implemented Nicolas's reformulation of Nesterov momentum; hopefully everything is all right!
@@ -54,6 +54,8 @@ class TransformerIterator(object):
     def __init__(self, raw_iterator, transformer_dataset):
         self.raw_iterator = raw_iterator
         self.transformer_dataset = transformer_dataset
+        self.stochastic = raw_iterator.stochastic
+        self.uneven = raw_iterator.uneven
Looks good. Is there a unit test we should add to make sure this does not break again?
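A possible regression test, sketched with a stub in place of a real pylearn2 iterator; the `TransformerIterator` body mirrors the diff above, and the stub names are hypothetical:

```python
class RawIteratorStub(object):
    """Stand-in for a real dataset iterator exposing the two flags."""
    stochastic = True
    uneven = False

class TransformerIterator(object):
    """Simplified copy of the patched class from the diff above."""
    def __init__(self, raw_iterator, transformer_dataset):
        self.raw_iterator = raw_iterator
        self.transformer_dataset = transformer_dataset
        self.stochastic = raw_iterator.stochastic
        self.uneven = raw_iterator.uneven

def test_forwards_iterator_flags():
    # the wrapper must mirror the raw iterator's flags, not hide them
    it = TransformerIterator(RawIteratorStub(), transformer_dataset=None)
    assert it.stochastic is True
    assert it.uneven is False

test_forwards_iterator_flags()
```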
It seems like the last rebase I did messed up BIG TIME: the momentum implementation completely disappeared from the last commit. That would explain the strange disappearance of docstring comments. Sorry about that; I'll have this fixed soon.
OK, @dwf is probably the most knowledgeable person about git in the lab. He can probably help you recover the stuff that disappeared. I know how to do a rebase, but I don't know how to recover from a rebase gone wrong.
@vdumoulin is planning to redo this in several small pull requests rather than one big rebase, so I am closing the PR. |
I've completed adding the SearchDirection class to Pylearn2 and I tested it. It seems to work as I intended.
I also added some subclasses (adagrad and momentum, or at least momentum in the way I understand it) to test it a little.
If everything is okay with my pull request, I think there would be some cleanup to do in the SGD class, as some of the hacks added over time might be replaced by a SearchDirection subclass.