Learning rate multipliers for convolutional and dense layers #3004
Conversation
…nd pass them to optimizers
Has this been implemented?
Looking at this now. Two things come to mind:
Also, avoid unnecessary abbreviations that make code harder to read, such as "lr_mult".
…to follow chollet comments
I have updated the pull request to follow the suggestions. Learning rate multipliers are now a dictionary mapping weights to coefficients, similar to constraints. It makes more sense and leaves the optimizer code cleaner. Some functionality has been moved to the Layer base class, but some changes are still needed in each layer class (following the constraints implementation, the dictionary of weights to coefficients has to be built in the child class). Is there a better way to move more code to the base Layer class? Furthermore, I have temporarily removed TensorFlow from some of the test code, because a manual random seed is needed and TensorFlow ignores np.random.seed(). Is there a random seed setter for the TensorFlow backend?
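For illustration, here is a minimal sketch of how the proposed API could look from the user side, assuming the W_/b_learning_rate_multiplier keyword names that appear in the get_config diff below (the final argument names and defaults may differ):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Hypothetical keyword arguments: train this layer's weights at a tenth of the
# global learning rate and its biases at half of it.
model.add(Dense(64, input_dim=100,
                W_learning_rate_multiplier=0.1,
                b_learning_rate_multiplier=0.5))
model.add(Dense(10, activation='softmax'))

# Per the description above, each layer contributes a {weight: coefficient}
# dictionary that the optimizer consults when building its updates.
model.compile(optimizer='sgd', loss='categorical_crossentropy')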
This feature seems to be ready (there are now conflicts, as it was done 1 month ago).
@@ -178,7 +197,9 @@ def get_config(self):
                   'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
                   'bias': self.bias,
                   'input_dim': self.input_dim,
-                  'input_length': self.input_length}
+                  'input_length': self.input_length,
+                  'W_learning_rate_multiplier': self.W_learning_rate_multiplier if self.W_learning_rate_multiplier else None,
Hi Javier, Avanti here. I'm merging your adaptation of our code back into the kundajelab fork - thanks again for doing this. I had a minor thought: is the if/else in this line really necessary, since self.W_learning_rate_multiplier is either None or a value (i.e. wouldn't it be equivalent to "'W_learning_rate_multiplier': self.W_learning_rate_multiplier")?
Hi Avanti,
I wanted to follow the same structure as constraints, and that's why I added the if/else. However, you might be right; it doesn't seem to be needed.
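A quick way to see the point being discussed (just a sketch, not code from the PR): the two forms agree when the attribute is None or a non-zero value, but a 0.0 multiplier, being falsy, would be silently turned into None by the if/else.

for value in (None, 0.5, 2.0, 0.0):
    with_ifelse = value if value else None   # the form currently in get_config
    bare = value                             # the simplification suggested above
    print(value, with_ifelse, bare, with_ifelse == bare)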
v = self.momentum * m - lr * g  # velocity
# Apply learning rate multipliers if needed
if p in multipliers:
    lrm = K.variable(multipliers[p])
Avanti here again (author of the learning rate multipliers implementation on the kundajelab branch that this was based on). Is the K.variable wrapping really necessary? K.variable returns a shared variable, which I understand is only necessary for trainable parameters.
Since lr is a K.variable, I thought it was safer to multiply lrm by lr with both being K.variables... I wasn't sure whether Keras would really be happy with a plain number times a K.variable.
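For reference, a small check of that concern, assuming the keras.backend API (K.variable, K.eval): multiplying a backend variable by a plain Python float works directly, so wrapping the multiplier in K.variable() is optional and mainly a matter of caution.

from keras import backend as K

lr = K.variable(0.01)        # the optimizer's learning rate, a backend variable
multiplier = 0.1             # a plain float taken from the multipliers dict
scaled_lr = lr * multiplier  # broadcasts without an extra K.variable() wrapper
print(K.eval(scaled_lr))     # ~0.001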
Hi. Sorry to ask/disturb... but is there any idea when this pull request will be reviewed for merging?
I too have been using this code with success - it is pretty important for imitating a lot of the other reference models out there. I think it would be a great addition to the main branch.
Any update?
+1 :)
I really need this feature, so I merged this pull-request branch with the latest Keras commit to date (Feb 7, 2017) and resolved the conflicts manually. I made a fork of Keras with my changes in the branch keras-lrmult-implementation. Running the tests locally I got some failures, but I got the same failures with the untouched Keras master branch.
+1 :)
up
Closing outdated PR. If you still care about the content of the PR, please submit a new PR to
+1 :)
Any update on this, or perhaps another way (using the current version of Keras) where I'd be able to do the same thing?
I've shared a design review doc before making a new PR for the 2.0 API: https://docs.google.com/document/d/1l4k811Mxz1fIIzyw7-nOVMLkBN6a7bRcmW6EuIW0cc8/edit# stay tuned :)
If you have comments or references on why this has been proven to be useful, please comment on the Google doc!
Has this PR been accepted, or something similar instead? |
Last time I checked (see the response to the proposal by @fchollet), there was no significant research showing this is beneficial. @meetshah1995 pointed to some work using LR multipliers, and the conversation has not moved from there.
Is it not accepted in keras 2.x? |
I also cannot find it. Can someone explain it, or maybe give an example of how to use it, if the layer-wise learning rate has been merged?
What is pending exactly for this to get merged? |
I don't think there are any plans to have this merged, because there is no PR compatible with Keras 2.X at the moment.
Hi! any news? |
@envytails I did my own version, which was enough for the implementation I was trying to achieve.
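Not the commenter's actual code, but one possible do-it-yourself approach for Keras 2.x, sketched under the assumption that scaling a weight's gradient inside a custom optimizer is an acceptable stand-in for scaling its learning rate (exact for plain SGD, approximate once momentum is involved):

from keras.optimizers import SGD

class ScaledSGD(SGD):
    """SGD that scales the gradients of selected weights, keyed by a substring of the weight name."""

    def __init__(self, multipliers=None, **kwargs):
        super(ScaledSGD, self).__init__(**kwargs)
        # e.g. {'dense_1': 0.1} scales every weight whose name contains 'dense_1'
        self.multipliers = multipliers or {}

    def get_gradients(self, loss, params):
        grads = super(ScaledSGD, self).get_gradients(loss, params)
        scaled = []
        for param, grad in zip(params, grads):
            factor = 1.0
            for key, mult in self.multipliers.items():
                if key in param.name:
                    factor = mult
            scaled.append(grad * factor)
        return scaled

# Usage:
# model.compile(optimizer=ScaledSGD(lr=0.01, multipliers={'dense_1': 0.1}),
#               loss='categorical_crossentropy')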
I have updated the pull request #1991 to the latest master branch. The pull request adds functionality to provide learning rate multipliers for convolutional and dense layers (see issue #414).