
Learning rate multipliers for convolutional and dense layers #3004

Closed

Conversation

@jrhupc commented Jun 17, 2016

I have updated the pull request #1991 to the latest master branch. The pull request adds functionality to provide learning rate multipliers for convolutional and dense layers (see issue #414).

@tetmin commented Jul 27, 2016

Has this been implemented?

@fchollet (Member)

Looking at this now. Two things come to mind:

  • multipliers should be handled on a per-layer basis, not on a per-weight basis, and should be abstracted into the Layer class. That will minimize the amount of changes to the codebase (you only need to add support in one place, not in every layer).
  • at the optimizer level, it should be handled like we handle constraints. That is to say, multipliers should be a dictionary mapping weights to coefficients.

Also, avoid unnecessary abbreviations that make code harder to read, such as "lr_mult".
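
To illustrate the dictionary-based design suggested above, here is a minimal, hypothetical sketch (Keras 1.x-era optimizer signature; the multipliers argument and its wiring are assumptions for illustration, not code from this PR):

# Hypothetical sketch: `multipliers` maps weight tensors to float coefficients,
# gathered from the layers at compile time, analogous to the constraints dict.
def get_updates(self, params, constraints, loss, multipliers=None):
    multipliers = multipliers or {}
    grads = self.get_gradients(loss, params)
    self.updates = []
    for p, g in zip(params, grads):
        # scale the base learning rate per weight; default multiplier is 1.0
        effective_lr = self.lr * multipliers.get(p, 1.0)
        new_p = p - effective_lr * g
        # apply the weight's constraint, if any (Keras 1.x convention)
        if p in constraints:
            new_p = constraints[p](new_p)
        self.updates.append((p, new_p))
    return self.updates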

@jrhupc (Author) commented Jul 30, 2016

I have updated the pull request to follow the suggestions. Now learning rate multipliers are a dictionary mapping weights to coefficients similar to constraints. It sure makes more sense and leaves the optimizer code cleaner.

Some functionality has been moved to the Layer base class, but some changes in each layer class are still needed (following the constraints implementation, the dictionary of weights to coefficients has to be created in the child class). Is there a better way to move more code into the base Layer class?

Furthermore, I have temporarily removed TensorFlow from some of the test code, as setting a manual random seed is needed and TensorFlow ignores np.random.seed(). Is there a random seed setter for the TensorFlow backend?
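
For reference (not part of this PR), the TensorFlow backend of that era kept its own graph-level seed separate from NumPy's, so tests typically had to seed both; a minimal sketch using the TF 1.x API:

import numpy as np
import tensorflow as tf

np.random.seed(1337)        # seeds NumPy, used by Keras weight initializers
tf.set_random_seed(1337)    # graph-level seed for the TensorFlow backend (TF 1.x API)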

@FlorianImagia commented Aug 23, 2016

This feature seems to be ready (there are now conflicts, as it was done a month ago).
Is there still something blocking it?

@@ -178,7 +197,9 @@ def get_config(self):
                  'b_constraint': self.b_constraint.get_config() if self.b_constraint else None,
                  'bias': self.bias,
                  'input_dim': self.input_dim,
-                 'input_length': self.input_length}
+                 'input_length': self.input_length,
+                 'W_learning_rate_multiplier': self.W_learning_rate_multiplier if self.W_learning_rate_multiplier else None,


Hi Javier, Avanti here. I'm merging your adaptation of our code back into the kundajelab fork - thanks again for doing this. I had a minor thought: is the ifelse in this line really necessary, since self.W_learning_rate_multiplier is either None or a value (i.e. wouldn't it be equivalent to "'W_learning_rate_multiplier': self.W_learning_rate_multiplier")?

@jrhupc (Author)

Hi Avanti,

I wanted to follow the same structure as constraints, and that's why I added the if-else. However, you might be right; it does seem unnecessary.

v = self.momentum * m - lr * g  # velocity
# Apply learning rate multipliers if needed
if p in multipliers:
    lrm = K.variable(multipliers[p])


Avanti here again (author of the learning rate multipliers implementation on the kundajelab branch that this was based on). Is the K.variable wrapping really necessary? K.variable returns a shared variable, which I understand is only necessary for trainable parameters.

@jrhupc (Author) commented Sep 16, 2016

As lr is a K.variable, I thought it was safer to multiply lrm by lr with both being K.variables... I wasn't sure whether Keras was really happy with a plain number times a K.variable.
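
As a small illustration of the point being discussed (assuming a Keras 1.x/2.x backend), tensors returned by the backend broadcast against plain Python floats, so the extra K.variable wrapping should not be needed:

from keras import backend as K

lr = K.variable(0.01)
g = K.ones((3,))
step = 0.1 * lr * g           # a plain float times a backend variable works fine
print(K.eval(step))           # -> [0.001 0.001 0.001]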

@igorbb commented Oct 12, 2016

Hi.

Sorry to ask/disturb, but is there any idea when this pull request will be reviewed for merging?

Haven't had time to look at it yet, sorry. But I will.

@dnola commented Oct 17, 2016

I too have been using this code with success; it is pretty important for reproducing a lot of the other reference models out there. I think it would be a great addition to the main branch.

@DingKe (Contributor) commented Nov 21, 2016

Any update?

@albertbuchard

+1 :)

4 similar comments
@nymph332088

+1 :)

@aurora95

+1 :)

@jmhessel (Contributor) commented Feb 4, 2017

+1 :)

@meetps commented Feb 5, 2017

+1 :)

@delchiaro commented Feb 9, 2017

I really need this feature, so I merged this pull-request branch with the latest Keras commit to date (commits of Feb 7, 2017) and resolved the conflicts manually.

I made a fork of Keras with my changes in the branch keras-lrmult-implementation.
I could open a new pull request, but it is probably more correct to push the changes to this pull request (I only resolved some conflicts; the real work was done by the author of this pull request).

Running the tests locally, I got some failures, but I got the same with the untouched Keras master branch.

@yushuinanrong

+1:)

@Tutufa commented Feb 15, 2017

up

@fchollet (Member)

Closing outdated PR. If you still care about the content of the PR, please submit a new PR to master, updated for the Keras 2.0 API.

@fchollet closed this Mar 15, 2017
kencoken added a commit to kencoken/keras that referenced this pull request Apr 30, 2017
@yinghuang

+1:)

@farzaa commented Jun 28, 2017

Any update on this, or is there perhaps another way (using the current version of Keras) to do the same thing?

@gsabran commented Jul 17, 2017

I've shared a design review doc before making a new PR for the 2.0 API: https://docs.google.com/document/d/1l4k811Mxz1fIIzyw7-nOVMLkBN6a7bRcmW6EuIW0cc8/edit#. Stay tuned :)

@gsabran commented Jul 17, 2017

If you have comments or references on why this has been proven to be useful, please comment on the Google doc!

@sachinruk (Contributor)

Has this PR been accepted, or has something similar been merged instead?

@gsabran commented Sep 22, 2017

Last time I checked (see the response to the proposal by @fchollet), there was no significant research showing this is beneficial. @meetshah1995 pointed to some work using LR multipliers, and the conversation has not moved on from there.

@hellojialee

Has this not been accepted in Keras 2.x?
I can't find it.

@HuangBo-Terraloupe

I also cannot find it. Can someone explain it, or maybe give an example of how to use it, if layer-wise learning rates have been merged?

@brunoklein99

What is pending exactly for this to get merged?

@jmhessel (Contributor)

I don't think there are any plans to have this merged, because there is no PR compatible with Keras 2.x at the moment.

@andreapi87 commented Jun 5, 2018

Hi! Any news?
If I wanted to use it, how could I do so? I have the latest version of Keras.

@brunoklein99

@envytails I did my own version, which was enough for the implementation I was trying to achieve.

https://github.com/brunoklein99/srcnn/blob/5e874eb161d4d27cfdb6ac9b2196b3ad154fc672/LRMultiplierSGD.py#L46
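
For readers landing on this thread later, here is a minimal sketch of the same idea against the Keras 2.x optimizer API. It is illustrative only, not this PR's implementation or the linked file; the class name LRMultiplierSGD, the name-substring matching, and the omission of decay/Nesterov handling are all simplifying assumptions:

from keras import backend as K
from keras.optimizers import SGD

class LRMultiplierSGD(SGD):
    """SGD with per-weight learning rate multipliers (illustrative sketch).

    `multipliers` maps a substring of a weight's name (e.g. a layer name)
    to a float; weights matching no key use the base learning rate.
    """

    def __init__(self, multipliers=None, **kwargs):
        super(LRMultiplierSGD, self).__init__(**kwargs)
        self.multipliers = multipliers or {}

    def _multiplier_for(self, param):
        for key, mult in self.multipliers.items():
            if key in param.name:
                return mult
        return 1.0

    def get_updates(self, loss, params):
        # Plain momentum SGD; decay and Nesterov are omitted for brevity.
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]
        moments = [K.zeros(K.int_shape(p), dtype=K.dtype(p)) for p in params]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            scaled_lr = self.lr * self._multiplier_for(p)  # per-weight learning rate
            v = self.momentum * m - scaled_lr * g          # velocity
            self.updates.append(K.update(m, v))
            new_p = p + v
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)
            self.updates.append(K.update(p, new_p))
        return self.updates

# usage: model.compile(optimizer=LRMultiplierSGD(lr=0.01, momentum=0.9,
#                                                multipliers={'dense_1': 0.1}),
#                      loss='mse')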
