Is it possible to manually reset momentum in SGD? #454

Closed

gsmafra opened this issue Jul 28, 2015 · 9 comments

@gsmafra

gsmafra commented Jul 28, 2015

Context: I am running the same optimizer multiple times for cross-validation and trying to completely reset it at each run, so I can avoid recompiling exactly the same thing as before. My data is relatively small and my GPU is quite fast, so training converges very quickly and the compilation time becomes comparable to the processing time.

What I have tried: to reinitialize things I stumbled upon this issue and simply saved/loaded my initial parameters at each fold. Apparently it was working, until I decided to include momentum in it. My hyper-parameter optimizer converged to a configuration with a very high momentum and was giving absurdly good results; then I realized my mistake of forgetting to reinitialize the momentum shared variable: information was leaking from fold to fold.

I gave the code in optimizers.py a quick read, and as I understand it, the shared variables m which hold the momentum values are kept internal, so I don't have access to them.

TLDR:

How can I have access to the shared variables holding the momentum value?

Is there any built-in solution in keras for cross-validation that does not involve recompiling everything?

Are there any other shared variables that could be leaking information from one fold to another?

Are there plans to add a proper reinitialization feature to keras?

@kenterao
Contributor

This has been addressed in #428 and fixed with #441. There are get_state and set_state methods on the optimizer. The momentum term is included in self.updates, but that is a list, so you will have to keep track of the corresponding index of the momentum term.

m = shared_zeros(p.get_value().shape)  # momentum
v = self.momentum * m - lr * g  # velocity
self.updates.append((m, v))
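
If you only want to clear the momentum rather than restore a full snapshot, something along these lines should work (an untested sketch; the index i of the (m, v) entry in self.updates is a placeholder, since the exact position depends on how get_updates builds the list):

import numpy as np

# hypothetical index i of the (m, v) entry for one parameter in self.updates
m = model.optimizer.updates[i][0]
m.set_value(np.zeros_like(m.get_value()))  # zero the momentum in place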

@gsmafra
Author

gsmafra commented Jul 28, 2015

Thanks. So, if I understand correctly, if I do something like this:

def generic_training_and_testing_function(X_train, Y_train, X_test, model):
    # snapshot the initial weights and optimizer state
    weights = model.get_weights()
    state = model.optimizer.get_state()

    model.fit(X_train, Y_train)
    Y_pred = model.predict_classes(X_test)

    # restore the snapshot so the next fold starts from the same point
    model.set_weights(weights)
    model.optimizer.set_state(state)

    return Y_pred

the folds will be completely isolated, right?

I haven't been able to test your functions yet because I am running the pip version; I'll edit this once I have.

The momentum term is included in self.updates but it is a list so you will have to keep track of the corresponding index of the momentum term.

Then I don't need to keep track of the momentum individually, since it is taken care of by another function.

In #428:

I might be wrong, but I think that every call to model.fit resets the accumulated states (such as momentum, etc.) in the optimizers

Is this true? I get radically different results when bringing the compilation inside the folds, even when using get_weights and set_weights.

Going beyond that, a one-liner that reinitializes from a different random starting point each time, instead of four lines that always reinitialize to the same point, would be a good feature, I think.

@fchollet
Member

Successive calls to fit do not reset any of the parameters of the model, including the state of the optimizers. Successive calls to fit with nb_epoch = 1 are effectively the same as a single call to fit.
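
In other words, with placeholder data:

model.fit(X_train, Y_train, nb_epoch=10)

# ...behaves the same as ten successive one-epoch calls, because the weights
# and the optimizer state (momentum etc.) carry over between calls:
for _ in range(10):
    model.fit(X_train, Y_train, nb_epoch=1)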

@kenterao
Contributor

I see. If your goal is to reset all states (parameters and optimizer states) then model.set_weights() and model.optimizer.set_state() should do that.

@gsmafra gsmafra closed this as completed Aug 7, 2015
@aniketvartak

How can I use model.optimizer.set_state() to modify my learning rate as a function of epochs?
Can you please give an example?
Thanks.

@gsmafra
Author

gsmafra commented Aug 14, 2015

Why don't you try using decay?
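
Something like this (a rough sketch; the lr value and loss are placeholders):

from keras.optimizers import SGD

# decay shrinks the effective learning rate a little on every update
sgd = SGD(lr=0.1, momentum=0.9, decay=1e-6, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)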

If you want to do it differently I can't help you much; I've been using set_state() only as a save/reset mechanism. In that case it would probably be better to open another issue and be more specific.

@aniketvartak

Well, I do use decay, but it works on iterations. I want to reduce my learning rate at discrete stages, for example dividing the LR by 10 every 100th epoch. I guess I have to open a new issue for this.
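
Something like this is what I am after (a sketch, assuming a Keras version that ships keras.callbacks.LearningRateScheduler, which takes a function from epoch index to learning rate):

from keras.callbacks import LearningRateScheduler

# divide the learning rate by 10 every 100 epochs, starting from 0.1
def step_decay(epoch):
    return 0.1 * (0.1 ** (epoch // 100))

model.fit(X_train, Y_train, nb_epoch=300,
          callbacks=[LearningRateScheduler(step_decay)])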

@mrgloom

mrgloom commented Jun 28, 2019

Seems get_state and set_state were removed; what is the current method to reset the optimizer state?

@tu-curious

Seems get_state and set_state were removed; what is the current method to reset the optimizer state?

I was able to use the optimizer.variables() method like this:

# get the initial states
init_states = [var.value() for var in optimizer.variables()]

# do the optimization
...

# reset the optimizer state to init_states
for val, var in zip(init_states, optimizer.variables()):
    var.assign(val)

(You can use tf.print(optimizer.variables()) in your optimization loop to verify.)
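
For reuse across runs, you can wrap the same pattern in a pair of helpers (a sketch, assuming TF2 / tf.keras, where optimizer.variables() generally also includes the iteration counter, so restoring the snapshot resets that too):

import tensorflow as tf

def snapshot_optimizer(optimizer):
    # capture the current values of all optimizer variables (slots, iterations)
    return [tf.identity(var) for var in optimizer.variables()]

def restore_optimizer(optimizer, snapshot):
    # assign the captured values back, clearing any state accumulated since
    for var, val in zip(optimizer.variables(), snapshot):
        var.assign(val)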
