Most optimizers don't save iterations as a weight!! #13027

Closed
danmoller opened this issue Jun 28, 2019 · 7 comments

Comments

@danmoller

I was checking whether Keras saves the optimizer states, and it turns out it does so based on the optimizer's self.weights variable.

Looking at the source code for optimizers: https://github.com/keras-team/keras/blob/master/keras/optimizers.py

For SGD everything seems fine; the iterations variable is part of the weights:

self.weights = [self.iterations] + moments

Now, if you look at most other optimizers in the same file, they only save their accumulators, moments, etc., but they don't save iterations.

When decay is involved, this spoils saving and loading the optimizers, because the effective learning rate depends on iterations.
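
Just to make that concrete, this is roughly how the decay is applied inside get_updates (paraphrasing keras/optimizers.py, so take the exact lines with a grain of salt):

# the effective learning rate shrinks as self.iterations grows,
# so losing iterations on reload effectively resets the decay
lr = self.lr
if self.initial_decay > 0:
    lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                              K.dtype(self.decay))))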

This is a suggestion to fix this in all optimizers by adding iterations to the list.

For instance, take the Adadelta optimizer and replace the following line:

self.weights = accumulators + delta_accumulators

With this:

self.weights = [self.iterations] + accumulators + delta_accumulators
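
A rough way to check this (just a sketch with a tiny toy model; the file name, layer sizes and data are arbitrary):

import numpy as np
from keras import backend as K
from keras.models import Sequential, load_model
from keras.layers import Dense

# tiny toy setup, only so there is something to train
model = Sequential([Dense(1, input_shape=(4,))])
model.compile(optimizer='adadelta', loss='mse')
x = np.random.rand(64, 4)
y = np.random.rand(64, 1)
model.fit(x, y, epochs=2, verbose=0)

# number of update steps the optimizer has taken so far
print(K.get_value(model.optimizer.iterations))

model.save('checkpoint.h5')
restored = load_model('checkpoint.h5')

# if the optimizer doesn't include iterations in self.weights,
# this prints 0 instead of the value above
print(K.get_value(restored.optimizer.iterations))
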
@dynamicwebpaige

I believe this may have been resolved in tf.keras. @fchollet, can you confirm?

@danmoller
Author

danmoller commented Jun 30, 2019

Is tf.keras the official version now? Last time I tried it, it had so many bugs that I concluded plain Keras was the one to use.
And this should be fixed regardless, since there are users on other backends, right?

@dynamicwebpaige

@danmoller No, tf.keras is not the official version, but I encourage you to try it if you haven't recently and you're using the TensorFlow backend. There have been several performance enhancements made specifically for TF. If you find any issues, please let us know! 👍

@mfenner1

Based on a recent experimental run using TensorFlow 2.4.1, I'm wondering whether this is fully resolved.

More specifically, after 10 epochs of optimization using Nadam, followed by a model.save and a load_model, the loss is quite unstable during the subsequent epoch and then tops out at a higher loss than the last epoch of the initial 10-epoch run. In fact, it tops out worse than the first epoch of the initial 10-epoch run. So, I'm guessing that not all of the history needed to maintain the optimizer state is being saved/restored.

Or, of course, I might be using it wrong! If folks think the relevant state is saved and restored, I'll try to make this more concrete with a minimum working example and see what happens there (my use is currently embedded in a larger program).
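
For reference, the MWE I have in mind is shaped roughly like this (toy data standing in for my real inputs; names and sizes are arbitrary):

import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 20).astype('float32')
y = np.random.randint(0, 2, size=(1000, 1))

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='nadam', loss='binary_crossentropy')

# first run
model.fit(x, y, epochs=10, batch_size=32)
model.save('run1.h5')

# second run (separate script): reload and continue training
restored = keras.models.load_model('run1.h5')
restored.fit(x, y, epochs=20, initial_epoch=10, batch_size=32)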

@danmoller
Author

@mfenner1, you can try using the initial_epoch parameter to see if it changes anything. Some optimizers depend on the current epoch number, not only on the internal weights.

@mfenner1

@danmoller Thanks for getting back to me here. I did give that a try as well: after an epochs=20 run, I set up an epochs=40, initial_epoch=20 scenario. It didn't seem to help. I'm actually going to try to make an MWE of a Nadam restart with MNIST or similar. My "real world" problem is way too big to debug this effectively. I'll see what I come up with.

@mfenner1

@danmoller So, I did make an MWE that allowed me to store and reload models (and I could easily swap in different optimizers). After doing that, both SGD and Nadam appear "well behaved": restarting the training in the second *.py program appears to gracefully pick up where the first left off. So, I'm going to have to dive back into my main code and see if I've mucked something up. I'm also using a significantly more complicated model with far more parameters ... I don't know if that complexity might affect the restart-ability of the model.

For reference, my model.fit calls looked like:

# initial call
model.fit(..., epochs=10, ...)

# subsequent call
model.fit(..., epochs=20, initial_epoch=11, ...)

Best,
Mark
