Most optimizers don't save iterations as a weight!! #13027

Closed
danmoller opened this issue Jun 28, 2019 · 7 comments

Comments

@danmoller

I was checking whether Keras saves the optimizer states, and it turns out it does so based on the optimizer's self.weights variable.

Looking at the source code for optimizers: https://github.com/keras-team/keras/blob/master/keras/optimizers.py

For SGD everything seems fine; the iterations variable is part of the weights:

self.weights = [self.iterations] + moments

Now, if you look at most other optimizers in the same file, they only save their accumulators, moments, etc., but they don't save iterations.

When decay is involved, this spoils saving and loading the optimizers, because the effective learning rate depends on iterations.
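
Just to make that concrete, this is roughly how the decay is applied inside get_updates (paraphrasing keras/optimizers.py, so take the exact lines with a grain of salt):

# the effective learning rate shrinks as self.iterations grows,
# so losing iterations on reload effectively resets the decay
lr = self.lr
if self.initial_decay > 0:
    lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                              K.dtype(self.decay))))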

This is a suggestion to fix this in all optimizers by adding iterations to the list.

For instance, take the Adadelta optimizer and replace the following line:

self.weights = accumulators + delta_accumulators

With this:

self.weights = [self.iterations] + accumulators + delta_accumulators
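
A rough way to check this (just a sketch with a tiny toy model; the file name, layer sizes and data are arbitrary):

import numpy as np
from keras import backend as K
from keras.models import Sequential, load_model
from keras.layers import Dense

# tiny toy setup, only so there is something to train
model = Sequential([Dense(1, input_shape=(4,))])
model.compile(optimizer='adadelta', loss='mse')
x = np.random.rand(64, 4)
y = np.random.rand(64, 1)
model.fit(x, y, epochs=2, verbose=0)

# number of update steps the optimizer has taken so far
print(K.get_value(model.optimizer.iterations))

model.save('checkpoint.h5')
restored = load_model('checkpoint.h5')

# if the optimizer doesn't include iterations in self.weights,
# this prints 0 instead of the value above
print(K.get_value(restored.optimizer.iterations))
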
@dynamicwebpaige

I believe this may have been resolved in tf.keras. @fchollet, can you confirm?

@danmoller
Author

danmoller commented Jun 30, 2019

Is tf.keras the official version now? Last time I tried it, it had so many bugs that I concluded plain Keras was the one to use.
And this should be fixed regardless, since there are users on other backends, right?

@dynamicwebpaige

@danmoller No, tf.keras is not the official version, but I encourage you to try it if you haven't recently and you're using the TensorFlow backend. There have been several performance enhancements made specifically for TF. If you find any issues, please let us know! 👍

@mfenner1

Based on a recent experimental run using TensorFlow 2.4.1, I'm wondering whether this is fully resolved.

More specifically, after 10 epochs of optimization using Nadam, followed by a model.save and a load_model, the loss is quite unstable during the subsequent epoch and then tops out at a higher loss than the last epoch of the initial 10-epoch run. In fact, it tops out worse than the first epoch of the initial 10-epoch run. So, I'm guessing that not all of the history needed to maintain the optimizer state is being saved/restored.

Or, of course, I might be using it wrong! If folks think the relevant state is saved and restored, I'll try to make this more concrete with a minimum working example and see what happens there (my use is currently embedded in a larger program).
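
For reference, the MWE I have in mind is shaped roughly like this (toy data standing in for my real inputs; names and sizes are arbitrary):

import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 20).astype('float32')
y = np.random.randint(0, 2, size=(1000, 1))

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='nadam', loss='binary_crossentropy')

# first run
model.fit(x, y, epochs=10, batch_size=32)
model.save('run1.h5')

# second run (separate script): reload and continue training
restored = keras.models.load_model('run1.h5')
restored.fit(x, y, epochs=20, initial_epoch=10, batch_size=32)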

@danmoller
Author

@mfenner1, you can try using the initial_epoch parameter to see if it changes anything. Some optimizers depend on the current epoch number, not only on the internal weights.

@mfenner1

@danmoller Thanks for getting back to me here. I did give that a try as well: after an epochs=20 run, I set up an epochs=40, initial_epoch=20 scenario. It didn't seem to help. I'm actually going to try to make an MWE of a Nadam restart with MNIST or similar. My "real world" problem is way too big to debug this effectively. I'll see what I come up with.

@mfenner1

@danmoller So, I did make an MWE that allowed me to store and reload models (and I could easily swap in different optimizers). After doing that, both SGD and Nadam appear "well behaved": restarting the training in the second *.py program appears to gracefully pick up where the first left off. So, I'm going to have to dive back into my main code and see if I've mucked something up. I'm also using a significantly more complicated model with far more parameters ... I don't know if that complexity might affect the restart-ability of the model.

For reference, my model.fit calls looked like:

# initial call
model.fit(..., epochs=10, ...)

# subsequent call
model.fit(..., epochs=20, initial_epoch=11, ...)

Best,
Mark
