Nadam optimizer and test for it added #2764
Hello @fchollet, this is my first ever PR, so apologies for any mistakes.
Here are some small experiments I did; the authors of the report ran much bigger ones.
The test passes OK on the Theano backend. I don't have TensorFlow installed, but if necessary I can install it and check.
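For what it's worth, the backend can also be switched per run without editing the config file, by setting the `KERAS_BACKEND` environment variable before Keras is imported; a minimal sketch:

```python
import os

# Must be set before the first `import keras`; otherwise the backend
# configured in ~/.keras/keras.json is used.
os.environ['KERAS_BACKEND'] = 'tensorflow'

import keras  # prints which backend it is using on import
```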
This is my results comparison for the default CIFAR CNN demo from the examples folder (mini-batch = 128, everything else at defaults); a sketch of the optimizer setup follows the results below.
SGD + NAG:
NAG + Adam (this PR):
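For reference, a minimal sketch of how the two runs could be configured, assuming the `Nadam` class from this PR is importable from `keras.optimizers` (the learning rates and epoch count here are illustrative, not the exact settings used):

```python
from keras.optimizers import SGD, Nadam

# Baseline: SGD with Nesterov accelerated gradient (NAG)
sgd_nag = SGD(lr=0.01, momentum=0.9, nesterov=True)

# This PR: Nadam (Adam with Nesterov momentum), default hyperparameters
nadam = Nadam(lr=0.002)

# model = ...  # the CNN from examples/cifar10_cnn.py
# model.compile(loss='categorical_crossentropy', optimizer=nadam,
#               metrics=['accuracy'])
# model.fit(X_train, Y_train, batch_size=128, nb_epoch=50,
#           validation_data=(X_test, Y_test))
```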
Nice work! I would suggest you expose the three constants you have hardcoded in these lines so they can be modified by the user:
```python
schedule_decay = 0.004  # Exactly given in  and 
momentum_cache_t = self.beta_1 * (1. - 0.5 * (K.pow(0.96, t * schedule_decay)))
momentum_cache_t_1 = self.beta_1 * (1. - 0.5 * (K.pow(0.96, (t + 1) * schedule_decay)))
```
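A minimal standalone sketch of what exposing them could look like, with the schedule factored into a helper whose defaults reproduce the hardcoded values (the function name and the `decay_base` parameter are illustrative, not part of the PR):

```python
def momentum_cache(beta_1, t, schedule_decay=0.004, decay_base=0.96):
    # Warming schedule for the Nesterov momentum coefficient; the
    # defaults match the constants hardcoded above.
    return beta_1 * (1. - 0.5 * decay_base ** (t * schedule_decay))

# The two cached values used in the update at step t:
t = 1
momentum_cache_t = momentum_cache(0.9, t)
momentum_cache_t_1 = momentum_cache(0.9, t + 1)
```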
@the-moliver thanks! I also thought about that. When I did more experiments some time ago (not with Keras), I tried tweaking these values and got no improvement; more often I got worse performance.
These constants come from here (page 4, equation 5); the author says they come from earlier Nesterov papers (one of them is from 1983), and he also motivates these values with some deeper research.
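Restated from the hardcoded expressions above (this is just the code in equation form; for the exact notation see equation 5 of the report), the warming schedule those constants define is:

```latex
\mu_t = \beta_1 \left(1 - \tfrac{1}{2}\, 0.96^{\, t \cdot 0.004}\right)
```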
If @fchollet asks to expose these variables too, then I will do that. But I think the best option is to give a meaningful comment about these values in the documentation. (I will actually do that in the docstring, I guess.)
In general we won't merge into Keras algorithms that aren't widely accepted or haven't been covered in a peer-reviewed paper. At the same time, we try to stay on top of things and incorporate the latest advances, as soon as we are confident in their viability.