# EVE optimizer #45

Merged

zoq merged 8 commits into mlpack:master on Dec 13, 2018

## Conversation

Member

### zoq commented Oct 31, 2018

 Implementation of "Improving Stochastic Gradient Descent with Feedback" by Jayanth Koushik and Hiroaki Hayashi.
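For anyone who wants to try it, usage should match the other ensmallen optimizers. Here is a minimal sketch; the `SimpleSphere` objective and the constructor arguments shown are illustrative assumptions, not taken from this PR:

```c++
#include <ensmallen.hpp>

// Illustrative separable objective: f(x) = sum_i x_i^2, treating the whole
// "dataset" as a single point so the decomposable-function API is satisfied.
class SimpleSphere
{
 public:
  double Evaluate(const arma::mat& x, const size_t /* begin */,
                  const size_t /* batchSize */)
  { return arma::accu(x % x); }

  void Gradient(const arma::mat& x, const size_t /* begin */,
                arma::mat& g, const size_t /* batchSize */)
  { g = 2 * x; }

  size_t NumFunctions() const { return 1; }
  void Shuffle() { }
};

int main()
{
  SimpleSphere f;
  arma::mat coordinates("1.0; -2.0; 3.0");

  // Step size 0.001 and batch size 1 (assumed argument order, following the
  // other ensmallen Adam-family constructors); the remaining hyperparameters
  // (beta1/2/3, epsilon, clipping) keep their defaults.
  ens::Eve optimizer(0.001, 1);
  optimizer.Optimize(f, coordinates);

  coordinates.print("result (should be near zero):");
}
```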

### zoq added some commits Oct 31, 2018

- Add EVE optimizer implementation. (`26610ef`)
- Add EVE test suite. (`b4d91cf`)
- Minor style fixes. (`43d40a7`)

Member

### rcurtin left a comment

Actually I did not like this paper very much at all, although I read a different version than the Koushik2016 reference given in the comments (I read "Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates", arXiv:1611.01505v3).

I think the intuitive argument for their step size calculation is (kind of) reasonable, but the main problem is that in general the optimal value $f^*$ may not be known (although they are right that in some cases it is 0). In fact, it seems that the optimizer doesn't really work if $f^*$ is not known or is greater than 0; it's not clear what to set $f^*$ to in that case, or how the choice affects convergence.

The results for their CNN are so astoundingly much better than the other optimizers that I can't help but suspect that either there was some problem in the code, or that Eve won't perform like this in most situations, or at the very least that the experiments warrant much further investigation. Also, I suspect that in practice the effective value of $\hat{d}_t$ is going to be the clipped value of either $1/c$ or $c$, especially as the function converges to the optimum $f^*$. (I suppose maybe larger batch sizes could help with that?)

Anyway, those are all just comments on the paper itself. I'd be curious if you had a different reaction to the paper; I am always interested in learning more. :) (And maybe the 2016 paper was more comprehensive, but I don't have any internet connection as I write this, so I can't get a copy...) Like the others, I think we should merge it, but only once we've added documentation.
```c++
 * the first point in the dataset (presumably, the dataset is held internally
 * in the DecomposableFunctionType). */
class EVE
```

#### rcurtin Nov 19, 2018

Member

Do you think we should just call it "Eve", not "EVE"? The paper doesn't suggest the name as an acronym, just a reference to 'Adam and Eve', I guess.

include/ensmallen_bits/eve/eve_impl.hpp (Outdated)
include/ensmallen_bits/eve/eve.hpp

```c++
}
else
{
  lastObjective = objective;
```

#### rcurtin Nov 19, 2018

Member

Shouldn't this happen every iteration, not just when `i == 0`?

#### zoq Dec 10, 2018

Author Member

You are right, nice catch!
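For posterity, the shape of the suggested change; `evaluate` here stands in for the real per-batch objective computation and is purely illustrative:

```c++
#include <algorithm>
#include <cmath>
#include <cstddef>

// Sketch of the corrected control flow: the feedback term compares the
// current objective against the previous iteration's, so lastObjective
// must be refreshed on every pass, not only when i == 0.
double LastFeedbackTerm(double (*evaluate)(size_t), const size_t iterations)
{
  double lastObjective = 0.0;
  double d = 1.0;
  for (size_t i = 0; i < iterations; ++i)
  {
    const double objective = evaluate(i);
    if (i > 0)
    {
      d = std::abs(objective - lastObjective) /
          std::min(objective, lastObjective);
    }
    lastObjective = objective;  // The fix: unconditional update each step.
  }
  return d;
}
```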

### zoq added some commits Dec 10, 2018

- Merge branch 'master' into ens_eve. (`f0b621e`)
- Update the last objective at each step. (`1e41db4`)
- Change optimizer name and add functions to adjust optimizer parameter. (`2ef3fef`)

Member

### rcurtin left a comment

 Looks good to me once the test is fixed and once we add documentation to optimizers.md. 👍
include/ensmallen_bits/eve/eve.hpp (Outdated)

```c++
if (i > 0)
{
  const double d = std::abs(objective - lastObjective) /
      std::min(objective, lastObjective);
```

#### rcurtin Dec 10, 2018

Member

I wonder if this is the source of the NaN in the tests. Maybe it needs some correction in the denominator?

#### zoq Dec 12, 2018

Author Member

Right, `std::min(objective, lastObjective)` could be zero.
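A sketch of one possible guard; whether the committed change is exactly this epsilon-style stabilizer is an assumption on my part:

```c++
#include <algorithm>
#include <cmath>

// Keep the feedback term finite when the smaller of the two objectives is
// zero; assumes nonnegative objectives, as in the paper's f^* = 0 setting.
// The epsilon mirrors the usual Adam-style stabilizer.
inline double FeedbackTerm(const double objective,
                           const double lastObjective,
                           const double epsilon = 1e-8)
{
  return std::abs(objective - lastObjective) /
      (std::min(objective, lastObjective) + epsilon);
}
```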

### zoq added some commits Dec 12, 2018

- Adjust denominator to avoid division by zero. (`95cb873`)
- Add Eve optimizer documentation. (`fca8152`)

### zoq merged commit 19e7c77 into mlpack:master on Dec 13, 2018

#### 1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed