
Implementation of Quasi-Hyperbolic Momentum for Adam #81

Merged
merged 23 commits into mlpack:master on May 15, 2019

Conversation

@niteya-shah (Contributor) commented Feb 12, 2019

Hello, this PR implements the update rule of QHAdam as discussed in the paper: https://arxiv.org/pdf/1810.06801v3.pdf
This PR should be straightforward, as it is a simple update rule change like Nadam.
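For reference, the QHAdam update rule from the paper, transcribed here (ν1 and ν2 are the quasi-hyperbolic terms, g_t the current gradient, and m̂_t, v̂_t the bias-corrected Adam moment estimates):

```latex
\theta_{t+1} = \theta_t - \alpha \cdot
    \frac{(1 - \nu_1)\, g_t + \nu_1\, \hat{m}_t}
         {\sqrt{(1 - \nu_2)\, g_t^2 + \nu_2\, \hat{v}_t} + \epsilon}
```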

TODO list

  • Implement QHAdam update rule
  • Write Tests for QHAdam
  • Validate the improvements

niteya-shah added some commits Feb 12, 2019

First Commit
Empty file, will discuss with assignee about the PR
Implementation of QHAdam update
QHAdam updates and an overloaded constructor for QHAdam
Fixes to Adam and addition of QHSGD
1) Added SGD update QHSGD
2) Fixed some Adam issues w.r.t. QHAdam
@niteya-shah (Contributor, Author) commented Feb 23, 2019

@zoq, can you please review my PR? The paper also describes QHAdamR and QHAdamW, which I will implement after my other PR is merged.

/**
* Tests the QHadam optimizer using a simple test function.
*/
TEST_CASE("SimpleQHdamTestFunction", "[AdamTest]")

@zoq (Member), Feb 23, 2019:

Picky style comment, this should be SimpleQHAdamTestFunction.

/**
* Run QHAdam on logistic regression and make sure the results are acceptable.
*/
TEST_CASE("QHadamLogisticRegressionTest", "[AdamTest]")

@zoq (Member), Feb 23, 2019:

See comment above.

/**
* Construct the Quasi Hyperbolic update policy with the given parameters.
* @param v The quasi hyperbolic term.
* @param momentum The momentum ter.

@zoq (Member), Feb 23, 2019:

term instead of ter.

* QHSGD). This allows this method to combine the features of many optimisers
* and provide better optimisation control.
*
* TODO: Paper information

@zoq (Member), Feb 23, 2019:

Good idea.


/**
*
* TODO: Fill in these details as well on info about algorithm

@zoq (Member), Feb 23, 2019:

Good idea 👍

const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);

const auto mDash = m / biasCorrection1;

@zoq (Member), Feb 23, 2019:

Do you mind using double here, to be consistent with the rest of the codebase?


iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
(arma::sqrt(((1 - v2) * (gradient % gradient)) +
v2 * vDash) + epsilon ));

@zoq (Member), Feb 23, 2019:

Can you remove the extra space at the end.

const auto mDash = m / biasCorrection1;
const auto vDash = v / biasCorrection2;

iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /

@zoq (Member), Feb 23, 2019:

Do you mind adding a comment here: "QHAdam recovers Adam when ν1 = ν2 = 1."?
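Taken together, the hunks above implement the step below; here is a minimal self-contained sketch in Armadillo-style C++ (the function name and signature are for illustration only, not the merged ensmallen code):

```cpp
#include <armadillo>
#include <cmath>

// One QHAdam step. m and v are the exponential moving averages of the
// gradient and the squared gradient, already updated as in Adam.
void QHAdamStep(arma::mat& iterate,
                const arma::mat& gradient,
                const arma::mat& m,
                const arma::mat& v,
                const double stepSize, const double beta1, const double beta2,
                const double v1, const double v2,
                const double epsilon, const double iteration)
{
  // Standard Adam bias corrections.
  const double biasCorrection1 = 1.0 - std::pow(beta1, iteration);
  const double biasCorrection2 = 1.0 - std::pow(beta2, iteration);

  const arma::mat mDash = m / biasCorrection1;
  const arma::mat vDash = v / biasCorrection2;

  // QHAdam recovers Adam when v1 = v2 = 1.
  iterate -= stepSize * ((((1 - v1) * gradient) + v1 * mDash) /
      (arma::sqrt(((1 - v2) * (gradient % gradient)) + v2 * vDash) +
      epsilon));
}
```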

// The number of iterations.
double iteration;

//The first quasi-hyperbolic term.

@zoq (Member), Feb 23, 2019:

Missing space right after //. Same for v2.

*
* @param epsilon The epsilon value used to initialise the squared gradient
* parameter.
* @param beta1 The smoothing parameter.

@zoq (Member), Feb 23, 2019:

Looks like the order is wrong.

niteya-shah added some commits Feb 23, 2019

@niteya-shah (Contributor, Author) commented Feb 24, 2019

@zoq, sorry for the TODOs; I had forgotten that I had put them in. I have made the changes.

Fix to parameter type
Incorrect parameter type of double assigned to variable

niteya-shah added some commits Mar 18, 2019

@niteya-shah (Contributor, Author) commented Mar 27, 2019

@zoq can you review my PR?

/**
* @file adamw.hpp
* @author Niteya Shah
*

@zoq (Member), Mar 27, 2019:

Can you add the license and a description?

* @author Niteya Shah
*
*/
#ifndef ENSMALLEN_ADAM_ADAMW_HPP

@zoq (Member), Mar 27, 2019:

This is not AdamW.

* @param beta1 Exponential decay rate for the first moment estimates.
* @param beta2 Exponential decay rate for the weighted infinity norm
estimates.
* @param eps Value used to initialise the mean squared gradient parameter.

@zoq (Member), Mar 27, 2019:

Do you mind renaming this one to epsilon to match the rest of the code?

* @param resetPolicy If true, parameters are reset before every Optimize
* call; otherwise, their values are retained.
*/

@zoq (Member), Mar 27, 2019:

We can remove the extra line here.

double& V2() { return optimizer.UpdatePolicy().V2(); }

private:
//! The Stochastic Gradient Descent object with AdamW policy.

@zoq (Member), Mar 27, 2019:

We use QHAdam here.

/**
* QHAdam is a optimising strategy based on the Quasi-Hyperbolic step when
* applied to the Adam Optimiser . QH updates can be considered to a weighted
* average of the momentum . QHAdam , based on its paramterisation can recover

@zoq (Member), Mar 27, 2019:

Looks like there are some spaces we can remove.

/**
* Construct the QHAdam update policy with the given parameters.
*
* @param v1 The first quasi-hyperbolic term.

@zoq (Member), Mar 27, 2019:

The parameter description order doesn't match the one below.

public:
/**
* Construct the Quasi Hyperbolic update policy with the given parameters.
* @param v The quasi hyperbolic term.

@zoq (Member), Mar 27, 2019:

Can you add another empty line between the method description and the beginning of the parameter description.

*/
QHUpdate(const double v = 0.7,
const double momentum = 0.999) :
momentum(momentum),

@zoq (Member), Mar 27, 2019:

Can you use 4 spaces instead of 2 here.
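For context on what this policy computes: QHSGD applies the QHM rule from the same paper. A rough self-contained sketch under the paper's formulation (the free-function shape and the `velocity` name are assumptions for illustration; the merged policy is a class that carries this state):

```cpp
#include <armadillo>

// One QHM step (paper formulation; illustrative, not the merged code).
// velocity is the momentum buffer carried across calls; v and momentum
// default to 0.7 and 0.999, as in the constructor above.
void QHSGDStep(arma::mat& iterate,
               arma::mat& velocity,
               const arma::mat& gradient,
               const double stepSize,
               const double v,        // quasi-hyperbolic term
               const double momentum) // EMA constant
{
  // Normalized exponential moving average of gradients.
  velocity = momentum * velocity + (1 - momentum) * gradient;

  // Weighted average of the momentum step and the plain SGD step:
  // v = 0 recovers vanilla SGD.
  iterate -= stepSize * ((1 - v) * gradient + v * velocity);
}
```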

/**
* @file quasi_hyperbolic_momentum_sgd_test.cpp
* @author Niteya Shah
*

@zoq (Member), Mar 27, 2019:

See comment above.


#### See also:

* [Quasi-Hyperbolic Momentom and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)

@zoq (Member), Apr 14, 2019:

Should be "Momentum".

*An optimizer for [differentiable separable functions](#differentiable-separable-functions).*

QHAdam is an optimizer which implements the QHAdam Adam algorithm
which uses Quasi-Hyperbolic Descent with the Adam Optimizer. This Method is the Adam Variant of the Quasi -

@zoq (Member), Apr 14, 2019:

Should be "This method is the Adam variant of the quasi-hyperbolic update for Adam".

@zoq (Member) approved these changes Apr 21, 2019, leaving a comment:

Looks good to me; there are some minor style issues which I can fix during the merge process.

@mlpack-bot (bot) approved these changes Apr 22, 2019, leaving a comment:

Second approval provided automatically after 24 hours. 👍

@rcurtin (Member) left a comment:

@niteya-shah nice contribution, sorry it took me so long to review this. There are a number of little style issues and documentation bits, but I think they can all be handled during merge. @zoq if you want, I can do the merge and style/doc fixes, just let me know. I saw that you had a couple changes you wanted to make too. In any case, if you merge it, we should add something to HISTORY.md and then I'll run my release script. :)

@@ -508,6 +508,7 @@ The following optimizers can be used with differentiable functions:
- [AdaDelta](#adadelta)
- [AdaGrad](#adagrad)
- [Adam](#adam)
- [QHAdam](#qhadam)

@rcurtin (Member), May 12, 2019:

Just a minor note, maybe we can address it during merge---all the other optimizers are arranged alphabetically, we should probably keep it like that.

@@ -1394,6 +1394,114 @@ double Optimize(arma::mat& X);
* [Semidefinite programming on Wikipedia](https://en.wikipedia.org/wiki/Semidefinite_programming)
* [Semidefinite programs](#semidefinite-programs) (includes example usage of `PrimalDualSolver`)

## Quasi-Hyperbolic Momentum Update

@rcurtin (Member), May 12, 2019:

Should this be called Quasi-Hyperbolic Momentum Update SGD (QHSGD)? The anchors will need to be updated. (In fact the qhsgd anchor right now doesn't point anywhere.)


Quasi Hyperbolic Momentum Update is an update policy for SGD where the Quasi Hyperbolic terms are added to the
parametrisation. Simply put QHM’s update rule is a weighted average of momentum’s and plain SGD’s
update rule.

@rcurtin (Member), May 12, 2019:

This is a bit confusing for users since it's not described as an optimizer. I would suggest something like this:

Quasi-hyperbolic momentum update SGD (QHSGD) is an SGD-like optimizer with momentum where quasi-hyperbolic terms are added to the parametrization.  The update rule for this optimizer is a weighted average of momentum SGD and vanilla SGD.
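For reference, the QHM rule that text describes, transcribed from the paper (β is the momentum constant, ν the quasi-hyperbolic term, g the momentum buffer):

```latex
g_{t+1} = \beta\, g_t + (1 - \beta)\, \nabla L_t(\theta_t), \qquad
\theta_{t+1} = \theta_t - \alpha \left[ (1 - \nu)\, \nabla L_t(\theta_t) + \nu\, g_{t+1} \right]
```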
different values of those terms can recover the following Adam Polices
QHAdam recovers Adam when ν1 = ν2 = 1
QHAdam recovers RMSProp when ν1 = 0 and ν2 = 1
QHAdam reovers NAdam when ν1 = β1 and ν2 = 1

@rcurtin (Member), May 12, 2019:

I think we can do a little cleanup here too:

QHAdam is an optimizer that uses quasi-hyperbolic descent with the Adam optimizer.  This replaces the moment estimators of Adam with quasi-hyperbolic terms, and different values of the `v1` and `v2` parameters are equivalent to the following other optimizers:

 * When `v1 = v2 = 1`, `QHAdam` is equivalent to `Adam`.
 * When `v1 = 0` and `v2 = 1`, `QHAdam` is equivalent to `RMSProp`.
 * When `v1 = beta1` and `v2 = 1`, `QHAdam` is equivalent to `Nadam`.

I really like the reductions in the documentation.
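As a quick sanity check on the first reduction: with `v1 = v2 = 1`, the raw-gradient terms vanish from the update under review and the familiar Adam step remains:

```latex
\theta_{t+1} = \theta_t - \alpha \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```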

#### See also:
* [Quasi-Hyperbolic Momentum and Adam For Deep Learning](https://arxiv.org/pdf/1810.06801.pdf)
* [SGD in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent)
* [SGD](#standard-sgd)

@rcurtin (Member), May 12, 2019:

Might be worth also linking to Adam, Nadam, and RMSprop? (And vice versa from those?)

*
* QHAdam can optimize differentiable separable functions.
* For more details, see the documentation on function
* types included with this distribution or on the ensmallen website.

@rcurtin (Member), May 12, 2019:

The lines look a little short here, they can probably be streamlined.

const bool resetPolicy = true);

/**
* Optimize the given function using QHAdam. The given starting point will be

@rcurtin (Member), May 12, 2019:

The spacing seems off on this comment, it should be of the form

/**
 * comment
 */

(note the alignment of the *)

// In case it hasn't been included yet.
#include "qhadam.hpp"

namespace ens {

@rcurtin (Member), May 12, 2019:

Spacing seems off here too.

/*
* Tests the Quasi Hyperbolic Momentum SGD update policy.
*/
TEST_CASE("QHSGDSpeedUpTestFunction", "[QHMomentumSGDTest]")

@rcurtin (Member), May 12, 2019:

The name SpeedUp probably isn't meaningful here, just QHSGDTestFunction should be fine.

@mlpack-bot (bot) commented May 12, 2019

Hello there! Thanks for your contribution. I see that this is your first contribution to mlpack. If you'd like to add your name to the list of contributors in src/mlpack/core.hpp and COPYRIGHT.txt and you haven't already, please feel free to push a change to this PR---or, if it gets merged before you can, feel free to open another PR.

In addition, if you'd like some stickers to put on your laptop, I'd be happy to help get them in the mail for you. Just send an email with your physical mailing address to stickers@mlpack.org, and then one of the mlpack maintainers will put some stickers in an envelope for you. It may take a few weeks to get them, depending on your location. 👍


@rcurtin (Member) commented May 12, 2019

Sorry for the erroneous mlpack-bot message :( (but if you want stickers, as always, we're happy to send them)

@rcurtin merged commit 0f17a19 into mlpack:master May 15, 2019

1 check passed: continuous-integration/travis-ci/pr (The Travis CI build passed)
@rcurtin (Member) commented May 15, 2019

@niteya-shah thanks for the contribution! I am releasing 1.15.0 with the new support now. I made some style fixes in 7e8108d.
