AdaSqrt - Second-order Information in First-order Optimization Methods #234
Conversation
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍
@mlpack-jenkins test this please
Let's fix the static analysis build.
@mlpack-jenkins test this please
Alright, it's working again. |
This paper was a bit harder to read (I guess I read a preprint). I'm not sure I followed the idea behind the approach well, but the PR looks like it exactly implements the algorithm of the paper. I only have a couple of minor comments. Sorry this review took so long; it took forever to find some time to read the paper... :)
* [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
* [AdaGrad in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad)
* [AdaDelta](#adadelta)
* [Differentiable separable functions](#differentiable-separable-functions)
Nice catch, thanks for adding this 'see also' section.
// Instantiated parent class.
AdaSqrtUpdate& parent;

// The squared gradient matrix.
GradType squaredGradient;
Up to you, but it may be clearer to name this `sumSquaredGradients`. I got a little confused and had to double-check the paper to understand.
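For reference, here is a minimal sketch of how that accumulator drives the update step, as I understand the paper's rule (AdaSqrt scales the step by sqrt(t) and divides by the accumulated sum itself, where AdaGrad divides by the square root of the sum). The names and signature below are illustrative, not the PR's actual API:

```cpp
#include <armadillo>
#include <cmath>

// Hypothetical single AdaSqrt-style step. `sumSquaredGradients` persists
// across iterations, which is why the suggested name emphasizes the sum.
void AdaSqrtStep(arma::mat& iterate,
                 const arma::mat& gradient,
                 arma::mat& sumSquaredGradients,
                 const double stepSize,
                 const double epsilon,
                 const size_t iteration)
{
  // Accumulate element-wise squared gradients over all iterations so far.
  sumSquaredGradients += arma::square(gradient);

  // Assumed AdaSqrt rule: scale by sqrt(t) and divide element-wise by the
  // accumulated sum (not its square root, as AdaGrad would).
  iterate -= stepSize * std::sqrt((double) iteration) * gradient /
      (sumSquaredGradients + epsilon);
}
```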
tests/CMakeLists.txt
Outdated
WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
🎉
tests/ada_grad_test.cpp
Outdated
@@ -1,8 +1,6 @@
 /**
- * @file ada_grad_test.cpp
- * @author Abhinav Moudgil
+ * @file ada_sqrt_test.cpp
Oops, did you mean to commit these changes to `ada_grad_test.cpp`, or just to the new test file?
tests/ada_grad_test.cpp
Outdated
@@ -33,4 +33,4 @@ TEST_CASE("AdaGradLogisticRegressionTestFMat", "[AdaGradTest]")
 {
   AdaGrad adagrad(0.99, 32, 1e-8, 5000000, 1e-9, true);
   LogisticRegressionFunctionTest<arma::fmat>(adagrad, 0.003, 0.006);
-}
+}
\ No newline at end of file
Maybe we should revert this? I don't see a reason to remove the trailing newline :)
Maybe I missed something, but I think we don't use an extra trailing newline in the other test files.
Nevermind, I see what you mean, reverted with the last commit.
Co-authored-by: Ryan Curtin <ryan@ratml.org>
Co-authored-by: Conrad Sanderson <conradsnicta@users.noreply.github.com>
#define ENSMALLEN_ADA_SQRT_ADA_SQRT_IMPL_HPP

// In case it hasn't been included yet.
#include "ada_sqrt.hpp"
Is this `#include` necessary? `ada_sqrt_impl.hpp` gets included from `ada_sqrt.hpp`, which is in turn included from `ensmallen.hpp`.

If I recall correctly, at the outset we decided not to support usage of only specific optimisers, as it's too messy (maintenance nightmare) and potentially buggy. Only usage via `#include <ensmallen.hpp>` is supported.
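For context, the convention described above looks roughly like this (a simplified sketch of the header layout, not the exact file contents):

```cpp
// ada_sqrt.hpp -- the declaration header.
#ifndef ENSMALLEN_ADA_SQRT_ADA_SQRT_HPP
#define ENSMALLEN_ADA_SQRT_ADA_SQRT_HPP

// ... AdaSqrt class declaration ...

// The implementation header is pulled in at the bottom of the declaration
// header, so nobody should ever need to include ada_sqrt_impl.hpp
// directly; users just do `#include <ensmallen.hpp>`.
#include "ada_sqrt_impl.hpp"

#endif
```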
Not at all; it's an artifact from the mlpack code base. I'll remove it and also adapt the existing optimizers in another PR.
Yeah, I originally started doing that years ago to be nice to users who strangely included an `_impl.hpp` only. That makes more sense in the context of mlpack, where you could choose multiple headers to include, but here in ensmallen it may not be needed, since users should only include the main ensmallen header.
Second approval provided automatically after 24 hours. 👍
Implementation of AdaSqrt, from "Second-order Information in First-order Optimization Methods" by Yuzheng Hu, Licong Lin, and Shange Tang.
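For anyone wanting to try the new optimizer, here is a hedged usage sketch in the usual ensmallen style. The constructor signature is assumed to mirror AdaGrad's (step size, batch size, epsilon, max iterations, tolerance, shuffle), and the separable objective below is written just for this example:

```cpp
#include <ensmallen.hpp>

// A tiny differentiable separable objective (sum of squared errors of a
// linear model), written only for this example, in the interface that
// ensmallen's SGD-style optimizers expect.
class LinearLeastSquares
{
 public:
  LinearLeastSquares(const arma::mat& data, const arma::rowvec& responses) :
      data(data), responses(responses) { }

  // Number of separable terms (one per data point).
  size_t NumFunctions() const { return data.n_cols; }

  // No-op for brevity; a real objective would permute its points here.
  void Shuffle() { }

  double Evaluate(const arma::mat& w, const size_t begin,
                  const size_t batchSize)
  {
    double loss = 0.0;
    for (size_t i = begin; i < begin + batchSize; ++i)
    {
      const double r = arma::dot(w, data.col(i)) - responses(i);
      loss += r * r;
    }
    return loss;
  }

  void Gradient(const arma::mat& w, const size_t begin, arma::mat& g,
                const size_t batchSize)
  {
    g.zeros(w.n_rows, w.n_cols);
    for (size_t i = begin; i < begin + batchSize; ++i)
      g += 2.0 * (arma::dot(w, data.col(i)) - responses(i)) * data.col(i);
  }

 private:
  const arma::mat& data;
  const arma::rowvec& responses;
};

int main()
{
  arma::mat data(5, 100, arma::fill::randn);
  arma::rowvec responses = arma::randn<arma::rowvec>(100);

  LinearLeastSquares f(data, responses);
  arma::mat w(5, 1, arma::fill::zeros);

  // Assumed constructor arguments, mirroring AdaGrad's: step size, batch
  // size, epsilon, max iterations, tolerance, shuffle.
  ens::AdaSqrt optimizer(0.01, 32, 1e-8, 100000, 1e-9, true);
  optimizer.Optimize(f, w);

  w.print("learned weights:");
}
```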