
AdaSqrt - Second-order Information in First-order Optimization Methods #234

Merged: 16 commits into mlpack:master on Mar 1, 2022

Conversation

zoq (Member) commented Oct 25, 2020

Implementation of AdaSqrt - "Second-order Information in First-order Optimization Methods", Yuzheng Hu, Licong Lin, Shange Tang.
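For reference, the core idea of AdaSqrt is to scale the raw gradient by sqrt(t) and divide by the running sum of element-wise squared gradients (without taking its square root, unlike AdaGrad). Below is a minimal Armadillo-style sketch of a single update step under that reading of the paper; the function name and signature are illustrative only, not the PR's actual API:

#include <armadillo>
#include <cmath>

// One AdaSqrt-style step (illustrative sketch, not the PR code):
// accumulate element-wise squared gradients, then take a step scaled by
// sqrt(t) / (accumulated squared gradients + epsilon).
inline void AdaSqrtStep(arma::mat& iterate,
                        const arma::mat& gradient,
                        arma::mat& sumSquaredGradients,
                        const double stepSize,
                        const double epsilon,
                        const size_t t)
{
  sumSquaredGradients += arma::square(gradient);
  iterate -= stepSize * std::sqrt((double) t) * gradient
      / (sumSquaredGradients + epsilon);
}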

mlpack-bot (bot) commented Nov 24, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

zoq (Member, Author) commented Feb 27, 2021

@mlpack-jenkins test this please

zoq (Member, Author) commented Feb 27, 2021

Let's fix the static analysis build.

zoq (Member, Author) commented Feb 27, 2021

@mlpack-jenkins test this please

zoq (Member, Author) commented Feb 27, 2021

Alright, it's working again.

rcurtin (Member) left a comment

This paper was a bit harder to read (I guess I read a preprint). I'm not sure I followed the idea behind the approach well, but the PR looks like it exactly implements the algorithm of the paper. I only have a couple of minor comments. Sorry this review took so long; it took forever to find some time to read the paper... :)

* [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)
* [AdaGrad in Wikipedia](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad)
* [AdaDelta](#adadelta)
* [Differentiable separable functions](#differentiable-separable-functions)
Member:
Nice catch, thanks for adding this 'see also' section.

// Instantiated parent class.
AdaSqrtUpdate& parent;
// The squared gradient matrix.
GradType squaredGradient;
Member:
Up to you, but it may be clearer to name this sumSquaredGradients. I got a little confused and had to double-check the paper to understand.

WORKING_DIRECTORY ${CMAKE_BINARY_DIR})
Member:
🎉

@@ -1,8 +1,6 @@
/**
* @file ada_grad_test.cpp
* @author Abhinav Moudgil
* @file ada_sqrt_test.cpp
Member:
Oops, did you mean to commit these changes to ada_grad_test.cpp or just to the new test file?

@@ -33,4 +33,4 @@ TEST_CASE("AdaGradLogisticRegressionTestFMat", "[AdaGradTest]")
{
AdaGrad adagrad(0.99, 32, 1e-8, 5000000, 1e-9, true);
LogisticRegressionFunctionTest<arma::fmat>(adagrad, 0.003, 0.006);
}
}
Member:
Maybe we should revert this? I don't see a reason to remove the trailing newline :)

zoq (Member, Author):

Maybe I missed something but I think we don't use an extra trailing newline for other test files.

zoq (Member, Author):
Nevermind, I see what you mean, reverted with the last commit.

tests/ada_sqrt_test.cpp (outdated; resolved)
Co-authored-by: Conrad Sanderson <conradsnicta@users.noreply.github.com>
include/ensmallen_bits/ada_sqrt/ada_sqrt.hpp (resolved)
#define ENSMALLEN_ADA_SQRT_ADA_SQRT_IMPL_HPP

// In case it hasn't been included yet.
#include "ada_sqrt.hpp"
Contributor:
Is this #include necessary?

ada_sqrt_impl.hpp gets included from ada_sqrt.hpp, which is in turn included from ensmallen.hpp.

If I recall correctly, at the outset we decided not to support usage of only specific optimisers, as it's too messy (maintenance nightmare) and potentially buggy. Only usage via #include <ensmallen.hpp> is supported.
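For context, a minimal sketch of the header layout that makes the back-include unnecessary (abridged and illustrative; the guard name for the declaration header is assumed by analogy with the impl guard shown above): the declaration header pulls in the implementation at its end, and ensmallen.hpp is the only header users include directly.

// include/ensmallen_bits/ada_sqrt/ada_sqrt.hpp (abridged sketch)
#ifndef ENSMALLEN_ADA_SQRT_ADA_SQRT_HPP
#define ENSMALLEN_ADA_SQRT_ADA_SQRT_HPP

// ... AdaSqrt / AdaSqrtUpdate declarations ...

// Include the implementation at the end of the declaration header.
#include "ada_sqrt_impl.hpp"

#endif

// ada_sqrt_impl.hpp then only needs its own include guard; it never has to
// include ada_sqrt.hpp back, because it is always reached through
// ada_sqrt.hpp, which ensmallen.hpp includes.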

zoq (Member, Author):

Not at all; it's an artifact from the mlpack code base. I'll remove it and also adapt the existing optimizers in another PR.

Member:

Yeah, I originally started doing that years ago to be nice to users who strangely included an _impl.hpp only. That makes more sense in the context of mlpack where you could choose multiple headers to include, but here in ensmallen it may not be needed, since users should only include the main ensmallen header.

mlpack-bot (bot) left a comment

Second approval provided automatically after 24 hours. 👍

zoq merged commit 69abe29 into mlpack:master on Mar 1, 2022