Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DemonSGD & DemonAdam #211

Merged
merged 27 commits into from
Apr 6, 2022
Merged

DemonSGD & DemonAdam #211

merged 27 commits into from
Apr 6, 2022

Conversation

zoq
Copy link
Member

@zoq zoq commented Aug 2, 2020

DemonSGD & DemonAdam - https://arxiv.org/abs/1910.04952

Copy link
Member Author

@zoq zoq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange I tested the tests in a loop with different random seeds without any issue.

@zoq
Copy link
Member Author

zoq commented Aug 8, 2020

Strange I tested the tests in a loop with different random seeds without any issue.

Okay so the issue is that with the cast here: a57ff2d the decay was always set to zero.

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

6 similar comments
@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@zoq
Copy link
Member Author

zoq commented Aug 9, 2020

@mlpack-jenkins test this please

@mlpack-bot
Copy link

mlpack-bot bot commented Sep 28, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Sep 28, 2020
@rcurtin
Copy link
Member

rcurtin commented Sep 28, 2020

Let's keep this open---I have been meaning to review it but haven't quite gotten to it yet. 👍

@mlpack-bot mlpack-bot bot removed the s: stale label Sep 28, 2020
@mlpack-bot
Copy link

mlpack-bot bot commented Oct 28, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good! Only one real comment about the default value for T. It's surprising both that the technique seems to yield such nice results, and also that the authors did such comprehensive comparisons... most of these papers they seem to only compare with plain SGD on logistic regression on MNIST...

@@ -202,7 +202,7 @@ with _`UpdateRule`_` = AdamUpdate`.
| `double` | **`beta1`** | Exponential decay rate for the first moment estimates. | `0.9` |
| `double` | **`beta2`** | Exponential decay rate for the weighted infinity norm estimates. | `0.999` |
| `double` | **`eps`** | Value used to initialize the mean squared gradient parameter. | `1e-8` |
| `size_t` | **`max_iterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
| `size_t` | **`maxIterations`** | Maximum number of iterations allowed (0 means no limit). | `100000` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! 👍

doc/optimizers.md Outdated Show resolved Hide resolved
doc/optimizers.md Outdated Show resolved Hide resolved

using DemonNadaMax = DemonAdamType<NadaMaxUpdate>;

using DemonOptimisticAdam = DemonAdamType<OptimisticAdamUpdate>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth at least mentioning these typedefs in the DemonAdam and DemonSGD documentation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, good idea.

assert(momentumIterations != 0 && "The number of iterations before the "
"momentum will decay is zero, make sure the max iterations and "
"batch size parameter is set correctly. "
"Default: momentumIterations = maxIterations * batchSize.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be maxIterations / batchSize as the default? Otherwise the number of decay steps for momentum won't match the number of steps taken during optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

include/ensmallen_bits/demon_sgd/demon_sgd_update.hpp Outdated Show resolved Hide resolved
#define ENSMALLEN_DEMON_ADAM_DEMON_ADAM_HPP

#include <ensmallen_bits/sgd/sgd.hpp>
#include "demon_adam_update.hpp"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the includes today.

Copy link
Contributor

@conradsnicta conradsnicta Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to sort out which format to use (<xyz> vs "xyz").

Use of <...> caused problems within the context of RcppArmadillo (see for example RcppCore/RcppArmadillo#315, and related issues). The tldr is that for people that had two versions of Armadillo (one via RcppArmadillo, and one via plain Armadillo), there were clashes between the headers from the two versions. We settled on "xyz" to avoid this.

Within the context of ensmallen, rather than doing something like #include <ensmallen_bits/sgd/sgd.hpp>, perhaps we can use .. to specify one level back in the directory structure, ie. #include "../sgd/sgd.hpp". Alternatively, rather than having subdirectories, we have a flat layout in ensmallen_bits, ie. avoid use of further subdirectories. While the latter option may seem more messy, it has the advantage of all files being visible at the same time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only opinion on the format is that it's consistent. :) It sounds like "" is the right way to go here. How should we document this to ensure that it doesn't happen again? Style guide? Build check that greps for #include <ensmallen_bits/?

Copy link
Member Author

@zoq zoq Feb 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That also means, I should revert https://github.com/mlpack/ensmallen/pull/234/files/7776391f212b6c20a80c5a7fdb58785625736451#r810741863 and use "xyz" instead and adapt <ensmallen_bits/sgd/sgd.hpp> accordingly.

I'd like to keep the sub-directory structure, I think it's cleaner, and I don't mind to go one level back and forth. On the GitLab Armadillo repository page, the browser search function is unusable for me; the interface only lists the files in the viewport, so if I search for a file, I have to scroll down to the end, then I can search for a specific file.

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good to me here. 👍

It seems like the sanity checks are incorrect or irrelevant---maybe we should filter out those messages?

#ifndef ENSMALLEN_DEMON_ADAM_DEMON_ADAM_UPDATE_HPP
#define ENSMALLEN_DEMON_ADAM_DEMON_ADAM_UPDATE_HPP

#include <assert.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this is already included by other headers? (Also I'm not sure it's needed. Correct me if I'm wrong.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@mlpack-bot mlpack-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second approval provided automatically after 24 hours. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants