radam var_thresh hyperparam #303

Closed · wants to merge 1 commit

Conversation

@lgvaz commented Nov 20, 2019

As we can see in the original RAdam implementation (linked here), it may be useful to modify the RAdam variance threshold depending on your problem.

I simply removed the previously hardcoded value (4) and exposed it as a parameter called var_thresh.
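
For context, a minimal sketch of what such a change could look like, assuming a simplified scalar RAdam step (radam_step and its signature are illustrative, not the repository's actual code; only var_thresh and the previously hardcoded 4 come from this PR):

```python
import math

def radam_step(p, grad, exp_avg, exp_avg_sq, step, lr=1e-3,
               beta1=0.9, beta2=0.99, eps=1e-8, var_thresh=4):
    # Update the biased first and second moment estimates.
    exp_avg = beta1 * exp_avg + (1 - beta1) * grad
    exp_avg_sq = beta2 * exp_avg_sq + (1 - beta2) * grad ** 2

    # Approximated SMA length (rho_t in the RAdam paper).
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * step * beta2 ** step / (1 - beta2 ** step)

    bias_corr1 = 1 - beta1 ** step
    if rho_t > var_thresh:  # previously hardcoded as rho_t > 4
        # Variance is tractable: apply the rectified adaptive update.
        # Note the 4s inside the rectification term come from the paper's
        # variance approximation and stay fixed even if var_thresh changes.
        rect = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                         / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        bias_corr2 = 1 - beta2 ** step
        denom = math.sqrt(exp_avg_sq / bias_corr2) + eps
        p -= lr * rect * (exp_avg / bias_corr1) / denom
    else:
        # Un-rectified fallback: plain momentum update, no adaptive term.
        p -= lr * exp_avg / bias_corr1
    return p, exp_avg, exp_avg_sq
```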

@review-notebook-app

Check out this pull request on ReviewNB, where you'll be able to see the Jupyter notebook diff and discuss changes.

@sgugger (Contributor) commented Nov 20, 2019

This hyper-parameter doesn't really make sense, which is why I removed it. The 4 is there because the quantities used afterward aren't defined for r <= 4; past that point, raising the threshold just ignores steps with a very low lr (since v is very close to 0), which 1. can't hurt, and 2. is the whole point of using RAdam: having a warmup.

The effective learning rate is plotted in the notebook just after the RAdam optimizer, where we can see this only impacts a few iterations at the very beginning of training. Going from 4 to 5, for instance, at a beta2 of 0.99, changes the behavior of the optimizer for exactly one iteration, so I have trouble believing this impacts training in any way.
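
To make that one-iteration claim concrete, here is a quick check (an illustrative script, not part of the PR) of rho_t, the approximated SMA length from the RAdam paper, at beta2 = 0.99:

```python
beta2 = 0.99
rho_inf = 2 / (1 - beta2) - 1  # = 199 for beta2 = 0.99

# rho_t from the RAdam paper; rectification applies once rho_t exceeds
# the threshold (the hardcoded 4, or var_thresh in this PR).
for t in range(1, 9):
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
    print(f"t={t}: rho_t={rho_t:.2f}")

# Prints roughly: 1.00, 1.99, 2.99, 3.98, 4.96, 5.94, 6.92, 7.90.
# A threshold of 4 first triggers rectification at t=5 (rho_5 ~ 4.96);
# a threshold of 5 first triggers it at t=6 (rho_6 ~ 5.94): exactly one
# iteration of difference, as noted above.
```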

Closing this PR; please reopen with empirical evidence that it can be useful, as it doesn't make sense theoretically (and isn't mentioned anywhere in the paper).

@sgugger closed this Nov 20, 2019