-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inverse Square Root LR Schedule #657
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review in full as well, but is there any reason this should be in LLM foundry? It seems generically useful and I would probably prefer it is in composer directly.
This implementation and hyperparameters are a little different from how people typically do it. I wanted to first get it into LLM foundry, run some more experiments and have other people use it so we can work out some kinks and learn about some best practices, and then eventually upstream it into composer with the best practices documented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some unit tests testing this schedule produces what you expect?
Do you have any suggestions for reasonable unit tests for the LR scheduler? |
I'd suggest:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some unit tests to test functionality
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Added unit tests and tried to address all the comments. Failing the Code Quality Checks, but I'm not sure why from the Error message. |
@mansheej try running |
…nto inv-sqrt-lr-sched
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM, left a few comments on simplifying the tests.
Addressed most of the comments. The rest can perhaps be filed for improvement/addressed when we upstream to Composer? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with the level of test coverage here. Before merging, could you please make the PR description more complete?
In this case, I think an example schedule produced by this code (e.g. a wandb graph) and a one sentence description of the gist of the schedule would suffice. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small changes, thanks for the tests!
Adds the Inverse Square Root LR Scheduler.
This scheduler is meant to easily enable continual learning. It consists of three components:
The image below show two examples of the LR schedule with a 10 step warmup, with either no cooldown (orange) or a 20 step cooldown (blue) run for 100 steps total.