Inverse Square Root LR Schedule #657

Merged
merged 28 commits into from
Oct 11, 2023

Conversation

Contributor

@mansheej mansheej commented Oct 9, 2023

Adds the Inverse Square Root LR Scheduler.

This scheduler is meant to easily enable continual learning. It consists of three components:

  1. A linear LR warmup.
  2. A phase in which the LR decays as the inverse square root of the number of steps, approaching a constant value in the limit of infinite steps.
  3. An optional linear cooldown.
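
As a rough illustration of the three components above, here is a minimal sketch of such a schedule as a plain function returning a multiplier on the base LR. It is not the code merged in this PR: the function name, the parameter names (`t_warmup`, `t_scale`, `t_max`, `t_cooldown`, `alpha_f_decay`, `alpha_f_cooldown`), and the exact form of the decay are assumptions made for the example.

```python
def inv_sqrt_lr_multiplier(step, t_warmup, t_scale, t_max,
                           t_cooldown=0, alpha_f_decay=0.0,
                           alpha_f_cooldown=0.0):
    """Return a multiplier on the base LR at `step` (illustrative only).

    All names here are hypothetical; t_scale must be positive.
    """
    # 1. Linear warmup from 0 to 1 over the first t_warmup steps.
    if step < t_warmup:
        return step / t_warmup

    def decay(s):
        # 2. Inverse-square-root decay; tends to alpha_f_decay as s grows.
        coeff = (t_scale / (t_scale + s - t_warmup)) ** 0.5
        return alpha_f_decay + (1.0 - alpha_f_decay) * coeff

    cooldown_start = t_max - t_cooldown
    if t_cooldown == 0 or step < cooldown_start:
        return decay(step)

    # 3. Optional linear cooldown from the decayed value to alpha_f_cooldown.
    start = decay(cooldown_start)
    frac = min((step - cooldown_start) / t_cooldown, 1.0)
    return start + frac * (alpha_f_cooldown - start)


# Two schedules in the spirit of the description: 10-step warmup, 100 steps
# total, with and without a 20-step cooldown (t_scale=10 is an assumed value).
no_cooldown = [inv_sqrt_lr_multiplier(s, 10, 10, 100) for s in range(101)]
with_cooldown = [inv_sqrt_lr_multiplier(s, 10, 10, 100, t_cooldown=20)
                 for s in range(101)]
```

Under these assumptions the two lists agree until the cooldown begins at step 80, after which the second falls linearly to `alpha_f_cooldown`.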

The image below shows two examples of the LR schedule, each run for 100 steps total with a 10-step warmup: one with no cooldown (orange) and one with a 20-step cooldown (blue).

[Image: W&B chart, 10/11/2023, 12:01:34 AM]

Collaborator

@dakinggg dakinggg left a comment

Will review in full as well, but is there any reason this should be in LLM foundry? It seems generically useful and I would probably prefer it is in composer directly.

Contributor Author

mansheej commented Oct 9, 2023

> Will review in full as well, but is there any reason this should be in LLM foundry? It seems generically useful and I would probably prefer it is in composer directly.

This implementation and its hyperparameters differ a little from how people typically do it. I wanted to get it into LLM Foundry first, run more experiments, and have other people use it so we can work out the kinks and learn some best practices, and then eventually upstream it into Composer with those best practices documented.

Collaborator

@dakinggg dakinggg left a comment

Could you please add some unit tests testing this schedule produces what you expect?

Contributor Author

mansheej commented Oct 9, 2023

> Could you please add some unit tests testing this schedule produces what you expect?

Do you have any suggestions for reasonable unit tests for the LR scheduler?

Collaborator

dakinggg commented Oct 9, 2023

I'd suggest:

  1. Test that the "build" function can create it successfully.
  2. Set up a couple of simple schedules with known expected values and test that your scheduler code produces exactly those schedules.
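
A known-value test of the second kind might look like the sketch below. It is not one of the tests added in this PR: `inv_sqrt_decay` is a hypothetical stand-in for the decay phase only (no warmup or cooldown), chosen so that the expected values can be worked out by hand.

```python
import math


def inv_sqrt_decay(step, t_scale):
    """Hypothetical stand-in for the decay phase of the schedule."""
    return (t_scale / (t_scale + step)) ** 0.5


def test_inv_sqrt_decay_known_values():
    # With t_scale=1, steps 0, 3 and 8 give sqrt(1/1), sqrt(1/4), sqrt(1/9).
    expected = {0: 1.0, 3: 0.5, 8: 1.0 / 3.0}
    for step, want in expected.items():
        got = inv_sqrt_decay(step, t_scale=1)
        assert math.isclose(got, want, rel_tol=1e-9)
```

Choosing steps where the argument of the square root is a perfect square keeps the expected values easy to verify by inspection.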

Contributor

@b-chu b-chu left a comment

Add some unit tests to test functionality

mansheej and others added 5 commits October 10, 2023 00:12
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Contributor Author

Added unit tests and tried to address all the comments. The Code Quality Checks are failing, but I can't tell why from the error message.

Collaborator

@mansheej try running `pre-commit run --all-files` locally

@codestar12 codestar12 enabled auto-merge (squash) October 10, 2023 15:35
@codestar12 codestar12 requested a review from b-chu October 10, 2023 15:35
codestar12 and others added 4 commits October 10, 2023 15:13
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Co-authored-by: Brian <23239305+b-chu@users.noreply.github.com>
Collaborator

@dakinggg dakinggg left a comment

Basically LGTM, left a few comments on simplifying the tests.

Contributor Author

Addressed most of the comments. The rest can perhaps be filed for improvement/addressed when we upstream to Composer?

Collaborator

@dakinggg dakinggg left a comment

I am ok with the level of test coverage here. Before merging, could you please make the PR description more complete?

In this case, I think an example schedule produced by this code (e.g. a wandb graph) and a one sentence description of the gist of the schedule would suffice. Thanks!

Contributor

@b-chu b-chu left a comment

Two small changes, thanks for the tests!

@codestar12 codestar12 merged commit 6c98276 into mosaicml:main Oct 11, 2023
11 checks passed