Merge pull request #248 from kozistr/feature/wsd-lr-scheduler
[Feature] Implement WSD lr scheduler
kozistr committed Jun 29, 2024
2 parents 7c40a79 + a0b3cb0 commit 908e82e
Showing 19 changed files with 332 additions and 143 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -10,7 +10,7 @@

**pytorch-optimizer** is a collection of optimizers and lr schedulers for PyTorch.
I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
Currently, **69 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!

Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -177,11 +177,12 @@ from pytorch_optimizer import get_supported_lr_schedulers
supported_lr_schedulers = get_supported_lr_schedulers()
```

| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|----------------------------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| WSD | *Warmup-Stable-Decay learning rate scheduler* | [github](https://github.com/OpenBMB/MiniCPM) | <https://arxiv.org/abs/2404.06395> | [cite](https://github.com/OpenBMB/MiniCPM?tab=readme-ov-file#%E5%B7%A5%E4%BD%9C%E5%BC%95%E7%94%A8) |
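
For illustration, here is a minimal sketch of the warmup-stable-decay shape using PyTorch's built-in `LambdaLR`. The step budgets and the cosine-style decay below are illustrative assumptions, not this library's exact implementation:

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR

# Illustrative step budget: 100 warmup, 800 stable, 100 decay steps.
WARMUP, STABLE, DECAY = 100, 800, 100


def wsd_lambda(step: int) -> float:
    """LR multiplier for a warmup-stable-decay schedule."""
    if step < WARMUP:  # linear warmup: 0 -> base LR
        return step / max(1, WARMUP)
    if step < WARMUP + STABLE:  # stable phase: hold the base LR
        return 1.0
    # decay phase: cosine-style decay from the base LR towards zero
    progress = (step - WARMUP - STABLE) / max(1, DECAY)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = LambdaLR(optimizer, lr_lambda=wsd_lambda)

for _ in range(WARMUP + STABLE + DECAY):
    optimizer.step()  # optimizer step first, then scheduler step, per PyTorch convention
    scheduler.step()
```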

## Supported Loss Function

15 changes: 15 additions & 0 deletions docs/changelogs/v3.0.2.md
@@ -0,0 +1,15 @@
## Change Log

### Feature

* Implement `WSD` LR Scheduler. (#247, #248) (usage sketch after this list)
* [Warmup-Stable-Decay LR Scheduler](https://arxiv.org/abs/2404.06395)
* Add more PyTorch built-in lr schedulers. (#248)
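
A minimal usage sketch of the new scheduler. The keyword names for the warmup/stable/decay step budgets are assumptions; check the generated API docs for the exact signature:

```python
import torch
from pytorch_optimizer import get_wsd_schedule

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# The keyword names below are assumptions about the signature; the actual
# parameter names may differ, so consult the docstring.
scheduler = get_wsd_schedule(
    optimizer,
    num_warmup_steps=100,
    num_stable_steps=800,
    num_decay_steps=100,
)

for _ in range(1000):
    optimizer.step()
    scheduler.step()
```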

### Refactor

* Refactor `Chebyshev` lr scheduler modules. (#248) (see the sketch after this list)
* Rename `get_chebyshev_lr` to `get_chebyshev_lr_lambda`.
* Rename `get_chebyshev_schedule` to `get_chebyshev_perm_steps`.
* Call the `get_chebyshev_schedule` function to get a `LambdaLR` scheduler object.
* Refactor with `ScheduleType`. (#248)
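
A hedged sketch of the refactored Chebyshev entry point, assuming `get_chebyshev_schedule` now takes an optimizer plus an epoch count and returns a `LambdaLR` object (both are assumptions; consult the docstring for the real signature):

```python
import torch
from pytorch_optimizer import get_chebyshev_schedule

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed call shape after the refactor: the function is expected to wrap the
# Chebyshev permutation steps in a LambdaLR scheduler. The `num_epochs`
# argument name is an assumption and may differ from the actual API.
scheduler = get_chebyshev_schedule(optimizer, num_epochs=100)

for _ in range(100):
    optimizer.step()
    scheduler.step()
```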
13 changes: 7 additions & 6 deletions docs/index.md
@@ -10,7 +10,7 @@

**pytorch-optimizer** is a collection of optimizers and lr schedulers for PyTorch.
I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
Currently, **69 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!

Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -177,11 +177,12 @@ from pytorch_optimizer import get_supported_lr_schedulers
supported_lr_schedulers = get_supported_lr_schedulers()
```

| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|----------------------------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| WSD | *Warmup-Stable-Decay learning rate scheduler* | [github](https://github.com/OpenBMB/MiniCPM) | <https://arxiv.org/abs/2404.06395> | [cite](https://github.com/OpenBMB/MiniCPM?tab=readme-ov-file#%E5%B7%A5%E4%BD%9C%E5%BC%95%E7%94%A8) |
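
A quick way to confirm the new scheduler is registered with the loader utilities. Treating `'wsd'` as the registry key is an assumption; inspect the returned names to see the actual entry:

```python
from pytorch_optimizer import get_supported_lr_schedulers

supported = get_supported_lr_schedulers()
print(supported)  # inspect the registered scheduler names/classes

# Treating 'wsd' as the registry key is an assumption; adjust after inspecting the list.
assert any('wsd' in str(entry).lower() for entry in supported)
```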

## Supported Loss Function

8 changes: 7 additions & 1 deletion docs/lr_scheduler.md
@@ -2,9 +2,15 @@

::: pytorch_optimizer.deberta_v3_large_lr_scheduler
:docstring:
:members:

::: pytorch_optimizer.get_chebyshev_lr
::: pytorch_optimizer.get_chebyshev_schedule
:docstring:
:members:

::: pytorch_optimizer.get_wsd_schedule
:docstring:
:members:

::: pytorch_optimizer.CosineAnnealingWarmupRestarts
:docstring:
(diffs for the remaining changed files not shown)
