Merge pull request #248 from kozistr/feature/wsd-lr-scheduler
[Feature] Implement WSD lr scheduler
kozistr committed Jun 29, 2024
2 parents 7c40a79 + a0b3cb0 commit 908e82e
Showing 19 changed files with 332 additions and 143 deletions.
13 changes: 7 additions & 6 deletions README.md
@@ -10,7 +10,7 @@

**pytorch-optimizer** is a collection of optimizers and lr schedulers for PyTorch.
I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
Currently, **69 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!

Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -177,11 +177,12 @@ from pytorch_optimizer import get_supported_lr_schedulers
supported_lr_schedulers = get_supported_lr_schedulers()
```

| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|----------------------------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| WSD | *Warmup-Stable-Decay learning rate scheduler* | [github](https://github.com/OpenBMB/MiniCPM) | <https://arxiv.org/abs/2404.06395> | [cite](https://github.com/OpenBMB/MiniCPM?tab=readme-ov-file#%E5%B7%A5%E4%BD%9C%E5%BC%95%E7%94%A8) |
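
For illustration, here is a minimal sketch of the warmup-stable-decay shape using PyTorch's built-in `LambdaLR`. The step budgets and the cosine-style decay below are illustrative assumptions, not this library's exact implementation:

```python
import math

import torch
from torch.optim.lr_scheduler import LambdaLR

# Illustrative step budget: 100 warmup, 800 stable, 100 decay steps.
WARMUP, STABLE, DECAY = 100, 800, 100


def wsd_lambda(step: int) -> float:
    """LR multiplier for a warmup-stable-decay schedule."""
    if step < WARMUP:  # linear warmup: 0 -> base LR
        return step / max(1, WARMUP)
    if step < WARMUP + STABLE:  # stable phase: hold the base LR
        return 1.0
    # decay phase: cosine-style decay from the base LR towards zero
    progress = (step - WARMUP - STABLE) / max(1, DECAY)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = LambdaLR(optimizer, lr_lambda=wsd_lambda)

for _ in range(WARMUP + STABLE + DECAY):
    optimizer.step()  # optimizer step first, then scheduler step, per PyTorch convention
    scheduler.step()
```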

## Supported Loss Function

15 changes: 15 additions & 0 deletions docs/changelogs/v3.0.2.md
@@ -0,0 +1,15 @@
## Change Log

### Feature

* Implement `WSD` LR Scheduler. (#247, #248) (usage sketch after this list)
* [Warmup-Stable-Decay LR Scheduler](https://arxiv.org/abs/2404.06395)
* Add more PyTorch built-in lr schedulers. (#248)
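
A minimal usage sketch of the new scheduler. The keyword names for the warmup/stable/decay step budgets are assumptions; check the generated API docs for the exact signature:

```python
import torch
from pytorch_optimizer import get_wsd_schedule

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# The keyword names below are assumptions about the signature; the actual
# parameter names may differ, so consult the docstring.
scheduler = get_wsd_schedule(
    optimizer,
    num_warmup_steps=100,
    num_stable_steps=800,
    num_decay_steps=100,
)

for _ in range(1000):
    optimizer.step()
    scheduler.step()
```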

### Refactor

* Refactor `Chebyshev` lr scheduler modules. (#248) (see the sketch after this list)
* Rename `get_chebyshev_lr` to `get_chebyshev_lr_lambda`.
* Rename `get_chebyshev_schedule` to `get_chebyshev_perm_steps`.
* Call the `get_chebyshev_schedule` function to get a `LambdaLR` scheduler object.
* Refactor with `ScheduleType`. (#248)
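
A hedged sketch of the refactored Chebyshev entry point, assuming `get_chebyshev_schedule` now takes an optimizer plus an epoch count and returns a `LambdaLR` object (both are assumptions; consult the docstring for the real signature):

```python
import torch
from pytorch_optimizer import get_chebyshev_schedule

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Assumed call shape after the refactor: the function is expected to wrap the
# Chebyshev permutation steps in a LambdaLR scheduler. The `num_epochs`
# argument name is an assumption and may differ from the actual API.
scheduler = get_chebyshev_schedule(optimizer, num_epochs=100)

for _ in range(100):
    optimizer.step()
    scheduler.step()
```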
13 changes: 7 additions & 6 deletions docs/index.md
@@ -10,7 +10,7 @@

**pytorch-optimizer** is a collection of optimizers and lr schedulers for PyTorch.
I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
Currently, **69 optimizers (+ `bitsandbytes`)**, **11 lr schedulers**, and **13 loss functions** are supported!
Currently, **69 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!

Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).

@@ -177,11 +177,12 @@ from pytorch_optimizer import get_supported_lr_schedulers
supported_lr_schedulers = get_supported_lr_schedulers()
```

| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| LR Scheduler | Description | Official Code | Paper | Citation |
|-----------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|------------------------------------|----------------------------------------------------------------------------------------------------|
| Explore-Exploit | *Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule* | | <https://arxiv.org/abs/2003.03977> | [cite](https://ui.adsabs.harvard.edu/abs/2020arXiv200303977I/exportcitation) |
| Chebyshev | *Acceleration via Fractal Learning Rate Schedules* | | <https://arxiv.org/abs/2103.01338> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210301338A/exportcitation) |
| REX | *Revisiting Budgeted Training with an Improved Schedule* | [github](https://github.com/Nerogar/OneTrainer/blob/2c6f34ea0838e5a86774a1cf75093d7e97c70f03/modules/util/lr_scheduler_util.py#L66) | <https://arxiv.org/abs/2107.04197> | [cite](https://ui.adsabs.harvard.edu/abs/2021arXiv210704197C/exportcitation) |
| WSD | *Warmup-Stable-Decay learning rate scheduler* | [github](https://github.com/OpenBMB/MiniCPM) | <https://arxiv.org/abs/2404.06395> | [cite](https://github.com/OpenBMB/MiniCPM?tab=readme-ov-file#%E5%B7%A5%E4%BD%9C%E5%BC%95%E7%94%A8) |
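
A quick way to confirm the new scheduler is registered with the loader utilities. Treating `'wsd'` as the registry key is an assumption; inspect the returned names to see the actual entry:

```python
from pytorch_optimizer import get_supported_lr_schedulers

supported = get_supported_lr_schedulers()
print(supported)  # inspect the registered scheduler names/classes

# Treating 'wsd' as the registry key is an assumption; adjust after inspecting the list.
assert any('wsd' in str(entry).lower() for entry in supported)
```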

## Supported Loss Function

8 changes: 7 additions & 1 deletion docs/lr_scheduler.md
@@ -2,9 +2,15 @@

::: pytorch_optimizer.deberta_v3_large_lr_scheduler
:docstring:
:members:

::: pytorch_optimizer.get_chebyshev_lr
::: pytorch_optimizer.get_chebyshev_schedule
:docstring:
:members:

::: pytorch_optimizer.get_wsd_schedule
:docstring:
:members:

::: pytorch_optimizer.CosineAnnealingWarmupRestarts
:docstring:
(diffs for the remaining changed files not shown)
