This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😍
Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?
Happy to give it a try if you think it's easily feasible :-)
Currently, AdamW is used automatically when you use Adam with weight decay. Since this is unclear, I will include a concrete bnb.optim.AdamW alias in the next release (a copy of the Adam class).
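The distinction matters because AdamW applies weight decay *decoupled* from the adaptive gradient statistics, whereas classic Adam folds it into the gradient as L2 regularization. A minimal pure-Python sketch of a single update step on a scalar parameter (not the bitsandbytes implementation, which runs 8-bit fused CUDA kernels) illustrates why the two produce different updates for the same decay strength:

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam/AdamW update on a single scalar parameter.

    decoupled=False: classic Adam; decay is added to the gradient (L2),
                     so it gets rescaled by the adaptive denominator.
    decoupled=True:  AdamW; decay is applied directly to the parameter,
                     independent of the gradient statistics.
    """
    if weight_decay and not decoupled:
        g = g + weight_decay * p
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if weight_decay and decoupled:
        p = p - lr * weight_decay * p
    return p, m, v

# Same gradient and decay strength, but the resulting parameters differ:
p_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1,
                         weight_decay=1e-2, decoupled=False)
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1,
                          weight_decay=1e-2, decoupled=True)
print(p_adam, p_adamw)
```

This mirrors the defaults upstream: `torch.optim.Adam` defaults to `weight_decay=0.0`, while `torch.optim.AdamW` defaults to `weight_decay=1e-2`, which is why getting the alias's default hyperparameters right matters.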
This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that the default behavior is right. As such, this was an important correction! Thank you, @patrickvonplaten, for making the suggestion!