This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

bnb.optim.AdamW #10

Closed
patrickvonplaten opened this issue Nov 3, 2021 · 2 comments
@patrickvonplaten

Hey @TimDettmers,

Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😍

Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?

Happy to give it a try if you think it's easily feasible :-)

@TimDettmers TimDettmers self-assigned this Nov 15, 2021
@TimDettmers TimDettmers added the enhancement New feature or request label Nov 15, 2021
@TimDettmers (Contributor)

Currently, AdamW behavior is used automatically when you use Adam with a nonzero weight decay. Since this is unclear, I will include a concrete AdamW alias (a copy of the Adam class) in the next release.
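To illustrate the distinction Tim is describing, here is a minimal plain-Python sketch of the textbook update rules: Adam with an L2 penalty folds the decay into the gradient (and thus into the moment estimates), while AdamW applies decoupled weight decay directly to the parameter. The helper name `adam_step` and the scalar setup are purely illustrative; this is not bitsandbytes' 8-bit implementation.

```python
# Sketch of Adam-with-L2 vs. decoupled AdamW on a single scalar parameter.
# Hypothetical helper, bias correction omitted for brevity.

def adam_step(p, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=False):
    """One Adam/AdamW step; returns updated (param, m, v)."""
    if not decoupled:
        grad = grad + weight_decay * p      # L2 penalty: decay enters the moments
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    p = p - lr * m / (v ** 0.5 + eps)
    if decoupled:
        p = p - lr * weight_decay * p       # AdamW: decay applied outside the moments
    return p, m, v

p_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, weight_decay=0.01, decoupled=False)
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, weight_decay=0.01, decoupled=True)
print(p_adam, p_adamw)  # the two rules yield slightly different parameters
```

With zero weight decay the two branches coincide, which is why a separate AdamW alias is mainly a matter of clarity and of matching expected defaults.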

TimDettmers added a commit that referenced this issue Nov 29, 2021
@TimDettmers (Contributor)

This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that it has the right default behavior. As such, this was an important correction. Thank you, @patrickvonplaten, for making the suggestion!
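The default that matters here is weight decay: the torch.optim classes these aliases mirror document identical defaults except for `weight_decay` (0 for Adam, 1e-2 for AdamW). A quick sketch of that comparison, using the values from the torch.optim documentation:

```python
# Documented defaults of torch.optim.Adam vs. torch.optim.AdamW.
ADAM_DEFAULTS  = {"lr": 1e-3, "betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.0}
ADAMW_DEFAULTS = {"lr": 1e-3, "betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 1e-2}

# The only default that differs is weight_decay -- the hyperparameter
# that had to be corrected for bnb.optim.AdamW to match torch's behavior.
diff = {k for k in ADAM_DEFAULTS if ADAM_DEFAULTS[k] != ADAMW_DEFAULTS[k]}
print(diff)  # {'weight_decay'}
```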
