This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Awesome library! bnb.optim.Adam saved me from having to use model parallelism 😍
Do you think it would be easy to also add a bnb.optim.AdamW version for https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html#torch.optim.AdamW ?
Happy to give it a try if you think it's easily feasible :-)
Currently, AdamW is used automatically when you use Adam with weight decay. Since this is unclear, I will include a concrete bnb.optim.AdamW alias in the next release (a copy of the Adam class).
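The distinction matters because AdamW applies weight decay *decoupled* from the adaptive gradient statistics, whereas classic Adam folds it into the gradient as L2 regularization. A minimal pure-Python sketch of a single update step on a scalar parameter (not the bitsandbytes implementation, which runs 8-bit fused CUDA kernels) illustrates why the two produce different updates for the same decay strength:

```python
import math

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam/AdamW update on a single scalar parameter.

    decoupled=False: classic Adam; decay is added to the gradient (L2),
                     so it gets rescaled by the adaptive denominator.
    decoupled=True:  AdamW; decay is applied directly to the parameter,
                     independent of the gradient statistics.
    """
    if weight_decay and not decoupled:
        g = g + weight_decay * p
    m = beta1 * m + (1 - beta1) * g            # first-moment estimate
    v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    if weight_decay and decoupled:
        p = p - lr * weight_decay * p
    return p, m, v

# Same gradient and decay strength, but the resulting parameters differ:
p_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1,
                         weight_decay=1e-2, decoupled=False)
p_adamw, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1,
                          weight_decay=1e-2, decoupled=True)
print(p_adam, p_adamw)
```

This mirrors the defaults upstream: `torch.optim.Adam` defaults to `weight_decay=0.0`, while `torch.optim.AdamW` defaults to `weight_decay=1e-2`, which is why getting the alias's default hyperparameters right matters.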
This has been added in the newest release. It was also important to get the default hyperparameters for AdamW correct so that the default behavior is right. As such, this was an important correction! Thank you, @patrickvonplaten, for making the suggestion!