This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

How to use it in huggingface AdamW optimizer? #13

Closed
ztl-35 opened this issue Nov 12, 2021 · 4 comments
Labels
question Further information is requested

Comments

@ztl-35

ztl-35 commented Nov 12, 2021

Hi, thanks for this work! I want to use it with the Hugging Face AdamW optimizer to train a pre-trained language model such as BERT. How can I do that? Thanks!

@TimDettmers TimDettmers added the question Further information is requested label Nov 15, 2021
@TimDettmers
Contributor

You should be able to just replace AdamW with bnb.optim.Adam (which uses the AdamW formulation if your weight decay is > 0.0). Can you try that and report back if there are any issues?
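
A minimal sketch of that swap, assuming PyTorch and bitsandbytes are installed and a CUDA GPU is available. The helper name `make_optimizer` and the hyperparameter values are illustrative assumptions, not taken from this thread. Hyperparameters are passed explicitly so that both optimizers run with the same settings rather than each class's defaults.

```python
# Illustrative hyperparameters (assumed values, not from this thread).
HPARAMS = {"lr": 2e-5, "betas": (0.9, 0.999), "eps": 1e-8, "weight_decay": 0.01}


def make_optimizer(params, use_8bit=True, **overrides):
    """Build either the bitsandbytes optimizer or 32-bit torch AdamW.

    `make_optimizer` is a hypothetical helper used only for this sketch.
    """
    kwargs = {**HPARAMS, **overrides}
    if use_8bit:
        # Imported lazily so the module loads without bitsandbytes installed.
        import bitsandbytes as bnb
        # optim_bits=8 requests 8-bit optimizer state.
        return bnb.optim.Adam(params, optim_bits=8, **kwargs)
    import torch
    return torch.optim.AdamW(params, **kwargs)
```

With this helper, `optimizer = make_optimizer(model.parameters())` would take the place of the original AdamW construction, and the rest of the training loop stays unchanged.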

@ztl-35
Author

ztl-35 commented Nov 17, 2021

Hi, I tried this method, replacing AdamW with bnb.optim.Adam. The task runs and its output looks normal, but the result is much lower than with the previous Hugging Face AdamW, and I didn't change any hyperparameters.

@TimDettmers
Contributor

Can you share the model that you are fine-tuning along with the dataset and results? If you are fine-tuning on GLUE, for example, there can be quite a bit of noise in the results, and the best comparison is done across 5-10 random seeds to get a clear picture of the variability between 32-bit and 8-bit AdamW.
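
One way to make that comparison concrete is to report mean and standard deviation per optimizer over the seeds. The sketch below uses only the Python standard library. The function name `summarize` and the run labels are assumptions for illustration, and the numbers are synthetic placeholders rather than measured results.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Synthetic placeholder values, NOT measured results.
    # Substitute your own per-seed evaluation scores (ideally 5-10 seeds each).
    runs = {
        "adamw_32bit": [1.0, 2.0, 3.0],
        "adam_8bit": [1.5, 2.0, 2.5],
    }
    for name, scores in runs.items():
        m, s = summarize(scores)
        print(f"{name}: {m:.2f} +/- {s:.2f} over {len(scores)} seeds")
```

If the gap between the two means is small relative to the per-optimizer spread, a single-run difference may just be seed noise.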

TimDettmers added a commit that referenced this issue Nov 29, 2021
@TimDettmers
Contributor

The problem in this example was that the default weight_decay argument differs between AdamW and Adam8bit. Even though Adam8bit implements the same algorithm, weight decay is turned off by default. The new AdamW8bit class should rectify this, since it has the correct default hyperparameter. This was fixed in 2f8083b.
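
To illustrate why the default matters, here is a plain-Python reference of the standard decoupled weight decay (AdamW) update for a single scalar parameter. It is a toy sketch, not the bitsandbytes implementation. The function names, the constant gradient, and the hyperparameter values are all assumptions chosen for the example.

```python
import math


def adamw_step(p, g, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """One AdamW update for scalar parameter p with gradient g at step t (1-based)."""
    b1, b2 = betas
    p = p * (1.0 - lr * weight_decay)  # decoupled decay; a no-op when weight_decay == 0
    m = b1 * m + (1.0 - b1) * g        # first-moment estimate
    v = b2 * v + (1.0 - b2) * g * g    # second-moment estimate
    m_hat = m / (1.0 - b1 ** t)        # bias correction
    v_hat = v / (1.0 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


def train(weight_decay, steps=100, grad=0.5):
    """Run `steps` updates from p = 1.0 with a constant toy gradient."""
    p, m, v = 1.0, 0.0, 0.0
    for t in range(1, steps + 1):
        p, m, v = adamw_step(p, grad, m, v, t, weight_decay=weight_decay)
    return p


plain = train(weight_decay=0.0)     # weight decay off
decayed = train(weight_decay=0.01)  # weight decay on
print(plain, decayed)  # the end points differ only because of weight decay
```

With identical gradients, the two runs end at different parameter values, which is why leaving weight decay at different defaults can change results even when nothing else is touched.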
