How to use it in huggingface AdamW optimizer? #13
You should be able to just replace the optimizer.
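A minimal sketch of the swap, assuming bitsandbytes is installed (`pip install bitsandbytes`). The tiny `Linear` model is a placeholder for illustration; in practice you would pass your BERT model's parameters. The fallback to `torch.optim.AdamW` is only there so the snippet also runs without bitsandbytes or a GPU:

```python
import torch

# Toy stand-in for a Hugging Face model such as BERT
model = torch.nn.Linear(10, 2)

try:
    import bitsandbytes as bnb
    # bitsandbytes 8-bit optimizers need a CUDA device
    assert torch.cuda.is_available()
    model = model.cuda()
    # Drop-in replacement for AdamW: same (params, lr, ...) interface
    optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)
except (ImportError, AssertionError):
    # Standard 32-bit optimizer if bitsandbytes/CUDA are unavailable
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One dummy training step to show the usual loop is unchanged
x = torch.randn(4, 10, device=next(model.parameters()).device)
loss = model(x).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The training loop itself does not change; only the optimizer construction does.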
Hi, I tried this method, replacing AdamW with bnb.optim.Adam. Although the task output looks normal, the result is much lower than with the previous Huggingface AdamW, and I did not change any hyper-parameters.
Can you share the model that you are fine-tuning along with the dataset and results? If you are fine-tuning on GLUE, for example, there can be quite some noise in the results, and the best comparison is done across 5-10 random seeds to get a clear picture of the variability between 32-bit and 8-bit AdamW.
The problem in this example was that the default
Hi, thanks for this work! I want to use it in place of the Huggingface AdamW optimizer to train a pre-trained language model such as BERT. How can I do that? Thanks!