[Accelerator] Fix issue with 8bit models #1155
Conversation
The documentation is not available anymore as the PR was closed or merged.
I think this PR is only necessary in case people want to design systems that use 8-bit models in their training loop without backpropagating on the 8-bit model (for example in RLHF), as using adapters already works out of the box right now.
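For illustration, a minimal sketch of the kind of setup described above; the model names, optimizer, and training details are assumptions made for the example, not taken from this PR:

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# Frozen 8-bit model that is only queried (e.g. a reference model in RLHF);
# no gradients ever flow through it.
ref_model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m", device_map={"": 0}, load_in_8bit=True
)
ref_model.eval()

# Regular (non-8-bit) trainable model that does receive gradients.
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125m")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Preparing the frozen 8-bit model alongside the trainable one is the kind
# of setup this PR is meant to support.
model, ref_model, optimizer = accelerator.prepare(model, ref_model, optimizer)

# Inside the training loop, the 8-bit model would only be used under no_grad:
# with torch.no_grad():
#     ref_logits = ref_model(input_ids).logits
```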
Thanks for the fix!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m", device_map=device_map, load_in_8bit=True, llm_int8_enable_fp32_cpu_offload=True
)
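For context, `device_map` is defined elsewhere in the test; a plausible value for this snippet (an assumption, not the map actually used in the PR) would offload part of the model to CPU, which is why `llm_int8_enable_fp32_cpu_offload=True` is passed:

```python
# Hypothetical device_map: keep the transformer blocks on GPU 0 in 8-bit
# and offload the lm_head to CPU, where it stays in fp32.
device_map = {
    "transformer": 0,
    "lm_head": "cpu",
}
```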
@sgugger I think this will fail as the Docker image is not using the main branch of transformers, no?
happy to skip it until the next release of transformers
Yes, we are not installing from source.
Thanks again!
What does this PR do?
Fixes #1147
In theory it is not possible to fine-tune 8-bit models, except if you use adapters, which can only be used if a `PeftModel` is used for training (I also need to test the snippet below with a `PeftModel` to make sure this is relevant or not). EDIT: passing an 8-bit `PeftModel` through `accelerator.prepare` seems to work fine.

But in some systems you can use the accelerator to load an 8-bit model and use it outside the training scope (e.g. get the model's logits and use them in another model).

I am not sure if we should support 8-bit models using `Accelerator`, but if so, I propose the following changes in this PR. Happy also to revert the tests / `bnb` dependency.

To reproduce:
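(The original reproduction snippet is not included in this excerpt; the following is a rough sketch of what it might look like, with all names and values my own assumptions.)

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

# 8-bit model that transformers has already dispatched onto devices.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-125m", device_map={"": 0}, load_in_8bit=True
)

# Passing the already-quantized, already-dispatched model through prepare is
# the step that the linked issue reports as failing.
model = accelerator.prepare(model)

# Use the model purely for inference, outside of any training step.
inputs = torch.tensor([[0, 1, 2, 3]]).to(accelerator.device)
with torch.no_grad():
    logits = model(inputs).logits
```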
cc @sgugger
Ran all the slow tests and got errors on the DeepSpeed and FSDP tests, but I am not sure whether those failures are related to my PR.