Issue #24540: Loading 4-bit and 8-bit language models: ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Comments
Hi @DJT777
I wasn't able to test it using that commit. However, running everything with the versioning from my June 8th run got the model loaded back up again. I am using this to run the notebook:

!pip install git+https://www.github.com/huggingface/transformers@2e2088f24b60d8817c74c32a0ac6bb1c5d39544d
Thanks @DJT777, I'll have an attempt at running things again with that.

Great, thanks!
I went for:

!pip install git+https://github.com/huggingface/transformers.git@6ce6d62b6f20040129ec9831e7c4f6576402ea42

Development is going so fast, it's hard to keep up with every change 😅
Hi @DJT777

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path = "tiiuae/falcon-40b-instruct"

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0]))

and transformers' main branch & the ...
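As a quick sanity check that the 4-bit load took effect (my addition, not part of the original comment; both calls exist on transformers models loaded with a device_map):

# get_memory_footprint() reports parameter memory in bytes, and hf_device_map
# shows where accelerate placed each module after from_pretrained.
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
print(model.hf_device_map)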
I'm not able to confirm if it is working in Colab.
I get the same error in Google Colab ("ValueError: `.to` is not supported for `4-bit` or `8-bit` models ...") with the following setup:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

Notebook settings/runtime type:
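The runtime details are not shown above. When comparing environments for this class of error, printing the exact installed versions is a quick first check (a small addition of mine, not part of the original comment):

# Version mismatches between these three packages were the root cause in this
# thread, so record the exact versions before filing a report.
import accelerate
import bitsandbytes
import transformers

print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("bitsandbytes:", bitsandbytes.__version__)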
Hi @Maaalik, the following setup (note that accelerate is installed from the fix-to-int8 branch):

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git@fix-to-int8
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

I just tested it on Google Colab.
Works like a charm! Thank you very much, @younesbelkada!
huggingface/accelerate#1652 being merged, you can now install accelerate from the main branch.
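That would be the standard source install (the same command @younesbelkada gives later in this thread):

pip install -U git+https://github.com/huggingface/accelerate.git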
@younesbelkada
@sgugger Sorry, another question here =) as above
I do not have the answer, no need to tag me.
Hi @Andcircle
Hi @younesbelkada, once I changed to 4.32.0.dev0, the error "ValueError: ..."
I load the Llama 2 7B model like this, then want to use the SFT trainer (see the sketch below). @younesbelkada
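The loading snippet itself is not shown above; as a rough sketch only, assuming the model id, dataset, LoRA settings, and the 2023-era TRL SFTTrainer API (newer TRL versions moved some of these arguments into SFTConfig), it might look like:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed; the original path isn't shown

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# No .to() call anywhere: device_map leaves placement to accelerate.
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map={"": 0}
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("imdb", split="train[:1%]")  # illustrative dataset
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=peft_config,
    tokenizer=tokenizer,
    max_seq_length=512,
    args=TrainingArguments(output_dir="sft-out", per_device_train_batch_size=1),
)
trainer.train()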
WARNING: Did not find branch or tag 'fix-to-int8', assuming revision or ref.

× git checkout -q fix-to-int8 did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.

× git checkout -q fix-to-int8 did not run successfully.
note: This error originates from a subprocess, and is likely not a problem with pip.
Hi @MrKsiJ, the fix-to-int8 branch has since been merged, so install accelerate from the main branch instead:

pip install -U git+https://github.com/huggingface/accelerate.git
The problem is solved, and we can move on. But now I have another question: how do I run PeftModel.from_pretrained locally, without internet access? If I disable the internet, PeftModel.from_pretrained still tries to reach the Hugging Face Hub for some reason, even though everything was downloaded on the first launch.
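A minimal sketch of the usual offline pattern (my assumptions: the paths are placeholders for locally saved directories, HF_HUB_OFFLINE is the standard Hub switch and must be set before any Hugging Face import, and local_files_only being forwarded by peft to the Hub download is an assumption, not confirmed in this thread):

import os

# Force the Hub client to use only the local cache; same effect as exporting
# HF_HUB_OFFLINE=1 in the shell before launching the process.
os.environ["HF_HUB_OFFLINE"] = "1"

from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "path/to/local/base-model",  # placeholder: a local directory, not a Hub id
    local_files_only=True,
)
model = PeftModel.from_pretrained(base, "path/to/local/adapter")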
Which versions of accelerate and transformers fix this issue? I am using transformers==4.36.2 and accelerate==0.26.1, and I am still getting this error @younesbelkada. The issue still exists if I use transformers==4.38.0 and accelerate==0.27.2. The stack trace is:
System Info
I'm running into an issue where I'm not able to load a 4-bit or 8-bit quantized version of Falcon or LLaMa models. This was working a couple of weeks ago; this is running on Colab. I'm wondering if anyone knows of a fix, or why this no longer works when it did 2-3 weeks ago, around June 8th.
transformers version: 4.31.0.dev0

Who can help?
@ArthurZucker @younesbelkada @sgugger
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Running on an A100 in Colab Pro.
Cell output:
Expected behavior
Model should be loaded and able to run inference.
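The failing cell output is not shown above, so for context, here is my own minimal illustration (not the original cell): the message in the title is transformers' guard against calling .to(device) or .to(dtype) on a quantized model. In this thread it was tripped internally by an accelerate/transformers version mismatch rather than by user code, but the guard itself can be reproduced directly:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",  # any causal LM; 4-bit as in the report
    load_in_4bit=True,
    device_map="auto",
)

# Raises the ValueError from the title: quantized models must be left on the
# devices and dtype that accelerate assigned at load time.
model.to("cuda")

The workaround throughout this thread is the same: never move or cast the quantized model; move only the input tensors.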