
Loading 4-bit and 8-bit language models: ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype. #24540

Closed
DJT777 opened this issue Jun 28, 2023 · 22 comments · Fixed by huggingface/accelerate#1652


DJT777 commented Jun 28, 2023

System Info

I'm running into an issue where I'm unable to load a 4-bit or 8-bit quantized version of the Falcon or LLaMA models on Colab. This was working 2-3 weeks ago, around June 8th. I'm wondering if anyone knows of a fix, or why it no longer works.

  • transformers version: 4.31.0.dev0
  • Platform: Linux-5.15.107+-x86_64-with-glibc2.31
  • Python version: 3.10.12
  • Huggingface_hub version: 0.15.1
  • Safetensors version: 0.3.1
  • PyTorch version (GPU?): 2.0.1+cu118 (True)
  • Tensorflow version (GPU?): 2.12.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.6.11 (gpu)
  • Jax version: 0.4.10
  • JaxLib version: 0.4.10

Who can help?

@ArthurZucker @younesbelkada @sgugger

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Running on an A100 in Colab Pro.



!pip install git+https://www.github.com/huggingface/transformers

!pip install git+https://github.com/huggingface/accelerate

!pip install bitsandbytes

!pip install einops

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path="tiiuae/falcon-40b-instruct"

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))

Cell output:

Collecting git+https://www.github.com/huggingface/transformers
  Cloning https://www.github.com/huggingface/transformers to /tmp/pip-req-build-6pyatvel
  Running command git clone --filter=blob:none --quiet https://www.github.com/huggingface/transformers /tmp/pip-req-build-6pyatvel
  warning: redirecting to https://github.com/huggingface/transformers.git/
  Resolved https://www.github.com/huggingface/transformers to commit e84bf1f734f87aa2bedc41b9b9933d00fc6add98
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (3.12.2)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.31.0.dev0)
  Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.8/236.8 kB 11.6 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2.27.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.31.0.dev0)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 114.2 MB/s eta 0:00:00
Collecting safetensors>=0.3.1 (from transformers==4.31.0.dev0)
  Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 79.9 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (4.65.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (2023.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (4.6.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (3.4)
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.31.0.dev0-py3-none-any.whl size=7228417 sha256=5867afa880111a40f7b630e51d9f1709ec1131236a31c2c7fb5f97179e3d1405
  Stored in directory: /tmp/pip-ephem-wheel-cache-t06u3u6x/wheels/c1/ac/11/e69d454307e735e14f4f95e575c8be27fd99835ec36f504c13
Successfully built transformers
Installing collected packages: tokenizers, safetensors, huggingface-hub, transformers
Successfully installed huggingface-hub-0.15.1 safetensors-0.3.1 tokenizers-0.13.3 transformers-4.31.0.dev0
Collecting git+https://github.com/huggingface/accelerate
  Cloning https://github.com/huggingface/accelerate to /tmp/pip-req-build-76ziff6x
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/accelerate /tmp/pip-req-build-76ziff6x
  Resolved https://github.com/huggingface/accelerate to commit d141b4ce794227450a105b7281611c7980e5b3d6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (23.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (6.0)
Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (2.0.1+cu118)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (4.6.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (16.0.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6.0->accelerate==0.21.0.dev0) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6.0->accelerate==0.21.0.dev0) (1.3.0)
Building wheels for collected packages: accelerate
  Building wheel for accelerate (pyproject.toml) ... done
  Created wheel for accelerate: filename=accelerate-0.21.0.dev0-py3-none-any.whl size=234648 sha256=71b98a6d4b1111cc9ca22265f6699cd552325e5f71c83daebe696afd957497ee
  Stored in directory: /tmp/pip-ephem-wheel-cache-atmtszgr/wheels/f6/c7/9d/1b8a5ca8353d9307733bc719107acb67acdc95063bba749f26
Successfully built accelerate
Installing collected packages: accelerate
Successfully installed accelerate-0.21.0.dev0
Collecting bitsandbytes
  Downloading bitsandbytes-0.39.1-py3-none-any.whl (97.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.1/97.1 MB 18.8 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.39.1
Collecting einops
  Downloading einops-0.6.1-py3-none-any.whl (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 3.8 MB/s eta 0:00:00
Installing collected packages: einops
Successfully installed einops-0.6.1
Downloading (…)lve/main/config.json: 100% 658/658 [00:00<00:00, 51.8kB/s]
Downloading (…)/configuration_RW.py: 100% 2.51k/2.51k [00:00<00:00, 227kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100% 47.1k/47.1k [00:00<00:00, 3.76MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100% 39.3k/39.3k [00:00<00:00, 3.46MB/s]
Downloading shards: 100% 9/9 [04:40<00:00, 29.33s/it]
Downloading (…)l-00001-of-00009.bin: 100% 9.50G/9.50G [00:37<00:00, 274MB/s]
Downloading (…)l-00002-of-00009.bin: 100% 9.51G/9.51G [00:33<00:00, 340MB/s]
Downloading (…)l-00003-of-00009.bin: 100% 9.51G/9.51G [00:28<00:00, 320MB/s]
Downloading (…)l-00004-of-00009.bin: 100% 9.51G/9.51G [00:33<00:00, 317MB/s]
Downloading (…)l-00005-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 210MB/s]
Downloading (…)l-00006-of-00009.bin: 100% 9.51G/9.51G [00:34<00:00, 180MB/s]
Downloading (…)l-00007-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 307MB/s]
Downloading (…)l-00008-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 504MB/s]
Downloading (…)l-00009-of-00009.bin: 100% 7.58G/7.58G [00:27<00:00, 315MB/s]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//172.28.0.1'), PosixPath('8013'), PosixPath('http')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-b20acq94qsrp --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
  warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
Loading checkpoint shards: 100% 9/9 [05:45<00:00, 35.83s/it]
Downloading (…)neration_config.json: 100% 111/111 [00:00<00:00, 10.3kB/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-1-c89997e10ae9>](https://localhost:8080/#) in <cell line: 15>()
     13 
     14 config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
---> 15 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
     16 
     17 tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

3 frames
[/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py](https://localhost:8080/#) in to(self, *args, **kwargs)
   1894         # Checks if the model has been loaded in 8-bit
   1895         if getattr(self, "is_quantized", False):
-> 1896             raise ValueError(
   1897                 "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
   1898                 " model has already been set to the correct devices and casted to the correct `dtype`."

ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

Expected behavior

Model should be loaded and able to run inference.
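
For reference, a minimal sketch of the same 4-bit load written with an explicit BitsAndBytesConfig, the form used later in this thread; the nf4/bfloat16 settings here are illustrative assumptions, not part of the original report:

# Hedged sketch: the same 4-bit load with an explicit BitsAndBytesConfig.
# Quantization settings below are illustrative, not from the original report.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_path = "tiiuae/falcon-40b-instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Only the inputs are moved with .to("cuda"); the quantized model itself must
# not be moved, which is exactly what the ValueError above complains about.
inputs = tokenizer("Describe the solar system.", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=100)
print(tokenizer.decode(outputs[0]))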

@younesbelkada (Contributor) commented:

Hi @DJT777,
Thanks for the report.
Are you using the main branch of accelerate with a single GPU? If that's the case, huggingface/accelerate#1652 should solve the issue. I will try to reproduce later without that fix.
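
(A quick way to check which builds the runtime actually picked up; illustrative snippet, not part of the original exchange.)

# Hedged sketch: confirm which transformers/accelerate builds are installed
# and how many GPUs the runtime exposes.
import accelerate
import torch
import transformers

print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("visible GPUs:", torch.cuda.device_count())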


DJT777 commented Jun 28, 2023

I wasn't able to test it with that commit. However, running everything with the versions from my June 8th run got the model loading again. I am using this to run the notebook:

!pip install git+https://www.github.com/huggingface/transformers@2e2088f24b60d8817c74c32a0ac6bb1c5d39544d
!pip install huggingface-hub==0.15.1
!pip install tokenizers==0.13.3
!pip install safetensors==0.3.1
!pip install git+https://github.com/huggingface/accelerate@040f178569fbfe7ab7113af709dc5a7fa09e95bd
!pip install bitsandbytes==0.39.0
!pip install einops==0.6.1

@younesbelkada (Contributor) commented:

Thanks @DJT777
Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8 ?


DJT777 commented Jun 28, 2023

@younesbelkada

I'll try running things again with that.

@younesbelkada (Contributor) commented:

Great thanks!

@buzzCraft commented:

I went for

!pip install git+https://github.com/huggingface/transformers.git@6ce6d62b6f20040129ec9831e7c4f6576402ea42
!pip install git+https://github.com/huggingface/accelerate.git@5791d949ff93733c102461ba89c8310745a3fa79
!pip install git+https://github.com/huggingface/peft.git@e2b8e3260d3eeb736edf21a2424e89fe3ecf429d
!pip install transformers[deepspeed]
I had to include transformers[deepspeed] yesterday, and earlier today I had to cherry-pick commits to make things work.

Development is going so fast, hard to keep up with every change 😅

@younesbelkada (Contributor) commented:

Hi @DJT777
I just ran the script below:

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path="tiiuae/falcon-40b-instruct"

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_length=10)
print(tokenizer.decode(outputs[0]))

with transformers' main branch and the fix-to-int8 branch of accelerate, and I can confirm the script worked fine. I am running on 2x NVIDIA T4 16GB.


DJT777 commented Jun 28, 2023

@younesbelkada

I'm not able to confirm if it is working in Colab.


Maaalik commented Jun 28, 2023

I get the same error in Google Colab ("ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype."). Things were working perfectly well yesterday. Copy-pasting this code into a Colab notebook cell and running it should reproduce the error:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

Notebook runtime settings:

  • Runtime type = Python 3
  • GPU = T4

@younesbelkada (Contributor) commented:

Hi @Maaalik,
I can confirm the accelerate PR mentioned above fixes your issue on Google Colab. Can you try the following on a new runtime / fresh environment:

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git@fix-to-int8
!pip install -q datasets
!pip install -q einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ybelkada/falcon-7b-sharded-bf16"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, trust_remote_code=True, device_map={"":0})

I just tested it on Google Colab.


Maaalik commented Jun 28, 2023

Works like a charm! Thank you very much, @younesbelkada!

@younesbelkada (Contributor) commented:

Now that huggingface/accelerate#1652 has been merged, you can install accelerate from source and it should work.

@Andcircle commented:

@younesbelkada
All the test cases above use device_map="auto", which also works for me.
BUT:
if I use device_map={'':torch.cuda.current_device()}, the error shows up again:

Traceback (most recent call last):
  File "train1.py", line 124, in <module>
    trainer = SFTTrainer(
  File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

@Andcircle commented:

@younesbelkada
Even with device_map="auto", if I only have 1 GPU, I still face the error:

Traceback (most recent call last):
  File "train1.py", line 124, in <module>
    trainer = SFTTrainer(
  File "/usr/local/lib/python3.8/dist-packages/trl/trainer/sft_trainer.py", line 212, in __init__
    super().__init__(
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 499, in __init__
    self._move_model_to_device(model, args.device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 741, in _move_model_to_device
    model = model.to(device)
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 1886, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`
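
Both tracebacks stem from the Trainer calling model.to(device) on a model that transformers has flagged as quantized. A hedged diagnostic sketch of the attributes behind that check (the model id is reused from earlier in this thread purely for illustration; this only inspects state, it is not a fix from this thread):

from transformers import AutoModelForCausalLM

# Illustrative 4-bit load; any bitsandbytes-quantized model behaves the same way.
model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/falcon-7b-sharded-bf16",  # model id taken from earlier in this thread
    load_in_4bit=True,
    device_map={"": 0},
    trust_remote_code=True,
)

# Attribute names follow the tracebacks above.
print("is_quantized:", getattr(model, "is_quantized", False))
print("is_loaded_in_4bit:", getattr(model, "is_loaded_in_4bit", False))
print("hf_device_map:", getattr(model, "hf_device_map", None))

# While is_quantized is True, any model.to(...) call, including the one a trainer
# makes internally, raises the ValueError shown in the tracebacks above.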

@Andcircle commented:

@sgugger Sorry, another question here =) as above.

sgugger (Collaborator) commented Aug 9, 2023

I do not have the answer, no need to tag me.

@younesbelkada (Contributor) commented:

Hi @Andcircle,
Do you face the same issue with the main branch of transformers?

pip install -U git+https://github.com/huggingface/transformers.git


Andcircle commented Aug 17, 2023

hi @Andcircle Do you face the same issue with the main branch of transformers?

pip install -U git+https://github.com/huggingface/transformers.git

Hi @younesbelkada,

Once I changed to 4.32.0.dev0, the "ValueError: .to is not supported for 4-bit or 8-bit models" error is gone, but I got a new error:

ValueError: weight is on the meta device, we need a `value` to put in on 0. ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 14627) of binary: /usr/bin/python3

I load the Llama 2 7B model like this, then want to use the SFT trainer:

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # load_in_8bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, trust_remote_code=True,
    low_cpu_mem_usage=False,
    # device_map={'':torch.cuda.current_device()}
)

@younesbelkada
If I switch to pip install -U git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9, there's no "weight is on the meta device" issue, but I get the "ValueError: .to is not supported for 4-bit or 8-bit models" issue for full fine-tuning without LoRA.


MrKsiJ commented Oct 20, 2023

Thanks @DJT777
Can you try with pip install git+https://github.com/huggingface/accelerate.git@fix-to-int8 ?

Using https://github.com/huggingface/accelerate@d1628ee didn't solve it:

WARNING: Did not find branch or tag 'fix-to-int8', assuming revision or ref.
Running command git checkout -q fix-to-int8
error: pathspec 'fix-to-int8' did not match any file(s) known to git
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q fix-to-int8 did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

@younesbelkada (Contributor) commented:

Hi @MrKsiJ,
You can now use the accelerate main branch:

pip install -U git+https://github.com/huggingface/accelerate.git


MrKsiJ commented Oct 23, 2023

hi @MrKsiJ You can now use with accelerate main branch

pip install -U git+https://github.com/huggingface/accelerate.git

The problem is solved, and we are moving on. But now I have another question: how do I run PeftModel.from_pretrained locally without Internet access? If the Internet is disabled, PeftModel.from_pretrained for some reason still tries to reach the Hugging Face Hub, even though everything was downloaded on the first launch.
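
(Not an answer from this thread, just a hedged sketch of how a fully offline load is usually attempted: point both calls at local directories, force local_files_only, and/or set HF_HUB_OFFLINE. The paths below are placeholders.)

# Hedged sketch: load a base model plus a PEFT adapter without network access.
# Paths are placeholders; assumes both were downloaded beforehand.
import os
os.environ["HF_HUB_OFFLINE"] = "1"  # ask huggingface_hub not to touch the network

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/local/base-model",   # local directory with config.json and weights
    local_files_only=True,
)
model = PeftModel.from_pretrained(
    base,
    "/path/to/local/adapter",      # local directory with adapter_config.json
    is_trainable=False,
)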


robinsonmhj commented Feb 23, 2024

Which versions of accelerate and transformers fix this issue? I am using transformers==4.36.2 and accelerate==0.26.1 and am still getting this error, @younesbelkada. The issue still exists if I use transformers==4.38.0 and accelerate==0.27.2.

The stacktrace is

2024-02-23 11:26:16,461 ERROR tune_controller.py:1374 -- Trial task failed for trial TorchTrainer_22c6b_00000
Traceback (most recent call last):
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/_private/worker.py", line 2624, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::_Inner.train() (pid=3455, ip=10.68.12.214, actor_id=3ae8e14d20959e0bb1e7fd5c0c000000, repr=TorchTrainer)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 342, in train
    raise skipped from exception_cause(skipped)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 43, in check_for_failure
    ray.get(object_ref)
ray.exceptions.RayTaskError(ValueError): ray::_RayTrainWorker__execute.get_next() (pid=6302, ip=10.68.23.157, actor_id=77895994777a3819337ced8b0c000000, repr=<ray.train._internal.worker_group.RayTrainWorker object at 0x7f83c6ae1720>)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/worker_group.py", line 33, in __execute
    raise skipped from exception_cause(skipped)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/ray/train/_internal/utils.py", line 118, in discard_return_wrapper
    train_func(*args, **kwargs)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/working_dir_files/_ray_pkg_0639d2a5677b0f10/llama_ray_2.9.py", line 138, in train_func
    model, optimizer, _, lr_scheduler = deepspeed.initialize(
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/__init__.py", line 176, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 262, in __init__
    self._configure_distributed_model(model)
  File "/opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1113, in _configure_distributed_model
    self.module.to(self.device)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 448, in wrapper
    return fn(*args, **kwargs)
  File "/tmp/ray/session_2024-02-23_10-14-31_321538_1/runtime_resources/pip/7ca0a277e147900f193d749730594f67ff7cd52d/virtualenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2534, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
