int8 quantization doesn't work with accelerate on multi-GPUs #875
Comments
The script works fine with a single T4 GPU; the error appears only with multiple GPUs.
The problem is that you are sending your …
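For context: with a model dispatched via device_map="auto", inputs have to live on the device that holds the model's first modules, not on each process's default GPU. A minimal sketch of that placement (the checkpoint name is an assumption, not taken from the original script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: an OPT checkpoint loaded in int8 (requires bitsandbytes) and
# dispatched across all visible GPUs.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", load_in_8bit=True, device_map="auto"
)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
# The embeddings sit on the first device in the dispatch map, so the input
# ids must be sent there rather than to the current process's GPU.
first_device = next(iter(model.hf_device_map.values()))
output = model.generate(inputs.input_ids.to(first_device), max_length=30)
print(tokenizer.decode(output[0]))
```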
Thanks for the response. Reproduction: …
You can't use data parallelism with an int8 quantized model.
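To see why: device_map="auto" shards a single model instance across the GPUs, while data parallelism needs one full replica per GPU. A quick check (the checkpoint name is illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", load_in_8bit=True, device_map="auto"
)
# One model instance with its layers split across devices; there is no
# per-GPU replica for a DataParallel/DDP wrapper to drive.
print(model.hf_device_map)
```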
Hello, this behavior is indeed quite strange: if the script above works for the first batch, I don't see why it shouldn't work for the second batch. In any case, is it possible to find somewhere a list of the libraries supported by Accelerate and the ones not supported? For the moment it is not very clear how to use this library with int8 quantization and deepspeed_for_inference.
Hello @giulio98, https://github.com/huggingface/accelerate#supported-integrations has the list of all the integrations supported by Accelerate. For more details and guidance on how to use these, please refer to the documentation.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is there any solution to use data parallelism with an int8 quantized model?
Hi, I'm reopening this issue to ask whether it is currently feasible to run inference across multiple GPUs (with the weights distributed over them) while also employing data parallelism. Specifically, is it viable to use PyTorch's Fully Sharded Data Parallel (FSDP) for this purpose?
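One workaround that is sometimes used for pure inference (a sketch under assumptions, not an answer from this thread): skip sharding and give every process its own full int8 copy, so each rank runs the whole model on a disjoint slice of the data. This assumes the quantized model fits on a single GPU; whether FSDP can wrap bitsandbytes int8 layers is a separate question this sketch does not address.

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()

# Pin the entire model to this process's GPU instead of sharding it: every
# rank then holds a full int8 copy, so plain data parallelism applies.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",
    load_in_8bit=True,
    device_map={"": accelerator.process_index},
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

prompts = ["Hello world", "The capital of France is", "int8 inference", "Accelerate"]
# split_between_processes gives each rank a disjoint slice of the prompts.
with accelerator.split_between_processes(prompts) as shard:
    for prompt in shard:
        ids = tokenizer(prompt, return_tensors="pt").input_ids.to(accelerator.device)
        out = model.generate(ids, min_length=30, max_length=30, do_sample=True)
        print(f"rank {accelerator.process_index}: {tokenizer.decode(out[0])}")
```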
System Info
Information
Tasks
An officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
Reproduction
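The original snippet is not preserved here; the following is a minimal reconstruction inferred from the traceback below (the OPT checkpoint size, dataset, and batch layout are assumptions). Run with accelerate launch on multiple GPUs, it fails with the device-mismatch error shown under Expected behavior:

```python
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()

# Assumptions: an OPT checkpoint (the traceback runs through modeling_opt.py)
# loaded in int8 and dispatched over all GPUs, plus a toy prompt dataset.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", load_in_8bit=True, device_map="auto"
)

input_ids = tokenizer(["Hello, my name is"] * 8, return_tensors="pt").input_ids
dataloader = DataLoader(TensorDataset(input_ids), batch_size=2)

# prepare() shards the dataloader and moves every batch to this process's
# default GPU; the dispatched model, however, expects its inputs on the
# device holding its first layers, so non-zero ranks hit the mismatch.
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    output = accelerator.unwrap_model(model).generate(
        batch[0], min_length=30, max_length=30, do_sample=True
    )
    print(tokenizer.decode(output[0]))
```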
Expected behavior
accelerator.unwrap_model(model).generate(...) should work fine; instead it fails with the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm)
Full backtrace:
/bin/bash: /azureml-envs/pytorch-1.12/lib/libtinfo.so.6: no version information available (required by /bin/bash)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO Bootstrap : Using eth0:10.0.0.5<0>
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v4 symbol.
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO Plugin Path : /usr/local/nccl-rdma-sharp-plugins/lib/libnccl-net.so
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO P2P plugin IBext
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO NET/IB : No device found.
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO NET/Socket : Using [0]eth0:10.0.0.5<0>
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO Using network Socket
NCCL version 2.10.3+cuda11.3
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 0(=402400000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 3(=c45a00000) and dev 2(=71a000000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 1(=6f9100000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 2(=71a000000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Channel 00/02 : 0 1 2 3
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Channel 01/02 : 0 1 2 3
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Setting affinity for GPU 0 to ffff
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 3(=c45a00000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Channel 00 : 0[402400000] -> 1[6f9100000] via direct shared memory
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Channel 01 : 0[402400000] -> 1[6f9100000] via direct shared memory
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Connected all rings
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Could not enable P2P between dev 0(=402400000) and dev 1(=6f9100000)
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO Connected all trees
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO threadThresholds 8/8/64 | 32/8/64 | 8/8/512
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
3e8feffd8a7f4157a9116efab3d0ff63000001:78:378 [0] NCCL INFO comm 0x7fd570002fb0 rank 0 nranks 4 cudaDev 0 busId 402400000 - Init COMPLETE
3e8feffd8a7f4157a9116efab3d0ff63000001:78:78 [0] NCCL INFO Launch mode Parallel
0it [00:00, ?it/s]
0it [00:02, ?it/s]
Traceback (most recent call last):
File "test_8bit.py", line 49, in
output = accelerator.unwrap_model(model).generate(batch[0], min_length=30, max_length=30, do_sample=True)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/transformers/generation_utils.py", line 1543, in generate
return self.sample(
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/transformers/generation_utils.py", line 2482, in sample
outputs = self(
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/transformers/models/opt/modeling_opt.py", line 929, in forward
outputs = self.model.decoder(
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/transformers/models/opt/modeling_opt.py", line 693, in forward
layer_outputs = decoder_layer(
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/transformers/models/opt/modeling_opt.py", line 321, in forward
hidden_states = self.self_attn_layer_norm(hidden_states)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/accelerate/hooks.py", line 156, in new_forward
output = old_forward(*args, **kwargs)
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
return F.layer_norm(
File "/azureml-envs/pytorch-1.12/lib/python3.8/site-packages/torch/nn/functional.py", line 2503, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm)