
BLIP2 inference error: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:2 #26806

Closed
YongLD opened this issue Oct 14, 2023 · 7 comments



YongLD commented Oct 14, 2023

System Info

Describe the bug
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:2.

Screenshots

Traceback (most recent call last):
  File "/home/cike/ldy/ner/test-blip2-1.py", line 18, in <module>
    out = model.generate(**inputs)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cike/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
     ......
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cike/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 2494, in greedy_search
    next_tokens = next_tokens * unfinished_sequences + pad_token_id * (1 - unfinished_sequences)
                  ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:2

System info (please complete the following information):

  • OS: Ubuntu 18.04.2 LTS
  • One machine with 8x Tesla P100-PCIE-16GB

How can I fix this bug?

Who can help?

@pacman100

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

To Reproduce
I am trying to enable multi-GPU inference on the BLIP2 model.
I tried the following code snippet:

import requests
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

model_path = "/home/cike/.cache/huggingface/hub/models--Salesforce--blip2-flan-t5-xl/snapshots/cc2bb7bce2f7d4d1c37753c7e9c05a443a226614/"
processor = Blip2Processor.from_pretrained(model_path)
model = Blip2ForConditionalGeneration.from_pretrained(model_path, device_map="auto")

img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg' 
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

question = "how many dogs are in the picture?"
inputs = processor(raw_image, question, return_tensors="pt").to("cuda")

print("model: ",model.hf_device_map)

out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))
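
A hedged workaround sketch for the snippet above (not a confirmed fix): send the inputs to the GPU that hosts the vision tower instead of the bare "cuda" default, so generation starts on the device accelerate actually chose. The "vision_model" key is an assumption about how accelerate names BLIP2's top-level modules in hf_device_map.

# Sketch, reusing model/processor/raw_image/question from the snippet above
first_device = model.hf_device_map.get("vision_model", 0)  # assumed key
inputs = processor(raw_image, question, return_tensors="pt").to(f"cuda:{first_device}")
out = model.generate(**inputs)
print(processor.decode(out[0], skip_special_tokens=True))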

Expected behavior

The BLIP2 model loads and runs successfully on multi-GPUs.

ArthurZucker (Collaborator) commented:

pinging @SunMarc and @younesbelkada as well!


SunMarc commented Oct 16, 2023

Hi @YongLD, please make sure you have the latest version of transformers. We fixed a similar issue in the past. On my side, I'm able to run on 2 GPUs. LMK how it goes. If it doesn't work, please provide your environment config.
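
For reference, a minimal way to confirm which versions are installed before and after upgrading:

# Print the three versions that matter for big-model inference here
import torch
import accelerate
import transformers

print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("torch:", torch.__version__)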


YongLD commented Oct 18, 2023

Environment Config

transformers==4.34.0
accelerate==0.23.0
torch==2.0.1+cu117

Besides, I found a warning when I run with device_map="auto":

The `language_model` is not in the `hf_device_map` dictionary and you are running your script in a multi-GPU environment. 
this may lead to unexpected behavior when using `accelerate`. Please pass a `device_map` that contains `language_model` to remove this warning.

Does Accelerate's big-model inference support blip2-flan-t5-xl or blip2-flan-t5-xxl?
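
One hedged way to act on that warning is to pass an explicit device_map that mentions language_model, instead of "auto". This is only a sketch; the module names are assumptions based on Blip2ForConditionalGeneration's top-level attributes, and the 0/1 placement is arbitrary.

from transformers import Blip2ForConditionalGeneration

# Explicit per-module placement so accelerate does not have to guess;
# every top-level BLIP2 module, including language_model, gets a device.
device_map = {
    "vision_model": 0,
    "query_tokens": 0,
    "qformer": 0,
    "language_projection": 0,
    "language_model": 1,
}
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl", device_map=device_map
)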

Another question (although it may be a CUDA bug):
I get a DeferredCudaCallError when I use to("cuda") with multiple GPUs. Do you know why?

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import torch
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to("cuda")

Error: torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus
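
A common cause of this error (an assumption here, not a confirmed diagnosis) is that CUDA is initialized before CUDA_VISIBLE_DEVICES is set, e.g. by an earlier import torch elsewhere in the process; the variable must be exported before torch first touches CUDA:

import os

# Must run before the first `import torch` anywhere in the process;
# setting it afterwards leaves torch's cached device count stale, which
# can surface as DeferredCudaCallError.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch
print(torch.cuda.device_count())  # expect 2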


YongLD commented Oct 18, 2023

@SunMarc How can I lock device_map="auto" to specific GPUs?
I have tried os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2", but it does not work;
the command torchrun test.py --nproc_per_node=3 does not work either.
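
A hedged alternative sketch: from_pretrained accepts a max_memory argument that accelerate honors when computing device_map="auto", so you can cap which GPUs may receive weights. The 15GiB budgets below are assumptions for 16GB cards:

from transformers import Blip2ForConditionalGeneration

# Only GPUs listed in max_memory are eligible for weights; unlisted
# devices receive nothing, effectively pinning "auto" to GPUs 0-2.
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl",
    device_map="auto",
    max_memory={0: "15GiB", 1: "15GiB", 2: "15GiB"},
)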


SunMarc commented Oct 18, 2023

I think it is a problem with torch and CUDA. We had a similar case in the past. Can you reinstall and try again?

Also, the following code snippet works on my side:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
import torch
from transformers import Blip2ForConditionalGeneration

model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to("cuda")

As for the warning, that is something we need to fix; it shouldn't be shown.


YongLD commented Oct 20, 2023

@SunMarc Yes, I can use Salesforce/blip2-opt-2.7b with to("cuda"), but I cannot fit Salesforce/blip2-flan-t5-xxl on a single 16GB GPU.
There is always a RuntimeError when I use device_map="auto" for the BLIP2 multi-GPU test, but I can use it for the T5 model:

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xxl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xxl", device_map="auto")

So I wonder: is this a problem specific to Salesforce/blip2-flan-t5-xxl, or does it affect other BLIP2 models as well?
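
For completeness, a hedged single-GPU workaround sketch (assuming the xl checkpoint fits on one 16GB card in fp16; the xxl variant will not): force the whole model onto one device so generate() never crosses GPUs.

import torch
from transformers import Blip2ForConditionalGeneration

# {"": 0} is accelerate's "place the entire model on device 0" mapping
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-flan-t5-xl",
    torch_dtype=torch.float16,
    device_map={"": 0},
)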


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
