device_map='auto' gives bad results #20896 (Closed)

youngwoo-yoon opened this issue Dec 26, 2022 · 18 comments

@youngwoo-yoon commented Dec 26, 2022

System Info

  • transformers version: 4.25.1

  • Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.17

  • Python version: 3.8.15

  • Huggingface_hub version: 0.11.1

  • PyTorch version (GPU?): 1.11.0 (True)

  • Tensorflow version (GPU?): not installed (NA)

  • Flax version (CPU?/GPU?/TPU?): not installed (NA)

  • Jax version: not installed

  • JaxLib version: not installed

  • Using GPU in script?: yes

  • Using distributed or parallel set-up in script?: no

  • GPUs: two A100

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Minimal test example:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'EleutherAI/gpt-neo-125M'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentence = 'Hello, nice to meet you. How are'
with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)

Results:

Hello, nice to meet you. How are noise retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy

The above result is not the expected behavior.
Without device_map='auto' in the from_pretrained call, it works correctly; that line then becomes model = AutoModelForCausalLM.from_pretrained(model_name)

Results:

Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I

My machine has two A100 (80 GB) GPUs, and I confirmed that the model is loaded across both GPUs when I use device_map='auto'.

Expected behavior

Explained above

@younesbelkada (Contributor) commented Dec 26, 2022

Hi @youngwoo-yoon

Thanks for the issue!
What is your version of accelerate? With the latest version (0.15.0) and the same PyTorch version, running the minimal test example shared above (with device_map='auto') on an NVIDIA T4, I get:

Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I
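
For reference, a quick way to check the installed accelerate version from inside a script (a small snippet added for illustration, not from the original comment):

from importlib.metadata import version
print(version('accelerate'))  # expect 0.15.0 or later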

@youngwoo-yoon (Author) commented Dec 26, 2022

Hello @younesbelkada,
I'm using the same version of accelerate (0.15.0).
I also get the correct result when I run with export CUDA_VISIBLE_DEVICES=0, but I still get wrong results with two GPUs (export CUDA_VISIBLE_DEVICES=0,1).

@younesbelkada (Contributor) commented Dec 26, 2022

Thanks for the details! I still have not managed to reproduce this; can you try this snippet instead:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'EleutherAI/gpt-neo-125M'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map={"transformer.wte":0, "transformer.wpe":0, "transformer.h":1, "transformer.ln_f":1, "lm_head":1})
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentence = 'Hello, nice to meet you. How are'
with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)

and let me know if the problem still persists?
We're using the same PyTorch, transformers, and accelerate versions; the only difference is the hardware (I am using 2x NVIDIA T4).
Can you also try your script with export CUDA_VISIBLE_DEVICES=1 instead of export CUDA_VISIBLE_DEVICES=0?

@youngwoo-yoon (Author)

Thanks for the quick replies.
This is the result, and it still doesn't look good:

Hello, nice to meet you. How are!!!!!!!!!!!!!!!!!!!!!!!

My original test code with export CUDA_VISIBLE_DEVICES=1 gives the same correct result as with export CUDA_VISIBLE_DEVICES=0:

Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I

@younesbelkada (Contributor) commented Dec 26, 2022

I am slightly unsure about what could be causing the issue, but I suspect it's highly correlated with the fact that you're running your script on two A100s (not sure, though).
@sgugger do you think the problem could be related to accelerate and the fact that the script is running on two A100s rather than other hardware (i.e., have you seen similar discrepancy errors in the past)?
@youngwoo-yoon could you also try the script with the latest PyTorch version (1.13.1)?

@youngwoo-yoon (Author)

@younesbelkada, I got the same wrong result with PyTorch 1.13.1.

Hello, nice to meet you. How are noise retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy

@sgugger (Collaborator) commented Dec 27, 2022

Mmmm, there is no reason for the script to give different results on different GPUs, especially since removing device_map="auto" gives the same (correct) results on each.

I also can't reproduce on my side. Are you absolutely certain your script is launched in the same Python environment you are reporting? E.g., can you print the versions of Accelerate/Transformers/PyTorch in the same script?

@youngwoo-yoon (Author)

I put the test cases using the CPU, GPU 0, GPU 1, and device_map='auto' in a single Python file to be sure.

from importlib.metadata import version
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

print('torch', version('torch'))
print('transformers', version('transformers'))
print('accelerate', version('accelerate'))
print('# of gpus: ', torch.cuda.device_count())

# cpu
model_name = 'EleutherAI/gpt-neo-125M'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

sentence = 'Hello, nice to meet you. How are'
with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)
print('-------------------------------------------')

# on the gpu 0
model = AutoModelForCausalLM.from_pretrained(model_name)
model = model.to('cuda:0')

with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    tensor_input = tensor_input.to('cuda:0')
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)
print('-------------------------------------------')

# on the gpu 1
model = AutoModelForCausalLM.from_pretrained(model_name)
model = model.to('cuda:1')

with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    tensor_input = tensor_input.to('cuda:1')
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)
print('-------------------------------------------')

# with device_map=auto
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto')

with torch.no_grad():
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    gen_tokens = model.generate(tensor_input, max_length=32)
    generated = tokenizer.batch_decode(gen_tokens)[0]

print(generated)

And this is the result:

torch 1.13.1
transformers 4.25.1
accelerate 0.15.0
# of gpus:  2
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I
-------------------------------------------
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I
-------------------------------------------
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Hello, nice to meet you. How are you?

I’m a bit of a newbie to the world of web development, but I
-------------------------------------------
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
/home/user/anaconda3/envs/task_temp/lib/python3.10/site-packages/transformers/generation/utils.py:1470: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.
  warnings.warn(
Hello, nice to meet you. How are noise retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy retaliateousy

And this is the nvidia-smi output:

Tue Dec 27 16:57:48 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00   Driver Version: 460.106.00   CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  A100 80GB PCIe      Off  | 00000000:4F:00.0 Off |                    0 |
| N/A   36C    P0    47W / 300W |      9MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  A100 80GB PCIe      Off  | 00000000:52:00.0 Off |                    0 |
| N/A   37C    P0    45W / 300W |      9MiB / 81251MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2915      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    119486      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      2915      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A    119486      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

@sorgfresser (Contributor)

There is a warning

/home/user/anaconda3/envs/task_temp/lib/python3.10/site-packages/transformers/generation/utils.py:1470: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cuda') before running `.generate()`.

You did move the inputs when running on a single GPU; it might be necessary here too. Could you print the model's hf_device_map attribute and try moving the inputs to cuda:0 and cuda:1?
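
For example, something along these lines (a sketch reusing model and tensor_input from the snippets above; hf_device_map maps module names to device indices):

print(model.hf_device_map)
# Send the inputs to the device that holds the model's first modules
# (the embedding layer), e.g. {'transformer.wte': 0, ...} -> cuda:0
first_device = next(iter(model.hf_device_map.values()))
tensor_input = tensor_input.to(f'cuda:{first_device}')
gen_tokens = model.generate(tensor_input, max_length=32)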

@youngwoo-yoon (Author)

I moved the inputs to cuda:0 and cuda:1, but both gave the same wrong result.
Below is the output when I moved the inputs to cuda:0.

torch 1.13.1
transformers 4.25.1
accelerate 0.15.0
# of gpus: 2
hf_device_map output: {'transformer.wte': 0, 'lm_head': 0, 'transformer.wpe': 0, 'transformer.drop': 0, 'transformer.h.0': 0, 'transformer.h.1': 0, 'transformer.h.2': 0, 'transformer.h.3': 0, 'transformer.h.4': 0, 'transformer.h.5': 0, 'transformer.h.6': 1, 'transformer.h.7': 1, 'transformer.h.8': 1, 'transformer.h.9': 1, 'transformer.h.10': 1, 'transformer.h.11': 1, 'transformer.ln_f': 1}
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Hello, nice to meet you. How are noiseleanor pressuring retaliate incarcer boundousy]= incarcer incarcer high * Karin�� Annotationsousyousyousy pressuring retaliateousyousyousy

I will try to reproduce this issue on another machine with two GPUs.

@youngwoo-yoon (Author)

It works well on another machine with two Quadro 6000 GPUs.
I've tried the other device_map strategies, 'sequential' and 'balanced_low_0', but it still fails when the two A100 GPUs are used.

I ran the accelerate test command, which tests the accelerate library, and it also failed. It seems to be a problem in the accelerate library.
I found that some other people also had problems with A100 GPUs.
Related issue: huggingface/accelerate#934

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this as completed Feb 3, 2023
@yuchguo1007

Hi @younesbelkada, I got the same error with two V100s, with accelerate version 0.18.0.
prompt = 'Q: What is the largest animal?\nA:'
output:

A: The blue whale.
Q: What is the largest animal?
A: The blue whale. It is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q: What is the largest animal?
A: The blue whale is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q: What is the largest animal?
A: The blue whale is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q: What is the largest animal?
A: The blue whale is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q: What is the largest animal?
A: The blue whale is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q: What is the largest animal?
A: The blue whale is the largest animal on Earth. It is also the largest mammal. It is the largest creature that has ever lived.
Q

code:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = 'openlm-research/open_llama_3b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to('cuda')

generation_output = model.generate(
    input_ids=input_ids, max_length=400
)
print(tokenizer.decode(generation_output[0]))

Have you found a solution?

@nhungntaime commented Aug 29, 2023

I think you should use the same prompt format as the one used during training. Also, pay attention to the special tokens that you add.
Example:
During training, I tokenized:

`f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n ### Input: <s>{input}</s>. \n### Response: <s>{output}</s>"`

Afterward, I used the model:

text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n ### Input: {input}. \n### Response: "
batch = tokenizer(text, return_tensors='pt', padding=True, return_token_type_ids=False)
with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=500)
decode = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
decode_text = decode[len(text):]
print(decode_text)

Hope this helps!

@ZaVang commented Sep 1, 2023

> It works well on another machine with two Quadro 6000 GPUs. I've tried the other device_map strategies, 'sequential' and 'balanced_low_0', but it still fails when the two A100 GPUs are used.
>
> I ran the accelerate test command, which tests the accelerate library, and it also failed. It seems to be a problem in the accelerate library. I found that some other people also had problems with A100 GPUs. Related issue: huggingface/accelerate#934

Hi @youngwoo-yoon, have you solved this problem? I have the same problem on A100s.

@tsengalb99

I'm also running into a similar issue, except with A6000s. With one A6000 and the rest of the weights on the CPU, I get coherent text; with multiple A6000s, I get garbage outputs.

@youngwoo-yoon (Author)

I solved this problem by disabling ACS in the BIOS.
This document might be helpful to some of you:
https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html
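
For anyone who lands here later: the failure mode is silently corrupted GPU-to-GPU (peer-to-peer) transfers when PCIe ACS is enabled. A quick sanity check you can run before touching the BIOS (a sketch added for illustration, not from the original comment):

import torch

# Round-trip a tensor across the two GPUs; on a machine with broken P2P
# (e.g. ACS enabled), the copied data can come back corrupted.
a = torch.randn(1000, device='cuda:0')
b = a.to('cuda:1')
print(torch.allclose(a.cpu(), b.cpu()))  # True on a healthy setup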

@yuge-byte

> I solved this problem by disabling ACS in the BIOS. This document might be helpful to some of you: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html

Amazing!!! It works for me.
