
[BUG] I have been trying to run DeepSpeed on a 32 GB Tesla V100 GPU #3463

Open
AbhayGoyal opened this issue May 5, 2023 · 15 comments
Labels: bug (Something isn't working), inference

@AbhayGoyal

Describe the bug
I have been trying to run DeepSpeed on a 32 GB Tesla V100 GPU, but it still does not work. I tried parallelizing it over 4 GPUs as well, and it shows me a SIGKILL.

To Reproduce
Here is the code I ran:

import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B')

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)


@AbhayGoyal AbhayGoyal added bug Something isn't working inference labels May 5, 2023
@mrwyattii mrwyattii self-assigned this May 5, 2023
@mrwyattii
Contributor

@AbhayGoyal you need to specify the device in pipeline. If you don't do this, the tokenizer will be on CPU and the model will be on GPU, resulting in the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Here is an updated version of your script that should work:

import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
device = torch.device(f"cuda:{local_rank}")
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B", device=device)

generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.float,
    replace_with_kernel_inject=True,
)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
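For completeness, a script like this is normally run with the DeepSpeed launcher so that LOCAL_RANK and WORLD_SIZE are set for each process; assuming the file is saved as run_inference.py (the file name here is just a placeholder), the launch would look something like:

deepspeed --num_gpus 4 run_inference.py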

@AbhayGoyal
Author

AbhayGoyal commented May 5, 2023 via email

@AbhayGoyal
Author

I tried the solution you gave. It still gives me the exact same error

@mrwyattii
Contributor

@AbhayGoyal can you please share the error message you are seeing? Is it an Out Of Memory error?

@AbhayGoyal
Author

AbhayGoyal commented May 8, 2023 via email

@AbhayGoyal
Author

AbhayGoyal commented May 9, 2023 via email

@mrwyattii
Contributor

https://github.com/microsoft/DeepSpeedExamples/blob/8e4ec02c1545f7bd87d3bfe5daaafa5a5f1fe6a6/inference/huggingface/text-generation/inference-test.py

(Quoting Abhay Goyal's May 8, 2023 email reply: "Actually it turns out that if I run it on just 1 GPU, it works well. Let me send the code here.")

What are the exact command line arguments you are using to launch the script? If you can run on a single GPU, it should run on multiple GPUs as well. Please ensure you are using --ds_inference and --use_kernel when you run this script!
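For reference, a launch along those lines would look roughly like the following (the model name is only an example; the flags come from the linked inference-test.py script):

deepspeed --num_gpus 4 inference-test.py --name EleutherAI/gpt-neo-2.7B --ds_inference --use_kernel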

@AbhayGoyal
Author

AbhayGoyal commented May 9, 2023 via email

@karandua2016

@AbhayGoyal I was facing the same issue on V100. In my case the process crashed with SIGKILL when it ran out of system RAM. The reason is that the model is first loaded on the CPU and then moved to the GPU by DeepSpeed, so if you run the script with more than one GPU, DeepSpeed loads multiple instances of the model and may exceed system memory.
Can you check the amount of RAM (system RAM, not GPU RAM) available? You should run the inference script and then monitor the RAM using "free -s2 -g".
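If it helps, here is a minimal Python sketch for checking available system RAM from inside the launch environment; it assumes the psutil package is installed, which is not something the comment above mentions:

# Minimal sketch: report available system RAM before the model is loaded.
# Assumes psutil is installed (pip install psutil).
import psutil

mem = psutil.virtual_memory()
print(f"Available system RAM: {mem.available / 1024**3:.1f} GiB "
      f"of {mem.total / 1024**3:.1f} GiB total")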

@AbhayGoyal
Author

AbhayGoyal commented May 19, 2023 via email

@KMFODA
Contributor

KMFODA commented Jun 15, 2023

Hi all, I'm facing the same issue here. I was wondering whether anyone has any ideas about what might be causing this.

I'm trying to run inference on a model that needs a minimum of 2 A100 GPUs, using

/opt/conda/bin/deepspeed /root/DeepSpeedExamples/inference/huggingface/text-generation/inference-test.py --num_gpus 2 --name huggyllama/llama-65b

and I'm getting the SIGKILL error:

[2023-06-15 15:32:36,151] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 18390
[2023-06-15 15:32:43,064] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 18391

even though, in theory, the model should fit on 2 A100 GPUs and generate results using DeepSpeed.

@abmybgx

abmybgx commented Aug 15, 2023

Same issue on 8 × A100; marking to follow.

@zzkcaesar

Hi, I have encountered the same error on 8 × H800 GPUs. Is there any solution for this?

@egesko

egesko commented Aug 26, 2023

Same error with 4 × RTX A5000 GPUs.

@mrwyattii
Contributor

Hi All, we have recently made some updates that affect this issue. Please install the latest DeepSpeed and use the latest scripts from https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py

You can now load models using meta tensors to avoid using all the system memory and causing these errors. This works for most models when using Auto Tensor Parallelism (i.e., when not using --use_kernel) and it works for GPT-NEO, BLOOM, OPT, and GPT-J models when using kernel injection (i.e., when using --use_kernel):

deepspeed --num_gpus 2 inference-test.py --model huggyllama/llama-65b --use_meta_tensor
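For context, the meta-tensor path builds the model skeleton on the "meta" device so the weights are never materialized in system RAM. A simplified sketch of the idea (the checkpoint loading that init_inference then performs is omitted, and the model name is only an example):

# Rough sketch of meta-tensor loading: construct the model with empty "meta"
# weights so nothing is materialized in system RAM; DeepSpeed then loads the
# real checkpoint shards directly onto the GPUs during init_inference.
import torch
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("EleutherAI/gpt-neo-2.7B")
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)

# init_inference must then be given the real checkpoint files to load; see how
# inference-test.py handles --use_meta_tensor for the full flow.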
