[BUG] I have been trying to run deepspeed on 32 GB Tesla V100 GPU #3463
@AbhayGoyal you need to specify the device in `pipeline`. If you don't do this, the tokenizer will be on CPU and the model will be on GPU, resulting in the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Here is an updated version of your script that should work:

```python
import os

import deepspeed
import torch
from transformers import pipeline

# The deepspeed launcher sets LOCAL_RANK and WORLD_SIZE for each process.
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
device = torch.device(f"cuda:{local_rank}")

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-2.7B", device=device)
generator.model = deepspeed.init_inference(
    generator.model,
    mp_size=world_size,
    dtype=torch.float,
    replace_with_kernel_inject=True,
)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
```
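The script above expects to be started with the deepspeed launcher, which is what sets the LOCAL_RANK and WORLD_SIZE environment variables it reads. A minimal sketch of a launch command, assuming the script is saved as inference_test.py (the file name used later in this thread) and two GPUs are available:

```bash
deepspeed --num_gpus 2 inference_test.py
```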
Thanks for the reply. I understand what you are saying and will make the changes. But will this also fix the memory problem I am facing?
I tried the solution you gave. It still gives me the exact same error.
@AbhayGoyal can you please share the error message you are seeing? Is it an Out Of Memory error?
Actually it turns out that if I run it on just 1 GPU, it works well. Let me send the code here.
What are the exact command line arguments you are using to launch the script? If you can run on a single GPU, it should run on multiple GPUs as well. Please ensure you are using `--ds_inference` and `--use_kernel` when you run this script: https://github.com/microsoft/DeepSpeedExamples/blob/8e4ec02c1545f7bd87d3bfe5daaafa5a5f1fe6a6/inference/huggingface/text-generation/inference-test.py
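A sketch of what a full launch command with those flags could look like, reusing the model name and batch size from this thread (the GPU count here is illustrative):

```bash
deepspeed --num_gpus 4 inference-test.py --name EleutherAI/gpt-neo-2.7B --batch_size 10 --ds_inference --use_kernel
```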
I don't think that is the case. I also did not explicitly mention the number of GPUs to be used. Here is the command I used:

```bash
deepspeed inference_test.py --name EleutherAI/gpt-neo-2.7B --batch_size 10
```
@AbhayGoyal I was facing the same issue on V100. In my case my process crashed with SIGKILL when I ran out of system RAM. The reason is that the model is first loaded on the CPU and then moved to GPU by DeepSpeed. So if you run the script with more than one GPU, DeepSpeed loads multiple instances of the model and may cause system memory to be exceeded. Can you check the amount of RAM (system RAM, not GPU RAM) available? You should run the inference script and then monitor the RAM using `free -s2 -g`.
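As a rough illustration of why this exhausts system RAM: gpt-neo-2.7B in fp32 takes about 2.7B × 4 bytes ≈ 11 GB per process, and the launcher starts one process per GPU, so a 4-GPU run needs on the order of 43 GB of free system RAM just to load the model. A minimal sketch of this arithmetic (psutil is a third-party package; the numbers are estimates, not measurements):

```python
import psutil  # third-party: pip install psutil

params = 2.7e9          # EleutherAI/gpt-neo-2.7B
bytes_per_param = 4     # torch.float (fp32), as in the script above
num_ranks = 4           # the launcher starts one process per GPU

# Decimal GB, good enough for a back-of-envelope check.
needed_gb = params * bytes_per_param * num_ranks / 1e9
avail_gb = psutil.virtual_memory().available / 1e9
print(f"~{needed_gb:.0f} GB needed while loading, ~{avail_gb:.0f} GB available")
```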
Thanks. You are correct. I did that. So instead of using multiple GPUs, I just used 1, to make things simpler.
Hi all, I'm facing the same issue here. Was wondering whether anyone has any ideas what might be causing this. I'm trying to run inference on a model that needs a minimum of 2 A100 GPUs, and I'm getting the SIGKILL error even though in theory the model should fit on 2 A100 GPUs and generate results using DeepSpeed.
Same issue on 8× A100; marking to follow.
Hi, I have encountered the same error on 8× H800 GPUs. Is there any solution for this?
Same error with 4× RTX A5000 GPUs.
Hi all, we have recently made some updates that affect this issue. Please install the latest DeepSpeed and use the latest scripts from https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/text-generation/inference-test.py. You can now load models using meta tensors to avoid using all the system memory and causing these errors. This works for most models when using Auto Tensor Parallelism (i.e., when not using `--use_kernel`).
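A minimal sketch of the meta-tensor idea, assuming a recent DeepSpeed (`deepspeed.OnDevice` is the context manager used in the DeepSpeed inference tutorials; the exact arguments to `init_inference` vary by version):

```python
import deepspeed
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("EleutherAI/gpt-neo-2.7B")

# Build the module structure on the "meta" device: no weight storage is
# allocated in system RAM, so each rank stays cheap until real weights
# are loaded later by DeepSpeed.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config)
```

The linked inference-test.py shows the full flow, including how the real weights are then supplied to `deepspeed.init_inference` from a checkpoint.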
Describe the bug
I have been trying to run DeepSpeed on a 32 GB Tesla V100 GPU, but it still does not work. I tried parallelizing it over 4 GPUs as well, and it shows me a SIGKILL.
To Reproduce
Here is the code I ran:
```python
import os

import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B')
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
```