[BUG] DeepSpeed non-deterministic inference with HF GPT2 when replace_with_kernel_inject=True #2243

Closed
trianxy opened this issue Aug 19, 2022 · 5 comments
Labels: bug (Something isn't working)

trianxy commented Aug 19, 2022

Describe the bug
#1950 describes a bug where running inference twice on the same input leads to different outputs. It was supposedly fixed in version 0.6.5, but I am encountering a similar bug (with Hugging Face's GPT2, on an NVIDIA A10G) in every DeepSpeed version from 0.6.3 onwards when running long sequences. My current workaround is to pin version 0.6.1.

Note: With sufficiently short sequences this bug does not appear. With even longer sequences, I instead hit another open bug (#2062) that prevents inference entirely.

Perhaps related bug: #2229

To Reproduce

  1. Install packages
!pip uninstall -y torch deepspeed transformers
!pip install --upgrade pip
!pip install --upgrade torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
!pip install --upgrade deepspeed==0.7.0 transformers==4.21.1
  2. Run code
import os
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = deepspeed.init_inference(model, dtype=torch.half, replace_method='auto', replace_with_kernel_inject=True)

long_sequence = "asdfjk **[][] 890 889288 =-0=- 888***&*&#*$&*(#$ &*#$ &*( *(&))  lf  ds890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))890234908 fdS 809d890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))fs 8903428889&*(#$ &*#$ &*( *(&)))"
complex_input = tokenizer(long_sequence, return_tensors="pt").to("cuda")

for _ in range(3):
    outputs = model(**complex_input)
    token_id = torch.argmax(outputs.logits.squeeze()[-1]).item()
    print(tokenizer.decode(token_id), outputs.logits.mean().item())  # we should always see the same output, but we don't
  3. Observe that the output of the last print statement differs on each run, although the input is always the same (a stricter determinism check is sketched below). The last time I ran it, I got e.g.
 Season -125.25
sp -170.25
 A -82.25
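
For a stricter check than eyeballing the printed token, one can compare the full logits tensors across repeated forward passes. This is only a sketch: it reuses model and complex_input from step 2, and the run count of 5 is arbitrary.

import torch

# Repeat the same forward pass and compare the raw logits bitwise.
# With deterministic kernels, every run should match the first one exactly.
with torch.no_grad():
    reference = model(**complex_input).logits.clone()
    for i in range(5):
        logits = model(**complex_input).logits
        if torch.equal(logits, reference):
            print(f"run {i}: logits identical to first run")
        else:
            max_diff = (logits - reference).abs().max().item()
            print(f"run {i}: logits differ, max abs diff = {max_diff}")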

Expected behavior
I expected to see the same output each time, i.e.

 Season -125.25
 Season -125.25
 Season -125.25

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6

System info (please complete the following information):

  • OS: Amazon Linux 2
  • GPU count and types: one NVidia A10G (AWS g5.xlarge)
  • Python version: 3.8.12

Launcher context
inside a Python notebook

RezaYazdaniAminabadi (Contributor) commented:

Hi @trianxy ,

I think I know where this issue is coming from: it is due to reducing the max-tokens to 128 here. We have a PR that fixes this and will merge it soon.
Thanks,
Reza
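
If the installed DeepSpeed build already exposes an output-token budget on the inference API, raising it may work around the problem until the PR lands. The max_out_tokens keyword below is an assumption (it may not exist in 0.7.0), so the sketch only passes it when the installed init_inference actually accepts it; model is the GPT2 model from the reproduction script above.

import inspect
import torch
import deepspeed

kwargs = dict(dtype=torch.half, replace_method="auto", replace_with_kernel_inject=True)
# Hypothetical workaround: only pass a larger token budget if this build accepts it.
if "max_out_tokens" in inspect.signature(deepspeed.init_inference).parameters:
    kwargs["max_out_tokens"] = 1024  # comfortably above the 128-token limit mentioned above
model = deepspeed.init_inference(model, **kwargs)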

RezaYazdaniAminabadi (Contributor) commented:

Okay, I verified that changing MAX_OUT_TOKES to a large enough number of tokens makes the problem go away. We will merge the PR soon to resolve this issue.
cc: @cmikeh2

cmikeh2 (Contributor) commented Nov 12, 2022

Hi @trianxy,

I'm sorry for the lack of updates on this, but with the latest master (which should be released as 0.7.5 in the next few days) I believe the issue you're observing here is fixed. Would you mind testing this on your end to verify whether that is the case?

Thanks!

trianxy (Author) commented Nov 13, 2022

Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in version 0.7.5+f2710bbe but also in 0.7.4.

Does the fact that it already works in 0.7.4 raise any red flags for you that we might be missing something?

I am happy to do additional tests.
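
For reference, the exact version string (including the commit hash for source builds, e.g. 0.7.5+f2710bbe) can be read directly from the package; this is just a convenience snippet, and ds_report reports similar information.

import deepspeed

print(deepspeed.__version__)  # e.g. "0.7.5+f2710bbe" for a source install at that commit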

cmikeh2 (Contributor) commented Nov 14, 2022

> Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in version 0.7.5+f2710bbe but also in 0.7.4.
>
> Does the fact that it already works in 0.7.4 raise any red flags for you that we might be missing something?
>
> I am happy to do additional tests.

0.7.4 did have some fixes that were designed to address this and related issues, but it also introduced a couple of regressions elsewhere, which made it somewhat unpredictable where things were and weren't working, particularly with long sequence lengths. 0.7.5 (just released) should have squashed all of that and should work consistently.
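
To pick up the fix, upgrading to the released package should be sufficient, e.g. (mirroring the notebook-style install steps above):

!pip install --upgrade "deepspeed>=0.7.5"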

cmikeh2 closed this as completed Nov 14, 2022