[BUG] DeepSpeed non-deterministic inference with HF GPT2 when replace_with_kernel_inject=True #2243

Closed
trianxy opened this issue Aug 19, 2022 · 5 comments
Labels: bug (Something isn't working)

trianxy commented Aug 19, 2022

Describe the bug
#1950 describes a bug where running inference twice on the same input leads to different outputs. It was supposedly fixed in version 0.6.5, but I am encountering a similar bug (with Hugging Face's GPT2, on an NVIDIA A10G) in every DeepSpeed version from 0.6.3 onwards when running long sequences. My current workaround is to pin version 0.6.1.

Note: With sufficiently short sequences this bug does not appear. With even longer sequences, I instead hit another open bug (#2062) that prevents inference entirely.

Perhaps related bug: #2229

To Reproduce

  1. Install packages
!pip uninstall -y torch deepspeed transformers
!pip install --upgrade pip
!pip install --upgrade torch==1.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
!pip install --upgrade deepspeed==0.7.0 transformers==4.21.1
  2. Run code
import os
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = deepspeed.init_inference(model, dtype=torch.half, replace_method='auto', replace_with_kernel_inject=True)

long_sequence = "asdfjk **[][] 890 889288 =-0=- 888***&*&#*$&*(#$ &*#$ &*( *(&))  lf  ds890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))890234908 fdS 809d890342888977889888***&*&#*$&*(#$ &*#$ &*( *(&))fs 8903428889&*(#$ &*#$ &*( *(&)))"
complex_input = tokenizer(long_sequence, return_tensors="pt").to("cuda")

for _ in range(3):
    outputs = model(**complex_input)
    token_id = torch.argmax(outputs.logits.squeeze()[-1]).item()
    print(tokenizer.decode(token_id), outputs.logits.mean().item())  # we should always see the same output, but we don't
  3. Observe that the output of the last print statement differs on each run, although the input is always the same (a stricter determinism check is sketched below). The last time I ran it, I got e.g.
 Season -125.25
sp -170.25
 A -82.25
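
For a stricter check than eyeballing the printed token, one can compare the full logits tensors across repeated forward passes. This is only a sketch: it reuses model and complex_input from step 2, and the run count of 5 is arbitrary.

import torch

# Repeat the same forward pass and compare the raw logits bitwise.
# With deterministic kernels, every run should match the first one exactly.
with torch.no_grad():
    reference = model(**complex_input).logits.clone()
    for i in range(5):
        logits = model(**complex_input).logits
        if torch.equal(logits, reference):
            print(f"run {i}: logits identical to first run")
        else:
            max_diff = (logits - reference).abs().max().item()
            print(f"run {i}: logits differ, max abs diff = {max_diff}")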

Expected behavior
I expected to see the same output each time, i.e.

 Season -125.25
 Season -125.25
 Season -125.25

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu116
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.1
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.6

System info (please complete the following information):

  • OS: Amazon Linux 2
  • GPU count and types: one NVidia A10G (AWS g5.xlarge)
  • Python version: 3.8.12

Launcher context
inside a Python notebook

RezaYazdaniAminabadi (Contributor) commented:

Hi @trianxy ,

I think I know where this issue is coming from: it is due to reducing the max-tokens to 128 here. We have a PR that fixes this and will merge it soon.
Thanks,
Reza
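
If the installed DeepSpeed build already exposes an output-token budget on the inference API, raising it may work around the problem until the PR lands. The max_out_tokens keyword below is an assumption (it may not exist in 0.7.0), so the sketch only passes it when the installed init_inference actually accepts it; model is the GPT2 model from the reproduction script above.

import inspect
import torch
import deepspeed

kwargs = dict(dtype=torch.half, replace_method="auto", replace_with_kernel_inject=True)
# Hypothetical workaround: only pass a larger token budget if this build accepts it.
if "max_out_tokens" in inspect.signature(deepspeed.init_inference).parameters:
    kwargs["max_out_tokens"] = 1024  # comfortably above the 128-token limit mentioned above
model = deepspeed.init_inference(model, **kwargs)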

RezaYazdaniAminabadi (Contributor) commented:

Okay, I verified that changing MAX_OUT_TOKES to a large enough number of tokens makes the problem go away. We will merge the PR soon to resolve this issue.
cc: @cmikeh2

cmikeh2 (Contributor) commented Nov 12, 2022

Hi @trianxy,

I'm sorry for the lack of updates on this, but with the latest master (which should be released as 0.7.5 in the next few days) I believe the issue you're observing here is fixed. Would you mind testing this on your end to verify whether that is the case?

Thanks!

trianxy (Author) commented Nov 13, 2022

Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in version 0.7.5+f2710bbe but also in 0.7.4.

Does the fact that it already works in 0.7.4 raise any red flags for you that we might be missing something?

I am happy to do additional tests.
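
For reference, the exact version string (including the commit hash for source builds, e.g. 0.7.5+f2710bbe) can be read directly from the package; this is just a convenience snippet, and ds_report reports similar information.

import deepspeed

print(deepspeed.__version__)  # e.g. "0.7.5+f2710bbe" for a source install at that commit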

cmikeh2 (Contributor) commented Nov 14, 2022

> Thank you @cmikeh2 for coming back to me on that. I think the above issue can be closed, because it is fixed in version 0.7.5+f2710bbe but also in 0.7.4.
>
> Does the fact that it already works in 0.7.4 raise any red flags for you that we might be missing something?
>
> I am happy to do additional tests.

0.7.4 did have some fixes that were designed to address this and related issues, but it also introduced a couple of regressions elsewhere, which made it somewhat unpredictable where things were and weren't working, particularly with long sequence lengths. 0.7.5 (just released) should have squashed all of that and should work consistently.
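
To pick up the fix, upgrading to the released package should be sufficient, e.g. (mirroring the notebook-style install steps above):

!pip install --upgrade "deepspeed>=0.7.5"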

cmikeh2 closed this as completed Nov 14, 2022