DeepSpeed Inference support for OPT #1978

Closed
shijie-wu opened this issue May 25, 2022 · 3 comments · Fixed by #2205
Labels: enhancement (New feature or request), inference

@shijie-wu

Right now OPT (https://huggingface.co/docs/transformers/model_doc/opt) can only be supported via a custom kernel injection policy. It would be great to have official support. Thanks!

@shijie-wu shijie-wu added the enhancement New feature or request label May 25, 2022
@jeffra
Contributor

jeffra commented May 25, 2022

This is definitely on our upcoming TODO list to investigate. Are you saying you've tried your own custom kernel injection policy and it (partially?) works?

@shijie-wu
Author

Thanks! I haven't tried injecting it yet, but I assume it would look something like this — or, if that isn't supported, like the T5 example (injection_policy={T5Block: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')})?
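For reference, a minimal sketch of what such a policy might look like for OPT, by analogy with the T5 example above. The module names are taken from HF's OPTDecoderLayer (self_attn.out_proj is the attention output projection, fc2 the second MLP linear); treat them, and the mp_size/dtype arguments, as assumptions rather than a verified policy:

```python
# Hypothetical OPT injection policy, mirroring the T5 example above.
# The tuple lists the output-projection linears of each decoder layer.
OPT_POLICY_MODULES = ("self_attn.out_proj", "fc2")

def init_opt_inference(model, mp_size=1):
    # Imports are local so the policy tuple can be inspected even
    # without DeepSpeed/transformers installed.
    import torch
    import deepspeed
    from transformers.models.opt.modeling_opt import OPTDecoderLayer

    # Hand the policy to DeepSpeed's inference engine, analogous to
    # the T5Block policy shown in the comment above.
    return deepspeed.init_inference(
        model,
        mp_size=mp_size,
        dtype=torch.float16,
        injection_policy={OPTDecoderLayer: OPT_POLICY_MODULES},
    )
```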

@comaniac

comaniac commented Jun 17, 2022

I have the same requirement, so I wrote a custom policy for OPT and it seemed to work (i.e., the inference engine initialized successfully and I could see some nvcc commands, which I assume were compiling the fused kernels).

However, when actually performing inference, I got an error:

  File "/usr/local/lib/python3.7/dist-packages/transformers/models/opt/modeling_opt.py", line 706, in forward
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'layer_head_mask'

I dug into the code and found that this is due to an argument mismatch between the following two forward signatures:

  1. https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py#L281

I have no idea how to make DeepSpeedTransformerInference's arguments compatible with emerging models, though.
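The mismatch above can at least be detected up front instead of failing mid-forward. The sketch below is a hypothetical diagnostic (the helper name and the stand-in ds_forward signature are illustrations, not DeepSpeed's actual API) that compares the kwargs a caller would pass against a forward function's signature using only the standard library:

```python
import inspect

def unsupported_kwargs(forward_fn, call_kwargs):
    """Return the call kwargs that forward_fn's signature would reject."""
    sig = inspect.signature(forward_fn)
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD
           for p in sig.parameters.values()):
        return set()
    return set(call_kwargs) - set(sig.parameters)

# Stand-in for the DeepSpeed-side forward signature described above.
def ds_forward(input, input_mask=None, head_mask=None, get_key_value=False):
    pass

# A subset of the kwargs the HF OPT decoder passes to each layer.
hf_kwargs = {"input": None, "layer_head_mask": None, "past_key_value": None}
print(sorted(unsupported_kwargs(ds_forward, hf_kwargs)))
# ['layer_head_mask', 'past_key_value']
```

Running such a check right after injection would surface the incompatible argument names before the first inference call.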

As a workaround, I manually renamed the arguments of DeepSpeedTransformerInference (get_key_value -> past_key_value and head_mask -> layer_head_mask). Although it then seemed to work, I got the following error and decided to stop for now:

RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1193, unhandled cuda error, NCCL version 2.10.3
ncclUnhandledCudaError: Call to CUDA function failed.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from get at ../c10/cuda/CUDACachingAllocator.cpp:315 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f8b936e1afe in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2656e (0x7f8bbc85556e in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f8bbc859b87 in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x486fbe (0x7f8be60f0fbe in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f8b936c9749 in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x6d13d0 (0x7f8be633b3d0 in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x308 (0x7f8be633b7c8 in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #7: PyDict_SetItem + 0x337 (0x5aa957 in /usr/bin/python3)
frame #8: _PyModule_ClearDict + 0x20a (0x59bbea in /usr/bin/python3)
frame #9: PyImport_Cleanup + 0x354 (0x5248d4 in /usr/bin/python3)
frame #10: Py_FinalizeEx + 0x6e (0x62c79e in /usr/bin/python3)
frame #11: /usr/bin/python3() [0x650de0]
frame #12: _Py_UnixMain + 0x2e (0x6511be in /usr/bin/python3)
frame #13: __libc_start_main + 0xe7 (0x7f8db2bcdbf7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: _start + 0x2a (0x5d141a in /usr/bin/python3)
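An alternative to editing DeepSpeed's source for the rename described above would be a small shim that maps the HF-side keyword names onto the ones the injected layer expects. This is a hypothetical sketch (the rename table mirrors the two renames mentioned in the comment; wrap_forward is an illustrative helper, not part of either library):

```python
# Map HF OPT kwarg names to the names the injected layer's forward
# accepts (the two renames discussed above; assumed, not verified).
RENAMES = {"layer_head_mask": "head_mask", "past_key_value": "get_key_value"}

def rename_kwargs(kwargs, renames=RENAMES):
    """Return a copy of kwargs with keys mapped through `renames`."""
    return {renames.get(k, k): v for k, v in kwargs.items()}

def wrap_forward(module, renames=RENAMES):
    """Wrap module.forward so incoming kwargs are renamed first."""
    inner = module.forward

    def forward(*args, **kwargs):
        return inner(*args, **rename_kwargs(kwargs, renames))

    module.forward = forward
    return module
```

Note this only papers over the names: past_key_value (a tensor pair) and get_key_value (a flag) likely differ in meaning as well, which may be related to the CUDA crash above.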
