DeepSpeed Inference support for OPT #1978

Closed
shijie-wu opened this issue May 25, 2022 · 3 comments · Fixed by #2205
Labels: enhancement (New feature or request), inference

@shijie-wu

Right now OPT (https://huggingface.co/docs/transformers/model_doc/opt) can only be supported via a custom kernel injection policy. It would be great to have official support. Thanks!

@shijie-wu shijie-wu added the enhancement New feature or request label May 25, 2022
@jeffra
Contributor

jeffra commented May 25, 2022

This is definitely on our upcoming TODO list to investigate. Are you saying you've tried your own custom kernel injection policy and it (partially?) works?

@shijie-wu
Author

Thanks! I haven't tried injecting it yet, but I assume it would look something like this — or, if that isn't supported, like the T5 example (injection_policy={T5Block: ('SelfAttention.o', 'EncDecAttention.o', 'DenseReluDense.wo')})?
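For reference, a minimal sketch of what such a policy might look like for OPT, by analogy with the T5 example above. The module names are taken from HF's OPTDecoderLayer (self_attn.out_proj is the attention output projection, fc2 the second MLP linear); treat them, and the mp_size/dtype arguments, as assumptions rather than a verified policy:

```python
# Hypothetical OPT injection policy, mirroring the T5 example above.
# The tuple lists the output-projection linears of each decoder layer.
OPT_POLICY_MODULES = ("self_attn.out_proj", "fc2")

def init_opt_inference(model, mp_size=1):
    # Imports are local so the policy tuple can be inspected even
    # without DeepSpeed/transformers installed.
    import torch
    import deepspeed
    from transformers.models.opt.modeling_opt import OPTDecoderLayer

    # Hand the policy to DeepSpeed's inference engine, analogous to
    # the T5Block policy shown in the comment above.
    return deepspeed.init_inference(
        model,
        mp_size=mp_size,
        dtype=torch.float16,
        injection_policy={OPTDecoderLayer: OPT_POLICY_MODULES},
    )
```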

@comaniac

comaniac commented Jun 17, 2022

I have the same requirement, so I wrote a custom policy for OPT and it seemed to work (i.e., the inference engine initialized successfully and I could see some nvcc commands, which I assume were compiling the fused kernels).

However, when actually performing inference, I got an error:

  File "/usr/local/lib/python3.7/dist-packages/transformers/models/opt/modeling_opt.py", line 706, in forward
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'layer_head_mask'

I dug into the code and found that this is due to an argument mismatch between the following two forward signatures:

  1. https://github.com/huggingface/transformers/blob/main/src/transformers/models/opt/modeling_opt.py#L281

I have no idea how to make DeepSpeedTransformerInference's arguments compatible with emerging models, though.
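The mismatch above can at least be detected up front instead of failing mid-forward. The sketch below is a hypothetical diagnostic (the helper name and the stand-in ds_forward signature are illustrations, not DeepSpeed's actual API) that compares the kwargs a caller would pass against a forward function's signature using only the standard library:

```python
import inspect

def unsupported_kwargs(forward_fn, call_kwargs):
    """Return the call kwargs that forward_fn's signature would reject."""
    sig = inspect.signature(forward_fn)
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD
           for p in sig.parameters.values()):
        return set()
    return set(call_kwargs) - set(sig.parameters)

# Stand-in for the DeepSpeed-side forward signature described above.
def ds_forward(input, input_mask=None, head_mask=None, get_key_value=False):
    pass

# A subset of the kwargs the HF OPT decoder passes to each layer.
hf_kwargs = {"input": None, "layer_head_mask": None, "past_key_value": None}
print(sorted(unsupported_kwargs(ds_forward, hf_kwargs)))
# ['layer_head_mask', 'past_key_value']
```

Running such a check right after injection would surface the incompatible argument names before the first inference call.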

As a workaround, I manually renamed the arguments of DeepSpeedTransformerInference (get_key_value -> past_key_value and head_mask -> layer_head_mask). Although it then seemed to work, I got the following error and decided to stop for now:

RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1193, unhandled cuda error, NCCL version 2.10.3
ncclUnhandledCudaError: Call to CUDA function failed.
terminate called after throwing an instance of 'c10::CUDAError'
  what():  CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from get at ../c10/cuda/CUDACachingAllocator.cpp:315 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f8b936e1afe in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x2656e (0x7f8bbc85556e in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10_cuda.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x257 (0x7f8bbc859b87 in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x486fbe (0x7f8be60f0fbe in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::~TensorImpl() + 0x9 (0x7f8b936c9749 in /usr/local/lib/python3.7/dist-packages/torch/lib/libc10.so)
frame #5: <unknown function> + 0x6d13d0 (0x7f8be633b3d0 in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x308 (0x7f8be633b7c8 in /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_python.so)
frame #7: PyDict_SetItem + 0x337 (0x5aa957 in /usr/bin/python3)
frame #8: _PyModule_ClearDict + 0x20a (0x59bbea in /usr/bin/python3)
frame #9: PyImport_Cleanup + 0x354 (0x5248d4 in /usr/bin/python3)
frame #10: Py_FinalizeEx + 0x6e (0x62c79e in /usr/bin/python3)
frame #11: /usr/bin/python3() [0x650de0]
frame #12: _Py_UnixMain + 0x2e (0x6511be in /usr/bin/python3)
frame #13: __libc_start_main + 0xe7 (0x7f8db2bcdbf7 in /lib/x86_64-linux-gnu/libc.so.6)
frame #14: _start + 0x2a (0x5d141a in /usr/bin/python3)
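An alternative to editing DeepSpeed's source for the rename described above would be a small shim that maps the HF-side keyword names onto the ones the injected layer expects. This is a hypothetical sketch (the rename table mirrors the two renames mentioned in the comment; wrap_forward is an illustrative helper, not part of either library):

```python
# Map HF OPT kwarg names to the names the injected layer's forward
# accepts (the two renames discussed above; assumed, not verified).
RENAMES = {"layer_head_mask": "head_mask", "past_key_value": "get_key_value"}

def rename_kwargs(kwargs, renames=RENAMES):
    """Return a copy of kwargs with keys mapped through `renames`."""
    return {renames.get(k, k): v for k, v in kwargs.items()}

def wrap_forward(module, renames=RENAMES):
    """Wrap module.forward so incoming kwargs are renamed first."""
    inner = module.forward

    def forward(*args, **kwargs):
        return inner(*args, **rename_kwargs(kwargs, renames))

    module.forward = forward
    return module
```

Note this only papers over the names: past_key_value (a tensor pair) and get_key_value (a flag) likely differ in meaning as well, which may be related to the CUDA crash above.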
