DeepSpeed Inference support for OPT #1978
Comments
This is definitely on our upcoming TODO list to investigate. Are you saying you've tried your own custom kernel injection policy and it (partially?) works?
Thanks! I haven't tried to inject it yet, but I assume it would look something like this, unless it's not supported, in which case it would look like the T5 example.
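For context, here is a hedged sketch of what such a generic injection policy could look like with `deepspeed.init_inference`, following the T5-style pattern. The block class and attribute names (`OPTDecoderLayer`, `self_attn.out_proj`, `fc2`) are assumptions taken from transformers' OPT implementation, not confirmed in this thread:

```python
# Hedged sketch: generic kernel-injection policy for OPT, mirroring the
# T5-style injection_policy pattern. OPTDecoderLayer / self_attn.out_proj / fc2
# are assumed names from transformers' OPT implementation.
import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.models.opt.modeling_opt import OPTDecoderLayer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.float16,
    # Map the transformer block class to the linear layers whose outputs
    # need an all-reduce under tensor parallelism.
    injection_policy={OPTDecoderLayer: ("self_attn.out_proj", "fc2")},
)
model = ds_engine.module
```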
I have the same requirement, so I wrote a custom policy for OPT and it seems to work (i.e., the inference engine was initialized successfully). However, when actually performing inference, I got an error:
I dug into the code and found that this is due to an argument mismatch between the following two forward signatures:
I have no idea how to fix this properly. As a workaround, I manually changed the arguments of DeepSpeedTransformerInference (get_key_value -> past_key_value and head_mask -> layer_head_mask). Although it seemed to work again, I hit another error and decided to stop for now.
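For readers trying the same route, here is a minimal sketch of that keyword-renaming workaround written as a thin wrapper instead of editing DeepSpeedTransformerInference in place. The OPT-side names (`layer_head_mask`, `past_key_value`) and DeepSpeed-side names (`head_mask`, `get_key_value`) are the ones mentioned above; the real signatures depend on the transformers/DeepSpeed versions in use:

```python
# Hedged sketch of the keyword-renaming workaround described above. The kwarg
# names on both sides are taken from this thread and may differ across versions.
import torch.nn as nn


class OPTToDeepSpeedAdapter(nn.Module):
    """Wraps an injected DeepSpeedTransformerInference layer so it accepts the
    kwargs that OPTDecoderLayer.forward passes."""

    def __init__(self, ds_layer):
        super().__init__()
        self.ds_layer = ds_layer  # the DeepSpeedTransformerInference instance

    def forward(self, hidden_states, attention_mask=None, layer_head_mask=None,
                past_key_value=None, **kwargs):
        # Rename OPT-style kwargs to the names the kernel module expects;
        # any remaining kwargs are dropped in this sketch.
        return self.ds_layer(
            hidden_states,
            attention_mask=attention_mask,
            head_mask=layer_head_mask,
            get_key_value=past_key_value,
        )
```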
Right now OPT (https://huggingface.co/docs/transformers/model_doc/opt) can only be supported via a custom kernel injection policy. It would be great if there were official support. Thanks!