Unable to use GPU accelerated Optimum Onnx transformer model for inference #580
Comments
Thanks for the report! The ONNX Runtime pipeline follows the same schema as transformers: https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.TextClassificationPipeline. So you need to pass the device argument to the pipeline. But I agree the error message is not ideal, we can fix that! I'll add an example with pipelines as well in the guide you linked.
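For reference, a minimal sketch of the corrected pipeline call, assuming ort_model and tokenizer are created as in the script below:

from optimum.pipelines import pipeline

# Passing `device` makes the pipeline place input tensors on the GPU,
# matching an ORTModel loaded with provider="CUDAExecutionProvider".
pipe = pipeline(
    task="text-classification",
    model=ort_model,
    tokenizer=tokenizer,
    device=0,  # or device="cuda:0"
)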
How am I still able to run the same code without device=0 on 1.4.1? Is there something wrong with what I'm doing here?
The full script, working well with optimum-1.5.1, is:
from optimum.onnxruntime import ORTModelForSequenceClassification
ort_model = ORTModelForSequenceClassification.from_pretrained(
"philschmid/tiny-bert-sst2-distilled",
from_transformers=True,
provider="CUDAExecutionProvider",
)
from optimum.pipelines import pipeline
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
It works on 1.4.1 as well, but I would advise updating.
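To double-check that inference actually runs on the GPU, you can print the session's active execution providers. A minimal sketch, assuming the underlying onnxruntime.InferenceSession is exposed as ort_model.model (as in optimum 1.5.x):

# "CUDAExecutionProvider" should be listed first when GPU inference is active.
print(ort_model.model.get_providers())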
Got it. It would be helpful if you could update the Optimum GPU docs to reflect the code above and improve the error message. Feel free to close the issue. Thanks for the quick resolution.
Will do, thanks for reporting this gap in the docs! We are open to contributions as well!
Oh, I would love to take this up and contribute to the docs, and suggest a better error message. I'll take this up if that's fine?
For sure, thanks a lot! Don't hesitate if you need any guidance!
@fxmarty could you help me with where exactly I need to handle this in the code to raise a better error? Here is the stack trace:
Thank you for your PR, it's great! I think there could be a check here that the tensors are on the right device: optimum/optimum/onnxruntime/modeling_ort.py, lines 858 to 860 in d695659.
Although I am not sure it's worth introducing more checks directly in ORTModel, what do you think @JingyaHuang? Alternatively, it could be possible to raise an error at line 258 in d695659 when device is not passed to the pipeline.
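A hypothetical sketch of what such a check could look like (the helper name _check_inputs_device and its placement are illustrative, not the actual optimum code):

import torch

def _check_inputs_device(model_device: torch.device, **model_inputs):
    # Illustrative: verify every input tensor is already on the device the
    # model is bound to before attempting IOBinding, so the user gets a
    # clearer message than the raw onnxruntime binding error.
    for name, value in model_inputs.items():
        if isinstance(value, torch.Tensor) and value.device != model_device:
            raise ValueError(
                f"Input '{name}' is on {value.device}, but the model runs on "
                f"{model_device}. Did you forget to pass `device` to the pipeline?"
            )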
@smiraldr So as I understand it, this was in fact a device indexing issue, which @JingyaHuang fixed in #613. So your PR looks good as is, moving the discussion there!
System Info
Who can help?
@JingyaHuang @echarlaix
When following the documentation at https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu with optimum version 1.5.0, we get the following error:
RuntimeError                              Traceback (most recent call last)
in
     19         "education",
     20         "music"]
---> 21 pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)

[... 8 frames ...]

/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in bind_input(self, name, device_type, device_id, element_type, shape, buffer_ptr)
    454         :param buffer_ptr: memory pointer to input data
    455         """
--> 456         self._iobinding.bind_input(
    457             name,
    458             C.OrtDevice(

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]
This is reproducible on a Google Colab GPU instance as well. The binding error indicates a device mismatch between the CUDA-bound model and the input tensors. It is observed from version 1.5.0 only; 1.4.1 works as expected.
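As a possible workaround when calling the model directly rather than through a pipeline, the input tensors can be moved to the CUDA device by hand. A minimal sketch under that assumption:

# Tokenize on CPU, then move the tensors to the same CUDA device as the model.
inputs = tokenizer("Both the music and visual were astounding.", return_tensors="pt")
inputs = {name: tensor.to("cuda:0") for name, tensor in inputs.items()}
outputs = ort_model(**inputs)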
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
!pip install optimum[onnxruntime-gpu]==1.5.1
!pip install transformers onnx
from optimum.onnxruntime import ORTModelForSequenceClassification
ort_model = ORTModelForSequenceClassification.from_pretrained(
"philschmid/tiny-bert-sst2-distilled",
from_transformers=True,
provider="CUDAExecutionProvider",
)
from optimum.pipelines import pipeline
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer)  # no `device` passed, which triggers the error
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
Expected behavior
Inference should succeed on GPU; instead it fails with the device error above.