can quantized onnx t5 run on GPU #8

Closed
gyin94 opened this issue Apr 8, 2021 · 3 comments

Comments

gyin94 commented Apr 8, 2021

Just wondering whether this quantized ONNX T5 model can run on a GPU.

Owner

Ki6an commented Apr 8, 2021

Unfortunately, no. Quantized ONNX models can only run on CPU, and onnxruntime-gpu does not support quantization.
If you want more details on this question, you should create an issue in the onnxruntime repository.

gyin94 closed this as completed Apr 9, 2021

mblank5 commented Dec 15, 2021

@Ki6an Hi, is onnxruntime-gpu supported on a V100 or T4 if we don't use quantization?

I see that with onnxruntime alone we can get a 2x speedup?

Owner

Ki6an commented Dec 18, 2021

@mblank5 No. This library uses onnxruntime, and to support GPUs you need to have onnxruntime-gpu installed.

However, you can uninstall onnxruntime after the fastt5 library is installed, install onnxruntime-gpu instead, and try running the model. I'm not sure you'll get a speedup, though. For more info, refer to this issue.

With onnxruntime, you'll get a speedup if you are using a modern CPU with more cores. Refer to the benchmark section of the README.
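A rough sketch of how one might check that the swap worked, assuming the non-quantized ONNX files are used. The helper names `pick_providers` and `make_session` and the `model_path` argument are hypothetical; only `get_available_providers` and `InferenceSession(..., providers=...)` are onnxruntime calls. This only shows whether the CUDA provider is visible to onnxruntime. It does not show how fastt5 itself creates its sessions, and it says nothing about whether the model will actually run faster.

```python
# Shell steps described above (run after fastt5 is installed):
#   pip uninstall -y onnxruntime
#   pip install onnxruntime-gpu


def pick_providers(available):
    """Return execution providers in preference order: GPU first, then CPU."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


try:
    import onnxruntime as ort

    available = ort.get_available_providers()
except ImportError:
    # onnxruntime is not installed in this environment
    available = []

providers = pick_providers(available)
print("available providers:", available)
print("will request:", providers)


def make_session(model_path):
    """Open an ONNX model with the chosen providers (model_path is hypothetical)."""
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=providers)
```

If `CUDAExecutionProvider` is missing from the printed list, onnxruntime will run on CPU even with onnxruntime-gpu installed, which usually points to a CUDA or cuDNN setup problem.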
