Closed as not planned
Labels
ep:CUDA (issues related to the CUDA execution provider), performance (issues related to performance regressions), quantization (issues related to quantization), stale (issues that have not been addressed in a while; categorized by a bot)
Description
Describe the issue
ONNX Runtime does not appear to use the GPU when running inference with the CUDA execution provider: inference with the GPU session takes longer than with the CPU session.

During inference, GPU utilization stays at 0%.

While the model is being loaded, however, GPU utilization reaches 100%.

To reproduce
import onnxruntime as ort
import numpy as np
import time

# The model is a YOLOv5 model.
# session1 requests the CUDA provider first, with CPU as fallback.
session1 = ort.InferenceSession(
    path_or_bytes=r"D:\data\liujc\model\person_vehicle_onnx\1\person_vehicle_v3.4.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
# session2 runs the same model on the CPU provider only.
session2 = ort.InferenceSession(
    path_or_bytes=r"D:\data\liujc\model\person_vehicle_onnx\1\person_vehicle_v3.4.onnx",
    providers=["CPUExecutionProvider"],
)
while True:
    img = np.random.rand(1, 3, 640, 640).astype(np.float32)
    # Time one inference on the CUDA session.
    t = time.time()
    session1.run(None, {session1.get_inputs()[0].name: img})
    print(session1.get_providers())
    print("gpu cost time %fs" % (time.time() - t))
    time.sleep(0.1)
    # Time one inference on the CPU session with the same input.
    t = time.time()
    session2.run(None, {session2.get_inputs()[0].name: img})
    print(session2.get_providers())
    print("cpu cost time %fs" % (time.time() - t))
    time.sleep(0.1)
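The script above times single calls with `time.time()` and includes the `print` call in the measured interval, so the first iterations in particular may reflect one-time setup cost rather than steady-state inference. A minimal sketch of a steadier measurement follows. The helper name `benchmark` and its parameters `warmup` and `iters` are hypothetical and not part of ONNX Runtime; it only assumes a zero-argument callable.

```python
import statistics
import time


def benchmark(run_once, warmup=5, iters=20):
    """Return the median wall-clock seconds of run_once() after warm-up calls."""
    # Warm-up calls are executed but not timed, so one-time setup cost
    # is excluded from the samples.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow iterations than the mean.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a model file.
    print("median: %fs" % benchmark(lambda: sum(range(100_000))))
```

With the sessions from the script above, the same helper could be called as `benchmark(lambda: session1.run(None, {session1.get_inputs()[0].name: img}))` and likewise for `session2`, which would make the GPU and CPU numbers directly comparable.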
Urgency
No
Platform
Windows
OS Version
windows 11
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-gpu 1.17.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.8
Model File
No response
Is this a quantized model?
Yes
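Because the model is reported as quantized, it may be worth checking which quantized operators it contains: ONNX Runtime leaves any node that the requested provider cannot run to the CPU provider, and whether that happens here is an assumption, not something this report confirms. The sketch below counts standard ONNX quantization operator names in a list of op types. The helper `quantized_op_counts` and the set `QUANT_OPS` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Standard ONNX operator names that perform quantized or integer arithmetic,
# or that convert between float and quantized tensors.
QUANT_OPS = {
    "QuantizeLinear",
    "DequantizeLinear",
    "DynamicQuantizeLinear",
    "QLinearConv",
    "QLinearMatMul",
    "ConvInteger",
    "MatMulInteger",
}


def quantized_op_counts(op_types):
    """Count how often each quantization-related op type appears."""
    return Counter(op for op in op_types if op in QUANT_OPS)


if __name__ == "__main__":
    # Made-up op list for demonstration only; not taken from the reported model.
    example = ["Conv", "QLinearConv", "QLinearConv", "DequantizeLinear", "Relu"]
    for name, count in sorted(quantized_op_counts(example).items()):
        print(name, count)
```

Assuming the `onnx` package is installed, the op list for a real model can be built with `[n.op_type for n in onnx.load(path).graph.node]`, where `path` is the model file.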