
[CUDAExecutionProvider] Regression from ORT 1.15.0 onwards: Compute MatMul dimension mismatch #18692

Closed
fxmarty opened this issue Dec 4, 2023 · 6 comments
Labels
ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. stale issues that have not been addressed in a while; categorized by a bot

Comments

@fxmarty
Contributor

fxmarty commented Dec 4, 2023

Describe the issue

Hi, I noticed a regression in onnxruntime-gpu==1.15.1 and onnxruntime-gpu==1.16.3 (no problem with onnxruntime-gpu==1.14.1).

The following code runs fine on CPUExecutionProvider for all three ORT versions, but fails on CUDAExecutionProvider for 1.15.1 and 1.16.3.

import onnxruntime
import requests
from PIL import Image
from transformers import DetrImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = onnxruntime.InferenceSession("/path/to/model.onnx", providers=["CUDAExecutionProvider"])

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

inputs = processor(images=image, return_tensors="np")
inputs = {
    "pixel_values": inputs["pixel_values"]
}

outputs = model.run(None, inputs)

with the error:

Traceback (most recent call last):
  File "<tmp 1>", line 22, in <module>
    outputs = model.run(None, inputs)
  File "/home/fxmarty/anaconda3/envs/hf-inf/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:59 Compute MatMul dimension mismatch
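For context, the check that fails in matmul_helper.h is the standard MatMul inner-dimension constraint: an (M, K) operand can only be multiplied with a (K, N) operand. A minimal NumPy illustration of the kind of mismatch being reported (the shapes here are made up, not the model's actual tensors):

```python
import numpy as np

a = np.zeros((2, 4))          # shape (M, K)
b = np.zeros((4, 3))          # shape (K, N): inner dimensions agree
print(np.matmul(a, b).shape)  # (2, 3)

b_bad = np.zeros((5, 3))      # inner dimension 5 != 4
try:
    np.matmul(a, b_bad)
except ValueError as err:     # NumPy rejects the mismatched shapes
    print("dimension mismatch:", err)
```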

To reproduce

As above. Reproduce with https://huggingface.co/fxmarty/bugged-detr-ort-cuda/tree/main

Using CUDA 11.7, which should be compatible according to https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

Urgency

medium

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

as above

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.7

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. labels Dec 4, 2023
@fxmarty
Contributor Author

fxmarty commented Dec 4, 2023

cc @tianleiwu @yufenglee

I guess one may build with node I/O dumping enabled and run with ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1 to see which node the issue comes from.
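For reference, that dumping support is gated behind a build-time define plus a runtime environment variable; a hedged sketch of such a build (flag names taken from ORT's node I/O debugging docs, paths and script name assumed):

```shell
# Build ONNX Runtime with node input/output dumping compiled in
# (onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS is the CMake define per ORT's debug docs)
./build.sh --config RelWithDebInfo --use_cuda \
  --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1

# At runtime, enable shape dumping for every executed node,
# then rerun the reproduction script to see which node's shapes go wrong
export ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1
python repro.py
```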

github-actions bot

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Jan 22, 2024
@fxmarty
Contributor Author

fxmarty commented Mar 26, 2024

Hi @yufenglee @tianleiwu, this issue is not stale; a user has reported it for another architecture (table-transformer) as well, with onnxruntime-gpu==1.17.1: huggingface/optimum#1774

@tianleiwu
Contributor

tianleiwu commented Mar 26, 2024

The issue is resolved in the main branch.

I did reproduce it in 1.17.1:

  • The issue is gone after disabling all graph optimizations in the session options.
  • With the graph optimization level set to basic, the issue is still there.

So the issue is caused by some basic-level graph optimization. Given time, disabling the basic-level optimizers one by one would narrow down which optimizer is the cause.

@fxmarty
Contributor Author

fxmarty commented Mar 27, 2024

Thanks a lot @tianleiwu

@fxmarty fxmarty closed this as completed Mar 27, 2024
@thisisd3

thisisd3 commented Apr 18, 2024

Hi all, is this resolved in 1.17.3, released two days ago? @tianleiwu
