
[CUDAExecutionProvider] Regression from ORT 1.15.0 onwards: Compute MatMul dimension mismatch #18692

Closed
fxmarty opened this issue Dec 4, 2023 · 6 comments
Labels
ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. stale issues that have not been addressed in a while; categorized by a bot

Comments

@fxmarty
Contributor

fxmarty commented Dec 4, 2023

Describe the issue

Hi, I noticed a regression in onnxruntime-gpu==1.15.1 and onnxruntime-gpu==1.16.3 (no problem with onnxruntime-gpu==1.14.1).

The following code runs fine on CPUExecutionProvider for all three ORT versions, but fails on CUDAExecutionProvider for 1.15.1 and 1.16.3.

import onnxruntime
import requests
from PIL import Image
from transformers import DetrImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = onnxruntime.InferenceSession("/path/to/model.onnx", providers=["CUDAExecutionProvider"])

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

inputs = processor(images=image, return_tensors="np")
inputs = {
    "pixel_values": inputs["pixel_values"]
}

outputs = model.run(None, inputs)

with the error:

Traceback (most recent call last):
  File "<tmp 1>", line 22, in <module>
    outputs = model.run(None, inputs)
  File "/home/fxmarty/anaconda3/envs/hf-inf/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:59 Compute MatMul dimension mismatch
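For context, the check that fails in matmul_helper.h is the standard MatMul inner-dimension constraint: an (M, K) operand can only be multiplied with a (K, N) operand. A minimal NumPy illustration of the kind of mismatch being reported (the shapes here are made up, not the model's actual tensors):

```python
import numpy as np

a = np.zeros((2, 4))          # shape (M, K)
b = np.zeros((4, 3))          # shape (K, N): inner dimensions agree
print(np.matmul(a, b).shape)  # (2, 3)

b_bad = np.zeros((5, 3))      # inner dimension 5 != 4
try:
    np.matmul(a, b_bad)
except ValueError as err:     # NumPy rejects the mismatched shapes
    print("dimension mismatch:", err)
```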

To reproduce

As above. Reproduce with https://huggingface.co/fxmarty/bugged-detr-ort-cuda/tree/main

Using CUDA 11.7, which should be compatible according to https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html

Urgency

medium

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

as above

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.7

@github-actions github-actions bot added ep:CUDA issues related to the CUDA execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. labels Dec 4, 2023
@fxmarty
Contributor Author

fxmarty commented Dec 4, 2023

cc @tianleiwu @yufenglee

I guess one may build with node I/O dumping enabled and run with ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1 to see which node the issue comes from.
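For reference, that dumping support is gated behind a build-time define plus a runtime environment variable; a hedged sketch of such a build (flag names taken from ORT's node I/O debugging docs, paths and script name assumed):

```shell
# Build ONNX Runtime with node input/output dumping compiled in
# (onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS is the CMake define per ORT's debug docs)
./build.sh --config RelWithDebInfo --use_cuda \
  --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1

# At runtime, enable shape dumping for every executed node,
# then rerun the reproduction script to see which node's shapes go wrong
export ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1
python repro.py
```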

github-actions bot

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale issues that have not been addressed in a while; categorized by a bot label Jan 22, 2024
@fxmarty
Contributor Author

fxmarty commented Mar 26, 2024

Hi @yufenglee @tianleiwu, this issue is not stale; a user has reported it for another architecture (table-transformer) as well, with onnxruntime-gpu==1.17.1: huggingface/optimum#1774

@tianleiwu
Contributor

tianleiwu commented Mar 26, 2024

The issue is resolved in the main branch.

I did reproduce it in 1.17.1:

  • The issue is gone after disabling all graph optimizations in the session options.
  • With the graph optimization level set to basic, the issue is still there.

So the issue is caused by some basic-level graph optimization. Given time, disabling the basic-level optimizers one by one would narrow down which optimizer is the cause.

@fxmarty
Contributor Author

fxmarty commented Mar 27, 2024

Thanks a lot @tianleiwu

@fxmarty fxmarty closed this as completed Mar 27, 2024
@thisisd3

thisisd3 commented Apr 18, 2024

Hi all, is this resolved in 1.17.3, released two days ago? @tianleiwu
