Non-zero status code returned #11548

Open
pythondever opened this issue May 17, 2022 · 21 comments

@pythondever

Describe the bug
I exported an ONNX model that runs with CPUExecutionProvider, but when I use the GPU (CUDA) execution provider I get this error:

2022-05-17 17:30:35.309693323 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=pc-Z390-GAMING-X ; expr=cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data);
2022-05-17 17:30:35.309720084 [E:onnxruntime:, sequential_executor.cc:364 Execute] Non-zero status code returned while running FusedConv node. Name:'Conv_34' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)
Traceback (most recent call last):
  File "pred_onnx.py", line 97, in <module>
    pred_onnx(model, im_path)
  File "pred_onnx.py", line 80, in pred_onnx
    pred = session.run([output_name], {input_name: image})[0]
  File "/home/pc/anaconda3/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 192, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'Conv_34' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)

How can I fix this error?

System information

  • OS Platform and Distribution: Ubuntu 18.04
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version: 1.9.0
  • Python version: 3.8.8
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 11.1/8.0
  • GPU model and memory: NVIDIA GeForce 3070
@yuslepukhin
Member

It may be a bug. Try the latest ORT release, or share with us your model to investigate.

yuslepukhin added the ep:CUDA (issues related to the CUDA execution provider) label on May 17, 2022
@pythondever
Author

It may be a bug. Try the latest ORT release, or share with us your model to investigate.

I've tried that, but it doesn't work.

@pythondever
Author

It may be a bug. Try the latest ORT release, or share with us your model to investigate.

Model file: https://github.com/pythondever/_onnx_demo_error

@pythondever
Author

@yuslepukhin Is there any update?

@yuslepukhin
Member

@yuslepukhin Is there any update?

I did not get a chance to look at it yet.

@beyonehan

beyonehan commented May 25, 2022

@yuslepukhin Is there any update?

I did not get a chance to look at it yet.

I have the same problem. I have tried everything I can, but it still doesn't work. Could you give me some suggestions? Many thanks.

@yuslepukhin
Member

Please follow the issue reporting template and submit a program that reproduces the behavior; that would speed things up a lot. Please attach the input data and your model if you are able to. Then we will be able to determine what the issue is and the best course of action.

@pythondever
Author

@yuslepukhin Thanks for the reply!
I pushed my demo program to https://github.com/pythondever/_onnx_demo_error
and you can reproduce the error by running python pred_onnx.py.

@yuslepukhin
Member

@yuslepukhin Thanks for the reply! I pushed my demo program to https://github.com/pythondever/_onnx_demo_error and you can reproduce the error by running python pred_onnx.py.

I have seen the link; it shows up as an empty folder.

@pythondever
Author

@yuslepukhin Thanks for the reply! I pushed my demo program to https://github.com/pythondever/_onnx_demo_error and you can reproduce the error by running python pred_onnx.py.

I have seen the link; it shows up as an empty folder.

Really sorry, I copied the address again. It's ready to go:

https://github.com/pythondever/_onnx_demo_error

pythondever reopened this on May 27, 2022
@Eliza-and-black

I got a similar error:
"onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Mul node. Name:'Mul_710' Status Message: Mul_710: right operand cannot broadcast on dim 3 LeftShape: {1,3,15,25,2}, RightShape: {1,3,20,20,2}"

@beyonehan

2022-05-17 17:30:35.309693323 [E:onnxruntime:Default, cuda_call.cc:118 CudaCall] CUDNN failure 3: CUDNN_STATUS_BAD_PARAM ; GPU=0 ; hostname=pc-Z390-GAMING-X ; expr=cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)

Hi, I am really looking forward to your answer. Many thanks. I have the same problem as pythondever. I have tried everything I could to solve it, but it still remains.

@hosea7456

@yuslepukhin
I recently came across the same issue when running my code.
Describe the bug:
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'p2o.Conv.68' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)

My model is available at the following link:
https://drive.google.com/file/d/1GiNLsQropC2YJtlmWW2z6RWBVwcyDEQh/view?usp=sharing

Any help would be greatly appreciated!

@yuslepukhin
Member

I got a similar error: "onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Mul node. Name:'Mul_710' Status Message: Mul_710: right operand cannot broadcast on dim 3 LeftShape: {1,3,15,25,2}, RightShape: {1,3,20,20,2}"

This is a very different issue, and the problem is in the model: the two shapes cannot be broadcast together, because dimensions 2 and 3 differ (15 vs. 20 and 25 vs. 20) and neither side is 1. Please report this separately.
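
To see why those two shapes are rejected, here is a minimal sketch of the usual trailing-aligned broadcasting rule (two sizes are compatible when they are equal or one of them is 1). The helper name incompatible_dims is hypothetical and is not part of ONNX Runtime; it only applies the rule to the shapes quoted in the error message.

```python
from itertools import zip_longest


def incompatible_dims(left, right):
    """Return the axis indices at which two shapes cannot be broadcast.

    Shapes are aligned from the trailing dimension; a missing leading
    dimension is treated as size 1. (Hypothetical helper for illustration.)
    """
    rank = max(len(left), len(right))
    bad = []
    pairs = zip_longest(reversed(left), reversed(right), fillvalue=1)
    for offset, (a, b) in enumerate(pairs):
        # Sizes are compatible only if equal or if one of them is 1.
        if a != b and a != 1 and b != 1:
            bad.append(rank - 1 - offset)
    return sorted(bad)


# Shapes taken from the Mul_710 error message in this thread.
left_shape = (1, 3, 15, 25, 2)
right_shape = (1, 3, 20, 20, 2)
print(incompatible_dims(left_shape, right_shape))  # [2, 3]
```

Walking from the trailing dimension, the first mismatch found is dim 3 (25 vs. 20), which is the dimension named in the error message; dim 2 (15 vs. 20) is incompatible as well.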

@yuslepukhin
Member

@yuslepukhin Thanks for the reply! I pushed my demo program to https://github.com/pythondever/_onnx_demo_error and you can reproduce the error by running python pred_onnx.py.

I have seen the link; it shows up as an empty folder.

Really sorry, I copied the address again. It's ready to go:

https://github.com/pythondever/_onnx_demo_error

This still does not open.

@yuslepukhin
Member

@yuslepukhin I recently came across the same issue when running my code. Describe the bug: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'p2o.Conv.68' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)

My model is available at the following link: https://drive.google.com/file/d/1GiNLsQropC2YJtlmWW2z6RWBVwcyDEQh/view?usp=sharing

Any help would be greatly appreciated!

Got this. It would be nice if you shared a script to drive it with inputs. Does it work on CPU?

@hosea7456

@yuslepukhin I recently came across the same issue when running my code. Describe the bug: onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedConv node. Name:'p2o.Conv.68' Status Message: CUDNN error executing cudnnAddTensor(Base::CudnnHandle(), &alpha, Base::s_.z_tensor, Base::s_.z_data, &alpha, Base::s_.y_tensor, Base::s_.y_data)
My model is available at the following link: https://drive.google.com/file/d/1GiNLsQropC2YJtlmWW2z6RWBVwcyDEQh/view?usp=sharing
Any help would be greatly appreciated!

Got this. It would be nice if you shared a script to drive it with inputs. Does it work on CPU?

Many thanks for your immediate reply!
Yes, it runs well on CPU but fails to run on GPU. My environment is: ONNX Runtime 1.11.1, CUDA 11.4, cuDNN 8.2.
Here is the code for running inference:
https://drive.google.com/drive/folders/1fWoIE2q1t6DJYgyvCIPQcwrsZjdXR4rY?usp=sharing

@yuslepukhin
Member

I have been able to reproduce your issue. As an immediate workaround, you can enable only basic graph optimizations for CUDA runs:
import onnxruntime as ort
# args.onnx_file_path and providers come from your inference script
sess_opt = ort.SessionOptions()
sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
session = ort.InferenceSession(args.onnx_file_path, sess_options=sess_opt, providers=providers)

This will likely cost some performance while you are waiting for a fix.
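
If you want to keep full optimizations for CPU-only runs and apply the workaround only when CUDA is requested, something along these lines may help. This is a sketch under assumptions, not a fix from the ONNX Runtime team: the helper names optimization_level_name and make_session are hypothetical, and the choice of ORT_ENABLE_ALL for CPU-only runs is an assumption on my part. onnxruntime is imported lazily so the selection logic can be checked on a machine without it.

```python
def optimization_level_name(providers):
    """Pick a graph optimization level name for a provider list.

    Entries may be plain names or (name, options) tuples, as accepted by
    onnxruntime.InferenceSession. (Hypothetical helper for illustration.)
    """
    names = [p if isinstance(p, str) else p[0] for p in providers]
    if "CUDAExecutionProvider" in names:
        # Workaround from this thread: basic optimizations only on CUDA.
        return "ORT_ENABLE_BASIC"
    # Assumption: CPU-only runs keep all optimizations enabled.
    return "ORT_ENABLE_ALL"


def make_session(model_path, providers):
    """Create an InferenceSession with the level chosen above."""
    import onnxruntime as ort  # lazy import; requires onnxruntime installed

    opts = ort.SessionOptions()
    opts.graph_optimization_level = getattr(
        ort.GraphOptimizationLevel, optimization_level_name(providers)
    )
    return ort.InferenceSession(model_path, sess_options=opts, providers=providers)


print(optimization_level_name(["CUDAExecutionProvider", "CPUExecutionProvider"]))
print(optimization_level_name(["CPUExecutionProvider"]))
```

In your script you would then call make_session(args.onnx_file_path, providers) in place of constructing the session directly.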

yuslepukhin added the core runtime (issues related to core runtime) and component:operator labels on Jun 17, 2022
@pythondever
Author

ort.GraphOptimizationLevel.ORT_ENABLE_BASIC

Yes, it works now, thanks!

@hosea7456

I have been able to reproduce your issue. As an immediate workaround, you can enable only basic graph optimizations for CUDA runs:
sess_opt = ort.SessionOptions()
sess_opt.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
session = ort.InferenceSession(args.onnx_file_path, sess_options=sess_opt, providers=providers)

This will likely cost some performance while you are waiting for a fix.

The problem is solved, thanks a lot. Looking forward to the fixed version!

felixhjh added a commit to felixhjh/fastdeploy_ci that referenced this issue Nov 26, 2022
@gcunhase

gcunhase commented Dec 1, 2023

Has this been fixed? I'm getting the following error on mine:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'/stages.0/stages.0.1/dwconv/Conv' Status Message: X num_dims does not match W num_dims. X: {1,96,128,256} W: {96}
