GPU bug with Unpooling layer and large size inputs #3113

Open
belgraviton opened this issue Mar 1, 2020 · 12 comments

@belgraviton

Describe the bug
A SegNet-like model with a large input size and an unpooling layer (pooling with return_indices=True) fails to run on GPU.
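
For readers unfamiliar with the layer pair involved, here is a minimal pure-Python sketch of max pooling that returns indices and the matching unpooling step, for a single 2-D plane. It illustrates the general technique only; it is not ONNX Runtime's CUDA kernel, and the function names, the fixed 2x2 window, and the row-major index convention are assumptions made for this example.

```python
def max_pool_2x2(plane):
    """2x2, stride-2 max pooling over a 2-D list.

    Returns (pooled, indices). Each index is the row-major position of the
    selected maximum inside the input plane (assumed convention).
    """
    h, w = len(plane), len(plane[0])
    pooled, indices = [], []
    for r in range(0, h - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, w - 1, 2):
            # Collect (value, flat index) for the four cells of the window.
            window = [
                (plane[r + dr][c + dc], (r + dr) * w + (c + dc))
                for dr in (0, 1)
                for dc in (0, 1)
            ]
            value, index = max(window, key=lambda cell: cell[0])
            pooled_row.append(value)
            index_row.append(index)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Scatter pooled values back to their recorded positions; the rest is 0."""
    flat = [0] * (out_h * out_w)
    for pooled_row, index_row in zip(pooled, indices):
        for value, index in zip(pooled_row, index_row):
            # An index outside the output buffer would be an out-of-bounds write.
            if not 0 <= index < len(flat):
                raise IndexError(f"index {index} outside output of size {len(flat)}")
            flat[index] = value
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]
```

The explicit bounds check shows why the indices and the output size have to agree; it is not a diagnosis of this bug.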

Urgency
Use of the unpooling layer on GPU with large inputs is blocked by this issue.

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • ONNX Runtime installed from: source
  • ONNX Runtime version: 1.1.1 (2cec09a, 2020-01-22)
  • Python version: 3.7
  • Visual Studio version (if applicable): -
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 10.1 / 7.6.4
  • GPU model and memory: 1080 Ti / 11 GB

To Reproduce
Link to source and models.
Compile test.cpp, which uses the ionnx class interface to onnxruntime, and run it with the command:
“test model_name.onnx”

Expected behavior
Should successfully run.

Additional context

All tests were carried out with the C++ interface (CPU and GPU) and the Python interface (CPU). The “Upsample” model was converted directly from PyTorch. The “Unpooling” models were created manually in Python (example link).

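| Input size | Upscale layer | CPU (Python and C++) | GPU (C++) |
|------------|---------------|----------------------|-----------|
| 128x64x1   | Unpooling     | OK                   | OK        |
| 512x256x1  | Unpooling     | OK                   | FAIL      |
| 512x256x1  | Upsample      | OK                   | OK        |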

I get “Process finished with exit code 135 (interrupted by signal 7: SIGEMT)” on Ubuntu 16.04 with onnxruntime built from source.
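
As a reading aid for that message: exit code 135 follows the common shell convention of 128 plus the signal number, which gives signal 7. The name attached to number 7 depends on the platform (on x86 Linux it is normally SIGBUS, while SIGEMT carries that number on BSD-derived systems), so the label in the message may come from the reporting tool's own table. A small sketch for decoding such codes, with helper names chosen for this example:

```python
import signal


def signal_from_exit_code(exit_code: int) -> int:
    """Return the signal number encoded in a shell-style exit code (128 + N)."""
    if not 128 < exit_code < 256:
        raise ValueError(f"{exit_code} does not encode a signal")
    return exit_code - 128


def signal_name(signum: int) -> str:
    """Look up the name this platform gives to a signal number."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # The number is not a named signal on this platform.
        return f"signal {signum}"


if __name__ == "__main__":
    number = signal_from_exit_code(135)
    print(number, signal_name(number))
```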

I get an “Ort::Exception at memory location 0x000000A007AFBB50” error in release mode and “Exception thrown: read access violation. Y_data was 0x111011101110111” in debug mode for similar models on Windows 10 with the prebuilt onnxruntime v1.1.

@belgraviton
Author

The ionnx interface has two main functions: initialization and run.

The INITIALIZATION function runs the model on a zero-filled input:

  • 1st run ALWAYS succeeds (even in GPU mode with large input size)
  • 2nd run fails in GPU mode with large input size

This behavior is similar to issue #2700.

Running the model with real or zero-filled input in the “RUN” function fails in GPU mode with a large input size.
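
The "one session, two runs on zero input" pattern described above could also be cross-checked from the Python API on GPU, which the table in the report does not cover. This is a rough sketch under several assumptions: a GPU build of the onnxruntime Python package, a single float32 input, and a release that accepts the `providers` argument (releases as old as 1.1/1.2 may need a different way to select the provider). The helper names are hypothetical.

```python
def concrete_shape(shape):
    """Replace symbolic or unknown dimensions (None or strings) with 1."""
    return [dim if isinstance(dim, int) else 1 for dim in shape]


def run_twice(model_path, provider="CUDAExecutionProvider"):
    """Create one session and run it twice on a zero-filled input."""
    # Imported here so the helper above can be used without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=[provider])
    model_input = session.get_inputs()[0]
    # Assumption: the model has a single float32 input.
    zeros = np.zeros(concrete_shape(model_input.shape), dtype=np.float32)

    results = []
    for attempt in (1, 2):
        outputs = session.run(None, {model_input.name: zeros})
        print(f"run {attempt}: {[out.shape for out in outputs]}")
        results.append(outputs)
    return results
```

Calling `run_twice("model_name.onnx")` and then `run_twice("model_name.onnx", provider="CPUExecutionProvider")` would show whether the second run behaves differently on the two providers.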

@hariharans29
Member

Can you please try this with the 1.2 release? I'll take a look if it still occurs with the latest release.

@hariharans29 hariharans29 self-assigned this Apr 8, 2020
@belgraviton
Author

I have checked the models with the 1.2 release build. The results are the SAME: the Unpooling model with 512x256 input still FAILS to run on GPU with the C++ interface.

@belgraviton
Author

@hariharans29 Are there any ideas about the cause of the bug?

@hariharans29
Member

Will take a look at this next week. Sorry for the delay and thanks for confirming.

@belgraviton
Author

@hariharans29 Have you had a chance to look at the bug?

@stale

stale bot commented Aug 8, 2020

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the wontfix label Aug 8, 2020
@stale

stale bot commented Aug 15, 2020

This issue has been automatically closed due to inactivity. Please reactivate if further support is needed.

@stale stale bot closed this as completed Aug 15, 2020
@hariharans29 hariharans29 reopened this Aug 15, 2020
@stale stale bot removed the wontfix label Aug 15, 2020
@hariharans29
Member

Sorry, I never did get a chance to look at this. I'll try to do so and will keep this open.

@belgraviton
Author

Thank you

@stale

stale bot commented Oct 17, 2020

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale label (issues that have not been addressed in a while; categorized by a bot) Oct 17, 2020
@hariharans29 hariharans29 removed the stale label Dec 1, 2020
@faxu faxu removed the type:bug label Aug 18, 2021
@stale

stale bot commented Apr 19, 2022

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@stale stale bot added the stale label Apr 19, 2022
@hariharans29 hariharans29 removed the stale label Apr 25, 2022