Segmentation Fault on session.Run() with custom model on c++ API #5630

Closed
ManuelPalermo opened this issue Oct 29, 2020 · 4 comments

@ManuelPalermo

Describe the bug
Segmentation fault when calling session.Run() with a custom ONNX model through the C++ API. The model works as expected in Python; however, running the same model through the C++ API causes a segmentation fault with no error message or other information to help debug. Any ideas on what might be wrong, or how to get debug information from inside Run()?

Urgency
Trying to finish integrating and deploying the model for a project demo due tomorrow.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • ONNX Runtime installed from (source or binary): binary
  • ONNX Runtime version: onnxruntime-linux-x64-1.5.2.tgz
  • Python version: 3.7.4
  • GCC/Compiler version (if compiling from source): g++-10 -std=c++14
  • CUDA/cuDNN version: Running on CPU

To Reproduce

  • Call session.Run() with a custom model on two image inputs (see shapes below) using the C++ API

Expected behavior
The model should perform inference correctly when calling session.Run(...);

Screenshots
Model inputs/outputs (Netron):
image

Code bits:
image

Additional context
Model:
walker_full_model.zip

@ManuelPalermo
Author

Error stack using Valgrind:

==14559== Invalid read of size 8
==14559== at 0x564145F: onnxruntime::logging::LoggingManager::Log(std::string const&, onnxruntime::logging::Capture const&) const (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x5640F15: onnxruntime::logging::Capture::~Capture() (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55D4E57: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55C45FE: onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55C5607: onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x516B8D9: onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<OrtValue, std::allocator > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<OrtValue, std::allocator >, std::vector<OrtDevice, std::allocator > const) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x513863D: OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x114C3C: Ort::Session::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, Ort::Value*, unsigned long) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x114B8C: Ort::Session::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x113217: asbgo::vision::PoseDetector::detect(cv::Mat const&, cv::Mat const&, cv::Mat const&, cv::Mat const&) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x10DDF7: main (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==14559==
==14559==
==14559== Process terminating with default action of signal 11 (SIGSEGV)
==14559== Access not within mapped region at address 0x0
==14559== at 0x564145F: onnxruntime::logging::LoggingManager::Log(std::string const&, onnxruntime::logging::Capture const&) const (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x5640F15: onnxruntime::logging::Capture::~Capture() (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55D4E57: onnxruntime::SequentialExecutor::Execute(onnxruntime::SessionState const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, onnxruntime::logging::Logger const&) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55C45FE: onnxruntime::utils::ExecuteGraphImpl(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager const&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, std::unordered_map<unsigned long, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)>, std::hash, std::equal_to, std::allocator<std::pair<unsigned long const, std::function<onnxruntime::common::Status (onnxruntime::TensorShape const&, OrtMemoryInfo const&, OrtValue&, bool&)> > > > const&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x55C5607: onnxruntime::utils::ExecuteGraph(onnxruntime::SessionState const&, onnxruntime::FeedsFetchesManager&, std::vector<OrtValue, std::allocator > const&, std::vector<OrtValue, std::allocator >&, ExecutionMode, bool const&, onnxruntime::logging::Logger const&, bool) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x516B8D9: onnxruntime::InferenceSession::Run(OrtRunOptions const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<OrtValue, std::allocator > const&, std::vector<std::string, std::allocatorstd::string > const&, std::vector<OrtValue, std::allocator >, std::vector<OrtDevice, std::allocator > const) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x513863D: OrtApis::Run(OrtSession*, OrtRunOptions const*, char const* const*, OrtValue const* const*, unsigned long, char const* const*, unsigned long, OrtValue**) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/extra/onnxruntime/lib/libonnxruntime.so.1.5.2)
==14559== by 0x114C3C: Ort::Session::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, Ort::Value*, unsigned long) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x114B8C: Ort::Session::Run(Ort::RunOptions const&, char const* const*, Ort::Value const*, unsigned long, char const* const*, unsigned long) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x113217: asbgo::vision::PoseDetector::detect(cv::Mat const&, cv::Mat const&, cv::Mat const&, cv::Mat const&) (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== by 0x10DDF7: main (in /home/palermo/Programing_Workspace/TeseMestrado/na_high_evel/src/algorithm_joint_detector_nns/src/backend/run_debug_cpp_onnx.out)
==14559== If you believe this happened as a result of a stack
==14559== overflow in your program's main thread (unlikely but
==14559== possible), you can try to increase the size of the
==14559== main thread stack using the --main-stacksize= flag.
==14559== The main thread stack size used in this run was 8388608.
==14559==
==14559== HEAP SUMMARY:
==14559== in use at exit: 57,468,954 bytes in 157,446 blocks
==14559== total heap usage: 1,167,475 allocs, 1,010,029 frees, 544,601,708 bytes allocated
==14559==
==14559== LEAK SUMMARY:
==14559== definitely lost: 0 bytes in 0 blocks
==14559== indirectly lost: 0 bytes in 0 blocks
==14559== possibly lost: 5,330,485 bytes in 2,411 blocks
==14559== still reachable: 52,034,765 bytes in 154,200 blocks
==14559== of which reachable via heuristic:
==14559== stdstring : 674,859 bytes in 14,681 blocks
==14559== length64 : 5,304 bytes in 87 blocks
==14559== newarray : 5,192 bytes in 54 blocks
==14559== suppressed: 0 bytes in 0 blocks
==14559== Rerun with --leak-check=full to see details of leaked memory
==14559==

@pranavsharma
Contributor

Can you attach your test code here? I can't repro this with onnx_test_runner using randomly generated inputs for 'img' and 'depth'.

@ManuelPalermo
Author

Here is the code to reproduce the error (just run the bash file):
test_cpp_onnxruntime.zip

Thank you for the help!

@pranavsharma
Contributor

pranavsharma commented Oct 30, 2020

This is an issue with your program, not onnxruntime. The Ort::Env object goes out of scope at the end of the PoseDetector constructor. You should make it a member of your class, because the env needs to stay alive for as long as you still need to run inference.
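
The lifetime problem described above can be illustrated without the library itself. The sketch below is only an illustration: `Env` and `Session` are hypothetical stand-ins for `Ort::Env` and `Ort::Session`, and `BrokenDetector` / `FixedDetector` are made-up names, not the code from the attached zip. The stand-in `Run()` returns `false` where the real library would dereference a destroyed logger and crash.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for Ort::Env: owns logging state that sessions depend on.
class Env {
public:
    explicit Env(std::string log_id)
        : logger_(std::make_shared<std::string>(std::move(log_id))) {}
    Env(const Env&) = delete;
    Env& operator=(const Env&) = delete;

    // Non-owning handle to the logging state.
    std::weak_ptr<const std::string> logger() const { return logger_; }

private:
    std::shared_ptr<const std::string> logger_;
};

// Hypothetical stand-in for Ort::Session: uses the env's logger but does not own it.
class Session {
public:
    explicit Session(const Env& env) : logger_(env.logger()) {}

    // True if the env is still alive. The real library has no such check
    // and would read freed memory instead.
    bool Run() const { return !logger_.expired(); }

private:
    std::weak_ptr<const std::string> logger_;
};

// Buggy pattern: env is a local variable of the constructor, so it is
// destroyed when the constructor returns while the session lives on.
class BrokenDetector {
public:
    BrokenDetector() {
        Env env("pose");
        session_ = std::make_unique<Session>(env);
    }
    bool detect() const { return session_->Run(); }

private:
    std::unique_ptr<Session> session_;
};

// Fixed pattern: env_ is a member declared before session_, so it is
// constructed first and destroyed last.
class FixedDetector {
public:
    FixedDetector() : env_("pose"), session_(env_) {}
    bool detect() const { return session_.Run(); }

private:
    Env env_;
    Session session_;
};
```

In this sketch `BrokenDetector().detect()` reports a dead env while `FixedDetector().detect()` succeeds. The declaration order in `FixedDetector` matters: C++ initializes members in declaration order and destroys them in reverse, so declaring `env_` first keeps it alive for the whole lifetime of `session_`.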
