
ONNX runtime crashes randomly in C++ when running a model #19160

Closed
Rayndell opened this issue Jan 16, 2024 · 5 comments
Labels
platform:windows issues related to the Windows platform

@Rayndell

Rayndell commented Jan 16, 2024

Describe the issue

I exported a TensorFlow model (frozen, in .pb format) to an ONNX model that works perfectly fine in Python. Trouble arises when I try to do the same with the ONNX runtime in C++. When I run the session it randomly crashes, sometimes on the first execution, sometimes on the third or fourth... I use OpenCV to read an image to feed the model, but the problem doesn't seem to come from OpenCV, since the same crash occurs with tensors filled with random values, even when the program is not linked against the OpenCV libraries.

The ONNX model I use is too big to be uploaded here, so here is a download link:
https://evolucare-my.sharepoint.com/:u:/p/a_ducournau/Ee-QIfcU6nJKv5Mpu1oR7v4BfprDsxvoelBZ3cKVt6KnaQ?e=Y3mWGJ

To reproduce

```cpp
#include <array>
#include <ctime>
#include <string>
#include <vector>
#include <onnxruntime_cxx_api.h>

using namespace std;

basic_string<ORTCHAR_T> model_path = L"ResNetV1_101_ImageNet.onnx";
string image_path = "D:/ONNX/Images/Classification/030_0001.jpg";

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "example-model-explorer");
Ort::SessionOptions session_options;
Ort::Session session = Ort::Session(env, model_path.c_str(), session_options);
Ort::AllocatorWithDefaultOptions allocator;

vector<char*> inputNames;
for (size_t i = 0; i < session.GetInputCount(); i++)
{
    inputNames.emplace_back(session.GetInputNameAllocated(i, allocator).get());
}
vector<char*> outputNames;
for (size_t i = 0; i < session.GetOutputCount(); i++)
{
    outputNames.emplace_back(session.GetOutputNameAllocated(i, allocator).get());
}

size_t numElements = 200 * 200 * 3;
array<int64_t, 3> inputShape{ 200, 200, 3 };
array<int64_t, 2> outputShape{ 1, 1000 };

vector<float> inputTensorValues(numElements, 0.0f);

for (int i = 0; i < 5; i++)
{
    clock_t start = clock();

    vector<Ort::Value> inputTensors;
    vector<Ort::Value> outputTensors;
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    inputTensors.push_back(Ort::Value::CreateTensor<float>(
        memoryInfo, inputTensorValues.data(), numElements, inputShape.data(),
        inputShape.size()));
    vector<float> outputTensorValues(1000);
    outputTensors.push_back(Ort::Value::CreateTensor<float>(
        memoryInfo, outputTensorValues.data(), 1000,
        outputShape.data(), outputShape.size()));

    session.Run(Ort::RunOptions{ nullptr }, inputNames.data(),
        inputTensors.data(), 1, outputNames.data(),
        outputTensors.data(), 1);
}
```

Urgency

No response

Platform

Windows

OS Version

Server 2016

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@github-actions github-actions bot added the platform:windows issues related to the Windows platform label Jan 16, 2024
@xadupre
Member

xadupre commented Jan 17, 2024

I assume the same model works with the Python bindings. Did you try enabling the logs to get more information about the crash?

@Rayndell
Author

OK, I figured out the origin of the issue. It turned out that using a vector<char*> for the input and output names was the problem. As a temporary fix, I changed them to arrays of const char* to make it work.

```cpp
array<const char*, 1> input_names = { "input:0" };
array<const char*, 1> output_names = { "Softmax:0" };
```

Now thinking about a means to provide names more dynamically.

@xadupre
Member

xadupre commented Jan 17, 2024

You should be able to find the information you need here: https://onnxruntime.ai/docs/api/c/struct_ort_1_1detail_1_1_const_session_impl.html.

@Rayndell
Author

Thanks, but it does not really solve anything. The problem is that the char* you receive from session.GetInputNameAllocated(i, allocator).get() is deallocated almost immediately (the temporary smart pointer that owns it is destroyed at the end of the statement), so storing it in the inputNames vector is not OK: the stored pointer ends up pointing to freed memory. This problem is also pointed out in thread #14157, but that thread does not really provide a satisfying solution, since it implies moving the unique_ptr returned by GetInputNameAllocated into a shared_ptr, which does not work in my case either. Simply copying the char* does not work because it is also deallocated too quickly, before you can even do that. I'm out of solutions here.

@Rayndell
Author

OK, so it turns out you HAVE to store the smart pointer outside of the loop to keep the memory block it points to alive:

```cpp
std::shared_ptr<char> outputName;
vector<char*> outputNames;
for (size_t i = 0; i < session.GetOutputCount(); i++)
{
    // note: each assignment frees the previous name, so only the last
    // one stays alive; fine with a single output
    outputName = std::move(session.GetOutputNameAllocated(i, allocator));
    outputNames.push_back(outputName.get());
}
```

Doing this instead of using a temporary outputName variable works, but it is not really satisfactory, since it involves converting a unique_ptr into a shared_ptr that you then have to keep around, whose only purpose is to keep alive the memory pointed to by the outputNames vector... If anybody has a better solution I am open to it.
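For what it's worth, a common way to avoid the shared_ptr conversion is to store the Ort::AllocatedStringPtr values themselves (they are movable unique_ptrs) in a container that outlives every use of the raw names. A sketch, assuming a live `session` and `allocator` from the C++ API in onnxruntime_cxx_api.h:

```cpp
// Keep the owning Ort::AllocatedStringPtr values alive in a vector that
// outlives every use of the raw char* names.
std::vector<Ort::AllocatedStringPtr> outputNameOwners;
std::vector<const char*> outputNames;
for (size_t i = 0; i < session.GetOutputCount(); ++i)
{
    outputNameOwners.push_back(session.GetOutputNameAllocated(i, allocator));
    outputNames.push_back(outputNameOwners.back().get());
}
// outputNames.data() can now be passed to session.Run() safely, for as
// long as outputNameOwners remains in scope.
```

This handles any number of outputs, unlike a single reassigned pointer, and involves no ownership-type conversion.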
