Fix session logger use-after-free during EP teardown under VERBOSE logging#28274
Fix session logger use-after-free during EP teardown under VERBOSE logging#28274
Conversation
…execution_providers_ Move owned_session_logger_ declaration before execution_providers_ in InferenceSession so the logger outlives EPs during destruction. C++ destroys members in reverse declaration order, so previously the logger was destroyed before EPs, causing a dangling pointer when EP teardown callbacks tried to log (deterministic crash under VERBOSE logging). Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/3f374444-9d0d-4b35-ab2a-a2231bbfd994 Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
1d561d1 to
8ecb2c2
Compare
adrianlizarraga
left a comment
There was a problem hiding this comment.
The added tests fail for WASM debug build:
[ RUN ] InferenceSessionTests.SessionLoggerOutlivesEPsOnDestruction
1: unknown file: Failure
1: C++ exception with description "/mnt/vss/_work/onnxruntime/onnxruntime/onnxruntime/core/session/inference_session.cc:604 void onnxruntime::InferenceSession::ConstructorCommon(const SessionOptions &, const Environment &) session_env.EnvCreatedWithGlobalThreadPools() was false. When the session is not configured to use per session threadpools, the env must be created with the the CreateEnvWithGlobalThreadPools API.
1: " thrown in the test body.
1:
1: [ FAILED ] InferenceSessionTests.SessionLoggerOutlivesEPsOnDestruction (1 ms)
1322fd1 to
92e0d54
Compare
There was a problem hiding this comment.
Pull request overview
This PR addresses a crash caused by a session logger lifetime issue during execution provider (EP) teardown (use-after-free under VERBOSE logging), by adjusting InferenceSession member destruction ordering and adding regression tests to validate safe EP logging during session destruction.
Changes:
- Reorders
InferenceSessionmembers so the owned session logger is intended to outliveexecution_providers_during destruction. - Adds new
InferenceSessiontests with a mock EP that logs during its destructor to catch regressions. - Updates Azure Pipelines plugin CUDA stages (step ordering and removal of an artifact-merge stage).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tools/ci_build/github/azure-pipelines/stages/plugin-linux-cuda-stage.yml | Reorders a plugin build variables step relative to Python/setup steps. |
| tools/ci_build/github/azure-pipelines/stages/plugin-cuda-packaging-stage.yml | Removes the Linux x64/aarch64 artifact merge stage. |
| onnxruntime/test/framework/inference_session_test.cc | Adds regression tests using a mock EP that logs in its destructor. |
| onnxruntime/core/session/inference_session.h | Moves owned_session_logger_ earlier to change destruction order relative to EPs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…gging (#28274) ### Description Move `owned_session_logger_` declaration before `execution_providers_` in `InferenceSession` so the logger outlives EPs during member destruction. C++ destroys members in reverse declaration order. Previously: ``` ~owned_session_logger_ (L905) // logger freed ↓ ~execution_providers_ (L745) // EP teardown logs via dangling pointer → crash ``` After this change: ``` ~execution_providers_ (L750) // EP teardown logs safely ✅ ↓ ~owned_session_logger_ (L744) // logger freed, no remaining users ``` ### Motivation and Context Plugin EPs receive an `OrtLogger*` via `OrtEpFactory::CreateEp()`. During session destruction, EP teardown callbacks (e.g., `ReleaseNodeComputeInfos`) may log through this pointer. Because `owned_session_logger_` was declared after `execution_providers_`, the logger was already freed when EPs destructed — a use-after-free that crashes deterministically under VERBOSE logging. Affects all Plugin EPs that log in any teardown path. Reproduced with OpenVINO Plugin EP via `webnn_graph_impl_fuzzer` at VERBOSE level. Fixes #28234 ### Tests Added Added regression tests in `onnxruntime/test/framework/inference_session_test.cc`: - **`LoggingOnDestroyExecutionProvider`** — A mock EP that logs via its stored logger pointer in its destructor. If the logger has been freed, this triggers a use-after-free (detected by ASan or as a segfault). - **`SessionLoggerOutlivesEPsOnDestruction`** — Creates a session with VERBOSE logging and the mock EP, then destroys the session. Verifies that the logger was valid during EP teardown and that the teardown log message was captured. - **`SessionLoggerOutlivesEPsWithMultipleEPs`** — Same scenario with two mock EPs (distinct type names) to confirm all registered EPs can safely log during teardown. ### Verification Confirmed the tests are effective regression tests: | Scenario | Result | |----------|--------| | **With fix** (logger declared before EPs) | Both tests pass ✅ | | **Without fix** (logger declared after EPs, original bug) | `SessionLoggerOutlivesEPsOnDestruction` crashes with **Segmentation fault** (exit code 139) — use-after-free ❌ | This proves the member declaration order is the critical factor, and the tests will catch any future regression that reorders these members. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: tianleiwu <30328909+tianleiwu@users.noreply.github.com>
Description
Move
owned_session_logger_declaration beforeexecution_providers_inInferenceSessionso the logger outlives EPs during member destruction.C++ destroys members in reverse declaration order. Previously:
After this change:
Motivation and Context
Plugin EPs receive an
OrtLogger*viaOrtEpFactory::CreateEp(). During session destruction, EP teardown callbacks (e.g.,ReleaseNodeComputeInfos) may log through this pointer. Becauseowned_session_logger_was declared afterexecution_providers_, the logger was already freed when EPs destructed — a use-after-free that crashes deterministically under VERBOSE logging.Affects all Plugin EPs that log in any teardown path. Reproduced with OpenVINO Plugin EP via
webnn_graph_impl_fuzzerat VERBOSE level.Fixes #28234
Tests Added
Added regression tests in
onnxruntime/test/framework/inference_session_test.cc:LoggingOnDestroyExecutionProvider— A mock EP that logs via its stored logger pointer in its destructor. If the logger has been freed, this triggers a use-after-free (detected by ASan or as a segfault).SessionLoggerOutlivesEPsOnDestruction— Creates a session with VERBOSE logging and the mock EP, then destroys the session. Verifies that the logger was valid during EP teardown and that the teardown log message was captured.SessionLoggerOutlivesEPsWithMultipleEPs— Same scenario with two mock EPs (distinct type names) to confirm all registered EPs can safely log during teardown.Verification
Confirmed the tests are effective regression tests:
SessionLoggerOutlivesEPsOnDestructioncrashes with Segmentation fault (exit code 139) — use-after-free ❌This proves the member declaration order is the critical factor, and the tests will catch any future regression that reorders these members.