Add per-session thread pool work callbacks API by sdotpeng · Pull Request #27253 · microsoft/onnxruntime

sdotpeng · 2026-02-05T10:44:31Z

Description

Adds per-session thread pool work callbacks, allowing callers to hook into the enqueue/start/stop/abandon lifecycle of thread pool work items. The feature is gated behind a build flag (--enable_session_threadpool_callbacks) with zero overhead when disabled.

API additions

C API: OrtApi::SetPerSessionThreadPoolCallbacks — stores an OrtThreadPoolCallbacksConfig on the OrtEnv, applied to per-session thread pools
C++ wrapper: Ort::Env::SetPerSessionThreadPoolCallbacks
Versioned C config struct OrtThreadPoolCallbacksConfig with fields: on_enqueue, on_start_work, on_stop_work, on_abandon, user_context
Four callback typedefs: OrtThreadPoolWorkEnqueueFn, OrtThreadPoolWorkStartFn, OrtThreadPoolWorkStopFn, OrtThreadPoolWorkAbandonFn
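
To make the lifecycle concrete, here is a minimal, self-contained sketch of how the four callbacks could relate to a single work item. Only the field names (on_enqueue, on_start_work, on_stop_work, on_abandon, user_context) come from the description above. The `DemoCallbacksConfig` struct, every signature, and the ownership rule (per-item data released in stop or abandon) are assumptions made for illustration and may not match the real OrtThreadPoolCallbacksConfig.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the config described above; the real
// OrtThreadPoolCallbacksConfig signatures may differ.
struct DemoCallbacksConfig {
  void* (*on_enqueue)(void* user_context);                // returns per-item data
  void (*on_start_work)(void* user_context, void* item);  // just before the task runs
  void (*on_stop_work)(void* user_context, void* item);   // just after the task returns
  void (*on_abandon)(void* user_context, void* item);     // item dropped without running
  void* user_context;
};

// Records the order in which callbacks fire.
struct Trace {
  std::vector<std::string> events;
};

void* TraceOnEnqueue(void* ctx) {
  static_cast<Trace*>(ctx)->events.push_back("enqueue");
  return new int(0);  // per-item state, released in stop or abandon (assumption)
}
void TraceOnStart(void* ctx, void*) {
  static_cast<Trace*>(ctx)->events.push_back("start");
}
void TraceOnStop(void* ctx, void* item) {
  static_cast<Trace*>(ctx)->events.push_back("stop");
  delete static_cast<int*>(item);
}
void TraceOnAbandon(void* ctx, void* item) {
  static_cast<Trace*>(ctx)->events.push_back("abandon");
  delete static_cast<int*>(item);
}

// Simulates one work item: it either runs (start, task, stop) or is abandoned.
template <typename Task>
void SimulateWorkItem(const DemoCallbacksConfig& cfg, Task task, bool abandon) {
  void* item = cfg.on_enqueue ? cfg.on_enqueue(cfg.user_context) : nullptr;
  if (abandon) {
    if (cfg.on_abandon) cfg.on_abandon(cfg.user_context, item);
    return;
  }
  if (cfg.on_start_work) cfg.on_start_work(cfg.user_context, item);
  task();
  if (cfg.on_stop_work) cfg.on_stop_work(cfg.user_context, item);
}

// Runs one item to completion and abandons a second one.
std::vector<std::string> DemoTrace() {
  Trace trace;
  DemoCallbacksConfig cfg{TraceOnEnqueue, TraceOnStart, TraceOnStop, TraceOnAbandon, &trace};
  SimulateWorkItem(cfg, [&trace] { trace.events.push_back("task"); }, false);
  SimulateWorkItem(cfg, [] {}, true);
  return trace.events;
}
```

In this sketch every enqueue is paired with exactly one of stop or abandon, which is the property that lets per-item state be released on both paths.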

Implementation

EigenNonBlockingThreadPool.h: Introduced a policy-based design with two compile-time callback policies:
- WorkNoCallbackPolicy: Work = std::function<void()>, all callback methods are trivial inlines eliminated by the compiler. Zero overhead for non-callback builds.
- WorkWithCallbackPolicy: Work = WorkItem bundling tasks with callback data; invokes user callbacks around task execution via MakeWork/Execute/OnEnqueue/OnAbandon methods.
- ThreadPoolTempl<Environment, CallbackPolicy> uses the policy for all callback-related operations.
- RunQueue::RevokeWithTag calls policy_->OnAbandon(e.w) on successful revocation; the policy implementation decides whether to invoke user callbacks.
threadpool.h: extended_eigen_threadpool_ changed to unique_ptr<ExtendedThreadPoolInterface> for type erasure across policy instantiations. EnableSpinning/DisableSpinning added to the virtual interface.
threadpool.cc: Single #ifdef selects policy at ThreadPoolTempl instantiation.
environment.h/.cc: Added SetPerSessionWorkCallbacks/GetPerSessionWorkCallbacks on Environment.
inference_session.cc: Propagates callbacks from Environment to per-session thread pool options.
thread_utils.h/.cc: Added callback fields to OrtThreadPoolParams and wiring in CreateThreadPoolHelper.
env.h: OrtThreadPoolCallbacksConfig* pointer in ThreadOptions.
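
The policy-based design described above can be sketched roughly as follows. This is not the code from EigenNonBlockingThreadPool.h: `NoCallbackPolicySketch`, `CallbackPolicySketch`, `MiniPool`, and the `DEMO_ENABLE_CALLBACKS` macro are hypothetical names, and integer counters stand in for user callbacks. It only illustrates how a pool template can delegate MakeWork/Execute/OnAbandon to a policy chosen at compile time.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical no-callback policy: Work is just the task, hooks are empty.
struct NoCallbackPolicySketch {
  using Work = std::function<void()>;
  Work MakeWork(std::function<void()> fn) const { return fn; }
  void Execute(Work& w) const { w(); }
  void OnAbandon(Work&) const {}
};

// Hypothetical callback policy: Work wraps the task, and counters stand in
// for the user callbacks invoked around execution and on abandonment.
struct CallbackPolicySketch {
  struct WorkItem {
    std::function<void()> fn;
  };
  using Work = WorkItem;

  int* starts;
  int* stops;
  int* abandons;

  Work MakeWork(std::function<void()> fn) const { return WorkItem{std::move(fn)}; }
  void Execute(Work& w) const {
    ++*starts;
    w.fn();
    ++*stops;
  }
  void OnAbandon(Work&) const { ++*abandons; }
};

// The pool never touches callbacks directly; it only talks to the policy.
template <typename Policy>
class MiniPool {
 public:
  explicit MiniPool(Policy policy) : policy_(policy) {}

  // Wraps and immediately executes a task.
  void RunNow(std::function<void()> fn) {
    typename Policy::Work w = policy_.MakeWork(std::move(fn));
    policy_.Execute(w);
  }

  // Wraps a task and drops it without running, as a revocation would.
  void Drop(std::function<void()> fn) {
    typename Policy::Work w = policy_.MakeWork(std::move(fn));
    policy_.OnAbandon(w);
  }

 private:
  Policy policy_;
};

// A single compile-time switch, analogous in spirit to the one #ifdef
// described for threadpool.cc (macro name is made up for this sketch).
#ifdef DEMO_ENABLE_CALLBACKS
using SelectedPolicy = CallbackPolicySketch;
#else
using SelectedPolicy = NoCallbackPolicySketch;
#endif
```

Because the no-callback policy's hooks are empty inline functions, an optimizing compiler has nothing to emit for them, which is the intuition behind the zero-overhead claim for builds with the flag off.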

Build

CMake option onnxruntime_ENABLE_SESSION_THREADPOOL_CALLBACKS; build.py argument --enable_session_threadpool_callbacks

Tests

8 callback-specific tests: Schedule, OnEnqueueOnly, NoCallbacks, ParallelFor, ParallelSection, Abandon, EnqueueReturnsNull, NoEnqueueWithStartStop
End-to-end C API test (SetPerSessionThreadPoolCallbacks via ModelBuilder with 1M-element Mul)
All 73 existing ThreadPool tests pass unchanged with both callback-enabled and callback-disabled builds (81/81 and 73/73 respectively)

Motivation and Context

Thread pool work callbacks enable telemetry, tracing, and resource management by providing visibility into when work is enqueued, executed, and abandoned in per-session thread pools. This is needed for production diagnostics and performance instrumentation scenarios.

sdotpeng · 2026-02-05T10:46:45Z

@microsoft-github-policy-service agree

chwarr

Looks pretty good. I see some issues around on_enqueue returning NULL.

chwarr

Also consider what happens to any callback state allocated in on_enqueue when the thread pool is shutdown and the work items do not run.
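
One way to picture the concern: if per-item state is allocated at enqueue time and the pool is destroyed while items are still queued, something has to report those items so the state can be released. The sketch below is a hypothetical illustration of that idea, not the PR's implementation; `DrainingQueueSketch` and the `State*` helper functions are invented names.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Demo counters (hypothetical, not part of ONNX Runtime).
static int g_live_states = 0;  // states allocated at enqueue and not yet released
static int g_abandoned = 0;    // items reported as abandoned

static void* StateOnEnqueue() {
  ++g_live_states;
  return new int(0);
}
static void StateRelease(void* state) {
  --g_live_states;
  delete static_cast<int*>(state);
}
static void StateOnStop(void* state) { StateRelease(state); }
static void StateOnAbandon(void* state) {
  ++g_abandoned;
  StateRelease(state);
}

// Hypothetical queue that reports never-run items when it is destroyed.
class DrainingQueueSketch {
 public:
  using StateFn = void (*)(void* state);

  DrainingQueueSketch(StateFn on_stop, StateFn on_abandon)
      : on_stop_(on_stop), on_abandon_(on_abandon) {}

  // Shutdown path: every item still pending is handed to on_abandon.
  ~DrainingQueueSketch() {
    for (Entry& e : pending_) {
      if (on_abandon_ != nullptr) on_abandon_(e.state);
    }
  }

  void Push(std::function<void()> task, void* state) {
    pending_.push_back(Entry{std::move(task), state});
  }

  // Runs and removes the oldest item; returns false if the queue is empty.
  bool RunOne() {
    if (pending_.empty()) return false;
    Entry e = std::move(pending_.front());
    pending_.pop_front();
    e.task();
    if (on_stop_ != nullptr) on_stop_(e.state);
    return true;
  }

 private:
  struct Entry {
    std::function<void()> task;
    void* state;
  };
  std::deque<Entry> pending_;
  StateFn on_stop_;
  StateFn on_abandon_;
};

// Enqueues two items, runs one, then destroys the queue with one pending.
static int LiveStatesAfterShutdown() {
  g_live_states = 0;
  g_abandoned = 0;
  {
    DrainingQueueSketch queue(StateOnStop, StateOnAbandon);
    queue.Push([] {}, StateOnEnqueue());
    queue.Push([] {}, StateOnEnqueue());
    queue.RunOne();
  }
  return g_live_states;
}
```

Without the destructor loop, the second item's state would never reach either callback, which is the leak scenario raised in the comment above.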

Copilot

Pull request overview

This PR adds a new opt-in C API (SetDefaultThreadPoolCallbacks) and corresponding C++ wrapper (Ort::Env::SetDefaultThreadPoolCallbacks) to register lifecycle callbacks for per-session thread pool work items. When enabled via the --session_threadpool_callbacks build flag, callbacks can observe when work is enqueued, started, stopped, or abandoned in per-session thread pools, enabling profiling, tracing, and custom scheduling instrumentation.

Changes:

New CMake option onnxruntime_SESSION_THREADPOOL_CALLBACKS and build script argument --session_threadpool_callbacks to opt into the feature
New callback types (OrtThreadPoolWorkEnqueueFn, etc.) in the C API header, with SetDefaultThreadPoolCallbacks added to the OrtApi struct (v1.25)
Thread pool implementation updated: introduces a WorkItem wrapper type bundling task + callback data, with InvokeOnEnqueue/InvokeWorkItem/InvokeOnAbandon helpers and revocation propagation

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`include/onnxruntime/core/session/onnxruntime_c_api.h`	Defines new callback typedefs and adds `SetDefaultThreadPoolCallbacks` to `OrtApi` v1.25
`include/onnxruntime/core/session/onnxruntime_cxx_api.h`	Adds C++ `Env::SetDefaultThreadPoolCallbacks` declaration
`include/onnxruntime/core/session/onnxruntime_cxx_inline.h`	Implements the C++ `Env::SetDefaultThreadPoolCallbacks` wrapper
`include/onnxruntime/core/session/environment.h`	Adds `ThreadPoolWorkCallbacks` struct and `default_session_work_callbacks_` field to `Environment`
`include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h`	Core implementation: `WorkItem` type, callback invocation helpers, queue revocation propagation
`onnxruntime/core/session/environment.cc`	Implements `SetDefaultSessionWorkCallbacks`
`onnxruntime/core/session/onnxruntime_c_api.cc`	Implements `OrtApis::SetDefaultThreadPoolCallbacks` C API entry point
`onnxruntime/core/session/ort_apis.h`	Declares `SetDefaultThreadPoolCallbacks` in the `OrtApis` namespace
`onnxruntime/core/session/inference_session.cc`	Propagates env-level callbacks to per-session thread pool options
`onnxruntime/core/util/thread_utils.h` / `.cc`	Adds callback fields to `OrtThreadPoolParams` and wires them into thread pool creation
`onnxruntime/core/platform/env.h`	Adds `ThreadPoolWorkCallbacks` struct and `work_callbacks` field to `ThreadOptions`
`cmake/CMakeLists.txt` / `adjust_global_compile_flags.cmake`	CMake option and compile definition for the feature flag
`tools/ci_build/build_args.py` / `build.py`	Build script support for `--session_threadpool_callbacks`
`onnxruntime/test/platform/threadpool_test.cc`	Unit tests for all callback scenarios

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

yuslepukhin · 2026-03-06T22:56:40Z

The PR makes TP perf dependent on the nature of callbacks. Are there any perf numbers that could characterize the perf under the supposed usage?

sdotpeng · 2026-03-10T05:02:21Z

Yes, the overhead does depend on the callback implementation. We benchmarked the three relevant configurations to characterize this:

Flag OFF: baseline, identical to ORT main
Flag ON, no callbacks registered: measures structural overhead (extra pointer in work item, null-check branches)
Flag ON, WinML callbacks registered: the intended usage, where each callback performs a lightweight NT kernel call on thread-local state

Methodology: Built ARM64 Release with --build_micro_benchmarks for each configuration. Ran onnxruntime_benchmark.exe threadpool microbenchmarks (BM_ThreadPoolParallelFor) with --benchmark_repetitions=3 on Snapdragon X Elite (12 cores, 2976 MHz). The benchmark parameters are iteration count (work volume) and cost (per-element cost that controls work partitioning across threads).

Results (mean real_time, percentages relative to Flag OFF):

Benchmark	Flag OFF (ns)	Flag ON, no callbacks (ns)	Flag ON, WinML callbacks (ns)
ParallelFor 100/1	236	260 (+10%)	399 (+69%)
ParallelFor 100/400	225	253 (+12%)	328 (+46%)
ParallelFor 1K/1	2,114	2,229 (+5%)	2,853 (+35%)
ParallelFor 1K/200	2,046	2,237 (+9%)	2,527 (+24%)
ParallelFor 10K/1	19,516	22,026 (+13%)	22,624 (+16%)
ParallelFor 10K/200	30,423	30,031 (-1%)	32,546 (+7%)
ParallelFor 20K/200	57,804	58,457 (+1%)	61,884 (+7%)
ParallelFor 40K/200	114,663	119,569 (+4%)	123,416 (+8%)
ParallelFor 80K/200	225,654	233,793 (+4%)	248,085 (+10%)
ParallelFor 160K/200	454,959	475,879 (+5%)	463,962 (+2%)

Flag OFF and Flag ON numbers were collected in separate runs, so +/-5% variation is expected run-to-run noise.

Summary: The callback overhead is a roughly fixed 100-500 ns per dispatch, coming from three kernel calls per work item. On very short loops (100 iterations, total time about 230 ns), this dominates. On realistic workloads (10K+ iterations), the overhead with WinML callbacks is 2-10% at cost 200, and 16% in the 10K/1 case. In practice, ORT inference kernels run in the hundreds-of-microseconds to milliseconds range, making the per-dispatch callback cost negligible. Builds without --session_threadpool_callbacks have exactly zero overhead because the feature is entirely compiled out.
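
For readers who want to re-derive the table's percentages, they follow from (measured - baseline) / baseline, rounded to the nearest whole percent. The helper names below are illustrative only and are not part of the benchmark harness.

```cpp
#include <cassert>
#include <cmath>

// Relative overhead versus the Flag OFF baseline, in whole percent,
// rounded to the nearest integer.
long OverheadPercent(double baseline_ns, double measured_ns) {
  return std::lround((measured_ns - baseline_ns) / baseline_ns * 100.0);
}

// Absolute time added per benchmark iteration, in nanoseconds.
double AddedNanoseconds(double baseline_ns, double measured_ns) {
  return measured_ns - baseline_ns;
}
```

For the ParallelFor 100/1 row, `OverheadPercent(236, 399)` gives 69 and `AddedNanoseconds(236, 399)` gives 163, consistent with the +69% shown in the table.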

eserscor · 2026-03-25T01:28:33Z

/azp run Linux_TRT_Minimal_CUDA_Test_CI

azure-pipelines · 2026-03-25T01:28:44Z

Azure Pipelines successfully started running 1 pipeline(s).

sdotpeng marked this pull request as draft February 5, 2026 10:45

chwarr suggested changes Feb 7, 2026

chwarr reviewed Feb 9, 2026

include/onnxruntime/core/session/onnxruntime_c_api.h (outdated, resolved)

chwarr suggested changes Feb 11, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

Siyuan Peng added 6 commits February 25, 2026 10:24

Experiment

a0ebc36

Delete WrapTaskWithCallbacks

f7e983a

Add C++ api and update tests

2c76396

Add C API test for SetGlobalCallbacks

bb5434b

Cleanup

c9efdf7

Use per session TP for callbacks + Cleanup

4cdf234

sdotpeng force-pushed the sdotpeng/ThreadPoolCallbacks branch from d1534de to 4cdf234 February 25, 2026 18:30

sdotpeng requested a review from chwarr February 25, 2026 18:32

Add on_abandon for enqueue_data leak risk

1236ed5

skottmckay requested a review from Copilot March 4, 2026 22:43

Copilot started reviewing on behalf of skottmckay March 4, 2026 22:45

Copilot AI reviewed Mar 4, 2026

Address Copilot feedback

27c7d5d

sdotpeng marked this pull request as ready for review March 5, 2026 04:47

sdotpeng requested a review from Copilot March 5, 2026 05:00

Copilot started reviewing on behalf of sdotpeng March 5, 2026 05:02

Copilot AI reviewed Mar 5, 2026

skottmckay reviewed Mar 5, 2026

Address PR comments

c3c5838

sdotpeng requested a review from skottmckay March 9, 2026 16:39

skottmckay previously approved these changes Mar 10, 2026

Apply lintrunner

5b6207e

sdotpeng dismissed skottmckay’s stale review via 5b6207e March 10, 2026 14:38

yuslepukhin reviewed Mar 19, 2026

include/onnxruntime/core/session/onnxruntime_c_api.h (resolved)

yuslepukhin reviewed Mar 19, 2026

include/onnxruntime/core/session/onnxruntime_c_api.h (resolved)

yuslepukhin requested changes Mar 19, 2026

Siyuan Peng added 2 commits March 20, 2026 13:03

Address PR feedback (versioned struct, code duplication, readability, etc)

3328734

Address PR feedback + template RunQueue for better readability for abandon paths

071c306

sdotpeng requested a review from yuslepukhin March 20, 2026 20:23

yuslepukhin reviewed Mar 20, 2026

onnxruntime/core/platform/env.h (outdated, resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (resolved)

yuslepukhin reviewed Mar 20, 2026

include/onnxruntime/core/session/onnxruntime_c_api.h (outdated, resolved)

Address PR feedback (reusing struct, add documents, misc.)

6d846f0

sdotpeng requested a review from yuslepukhin March 20, 2026 23:21

yuslepukhin previously approved these changes Mar 20, 2026

Apply lintrunner

2030d08

sdotpeng dismissed yuslepukhin’s stale review via 2030d08 March 21, 2026 00:16

chwarr approved these changes Mar 21, 2026

Address PR feedback (compiler flag rename, comments, misc.)

9ee1d78

sdotpeng requested review from chwarr and yuslepukhin March 23, 2026 19:35

Apply lintrunner

6fd3fe4

chwarr approved these changes Mar 24, 2026

include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h (outdated, resolved)

onnxruntime/test/platform/threadpool_test.cc (resolved)

Address PR feedback (remove unused function)

23743b5

chwarr approved these changes Mar 24, 2026

yuslepukhin approved these changes Mar 24, 2026

sdotpeng merged commit f869122 into microsoft:main Mar 30, 2026
96 of 103 checks passed
