Add per-session thread pool work callbacks API #27253
sdotpeng merged 20 commits into microsoft:main
@microsoft-github-policy-service agree
chwarr left a comment:
Looks pretty good. I see some issues around on_enqueue returning NULL.
chwarr left a comment:
Also consider what happens to any callback state allocated in on_enqueue when the thread pool is shut down and the work items never run.
Pull request overview
This PR adds a new opt-in C API (SetDefaultThreadPoolCallbacks) and corresponding C++ wrapper (Ort::Env::SetDefaultThreadPoolCallbacks) to register lifecycle callbacks for per-session thread pool work items. When enabled via the --session_threadpool_callbacks build flag, callbacks can observe when work is enqueued, started, stopped, or abandoned in per-session thread pools, enabling profiling, tracing, and custom scheduling instrumentation.
Changes:
- New CMake option onnxruntime_SESSION_THREADPOOL_CALLBACKS and build script argument --session_threadpool_callbacks to opt into the feature
- New callback types (OrtThreadPoolWorkEnqueueFn, etc.) in the C API header, with SetDefaultThreadPoolCallbacks added to the OrtApi struct (v1.25)
- Thread pool implementation updated: introduces a WorkItem wrapper type bundling task + callback data, with InvokeOnEnqueue/InvokeWorkItem/InvokeOnAbandon helpers and revocation propagation
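The lifecycle described above (enqueue, start, stop, abandon) and the reviewer's concerns about on_enqueue returning NULL and about state left behind at shutdown can be illustrated with a small standalone sketch. It does not use the ONNX Runtime headers: CallbacksConfig, ToyQueue, Trace, Token and RunDemo are hypothetical names, the callback signatures are assumptions rather than the signatures in onnxruntime_c_api.h, and skipping start/stop when on_enqueue returns nullptr is only one possible policy.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the callback config; the signatures are assumptions.
struct CallbacksConfig {
  void* (*on_enqueue)(void* user_context);  // may return nullptr
  void (*on_start_work)(void* user_context, void* data);
  void (*on_stop_work)(void* user_context, void* data);
  void (*on_abandon)(void* user_context, void* data);
  void* user_context;
};

// A task bundled with the per-item data returned by on_enqueue.
struct WorkItem {
  std::function<void()> task;
  void* data = nullptr;
};

// Toy single-threaded queue standing in for a thread pool.
class ToyQueue {
 public:
  explicit ToyQueue(const CallbacksConfig* cb) : cb_(cb) {}

  // Items still pending at shutdown never run, so their data goes to on_abandon.
  ~ToyQueue() {
    for (WorkItem& item : pending_) {
      if (cb_ && item.data && cb_->on_abandon) cb_->on_abandon(cb_->user_context, item.data);
    }
  }

  void Enqueue(std::function<void()> task) {
    WorkItem item;
    item.task = std::move(task);
    if (cb_ && cb_->on_enqueue) item.data = cb_->on_enqueue(cb_->user_context);
    pending_.push_back(std::move(item));
  }

  // Runs the oldest item. Assumed policy: no start/stop when on_enqueue gave nullptr.
  bool RunOne() {
    if (pending_.empty()) return false;
    WorkItem item = std::move(pending_.front());
    pending_.pop_front();
    assert(static_cast<bool>(item.task));
    const bool notify = cb_ && item.data;
    if (notify && cb_->on_start_work) cb_->on_start_work(cb_->user_context, item.data);
    item.task();
    if (notify && cb_->on_stop_work) cb_->on_stop_work(cb_->user_context, item.data);
    return true;
  }

 private:
  const CallbacksConfig* cb_;
  std::deque<WorkItem> pending_;
};

// Example user-side state: an event log plus a count of live allocations.
struct Trace {
  std::vector<std::string> events;
  int live = 0;
};
struct Token { int id; };

inline void* TraceEnqueue(void* ctx) {
  auto* t = static_cast<Trace*>(ctx);
  t->events.push_back("enqueue");
  ++t->live;
  return new Token{static_cast<int>(t->events.size())};
}
inline void TraceStart(void* ctx, void*) { static_cast<Trace*>(ctx)->events.push_back("start"); }
inline void Release(Trace* t, void* data, const char* what) {
  t->events.push_back(what);
  delete static_cast<Token*>(data);
  --t->live;
}
inline void TraceStop(void* ctx, void* data) { Release(static_cast<Trace*>(ctx), data, "stop"); }
inline void TraceAbandon(void* ctx, void* data) { Release(static_cast<Trace*>(ctx), data, "abandon"); }

// Enqueues two items, runs one, then destroys the queue with one still pending.
inline Trace RunDemo() {
  Trace trace;
  CallbacksConfig cb{TraceEnqueue, TraceStart, TraceStop, TraceAbandon, &trace};
  {
    ToyQueue queue(&cb);
    queue.Enqueue([] {});
    queue.Enqueue([] {});
    queue.RunOne();
  }
  return trace;
}
```

In this sketch every allocation made in on_enqueue is released by exactly one of on_stop_work or on_abandon, so RunDemo() ends with live == 0 and the event order enqueue, enqueue, start, stop, abandon.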
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| include/onnxruntime/core/session/onnxruntime_c_api.h | Defines new callback typedefs and adds SetDefaultThreadPoolCallbacks to OrtApi v1.25 |
| include/onnxruntime/core/session/onnxruntime_cxx_api.h | Adds C++ Env::SetDefaultThreadPoolCallbacks declaration |
| include/onnxruntime/core/session/onnxruntime_cxx_inline.h | Implements the C++ Env::SetDefaultThreadPoolCallbacks wrapper |
| include/onnxruntime/core/session/environment.h | Adds ThreadPoolWorkCallbacks struct and default_session_work_callbacks_ field to Environment |
| include/onnxruntime/core/platform/EigenNonBlockingThreadPool.h | Core implementation: WorkItem type, callback invocation helpers, queue revocation propagation |
| onnxruntime/core/session/environment.cc | Implements SetDefaultSessionWorkCallbacks |
| onnxruntime/core/session/onnxruntime_c_api.cc | Implements OrtApis::SetDefaultThreadPoolCallbacks C API entry point |
| onnxruntime/core/session/ort_apis.h | Declares SetDefaultThreadPoolCallbacks in the OrtApis namespace |
| onnxruntime/core/session/inference_session.cc | Propagates env-level callbacks to per-session thread pool options |
| onnxruntime/core/util/thread_utils.h / .cc | Adds callback fields to OrtThreadPoolParams and wires them into thread pool creation |
| onnxruntime/core/platform/env.h | Adds ThreadPoolWorkCallbacks struct and work_callbacks field to ThreadOptions |
| cmake/CMakeLists.txt / adjust_global_compile_flags.cmake | CMake option and compile definition for the feature flag |
| tools/ci_build/build_args.py / build.py | Build script support for --session_threadpool_callbacks |
| onnxruntime/test/platform/threadpool_test.cc | Unit tests for all callback scenarios |
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
The PR makes thread pool (TP) perf dependent on the nature of the callbacks. Are there any perf numbers that characterize the perf under the intended usage?
Yes, the overhead does depend on the callback implementation. We benchmarked the three relevant configurations to characterize this:
Methodology: Built ARM64 Release with
Results (mean real_time, percentages relative to Flag OFF):
Flag OFF and Flag ON numbers were collected in separate runs, so +/-5% variation is expected run-to-run noise.
Summary: The callback overhead is a fixed cost of 100-500 ns per dispatch, coming from three kernel calls per work item. On very short loops (100 iterations, total time 230 ns), this dominates. On realistic workloads (10K+ iterations), the overhead is 2-10%. In practice, ORT inference kernels run in the hundreds-of-microseconds to milliseconds range, making the per-dispatch callback cost negligible.
Builds without
/azp run Linux_TRT_Minimal_CUDA_Test_CI
Azure Pipelines successfully started running 1 pipeline(s).
Description
Adds per-session thread pool work callbacks, allowing callers to hook into the enqueue/start/stop/abandon lifecycle of thread pool work items. The feature is gated behind a build flag (--enable_session_threadpool_callbacks) with zero overhead when disabled.
API additions
- OrtApi::SetPerSessionThreadPoolCallbacks: stores an OrtThreadPoolCallbacksConfig on the OrtEnv, applied to per-session thread pools
- Ort::Env::SetPerSessionThreadPoolCallbacks
- OrtThreadPoolCallbacksConfig with fields: on_enqueue, on_start_work, on_stop_work, on_abandon, user_context
- OrtThreadPoolWorkEnqueueFn, OrtThreadPoolWorkStartFn, OrtThreadPoolWorkStopFn, OrtThreadPoolWorkAbandonFn
Implementation
- EigenNonBlockingThreadPool.h: Introduced a policy-based design with two compile-time callback policies:
  - WorkNoCallbackPolicy: Work = std::function<void()>, all callback methods are trivial inlines eliminated by the compiler. Zero overhead for non-callback builds.
  - WorkWithCallbackPolicy: Work = WorkItem bundling tasks with callback data; invokes user callbacks around task execution via MakeWork/Execute/OnEnqueue/OnAbandon methods.
  - ThreadPoolTempl<Environment, CallbackPolicy> uses the policy for all callback-related operations.
  - RunQueue::RevokeWithTag calls policy_->OnAbandon(e.w) on successful revocation; the policy implementation decides whether to invoke user callbacks.
- threadpool.h: extended_eigen_threadpool_ changed to unique_ptr<ExtendedThreadPoolInterface> for type erasure across policy instantiations. EnableSpinning/DisableSpinning added to the virtual interface.
- threadpool.cc: Single #ifdef selects policy at ThreadPoolTempl instantiation.
- environment.h/.cc: Added SetPerSessionWorkCallbacks/GetPerSessionWorkCallbacks on Environment.
- inference_session.cc: Propagates callbacks from Environment to per-session thread pool options.
- thread_utils.h/.cc: Added callback fields to OrtThreadPoolParams and wiring in CreateThreadPoolHelper.
- env.h: OrtThreadPoolCallbacksConfig* pointer in ThreadOptions.
Build
- onnxruntime_ENABLE_SESSION_THREADPOOL_CALLBACKS; build.py argument --enable_session_threadpool_callbacks
Tests
SetPerSessionThreadPoolCallbacks via ModelBuilder with 1M-element Mul)
Motivation and Context
Thread pool work callbacks enable telemetry, tracing, and resource management by providing visibility into when work is enqueued, executed, and abandoned in per-session thread pools. This is needed for production diagnostics and performance instrumentation scenarios.
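The policy-based design listed under Implementation can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in EigenNonBlockingThreadPool.h: the policy names and the MakeWork/OnEnqueue/Execute method names follow the description, while Counters, PoolInterface, ToyPool, MakePool and the SKETCH_ENABLE_CALLBACKS macro are made up for the example, and the toy pool runs work inline instead of on worker threads.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for user callbacks: simple counters.
struct Counters {
  int enqueued = 0;
  int started = 0;
  int stopped = 0;
};

// No-callback policy: Work is the bare task and every hook is an empty inline.
struct WorkNoCallbackPolicy {
  using Work = std::function<void()>;
  Work MakeWork(std::function<void()> fn) { return fn; }
  void OnEnqueue(Work&) {}
  void Execute(Work& w) { w(); }
};

// Callback policy: Work bundles the task with callback data.
struct WorkWithCallbackPolicy {
  struct WorkItem {
    std::function<void()> fn;
    Counters* counters;
  };
  using Work = WorkItem;

  explicit WorkWithCallbackPolicy(Counters* c) : counters(c) {}
  Work MakeWork(std::function<void()> fn) { return WorkItem{std::move(fn), counters}; }
  void OnEnqueue(Work& w) { ++w.counters->enqueued; }
  void Execute(Work& w) {
    ++w.counters->started;  // stands in for on_start_work
    w.fn();
    ++w.counters->stopped;  // stands in for on_stop_work
  }

  Counters* counters;
};

// Policy-independent interface so callers hold a single pointer type.
struct PoolInterface {
  virtual ~PoolInterface() = default;
  virtual void Schedule(std::function<void()> fn) = 0;
};

template <typename CallbackPolicy>
class ToyPool : public PoolInterface {
 public:
  explicit ToyPool(CallbackPolicy policy) : policy_(std::move(policy)) {}

  void Schedule(std::function<void()> fn) override {
    assert(static_cast<bool>(fn));
    typename CallbackPolicy::Work w = policy_.MakeWork(std::move(fn));
    policy_.OnEnqueue(w);
    policy_.Execute(w);  // a real pool would queue w and run it on a worker thread
  }

 private:
  CallbackPolicy policy_;
};

// One compile-time switch picks the instantiation; the macro name is made up.
inline std::unique_ptr<PoolInterface> MakePool(Counters* counters) {
#ifdef SKETCH_ENABLE_CALLBACKS
  return std::make_unique<ToyPool<WorkWithCallbackPolicy>>(WorkWithCallbackPolicy{counters});
#else
  (void)counters;
  return std::make_unique<ToyPool<WorkNoCallbackPolicy>>(WorkNoCallbackPolicy{});
#endif
}
```

Because the no-callback policy's hooks are empty and its Work type is the plain std::function, a build that never defines the macro compiles the same scheduling path it would have without the feature, while callers only ever see a unique_ptr to the shared interface.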