Add EP and hardware device type to Windows ML telemetry #28477
Merged
angelser merged 3 commits on May 19, 2026
Closes gaps in inference telemetry for the new Windows ML EP plugin platform so we can answer "which Execution Providers and hardware device types are apps using for inference, and how much?"

Two prongs:
1. New EpDeviceUsage ETW event, emitted once per (EP, hardware device) tuple at session init and on every RuntimePerf heartbeat (and the final destructor flush). The event is self-contained: it carries EP type, hardware device type (CPU/GPU/NPU), PCI vendor and device IDs, EP vendor, assigned node count, and session-level run counters, so downstream consumers can GROUP BY executionProviderType, hardwareDeviceType without joining back to SessionCreation. This matters for long-lived sessions that span past the telemetry pipeline's 24-hour join window.
2. SessionCreation and SessionCreation_CaptureState now also emit hardwareDeviceTypes and hardwareVendorIds (comma-separated, positionally aligned with executionProviderIds). Bumped schemaVersion 0 -> 1.

Implementation:
* Added LogEpDeviceUsage to the Telemetry interface (no-op default) and WindowsTelemetry (TraceLogging under the existing Microsoft.ML.ONNXRuntime provider; no new provider GUID).
* Added EpDeviceInfo to InferenceSession::Telemetry plus a pre-formatted summary for the SessionCreation enrichment.
* InferenceSession::PopulateEpDeviceInfo runs after graph partitioning. For EPs created via the V2 OrtEpDevice path (AppendExecutionProvider_V2 / SetEpSelectionPolicy / RegisterExecutionProviderLibrary) it pulls full hardware metadata from IExecutionProvider::GetEpDevices(). For legacy EPs it falls back to IExecutionProvider::GetDevice() (OrtDevice type + vendor ID; no PCI device ID).
* Heartbeat block in `Run()` and the destructor flush in `~InferenceSession` now also emit LogEpDeviceUsage for each entry.

No public C API surface changes. Telemetry interface signatures gain two `const std::string&` parameters on LogSessionCreation and one new virtual LogEpDeviceUsage with a no-op default for non-Windows platforms.

Tested: full `onnxruntime_test_all` Debug suite: 1571 passed, 0 failed, 3 skipped (CUDA EP, environment-gated). No memory leaks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses cpplint build/include_what_you_use warning from the Optional Lint C++ job: telemetry_.duration_per_batch_size_ is std::unordered_map and was being used without an explicit include. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
Windows ML engineers need telemetry that answers: "Which Execution Providers and hardware device types are apps using for inference, and how much?"
Today, the inference telemetry has these gaps:
The new Windows ML EP plugin platform (OrtEpDevice / OrtEpFactory / OrtHardwareDevice) already has all the hardware metadata we need; we just weren't surfacing it.
What this PR does
1. New `EpDeviceUsage` ETW event
Emitted once per `(EP, hardware device)` tuple at session init and on every `RuntimePerf` heartbeat (plus a destructor flush). Each event is self-contained. Example values:
* EP type: `QNNExecutionProvider`
* Hardware device type: `NPU` / `GPU` / `CPU` / `FPGA` / `UNKNOWN`
* PCI IDs: `0x5143` / `0x0901`
* Vendor strings, including the EP vendor: `Qualcomm`, `Qualcomm`
* Assigned node count: `89` (count after graph partitioning)
This gives downstream consumers a trivial `GROUP BY executionProviderType, hardwareDeviceType` without needing to join back to `SessionCreation`. Works for long-lived sessions that span past the 24h pipeline join window.
2. `SessionCreation` enrichment
Added `hardwareDeviceTypes` and `hardwareVendorIds` (comma-separated, positionally aligned with the existing `executionProviderIds`). Bumped `schemaVersion` 0 -> 1.
Implementation notes
* `LogEpDeviceUsage` added to the `Telemetry` interface with a no-op default; `WindowsTelemetry` implements it via TraceLogging under the existing `Microsoft.ML.ONNXRuntime` provider (no new provider GUID).
* `InferenceSession::PopulateEpDeviceInfo` runs after graph partitioning. For EPs created via the V2 path (`AppendExecutionProvider_V2` / `SetEpSelectionPolicy` / `RegisterExecutionProviderLibrary`) it pulls full hardware metadata from `IExecutionProvider::GetEpDevices()`. For legacy EPs it falls back to `IExecutionProvider::GetDevice()` (OrtDevice type + vendor ID; no PCI device ID).
* `Run()` and the destructor flush in `~InferenceSession` both emit `LogEpDeviceUsage` per entry.
Testing
`onnxruntime_test_all` (full suite): 1571 passed, 0 failed, 3 skipped (CUDA-EP-gated, environment)
Compatibility
* `Telemetry::LogSessionCreation` virtual gains two `const std::string&` parameters; all in-tree overrides are updated.
* `LogEpDeviceUsage` has a no-op default, so non-Windows platforms are unaffected.
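For readers who have not opened the diff, here is a minimal C++ sketch of two ideas from this description: the positionally aligned, comma-separated summary strings, and a virtual logging hook with a no-op default. It is an illustration under assumptions, not the code in this PR. The struct layout, the `BuildSummary` and `EmitEpDeviceUsage` helpers, the `RecordingTelemetry` class, and the decimal formatting of vendor IDs are all hypothetical, and `Telemetry` here is a simplified stand-in for the real onnxruntime interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-(EP, hardware device) record; field names are illustrative.
struct EpDeviceInfo {
  std::string ep_type;               // e.g. "QNNExecutionProvider"
  std::string hardware_device_type;  // e.g. "CPU", "GPU", "NPU"
  std::uint32_t vendor_id;           // PCI vendor ID
  std::uint32_t device_id;           // PCI device ID (unavailable on the legacy path)
  int assigned_node_count;           // nodes assigned after graph partitioning
};

// Three comma-separated strings whose i-th items describe the same entry.
struct EpDeviceSummary {
  std::string execution_provider_ids;
  std::string hardware_device_types;
  std::string hardware_vendor_ids;
};

// Builds the aligned strings. Assumption: vendor IDs are written in decimal;
// the real event may use a different encoding.
EpDeviceSummary BuildSummary(const std::vector<EpDeviceInfo>& entries) {
  EpDeviceSummary s;
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (i > 0) {
      s.execution_provider_ids += ',';
      s.hardware_device_types += ',';
      s.hardware_vendor_ids += ',';
    }
    s.execution_provider_ids += entries[i].ep_type;
    s.hardware_device_types += entries[i].hardware_device_type;
    s.hardware_vendor_ids += std::to_string(entries[i].vendor_id);
  }
  return s;
}

// Simplified stand-in for a telemetry interface. The default does nothing,
// so platforms that never override it are unaffected.
class Telemetry {
 public:
  virtual ~Telemetry() = default;
  virtual void LogEpDeviceUsage(const EpDeviceInfo& /*info*/) {}
};

// Hypothetical override that records one line per event instead of emitting ETW.
class RecordingTelemetry : public Telemetry {
 public:
  void LogEpDeviceUsage(const EpDeviceInfo& info) override {
    events.push_back(info.ep_type + "/" + info.hardware_device_type);
  }
  std::vector<std::string> events;
};

// Emits one event per entry, as a heartbeat or a final flush would.
void EmitEpDeviceUsage(Telemetry& telemetry, const std::vector<EpDeviceInfo>& entries) {
  for (const EpDeviceInfo& entry : entries) {
    telemetry.LogEpDeviceUsage(entry);
  }
}
```

Calling `EmitEpDeviceUsage` with a plain `Telemetry` does nothing, while the same call with a `RecordingTelemetry` yields one record per entry. Building all three strings in a single loop is what keeps them positionally aligned.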