Add EP and hardware device type to Windows ML telemetry by angelser · Pull Request #28477 · microsoft/onnxruntime

angelser · 2026-05-12T18:11:28Z

Problem

Windows ML engineers need telemetry that answers: "Which Execution Providers and hardware device types are apps using for inference, and how much?"

Today, the inference telemetry has these gaps:

| Event | Gap |
| --- | --- |
| SessionCreation | No hardware device type (CPU/GPU/NPU) and no vendor ID. Fires once, so it falls out of the 24h pipeline join window for long-lived sessions. |
| RuntimePerf | No EP type and no hardware device. Carries only session_id, so it requires a join back to SessionCreation. |
| ExecutionProviderEvent | Only fires for DML, so it says nothing about QNN, OpenVINO, and other EPs. |

The new Windows ML EP plugin platform (OrtEpDevice / OrtEpFactory / OrtHardwareDevice) already has all the hardware metadata we need; we just weren't surfacing it.

What this PR does

1. New `EpDeviceUsage` ETW event

Emitted once per (EP, hardware device) tuple at session init and on every RuntimePerf heartbeat (plus a destructor flush). Each event is self-contained:

| Field | Example |
| --- | --- |
| executionProviderType | `QNNExecutionProvider` |
| hardwareDeviceType | `NPU` / `GPU` / `CPU` / `FPGA` / `UNKNOWN` |
| hardwareVendorId / hardwareDeviceId | `0x5143` / `0x0901` (PCI IDs) |
| hardwareVendor | `Qualcomm` |
| epVendor | `Qualcomm` |
| assignedNodeCount | `89` (count after graph partitioning) |
| totalRunsSinceLast / totalRunDurationSinceLast | session-level run counters |

This gives downstream consumers a trivial GROUP BY executionProviderType, hardwareDeviceType without needing to join back to SessionCreation. Works for long-lived sessions that span past the 24h pipeline join window.
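
As a rough illustration of that GROUP BY, here is a minimal C++ sketch of a consumer that sums the run counters per (EP type, device type). `EpDeviceUsageRecord`, `UsageTotals`, and `AggregateUsage` are hypothetical names invented for this example, and the field types are assumptions; the PR does not show the event's wire types or the unit of the duration counter.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of one EpDeviceUsage event. Field names follow the
// table above; the C++ types are assumptions made for this sketch.
struct EpDeviceUsageRecord {
  std::string execution_provider_type;  // e.g. "QNNExecutionProvider"
  std::string hardware_device_type;     // "NPU", "GPU", "CPU", "FPGA", "UNKNOWN"
  uint32_t hardware_vendor_id = 0;      // PCI vendor ID
  uint32_t hardware_device_id = 0;      // PCI device ID
  uint32_t assigned_node_count = 0;
  uint64_t total_runs_since_last = 0;
  uint64_t total_run_duration_since_last = 0;  // unit not stated in the PR
};

struct UsageTotals {
  uint64_t runs = 0;
  uint64_t run_duration = 0;
};

using UsageKey = std::pair<std::string, std::string>;

// Equivalent of GROUP BY executionProviderType, hardwareDeviceType:
// no join against SessionCreation is needed because each event is
// self-contained.
std::map<UsageKey, UsageTotals> AggregateUsage(
    const std::vector<EpDeviceUsageRecord>& events) {
  std::map<UsageKey, UsageTotals> totals;
  for (const auto& e : events) {
    UsageTotals& t =
        totals[{e.execution_provider_type, e.hardware_device_type}];
    t.runs += e.total_runs_since_last;
    t.run_duration += e.total_run_duration_since_last;
  }
  return totals;
}
```

Because the run counters are session-level, a session that reports two devices under the same EP type and device type would be counted twice by this sum; a real query would probably need to deduplicate per session first.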

2. `SessionCreation` enrichment

Added hardwareDeviceTypes and hardwareVendorIds (comma-separated, positionally aligned with the existing executionProviderIds). Bumped schemaVersion 0 -> 1.
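
To make "positionally aligned" concrete, the sketch below shows how a consumer might split the three comma-separated fields and zip them by index. `SplitComma`, `ZipAligned`, and `EpHardware` are hypothetical helpers written for this example. It assumes executionProviderIds is itself comma-separated and treats vendor IDs as opaque strings, since the PR does not state their textual format.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits on ',' and keeps empty fields, so positions stay aligned even when
// one EP has no value for a given column.
std::vector<std::string> SplitComma(const std::string& joined) {
  std::vector<std::string> parts;
  std::size_t start = 0;
  while (true) {
    const std::size_t comma = joined.find(',', start);
    if (comma == std::string::npos) {
      parts.push_back(joined.substr(start));
      break;
    }
    parts.push_back(joined.substr(start, comma - start));
    start = comma + 1;
  }
  return parts;
}

// One row per execution provider, reassembled from the aligned columns.
struct EpHardware {
  std::string ep_id;
  std::string device_type;
  std::string vendor_id;
};

// Zips the three lists by index. Returns an empty vector if the lists have
// different lengths, because the alignment guarantee is then broken.
std::vector<EpHardware> ZipAligned(const std::string& ep_ids,
                                   const std::string& device_types,
                                   const std::string& vendor_ids) {
  const std::vector<std::string> eps = SplitComma(ep_ids);
  const std::vector<std::string> types = SplitComma(device_types);
  const std::vector<std::string> vendors = SplitComma(vendor_ids);
  std::vector<EpHardware> rows;
  if (eps.size() != types.size() || eps.size() != vendors.size()) {
    return rows;
  }
  for (std::size_t i = 0; i < eps.size(); ++i) {
    EpHardware row;
    row.ep_id = eps[i];
    row.device_type = types[i];
    row.vendor_id = vendors[i];
    rows.push_back(row);
  }
  return rows;
}
```

Keeping empty fields rather than dropping them is the design choice that matters here: a skipped entry would shift every later value onto the wrong EP.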

Implementation notes

LogEpDeviceUsage added to the Telemetry interface with a no-op default; WindowsTelemetry implements it via TraceLogging under the existing Microsoft.ML.ONNXRuntime provider (no new provider GUID).
InferenceSession::PopulateEpDeviceInfo runs after graph partitioning. For EPs created via the V2 path (AppendExecutionProvider_V2 / SetEpSelectionPolicy / RegisterExecutionProviderLibrary) it pulls full hardware metadata from IExecutionProvider::GetEpDevices(). For legacy EPs it falls back to IExecutionProvider::GetDevice() (OrtDevice type + vendor ID; no PCI device ID).
Heartbeat block in Run() and destructor flush in ~InferenceSession both emit LogEpDeviceUsage per entry.
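
The two-path lookup described above can be sketched roughly as follows. This is not the code from the PR: `HardwareMetaSketch`, `EpSketch`, `EpDeviceInfoSketch`, and `CollectEpDeviceInfo` are hypothetical stand-ins, and treating an empty device list as the signal for a legacy EP is an assumption made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the metadata a V2-path EP can report per device.
struct HardwareMetaSketch {
  std::string device_type;  // "CPU", "GPU", "NPU", ...
  uint32_t vendor_id = 0;
  uint32_t device_id = 0;
};

// Hypothetical stand-in for an execution provider. `ep_devices` is assumed
// to be empty for EPs registered through the legacy path.
struct EpSketch {
  std::string type;
  std::vector<HardwareMetaSketch> ep_devices;
  std::string legacy_device_type;
  uint32_t legacy_vendor_id = 0;
};

struct EpDeviceInfoSketch {
  std::string ep_type;
  std::string device_type;
  uint32_t vendor_id = 0;
  uint32_t device_id = 0;
  bool has_device_id = false;  // false on the legacy fallback path
};

// Builds one entry per (EP, hardware device) tuple.
std::vector<EpDeviceInfoSketch> CollectEpDeviceInfo(
    const std::vector<EpSketch>& eps) {
  std::vector<EpDeviceInfoSketch> out;
  for (const auto& ep : eps) {
    if (!ep.ep_devices.empty()) {
      // V2 path: full hardware metadata, one entry per device.
      for (const auto& hw : ep.ep_devices) {
        EpDeviceInfoSketch info;
        info.ep_type = ep.type;
        info.device_type = hw.device_type;
        info.vendor_id = hw.vendor_id;
        info.device_id = hw.device_id;
        info.has_device_id = true;
        out.push_back(info);
      }
    } else {
      // Legacy path: device type and vendor ID only, no PCI device ID.
      EpDeviceInfoSketch info;
      info.ep_type = ep.type;
      info.device_type = ep.legacy_device_type;
      info.vendor_id = ep.legacy_vendor_id;
      out.push_back(info);
    }
  }
  return out;
}
```

An explicit `has_device_id` flag keeps a missing PCI device ID distinguishable from a device whose ID happens to be zero.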

Testing

Debug build with Ninja: clean build (1636 targets)
onnxruntime_test_all (full suite): 1571 passed, 0 failed, 3 skipped (CUDA-EP-gated, environment)
No memory leaks reported

Compatibility

No public C API surface changes.
Telemetry::LogSessionCreation virtual gains two const std::string& parameters — all in-tree overrides are updated.
LogEpDeviceUsage has a no-op default, so non-Windows platforms are unaffected.
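
The no-op default pattern can be sketched as below. The class names and the two-string signature are hypothetical simplifications; the real `LogEpDeviceUsage` carries more fields and its exact signature is not shown in this PR description.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the Telemetry interface. The base implementation
// does nothing, so platforms without an override need no changes.
class TelemetrySketch {
 public:
  virtual ~TelemetrySketch() = default;
  virtual void LogEpDeviceUsage(const std::string& /*ep_type*/,
                                const std::string& /*device_type*/) {}
};

// Stand-in for a platform-specific override. It records events in memory
// instead of writing them to ETW.
class RecordingTelemetrySketch : public TelemetrySketch {
 public:
  void LogEpDeviceUsage(const std::string& ep_type,
                        const std::string& device_type) override {
    events.emplace_back(ep_type, device_type);
  }
  std::vector<std::pair<std::string, std::string>> events;
};

// Emits one event per (EP, device) entry, the way the heartbeat block and
// the destructor flush are described as doing.
void EmitForEachEntry(
    TelemetrySketch& telemetry,
    const std::vector<std::pair<std::string, std::string>>& entries) {
  for (const auto& entry : entries) {
    telemetry.LogEpDeviceUsage(entry.first, entry.second);
  }
}
```

Callers emit unconditionally and the virtual dispatch decides whether anything is recorded, which is why adding the method does not require touching non-Windows telemetry implementations.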

Closes gaps in inference telemetry for the new Windows ML EP plugin platform so we can answer "which Execution Providers and hardware device types are apps using for inference, and how much?" Two prongs:

1. New EpDeviceUsage ETW event emitted once per (EP, hardware device) tuple at session init and on every RuntimePerf heartbeat (and the final destructor flush). The event is self-contained: it carries EP type, hardware device type (CPU/GPU/NPU), PCI vendor and device IDs, EP vendor, assigned node count, and session-level run counters, so downstream consumers can GROUP BY executionProviderType, hardwareDeviceType without joining back to SessionCreation. This matters for long-lived sessions that span past the telemetry pipeline's 24-hour join window.
2. SessionCreation and SessionCreation_CaptureState now also emit hardwareDeviceTypes and hardwareVendorIds (comma-separated, positionally aligned with executionProviderIds). Bumped schemaVersion 0 -> 1.

Implementation:

* Added LogEpDeviceUsage to the Telemetry interface (no-op default) and WindowsTelemetry (TraceLogging under the existing Microsoft.ML.ONNXRuntime provider; no new provider GUID).
* Added EpDeviceInfo to InferenceSession::Telemetry plus a pre-formatted summary for the SessionCreation enrichment.
* InferenceSession::PopulateEpDeviceInfo runs after graph partitioning. For EPs created via the V2 OrtEpDevice path (AppendExecutionProvider_V2 / SetEpSelectionPolicy / RegisterExecutionProviderLibrary) it pulls full hardware metadata from IExecutionProvider::GetEpDevices(). For legacy EPs it falls back to IExecutionProvider::GetDevice() (OrtDevice type + vendor ID; no PCI device ID).
* Heartbeat block in `Run()` and the destructor flush in `~InferenceSession` now also emit LogEpDeviceUsage for each entry.

No public C API surface changes. Telemetry interface signatures gain two `const std::string&` parameters on LogSessionCreation and one new virtual LogEpDeviceUsage with a no-op default for non-Windows platforms.

Tested: full `onnxruntime_test_all` Debug suite: 1571 passed, 0 failed, 3 skipped (CUDA EP, environment-gated). No memory leaks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dabhattimsft

Addresses the cpplint build/include_what_you_use warning from the Optional Lint C++ job: telemetry_.duration_per_batch_size_ is a std::unordered_map and was being used without an explicit include.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ashrit-ms

angelserMS and others added 2 commits May 12, 2026 11:10

Apply clang-format fixes to inference_session.h

87d6dc4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

dabhattimsft previously approved these changes May 13, 2026


angelser dismissed dabhattimsft’s stale review via 9b58b79 May 19, 2026 03:22

ashrit-ms approved these changes May 19, 2026


angelser merged commit 4d1dce8 into microsoft:main May 19, 2026
88 checks passed

angelser merged 3 commits into microsoft:main from angelser:angelser/winml-ep-device-telemetry

4 participants
