Summary
Integrate ModelKit with IHV-specific profiling tools from Qualcomm, Intel, and AMD to enable operator-level performance analysis on hardware NPUs/GPUs.
Context
IHV profiling tools expose hardware execution detail that generic ONNX Runtime profiling cannot, e.g., which NPU ops caused stalls, where memory bandwidth is the bottleneck, or how much time goes to kernel dispatch overhead. This integration is needed so that performance results point to specific operators and hardware causes that can be acted on.
Source: plans/release/0501_release_plan/P0_CHECKLIST.md (P1-FEATURE-013). Builds on the base profiling work in #402.
Target tools:
- Qualcomm: QNN profiling API (HTP backend profiling output)
- Intel: OpenVINO Performance Analysis tool / VTune
- AMD: ROCm profiler / Ryzen AI profiler
Current State
Desired State
- wmk perf --profile enables IHV profiling output for supported EPs
- Profiling data consumed from QNN/OpenVINO/AMD tools
- Bottleneck analysis: identify top-N slowest operators per EP
- Output: profiling summary in artifacts/profiling_report.json
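The bottleneck analysis and report output could look roughly like the sketch below. It assumes profiling data has already been reduced to per-operator records. The record keys (ep, op_name, op_type, duration_us), the function names, and the report layout are all hypothetical and not an agreed ModelKit format.

```python
import json
from collections import defaultdict
from pathlib import Path


def top_n_slowest(records, n=5):
    """Group per-operator timings by EP and keep the n slowest in each group."""
    by_ep = defaultdict(list)
    for rec in records:
        by_ep[rec["ep"]].append(rec)
    return {
        ep: sorted(ops, key=lambda r: r["duration_us"], reverse=True)[:n]
        for ep, ops in by_ep.items()
    }


def write_report(records, path="artifacts/profiling_report.json", n=5):
    """Write the summary as JSON; the report layout is illustrative only."""
    report = {"top_n": n, "slowest_operators": top_n_slowest(records, n)}
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create artifacts/ if missing
    out.write_text(json.dumps(report, indent=2))
    return report
```

Sorting per EP rather than globally keeps a slow EP from crowding the others out of the top-N list.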
Acceptance Criteria
- artifacts/profiling_report.json generated with operator-level timing
Technical Notes
- QNN SDK profiling: enable via session option qnn_context_enable_graphs_profiling
- OpenVINO profiling: InferRequest.get_profiling_info() method
- AMD Ryzen AI profiling: available via Ryzen AI SDK; check access with hardware team
- Normalize profiling output to a common schema across all IHV tools
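One possible shape for the common schema is sketched below. OpTiming and both converter functions are hypothetical names. The OpenVINO converter assumes entries carry node_name, node_type, and real_time (a timedelta), as the entries returned by get_profiling_info() are understood to. The QNN row keys (name, type, time_us) are pure placeholders, since the real QNN profiling output format still needs to be confirmed.

```python
from dataclasses import dataclass, asdict
from datetime import timedelta
from types import SimpleNamespace


@dataclass
class OpTiming:
    """One operator's timing in a tool-independent form (hypothetical schema)."""
    ep: str
    op_name: str
    op_type: str
    duration_us: float


def from_openvino(infos, ep="OpenVINO"):
    """Convert entries exposing node_name, node_type, and real_time (timedelta)."""
    return [
        OpTiming(ep, i.node_name, i.node_type, i.real_time / timedelta(microseconds=1))
        for i in infos
    ]


def from_qnn_rows(rows, ep="QNN"):
    """Convert QNN rows; the dict keys are assumed, not the SDK's actual format."""
    return [
        OpTiming(ep, r["name"], r.get("type", "unknown"), float(r["time_us"]))
        for r in rows
    ]


if __name__ == "__main__":
    # Stand-in object so the sketch runs without OpenVINO installed.
    sample = [SimpleNamespace(node_name="conv1", node_type="Convolution",
                              real_time=timedelta(microseconds=250))]
    print([asdict(t) for t in from_openvino(sample)])
```

Keeping one converter per IHV tool means the report and bottleneck analysis only ever see OpTiming records, so adding the AMD path later would not touch downstream code.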
Related Files
- plans/release/0501_release_plan/P0_CHECKLIST.md (P1-FEATURE-013)
- plans/release/0501_release_plan/feature-scale.md