Summary
Improve the existing wmk perf command with expanded metrics (latency/throughput/memory), batch size sweep, and cross-EP comparison report.
Context
wmk perf currently provides basic performance benchmarking. For the May 1 delivery, it needs to be production-ready with expanded metrics, batch size support, and cross-EP comparison capability to serve as the foundation for the cross-EP benchmarking effort (P1-EP-011).
From plans/release/0501_release_plan/P0_CHECKLIST.md (P1-FEATURE-002).
Current State
- wmk perf command exists and runs basic benchmarks
Desired State
- wmk perf reports: latency (mean/p50/p95/p99 in ms), throughput (fps or tokens/s), memory (peak RSS)
- Batch size sweep: runs with batch sizes 1, 4, 8, 16 (configurable)
- --ep all option: runs the same model on all available EPs and produces a comparison table
- Output: JSON + human-readable terminal table
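The latency and throughput figures listed above can be derived from raw per-iteration timings. The following is a minimal sketch, not the wmk perf implementation: the function names, the result keys, and the choice of nearest-rank percentiles are all assumptions made for illustration.

```python
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list (pct is an int 0-100)."""
    # Integer arithmetic avoids floating-point surprises in the rank.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(latencies_ns, batch_size):
    """Turn per-iteration latencies in nanoseconds into summary metrics.

    The key names are hypothetical; "throughput_per_s" counts items
    (frames or tokens) processed per second of measured time.
    """
    ms = sorted(t / 1e6 for t in latencies_ns)
    total_s = sum(latencies_ns) / 1e9
    return {
        "batch_size": batch_size,
        "mean_ms": statistics.fmean(ms),
        "p50_ms": percentile(ms, 50),
        "p95_ms": percentile(ms, 95),
        "p99_ms": percentile(ms, 99),
        "throughput_per_s": batch_size * len(latencies_ns) / total_s,
    }
```

A batch size sweep would then call summarize once per batch size (1, 4, 8, 16) and collect the resulting rows.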
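Acceptance Criteria
- --batch-sizes 1,4,8,16
- --ep all runs all available EPs and outputs comparison
- artifacts/perf_results.json + formatted terminal table
- wmk perf after this improvement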
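One possible shape for the JSON plus terminal-table output is sketched below. The row fields, the top-level "results" key, and the EP names in the usage note are hypothetical; only the artifacts/perf_results.json path comes from this issue.

```python
import json
from pathlib import Path


def format_table(rows, columns):
    """Render a list of dicts as a fixed-width text table."""
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(columns)
    ]

    def render(values):
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    header = [render(columns), render(["-" * w for w in widths])]
    return "\n".join(header + [render(line) for line in cells])


def write_report(rows, path="artifacts/perf_results.json"):
    """Write the rows as JSON, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"results": rows}, indent=2))
    return path
```

With one row per (EP, batch size) pair, for example {"ep": "cpu", "batch_size": 1, "p50_ms": 12.5}, the same list feeds both the JSON file and the comparison table.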
Technical Notes
- Build on existing wmk perf implementation
- Use time.perf_counter_ns for high-resolution timing
- Memory: sample psutil.Process().memory_info().rss during the run and keep the maximum, since rss reports current rather than peak usage
- Warm-up runs prevent cold-start skew
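The notes above could combine into a measurement loop roughly like the one below. This is a sketch under stated assumptions, not the existing wmk perf code: run_once, PeakTracker, benchmark, and the default warm-up and iteration counts are hypothetical. Because rss is a point-in-time reading, the sketch polls it from a background thread and keeps the maximum, which can miss spikes shorter than the polling interval.

```python
import threading
import time


def psutil_rss_reader():
    """Return a callable that reports the current RSS in bytes."""
    import psutil  # third-party; imported lazily so the rest runs without it

    proc = psutil.Process()
    return lambda: proc.memory_info().rss


class PeakTracker:
    """Poll read_rss in a background thread and remember the maximum."""

    def __init__(self, read_rss, interval_s=0.01):
        self._read = read_rss
        self._interval = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self.peak = 0

    def _poll(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self._read())
            self._stop.wait(self._interval)

    def __enter__(self):
        self.peak = self._read()
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        self.peak = max(self.peak, self._read())
        return False


def benchmark(run_once, read_rss, warmup=3, iters=20):
    """Time run_once() after warm-up; return (latencies_ns, peak_rss_bytes)."""
    for _ in range(warmup):  # untimed: absorbs lazy initialisation and cache fills
        run_once()
    latencies = []
    with PeakTracker(read_rss) as tracker:
        for _ in range(iters):
            start = time.perf_counter_ns()
            run_once()
            latencies.append(time.perf_counter_ns() - start)
    return latencies, tracker.peak
```

In real use, read_rss would be psutil_rss_reader() and run_once would wrap a single inference call at the batch size under test. Passing the reader in as a parameter keeps the loop testable without psutil installed.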
Related Files
- plans/release/0501_release_plan/feature-scale.md: P1.2 wmk perf improvement
- plans/release/0501_release_plan/P0_CHECKLIST.md: P1-FEATURE-002