P1-FEATURE-002: wmk perf Command — Improve Existing #155

@DingmaomaoBJTU

Summary

Improve the existing wmk perf command with expanded metrics (latency, throughput, memory), a batch size sweep, and a cross-EP comparison report.

Context

wmk perf currently provides basic performance benchmarking. For the May 1 delivery, it needs to be production-ready with expanded metrics, batch size support, and cross-EP comparison capability to serve as the foundation for the cross-EP benchmarking effort (P1-EP-011).

From plans/release/0501_release_plan/P0_CHECKLIST.md (P1-FEATURE-002).

Current State

Desired State

  • wmk perf reports: latency (mean/p50/p95/p99 in ms), throughput (samples/s or tokens/s, depending on task type), memory (peak RSS)
  • Batch size sweep: runs with batch sizes 1, 4, 8, 16 (configurable)
  • --ep all option: runs the same model on all available EPs and produces a comparison table
  • Output: JSON + human-readable terminal table
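The two output forms listed above could be produced from one list of per-run records. The following is a minimal sketch, not the existing wmk perf implementation: the record field names (`ep`, `batch_size`, `mean_ms`, `p95_ms`, `throughput_per_s`, `peak_rss_bytes`), the top-level `results` key, and the column layout are all assumptions made for illustration.

```python
import json

# Hypothetical column layout: (header, record key, format function).
COLUMNS = [
    ("EP", "ep", str),
    ("Batch", "batch_size", str),
    ("Mean (ms)", "mean_ms", lambda v: f"{v:.2f}"),
    ("P95 (ms)", "p95_ms", lambda v: f"{v:.2f}"),
    ("Throughput (/s)", "throughput_per_s", lambda v: f"{v:.1f}"),
    ("Peak RSS (MiB)", "peak_rss_bytes", lambda v: f"{v / (1024 * 1024):.1f}"),
]


def format_comparison_table(results):
    """Render per-run records as a left-aligned plain-text table."""
    headers = [header for header, _, _ in COLUMNS]
    rows = [[fmt(record[key]) for _, key, fmt in COLUMNS] for record in results]
    # Each column is as wide as its longest cell (header included).
    widths = [
        max([len(header)] + [len(row[i]) for row in rows])
        for i, header in enumerate(headers)
    ]

    def line(cells):
        return "  ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    separator = ["-" * w for w in widths]
    return "\n".join([line(headers), line(separator)] + [line(row) for row in rows])


def to_json(results):
    """Serialize the same records for a JSON results file."""
    return json.dumps({"results": results}, indent=2)
```

Keeping a single record list as the source for both outputs means the terminal table and the JSON file cannot drift apart.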

Acceptance Criteria

  • Latency statistics: mean, p50, p95, p99 in milliseconds
  • Throughput: samples/s or tokens/s depending on task type
  • Memory: peak RSS measured per EP run
  • Batch size sweep: --batch-sizes 1,4,8,16
  • Cross-EP comparison: --ep all runs all available EPs and outputs a comparison table
  • Output: artifacts/perf_results.json + formatted terminal table
  • Warm-up runs configurable (default 10), measurement runs configurable (default 50)
  • All P0 built-in models pass wmk perf after this improvement
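As a rough illustration of the latency, throughput, and batch-size criteria above, the sketch below parses a `--batch-sizes` value and summarizes a list of per-run latencies. The function names, the result keys, and the choice of linear-interpolated percentiles are assumptions; throughput is derived here as batch size divided by mean latency, which is one possible definition rather than the one wmk perf necessarily uses.

```python
import statistics


def parse_batch_sizes(spec):
    """Parse a comma-separated value such as "1,4,8,16" into a list of ints."""
    sizes = [int(part) for part in spec.split(",") if part.strip()]
    if not sizes or any(size <= 0 for size in sizes):
        raise ValueError(f"batch sizes must be positive integers, got {spec!r}")
    return sizes


def percentile(sorted_values, q):
    """Linear-interpolated percentile (q in [0, 100]) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("no values to summarize")
    rank = (len(sorted_values) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize_latencies(latencies_ns, batch_size):
    """Convert nanosecond latencies to millisecond statistics plus throughput."""
    latencies_ms = sorted(ns / 1e6 for ns in latencies_ns)
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        # Items (samples or tokens) processed per second at this batch size.
        "throughput_per_s": batch_size * 1000.0 / mean_ms,
    }
```

With the default of 50 measurement runs, p99 is interpolated between the two slowest samples, so it is sensitive to a single outlier; a larger run count gives a steadier tail estimate.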

Technical Notes

  • Build on existing wmk perf implementation
  • Use time.perf_counter_ns for high-resolution timing
  • Memory: sample psutil.Process().memory_info().rss during each run and keep the maximum (rss reports current, not peak, usage)
  • Warm-up runs prevent cold-start skew
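The notes above might translate into a measurement loop along these lines. This is a sketch under stated assumptions: `run_once` is a hypothetical zero-argument callable that performs one inference, and `PeakRssSampler` is an illustrative helper, not part of wmk or psutil. Because `memory_info().rss` returns the resident set size at the moment of the call, the sampler polls it on a background thread and keeps the maximum it observes.

```python
import threading
import time


def measure(run_once, warmup_runs=10, measure_runs=50):
    """Return per-run latencies in nanoseconds, excluding warm-up runs."""
    for _ in range(warmup_runs):
        run_once()  # untimed: lets caches, lazy init, etc. settle first
    latencies_ns = []
    for _ in range(measure_runs):
        start = time.perf_counter_ns()
        run_once()
        latencies_ns.append(time.perf_counter_ns() - start)
    return latencies_ns


def psutil_rss():
    """Current RSS in bytes; psutil is a third-party package, imported lazily."""
    import psutil
    return psutil.Process().memory_info().rss


class PeakRssSampler:
    """Context manager that polls an RSS reader and records the maximum seen."""

    def __init__(self, read_rss=psutil_rss, interval_s=0.01):
        self._read_rss = read_rss
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.peak_bytes = 0

    def _loop(self):
        while not self._stop.is_set():
            self.peak_bytes = max(self.peak_bytes, self._read_rss())
            self._stop.wait(self._interval_s)

    def __enter__(self):
        self.peak_bytes = self._read_rss()
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        # One final reading after the polling thread has stopped.
        self.peak_bytes = max(self.peak_bytes, self._read_rss())
        return False
```

A run would then wrap the loop as `with PeakRssSampler() as sampler: latencies = measure(run_once)` and read `sampler.peak_bytes` afterwards. Polling can miss spikes shorter than the sampling interval, and RSS only covers host memory of the current process.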

Related Files

Metadata

Labels

  • dev experience: Developer experience improvements
  • feature scale: Feature scale work item
