
Update: benchmark_rounds.sh adds Host/Device, renames Elapsed to Total #832

Merged
ChaoWao merged 1 commit into hw-native-sys:main from hw-native-sys-bot:feat/benchmark-add-host-device-cols on May 21, 2026

@hw-native-sys-bot
Collaborator

Summary

Restore the Host / Device per-round columns that #828 dropped (the RunTiming walls from #790) alongside the device-log Total / Sched / Orch, and rename Elapsed → Total, since "elapsed" reads ambiguously next to two distinct wall clocks. The output is now a 5-column table:

| Column | Source | Meaning |
| --- | --- | --- |
| Host (us) | RunTiming.host_wall_us (framework) | steady_clock wall around dispatch |
| Device (us) | RunTiming.device_wall_us (framework) | AICPU mailbox orch_start → orch_end |
| Total (us) | device log | max(end) − min(start) across sched_* / orch_* events |
| Sched (us) | device log | sched_start → sched_end |
| Orch (us) | device log | orch_start → orch_end |
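
As a rough illustration of the Total definition in the table above, the span can be computed with a single awk pass. This is a minimal sketch, not the script's actual parser: the `<event> <timestamp_us>` log layout is an assumption, since the real device-log format is not shown in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical device-log excerpt for one round: "<event> <timestamp_us>".
# The real log layout is an assumption made for this sketch.
log='sched_start 100.0
orch_start 120.5
orch_end 912.6
sched_end 1278.9'

span=$(printf '%s\n' "$log" | awk '
    # Track the earliest start across sched_* and orch_* events.
    /^(sched|orch)_start / {
        t = $2 + 0
        if (!have_s || t < min_s) { min_s = t; have_s = 1 }
    }
    # Track the latest end across sched_* and orch_* events.
    /^(sched|orch)_end / {
        t = $2 + 0
        if (!have_e || t > max_e) { max_e = t; have_e = 1 }
    }
    END {
        # Total = max(end) - min(start); print nothing if either side is missing.
        if (have_s && have_e) printf "%.1f", max_e - min_s
    }')

echo "Total (us): $span"
```

Here Total coincides with the Sched span because the sched events bracket the orch events, which matches the sample output where Total and Sched are equal in round 0.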

Key changes

  • parse_timing takes two inputs (framework stdout + device log) and merges per-round timings into the 5-column table. awk handles file 1 via FNR == NR, file 2 with the existing sched/orch parser.
  • Framework header is matched anywhere on the line, not anchored at column 0 — the SceneTestCase runner prints the case name with end="" so _log_round_timings's header gets concatenated to the same line as Case1 ....
  • Avg / Trimmed Avg lines now cover all five metrics. The /benchmark skill is updated to parse Total Trimmed Avg: (was Trimmed Avg:), and its sample tables rename Elapsed (us) → Total (us).
  • Performance Summary table grows to 5 columns; the per-column avg extraction is set -e + pipefail-safe (uses parameter expansion instead of grep | awk pipelines that could silently die on no-match).
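
The two-file merge and the unanchored header match described in the list above might look roughly like the following. It is a sketch under stated assumptions: the header text `round host_us device_us`, the row layout in the framework output, and the pre-reduced `round <n> total_us <us>` device lines are all hypothetical stand-ins, not the formats benchmark_rounds.sh actually parses.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical framework stdout: the case name and the timing header share a line.
printf '%s\n' \
    'Case1 paged_attention ... round host_us device_us' \
    '0 460759.8 29346.9' \
    '1 451200.3 8610.2' \
    'PASSED' > "$tmp/framework.txt"

# Hypothetical, already-reduced device log: one line per round.
printf '%s\n' \
    'round 0 total_us 1178.9' \
    'round 1 total_us 1150.0' > "$tmp/device.txt"

merged=$(awk '
    # FNR == NR is true only while awk is still reading the first file.
    FNR == NR {
        # index() finds the header anywhere on the line, not just at column 0.
        if (index($0, "round host_us device_us")) { in_tbl = 1; next }
        if (in_tbl && NF == 3 && $1 ~ /^[0-9]+$/) {
            host[$1] = $2; dev[$1] = $3
            next
        }
        in_tbl = 0   # any other line ends the timing table
        next
    }
    # Second file: join each device-log round with the stored Host/Device values.
    $1 == "round" {
        printf "%-6d  %12.1f  %12.1f  %12.1f\n", $2, host[$2], dev[$2], $4
    }
' "$tmp/framework.txt" "$tmp/device.txt")

printf '%s\n' "$merged"
```

One caveat with the `FNR == NR` idiom: if the first file is empty, the condition is also true for every record of the second file, so a caller may want to check that the framework output is non-empty first.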

Sample output

  Round      Host (us)   Device (us)    Total (us)    Sched (us)     Orch (us)
  ------  ------------  ------------  ------------  ------------  ------------
  0           460759.8       29346.9        1178.9        1178.9         792.1
  ...
  Host Avg: 455769.2 us  |  Device Avg: 8619.4 us  |  Total Avg: 1158.0 us  |  Sched Avg: 1158.0 us  |  Orch Avg: 708.6 us  (25 rounds)
  Host Trimmed Avg: 370587.1 us  (dropped 10 low + 10 high, 5 rounds used)
  Device Trimmed Avg: 6820.6 us  ...
  Total Trimmed Avg: 1156.1 us  ...
  Sched Trimmed Avg: 1156.0 us  ...
  Orch Trimmed Avg: 704.6 us   ...
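
For readers unfamiliar with the Trimmed Avg lines, a trimmed average drops a fixed number of the lowest and highest samples and averages what remains (10 low + 10 high out of 25 rounds in the sample above). The helper below is a minimal sketch of that idea; the function name `trimmed_avg`, its arguments, and the sample values are hypothetical, and how the real script chooses the trim count is not stated in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# trimmed_avg K V1 V2 ...: drop the K lowest and K highest values, average the rest.
# Hypothetical helper for illustration only.
trimmed_avg() {
    local trim=$1
    shift
    printf '%s\n' "$@" | LC_ALL=C sort -n | awk -v k="$trim" '
        { v[NR] = $1 }
        END {
            n = NR - 2 * k
            if (n <= 0) exit 1          # nothing left after trimming
            for (i = k + 1; i <= NR - k; i++) sum += v[i]
            printf "%.1f", sum / n
        }'
}

# Five made-up per-round values with one obvious outlier.
result=$(trimmed_avg 1 1178.9 1150.0 1160.0 1155.0 2000.0)
echo "Trimmed Avg: $result us  (dropped 1 low + 1 high, 3 rounds used)"
```

With one value trimmed from each end, the 2000.0 outlier and the 1150.0 minimum are discarded and the remaining three values are averaged.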

Test plan

  • bash -n tools/benchmark_rounds.sh
  • ./tools/benchmark_rounds.sh -n 25 -d 1 on paged_attention_unroll → 5 columns populated, Performance Summary + Benchmark complete sections all render
  • /benchmark skill grep targets (Total Trimmed Avg:, Orch Trimmed Avg:) both present in output
  • Full suite (-n 100 over all examples) on a free device
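
The grep targets in the test plan can also be extracted without a `grep | awk` pipeline, which matters under `set -e` with `pipefail` because `grep` exits non-zero when nothing matches. The following is a sketch of the parameter-expansion approach the summary mentions; the helper name `extract_avg` is hypothetical and the script's real extraction code may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Lines copied from the sample output in this PR.
output='  Host Trimmed Avg: 370587.1 us  (dropped 10 low + 10 high, 5 rounds used)
  Total Trimmed Avg: 1156.1 us  ...
  Orch Trimmed Avg: 704.6 us   ...'

# extract_avg LABEL TEXT: print the number that follows "LABEL: ".
# Prints nothing and still returns 0 when the label is absent.
extract_avg() {
    local label=$1 text=$2 rest
    case $text in
        *"$label: "*) ;;        # label present, fall through to extraction
        *) return 0 ;;          # no match is not an error
    esac
    rest=${text#*"$label: "}    # drop everything through the first "LABEL: "
    printf '%s' "${rest%% *}"   # keep the text up to the first space
}

total=$(extract_avg "Total Trimmed Avg" "$output")
orch=$(extract_avg "Orch Trimmed Avg" "$output")
missing=$(extract_avg "Elapsed Trimmed Avg" "$output")

echo "total=$total orch=$orch missing='${missing}'"
```

Because the helper never runs an external command that can fail on no-match, a missing metric yields an empty string rather than terminating the script.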


@gemini-code-assist (bot) left a comment

Code Review

This pull request enhances the benchmarking tool by expanding the reported metrics from three to five: Host, Device, Total, Sched, and Orch. Key changes include updating tools/benchmark_rounds.sh to capture framework stdout for Host and Device timings, refactoring the parse_timing function to process multiple data sources, and updating the summary table to dynamically display available metrics. Documentation in SKILL.md has also been updated to reflect these changes. I have no feedback to provide as there were no review comments to evaluate.

Restore the Host (RunTiming.host_wall) and Device (RunTiming.device_wall)
per-round columns that hw-native-sys#828 dropped, alongside the device-log
Total/Sched/Orch. Rename Elapsed -> Total since "elapsed" is ambiguous
next to two wall clocks; "Total" describes what the column actually is:
the device-log span across all sched/orch events.

- parse_timing takes (framework_stdout, device_log) and merges per-round
  Host/Device from _log_round_timings with Total/Sched/Orch from
  sched_*/orch_* device-log lines into a 5-column table.
- Framework header matched anywhere on the line -- the test runner
  prints the case name with end="" so the header gets glued onto the
  same line as "Case1 ...".
- Avg / Trimmed Avg lines expanded to all five metrics; /benchmark
  skill report updated to show all five columns (Single Mode) and
  five metric sub-rows per example (Compare Mode).
@ChaoWao ChaoWao force-pushed the feat/benchmark-add-host-device-cols branch from 7c8c100 to 07e4702 Compare May 21, 2026 02:42
@ChaoWao ChaoWao merged commit 83da979 into hw-native-sys:main May 21, 2026
15 checks passed
@ChaoWao ChaoWao deleted the feat/benchmark-add-host-device-cols branch May 21, 2026 02:55