Update: benchmark_rounds.sh adds Host/Device, renames Elapsed to Total #832
Merged
ChaoWao merged 1 commit on May 21, 2026
Code Review
This pull request enhances the benchmarking tool by expanding the reported metrics from three to five: Host, Device, Total, Sched, and Orch. Key changes include updating tools/benchmark_rounds.sh to capture framework stdout for Host and Device timings, refactoring the parse_timing function to process multiple data sources, and updating the summary table to dynamically display available metrics. Documentation in SKILL.md has also been updated to reflect these changes. I have no feedback to provide as there were no review comments to evaluate.
Restore the Host (RunTiming.host_wall) and Device (RunTiming.device_wall) per-round columns that hw-native-sys#828 dropped, alongside the device-log Total/Sched/Orch. Rename Elapsed -> Total since "elapsed" is ambiguous next to two wall clocks; "Total" describes what the column actually is: the device-log span across all sched/orch events.

- parse_timing takes (framework_stdout, device_log) and merges per-round Host/Device from _log_round_timings with Total/Sched/Orch from sched_*/orch_* device-log lines into a 5-column table.
- Framework header matched anywhere on the line -- the test runner prints the case name with end="" so the header gets glued onto the same line as "Case1 ...".
- Avg / Trimmed Avg lines expanded to all five metrics; /benchmark skill report updated to show all five columns (Single Mode) and five metric sub-rows per example (Compare Mode).
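The header-matching point in the commit message can be illustrated with a minimal bash sketch. It is not the code from tools/benchmark_rounds.sh: the header text and the helper names has_header and after_header are hypothetical, chosen only to show how a header glued onto a "Case1 ..." line can still be detected and stripped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical header text: the real string printed by _log_round_timings
# is not shown in this PR.
HEADER='round host_us device_us'

# Succeeds if the header occurs anywhere on the line, not only at column 0.
has_header() {
    case "$1" in
        *"$HEADER"*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Drops everything up to and including the first occurrence of the header.
# If the header is absent the line comes back unchanged with status 0.
after_header() {
    printf '%s\n' "${1#*"$HEADER"}"
}

glued="Case1 ...$HEADER"   # case name printed with end="", header glued on
if has_header "$glued"; then
    echo "header found in: $glued"
fi
if ! has_header "Case1 ..."; then
    echo "no header in: Case1 ..."
fi
```

Pattern matching with case and parameter expansion stays inside the shell, so a line without the header never produces the non-zero exit status that a no-match grep would.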
7c8c100 to 07e4702
ChaoWao approved these changes May 21, 2026
Summary
Restore Host / Device per-round columns that #828 dropped (the RunTiming walls from #790) alongside the device-log Total / Sched / Orch, and rename Elapsed → Total since "elapsed" reads ambiguously next to two distinct wall clocks. The output is now a 5-column table:

- Host: RunTiming.host_wall_us (framework)
- Device: RunTiming.device_wall_us (framework)
- Total: orch_start → orch_end, sched_*/orch_* events
- Sched: sched_start → sched_end
- Orch: orch_start → orch_end

Key changes
- parse_timing takes two inputs (framework stdout + device log) and merges per-round timings into the 5-column table. awk handles file 1 via FNR == NR, file 2 with the existing sched/orch parser.
- The framework header is matched anywhere on the line: the test runner prints the case name with end="" so _log_round_timings's header gets concatenated to the same line as Case1 ....
- The /benchmark skill is updated to parse Total Trimmed Avg: (was Trimmed Avg:) and its sample tables rename Elapsed (us) → Total (us).
- set -e + pipefail-safe (uses parameter expansion instead of grep | awk pipelines that could silently die on no-match).

Sample output
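As a rough illustration of the two-file merge described under Key changes, the sketch below uses the awk FNR == NR idiom to read a framework-stdout file first and a device log second. It is not the PR's parse_timing and the table it prints is not the script's real sample output: the function name merge_timings, both input formats ("ROUND <n> <host_us> <device_us>" and "<event> <round> <timestamp_us>"), and all numbers are placeholder assumptions made for the example. Total is computed as the span from the earliest to the latest sched/orch timestamp in a round, following the commit message's description.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_timings FRAMEWORK_STDOUT DEVICE_LOG
# Both input formats are assumptions made for this sketch.
merge_timings() {
    awk '
        # File 1 (framework stdout): FNR == NR holds only for the first file.
        FNR == NR {
            if ($1 == "ROUND") {
                order[++n] = $2
                host[$2] = $3
                dev[$2] = $4
            }
            next
        }
        # File 2 (device log): "<event> <round> <timestamp_us>".
        $1 ~ /^(sched|orch)_(start|end)$/ {
            r = $2; t = $3 + 0
            ts[$1, r] = t
            if (!(r in lo) || t < lo[r]) lo[r] = t
            if (!(r in hi) || t > hi[r]) hi[r] = t
        }
        END {
            printf "%-6s %8s %8s %8s %8s %8s\n",
                "Round", "Host", "Device", "Total", "Sched", "Orch"
            for (i = 1; i <= n; i++) {
                r = order[i]
                printf "%-6s %8d %8d %8d %8d %8d\n", r, host[r], dev[r],
                    hi[r] - lo[r],
                    ts["sched_end", r] - ts["sched_start", r],
                    ts["orch_end", r] - ts["orch_start", r]
            }
        }
    ' "$1" "$2"
}

# Placeholder inputs, not real measurements.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'Case1 ...\nROUND 0 150 120\nROUND 1 140 118\n' > "$tmp/stdout.txt"
cat > "$tmp/device.log" <<'EOF'
orch_start 0 1000
sched_start 0 1010
sched_end 0 1090
orch_end 0 1100
orch_start 1 2000
sched_start 1 2005
sched_end 1 2080
orch_end 1 2095
EOF

merge_timings "$tmp/stdout.txt" "$tmp/device.log"
```

One caveat of the idiom: if the first file is empty, FNR == NR is also true while reading the second file, so a production parser may want to guard against an empty framework-stdout input.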
Test plan
- bash -n tools/benchmark_rounds.sh
- ./tools/benchmark_rounds.sh -n 25 -d 1 on paged_attention_unroll → 5 columns populated, Performance Summary + Benchmark complete sections all render
- /benchmark skill grep targets (Total Trimmed Avg:, Orch Trimmed Avg:) both present in output
- (-n 100 over all examples) on a free device
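The grep-target check in the test plan could be scripted along these lines. This is a sketch only: the helper name check_targets and the captured-output file are hypothetical, and the sample lines use placeholder values rather than real benchmark results. Only the two label strings come from the test plan.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Labels taken from the test plan: the /benchmark skill greps for these.
TARGETS=("Total Trimmed Avg:" "Orch Trimmed Avg:")

# check_targets FILE
# Reports every label missing from FILE and returns 1 if any is missing.
check_targets() {
    local file=$1 missing=0 label
    for label in "${TARGETS[@]}"; do
        # grep runs inside "if", so a no-match (exit 1) does not trip set -e.
        if ! grep -qF -- "$label" "$file"; then
            echo "missing: $label" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Placeholder output standing in for a captured benchmark run.
sample=$(mktemp)
printf 'Total Trimmed Avg: 1\nOrch Trimmed Avg: 2\n' > "$sample"
if check_targets "$sample"; then
    echo "all grep targets present"
fi
rm -f "$sample"
```

In practice the file would be the captured output of a run such as ./tools/benchmark_rounds.sh -n 25 -d 1.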