feat(dashboard): level-profile chart (6 lines) + HTML legend + sign-gated parity band by polaz · Pull Request #193 · structured-world/structured-zstd

polaz · 2026-05-19T12:11:03Z

Reopens the work from #192 (closed) after a redesign of the Level Profile chart per maintainer feedback.

Summary

New Level Profile chart — six lines plotted against compression level on the X-axis (provenance per point is governed by the Snapshot dropdown — see below):
- Rust vs FFI compress speed (MiB/s, left axis)
- Rust vs FFI decompress speed (MiB/s, left axis, dashed-dot)
- Rust vs FFI output ratio (right axis, dashed) — compressed/input, lower = better
Filters are shared with the existing delta chart at the top of the page — Target, Scenario, Level. Each accepts __all__ (default), which the profile chart interprets as "average across that dimension" (cumulative view across every cohort). A profile-specific Snapshot dropdown pins the six lines to a single generated_at / commit_sha run, or to latest (most-recent snapshot per cohort+level+metric, picked per-cohort so cumulative mode stays unbiased toward whichever target/scenario ran most recently).
Decompress rows are averaged across both bench source variants (rust_stream + c_stream) per side — both stream variants measure valid decompression numbers, the arithmetic mean is deterministic regardless of iteration order. Compress and ratio have one source per (level, snapshot), so the mean degenerates to the single observation.
Six independent toggle checkboxes flip the matching series in place via chart.update("none"). The chart instance is reused across every filter/snapshot/toggle change — no destroy/recreate.
Custom HTML legend for both charts:
- Each entry is a <button type="button"> with aria-pressed reflecting visibility, full-row click target (fixes the misaligned click box where wrapped labels drifted off the swatch), keyboard activation (Enter/Space), focus-visible outline, screen-reader role.
- Scrollable container so no series is silently hidden.
- Keyboard focus and scroll position are preserved across rerenders — toggling a series with Enter/Space keeps focus on the same row, and rerenders triggered from outside the legend (profile-series checkboxes, top-filter changes) no longer jump the legend back to the top.
Sign-gated "Outside parity band" classification — the regression count and per-row labels now respect metric direction:
- compression_ratio: only flag when delta > BAND_HIGH (Rust produced larger output).
- throughput_bytes_per_sec: only flag when delta < BAND_LOW (Rust slower than FFI).
- peak_alloc_bytes: only flag when delta > BAND_HIGH (Rust allocated more).
- Movements in our favour are surfaced as a separate "notable wins not counted" tally and no longer inflate the regression count.

Closes #183
Supersedes #192

Test plan

Dashboard loads against the live benchmark-relative.json; existing delta-time chart still renders and respects every filter.
Level Profile shows up to six lines for the selected (Target, Scenario, Level, Snapshot); levels sorted numerically on the X-axis.
Top filters drive both charts simultaneously. __all__ on Target / Scenario / Level produces a cumulative arithmetic mean across that dimension; pinning a specific value narrows to that cohort.
Snapshot dropdown pins all six lines to the chosen generated_at / commit_sha; latest uses each cohort's own most-recent snapshot (no global-latest bias when Target or Scenario is __all__).
All six toggle checkboxes flip the matching series in place; legend click also flips them and keeps the checkbox state in sync.
Legend: click anywhere on the row toggles the series; keyboard Enter/Space toggles AND keeps focus on the same row; long legends scroll, and the scroll position is preserved across toggles triggered from anywhere on the page.
Status panel: "Outside parity band: N" doesn't count wins (Rust beat FFI on speed / ratio); wins show under "notable wins not counted".

Summary by CodeRabbit

New Features
- Added a profile chart section displaying compression, decompression, and ratio metrics by level with snapshot comparison capability
- Introduced interactive checkboxes to toggle individual metric series visibility on the chart
Improvements
- Enhanced chart legend with keyboard navigation and accessibility features for better usability

…band - Add Level Profile chart: per-level speed (MiB/s, left axis) + ratio (right axis) with filled+smoothed lines, 4 toggle checkboxes for the Rust/FFI × speed/ratio cross-product, and target/scenario/stage filters. Toggles use dataset.hidden + chart.update("none") so the chart is not recreated on every interaction. - Replace Chart.js built-in legend on both charts with a custom HTML legend: full-row click target (fixes the misaligned click box where only the colour swatch responded while wrapped labels drifted), and a scrollable container so no series is silently hidden when there are too many to fit. Click toggles dataset visibility via the same Chart.js meta.hidden mechanism, preserving the prior UX. - Sign-gate "Outside parity band" classification by metric direction: ratio regressions only when delta > BAND_HIGH (Rust produced larger output), throughput regressions only when delta < BAND_LOW (Rust slower), peak-alloc regressions only when delta > BAND_HIGH (Rust allocated more). Wins (deviation in our favour) are surfaced as a separate "notable wins not counted" tally and no longer inflate the regression count. Closes #183

…n state - renderProfileChart now patches labels + datasets in place and calls chart.update("none") on every selector change. The previous path destroyed and recreated the chart on each switch, contradicting the "repopulate without recreating the chart" acceptance criterion and dropping internal Chart.js state (animation continuity, scale state) between scenario/stage/target switches. - Custom HTML legend now uses chart.isDatasetVisible(i) to decide whether to apply the hidden-series style. The previous check on meta.hidden missed datasets constructed with dataset.hidden = true (where Chart.js leaves meta.hidden === null), so the legend row rendered as visible even when the series was hidden. - Profile chart legend clicks now flow through profileSeriesVisibility and sync the matching checkbox, so toggling a series via the legend no longer produces a checkbox/series UI mismatch and survives the next scenario switch. Delta-chart legend keeps the simpler direct toggle path. - Removed bandLabelForRow — was unused.

Adds a fourth dropdown ("Snapshot") that pins the 4 profile lines to a single benchmark run rather than always picking the latest record per (level, metric). Default value is "latest" (previous behaviour). The dropdown lists every generated_at value (or commit_sha when timestamp is missing) that produced data for the currently chosen target / scenario / stage, and repopulates when those upstream filters change while preserving the active selection where possible. Use case: compare two specific commits side-by-side by opening the dashboard in two browser tabs and pinning each tab to a different snapshot — the 4 smoothed lines then describe exactly one commit per view, with no implicit "newest wins" merging across runs.

- Add a Source dropdown to the Level Profile chart, mirroring the delta chart's Source filter. Without it, stages that have multiple source values (e.g. decompress benches emitting rust_stream and c_stream) would silently collide on the same level keys and render a mixed/incorrect profile. The source value is now part of the filter cascade (target/scenario/stage → source → snapshot), and the row-selection key includes source so decompress profiles are unambiguous. - Render custom HTML legend rows as <button type="button"> instead of <div>, with aria-pressed reflecting series visibility. Native buttons give keyboard activation (Enter/Space), focus styling, and a button role for screen readers without any extra JS, so the legend is operable without a mouse.

Replace the 4-line per-stage profile (which needed stage + source dropdowns) with a 6-line strategy-comparison view rendered for a single snapshot: - Rust compress speed (MiB/s, left axis) - FFI compress speed (MiB/s, left axis) - Rust decompress speed (MiB/s, left axis, dashed) - FFI decompress speed (MiB/s, left axis, dashed) - Rust output ratio (right axis, dashed) — compressed/input, lower=better - FFI output ratio (right axis, dashed) Decompress rows collapse across the source dimension (rust_stream / c_stream) because each row already carries both rust_value and ffi_value — source is bench bookkeeping, not a comparison side. Stage no longer needs a dropdown because both compress and decompress live on the chart at once. Right axis title is now explicit about ratio semantics (compressed/input, lower=better) so readers don't assume the conventional input/output definition where higher is better. Filters: Target + Scenario + Snapshot. Six independent toggle checkboxes for the series. Legend rows remain accessible buttons with aria-pressed.

coderabbitai · 2026-05-19T12:11:25Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 489c9299-3fa9-4923-a770-316ae3e8f4f1

📥 Commits

Reviewing files that changed from the base of the PR and between b49a970 and c9787a1.

📒 Files selected for processing (1)

.github/bench-dashboard/index.html

📝 Walkthrough

Walkthrough

Adds a Level Profile chart (per-level compress/decompress speed and ratio), a screen-reader-friendly HTML legend replacing Chart.js's legend, persistent profile-series visibility and snapshot semantics, and metric-directional regression detection integrated with existing filters and initialization.

Changes

Benchmark Dashboard UI & Logic Enhancement

Layer / File(s)	Summary
UI styling and HTML structure `.github/bench-dashboard/index.html:81–154, 256–299`	Added CSS for resized chart canvases and a scrollable, accessible HTML legend; inserted Level Profile card with snapshot selector, six series checkboxes, `chart-profile` canvas, and `legend-profile`/`profile-status` containers.
State, selectors & PROFILE_LATEST `.github/bench-dashboard/index.html:342–384`	Extended dashboard `state` with `profileChart` and `profileSeriesVisibility`; wired `profileSelectors.snapshot` and introduced `PROFILE_LATEST` sentinel for per-cohort "latest" semantics.
Snapshot comparator `.github/bench-dashboard/index.html:627–645`	Added `compareSnapshotIdentity(a,b)` to deterministically prefer newer snapshots by timestamp with a commit-sha tie-break when timestamps are unavailable or unparseable.
Metric-directional regression logic `.github/bench-dashboard/index.html:659–743`	Added `isRegression(row)` to sign-gate out-of-parity classification by metric and refactored `updateStatus(filtered)` to use directional logic for regressions vs wins, updating verdict text and focus lists accordingly.
Custom HTML Legend System `.github/bench-dashboard/index.html:754–849`	Implemented `renderHtmlLegend(chart, container, onToggle)` to render full-width accessible legend buttons reflecting `chart.isDatasetVisible`, preserving focus and scroll between rebuilds; delta chart now disables built-in legend and mounts this HTML legend.
Profile Data Pipeline & Rendering `.github/bench-dashboard/index.html:894–1265`	Added profile pipeline (`profileFilter`, stage classifiers, `buildProfileData`) with latest-vs-pinned snapshot semantics, MiB/s unit conversion, `buildProfileDatasets`, `onProfileLegendToggle`, and `renderProfileChart` that updates the profile chart instance in-place and handles empty-state.
Profile toggles & snapshot controls `.github/bench-dashboard/index.html:1266–1350`	Added `applyProfileToggles()` and `bindProfileControls()` to sync checkbox UI with dataset visibility and persist `profileSeriesVisibility`; snapshot utilities (`snapshotLabel`, `repopulateSnapshotOptions`, `renderProfileSelectors`) repopulate options while preserving selection.
Filter binding & initialization `.github/bench-dashboard/index.html:1377–1391, 1485–1487`	Top-level filter changes now trigger delta rerender; target/scenario/level changes also repopulate profile snapshots and re-render the profile chart. `main()` initializes profile selectors, binds profile controls, and renders the initial profile chart.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

structured-world/structured-zstd#79: Overlapping changes to updateStatus(filtered) and delta/status rendering in the dashboard.
structured-world/structured-zstd#65: Related delta/status semantics and parity-band interpretation used by the profile/delta logic.

"I hopped through levels, checked each line,
With toggles bright and legends fine,
I guard the wins, I flag the slips,
A rabbit charts performance trips. 🐇📈"

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: Level Profile chart with 6 lines, custom HTML legend, and sign-gated parity band classification.
Linked Issues check	✅ Passed	All requirements from issue `#183` are met: Level Profile chart with 6 series, dual Y-axes, top-filter reuse, per-cohort Snapshot selection, custom HTML legend with accessibility features, sign-gated parity band logic, and decompress averaging.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the linked issue `#183`; modifications are confined to dashboard state, chart rendering, aggregation logic, and legend behavior.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/#183-featdashboard-level-profile-chart-speed--ratio-and

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

codecov · 2026-05-19T12:13:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Adds a redesigned bench dashboard view that complements the existing delta-over-time chart with a per-compression-level “Level Profile” visualization, plus a custom HTML legend and corrected (sign-aware) “outside parity band” regression classification.

Changes:

Replace Chart.js built-in legends with a custom HTML legend (button rows with aria-pressed) and add scrollable legend containers for both charts.
Add a new “Level Profile” chart plotting Rust vs FFI compress/decompress throughput (MiB/s) and compression output ratio (compressed/input) across levels, with target/scenario/snapshot filters and per-series checkboxes.
Update status/regression classification to be metric-direction aware (wins no longer count as “outside parity band”).

buildProfileData() previously stored only one row per ${level}|decompress_speed and let the most-recent-by-generated_at row win. In practice both decompress source variants (rust_stream and c_stream) share the same generated_at within a single snapshot, so the "winner" was iteration-order-dependent — typically c_stream — and the chart silently reflected only one stream variant instead of both. Replace the "keep one row" path with a sum/count accumulator and emit the arithmetic mean of rust_value and ffi_value across all matching rows. For compress and ratio there is exactly one source per (level, snapshot), so the mean degenerates to the single observation and the displayed value is unchanged. For decompress, both stream variants now contribute equally, giving a deterministic Rust/FFI comparison that doesn't depend on which source row arrived first.

polaz · 2026-05-19T12:28:21Z

@coderabbitai full review

coderabbitai · 2026-05-19T12:28:28Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

Drop the profile chart's own Target / Scenario selectors and read those values (plus Level) from the existing page-wide filter row that drives the delta chart. The top selectors already support an `__all__` sentinel which the profile chart interprets as "average across that dimension": - Target = all → mean across every reported target - Scenario = all → mean across every corpus - Level = all → every level on the X-axis (no narrowing) When one of those is pinned to a specific value, the profile narrows to that value. Level = specific renders a single X-axis point. Snapshot stays on the profile section because the top chart's From/To pair defines a time window, not a single-snapshot pin — the two controls express different intents. The top-filter change handler now fans out to both `rerender()` (delta chart) and the profile chart, repopulating the profile's snapshot list when target/scenario/level change. Profile-status text reflects all three dimensions so it's clear when a view is cumulative vs narrowed.

buildProfileData() previously summed every matching row regardless of generated_at when the snapshot dropdown was on "latest", turning the "most recent record per (level, logical-metric)" intent into a time-average across the entire history of that target+scenario+level. Replace the single-pass sum with a two-pass aggregation: Pass 1 — collect every matching row and, when snapshot=latest, remember the most recent generated_at per (level, lm). Pass 2 — fold each row into the running mean only if its generated_at matches the chosen snapshot for that key. For "latest" this restricts the mean to one snapshot per key, then averages across source variants (rust_stream + c_stream for decompress) and across any "all" dimension (scenarios when scenario=all, etc.) within that snapshot. For a pinned snapshot, matchesFilter() already narrowed us down, so this just collapses source. Also pass "none" to chart.update() in the HTML legend click handler so toggling a series is instant — the default 1s animation was visibly janky with six datasets on the profile chart.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/bench-dashboard/index.html:
- Around line 749-751: The legend rerender replaces buttons
(container.replaceChildren()) causing keyboard focus loss; update
renderHtmlLegend to save the currently focused legend button (e.g., detect if
document.activeElement is within container and capture a stable identifier like
a data-index, aria-label, or dataset key) before calling
container.replaceChildren(), then after creating new buttons restore focus to
the matching new button (querySelector by that identifier) and ensure it remains
focusable. Apply the same focus-preservation pattern to the other
legend-rerender blocks at the other chart legend functions (the blocks around
the other replaceChildren calls) and ensure aria-pressed/aria-checked and
tabindex are preserved on the restored button so keyboard users can continue
toggling with Enter/Space.
- Around line 931-955: The bug is that latestPerKey is keyed only by
`${row.level}|${lm}` so when filter.target or filter.scenario is `__all__` you
must make the cohort-local key include those dimensions; update the code that
builds key (used in latestPerKey, matchingRows, and the winner lookup) to append
cohort identifiers such as row.target and row.scenario (or other cohort fields
that `matchesFilter` uses) when the filter for that dimension is `__all__`,
keeping the existing behavior when a specific target/scenario is selected;
adjust any logic that compares snapshots (sameSnapshotLabel,
compareByGeneratedAt) to still operate on the chosen winner row.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b5d6ac02-0911-495d-bbba-6fe17081a5b6

📥 Commits

Reviewing files that changed from the base of the PR and between 94f3187 and 3151b55.

📒 Files selected for processing (1)

.github/bench-dashboard/index.html

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Two correctness/UX fixes in the bench dashboard: * renderHtmlLegend() now captures document.activeElement's data- legendIndex before replaceChildren() and restores focus to the matching new button after rebuild. Previously every keyboard toggle (Enter/Space) dropped focus out of the legend, forcing the user to Tab back in between every series flip. * buildProfileData() now keys latestPerKey by (level, logical-metric) PLUS any cumulative ("__all__") dimension (target, scenario, level), giving each cohort its own most-recent snapshot before the cross-cohort mean. The previous key collapsed every cohort onto a single global snapshot per (level, lm), silently dropping any cohort whose timestamp didn't match — biasing the result toward whichever target/scenario ran most recently instead of producing the intended unbiased mean across cohorts.

polaz · 2026-05-19T13:20:04Z

@coderabbitai full review

coderabbitai · 2026-05-19T13:20:10Z

✅ Actions performed

Full review triggered.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

renderHtmlLegend() now snapshots container.scrollTop before replaceChildren() and restores it after focus restoration. When a toggle was triggered from outside the legend (a profile-series checkbox above the chart, or a top-filter change) the existing focus-restore path never fired and a long legend silently jumped back to the top on every re-render. Restoring scrollTop keeps the visible window stable regardless of which control fired the rebuild. Done AFTER focus() because focus() on a partially-visible element can itself scroll the container.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

PROFILE_LATEST mode chose the per-key "most recent" row via compareByGeneratedAt() which only inspects generated_at. When records lack a parseable timestamp but carry commit_sha (loaded payloads from older CI runs, local-run artifacts, etc.) every pairwise comparison returned 0 and the "latest" winner was whichever row appeared first in iteration order — arbitrary and not stable across renders. Subsequent filtering by sameSnapshotLabel() (which uses generated_at || commit_sha || "local-run" as identity) could then silently restrict to an inconsistent cohort. Add compareSnapshotIdentity() — same primary ordering as compareByGeneratedAt() (timestamp), with a lexicographic commit_sha fallback when both rows have no parseable timestamp. Not chronologically meaningful in the fallback case, but deterministic across runs and consistent with snapshotLabel()'s identity, which is what the cohort-locking logic actually requires. No regression test added: the dashboard is a single-file HTML view with no JS test harness, so this is verified manually by loading a payload with timestamp-less records.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

.github/bench-dashboard/index.html:1209

PROFILE_LATEST status text currently reports “latest per level”, but the selection logic is actually “latest per cohort and per (level, logical metric)” (and then averaged across sources/cohorts). Consider updating the status string to reflect that it’s not necessarily a single pinned snapshot, to avoid confusing readers when points come from different runs.

        const snapshotNote = filter.snapshot === PROFILE_LATEST
          ? "latest per level"
          : `snapshot=${filter.snapshot}`;
        el("profile-status").textContent =
          `Showing ${levelCount} level point(s) — target=${filter.target}, scenario=${filter.scenario}, level=${filter.level} (${snapshotNote}).`;

The Level Profile section header said "Six lines for a single snapshot" but the default PROFILE_LATEST mode picks each (level, logical-metric, cohort) point from its own most-recent snapshot independently, so the displayed points can mix generated_at values across the X-axis. Rewrite the section description to spell out both modes explicitly, and amend the profile-status note so the user always sees which mode is active and what it implies for cross-point provenance.

polaz added 5 commits May 19, 2026 14:29

Copilot AI review requested due to automatic review settings May 19, 2026 12:11

Copilot started reviewing on behalf of polaz May 19, 2026 12:11 View session