feat(web): benchmark heatmap small-multiples — visual ranking per cell#81
Merged
github-actions[bot] merged 2 commits into develop on May 8, 2026
…t × aspect) cell

Replaces the long flat matrix as the default view of /benchmark with a 3-aspects × N-categories grid of compact heatmap cards. Each cell ranks embeddings by Fmax with horizontal bars colored on a perceptual scale, the leader marked with a medal, and a slot reserved for bootstrap CI whiskers (rendered when persisted). The original full matrix table stays one click away behind a Heatmap | Table toggle so the export-friendly raw-numbers view isn't lost.

New components/BenchmarkHeatmap.tsx
- bestRowsByEmbedding: collapses the matrix endpoint's per-K rows to one bar per embedding (the cell's best across stages/Ks already in the active selection).
- HSL gradient blue→violet by Fmax, bar width proportional. Color is supportive; the bar length is the primary signal for accessibility.
- Aspect-tinted card header (MFO blue / BPO violet / CCO emerald) so the per-aspect column reads at a glance.
- Hover tooltip exposes stage / K / Fmax. Future CI whiskers will render in the same row without changing the cell layout.

apps/web/app/[locale]/benchmark/page.tsx
- New viewMode state (default "heatmap").
- Toggle bar (role=tablist, aria-selected) rendered when there's data.
- Existing leaderboards (global + in-selection) stay above the toggle unchanged; they're already the per-cell story.

Behavior unchanged:
- Filters (stage, K, evaluation_set), CSV export, leaderboards, full matrix table: all preserved. Toggle to "Table" for the prior view.

CI: next build green; backend untouched.
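The collapse-and-color steps described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `BenchmarkHeatmap.tsx`: the `MatrixRow` field names, the hue endpoints (220 to 270), and the saturation/lightness values are assumptions chosen for the example; only the function name `bestRowsByEmbedding` comes from the commit message.

```typescript
// Hypothetical row shape; the real matrix endpoint may use different field names.
interface MatrixRow {
  embedding: string;
  stage: string;
  k: number;
  fmax: number;
}

// Keep one row per embedding: the highest-Fmax row among those passed in,
// then sort descending so the cell's leader comes first.
function bestRowsByEmbedding(rows: MatrixRow[]): MatrixRow[] {
  const best = new Map<string, MatrixRow>();
  for (const row of rows) {
    const current = best.get(row.embedding);
    if (current === undefined || row.fmax > current.fmax) {
      best.set(row.embedding, row);
    }
  }
  return Array.from(best.values()).sort((a, b) => b.fmax - a.fmax);
}

// Map Fmax in [0, 1] to a bar width (percent) and an HSL color.
// Hue range 220 (blue) to 270 (violet) is an assumed approximation.
function barStyle(fmax: number): { widthPercent: number; color: string } {
  const t = Math.min(1, Math.max(0, fmax)); // clamp out-of-range scores
  const hue = Math.round(220 + t * 50);
  return { widthPercent: Math.round(t * 100), color: `hsl(${hue}, 70%, 50%)` };
}
```

Bar width carries the ranking on its own; the color is derived from the same value, so nothing is lost if a reader cannot distinguish the hues.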
This was referenced May 8, 2026
github-actions[bot] pushed a commit that referenced this pull request on May 8, 2026
…dicators (#82)

## Why

Three small frictions, fixed in one PR.

1. **Filters forget themselves on refresh.** Pick a stage chip on `/benchmark`, refresh, and the chip resets. Worse: you can't share a deep-link to "PROTEA at stage=reranker, K=5".
2. **Long tables lose their header on scroll.** With the sticky `h-16` chrome header above, `<thead>` should stay visible too; currently it scrolls away with the body, and you forget which column is which.
3. **No scroll affordance on overflowing tables.** The user has to discover horizontal scroll by trial.

## What

### URL-sync filters (`apps/web/lib/useUrlParam.ts`)

- New `useUrlParam(key, default)` and `useUrlNumber(key, default)` hooks: two-way bind a query-string key to React state. `router.replace` with `scroll: false` so chip clicks don't pollute history or scroll the page. Defaults are dropped from the URL (clean copy/paste).
- Wired in:
  - `/benchmark` → `stage`, `k`, `eval_set`
  - `/jobs` → `status`
  - `/proteins` → `tab`
- Refresh / share-link now lands on the same view.

### Sticky data-table headers (`.protea-thead-sticky`)

- `position: sticky; top: 4rem` (clears the `h-16` chrome header) + white background + 1px shadow.
- Applied to:
  - `/benchmark` matrix table `<thead>`
  - `/proteins` browse grid header
  - `/embeddings` configs grid header

### Scroll-shadow indicator (`.protea-scroll-shadow`)

- Roman Komarov's local/scroll `background-attachment` trick: faint shadows at the left/right edges of horizontally scrollable containers, masked by white covers anchored to the content. Communicates "there's more this way" without an extra JS observer.
- Applied to the three wrappers above.

## Test plan

- [x] `next build` green; 18 routes; no TS errors
- [x] Backend untouched
- [ ] Visual: open `/benchmark`, pick `stage=reranker` and `k=5`, refresh: chips remain selected, URL contains `?stage=reranker&k=5`
- [ ] Switch to default stage/K: those keys disappear from the URL
- [ ] Scroll long table on `/benchmark` (viewMode=table when PR #81 merges; or current matrix view): `<thead>` stays pinned below the chrome header
- [ ] Narrow viewport: fading shadow appears on the right edge of the wrapper, fades as you scroll right
- [ ] `/proteins` tab and `/jobs` status persist on refresh
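The "defaults are dropped from the URL" behaviour can be illustrated with a small pure-function sketch, kept separate from React and the Next.js router so it runs anywhere. The helper names (`setParam`, `toQueryString`, `parseNumberParam`) and the default values used in the usage note are hypothetical; the actual `useUrlParam` / `useUrlNumber` hooks are not shown in this PR text and may be structured differently.

```typescript
type QueryParams = Record<string, string>;

// Return a copy of params with key set to value, or with key removed
// when value equals the default (so default state yields a clean URL).
function setParam(
  params: QueryParams,
  key: string,
  value: string,
  defaultValue: string,
): QueryParams {
  const next: QueryParams = { ...params };
  if (value === defaultValue) {
    delete next[key];
  } else {
    next[key] = value;
  }
  return next;
}

// Serialize in insertion order; an empty object yields an empty string.
function toQueryString(params: QueryParams): string {
  const pairs = Object.keys(params).map(
    (k) => `${encodeURIComponent(k)}=${encodeURIComponent(String(params[k]))}`,
  );
  return pairs.length > 0 ? `?${pairs.join("&")}` : "";
}

// Numeric variant: fall back to the default on a missing or malformed value.
function parseNumberParam(raw: string | undefined, defaultValue: number): number {
  if (raw === undefined || raw.trim() === "") return defaultValue;
  const n = Number(raw);
  return Number.isFinite(n) ? n : defaultValue;
}
```

Assuming a default stage of `"all"`, calling `setParam({}, "stage", "reranker", "all")` and then setting `k` to `"5"` serializes to `?stage=reranker&k=5`; setting `stage` back to `"all"` removes that key again. In a hook, the resulting string would be handed to `router.replace` with `scroll: false`, as the commit message describes.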
6 tasks
github-actions[bot] pushed a commit that referenced this pull request on May 8, 2026
## Why

UX audit flagged: **only 3 of 14 pages used aria/role/label htmlFor**.

Drive-by: when PRs #81 (heatmap) and #82 (URL-sync) auto-rebased onto each other, the `viewMode` state on `/benchmark` was dropped; develop currently fails to build that page. Folded the trivial fix into this PR so it can land.

## What

### A11y

- **Skip-to-content** link in the locale layout: hidden until focused, lands focus on `<main id="main" tabIndex={-1}>`. Reachable on the first Tab press.
- **NavLinks** dropdowns now expose `aria-haspopup="menu"`, `aria-expanded`, `aria-current="page"` on active groups. Inner items carry `role="menuitem"`. Mobile menu locks body scroll while open and restores prior overflow on close (no more dual-scroll trap).
- **StatusBadge** gains a redundant leading glyph per state (clock / pulse / check / ✕ / slash-circle). Color-blind users still parse the badge at a glance. Wrapper carries `role="status"` and a verbose `aria-label`.
- **Benchmark filter chips** (Pipeline stage, Neighbours K) gain `aria-pressed` plus `role="group"` with descriptive `aria-label`.
- **Heatmap | Table toggle** already ships `role="tablist"` / `aria-selected` from #81.

### Drive-by build fix

- Restore `viewMode` state and the `BenchmarkHeatmap` import on `/benchmark`. PR #81's toggle UI references `viewMode` / `setViewMode`, but the auto-rebase ladder against #82 dropped the hook line on develop, leaving the page broken at build time. Adds the lines back; default `"heatmap"`.

## Test plan

- [x] `next build` green; 18 routes; no TS errors (was failing on develop before this PR)
- [x] Backend untouched
- [ ] Press Tab on first load: "Skip to main content" link appears top-left, focusing it jumps to `<main>`
- [ ] Use a screen reader on `/jobs`: StatusBadge announces "Status: succeeded" etc.
- [ ] On a small viewport: open the mobile menu, body underneath stops scrolling; close and underneath scrolls again
- [ ] Tab into Pipeline-stage chips: focus ring appears, `aria-pressed` flips on selection
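As a rough illustration of the StatusBadge and filter-chip attributes listed above, the accessibility props can be derived by small pure functions. The state names other than `succeeded`, the pairing of each glyph with a state, and the helper names `statusBadgeProps` / `chipProps` are assumptions made for this sketch; the commit message only lists the glyphs and the "Status: succeeded" announcement.

```typescript
// Hypothetical set of job states; only "succeeded" appears in the PR text.
type JobStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

// Assumed pairing of the listed glyphs with states.
const STATUS_GLYPH: Record<JobStatus, string> = {
  queued: "clock",
  running: "pulse",
  succeeded: "check",
  failed: "✕",
  cancelled: "slash-circle",
};

// Props for the badge wrapper: a live-region role, a verbose label,
// and a glyph so the state does not depend on color alone.
function statusBadgeProps(status: JobStatus): {
  role: "status";
  "aria-label": string;
  glyph: string;
} {
  return {
    role: "status",
    "aria-label": `Status: ${status}`,
    glyph: STATUS_GLYPH[status],
  };
}

// Props for a toggle-style filter chip: aria-pressed mirrors selection.
function chipProps(selected: boolean): { type: "button"; "aria-pressed": boolean } {
  return { type: "button", "aria-pressed": selected };
}
```

In JSX these objects would typically be spread onto the wrapper element, for example `<span {...statusBadgeProps("succeeded")}>`, with the glyph rendered as a decorative icon inside.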
## Why

`/benchmark` ships a flat row-per-cell table. With 8 PLMs × ~9 (cat × aspect) cells × 3 stages × 3 K values, the narrative that thesis defenders care about (which model wins where, with which scoring + K) gets buried in scrolling. A small-multiples view shows it in one screen.

## What

### New `<BenchmarkHeatmap />` component

- … stage / K / Fmax. Reserved slot for bootstrap CI whiskers: when the persisted CIs land (per the `thesis_bootstrap_cis` roadmap entry), they render in the same row without layout churn.

### Toggle wired into `/benchmark`

- `viewMode` state, default `"heatmap"`.
- `role="tablist"` toggle (`aria-selected` per button) sits between the leaderboards and the matrix.

### Existing surface preserved

## Test plan

- `next build` green; 18 routes; no TS errors
- … `apps/web/`
- `/benchmark`: heatmap grid renders by default with NK row, then LK, then PK; aspects flow MFO → BPO → CCO across each row.
- … `display_name · stage · K=N · 0.xxx`.

## Notes

- … `protea-benchmark-heatmap`, fully isolated. Sister PR (feat(web): visual coherence - slate palette, density bumps, Skeleton adoption #79, coherence sweep) is in flight from `protea-coherencia-visual`; it touches different files, no conflict expected.