
feat(web): benchmark heatmap small-multiples — visual ranking per cell#81

Merged
github-actions[bot] merged 2 commits into develop from feat/web-benchmark-heatmap on May 8, 2026

@frapercan
Owner

Why

/benchmark ships a flat row-per-cell table. With 8 PLMs × ~9 (cat × aspect) cells × 3 stages × 3 K values, that is on the order of 650 rows, and the story that thesis defenders care about (which model wins where, and with which scoring stage and K) gets buried in scrolling. A small-multiples view shows it on one screen.

What

New <BenchmarkHeatmap /> component

  • 3-aspects × N-categories grid of compact cards.
  • Each card lists embeddings sorted by Fmax descending, one horizontal bar each.
  • Bar width is proportional to Fmax (primary signal for accessibility); color sits on a perceptual blue→violet HSL gradient (secondary signal).
  • Leader marked with a medal 🥇.
  • Aspect-tinted card header (MFO blue / BPO violet / CCO emerald) so the per-aspect column reads at a glance.
  • Hover tooltip exposes stage / K / Fmax. Reserved slot for bootstrap CI whiskers — when the persisted CIs land (per the thesis_bootstrap_cis roadmap entry), they render in the same row without layout churn.
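
As a rough illustration of the bar encoding described above, the sketch below maps an Fmax score to a bar width and an HSL color. The helper name `barStyle`, the hue endpoints (220 for blue, 270 for violet), and the saturation and lightness values are assumptions made for this example; the PR does not state the actual constants.

```typescript
// Hypothetical helper: maps an Fmax score in [0, 1] to a bar width and color.
// Hue endpoints and the 70% / 50% saturation and lightness are illustrative
// assumptions, not values taken from the PR.
interface BarStyle {
  widthPercent: number; // primary signal: bar length
  color: string;        // secondary signal: hue on a blue-to-violet ramp
}

const HUE_BLUE = 220;
const HUE_VIOLET = 270;

function barStyle(fmax: number): BarStyle {
  // Clamp so out-of-range or non-finite inputs cannot break the layout.
  const t = Number.isFinite(fmax) ? Math.min(1, Math.max(0, fmax)) : 0;
  const hue = Math.round(HUE_BLUE + (HUE_VIOLET - HUE_BLUE) * t);
  return {
    widthPercent: Math.round(t * 1000) / 10, // rounded to one decimal place
    color: `hsl(${hue}, 70%, 50%)`,
  };
}
```

Because width alone carries the ranking, the view should stay readable even if the color ramp is changed or ignored.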

Toggle wired into /benchmark

  • New viewMode state, default "heatmap".
  • role="tablist" toggle (aria-selected per button) sits between the leaderboards and the matrix.
  • Original full matrix table preserved verbatim under the Table toggle — export-friendly raw numbers stay one click away.
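
The toggle's ARIA wiring can be sketched as a small pure function that returns the attributes for each button. The `ViewMode` type mirrors the two modes named above; the helper name `tabProps` and the roving `tabIndex` are assumptions borrowed from the common WAI-ARIA tabs pattern and are not confirmed by the PR.

```typescript
// Sketch only: attribute set for one button inside a role="tablist" container.
type ViewMode = "heatmap" | "table";

interface TabProps {
  role: "tab";
  "aria-selected": boolean;
  tabIndex: number; // assumption: roving tabindex, not stated in the PR
}

function tabProps(mode: ViewMode, active: ViewMode): TabProps {
  const selected = mode === active;
  return {
    role: "tab",
    "aria-selected": selected,
    // Only the selected tab is in the Tab order; the other is reachable
    // programmatically (for example via arrow-key handlers).
    tabIndex: selected ? 0 : -1,
  };
}
```

In a component, the result would be spread onto each button, for example `<button {...tabProps("table", viewMode)}>`.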

Existing surface preserved

  • Filters (stage / K / evaluation_set) drive both views.
  • Both leaderboards (Global champions + In-selection) stay above the toggle.
  • CSV export untouched.

Test plan

  • next build green; 18 routes; no TS errors
  • Backend untouched — no changes outside apps/web/
  • Visual: open /benchmark — heatmap grid renders by default with NK row, then LK, then PK; aspects flow MFO → BPO → CCO across each row.
  • Click a stage chip — both leaderboards + heatmap recompute; bars resort.
  • Click Table in the toggle — original matrix table appears, toggle reflects active state.
  • Hover any bar — tooltip shows display_name · stage · K=N · 0.xxx.
  • Empty cell (no data for cat × aspect under the active filter) — card shows "No data" placeholder.

Notes

…t × aspect) cell

Replaces the long flat matrix as the default view of /benchmark with a
3-aspects × N-categories grid of compact heatmap cards. Each cell ranks
embeddings by Fmax with horizontal bars colored on a perceptual scale,
the leader marked with a medal, and a slot reserved for bootstrap CI
whiskers (rendered when persisted).

The original full matrix table stays one click away behind a Heatmap |
Table toggle so the export-friendly raw-numbers view isn't lost.

New components/BenchmarkHeatmap.tsx
- bestRowsByEmbedding: collapses the matrix endpoint's per-K rows to
  one bar per embedding (the cell's best across stages/Ks already in
  the active selection).
- HSL gradient blue→violet by Fmax, bar width proportional. Color is
  supportive; the bar length is the primary signal for accessibility.
- Aspect-tinted card header (MFO blue / BPO violet / CCO emerald) so
  the per-aspect column reads at a glance.
- Hover tooltip exposes stage / K / Fmax. Future CI whiskers will
  render in the same row without changing the cell layout.
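
A minimal sketch of the collapse step that bestRowsByEmbedding is described as performing: keep the highest-Fmax row per embedding, then sort descending. The `MatrixRow` shape and its field names are assumptions for illustration; the real rows returned by the matrix endpoint may differ.

```typescript
// Assumed row shape; field names are hypothetical.
interface MatrixRow {
  embedding: string; // embedding display name
  stage: string;
  k: number;
  fmax: number;
}

function bestRowsByEmbedding(rows: readonly MatrixRow[]): MatrixRow[] {
  const best = new Map<string, MatrixRow>();
  for (const row of rows) {
    const current = best.get(row.embedding);
    // Keep the first row seen on ties so the result is deterministic.
    if (current === undefined || row.fmax > current.fmax) {
      best.set(row.embedding, row);
    }
  }
  // One bar per embedding, leader first.
  return Array.from(best.values()).sort((a, b) => b.fmax - a.fmax);
}
```

The first element of the result is the row that would receive the leader marker, and its stage and K are what the hover tooltip would report.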

apps/web/app/[locale]/benchmark/page.tsx
- New viewMode state (default "heatmap").
- Toggle bar (role=tablist, aria-selected) rendered when there's data.
- Existing leaderboards (global + in-selection) stay above the toggle
  unchanged — they're already the per-cell story.

Behavior unchanged:
- Filters (stage, K, evaluation_set), CSV export, leaderboards, full
  matrix table — all preserved. Toggle to "Table" for the prior view.

CI: next build green; backend untouched.
@github-actions (bot) enabled auto-merge (squash) on May 8, 2026, 16:07
github-actions Bot pushed a commit that referenced this pull request May 8, 2026
…dicators (#82)

## Why

Three small frictions, fixed in one PR.

1. **Filters forget themselves on refresh.** Pick a stage chip on
`/benchmark`, refresh — the chip resets. Worse: you can't share a
deep-link to "PROTEA at stage=reranker, K=5".
2. **Long tables lose their header on scroll.** With the sticky `h-16`
chrome header above, `<thead>` should stay visible too — currently it
scrolls away with the body, and you forget which column is which.
3. **No scroll affordance on overflowing tables.** The user has to
discover horizontal scroll by trial.

## What

### URL-sync filters — `apps/web/lib/useUrlParam.ts`
- New `useUrlParam(key, default)` and `useUrlNumber(key, default)`
hooks: two-way bind a query-string key to React state. `router.replace`
with `scroll: false` so chip clicks don't pollute history or scroll the
page. Defaults are dropped from the URL (clean copy/paste).
- Wired in:
  - `/benchmark` → `stage`, `k`, `eval_set`
  - `/jobs` → `status`
  - `/proteins` → `tab`
- Refresh / share-link now lands on the same view.
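
The "defaults are dropped from the URL" rule can be sketched as a pure function that computes the next query string. The name `withUrlParam` and the default values used in the examples are hypothetical; the actual hooks in `apps/web/lib/useUrlParam.ts` may be structured differently. A hook built on this would pass the result to `router.replace` with `scroll: false`, as described above.

```typescript
// Sketch: compute the next query string when one filter changes.
// A value equal to its default is removed so shared links stay clean.
function withUrlParam(
  search: string,
  key: string,
  value: string,
  defaultValue: string,
): string {
  const params = new URLSearchParams(search);
  if (value === defaultValue) {
    params.delete(key);
  } else {
    params.set(key, value);
  }
  const query = params.toString();
  return query === "" ? "" : `?${query}`;
}
```

For example, assuming a hypothetical default of `"1"` for `k`, `withUrlParam("?stage=reranker", "k", "5", "1")` yields `?stage=reranker&k=5`, and setting `k` back to `"1"` removes the key again.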

### Sticky data-table headers — `.protea-thead-sticky`
- `position: sticky; top: 4rem` (clears the `h-16` chrome header) +
white background + 1px shadow.
- Applied to:
  - `/benchmark` matrix table `<thead>`
  - `/proteins` browse grid header
  - `/embeddings` configs grid header

### Scroll-shadow indicator — `.protea-scroll-shadow`
- Roman Komarov's local/scroll `background-attachment` trick: faint
shadows at the left/right edges of horizontally scrollable containers,
masked by white covers anchored to the content. Communicates "there's
more this way" without an extra JS observer.
- Applied to the three wrappers above.

## Test plan
- [x] `next build` green; 18 routes; no TS errors
- [x] Backend untouched
- [ ] Visual: open `/benchmark`, pick `stage=reranker` and `k=5`,
refresh — chips remain selected, URL contains `?stage=reranker&k=5`
- [ ] Switch to default stage/K — those keys disappear from the URL
- [ ] Scroll long table on `/benchmark` (viewMode=table when PR #81
merges; or current matrix view) — `<thead>` stays pinned below the
chrome header
- [ ] Narrow viewport — fading shadow appears on the right edge of the
wrapper, fades as you scroll right
- [ ] `/proteins` tab and `/jobs` status persist on refresh
@github-actions (bot) merged commit 89b7eee into develop on May 8, 2026
13 checks passed
github-actions Bot pushed a commit that referenced this pull request May 8, 2026
## Why

UX audit flagged: **only 3 of 14 pages used aria/role/label htmlFor**.
Drive-by: when PRs #81 (heatmap) and #82 (URL-sync) auto-rebased onto
each other, the `viewMode` state on `/benchmark` was dropped — develop
currently fails to build that page. Folded the trivial fix into this PR
so it can land.

## What

### A11y
- **Skip-to-content** link in the locale layout: hidden until focused,
lands focus on `<main id="main" tabIndex={-1}>`. Reachable on the first
Tab press.
- **NavLinks** dropdowns now expose `aria-haspopup="menu"`,
`aria-expanded`, `aria-current="page"` on active groups. Inner items
carry `role="menuitem"`. Mobile menu locks body scroll while open and
restores prior overflow on close (no more dual-scroll trap).
- **StatusBadge** gains a redundant leading glyph per state (clock /
pulse / check / ✕ / slash-circle). Color-blind users still parse the
badge at a glance. Wrapper carries `role="status"` and a verbose
`aria-label`.
- **Benchmark filter chips** (Pipeline stage, Neighbours K) gain
`aria-pressed` plus `role="group"` with descriptive `aria-label`.
- **Heatmap | Table toggle** already ships `role="tablist"` /
`aria-selected` from #81.
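
As an illustration of the StatusBadge change, the sketch below pairs each state with a redundant glyph and a verbose label. The five state names are assumptions (only "succeeded" appears in this PR's test plan), and `statusBadgeProps` is a hypothetical helper, not the component's real API.

```typescript
// Assumed state names; the glyph order follows the list in the description.
type JobStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

const STATUS_GLYPH: Record<JobStatus, string> = {
  queued: "clock",
  running: "pulse",
  succeeded: "check",
  failed: "✕",
  cancelled: "slash-circle",
};

interface StatusBadgeProps {
  role: "status";
  "aria-label": string;
  glyph: string; // rendered before the text so color is never the only cue
}

function statusBadgeProps(status: JobStatus): StatusBadgeProps {
  return {
    role: "status",
    "aria-label": `Status: ${status}`,
    glyph: STATUS_GLYPH[status],
  };
}
```

Typing the map as `Record<JobStatus, string>` makes the compiler flag any state that is added later without a glyph.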

### Drive-by build fix
- Restore `viewMode` state and the `BenchmarkHeatmap` import on
`/benchmark`. PR #81's toggle UI references `viewMode` / `setViewMode`,
but the auto-rebase ladder against #82 dropped the hook line on develop,
leaving the page broken at build time. Adds the lines back; default
`"heatmap"`.

## Test plan
- [x] `next build` green; 18 routes; no TS errors (was failing on
develop before this PR)
- [x] Backend untouched
- [ ] Press Tab on first load — "Skip to main content" link appears
top-left, focusing it jumps to `<main>`
- [ ] Use a screen reader on `/jobs` — StatusBadge announces "Status:
succeeded" etc.
- [ ] On a small viewport: open the mobile menu, body underneath stops
scrolling; close and underneath scrolls again
- [ ] Tab into Pipeline-stage chips — focus ring appears, `aria-pressed`
flips on selection