Component Hardening Protocol

Cindy Zhang edited this page Jun 23, 2026 · 1 revision

Prerequisite: The component has been built and merged via the Component Build Protocol. Hardening is the quality pass — it resolves bugs, tightens polish, and surfaces visual decisions for human review. It does not add features driven by external parity.


What Hardening Is

Hardening takes a shipped component and asks: is this correct, complete, and polished?

It operates in three layers, each with a different executor and a different bar:

| Layer | Executor | What it does | Bar |
| --- | --- | --- | --- |
| 1. Automated audit | Navi / CI | Convention checks — tokens, naming, theming, a11y contracts, exports | Objective. Pass/fail. |
| 2. Bug & visual fixes | Navi + human review | Fix visual bugs, state coverage gaps, edge cases, internal consistency | Clear right answer. Ship it. |
| 3. Design review | Human (with Navi prep) | Evaluate proportions, interaction feel, composition quality, visual polish | Visual judgment. Needs a decision. |

The key insight: layers 1 and 2 should not require human judgment. Automate what's objective, fix what's clearly wrong (a human still reviews Layer 2 PRs, but should have no decisions to make), and only escalate what genuinely needs a human eye.


What Hardening Is NOT

  • Matching WWW. Astryx OSS is forward-looking. WWW is reference material, not a target. "WWW does X" is not a hardening issue — it's research input for the spec loop.
  • Adding features from external references. Missing states (paused, canceled), new props (hasRemoveOnHover, maxItems), new sub-components motivated by "WWW has this" — these go through the Component Specification Protocol.
  • System-level design changes. Motion systems, edge compensation, spacing algorithms, animation patterns — these affect every component and need their own spec. Once specced, they feed back into hardening via auditor references (see System Specs Feed the Auditor).

Scope: What Counts as Hardening

Hardening makes Astryx consistent with itself. Not with WWW, not with external systems — with its own conventions, tokens, and family contracts.

The Scope Test

Ask: does the component's existing API already promise this behavior?

| Situation | Verdict | Why |
| --- | --- | --- |
| isDisabled prop exists but disabled state looks identical to default | Hardening | The API promises disabled; the rendering is wrong |
| Input family contract says "inputs have startIcon" but Tokenizer is missing it | Hardening | Family consistency gap — the system already decided this |
| Token layer uses success/error but some components still say positive/negative | Hardening | Internal naming convergence — the system contradicts itself |
| TextArea padding doesn't match TextInput padding for the same size | Hardening | Sibling inconsistency within the same family |
| ProgressBar has no paused state | Spec loop | New capability — no existing contract promises this |
| "WWW has numberOfLinesForLabel on Token" | Spec loop | External feature parity, not internal consistency |
| Dropdown menu needs submenus | Spec loop | New feature entirely |
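
The scope test above can be sketched as a small routing function. This is only an illustration: the `Finding` shape, the `promisedBy` field, and `routeFinding` are hypothetical names, not part of Astryx or Navi.

```typescript
// Hypothetical shape for an issue found during hardening (illustrative only).
type Finding = {
  description: string;
  // What already promises the behavior, or null if nothing in the system does.
  promisedBy: "existing-prop" | "family-contract" | "token-convention" | null;
};

type Route = "hardening" | "spec-loop";

// The scope test: an existing promise makes it hardening; anything else is new scope.
function routeFinding(finding: Finding): Route {
  return finding.promisedBy !== null ? "hardening" : "spec-loop";
}

const disabledLooksDefault: Finding = {
  description: "isDisabled prop exists but disabled state looks identical to default",
  promisedBy: "existing-prop",
};

const pausedProgressBar: Finding = {
  description: "ProgressBar has no paused state",
  promisedBy: null,
};

const disabledRoute = routeFinding(disabledLooksDefault);
const pausedRoute = routeFinding(pausedProgressBar);
```

The point of modelling it this way is that "WWW has this" never appears as a possible value of `promisedBy`: external parity cannot route a finding into hardening.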

API Changes in Hardening

Hardening can make API changes when motivated by internal consistency:

| Motivation | Route |
| --- | --- |
| Internal consistency (our API contradicts itself) | Hardening |
| Family contract gap (sibling components diverged) | Hardening |
| Token/naming convergence (API and tokens disagree) | Hardening |
| WWW has feature X, we should too | Spec loop |
| Users need new capability Y | Spec loop |
| Industry SOTA does Z | Spec loop |

Naming

Nobody "decides" names by opinion — not engineering, not design. If a naming issue surfaces during hardening:

  1. Flag it — don't resolve it in the hardening pass
  2. Queue a vibe test — LLM results decide naming
  3. If the vibe test shows a clear winner → file a spec issue with the data
  4. If the vibe test is inconclusive → keep the current name

See Vibe Tests for how to run naming evaluations.
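
The four naming steps can be read as a decision function over vibe test results. The sketch below assumes a result is a list of candidate names with win counts and that "clear winner" means a share of at least 70%; both the result shape and the threshold are assumptions made for illustration, since this page does not define them.

```typescript
// Hypothetical vibe test result: how often each candidate name was preferred.
type VibeTestResult = { name: string; wins: number }[];

type NamingAction =
  | { kind: "file-spec-issue"; proposedName: string }
  | { kind: "keep-current-name" };

// `clearWinnerShare` is an assumed threshold; the protocol does not specify one.
function decideNaming(
  currentName: string,
  results: VibeTestResult,
  clearWinnerShare: number = 0.7
): NamingAction {
  const total = results.reduce((sum, r) => sum + r.wins, 0);
  // No data at all counts as inconclusive.
  if (total === 0) return { kind: "keep-current-name" };

  const top = results.reduce((best, r) => (r.wins > best.wins ? r : best));
  const isClearWinner = top.wins / total >= clearWinnerShare;

  // A clear winner that differs from the current name becomes a spec issue;
  // the rename itself is never applied inside the hardening pass.
  if (isClearWinner && top.name !== currentName) {
    return { kind: "file-spec-issue", proposedName: top.name };
  }
  return { kind: "keep-current-name" };
}
```

For example, `decideNaming("isOpen", [{ name: "isExpanded", wins: 9 }, { name: "isOpen", wins: 1 }])` yields a spec issue proposing `isExpanded`, while a 5/5 split keeps `isOpen`. The prop names here are made up.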


Layer 1: Automated Audit

The Night Watch Component Auditor runs these checks nightly. They can also be run on-demand before a hardening review.

Checks

  1. Token usage — No hardcoded colors, shadows, spacing, radii, or typography. Semantic type scale tokens over raw text size tokens. Correct neutral gray token semantics.
  2. xdsThemeProps — Present on the correct element (the one with visual styles). Variant props included. Sub-elements targeted where visually distinct.
  3. Component reuse — Close buttons use Button, dividers use Divider, icons use registry.
  4. Prop naming — Boolean is/has prefix. Callbacks on{Verb}. Uncontrolled defaults default prefix.
  5. Type naming — <Component>Props, <Component>Variant, <Component>Context.
  6. Structure — displayName set. File header present. 'use client' where needed. Extends BaseProps. Has xstyle escape hatch. Uses mergeProps().
  7. Input consistency — label, value, onChange/onChangeAction. Status shape {type, message?}.
  8. Accessibility contracts — label on interactive components. ARIA wiring on inputs. isDisabled maps to disabled. Busy state uses aria-busy.
  9. Export hygiene — In src/index.ts. Types exported. Entry point in tsup.config.ts.
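
As an illustration of how objective these checks are, check 4 (prop naming) reduces to a few regular expressions. The sketch below is not the Night Watch auditor's implementation; `PropInfo`, `namingRules`, and `checkPropNaming` are hypothetical names, and the sketch assumes each prop has already been classified by kind.

```typescript
// Hypothetical, simplified description of a prop as an auditor might see it.
type PropKind = "boolean" | "callback" | "uncontrolled-default" | "other";
type PropInfo = { name: string; kind: PropKind };

type CheckedKind = Exclude<PropKind, "other">;

// One rule per convention from check 4.
const namingRules: Record<CheckedKind, { pattern: RegExp; message: string }> = {
  boolean: { pattern: /^(is|has)[A-Z]/, message: "boolean props need an is/has prefix" },
  callback: { pattern: /^on[A-Z]/, message: "callbacks must be named on{Verb}" },
  "uncontrolled-default": {
    pattern: /^default[A-Z]/,
    message: "uncontrolled defaults need a default prefix",
  },
};

// Returns one failure message per prop that violates its rule.
function checkPropNaming(props: PropInfo[]): string[] {
  const failures: string[] = [];
  for (const prop of props) {
    const kind = prop.kind;
    if (kind === "other") continue; // no naming rule applies
    const rule = namingRules[kind];
    if (!rule.pattern.test(prop.name)) {
      failures.push(`${prop.name}: ${rule.message}`);
    }
  }
  return failures;
}

const propNamingFailures = checkPropNaming([
  { name: "isDisabled", kind: "boolean" },
  { name: "disabled", kind: "boolean" },
  { name: "onChange", kind: "callback" },
  { name: "initialValue", kind: "uncontrolled-default" },
]);
```

Here `disabled` and `initialValue` fail while `isDisabled` and `onChange` pass.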

Output

A checklist of pass/fail results per component. Failures become automatic fix PRs or flagged items depending on whether the fix is objective.
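
A minimal sketch of that triage step, assuming each check result records whether its fix is objective. The `CheckResult` and `Triage` shapes are hypothetical and not the auditor's real output format.

```typescript
// Hypothetical per-check result for one component.
type CheckResult = {
  check: string;
  passed: boolean;
  // Only meaningful for failures: can the fix be applied without judgment?
  hasObjectiveFix?: boolean;
};

type Triage = { fixPr: string[]; flagged: string[] };

// Passing checks need no action; failures go to a fix PR or get flagged.
function triageAudit(results: CheckResult[]): Triage {
  const triage: Triage = { fixPr: [], flagged: [] };
  for (const result of results) {
    if (result.passed) continue;
    if (result.hasObjectiveFix) {
      triage.fixPr.push(result.check);
    } else {
      triage.flagged.push(result.check);
    }
  }
  return triage;
}

const auditTriage = triageAudit([
  { check: "Token usage", passed: false, hasObjectiveFix: true },
  { check: "xdsThemeProps", passed: true },
  { check: "Component reuse", passed: false, hasObjectiveFix: false },
]);
```

In this example the token failure lands in `fixPr` and the component reuse failure in `flagged`. A failure with no `hasObjectiveFix` value is treated as flagged, which errs on the side of human attention.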


Layer 2: Bug & Visual Fixes

These are issues with a clear right answer. Navi can fix them; a human reviews the PR but shouldn't need to make judgment calls.

State Coverage

Hardening ensures every state the component already claims to support renders correctly. If the component has the prop, the state must work.

  • Rest — Default appearance, no interaction
  • Hover — Visual feedback (with @media (hover: hover) guard)
  • Focus — Focus ring visible via keyboard (focus-visible)
  • Active / Pressed — Press feedback on click/tap
  • Disabled — Visually muted, non-interactive, correct ARIA
  • Loading — If isLoading exists: spinner or skeleton, interaction blocked, dimensions stable
  • Error / Warning / Success — If status exists: border treatment, correct token colors
  • Selected — If selectable: active/current state distinct from rest
  • Empty — Graceful empty state (no blank space, no layout collapse)
  • Overflow — Long text truncates or wraps, no layout breakage

Key distinction: If a component has an isDisabled prop but disabled looks identical to default, that's hardening. If the component has no loading concept at all, adding isLoading is a spec issue.
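
The "if the component has the prop, the state must work" rule can be sketched as deriving required states from the props a component already exposes. The mapping table and the baseline list below are assumptions made for illustration; only the prop names `isDisabled`, `isLoading`, and `status` come from this page.

```typescript
// Assumed mapping from an existing prop to the states it promises.
const statesPromisedByProp: Record<string, string[]> = {
  isDisabled: ["disabled"],
  isLoading: ["loading"],
  status: ["error", "warning", "success"],
};

// Assumed baseline that every interactive component covers regardless of props.
const baselineStates: string[] = ["rest", "hover", "focus", "active"];

// Returns the states the component promises but does not yet demonstrate.
function missingStates(propNames: string[], coveredStates: string[]): string[] {
  let required = baselineStates.slice();
  for (const prop of propNames) {
    const promised = statesPromisedByProp[prop] ?? [];
    required = required.concat(promised);
  }
  return required.filter((state) => coveredStates.indexOf(state) === -1);
}

// A component with isDisabled and status, but no disabled rendering covered.
const gaps = missingStates(
  ["isDisabled", "status"],
  ["rest", "hover", "focus", "active", "error", "warning", "success"]
);
```

Note what the function does not do: a component without `isLoading` never gets `loading` added to its required states. That absence is the spec-loop boundary.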

Visual Correctness

  • Light mode — Correct colors, contrast, readability
  • Dark mode — Correct colors, no light-mode artifacts
  • All themes — Renders correctly across all registered themes
  • Token adherence — All visual values from tokens
  • Elevation — Shadows use elevation tokens
  • Spacing — Internal padding and gaps use spacing tokens
  • Family consistency — Matches sibling components (same padding, same sizes, same token usage within the family)

Keyboard & Accessibility

  • Tab order — Focusable in logical order, no focus traps (except modals)
  • Keyboard activation — Enter/Space for buttons, arrow keys for lists/menus
  • Screen reader — Roles and labels announce correctly
  • ARIA attributes — Expansion, selection, validity, description wired correctly
  • Color contrast — WCAG AA (4.5:1 text, 3:1 UI elements)
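
The color contrast thresholds can be checked numerically with the WCAG 2.x relative luminance and contrast ratio formulas. The sketch below takes `#rrggbb` strings for simplicity; in practice the inputs would be resolved token values, and the function names are illustrative.

```typescript
// Convert one 8-bit sRGB channel of "#rrggbb" to linear light (WCAG 2.x definition).
function linearChannel(hex: string, offset: number): number {
  const c = parseInt(hex.slice(offset, offset + 2), 16) / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linear R, G, and B channels.
function relativeLuminance(hex: string): number {
  return (
    0.2126 * linearChannel(hex, 1) +
    0.7152 * linearChannel(hex, 3) +
    0.0722 * linearChannel(hex, 5)
  );
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA: 4.5:1 for text, 3:1 for UI elements.
function meetsAA(foreground: string, background: string, kind: "text" | "ui"): boolean {
  const threshold = kind === "text" ? 4.5 : 3;
  return contrastRatio(foreground, background) >= threshold;
}
```

As a sanity check, `#767676` on white passes as text while `#777777` on white narrowly fails as text but passes as a UI element. This sketch ignores alpha and the separate large-text threshold.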

Edge Cases

  • Long content — Extremely long text, many items, deep nesting
  • Empty content — No children, no data, undefined slots
  • Single item — One item in a list, one tab, one breadcrumb
  • Rapid interaction — Fast clicks, quick focus/blur cycling
  • Container sizes — Narrow, wide, and constrained parents

Stories

  • All states demonstrated — Stories show every applicable state from the checklist above
  • Composition shown — Inside Dialog, Table, Card, AppShell where applicable
  • Edge cases shown — Long text, empty, dense layout stories exist

Layer 3: Design Review

These require visual judgment from a human. Navi prepares the form; a human fills it out.

Layer 3 is scoped to visual and interaction quality. It does not cover naming, API shape, or whether props should exist — those are engineering and spec concerns.

The Hardening Review Form

After Layers 1 and 2 are complete, Navi generates a review form for each component. The form captures structured decisions with constrained answers.


Visual Quality

Does this look right on its own terms — not compared to any reference, but as a standalone component?

Proportions — sizes, spacing, typography feel balanced?

  • [ ] Looks right
  • [ ] Adjust: _______________

Interactive states — hover, focus, active, disabled feel visually distinct?

  • [ ] States feel right
  • [ ] Adjust: _______________ (which state, what feels off)

Density — appropriately compact or spacious for its use case?

  • [ ] Good density
  • [ ] Too dense / too spacious: _______________

Composition Quality

Does this play well with the rest of the system?

Inside Dialog:

  • [ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Table:

  • [ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Card / Layout:

  • [ ] Works well / [ ] Issue: _______________

With other components:

  • [ ] Composes cleanly / [ ] Issue: _______________

Scope Boundary

Did anything come up that's NOT a hardening fix?

  • [ ] No — everything is within hardening scope
  • [ ] Yes — filing spec issue(s) for: _______________
  • [ ] Naming concern — queue vibe test for: _______________


How the Form Gets Used

  1. Navi generates the form after Layers 1 and 2 are complete, pre-filling what it can (noting composition contexts from stories, listing applicable states).
  2. Human fills it out — checking boxes, writing brief notes where adjustment is needed.
  3. Navi processes the answers:
    • "Adjust" visual items → Layer 2 fix PRs
    • "Naming concern" items → vibe test queue
    • "Spec issue" items → filed as spec protocol issues
  4. The form is archived in the hardening issue as a record of decisions made.
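
Step 3 is a straightforward routing of each answer to one of three destinations. A minimal sketch, assuming form answers are captured as a tagged union; the `FormAnswer` and `FormOutcome` types are hypothetical and not Navi's actual data model.

```typescript
// Hypothetical structured answers from a completed review form.
type FormAnswer =
  | { kind: "ok"; item: string }
  | { kind: "adjust"; item: string; note: string }
  | { kind: "naming-concern"; name: string }
  | { kind: "spec-issue"; summary: string };

type FormOutcome = {
  layer2FixPrs: string[];
  vibeTestQueue: string[];
  specIssues: string[];
};

// Route each answer: adjustments become Layer 2 fixes, naming concerns are
// queued for a vibe test, and out-of-scope items become spec issues.
function processForm(answers: FormAnswer[]): FormOutcome {
  const outcome: FormOutcome = { layer2FixPrs: [], vibeTestQueue: [], specIssues: [] };
  for (const answer of answers) {
    switch (answer.kind) {
      case "adjust":
        outcome.layer2FixPrs.push(`${answer.item}: ${answer.note}`);
        break;
      case "naming-concern":
        outcome.vibeTestQueue.push(answer.name);
        break;
      case "spec-issue":
        outcome.specIssues.push(answer.summary);
        break;
      case "ok":
        break; // nothing to do; the archived form is the record
    }
  }
  return outcome;
}

const formOutcome = processForm([
  { kind: "ok", item: "Proportions" },
  { kind: "adjust", item: "Interactive states", note: "hover too close to rest" },
  { kind: "spec-issue", summary: "ProgressBar paused state" },
]);
```

The constrained answer kinds are what make this mechanical: a free-text form would need a human to route it.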

System Specs Feed the Auditor

System-level design decisions (motion, edge compensation, density) go through the spec loop once. After that, they're encoded as wiki reference pages that the auditor checks against.

Design decision → Spec loop → Wiki spec → Auditor references it → Components converge

Over time, the wiki specs stack up:

| Wiki spec | Auditor checks |
| --- | --- |
| Token semantics (exists) | Gray token usage, type scale tokens |
| xdsThemeProps rules (exists) | Placement, variant props, sub-elements |
| Motion (future) | Which components animate, transition values, easing |
| Edge compensation (future) | Optical padding adjustments, alignment rules |
| Density (future) | Compact mode token values, minimum touch targets |

Each spec makes the automated layers cover more ground, which shrinks the human layer. Hardening gets more powerful over time without getting more manual.
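
One way to picture this growth is a spec registry in which only specs that exist contribute checks. The sketch below encodes the five specs listed in this section; the `WikiSpec` type and `activeAuditorChecks` function are hypothetical and do not describe how the auditor is actually wired.

```typescript
// Hypothetical registry entry: a wiki spec and the checks it enables.
type WikiSpec = {
  name: string;
  status: "exists" | "future";
  auditorChecks: string[];
};

const wikiSpecs: WikiSpec[] = [
  { name: "Token semantics", status: "exists", auditorChecks: ["Gray token usage", "Type scale tokens"] },
  { name: "xdsThemeProps rules", status: "exists", auditorChecks: ["Placement", "Variant props", "Sub-elements"] },
  { name: "Motion", status: "future", auditorChecks: ["Which components animate", "Transition values", "Easing"] },
  { name: "Edge compensation", status: "future", auditorChecks: ["Optical padding adjustments", "Alignment rules"] },
  { name: "Density", status: "future", auditorChecks: ["Compact mode token values", "Minimum touch targets"] },
];

// Only specs that have been through the spec loop contribute auditor checks.
function activeAuditorChecks(specs: WikiSpec[]): string[] {
  const checks: string[] = [];
  for (const spec of specs) {
    if (spec.status === "exists") {
      checks.push(...spec.auditorChecks);
    }
  }
  return checks;
}

const checksToday = activeAuditorChecks(wikiSpecs);

// Simulate the Motion spec landing: its checks join the automated layer.
const afterMotionSpec = activeAuditorChecks(
  wikiSpecs.map((spec) =>
    spec.name === "Motion" ? { ...spec, status: "exists" as const } : spec
  )
);
```

With the two existing specs the registry yields five checks; once Motion is specced it yields eight, with no change to the auditor's code.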


Running the Protocol

For a single component

1. Layer 1: Run automated audit → fix objective failures
2. Layer 2: Walk the state/visual/a11y/edge case checklists → fix clear bugs
3. Layer 3: Generate review form → human fills it out → process answers
4. Update hardening issue → check off the component

For a batch (hardening sprint)

1. Layer 1: Run audit across all components → batch fix PR
2. Layer 2: Walk checklists per component → one PR per component or family
3. Layer 3: Generate review forms for all → human reviews in one sitting
4. Process all form answers → fix PRs + spec issues + vibe test queue

Automation Targets

The long-term goal is to push Layers 1 and 2 toward full automation:

| Check | Current | Target |
| --- | --- | --- |
| Token usage | Night Watch auditor (nightly) | CI check on every PR |
| xdsThemeProps | Night Watch auditor (nightly) | CI check on every PR |
| Prop/type naming | Night Watch auditor (nightly) | ESLint rule |
| State coverage | Manual checklist | Storybook interaction tests |
| Visual regression | Manual checklist | Chromatic or Playwright VRT |
| A11y contracts | Manual checklist | axe-core in CI |
| Keyboard nav | Manual checklist | Playwright keyboard tests |
| System specs (motion, etc.) | Manual (future) | Auditor wiki references |

Layer 3 stays human. That's the point — it's the layer where visual judgment matters.
