Component Hardening Protocol

Prerequisite: The component has been built and merged via the Component Build Protocol. Hardening is the quality pass — it resolves bugs, tightens polish, and surfaces visual decisions for human review. It does not add features driven by external parity.

What Hardening Is

Hardening takes a shipped component and asks: is this correct, complete, and polished?

It operates in three layers, each with a different executor and a different bar:

Layer	Executor	What it does	Bar
1. Automated audit	Navi / CI	Convention checks — tokens, naming, theming, a11y contracts, exports	Objective. Pass/fail.
2. Bug & visual fixes	Navi + human review	Fix visual bugs, state coverage gaps, edge cases, internal consistency	Clear right answer. Ship it.
3. Design review	Human (with Navi prep)	Evaluate proportions, interaction feel, composition quality, visual polish	Visual judgment. Needs a decision.

The key insight: Layers 1 and 2 should not require human judgment. Automate what's objective, fix what's clearly wrong, and escalate only what genuinely needs a human eye.

What Hardening Is NOT

Matching WWW. Astryx OSS is forward-looking. WWW is reference material, not a target. "WWW does X" is not a hardening issue — it's research input for the spec loop.
Adding features from external references. Missing states (paused, canceled), new props (hasRemoveOnHover, maxItems), new sub-components motivated by "WWW has this" — these go through the Component Specification Protocol.
System-level design changes. Motion systems, edge compensation, spacing algorithms, animation patterns — these affect every component and need their own spec. Once specced, they feed back into hardening via auditor references (see System Specs Feed the Auditor).

Scope: What Counts as Hardening

Hardening makes Astryx consistent with itself. Not with WWW, not with external systems — with its own conventions, tokens, and family contracts.

The Scope Test

Ask: does the component's existing API already promise this behavior?

Situation	Verdict	Why
`isDisabled` prop exists but disabled state looks identical to default	Hardening	The API promises disabled; the rendering is wrong
Input family contract says "inputs have `startIcon`" but Tokenizer is missing it	Hardening	Family consistency gap — the system already decided this
Token layer uses `success`/`error` but some components still say `positive`/`negative`	Hardening	Internal naming convergence — the system contradicts itself
TextArea padding doesn't match TextInput padding for the same size	Hardening	Sibling inconsistency within the same family
ProgressBar has no `paused` state	Spec loop	New capability — no existing contract promises this
"WWW has `numberOfLinesForLabel` on Token"	Spec loop	External feature parity, not internal consistency
Dropdown menu needs submenus	Spec loop	New feature entirely

API Changes in Hardening

Hardening can make API changes when motivated by internal consistency:

Motivation	Route
Internal consistency (our API contradicts itself)	Hardening
Family contract gap (sibling components diverged)	Hardening
Token/naming convergence (API and tokens disagree)	Hardening
WWW has feature X, we should too	Spec loop
Users need new capability Y	Spec loop
Industry SOTA does Z	Spec loop
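
The routing table above can be read as a simple lookup. The following is a minimal sketch rather than part of the protocol; the `Motivation` and `Route` types and their string labels are hypothetical names chosen to mirror the table rows.

```typescript
// Hypothetical labels for the motivations listed in the table above.
type Motivation =
  | "internal-consistency"
  | "family-contract-gap"
  | "token-naming-convergence"
  | "external-parity"
  | "new-user-capability"
  | "industry-sota";

type Route = "hardening" | "spec-loop";

// Exhaustive map: adding a Motivation without a route is a compile error.
const ROUTES: Record<Motivation, Route> = {
  "internal-consistency": "hardening",
  "family-contract-gap": "hardening",
  "token-naming-convergence": "hardening",
  "external-parity": "spec-loop",
  "new-user-capability": "spec-loop",
  "industry-sota": "spec-loop",
};

function routeApiChange(motivation: Motivation): Route {
  return ROUTES[motivation];
}
```

Keeping the map exhaustive means a new motivation cannot be introduced without someone deciding which route it takes.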

Naming

Nobody "decides" names by opinion — not engineering, not design. If a naming issue surfaces during hardening:

Flag it — don't resolve it in the hardening pass
Queue a vibe test — LLM results decide naming
If the vibe test shows a clear winner → file a spec issue with the data
If the vibe test is inconclusive → keep the current name

See Vibe Tests for how to run naming evaluations.
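
The decision at the end of the naming flow can be sketched as a small function. This is an illustration under stated assumptions: the `VibeTestResult` shape is hypothetical, and the protocol does not define what counts as a "clear winner", so the `minShare` threshold below is a placeholder, not a documented value.

```typescript
// Hypothetical shape of a vibe test result: how often each candidate name was picked.
interface VibeTestResult {
  currentName: string;
  votes: Record<string, number>; // candidate name -> number of picks
}

type NamingOutcome =
  | { action: "file-spec-issue"; proposedName: string; votes: Record<string, number> }
  | { action: "keep-current-name"; name: string };

// ASSUMPTION: "clear winner" means a candidate other than the current name
// holds at least `minShare` of all votes. The protocol does not define a threshold.
function resolveNaming(result: VibeTestResult, minShare = 0.7): NamingOutcome {
  const entries = Object.entries(result.votes);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  const keep: NamingOutcome = { action: "keep-current-name", name: result.currentName };
  if (total === 0) return keep; // no data counts as inconclusive

  // Pick the candidate with the most votes.
  const [topName, topVotes] = entries.reduce((best, e) => (e[1] > best[1] ? e : best));
  const isClear = topName !== result.currentName && topVotes / total >= minShare;
  return isClear
    ? { action: "file-spec-issue", proposedName: topName, votes: result.votes }
    : keep;
}
```

The spec-issue outcome carries the vote data with it, matching the requirement to file the issue "with the data".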

Layer 1: Automated Audit

The Night Watch Component Auditor runs these checks nightly. They can also be run on-demand before a hardening review.

Checks

Token usage — No hardcoded colors, shadows, spacing, radii, or typography. Semantic type scale tokens over raw text size tokens. Correct neutral gray token semantics.
xdsThemeProps — Present on the correct element (the one with visual styles). Variant props included. Sub-elements targeted where visually distinct.
Component reuse — Close buttons use Button, dividers use Divider, icons use registry.
Prop naming — Boolean is/has prefix. Callbacks on{Verb}. Uncontrolled defaults default prefix.
Type naming — <Component>Props, <Component>Variant, <Component>Context.
Structure — displayName set. File header present. 'use client' where needed. Extends BaseProps. Has xstyle escape hatch. Uses mergeProps().
Input consistency — label, value, onChange/onChangeAction. Status shape {type, message?}.
Accessibility contracts — label on interactive components. ARIA wiring on inputs. isDisabled maps to disabled. Busy state uses aria-busy.
Export hygiene — In src/index.ts. Types exported. Entry point in tsup.config.ts.
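
As an illustration of how one of these checks might be automated, here is a minimal sketch of the prop naming check. It is not the Night Watch implementation; the `PropInfo` shape and the `checkPropNaming` function are hypothetical, and a real auditor would derive prop kinds from the component's TypeScript types.

```typescript
// Hypothetical, simplified description of one prop on a component.
interface PropInfo {
  name: string;
  kind: "boolean" | "callback" | "uncontrolledDefault" | "other";
}

type CheckedKind = Exclude<PropInfo["kind"], "other">;

// Patterns derived from the "Prop naming" check: is/has, on{Verb}, default prefix.
const NAMING_RULES: Record<CheckedKind, RegExp> = {
  boolean: /^(is|has)[A-Z]/,
  callback: /^on[A-Z]/,
  uncontrolledDefault: /^default[A-Z]/,
};

// Returns one message per prop whose name breaks the convention for its kind.
function checkPropNaming(props: PropInfo[]): string[] {
  const failures: string[] = [];
  for (const prop of props) {
    const kind = prop.kind;
    if (kind === "other") continue; // no naming convention applies
    if (!NAMING_RULES[kind].test(prop.name)) {
      failures.push(`${prop.name}: ${kind} prop does not match ${NAMING_RULES[kind]}`);
    }
  }
  return failures;
}
```

An empty result means the component passes this check; any entries become pass/fail failures in the audit output.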

Output

A checklist of pass/fail results per component. Failures become automatic fix PRs or flagged items depending on whether the fix is objective.
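
The split between automatic fix PRs and flagged items could look like the sketch below. The `CheckResult` shape is hypothetical, and the `hasObjectiveFix` flag is an assumption about how the auditor records whether a fix is objective; the actual output format is not specified here.

```typescript
// Hypothetical shape of one audit result.
interface CheckResult {
  component: string;
  check: string; // e.g. "Token usage"
  passed: boolean;
  // ASSUMPTION: each check declares whether its fix is objective (mechanical).
  hasObjectiveFix: boolean;
}

interface Triage {
  autoFix: CheckResult[]; // candidates for automatic fix PRs
  flagged: CheckResult[]; // items that need a closer look
}

// Splits failing checks by whether the fix is objective; passing checks are dropped.
function triageFailures(results: CheckResult[]): Triage {
  const failures = results.filter((r) => !r.passed);
  return {
    autoFix: failures.filter((r) => r.hasObjectiveFix),
    flagged: failures.filter((r) => !r.hasObjectiveFix),
  };
}
```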

Layer 2: Bug & Visual Fixes

These are issues with a clear right answer. Navi can fix them; a human reviews the PR but shouldn't need to make judgment calls.

State Coverage

Hardening ensures every state the component already claims to support renders correctly. If the component has the prop, the state must work.

Key distinction: If a component has an isDisabled prop but disabled looks identical to default, that's hardening. If the component has no loading concept at all, adding isLoading is a spec issue.
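
One way to detect the "prop exists but renders identically" case is to compare each state's rendered output against the default. The sketch below is illustrative only: the `Render` function is a hypothetical stand-in for whatever produces a comparable fingerprint of the rendered result (for example, serialized resolved styles).

```typescript
// Hypothetical: renders the component with the given boolean props and
// returns a string fingerprint of the visual result.
type Render = (props: Record<string, boolean>) => string;

// Returns the state props whose rendering is indistinguishable from the default.
function findIndistinctStates(stateProps: string[], render: Render): string[] {
  const baseline = render({});
  return stateProps.filter((prop) => render({ [prop]: true }) === baseline);
}
```

Any prop returned here is a hardening issue by the rule above: the API promises the state, but the rendering does not deliver it.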

Visual Correctness

Light mode — Correct colors, contrast, readability
Dark mode — Correct colors, no light-mode artifacts
All themes — Renders correctly across all registered themes
Token adherence — All visual values from tokens
Elevation — Shadows use elevation tokens
Spacing — Internal padding and gaps use spacing tokens
Family consistency — Matches sibling components (same padding, same sizes, same token usage within the family)

Keyboard & Accessibility

Tab order — Focusable in logical order, no focus traps (except modals)
Keyboard activation — Enter/Space for buttons, arrow keys for lists/menus
Screen reader — Roles and labels announce correctly
ARIA attributes — Expansion, selection, validity, description wired correctly
Color contrast — WCAG AA (4.5:1 text, 3:1 UI elements)
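
The color contrast thresholds can be checked numerically. The sketch below follows the WCAG 2.x relative luminance and contrast ratio definitions; the function names and the `"text" | "ui"` parameter are hypothetical, and the helper accepts only 6-digit hex colors for brevity.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#1a2b3c".
function relativeLuminance(hex: string): number {
  const digits = hex.replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(digits)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const channel = (i: number): number => linearize(parseInt(digits.slice(i, i + 2), 16));
  return 0.2126 * channel(0) + 0.7152 * channel(2) + 0.0722 * channel(4);
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Thresholds from the checklist: 4.5:1 for text, 3:1 for UI elements.
function meetsAA(foreground: string, background: string, kind: "text" | "ui"): boolean {
  return contrastRatio(foreground, background) >= (kind === "text" ? 4.5 : 3);
}
```

For example, `#777777` on `#ffffff` comes out just under 4.5:1, so it passes for UI elements but fails for text.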

Edge Cases

Long content — Extremely long text, many items, deep nesting
Empty content — No children, no data, undefined slots
Single item — One item in a list, one tab, one breadcrumb
Rapid interaction — Fast clicks, quick focus/blur cycling
Container sizes — Narrow, wide, and constrained parents

Stories

All states demonstrated — Stories show every applicable state from the checklist above
Composition shown — Inside Dialog, Table, Card, AppShell where applicable
Edge cases shown — Long text, empty, dense layout stories exist

Layer 3: Design Review

These require visual judgment from a human. Navi prepares the form; a human fills it out.

Layer 3 is scoped to visual and interaction quality. It does not cover naming, API shape, or whether props should exist — those are engineering and spec concerns.

The Hardening Review Form

After Layers 1 and 2 are complete, Navi generates a review form for each component. The form captures structured decisions with constrained answers.

Visual Quality

Does this look right on its own terms — not compared to any reference, but as a standalone component?

Proportions — sizes, spacing, typography feel balanced?

[ ] Looks right
[ ] Adjust: _______________

Interactive states — hover, focus, active, disabled feel visually distinct?

[ ] States feel right
[ ] Adjust: _______________ (which state, what feels off)

Density — appropriately compact or spacious for its use case?

[ ] Good density
[ ] Too dense / too spacious: _______________

Composition Quality

Does this play well with the rest of the system?

Inside Dialog:

[ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Table:

[ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Card / Layout:

[ ] Works well / [ ] Issue: _______________

With other components:

[ ] Composes cleanly / [ ] Issue: _______________

Scope Boundary

Did anything come up that's NOT a hardening fix?

[ ] No: everything is within hardening scope
[ ] Yes: filing spec issue(s) for: _______________
[ ] Naming concern: queue vibe test for: _______________

How the Form Gets Used

1. Navi generates the form after Layers 1 and 2 are complete, pre-filling what it can (noting composition contexts from stories, listing applicable states).
2. The human fills it out, checking boxes and writing brief notes where adjustment is needed.
3. Navi processes the answers:
   - "Adjust" visual items → Layer 2 fix PRs
   - "Naming concern" items → vibe test queue
   - "Spec issue" items → filed as spec protocol issues
4. The form is archived in the hardening issue as a record of decisions made.
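
The answer-processing step can be sketched as a routing function over form answers. The `FormAnswer` and `ProcessedForm` shapes are hypothetical; the protocol specifies only where each kind of answer ends up, not how the form is stored.

```typescript
// Hypothetical shape of one answered item on the review form.
type FormAnswer =
  | { kind: "ok"; item: string }
  | { kind: "adjust"; item: string; note: string }
  | { kind: "naming-concern"; name: string }
  | { kind: "spec-issue"; summary: string };

interface ProcessedForm {
  fixPrs: string[]; // descriptions for Layer 2 fix PRs
  vibeTestQueue: string[]; // names to evaluate in a vibe test
  specIssues: string[]; // issues to file under the spec protocol
}

// Routes each answer to its destination; "ok" answers need no follow-up.
function processForm(answers: FormAnswer[]): ProcessedForm {
  const out: ProcessedForm = { fixPrs: [], vibeTestQueue: [], specIssues: [] };
  for (const answer of answers) {
    switch (answer.kind) {
      case "adjust":
        out.fixPrs.push(`${answer.item}: ${answer.note}`);
        break;
      case "naming-concern":
        out.vibeTestQueue.push(answer.name);
        break;
      case "spec-issue":
        out.specIssues.push(answer.summary);
        break;
      case "ok":
        break;
    }
  }
  return out;
}
```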

System Specs Feed the Auditor

System-level design decisions (motion, edge compensation, density) go through the spec loop once. After that, they're encoded as wiki reference pages that the auditor checks against.

Design decision → Spec loop → Wiki spec → Auditor references it → Components converge

Over time, the wiki specs stack up:

Wiki spec	Auditor checks
Token semantics (exists)	Gray token usage, type scale tokens
xdsThemeProps rules (exists)	Placement, variant props, sub-elements
Motion (future)	Which components animate, transition values, easing
Edge compensation (future)	Optical padding adjustments, alignment rules
Density (future)	Compact mode token values, minimum touch targets

Each spec makes the automated layers cover more ground, which shrinks the human layer. Hardening gets more powerful over time without getting more manual.

Running the Protocol

For a single component

1. Layer 1: Run automated audit → fix objective failures
2. Layer 2: Walk the state/visual/a11y/edge case checklists → fix clear bugs
3. Layer 3: Generate review form → human fills it out → process answers
4. Update hardening issue → check off the component

For a batch (hardening sprint)

1. Layer 1: Run audit across all components → batch fix PR
2. Layer 2: Walk checklists per component → one PR per component or family
3. Layer 3: Generate review forms for all → human reviews in one sitting
4. Process all form answers → fix PRs + spec issues + vibe test queue

Automation Targets

The long-term goal is to push Layers 1 and 2 toward full automation:

Check	Current	Target
Token usage	Night Watch auditor (nightly)	CI check on every PR
xdsThemeProps	Night Watch auditor (nightly)	CI check on every PR
Prop/type naming	Night Watch auditor (nightly)	ESLint rule
State coverage	Manual checklist	Storybook interaction tests
Visual regression	Manual checklist	Chromatic or Playwright VRT
A11y contracts	Manual checklist	axe-core in CI
Keyboard nav	Manual checklist	Playwright keyboard tests
System specs (motion, etc.)	Manual (future)	Auditor wiki references

Layer 3 stays human. That's the point — it's the layer where visual judgment matters.
