Component Hardening Protocol
Prerequisite: The component has been built and merged via the Component Build Protocol. Hardening is the quality pass — it resolves bugs, tightens polish, and surfaces visual decisions for human review. It does not add features driven by external parity.
Hardening takes a shipped component and asks: is this correct, complete, and polished?
It operates in three layers, each with a different executor and a different bar:
| Layer | Executor | What it does | Bar |
|---|---|---|---|
| 1. Automated audit | Navi / CI | Convention checks — tokens, naming, theming, a11y contracts, exports | Objective. Pass/fail. |
| 2. Bug & visual fixes | Navi + human review | Fix visual bugs, state coverage gaps, edge cases, internal consistency | Clear right answer. Ship it. |
| 3. Design review | Human (with Navi prep) | Evaluate proportions, interaction feel, composition quality, visual polish | Visual judgment. Needs a decision. |
The key insight: Layers 1 and 2 should not require human judgment. Automate what's objective, fix what's clearly wrong, and escalate only what genuinely needs a human eye.
Hardening does not cover:
- Matching WWW. Astryx OSS is forward-looking. WWW is reference material, not a target. "WWW does X" is not a hardening issue — it's research input for the spec loop.
- Adding features from external references. Missing states (paused, canceled), new props (`hasRemoveOnHover`, `maxItems`), new sub-components motivated by "WWW has this" — these go through the Component Specification Protocol.
- System-level design changes. Motion systems, edge compensation, spacing algorithms, animation patterns — these affect every component and need their own spec. Once specced, they feed back into hardening via auditor references (see System Specs Feed the Auditor).
Hardening makes Astryx consistent with itself. Not with WWW, not with external systems — with its own conventions, tokens, and family contracts.
Ask: does the component's existing API already promise this behavior?
| Situation | Verdict | Why |
|---|---|---|
| `isDisabled` prop exists but disabled state looks identical to default | Hardening | The API promises disabled; the rendering is wrong |
| Input family contract says "inputs have `startIcon`" but Tokenizer is missing it | Hardening | Family consistency gap — the system already decided this |
| Token layer uses `success`/`error` but some components still say `positive`/`negative` | Hardening | Internal naming convergence — the system contradicts itself |
| TextArea padding doesn't match TextInput padding for the same size | Hardening | Sibling inconsistency within the same family |
| ProgressBar has no paused state | Spec loop | New capability — no existing contract promises this |
| "WWW has `numberOfLinesForLabel` on Token" | Spec loop | External feature parity, not internal consistency |
| Dropdown menu needs submenus | Spec loop | New feature entirely |
Hardening can make API changes when motivated by internal consistency:
| Motivation | Route |
|---|---|
| Internal consistency (our API contradicts itself) | Hardening |
| Family contract gap (sibling components diverged) | Hardening |
| Token/naming convergence (API and tokens disagree) | Hardening |
| WWW has feature X, we should too | Spec loop |
| Users need new capability Y | Spec loop |
| Industry SOTA does Z | Spec loop |
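The routing table above can be read as a simple lookup. The sketch below is illustrative only: the `Motivation` labels, the `Route` type, and `routeChange` are hypothetical names chosen for this example, not part of any Astryx tooling.

```typescript
// Hypothetical names: the protocol defines the routes, not this API.
type Route = "hardening" | "spec-loop";

type Motivation =
  | "internal-consistency"     // our API contradicts itself
  | "family-contract-gap"      // sibling components diverged
  | "token-naming-convergence" // API and tokens disagree
  | "external-parity"          // "WWW has feature X, we should too"
  | "new-capability"           // users need new capability Y
  | "industry-sota";           // industry SOTA does Z

// One entry per row of the motivation/route table.
const ROUTE_BY_MOTIVATION: Record<Motivation, Route> = {
  "internal-consistency": "hardening",
  "family-contract-gap": "hardening",
  "token-naming-convergence": "hardening",
  "external-parity": "spec-loop",
  "new-capability": "spec-loop",
  "industry-sota": "spec-loop",
};

function routeChange(motivation: Motivation): Route {
  return ROUTE_BY_MOTIVATION[motivation];
}
```

Typing the table as `Record<Motivation, Route>` means that adding a new motivation without assigning it a route fails to compile, which keeps the lookup exhaustive.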
Nobody "decides" names by opinion — not engineering, not design. If a naming issue surfaces during hardening:
- Flag it — don't resolve it in the hardening pass
- Queue a vibe test — LLM results decide naming
- If the vibe test shows a clear winner → file a spec issue with the data
- If the vibe test is inconclusive → keep the current name
See Vibe Tests for how to run naming evaluations.
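The naming flow above could be sketched as a small decision function. The protocol does not define what counts as a "clear winner", so the vote-count input shape and the 70% share cutoff are assumptions made for illustration, as are the names `decideNaming` and `NamingDecision`.

```typescript
// Hypothetical result type for a naming evaluation.
type NamingDecision =
  | { action: "file-spec-issue"; winner: string; share: number }
  | { action: "keep-current-name" };

function decideNaming(
  currentName: string,
  votes: Record<string, number>, // assumed shape: votes per candidate name
  minShare: number = 0.7,        // assumed cutoff for a "clear winner"
): NamingDecision {
  let total = 0;
  let winner = currentName;
  let best = -1;
  for (const name of Object.keys(votes)) {
    const count = votes[name] ?? 0;
    total += count;
    if (count > best) {
      best = count;
      winner = name;
    }
  }
  // No data at all is treated as inconclusive.
  if (total === 0) return { action: "keep-current-name" };

  const share = best / total;
  // A clear winner that differs from the current name becomes a spec issue.
  if (winner !== currentName && share >= minShare) {
    return { action: "file-spec-issue", winner, share };
  }
  // Inconclusive result, or the current name won: keep the current name.
  return { action: "keep-current-name" };
}
```

Note that the function never renames anything itself. In line with the rule above, the hardening pass only flags; a clear result produces a spec issue carrying the data.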
The Night Watch Component Auditor runs these checks nightly. They can also be run on-demand before a hardening review.
- Token usage — No hardcoded colors, shadows, spacing, radii, or typography. Semantic type scale tokens over raw text size tokens. Correct neutral gray token semantics.
- xdsThemeProps — Present on the correct element (the one with visual styles). Variant props included. Sub-elements targeted where visually distinct.
- Component reuse — Close buttons use `Button`, dividers use `Divider`, icons use registry.
- Prop naming — Boolean `is`/`has` prefix. Callbacks `on{Verb}`. Uncontrolled defaults `default` prefix.
- Type naming — `<Component>Props`, `<Component>Variant`, `<Component>Context`.
- Structure — `displayName` set. File header present. `'use client'` where needed. Extends `BaseProps`. Has `xstyle` escape hatch. Uses `mergeProps()`.
- Input consistency — `label`, `value`, `onChange`/`onChangeAction`. Status shape `{type, message?}`.
- Accessibility contracts — `label` on interactive components. ARIA wiring on inputs. `isDisabled` maps to `disabled`. Busy state uses `aria-busy`.
- Export hygiene — In `src/index.ts`. Types exported. Entry point in `tsup.config.ts`.
The audit produces a checklist of pass/fail results per component. Failures become automatic fix PRs when the fix is objective, and flagged items otherwise.
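As an illustration of how objective these checks are, the prop naming rules can be expressed as regular expressions that yield pass/fail results. This is a minimal sketch, not the Night Watch auditor's implementation: the `PropInfo` input shape, the `CheckResult` output shape, and the function name `auditPropNames` are all hypothetical, and a real auditor would presumably derive prop kinds from the TypeScript types rather than receive them as input.

```typescript
// Hypothetical input: a prop name plus the kind of prop it is.
type PropKind = "boolean" | "callback" | "uncontrolled-default" | "other";

interface PropInfo {
  name: string;
  kind: PropKind;
}

interface CheckResult {
  prop: string;
  rule: string;
  pass: boolean;
}

// One pattern per naming convention from the checklist.
const NAMING_RULES: Record<Exclude<PropKind, "other">, { rule: string; pattern: RegExp }> = {
  boolean: { rule: "Boolean is/has prefix", pattern: /^(is|has)[A-Z]/ },
  callback: { rule: "Callback on{Verb}", pattern: /^on[A-Z]/ },
  "uncontrolled-default": { rule: "Uncontrolled default prefix", pattern: /^default[A-Z]/ },
};

function auditPropNames(props: PropInfo[]): CheckResult[] {
  const results: CheckResult[] = [];
  for (const prop of props) {
    const kind = prop.kind;
    if (kind === "other") continue; // no naming convention applies
    const { rule, pattern } = NAMING_RULES[kind];
    results.push({ prop: prop.name, rule, pass: pattern.test(prop.name) });
  }
  return results;
}
```

For example, a boolean prop named `disabled` fails the first rule, while `isDisabled` and `onChange` pass theirs.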
These are issues with a clear right answer. Navi can fix them; a human reviews the PR but shouldn't need to make judgment calls.
Hardening ensures every state the component already claims to support renders correctly. If the component has the prop, the state must work.
- Rest — Default appearance, no interaction
- Hover — Visual feedback (with `@media (hover: hover)` guard)
- Focus — Focus ring visible via keyboard (`focus-visible`)
- Active / Pressed — Press feedback on click/tap
- Disabled — Visually muted, non-interactive, correct ARIA
- Loading — If `isLoading` exists: spinner or skeleton, interaction blocked, dimensions stable
- Error / Warning / Success — If `status` exists: border treatment, correct token colors
- Selected — If selectable: active/current state distinct from rest
- Empty — Graceful empty state (no blank space, no layout collapse)
- Overflow — Long text truncates or wraps, no layout breakage

Key distinction: If a component has an `isDisabled` prop but disabled looks identical to default, that's hardening. If the component has no loading concept at all, adding `isLoading` is a spec issue.
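The "if the component has the prop, the state must work" rule can be sketched as a function that derives the states to verify from a component's existing API. The `ComponentApi` shape, the `isInteractive` flag, and the `isSelected` prop name are assumptions for this example; only `isDisabled`, `isLoading`, and `status` are named in the checklist.

```typescript
// Hypothetical description of a component's existing API surface.
interface ComponentApi {
  props: string[];
  isInteractive: boolean; // assumed flag: hover/focus/active only apply to interactive components
}

// States that apply only when the component already exposes the prop.
const PROP_GATED_STATES: Array<{ prop: string; state: string }> = [
  { prop: "isDisabled", state: "disabled" },
  { prop: "isLoading", state: "loading" },
  { prop: "status", state: "error/warning/success" },
  { prop: "isSelected", state: "selected" }, // assumed prop name for "selectable"
];

function statesToVerify(api: ComponentApi): string[] {
  // Rest, empty, and overflow are checked for every component.
  const states = ["rest", "empty", "overflow"];
  if (api.isInteractive) states.push("hover", "focus", "active");
  for (const gate of PROP_GATED_STATES) {
    if (api.props.indexOf(gate.prop) !== -1) states.push(gate.state);
  }
  return states;
}
```

A state that is absent from the result is not a hardening gap. If someone wants it, that request goes through the spec loop.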
- Light mode — Correct colors, contrast, readability
- Dark mode — Correct colors, no light-mode artifacts
- All themes — Renders correctly across all registered themes
- Token adherence — All visual values from tokens
- Elevation — Shadows use elevation tokens
- Spacing — Internal padding and gaps use spacing tokens
- Family consistency — Matches sibling components (same padding, same sizes, same token usage within the family)
- Tab order — Focusable in logical order, no focus traps (except modals)
- Keyboard activation — Enter/Space for buttons, arrow keys for lists/menus
- Screen reader — Roles and labels announce correctly
- ARIA attributes — Expansion, selection, validity, description wired correctly
- Color contrast — WCAG AA (4.5:1 text, 3:1 UI elements)
- Long content — Extremely long text, many items, deep nesting
- Empty content — No children, no data, undefined slots
- Single item — One item in a list, one tab, one breadcrumb
- Rapid interaction — Fast clicks, quick focus/blur cycling
- Container sizes — Narrow, wide, and constrained parents
- All states demonstrated — Stories show every applicable state from the checklist above
- Composition shown — Inside Dialog, Table, Card, AppShell where applicable
- Edge cases shown — Long text, empty, dense layout stories exist
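The color contrast item in the checklists above (WCAG AA: 4.5:1 for text, 3:1 for UI elements) is one of the checks that can be computed directly. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas. It assumes token values have already been resolved to 6-digit hex colors; the function names are illustrative.

```typescript
// Convert an 8-bit sRGB channel to its linear value (WCAG 2.x definition).
function channelToLinear(channel8Bit: number): number {
  const c = channel8Bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#1a2b3c".
function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(hexA: string, hexB: string): number {
  const a = relativeLuminance(hexA);
  const b = relativeLuminance(hexB);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA thresholds from the checklist: 4.5:1 for text, 3:1 for UI elements.
function meetsAA(foreground: string, background: string, kind: "text" | "ui"): boolean {
  const threshold = kind === "text" ? 4.5 : 3;
  return contrastRatio(foreground, background) >= threshold;
}
```

Black on white yields the maximum ratio of 21:1, and `#767676` on white sits just above the 4.5:1 text threshold.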
These require visual judgment from a human. Navi prepares the form; a human fills it out.
Layer 3 is scoped to visual and interaction quality. It does not cover naming, API shape, or whether props should exist — those are engineering and spec concerns.
After Layers 1 and 2 are complete, Navi generates a review form for each component. The form captures structured decisions with constrained answers.
Does this look right on its own terms — not compared to any reference, but as a standalone component?
Proportions — sizes, spacing, typography feel balanced?
- [ ] Looks right
- [ ] Adjust: _______________

Interactive states — hover, focus, active, disabled feel visually distinct?
- [ ] States feel right
- [ ] Adjust: _______________ (which state, what feels off)

Density — appropriately compact or spacious for its use case?
- [ ] Good density
- [ ] Too dense / too spacious: _______________

Does this play well with the rest of the system?

Inside Dialog:
- [ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Table:
- [ ] Works well / [ ] Issue: _______________ / [ ] N/A

Inside Card / Layout:
- [ ] Works well / [ ] Issue: _______________

With other components:
- [ ] Composes cleanly / [ ] Issue: _______________

Did anything come up that's NOT a hardening fix?
- [ ] No — everything is within hardening scope
- [ ] Yes — filing spec issue(s) for:
  - _______________
- [ ] Naming concern — queue vibe test for:
  - _______________
- Navi generates the form after Layers 1 and 2 are complete, pre-filling what it can (noting composition contexts from stories, listing applicable states).
- Human fills it out — checking boxes, writing brief notes where adjustment is needed.
- Navi processes the answers:
  - "Adjust" visual items → Layer 2 fix PRs
  - "Naming concern" items → vibe test queue
  - "Spec issue" items → filed as spec protocol issues
- The form is archived in the hardening issue as a record of decisions made.
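The answer-processing step above amounts to routing each form item to one of three destinations. The sketch below shows that routing under assumed data shapes: `FormItem`, `ReviewOutcome`, and `processReviewForm` are hypothetical names, and the real form format may differ.

```typescript
// Hypothetical shape of one answered item on the review form.
type FormItem =
  | { kind: "adjust"; section: string; note: string }
  | { kind: "naming-concern"; name: string }
  | { kind: "spec-issue"; summary: string };

interface ReviewOutcome {
  layer2FixPrs: string[];
  vibeTestQueue: string[];
  specIssues: string[];
}

function processReviewForm(component: string, items: FormItem[]): ReviewOutcome {
  const outcome: ReviewOutcome = { layer2FixPrs: [], vibeTestQueue: [], specIssues: [] };
  for (const item of items) {
    switch (item.kind) {
      case "adjust":
        // "Adjust" visual items become Layer 2 fix PRs.
        outcome.layer2FixPrs.push(`${component}: ${item.section}: ${item.note}`);
        break;
      case "naming-concern":
        // Naming is never resolved here; it is queued for a vibe test.
        outcome.vibeTestQueue.push(`${component}: ${item.name}`);
        break;
      case "spec-issue":
        // Out-of-scope findings are filed through the spec protocol.
        outcome.specIssues.push(`${component}: ${item.summary}`);
        break;
    }
  }
  return outcome;
}
```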
System-level design decisions (motion, edge compensation, density) go through the spec loop once. After that, they're encoded as wiki reference pages that the auditor checks against.
Design decision → Spec loop → Wiki spec → Auditor references it → Components converge
Over time, the wiki specs stack up:
| Wiki spec | Auditor checks |
|---|---|
| Token semantics (exists) | Gray token usage, type scale tokens |
| xdsThemeProps rules (exists) | Placement, variant props, sub-elements |
| Motion (future) | Which components animate, transition values, easing |
| Edge compensation (future) | Optical padding adjustments, alignment rules |
| Density (future) | Compact mode token values, minimum touch targets |
Each spec makes the automated layers cover more ground, which shrinks the human layer. Hardening gets more powerful over time without getting more manual.
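One way to picture this growth is a registry in which each wiki spec contributes checks only once it exists. The data below mirrors the table above; the `WikiSpec` type and the `activeAuditorChecks` function are hypothetical and not taken from the auditor itself.

```typescript
// Hypothetical registry entry mirroring one row of the wiki spec table.
interface WikiSpec {
  name: string;
  status: "exists" | "future";
  auditorChecks: string[];
}

const WIKI_SPECS: WikiSpec[] = [
  { name: "Token semantics", status: "exists", auditorChecks: ["Gray token usage", "Type scale tokens"] },
  { name: "xdsThemeProps rules", status: "exists", auditorChecks: ["Placement", "Variant props", "Sub-elements"] },
  { name: "Motion", status: "future", auditorChecks: ["Which components animate", "Transition values", "Easing"] },
  { name: "Edge compensation", status: "future", auditorChecks: ["Optical padding adjustments", "Alignment rules"] },
  { name: "Density", status: "future", auditorChecks: ["Compact mode token values", "Minimum touch targets"] },
];

// Only specs that already exist contribute checks to the automated layers.
function activeAuditorChecks(specs: WikiSpec[]): string[] {
  const checks: string[] = [];
  for (const spec of specs) {
    if (spec.status !== "exists") continue;
    for (const check of spec.auditorChecks) checks.push(check);
  }
  return checks;
}
```

Flipping a spec from `"future"` to `"exists"` extends the automated coverage without touching any component code.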
For a single component:

1. Layer 1: Run automated audit → fix objective failures
2. Layer 2: Walk the state/visual/a11y/edge case checklists → fix clear bugs
3. Layer 3: Generate review form → human fills it out → process answers
4. Update hardening issue → check off the component

For a batch of components:

1. Layer 1: Run audit across all components → batch fix PR
2. Layer 2: Walk checklists per component → one PR per component or family
3. Layer 3: Generate review forms for all → human reviews in one sitting
4. Process all form answers → fix PRs + spec issues + vibe test queue
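For the batch workflow, the "one PR per component or family" rule in step 2 could be sketched as a grouping step. This is an assumption-laden illustration: the protocol does not say how family membership is recorded, so the optional `family` field, the PR title format, and the name `planLayer2Prs` are all hypothetical.

```typescript
// Hypothetical component record; `family` is an assumed optional field.
interface ComponentEntry {
  name: string;
  family?: string;
}

interface PlannedPr {
  title: string;
  components: string[];
}

function planLayer2Prs(components: ComponentEntry[]): PlannedPr[] {
  const byFamily: Record<string, string[]> = {};
  const prs: PlannedPr[] = [];
  for (const component of components) {
    if (component.family === undefined) {
      // Components without a family get their own PR.
      prs.push({ title: `Harden ${component.name}`, components: [component.name] });
    } else {
      // Family members are collected so siblings are fixed together.
      const members = byFamily[component.family] ?? [];
      members.push(component.name);
      byFamily[component.family] = members;
    }
  }
  for (const family of Object.keys(byFamily)) {
    prs.push({ title: `Harden ${family} family`, components: byFamily[family] ?? [] });
  }
  return prs;
}
```

Grouping by family keeps sibling fixes, such as aligning TextArea and TextInput padding, in a single reviewable change.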
The long-term goal is to push Layers 1 and 2 toward full automation:
| Check | Current | Target |
|---|---|---|
| Token usage | Night Watch auditor (nightly) | CI check on every PR |
| xdsThemeProps | Night Watch auditor (nightly) | CI check on every PR |
| Prop/type naming | Night Watch auditor (nightly) | ESLint rule |
| State coverage | Manual checklist | Storybook interaction tests |
| Visual regression | Manual checklist | Chromatic or Playwright VRT |
| A11y contracts | Manual checklist | axe-core in CI |
| Keyboard nav | Manual checklist | Playwright keyboard tests |
| System specs (motion, etc.) | Manual (future) | Auditor wiki references |
Layer 3 stays human. That's the point — it's the layer where visual judgment matters.