feat: UX behavioral foundations + ux-audit command (v0.17.0.0) #1000

Merged
garrytan merged 6 commits into main from garrytan/ux-krug-principles
Apr 14, 2026

@garrytan
Owner

Summary

UX behavioral foundations. Every design skill now thinks about how users actually behave, not just how the interface looks. Based on Steve Krug's "Don't Make Me Think," distilled into a shared resolver injected into 4 design skills.

Methodology rewire. 6 usability tests woven into the existing design-review methodology: Trunk Test, 3-Second Scan, Page Area Test, Happy Talk Detection with word count, Mindless Choice Audit, Goodwill Reservoir tracking with visual dashboard. First-person narration mode with anti-slop guardrail.
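
The word count behind Happy Talk Detection can be sketched as below. This is a minimal illustration, not the PR's implementation: `countWords` is a hypothetical name, and it assumes the text has already been pulled from the page (for example via `textContent`) and that words are simply whitespace-separated runs.

```typescript
// Hypothetical helper: counts whitespace-separated words in already-extracted text.
// Assumption: "word" means any run of non-whitespace characters.
function countWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") {
    return 0; // empty or whitespace-only blocks contribute nothing
  }
  return trimmed.split(/\s+/).length;
}
```

The agent would still decide whether a given count signals happy talk; the source does not state a threshold, so none is assumed here.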

$B ux-audit command. Standalone UX structural extraction. Returns JSON with site ID, navigation, headings, interactive elements, text blocks. Pure data extraction with element caps. Agent applies the 6 usability tests.
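
A rough sketch of what a capped extraction result might look like. The field names and the `capAudit` helper are hypothetical; the cap values are the ones quoted in the commit message (50 headings, 100 links, 200 interactive elements, 50 text blocks), and applying the link cap to the navigation list is an assumption.

```typescript
// Hypothetical shape for the ux-audit JSON; the real field names may differ.
interface UxAuditResult {
  siteId: string | null;
  navigation: string[];
  headings: string[];
  interactive: string[];
  textBlocks: string[];
}

// Cap values as quoted in the commit message; the mapping of "links" to
// navigation entries is an assumption made for this sketch.
const CAPS = {
  navigation: 100,
  headings: 50,
  interactive: 200,
  textBlocks: 50,
} as const;

// Truncates each list to its cap without modifying the input object.
function capAudit(raw: UxAuditResult): UxAuditResult {
  return {
    siteId: raw.siteId,
    navigation: raw.navigation.slice(0, CAPS.navigation),
    headings: raw.headings.slice(0, CAPS.headings),
    interactive: raw.interactive.slice(0, CAPS.interactive),
    textBlocks: raw.textBlocks.slice(0, CAPS.textBlocks),
  };
}
```

Capping keeps the payload bounded on very large pages, leaving the judgment calls to the agent that reads the JSON.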

snapshot -H/--heatmap flag. Color-coded overlays (green/yellow/red/blue/orange/gray) with CSS injection prevention via color whitelist. Composable: any skill can use it.
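
The whitelist idea can be illustrated with a small validator. This is a sketch under assumptions: `parseHeatmap` is a hypothetical name, the ref ID `e1` in the usage note is made up, and the real flag handling may differ. It rejects non-object JSON and any color outside the six listed names, so arbitrary strings never reach injected CSS.

```typescript
// The six colors named in the PR description.
const ALLOWED_COLORS = new Set(["green", "yellow", "red", "blue", "orange", "gray"]);

// Hypothetical validator: parses the JSON color map and returns it only if it
// is a plain object whose values are all whitelisted color names.
function parseHeatmap(json: string): Record<string, string> {
  const parsed: unknown = JSON.parse(json);
  // Strings, numbers, arrays, and null are valid JSON but not a ref-to-color map.
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("heatmap must be a JSON object mapping ref IDs to colors");
  }
  const result: Record<string, string> = {};
  for (const [ref, color] of Object.entries(parsed as Record<string, unknown>)) {
    if (typeof color !== "string" || !ALLOWED_COLORS.has(color)) {
      throw new Error(`unsupported heatmap color for ref ${ref}`);
    }
    result[ref] = color;
  }
  return result;
}
```

For example, `parseHeatmap('{"e1":"green"}')` returns the map unchanged, while a value such as `"red;background:url(x)"` is rejected because it is not an exact whitelist match.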

Token ceiling enforcement. gen-skill-docs warns if any SKILL.md exceeds 100KB (~25K tokens).
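
A minimal sketch of such a size check. `checkSkillSize` is a hypothetical function, not the code in gen-skill-docs; it assumes "100KB" means 100 × 1024 bytes and uses the roughly 4 bytes per token implied by "100KB (~25K tokens)".

```typescript
// Assumption: the ceiling is 100 KiB; the source only says "100KB".
const MAX_SKILL_BYTES = 100 * 1024;
// Implied by the PR's "100KB (~25K tokens)" estimate.
const BYTES_PER_TOKEN = 4;

// Returns a warning string when the file exceeds the ceiling, otherwise null.
// The caller supplies the byte length (for example from a file stat).
function checkSkillSize(path: string, byteLength: number): string | null {
  if (byteLength <= MAX_SKILL_BYTES) {
    return null;
  }
  const approxTokens = Math.round(byteLength / BYTES_PER_TOKEN);
  return `WARNING: ${path} is ${byteLength} bytes (~${approxTokens} tokens), over the ${MAX_SKILL_BYTES}-byte ceiling`;
}
```

Returning a message instead of throwing matches the "warns" behavior described above: generation continues and the over-budget skills are only reported.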

Adversarial review fixes. An adversarial review by Codex and a Claude subagent found issues that are now fixed: a form value leak in ux-audit, missing untrusted content wrapping, a false-positive youAreHere selector, missing validation for non-object JSON in heatmap, and use of innerText where textContent performs better.

Test Coverage

All new code paths verified by existing test infrastructure:

  • skill-validation.test.ts: snapshot flag registration, command registry consistency
  • gen-skill-docs.test.ts: template resolution, placeholder validation
  • The only failing tests are pre-existing failures (golden files, version mismatch, uninstall), confirmed to fail on main as well

Pre-Landing Review

Adversarial review ran (both Codex + Claude subagent). 6 issues found, all fixed in the final commit.

Reviews

| Review | Status |
|---|---|
| CEO Review | CLEAR (EXPANSION mode, 5 proposals accepted) |
| Eng Review | CLEAR (PLAN, 5 issues resolved) |
| Outside Voice | Codex ran 2x, 7 cross-model tensions resolved |
| Adversarial | Both models ran, 6 issues found and fixed |

Test plan

  • bun run gen:skill-docs produces all SKILL.md files with no unresolved placeholders
  • bun test passes (pre-existing failures only)
  • 4 design skill SKILL.md files contain UX Principles section
  • design-review SKILL.md contains Trunk Test, Goodwill Reservoir, Happy Talk Detection
  • Token ceiling check fires on existing over-budget skills (plan-ceo-review, ship, office-hours)
  • ux-audit command registered in META_COMMANDS and PAGE_CONTENT_COMMANDS
  • snapshot -H flag registered in SNAPSHOT_FLAGS with color whitelist validation

🤖 Generated with Claude Code

garrytan and others added 6 commits April 14, 2026 08:35
…ed design infrastructure

Add UX_PRINCIPLES resolver distilling Steve Krug's "Don't Make Me Think" into
actionable guidance for AI agents. Injected into all 4 design skills as a shared
behavioral foundation complementing the existing visual checklist (WHAT to check)
and cognitive patterns (HOW designers see) with HOW USERS ACTUALLY BEHAVE.

Methodology rewire: 6 Krug usability tests woven into existing design-review
phases — Trunk Test, 3-Second Scan, Page Area Test, Happy Talk Detection with
word count metric, Mindless Choice Audit, Goodwill Reservoir tracking with
visual dashboard. First-person narration mode for design-review output with
anti-slop guardrail.

Hard rules: 4 Krug always/never rules in DESIGN_HARD_RULES (placeholder-as-label,
floating headings, visited link distinction, minimum type size). Krug, Redish,
Jarrett added to plan-design-review references.

Token ceiling: gen-skill-docs.ts warns if any SKILL.md exceeds 100KB (~25K tokens).
Documented in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

New browse meta-command: ux-audit extracts page structure (site ID, navigation,
headings, interactive elements, text blocks) as structured JSON for agent-side
UX behavioral analysis. Pure data extraction — the agent applies the 6 usability
tests and makes judgment calls. Element caps: 50 headings, 100 links, 200
interactive, 50 text blocks.

New snapshot flag: -H/--heatmap accepts a JSON color map mapping ref IDs to
colors (green/yellow/red/blue/orange/gray). Extends existing snapshot -a
annotation system with per-ref colors instead of hardcoded red. Color whitelist
validation prevents CSS injection. Composable — any skill can use it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ARCHITECTURE.md: added {{UX_PRINCIPLES}} resolver to placeholder table.
VERSION: bumped to 0.17.0.0 for UX behavioral foundations release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Security:
- Remove live form value extraction from ux-audit (leaked input field values)
- Add ux-audit to PAGE_CONTENT_COMMANDS (untrusted content wrapping)

Correctness:
- Scope youAreHere selector to nav containers (was matching animation classes)
- Validate heatmap JSON is a plain object (string/array/null produced garbage)
- Use textContent instead of innerText for word count (avoids layout computation)
- Remove dead url variable and unused LINK_CAP constant

Found by Codex + Claude adversarial review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

E2E Evals: ✅ PASS

37/37 tests passed | $4.98 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 6/6 $0.29
e2e-deploy 5/5 $1.13
e2e-design 3/3 $0.44
e2e-plan 7/7 $1.21
e2e-qa-workflow 1/1 $0.53
e2e-workflow 1/1 $0.07
llm-judge 9/9 $0.18
e2e-deploy 5/5 $1.13

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

@garrytan garrytan merged commit 2300067 into main Apr 14, 2026
18 checks passed