Vibe Tests
Cindy Zhang edited this page Jun 23, 2026
Vibe testing in Astryx serves two distinct purposes. Each has its own page:
For API Decisions → API Arbitration
When you're choosing between API shapes (hook vs prop, one name vs another, composition vs config), run an ad-hoc vibe test to resolve the debate with data.
Use when: debating API options, settling naming disputes, or resolving uncertainty during spec review.
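As a rough illustration of what "resolving the debate with data" can look like, the sketch below tallies judge verdicts across prompts for two candidate API shapes. The `Verdict` and `Tally` types, the `tallyVerdicts` function, and the sample verdicts are all hypothetical and are not part of Astryx; the actual procedure is described on the API Arbitration page.

```typescript
// Hypothetical shapes for illustration; none of these names come from Astryx.
type Verdict = { promptId: string; winner: "A" | "B" | "tie" };

interface Tally {
  a: number;
  b: number;
  ties: number;
  preferred: "A" | "B" | "none";
}

// Count how often the judge preferred each API shape across prompts.
function tallyVerdicts(verdicts: Verdict[]): Tally {
  let a = 0;
  let b = 0;
  let ties = 0;
  for (const v of verdicts) {
    if (v.winner === "A") a++;
    else if (v.winner === "B") b++;
    else ties++;
  }
  // "none" means the verdicts are split evenly and the debate is unresolved.
  const preferred: Tally["preferred"] = a > b ? "A" : b > a ? "B" : "none";
  return { a, b, ties, preferred };
}

// Made-up verdicts, for illustration only (A = hook, B = prop).
const exampleVerdicts: Verdict[] = [
  { promptId: "settings-form", winner: "A" },
  { promptId: "modal-dialog", winner: "A" },
  { promptId: "data-table", winner: "B" },
  { promptId: "toast-stack", winner: "tie" },
];

const result = tallyVerdicts(exampleVerdicts);
console.log(result);
```

A simple majority count like this is only one possible way to summarize verdicts; with few prompts, a narrow margin should probably be treated as inconclusive.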
For System Evaluation → Vibe Evaluation
Nightly and periodic benchmarks measure how well Astryx performs against alternatives (shadcn/Tailwind, raw HTML). This is the ongoing scorecard.
Use when: checking system health, running the nightly job, comparing targets.
Both workflows share a core methodology, documented in Vibe Evaluation:
- Sub-Agent Isolation — how to prevent contamination
- Judge Agent Evaluation — comparative scoring by a dedicated judge
- Prompt Design — writing prompts that describe UX, not components
- Interpreting Results — what the numbers mean
| I want to... | Page |
|---|---|
| Choose between two API shapes | API Arbitration |
| Settle a naming debate | API Arbitration |
| Check if Astryx is beating baseline | Vibe Evaluation |
| Run the nightly evaluation | Vibe Evaluation |
| Understand the scoring dimensions | Vibe Evaluation#What Gets Measured |
| Learn about sub-agent isolation | Vibe Evaluation#Sub-Agent Isolation |
| See trend data over time | Vibe Evaluation#Trend Tracking |

Related pages:
- Agent Init Prompt Vibe Testing — Testing the CLI init prompt (separate workflow)
- Component Lifecycle — Where vibe testing fits in specify → build → harden
- Contributing with AI Assistants — How contributors encounter this