
Vibe Tests

Cindy Zhang edited this page Jun 23, 2026 · 1 revision

Vibe testing in Astryx serves two distinct purposes. Each has its own page:


For API Decisions → API Arbitration

When you're choosing between API shapes (hook vs prop, one name vs another, composition vs config), run an ad-hoc vibe test to resolve the debate with data.

Use when: debating API options, settling naming disputes, or resolving uncertainty in spec review.


For System Evaluation → Vibe Evaluation

Nightly and periodic benchmarks measure how well Astryx performs compared to alternatives (shadcn/Tailwind, raw HTML). Together they form the ongoing scorecard.

Use when: checking system health, running the nightly job, comparing targets.


Shared Methodology

Both workflows share the core methodology documented in Vibe Evaluation:

  • Sub-Agent Isolation — how to prevent contamination
  • Judge Agent Evaluation — comparative scoring by a dedicated judge
  • Prompt Design — writing prompts that describe UX, not components
  • Interpreting Results — what the numbers mean

Which One Do I Want?

I want to...

  • Choose between two API shapes → API Arbitration
  • Settle a naming debate → API Arbitration
  • Check whether Astryx is beating the baseline → Vibe Evaluation
  • Run the nightly evaluation → Vibe Evaluation
  • Understand the scoring dimensions → Vibe Evaluation#What Gets Measured
  • Learn about sub-agent isolation → Vibe Evaluation#Sub-Agent Isolation
  • See trend data over time → Vibe Evaluation#Trend Tracking
