Vibe Tests

Vibe testing in Astryx serves two distinct purposes. Each has its own page:

For API Decisions → API Arbitration

When you're choosing between API shapes (hook vs prop, one name vs another, composition vs config), run an ad-hoc vibe test to resolve the debate with data.

Use when: debating API options, naming disputes, spec review uncertainty.
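
As a rough illustration of what "resolve the debate with data" can look like, the sketch below tallies which API shape a judge preferred across several isolated runs. Everything in it is hypothetical: the variant labels ("hook", "prop"), the verdict list, and the `tally_verdicts` helper are made up for this example and are not part of Astryx tooling. The actual procedure is described on the API Arbitration page.

```python
from collections import Counter


def tally_verdicts(verdicts):
    """Return each variant's share of judge preferences (0.0 to 1.0).

    `verdicts` is a list of variant labels, one per isolated run.
    Hypothetical helper for illustration only.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        return {}  # no runs yet, so nothing to compare
    return {variant: n / total for variant, n in counts.items()}


# Hypothetical verdicts from six isolated runs of the same prompt.
verdicts = ["hook", "prop", "hook", "hook", "prop", "hook"]
shares = tally_verdicts(verdicts)

# Pick the variant preferred most often.
winner = max(shares, key=shares.get)
print(winner, shares)
```

A handful of runs gives only a weak signal, so a narrow margin is better treated as "no clear winner" than as a decision.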

For System Evaluation → Vibe Evaluation

Nightly and periodic benchmarks measure how well Astryx performs against alternatives (shadcn/Tailwind, raw HTML). This is the ongoing scorecard.

Use when: checking system health, running the nightly job, comparing targets.

Shared Methodology

Both workflows share a core methodology, documented in Vibe Evaluation:

Sub-Agent Isolation — how to prevent contamination
Judge Agent Evaluation — comparative scoring by a dedicated judge
Prompt Design — writing prompts that describe UX, not components
Interpreting Results — what the numbers mean

Which One Do I Want?

I want to...	Page
Choose between two API shapes	API Arbitration
Settle a naming debate	API Arbitration
Check if Astryx is beating baseline	Vibe Evaluation
Run the nightly evaluation	Vibe Evaluation
Understand the scoring dimensions	Vibe Evaluation#What Gets Measured
Learn about sub-agent isolation	Vibe Evaluation#Sub-Agent Isolation
See trend data over time	Vibe Evaluation#Trend Tracking
