Scaffold a production-grade AI feature spec before you write a line of code.
Most AI features are vibe-coded: call the LLM, the output looks reasonable, ship. Then three weeks later a prompt change silently breaks 20% of queries — and nobody knows, because "looks reasonable" was never a measurable criterion.
ai-spec fixes the root cause. One command generates the spec, the eval criteria, a seed set of golden test cases, and an ADR starter — so "done" has a precise definition before you start.
By Ruchit Suthar, Software Architect & Technical Leader. 📖 Method: AI-Driven Development: The Spec-First Workflow

    npx @ruchit07/ai-spec init "semantic search for support tickets"

That's it. You'll be prompted for the feature kind, problem statement, latency/cost budgets, and owner. It then writes:

    specs/
    └── semantic-search-for-support-tickets/
        ├── spec.md             # Problem, I/O contract, acceptance criteria, failure modes
        ├── eval-criteria.md    # The metrics that gate CI, with threshold rationale
        ├── eval-criteria.json  # Machine-readable thresholds for your CI
        ├── test-cases.json     # Seed golden test cases tailored to the feature kind
        └── adr.md              # Architecture Decision Record starter
Flags can be passed directly on the command line:

    ai-spec init "ticket classifier" --kind classification --yes
    ai-spec init "support agent" -k agent -d ./ai-features --force

| Without a spec | With ai-spec |
|---|---|
| "Looks good" is the bar | Measurable thresholds (accuracy ≥ 0.8, latency p95 ≤ 2000ms) |
| Regressions found by users | Regressions caught in CI |
| Prompt is the only documentation | Spec + ADR explain intent and decisions |
| No baseline when you switch models | Golden test set is the baseline |
The discipline is the value. ai-spec makes the disciplined path the easy path.
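
To make "measurable thresholds" concrete, here is a minimal sketch of a CI gate that checks the two example thresholds from the table (accuracy ≥ 0.8, latency p95 ≤ 2000ms). The `Thresholds` and `CaseResult` shapes and the `gate` function are hypothetical illustrations; the actual schema of `eval-criteria.json` may differ.

```typescript
// Hypothetical shapes for illustration; not the real eval-criteria.json schema.
interface Thresholds {
  accuracyMin: number;     // e.g. 0.8
  latencyP95MaxMs: number; // e.g. 2000
}

interface CaseResult {
  passed: boolean;   // did this golden test case pass?
  latencyMs: number; // measured latency for this case
}

// Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
function p95(values: number[]): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}

// Compare measured results against the thresholds and list every violation.
function gate(results: CaseResult[], t: Thresholds): { ok: boolean; failures: string[] } {
  const accuracy = results.filter((r) => r.passed).length / results.length;
  const latency = p95(results.map((r) => r.latencyMs));
  const failures: string[] = [];
  if (accuracy < t.accuracyMin) {
    failures.push(`accuracy ${accuracy.toFixed(2)} < ${t.accuracyMin}`);
  }
  if (latency > t.latencyP95MaxMs) {
    failures.push(`latency p95 ${latency}ms > ${t.latencyP95MaxMs}ms`);
  }
  return { ok: failures.length === 0, failures };
}
```

In CI, a non-empty `failures` list would typically translate into a non-zero exit code, so a prompt change that drops accuracy below the threshold blocks the merge instead of reaching users.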
The seed test cases and spec guidance adapt to what you're building:
| Kind | Tailored guidance |
|---|---|
| `rag` | Retrieval quality + groundedness as the dominant metric |
| `chat` | Conversation scope, tools, context-window budget |
| `classification` | Exact label set, out-of-distribution handling |
| `extraction` | Typed output schema, field-level accuracy |
| `agent` | Action space, stopping conditions, safety guardrails |
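
As one illustration of what the `classification` guidance asks you to pin down, the sketch below shows an exact label set with explicit out-of-distribution handling. The labels and the `normalizePrediction` helper are hypothetical examples, not something ai-spec generates.

```typescript
// Hypothetical label set for a ticket classifier; replace with the labels from your spec.
const LABELS = ['billing', 'bug', 'feature_request'] as const;
type Label = (typeof LABELS)[number];
type Prediction = Label | 'out_of_distribution';

// Map raw model output onto the exact label set.
// Anything outside the set becomes an explicit out-of-distribution result
// instead of being silently accepted as a new label.
function normalizePrediction(raw: string): Prediction {
  const candidate = raw.trim().toLowerCase();
  return (LABELS as readonly string[]).includes(candidate)
    ? (candidate as Label)
    : 'out_of_distribution';
}
```

Treating out-of-distribution as a first-class outcome means golden test cases can assert on it directly, rather than leaving unexpected inputs untested.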

The generator can also be called programmatically:

    import { generateFiles, slugify } from '@ruchit07/ai-spec';

    const files = generateFiles({
      featureName: 'My Feature',
      slug: slugify('My Feature'),
      kind: 'rag',
      problem: '...',
      primaryProvider: 'openai',
      latencyP95Ms: 2000,
      costPerQueryUsd: 0.005,
      accuracyThreshold: 0.8,
      groundednessThreshold: 0.85,
      owner: '@you',
    });
    // files: { path, content }[]

- ai-native-app-blueprint: the production reference these specs target. Its `packages/evals` runs the `test-cases.json` this tool generates.
- The Spec-First Workflow: the full method.
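
For programmatic use, the `{ path, content }[]` array returned by `generateFiles` still has to be written somewhere. Below is a minimal sketch using Node's `fs` module. The `writeGeneratedFiles` helper and its `overwrite` parameter are hypothetical, and it assumes each `path` is relative to the output directory, which the source does not confirm.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Shape taken from the `// files: { path, content }[]` comment in the usage snippet.
interface GeneratedFile {
  path: string;
  content: string;
}

// Hypothetical helper, not part of @ruchit07/ai-spec.
// Assumption: each `path` is relative to `outDir`.
function writeGeneratedFiles(
  files: GeneratedFile[],
  outDir: string,
  overwrite = false,
): string[] {
  const written: string[] = [];
  for (const file of files) {
    const target = path.join(outDir, file.path);
    // Refuse to clobber an existing spec unless the caller opts in.
    if (!overwrite && fs.existsSync(target)) {
      throw new Error(`Refusing to overwrite ${target}`);
    }
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, file.content, 'utf8');
    written.push(target);
  }
  return written;
}
```

Failing by default when a file exists protects a hand-edited spec from being replaced by a fresh scaffold.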
MIT