ai-spec

Scaffold a production-grade AI feature spec before you write a line of code.

Most AI features are vibe-coded: call the LLM, the output looks reasonable, ship. Then three weeks later a prompt change silently breaks 20% of queries — and nobody knows, because "looks reasonable" was never a measurable criterion.

ai-spec fixes the root cause. One command generates the spec, the eval criteria, a seed set of golden test cases, and an ADR starter — so "done" has a precise definition before you start.

By Ruchit Suthar — Software Architect & Technical Leader. 📖 Method: AI-Driven Development: The Spec-First Workflow

Quick start

npx @ruchit07/ai-spec init "semantic search for support tickets"

That's it. You'll be prompted for the feature kind, problem statement, latency/cost budgets, and owner — then it writes:

specs/
└── semantic-search-for-support-tickets/
    ├── spec.md              # Problem, I/O contract, acceptance criteria, failure modes
    ├── eval-criteria.md     # The metrics that gate CI, with threshold rationale
    ├── eval-criteria.json   # Machine-readable thresholds for your CI
    ├── test-cases.json      # Seed golden test cases tailored to the feature kind
    └── adr.md               # Architecture Decision Record starter

Non-interactive (for scripts / CI)

npx @ruchit07/ai-spec init "ticket classifier" --kind classification --yes
npx @ruchit07/ai-spec init "support agent" -k agent -d ./ai-features --force

Why this matters

| Without a spec | With ai-spec |
| --- | --- |
| "Looks good" is the bar | Measurable thresholds (accuracy ≥ 0.8, latency p95 ≤ 2000 ms) |
| Regressions found by users | Regressions caught in CI |
| Prompt is the only documentation | Spec + ADR explain intent and decisions |
| No baseline when you switch models | Golden test set is the baseline |

The discipline is the value. ai-spec makes the disciplined path the easy path.
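To make "measurable thresholds" concrete, here is a minimal sketch of a CI gate that checks accuracy and p95 latency against limits like the ones in the table above. The `Thresholds` and `RunResult` shapes and the `percentile` and `checkGate` helpers are hypothetical names chosen for illustration; they are not part of ai-spec, and the real eval-criteria.json may be structured differently.

```typescript
// Hypothetical shapes: neither is taken from ai-spec's generated files.
interface Thresholds {
  accuracyMin: number;     // e.g. 0.8
  latencyP95MaxMs: number; // e.g. 2000
}

interface RunResult {
  correct: boolean;  // did this golden test case pass?
  latencyMs: number; // wall-clock latency of the call
}

// Nearest-rank percentile: the smallest sample with at least p% of
// all samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Returns one human-readable message per violated threshold.
// An empty array means the gate passes.
function checkGate(results: RunResult[], t: Thresholds): string[] {
  if (results.length === 0) throw new Error("no results to evaluate");
  const failures: string[] = [];
  const accuracy = results.filter((r) => r.correct).length / results.length;
  const p95 = percentile(results.map((r) => r.latencyMs), 95);
  if (accuracy < t.accuracyMin) {
    failures.push(`accuracy ${accuracy.toFixed(2)} < ${t.accuracyMin}`);
  }
  if (p95 > t.latencyP95MaxMs) {
    failures.push(`latency p95 ${p95}ms > ${t.latencyP95MaxMs}ms`);
  }
  return failures;
}
```

In a CI script, a non-empty result from `checkGate` would typically be printed and turned into a non-zero exit code so the build fails.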

Feature kinds

The seed test cases and spec guidance adapt to what you're building:

| Kind | Tailored guidance |
| --- | --- |
| `rag` | Retrieval quality + groundedness as the dominant metric |
| `chat` | Conversation scope, tools, context-window budget |
| `classification` | Exact label set, out-of-distribution handling |
| `extraction` | Typed output schema, field-level accuracy |
| `agent` | Action space, stopping conditions, safety guardrails |
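As an illustration of what "exact label set, out-of-distribution handling" can mean for a `classification` feature, the sketch below scores a classifier against golden cases by exact match. Everything here is an assumption made for the example: the `GoldenCase` shape, the `LABELS` set, and the `out_of_scope` fallback are hypothetical, and the test-cases.json that ai-spec generates may use a different schema.

```typescript
// Hypothetical golden-case shape; the real test-cases.json may differ.
interface GoldenCase {
  input: string;
  expectedLabel: string;
}

// Illustrative label set and fallback for out-of-distribution inputs.
const LABELS = ["billing", "bug", "how-to"] as const;
const OUT_OF_SCOPE = "out_of_scope";

// Coerce anything outside the fixed label set to the fallback, so a
// model that invents a new label is scored as out of scope rather
// than silently accepted.
function normalizeLabel(raw: string): string {
  const cleaned = raw.trim().toLowerCase();
  return (LABELS as readonly string[]).includes(cleaned) ? cleaned : OUT_OF_SCOPE;
}

// Fraction of golden cases whose normalized prediction equals the
// expected label exactly.
function exactMatchAccuracy(
  cases: GoldenCase[],
  classify: (input: string) => string,
): number {
  if (cases.length === 0) throw new Error("no golden cases");
  const hits = cases.filter(
    (c) => normalizeLabel(classify(c.input)) === c.expectedLabel,
  ).length;
  return hits / cases.length;
}
```

Including a few golden cases whose expected label is the fallback is one way to keep out-of-distribution behavior under test rather than leaving it implicit.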

Programmatic API

import { generateFiles, slugify } from '@ruchit07/ai-spec';

const files = generateFiles({
  featureName: 'My Feature',
  slug: slugify('My Feature'),
  kind: 'rag',
  problem: '...',
  primaryProvider: 'openai',
  latencyP95Ms: 2000,
  costPerQueryUsd: 0.005,
  accuracyThreshold: 0.8,
  groundednessThreshold: 0.85,
  owner: '@you',
});
// files: { path, content }[]
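Since `generateFiles` returns `{ path, content }[]` and leaves it to the caller to persist the result, a script will usually want to write the files to disk. The sketch below shows one way to do that with Node's built-in `fs` module. `writeGeneratedFiles` and its `overwrite` option are hypothetical helpers, not part of the package, and the sketch assumes each `path` is relative.

```typescript
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Shape taken from the comment in the example above.
interface GeneratedFile {
  path: string;
  content: string;
}

// Writes each file under rootDir, creating parent directories as needed.
// Existing files are skipped unless `overwrite` is true, so hand-edited
// specs are not clobbered by accident.
// Assumption: each `path` is relative, e.g. "specs/my-feature/spec.md".
// Returns the paths that were actually written.
function writeGeneratedFiles(
  files: GeneratedFile[],
  rootDir: string,
  overwrite = false,
): string[] {
  const written: string[] = [];
  for (const file of files) {
    const target = join(rootDir, file.path);
    if (!overwrite && existsSync(target)) continue;
    mkdirSync(dirname(target), { recursive: true });
    writeFileSync(target, file.content, "utf8");
    written.push(target);
  }
  return written;
}
```

With the `files` array from the example above, `writeGeneratedFiles(files, process.cwd())` would write the generated files relative to the current directory.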

Pairs with

ai-native-app-blueprint — the production reference these specs target. Its packages/evals runs the test-cases.json this tool generates.
The Spec-First Workflow — the full method.

License

MIT
