specpilot — AI-Powered Specification-Driven Development

An AI agent pipeline that turns a vague idea into a needs document, a formal specification, an implementation plan, and working code — automatically, using Claude.

Built from scratch as a fully readable, minimal Python implementation so every concept is explicit and understandable. Useful as a standalone tool and as a reference for understanding how agentic frameworks (BMAD, SPECKIT) work under the hood.

1. What Is SDD?

Specification-Driven Development is a discipline where a formal specification document is produced before any implementation begins, and the entire project (planning, coding, testing) is traceable back to that spec.

An SDD AI framework automates that process using AI agents:

User's vague idea
       |
       v
  [DISCOVERY]  <-- AI asks clarifying questions until the need is precise
       |
       v
 [SPECIFICATION] <-- AI formalises the need into a structured spec document
       |
       v
   [PLANNING]  <-- AI breaks the spec into an ordered implementation plan
       |
       v
[IMPLEMENTATION] <-- AI guides the developer task-by-task through the build
       |
       v
  Documented, working software

Each stage produces a markdown artifact saved to disk (needs.md, spec.md, plan.md, impl_notes.md). These artifacts are the single source of truth — later stages always read from them rather than relying on conversation memory.

2. Architecture Overview

┌──────────────────────────────────────────────────────────────────────────┐
│         main.py  /  tests/simple_test.py  /  tests/test_run.py           │
│                          (CLI entry points)                              │
└──────────────────────────────┬───────────────────────────────────────────┘
                               │ creates & wires
                               v
┌──────────────────────────────────────────────────────────────────────────┐
│                           Orchestrator                                   │
│                                                                          │
│   Stage:  DISCOVERY → SPECIFICATION → PLANNING → IMPLEMENTATION → DONE   │
│                                                                          │
│   Routing rule: send user input to agents[current_stage]                 │
│   Transition:   when agent sets context.stage_advance_requested = True   │
└────────┬──────────────────────────────────────────────────────┬──────────┘
         │ reads/writes                                          │ reads/writes
         v                                                       v
┌─────────────────────┐                             ┌───────────────────────┐
│   ProjectContext    │                             │      Agent (x4)       │
│   (shared state)    │◄────────────────────────────│                       │
│                     │                             │  name, role           │
│  raw_need           │  injected into every        │  system_prompt        │
│  clarified_need     │  LLM call as context        │  skills  (list)       │
│  spec_document      │  summary                    │  conversation_history │
│  plan_document      │                             │                       │
│  impl_notes         │                             │  run(user_msg) ──┐    │
│  stage              │                             │                  │    │
│  stage_advance_*    │                             │  _tool_use_loop()│    │
│  workspace_dir      │                             └──────────────────┼────┘
└─────────────────────┘                                                │
                                                                       │ calls
                                                     ┌─────────────────v────┐
                                                     │   Anthropic API      │
                                                     │   (Claude)           │
                                                     │                      │
                                                     │  stop_reason:        │
                                                     │  "end_turn" → text   │
                                                     │  "tool_use" → loop   │
                                                     └──────────────────────┘
                                                          │          │
                                               tool_use   │          │ result
                                               block      v          │
                                            ┌─────────────────────┐  │
                                            │   Skill.execute()   │──┘
                                            │                     │
                                            │  write_document()   │ → workspace/*.md
                                            │  write_code_file()  │ → workspace/*.py etc.
                                            │  advance_stage()    │ → context flag
                                            └─────────────────────┘

Key design principle

Agents do not talk to each other. They communicate through the shared ProjectContext.

The Elicitation agent writes needs.md and sets context.clarified_need. The Specification agent reads that field (injected into its system prompt) — it never "calls" the Elicitation agent. This loose coupling means any agent can be replaced independently.

3. Core Concepts Explained

3.1 ProjectContext — the shared blackboard

File: framework/core/context.py

ProjectContext
├── raw_need           str   — user's first, unpolished sentence
├── clarified_need     str   — refined summary after elicitation
├── spec_document      str   — formal spec written by SpecificationAgent
├── plan_document      str   — task breakdown written by PlanningAgent
├── impl_notes         str   — progress log written by ImplementationAgent
├── stage              Stage — current stage (enum)
├── stage_advance_requested  bool — flipped by advance_stage skill
├── stage_advance_summary    str  — what the agent accomplished
└── workspace_dir      str   — where *.md files are saved ("workspace/")

Every agent receives a text summary of the context injected into its system prompt on every LLM call:

# agent.py
def _system_prompt_with_context(self) -> str:
    return (
        self._base_system_prompt
        + "\n\n## Current Project Context\n"
        + self.context.summary_for_agents()
    )

This means even if the same agent is called many turns later, it always has an up-to-date view of what previous stages produced — without any explicit handoff message.
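To make the blackboard concrete, here is a minimal sketch of how the fields listed above could be declared. It is an illustration under assumptions, not the contents of context.py: the enum string values are taken from the session file in Section 8.5 (plus "done" from the test output in Section 7.2), code_files comes from the skills table in Section 3.2, and the body of summary_for_agents, including the 500-character truncation, is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # String values match the ones stored in .session.json ("planning", ...)
    DISCOVERY = "discovery"
    SPECIFICATION = "specification"
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    DONE = "done"


@dataclass
class ProjectContext:
    raw_need: str = ""
    clarified_need: str = ""
    spec_document: str = ""
    plan_document: str = ""
    impl_notes: str = ""
    stage: Stage = Stage.DISCOVERY
    stage_advance_requested: bool = False
    stage_advance_summary: str = ""
    workspace_dir: str = "workspace"
    code_files: list[str] = field(default_factory=list)

    def summary_for_agents(self) -> str:
        # Hypothetical format: one line per populated field; long documents
        # are truncated so the system prompt stays small.
        lines = [f"Stage: {self.stage.value}"]
        for label, value in [
            ("Raw need", self.raw_need),
            ("Clarified need", self.clarified_need),
            ("Spec", self.spec_document),
            ("Plan", self.plan_document),
            ("Implementation notes", self.impl_notes),
        ]:
            if value:
                preview = value if len(value) <= 500 else value[:500] + "..."
                lines.append(f"{label}: {preview}")
        if self.code_files:
            lines.append("Code files: " + ", ".join(self.code_files))
        return "\n".join(lines)
```

Because every agent holds a reference to the same ProjectContext instance, a field written during one stage (say, clarified_need) appears in the next summary_for_agents() call made for any other agent, which is what produces the "Clarified need: ..." line quoted in Step 7.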

Why a shared blackboard instead of message passing?

Message passing (agent A sends a message to agent B) creates tight coupling and requires a shared message bus. A blackboard is simpler: every agent reads and writes the same object. This is the pattern used by BMAD's document dependency chain and SPECKIT's SPEC.md / PLAN.md / TASKS.md artifacts.

3.2 Skill — a Python function exposed as an LLM tool

File: framework/core/skill.py

A Skill wraps a plain Python function and exposes it to Claude as a tool definition (JSON Schema).

from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str          # "write_document"
    description: str   # shown to the LLM to help it decide when to call it
    parameters: dict   # JSON Schema of the function's arguments
    execute: Callable  # the actual Python function

Converting a skill to the Anthropic API format is one method:

def to_tool_schema(self) -> dict:
    return {
        "name": self.name,
        "description": self.description,
        "input_schema": self.parameters,   # Anthropic's required field name
    }

The three skills in this framework:

Skill	Who has it	What it does	Side effect
`write_document`	All agents	Saves a markdown file to `workspace/`	Updates the matching context field (e.g. `context.spec_document`)
`write_code_file`	ImplementationAgent	Writes any source file (`.py`, `.toml`, …) to `workspace/` so code can be run	Appends path to `context.code_files`
`advance_stage`	All agents	Signals that the stage is complete	Sets `context.stage_advance_requested = True`

Agents call skills by requesting them in the LLM response — they never call skill.execute() directly. The agent's tool-use loop dispatches the call.
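A hedged sketch of the pieces around the table: a skill factory that closes over the shared context, plus the name-based dispatch the tool-use loop performs. make_advance_stage_skill and dispatch_skill are hypothetical names (the real factory lives in framework/skills/advance_stage.py and the real dispatcher is the agent's _dispatch_skill); only the return string "Stage advance requested" is quoted from Section 4.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable


@dataclass
class Skill:
    name: str
    description: str
    parameters: dict          # JSON Schema for the tool input
    execute: Callable[..., str]

    def to_tool_schema(self) -> dict:
        return {"name": self.name, "description": self.description,
                "input_schema": self.parameters}


def make_advance_stage_skill(context: Any) -> Skill:
    """Hypothetical factory: the returned skill closes over the shared context."""
    def advance_stage(summary: str) -> str:
        # Only raise the flag; the agent and orchestrator act on it later.
        context.stage_advance_requested = True
        context.stage_advance_summary = summary
        return "Stage advance requested"

    return Skill(
        name="advance_stage",
        description="Call once this stage's document is written and the stage is complete.",
        parameters={"type": "object",
                    "properties": {"summary": {"type": "string"}},
                    "required": ["summary"]},
        execute=advance_stage,
    )


def dispatch_skill(skills: list[Skill], name: str, tool_input: dict) -> str:
    """Route a tool_use block (name + input) to the matching skill."""
    by_name = {skill.name: skill for skill in skills}
    if name not in by_name:
        return f"Error: unknown skill {name!r}"
    try:
        return by_name[name].execute(**tool_input)
    except Exception as exc:  # hand the failure back to Claude as text
        return f"Error running {name}: {exc}"


# Usage with a stand-in context object
ctx = SimpleNamespace(stage_advance_requested=False, stage_advance_summary="")
skills = [make_advance_stage_skill(ctx)]
result = dispatch_skill(skills, "advance_stage", {"summary": "needs.md written"})
```

Returning errors as strings instead of raising lets Claude see what went wrong in the next tool_result and correct its call.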

3.3 Agent — role + system prompt + tool-use loop

File: framework/core/agent.py

Each agent is an instance of the Agent class with:

A name and role (e.g. "Elicitor / Product Analyst")
A system prompt defining its expertise and instructions
A list of skills it can invoke
Its own conversation history (messages within this stage only)

The tool-use loop

This is the heart of the framework. When an agent calls the LLM, Claude may respond with text (done) or with a tool_use block (it wants to run a skill).

Agent.run(user_message)
  │
  ├─ append message to conversation_history
  │
  └─ _tool_use_loop()
       │
       ├─ call Anthropic API with:
       │    - system prompt (base + context summary)
       │    - full conversation history
       │    - tools = [skill.to_tool_schema() for skill in self.skills]
       │
       ├─ if stop_reason == "tool_use":
       │    for each tool_use block in response:
       │      result = _dispatch_skill(block.name, block.input)
       │    append assistant turn (with tool_use blocks) to messages
       │    append user turn (with tool_result blocks) to messages
       │    └─ LOOP AGAIN (Claude gets the tool results and continues)
       │
       └─ if stop_reason == "end_turn":
            extract text from content blocks
            return text

In a single turn, Claude may call multiple skills before returning text. For example, the Specification agent calls write_document and then advance_stage in the same response; the loop executes both before returning.
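The loop above maps onto a handful of Anthropic Python SDK calls. This is a sketch, not the project's agent.py: client is assumed to be an anthropic.Anthropic instance, dispatch is any function (name, input) -> str such as a skill dispatcher, max_tokens=4096 is an arbitrary choice, and the max_rounds guard is an addition so a model that keeps requesting tools cannot loop forever.

```python
from typing import Any, Callable


def tool_use_loop(client: Any, model: str, system: str, messages: list[dict],
                  tools: list[dict], dispatch: Callable[[str, dict], str],
                  max_rounds: int = 10) -> str:
    """Call Claude until it stops asking for tools; return its final text.

    `messages` (the agent's conversation_history) is extended in place.
    """
    for _ in range(max_rounds):
        response = client.messages.create(
            model=model, max_tokens=4096, system=system,
            messages=messages, tools=tools,
        )
        # Keep Claude's turn, including any tool_use blocks, in the history.
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            # end_turn (or max_tokens): join the text blocks and stop.
            return "".join(b.text for b in response.content if b.type == "text")

        # Run every requested skill; all results go back in ONE user turn,
        # each tagged with the id of the tool_use block it answers.
        results = [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": dispatch(b.name, b.input)}
            for b in response.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_rounds} tool rounds")
```

Taking the client as a parameter also makes the loop testable with a fake object that returns canned responses, without any API calls.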

After _tool_use_loop returns, run() checks:

if self.context.stage_advance_requested:
    self._stage_complete = True
    self.context.stage_advance_requested = False   # consumed

This is how stage transitions work: the skill writes to the context, the agent reads from it, the orchestrator reads from the agent.

3.4 Orchestrator — the stage-machine router

File: framework/core/orchestrator.py

The orchestrator holds a dict[Stage, Agent] and a reference to the shared ProjectContext. Its job is purely routing:

def process(self, user_input: str) -> tuple[str, bool]:
    agent = self.agents[self.context.stage]   # pick agent for current stage
    response = agent.run(user_input)           # delegate

    stage_changed = False
    if agent.stage_complete:
        agent.reset_stage_complete()
        stage_changed = self._advance_stage()  # move context.stage forward

    return response, stage_changed

_advance_stage walks a fixed ordered list:

DISCOVERY → SPECIFICATION → PLANNING → IMPLEMENTATION → DONE

When DONE is reached, orchestrator.is_done() returns True and the REPL exits.

Why a linear state machine?

It maps directly onto SDD's sequential workflow. Each stage has a single clear purpose and must complete before the next begins. This is intentional for a learning framework — real frameworks like LangGraph use directed graphs that allow loops and parallel branches (see Section 8).
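The ordered walk described above can be sketched as follows, assuming the Stage values used in the session file. LinearStages and Ctx are hypothetical stand-ins; in the real code this bookkeeping sits inside Orchestrator.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    DISCOVERY = "discovery"
    SPECIFICATION = "specification"
    PLANNING = "planning"
    IMPLEMENTATION = "implementation"
    DONE = "done"


# The fixed order the orchestrator walks; DONE is terminal.
STAGE_ORDER = [Stage.DISCOVERY, Stage.SPECIFICATION, Stage.PLANNING,
               Stage.IMPLEMENTATION, Stage.DONE]


@dataclass
class Ctx:  # hypothetical stand-in for ProjectContext
    stage: Stage = Stage.DISCOVERY


class LinearStages:
    """Sketch of the orchestrator's stage bookkeeping (not the real class)."""

    def __init__(self, context: Ctx):
        self.context = context

    def _advance_stage(self) -> bool:
        i = STAGE_ORDER.index(self.context.stage)
        if i == len(STAGE_ORDER) - 1:
            return False  # already DONE: nothing to advance to
        self.context.stage = STAGE_ORDER[i + 1]
        return True

    def is_done(self) -> bool:
        return self.context.stage is Stage.DONE
```

Because the order is a plain list, "going back" is impossible by construction; that is exactly the limitation the graph-based alternative in Section 8.3 removes.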

4. Step-by-Step: What Happens During a Run

This traces every event for a single "I want to build a todo list app" input.

Step 1 — Bootstrapping (`main.py` or `test_run.py`)

client = anthropic.Anthropic(api_key=...)
context = ProjectContext(workspace_dir="workspace")

agents = {
    Stage.DISCOVERY:       make_elicitation_agent(context, client),
    Stage.SPECIFICATION:   make_specification_agent(context, client),
    Stage.PLANNING:        make_planning_agent(context, client),
    Stage.IMPLEMENTATION:  make_implementation_agent(context, client),
}

orchestrator = Orchestrator(context=context, agents=agents)

All four agents share the same context object and the same client. Nothing is called yet.

Step 2 — First user message enters

User: "I want to build a todo list app"

orchestrator.process("I want to build a todo list app") is called. context.stage == Stage.DISCOVERY, so the ElicitationAgent receives the message.

Step 3 — ElicitationAgent calls the LLM

agent.run() → _tool_use_loop() → Anthropic API call with:

System prompt: "You are a senior product analyst..." + context summary
Messages: [{"role": "user", "content": "I want to build..."}]
Tools: [write_document schema, advance_stage schema]

Claude responds with stop_reason == "end_turn" and a text question:

"Great idea! Who are the main users and what are the 3 core features?"

No skill was called. The loop exits immediately. The text is returned to the REPL.

Step 4 — Several more turns

The user answers the clarifying questions. Each turn:

User input → orchestrator.process() → agent.run()
LLM responds with another question (no tool call yet)
Text printed to user

Step 5 — Elicitation agent decides it has enough information

After the user confirms the scope, Claude's response includes tool_use blocks:

[
  {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "write_document",
    "input": {
      "filename": "needs.md",
      "content": "# Project Needs\n...",
      "doc_type": "needs"
    }
  },
  {
    "type": "tool_use",
    "id": "toolu_02",
    "name": "advance_stage",
    "input": { "summary": "Clarified a personal todo app with 3 features" }
  }
]

stop_reason == "tool_use". The loop:

Calls write_document(filename="needs.md", content="...", doc_type="needs") → saves file to workspace/needs.md → sets context.clarified_need = content → returns "Saved to workspace/needs.md"
Calls advance_stage(summary="...") → sets context.stage_advance_requested = True → returns "Stage advance requested"
Appends both tool results to messages and calls the LLM again.
LLM returns a text confirmation ("needs.md saved, moving to Specification...") with stop_reason == "end_turn", so the loop exits and the text is returned.
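Step 5's first tool call can be replayed against a sketch of the write_document skill. Only the needs → clarified_need mapping and the "Saved to ..." return string come from the walkthrough; the other doc_type keys are assumptions, and the context is passed as an argument here for brevity, whereas the real skill is built by a factory in document_writer.py that closes over it.

```python
import os
import tempfile
from types import SimpleNamespace

# doc_type -> ProjectContext field. Only "needs" is confirmed above;
# the other three keys are assumed.
DOC_FIELDS = {
    "needs": "clarified_need",
    "spec": "spec_document",
    "plan": "plan_document",
    "impl_notes": "impl_notes",
}


def write_document(context, filename: str, content: str, doc_type: str) -> str:
    if doc_type not in DOC_FIELDS:
        return f"Error: unknown doc_type {doc_type!r}"
    os.makedirs(context.workspace_dir, exist_ok=True)
    # basename() stops a filename like "../x.md" from escaping the workspace
    path = os.path.join(context.workspace_dir, os.path.basename(filename))
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    setattr(context, DOC_FIELDS[doc_type], content)  # update the blackboard
    return f"Saved to {path}"


# Replay the first tool_use block from Step 5 in a throwaway workspace
ctx = SimpleNamespace(workspace_dir=tempfile.mkdtemp(), clarified_need="")
message = write_document(ctx, "needs.md", "# Project Needs\n...", "needs")
```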

Step 6 — Orchestrator advances stage

Back in orchestrator.process():

if agent.stage_complete:          # True, because advance_stage was called
    agent.reset_stage_complete()
    stage_changed = self._advance_stage()   # context.stage = SPECIFICATION

stage_changed = True is returned to the REPL, which prints the new stage banner.

Step 7 — SpecificationAgent takes over

Next user message → orchestrator.process() → now routes to SpecificationAgent.

The agent's system prompt says "read the clarified need from Project Context". The context summary injected into the system prompt includes:

Clarified need: Personal todo app with add/complete/delete tasks, local JSON storage, Python CLI

The SpecificationAgent writes spec.md with numbered FR-01…FR-N requirements and calls advance_stage, so the orchestrator moves to PLANNING.

Step 8 — Planning and Implementation follow the same pattern

Each agent reads previous artifacts from the context summary, produces its own document, and calls advance_stage to hand off.

Step 9 — DONE

When ImplementationAgent calls advance_stage, the orchestrator sets context.stage = Stage.DONE. orchestrator.is_done() returns True. The REPL prints the completion summary and exits.

workspace/ now contains the four documents below, plus any source files the ImplementationAgent wrote with write_code_file:

needs.md       — clarified requirements
spec.md        — formal functional + non-functional requirements
plan.md        — phased implementation plan with tasks
impl_notes.md  — what was built, design decisions, remaining work

5. File Reference

specpilot/
│
├── main.py                    Entry point for interactive CLI
├── config.py                  API key + model + workspace dir settings
├── requirements.txt           anthropic>=0.40.0
├── tests/
│   ├── simple_test.py         Fast smoke test (~2 min, word-count tool)
│   ├── test_run.py            Automated end-to-end test (~5 min)
│   └── persist_test.py        Session save/load round-trip test
├── docs/
│   ├── ROADMAP.md             14 missing features with design sketches
│   ├── INSTALL.md             Installation guide
│   ├── USAGE.md               Usage guide
│   ├── DISTRIBUTION.md        Distribution procedure
│   ├── QUICK_REFERENCE.md     30-minute publishing checklist
│   └── GETTING_OTHERS_TO_USE.md
│
├── framework/
│   ├── core/
│   │   ├── context.py         ProjectContext dataclass + Stage enum
│   │   ├── skill.py           Skill dataclass (wraps Python fn as LLM tool)
│   │   ├── agent.py           Agent base class with tool-use loop
│   │   └── orchestrator.py    Stage-machine router
│   │
│   ├── skills/
│   │   ├── document_writer.py  write_document skill factory
│   │   └── advance_stage.py    advance_stage skill factory
│   │
│   └── agents/
│       ├── elicitation.py      Stage 1: ElicitationAgent
│       ├── specification.py    Stage 2: SpecificationAgent
│       ├── planning.py         Stage 3: PlanningAgent
│       └── implementation.py   Stage 4: ImplementationAgent
│
└── workspace/                 Generated documents land here
    ├── needs.md
    ├── spec.md
    ├── plan.md
    └── impl_notes.md

6. Setup and Installation

Prerequisites

Python 3.10 or later
An Anthropic API key (sk-ant-...)

Install

pip install specpilot

Or install from source:

git clone https://github.com/malif78/specpilot.git
cd specpilot
pip install -e .

API key — one-time setup

Create a .env file in the project root (it is git-ignored and never committed):

ANTHROPIC_API_KEY=sk-ant-...

config.py loads this file automatically on every run — no need to set an environment variable in each terminal session. A real environment variable always takes precedence over .env if both are set.
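The precedence rule can be expressed with os.environ.setdefault, which never overwrites an existing variable. A minimal sketch, assuming a plain KEY=VALUE file; load_dotenv_file is a hypothetical name, and the real config.py may instead use python-dotenv, whose load_dotenv() behaves the same way by default (override=False).

```python
import os
from pathlib import Path


def load_dotenv_file(path: str = ".env") -> None:
    """Minimal KEY=VALUE loader; a real environment variable always wins."""
    env_file = Path(path)
    if not env_file.is_file():
        return
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        value = value.strip().strip('"').strip("'")
        # setdefault never overwrites, so an exported variable takes precedence
        os.environ.setdefault(key.strip(), value)
```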

7. Running the Tests

7.1 Quick smoke test

simple_test.py runs a focused demo on a word-count CLI tool — all four stages in roughly 2 minutes (~8 API calls). Good for a fast sanity check.

python tests/simple_test.py

Expected workspace output:

workspace_simple/
  needs.md          spec.md
  plan.md           impl_notes.md
  wc_tool.py        pyproject.toml   ← actual runnable code

Run the generated tool immediately after:

python workspace_simple/wc_tool.py README.md

7.2 Full automated end-to-end test

test_run.py drives all four stages using a longer scripted conversation about a "personal expense tracker CLI" (~5 minutes, ~20 API calls).

python tests/test_run.py

What it does:

Phase	Scripted messages sent	Expected agent behaviour
Discovery (3 turns)	Describes app, answers clarifying questions	Asks 1-2 questions per turn, writes `needs.md`, advances
Specification (1 turn)	"Go ahead and write the spec"	Writes `spec.md` with FR-01…, advances
Planning (1 turn)	Confirms stdlib + argparse stack	Writes `plan.md` with phases + tasks, advances
Implementation (3 turns)	"Start Phase 1", "next phase", "done"	Writes source files + `impl_notes.md`, advances

Expected output (final lines):

Final stage   : done
Has spec doc  : yes
Has plan doc  : yes
Has impl notes: yes

Workspace files:
  impl_notes.md  (≈3 KB)
  needs.md       (≈1 KB)
  plan.md        (≈6 KB)
  spec.md        (≈4 KB)

To re-run cleanly (fresh workspace):

Remove-Item workspace\*.md        # PowerShell; on macOS/Linux: rm workspace/*.md
python tests/test_run.py

7.3 Interactive CLI

main.py runs the real conversational REPL. Type your own application idea.

python main.py

Example session:

------------------------------------------------------------
  SDD Framework  —  Specification-Driven Development
------------------------------------------------------------

Type your idea below.  'quit' or Ctrl-C to exit.

------------------------------------------------------------
  DISCOVERY      — Understanding your need
------------------------------------------------------------

[Elicitor · Product Analyst] Welcome! Tell me about your idea...

You > I want to build a recipe manager web app
...

Type quit to exit at any point.

7.4 Reading the workspace output

After a run, inspect the generated documents:

# List all generated files with sizes
Get-ChildItem workspace\

# Read the spec
Get-Content workspace\spec.md

# Read the plan
Get-Content workspace\plan.md

What good output looks like:

needs.md — 300-600 words, mentions: problem, users, 3-5 MVP features, constraints
spec.md — structured with FR-01…FR-N numbered requirements, NFRs, out-of-scope
plan.md — 3-6 phases, each phase has checkboxed tasks naming specific files/modules
impl_notes.md — records what was built, design decisions, remaining phases
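Parts of this checklist are easy to automate. A hedged sketch: check_spec and count_plan_tasks are hypothetical helpers that assume requirements are written as FR-01, FR-02, ... and plan tasks as markdown checkboxes (- [ ] / - [x]), which is what the descriptions above imply but the framework does not enforce.

```python
import re


def check_spec(text: str) -> list[str]:
    """Return problems found in spec.md (an empty list means it looks OK)."""
    problems = []
    ids = re.findall(r"\bFR-(\d{2,})\b", text)
    if not ids:
        problems.append("no FR-xx requirements found")
    else:
        numbers = sorted({int(n) for n in ids})
        if numbers != list(range(1, len(numbers) + 1)):
            problems.append(f"FR numbering has gaps: {numbers}")
    lowered = text.lower()
    if "out of scope" not in lowered and "out-of-scope" not in lowered:
        problems.append("no out-of-scope section")
    return problems


def count_plan_tasks(text: str) -> tuple[int, int]:
    """Count markdown checkbox tasks in plan.md as (done, total)."""
    done = len(re.findall(r"^\s*[-*] \[[xX]\]", text, flags=re.M))
    todo = len(re.findall(r"^\s*[-*] \[ \]", text, flags=re.M))
    return done, done + todo
```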

8. What Is Missing: Gap Analysis vs BMAD and SPECKIT

This framework is a learning skeleton. Below is an honest comparison with two production-grade SDD frameworks and a full list of gaps.

8.1 BMAD-METHOD

BMAD (Breakthrough Method for Agile AI-Driven Development) is an open-source framework that orchestrates 12+ specialized AI agents through a full agile workflow, with IDE integration for Claude Code, Cursor, and VSCode.

BMAD feature	Our framework	Gap
12+ specialized agents (Analyst, PM, Architect, Scrum Master, QA, Dev, PO…)	4 hardcoded agents	Only 4 agents with fixed roles; no configurable personas
Adaptive complexity — same workflow scales from a bug fix to an enterprise platform	Single fixed 4-stage pipeline	No way to skip stages, add stages, or loop back
Cross-agent delegation — agents can hand off sub-tasks to other agents	None	Agents are isolated; no inter-agent messaging
Quality gates and checklists between stages	None	No formal accept/reject between stages; an agent can advance prematurely
BMad Builder — users build and share custom agents	Agents are Python classes only	No plugin system; adding an agent requires code changes
Agile artifacts — user stories, sprint backlog, acceptance criteria	Only 4 markdown docs	No user story format, no backlog, no sprint concept
Session persistence — resume a project across sessions	Implemented (see Section 8.5)	Closed: context and agent histories are saved to `workspace/.session.json` after every agent turn
Multiple LLM support	Claude only	No model routing or fallback

8.2 SPECKIT (GitHub Spec Kit)

SPECKIT treats specifications as executable, first-class artifacts. Its key innovations are context discovery hooks (probing the codebase before planning) and validation hooks (checking artifacts after each stage).

SPECKIT feature	Our framework	Gap
7-phase workflow (Constitution → Specification → Clarification → Planning → Task Breakdown → Implementation → Validation)	4 phases	Missing: Constitution (project governance), Task Breakdown (granular task list), Validation (post-implementation checks)
Context discovery hooks — agents read existing code/APIs/conventions before planning	None	Agents have no awareness of an existing codebase; they hallucinate file names and APIs
Validation hooks — post-phase checks verify artifacts (do the files exist? do the tests pass?)	None	No verification that what was planned actually got built
SPEC.md → PLAN.md → TASKS.md pipeline — each artifact is a typed, structured document	Freeform markdown	Documents have no enforced schema; a misbehaving agent could produce garbage
Agent-agnostic — works with any AI assistant (Claude, Copilot, Gemini, Cursor…)	Claude only	Hard dependency on Anthropic SDK
Customization presets and extensions	None	No configuration file; all customization requires Python code changes
Task tracking — TASKS.md with explicit done/not-done state	impl_notes.md is prose	No machine-readable task state; cannot resume mid-plan

8.3 Full Gap List

The following features exist in production frameworks but are absent here. They are roughly ordered from highest to lowest impact.

Persistence and Memory

Gap	Description	How to add it
No session resumption (now implemented, see Section 8.5)	Killing the process used to lose all context	Done: `ProjectContext` is serialized to `workspace/.session.json` after every agent turn and offered for resume on startup
No cross-session memory	Agents forget previous projects	Add a vector store (ChromaDB, FAISS) indexed by project; inject relevant past decisions into system prompts
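No long-term agent memory (partly implemented, see Section 8.5)	Each agent's conversation history used to reset per run	`agent.conversation_history` is now persisted in `.session.json`; histories are still discarded once the project reaches DONE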

Orchestration

Gap	Description	How to add it
Linear only	Stages go forward only; no loops, no branches	Replace the list-based state machine with a directed graph (LangGraph pattern); add loop-back edges for "needs more clarification"
No parallel agents	Agents run sequentially	Use `asyncio` + `asyncio.gather` to run independent agents concurrently (e.g., Architect and QA reviewing the spec simultaneously)
No agent delegation	An agent cannot spawn a sub-agent	Add a `delegate_to(agent_name, task)` skill that calls another agent as a sub-task
No human-in-the-loop gates	Stages advance automatically when an agent says so	Add a formal approval step — pause, show the user the artifact, require explicit "approve" or "request changes"
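A plain-dict sketch of the directed-graph idea: each stage maps an outcome to the next stage, and "needs_clarification" edges loop back. The GRAPH table and outcome names are hypothetical, and this is not LangGraph's API, just the same shape expressed without a library.

```python
# Hypothetical transition table: stage -> {outcome: next stage}.
GRAPH = {
    "discovery":      {"ok": "specification"},
    "specification":  {"ok": "planning", "needs_clarification": "discovery"},
    "planning":       {"ok": "implementation", "needs_clarification": "specification"},
    "implementation": {"ok": "done", "needs_clarification": "planning"},
}


def next_stage(current: str, outcome: str) -> str:
    """Follow the edge labelled `outcome`; a missing edge is an error."""
    edges = GRAPH.get(current, {})
    if outcome not in edges:
        raise ValueError(f"no edge from {current!r} on {outcome!r}")
    return edges[outcome]
```

An agent would report the outcome alongside its stage-complete signal (for example, a hypothetical outcome argument on advance_stage), and the orchestrator would call next_stage instead of stepping through a fixed list.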

Context and Grounding

Gap	Description	How to add it
No codebase discovery	Agents don't know the existing project structure	Add a `discover_context` skill that runs `git ls-files`, reads key files, and injects findings into the planning stage
No web/doc search	Agents can't look up libraries, APIs, or standards	Add a `web_search` skill backed by a search API
No RAG	No retrieval of relevant past decisions or docs	Add vector-search over the workspace documents so later agents can query earlier artifacts semantically
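The table suggests git ls-files for codebase discovery; the sketch below swaps in os.walk so it also works in a directory that is not a git repository. discover_context, its skip list, and the file cap are assumptions; the returned string is meant to be injected into the planning agent's prompt the same way the context summary is.

```python
import os


def discover_context(root: str, max_files: int = 200,
                     skip_dirs: frozenset[str] = frozenset(
                         {".git", "__pycache__", "node_modules", ".venv"})) -> str:
    """Return project files relative to root, one per line."""
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped dirs
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            if len(found) >= max_files:
                return "\n".join(found) + "\n... (truncated)"
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return "\n".join(found)
```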

Output Quality

Gap	Description	How to add it
No artifact schema validation	Agents can produce malformed documents	Define JSON schemas for each document type; parse the LLM output and retry if validation fails
No retry / fallback logic	Any API error or bad output crashes the run	Wrap `_tool_use_loop` in exponential backoff; add an output validator that triggers a re-prompt on failure
No output evaluation	No way to score whether the spec is complete	Add an Evaluator agent that scores each artifact against a rubric and returns a pass/fail with feedback
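A minimal exponential-backoff helper, as a sketch: with_backoff is a hypothetical name, and the sleep parameter exists only so the delays can be checked without waiting. One design consideration: wrapping the single API call (for example, with_backoff(lambda: client.messages.create(...), retry_on=(anthropic.APIConnectionError, anthropic.RateLimitError))) avoids re-running skills that already wrote files, which wrapping the whole loop would do. The Anthropic client also retries some errors on its own (see its max_retries option).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(fn: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                 retry_on: tuple[type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after a retryable error wait base_delay * 2**attempt plus jitter."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the error
            delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads retries
    raise AssertionError("unreachable")
```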

Developer Experience

Gap	Description	How to add it
No streaming	Responses appear all at once (blocking)	Use `client.messages.stream()` and print tokens as they arrive
No async	Everything is synchronous; UI freezes during LLM calls	Rewrite `_tool_use_loop` with `asyncio`; use an `anthropic.AsyncAnthropic` client and `await client.messages.create(...)`
No observability	No tracing, token counts, or cost tracking	Log every LLM call with timestamp, tokens in/out, cost; integrate with LangSmith or a custom logger
No prompt versioning	System prompts are hardcoded strings	Move prompts to YAML/TOML files; version them in git; A/B test variants
Hardcoded agents	Adding a new agent requires Python code	Define agents in a config file (YAML); the framework loads them dynamically
No tool library	Only 3 skills available	Add: `run_code`, `read_file`, `search_web`, `run_tests`, `create_github_issue`, `send_email`, …
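Going async also unlocks the parallel-agents item from the Orchestration table. A sketch with asyncio.gather, assuming each reviewer is an async function taking the spec text; review_in_parallel, architect, and qa are hypothetical, and the stand-ins sleep instead of calling an anthropic.AsyncAnthropic client.

```python
import asyncio
from typing import Awaitable, Callable

Reviewer = Callable[[str], Awaitable[str]]


async def review_in_parallel(spec: str, reviewers: dict[str, Reviewer]) -> dict[str, str]:
    """Run independent reviewers on the same spec concurrently."""
    names = list(reviewers)
    # gather() runs the coroutines concurrently and keeps results in input order
    results = await asyncio.gather(*(reviewers[name](spec) for name in names))
    return dict(zip(names, results))


# Stand-in reviewers. A real one would await client.messages.create(...)
# on an anthropic.AsyncAnthropic client.
async def architect(spec: str) -> str:
    await asyncio.sleep(0.1)
    return f"architect: reviewed {len(spec)} chars"


async def qa(spec: str) -> str:
    await asyncio.sleep(0.1)
    return "qa: add acceptance criteria"


if __name__ == "__main__":
    print(asyncio.run(review_in_parallel("FR-01 ...", {"architect": architect, "qa": qa})))
```

The two 0.1 s reviews finish in roughly 0.1 s total rather than 0.2 s, because both are waiting at the same time.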

8.4 Summary Table

Feature                        Our Framework   SPECKIT   BMAD
--------------------------------------------------------------
Core SDD workflow                   Y            Y        Y
Multi-stage artifacts               Y            Y        Y
Tool use (skills)                   Y            Y        Y
Session persistence                 Y            Y        Y
Codebase discovery hooks            N            Y        N
Post-stage validation               N            Y        N
Non-linear orchestration            N            N        Y
12+ specialized agents              N            N        Y
Human-in-the-loop gates             N            Y        Y
Long-term memory / RAG              N            N        Y
Parallel agent execution            N            N        N
Streaming responses                 N            Y        Y
Artifact schema validation          N            Y        N
Retry / fallback logic              N            Y        N
Observability / tracing             N            N        Y
Configurable agents (no code)       N            Y        Y
Multi-LLM support                   N            Y        N

8.5 Session Persistence (Implemented)

Session persistence has been implemented. It is the foundation for all other advanced features — you cannot build evaluation pipelines or long-term memory without it.

How it works

workspace/
  .session.json        ← written atomically after every agent turn
  needs.md
  spec.md
  plan.md
  impl_notes.md

The session file stores two things:

Context snapshot — all ProjectContext fields serialized as JSON. The stage enum is stored as its string value ("planning"). Transient flags (stage_advance_requested) are always reset to False.
Agent conversation histories — each agent's per-stage message list, keyed by stage name. This is what allows an agent to resume mid-conversation without re-asking questions it already answered.

{
  "version": 1,
  "saved_at": "2026-05-27T13:47:55",
  "context": {
    "raw_need": "a note-taking CLI app",
    "clarified_need": "...",
    "spec_document": "...",
    "stage": "planning",
    "workspace_dir": "workspace"
  },
  "agent_histories": {
    "discovery":       [{"role": "user", "content": "..."}, ...],
    "specification":   [...],
    "planning":        [...],
    "implementation":  [...]
  }
}

Key design decisions

Decision	Reason
In-place context restore (`restore_from_dict`)	All agents hold a reference to the same context object. Replacing it with a new one would leave agents pointing at stale data.
Atomic write (temp file → `os.replace`)	A crash mid-save never produces a corrupt session file — the old file remains intact until the new one is fully written.
Transient flags not persisted	`stage_advance_requested` is an in-flight signal, not state. Persisting it could cause the stage to advance twice on resume.
Session deleted on DONE	A completed project should start fresh next time. The workspace documents (`spec.md`, etc.) are the durable artifacts — the session file is scaffolding.
`session_metadata()` fast-read	The resume prompt reads only the small metadata header, not the full document content, so the prompt appears instantly even for large sessions.
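The first three decisions (in-place restore, atomic write, transient flags dropped) fit in a few lines. This is a hedged sketch with a trimmed stand-in context (Ctx) and hypothetical save_session / load_session helpers; only restore_from_dict, the version / saved_at / context / agent_histories keys, and the .session.json name come from this section. The temporary file is created in the same directory as the target because os.replace is only atomic (on POSIX) within one filesystem.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass
class Ctx:  # trimmed stand-in for ProjectContext; stage kept as its string value
    raw_need: str = ""
    clarified_need: str = ""
    stage: str = "discovery"
    stage_advance_requested: bool = False
    workspace_dir: str = "workspace"

    def restore_from_dict(self, data: dict) -> None:
        # Mutate THIS object so every agent holding a reference sees the update.
        for f in fields(self):
            if f.name in data:
                setattr(self, f.name, data[f.name])
        self.stage_advance_requested = False  # transient flag is never restored


def save_session(ctx: Ctx, histories: dict[str, list], path: str) -> None:
    data = asdict(ctx)
    data.pop("stage_advance_requested")  # in-flight signal, not state
    payload = {"version": 1,
               "saved_at": datetime.now().isoformat(timespec="seconds"),
               "context": data, "agent_histories": histories}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2)
        os.replace(tmp, path)  # swap in one step; the old file survives until here
    except BaseException:
        os.remove(tmp)
        raise


def load_session(ctx: Ctx, path: str) -> dict[str, list]:
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    ctx.restore_from_dict(payload["context"])
    return payload["agent_histories"]
```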

Resume flow in `main.py`

python main.py
  │
  ├─ build_orchestrator()          — fresh context + agents (all blank)
  │
  ├─ _maybe_resume()
  │     ├─ session_metadata()      — fast-read: stage, saved_at, raw_need preview
  │     ├─ print resume prompt
  │     └─ if Y: orchestrator.load_session()
  │               ├─ context.restore_from_dict()  — fills all context fields
  │               └─ agent.conversation_history = saved_history  (per stage)
  │
  └─ run_repl(resumed=True/False)
        ├─ if fresh: send opening message → ElicitationAgent greets user
        └─ if resumed: skip opening message → user types next message directly

Running the persistence test

python tests/persist_test.py

This test verifies the full round-trip without running the complete pipeline:

Sends 2 turns to the elicitation agent (makes 2 real API calls)
Asserts the session file was written correctly
Builds a brand-new orchestrator (simulating a restart)
Loads the session and asserts every field and history message matches
Verifies session_metadata() fast-read
Verifies delete_session() removes the file

Every other gap (memory, RAG, validation) builds on top of persistence.
