diff --git a/.agents/skills/api-perplexity/SKILL.md b/.agents/skills/api-perplexity/SKILL.md new file mode 100644 index 00000000..219f2174 --- /dev/null +++ b/.agents/skills/api-perplexity/SKILL.md @@ -0,0 +1,193 @@ +--- +name: api-perplexity +description: "Search the live web via Perplexity Search API. Use when you need current documentation, release notes, vendor pages, news, domain-constrained web search, or date/recency filtering. Not for local codebase search or stable docs already in context." +--- + +# Perplexity Search API + +Thin retrieval tool for live web search. The agent plans queries and interprets +results; this skill handles raw retrieval only. + +**Endpoint:** `POST https://api.perplexity.ai/search` +**Auth:** `Authorization: Bearer $PERPLEXITY_API_KEY` +**Pricing:** $5.00 per 1K requests. No token-based charges. + +## When to Use + +- Current docs, changelogs, release notes +- Vendor/library documentation lookup +- News and time-sensitive information +- Domain-constrained retrieval (e.g. only official docs sites) +- Date-filtered search (published/updated before or after a date) + +## When Not to Use + +- Local codebase search (use Grep/finder) +- Stable docs already in context +- Page interaction or scraping (use browser tools) +- LLM-generated summaries (use Sonar API instead) + +## Request Schema + +All fields except `query` are optional. + +| Field | Type | Default | Notes | +|---|---|---|---| +| `query` | `string \| string[]` | — | **Required.** Up to 5 queries for batch. | +| `max_results` | `integer` | `10` | Range: 1–20 | +| `max_tokens` | `integer` | `10000` | Total content budget across all results | +| `max_tokens_per_page` | `integer` | `4096` | Per-page content extraction limit | +| `country` | `string` | — | ISO 3166-1 alpha-2 (e.g. `"US"`, `"DE"`) | +| `search_language_filter` | `string[]` | — | ISO 639-1 codes, max 20 (e.g. `["en"]`) | +| `search_domain_filter` | `string[]` | — | Max 20. 
Prefix `"-"` to deny (see below) | +| `search_recency_filter` | `string` | — | `"hour"` `"day"` `"week"` `"month"` `"year"` | +| `search_after_date_filter` | `string` | — | `MM/DD/YYYY` — results published after | +| `search_before_date_filter` | `string` | — | `MM/DD/YYYY` — results published before | +| `last_updated_after_filter` | `string` | — | `MM/DD/YYYY` — results updated after | +| `last_updated_before_filter` | `string` | — | `MM/DD/YYYY` — results updated before | + +### Domain Filter Rules + +One array, two modes. **Cannot mix allow and deny in the same request.** + +```json +// Allowlist — only these domains +"search_domain_filter": ["docs.stripe.com", "stripe.com"] + +// Denylist — exclude these domains (prefix with "-") +"search_domain_filter": ["-pinterest.com", "-reddit.com", "-quora.com"] +``` + +## Response Schema + +```ts +{ + id: string + server_time: string | null + results: Array<{ + title: string // page title + url: string // page URL + snippet: string // extracted content + date: string | null // publication date + last_updated: string | null + }> +} +``` + +**There is no `score` field.** Results are ranked by relevance — use list order +as rank. For single queries, `results` is a flat list. For multi-query batch, +results are grouped per query in submission order. + +## Query Strategy + +1. **Start narrow.** One precise query beats a broad one. +2. **Use domain filters before broadening query text.** Constrain sources first. +3. **Apply recency only when freshness matters.** Suppresses good evergreen content. +4. **Use low token budgets for quick lookups.** `max_tokens_per_page: 512` for headlines. +5. **Batch only for parallel angles.** Not for rephrasing the same question. +6. **Prefer 3–5 results** unless broad recall is needed. 
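The request constraints listed in the schema table and domain filter rules above (batch size, result range, single-mode domain filter, `MM/DD/YYYY` dates) could be checked locally before a request is sent. The following is a minimal sketch under stated assumptions: `SearchRequest`, `toApiDate`, and `validateSearchRequest` are hypothetical names that are not part of the Perplexity API or this skill, and only the field names and limits are taken from the table.

```typescript
// Hypothetical helper types and names; only the field names and limits
// come from the request schema table above.
type SearchRequest = {
  query: string | string[];
  max_results?: number;
  search_domain_filter?: string[];
  search_after_date_filter?: string;
  search_before_date_filter?: string;
};

// Matches MM/DD/YYYY (month 01-12, day 01-31, four-digit year).
const DATE_PATTERN = /^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$/;

// Format a Date as MM/DD/YYYY using UTC fields, so the result does not
// depend on the local timezone.
function toApiDate(date: Date): string {
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");
  return `${mm}/${dd}/${date.getUTCFullYear()}`;
}

// Return a list of problems; an empty list means the request passes
// these local checks (the API may still reject it for other reasons).
function validateSearchRequest(req: SearchRequest): string[] {
  const problems: string[] = [];

  const queries = Array.isArray(req.query) ? req.query : [req.query];
  if (queries.length < 1 || queries.length > 5) {
    problems.push("query: expected between 1 and 5 queries");
  }

  const n = req.max_results;
  if (n !== undefined && (!Number.isInteger(n) || n < 1 || n > 20)) {
    problems.push("max_results: expected an integer in 1-20");
  }

  const domains = req.search_domain_filter ?? [];
  if (domains.length > 20) {
    problems.push("search_domain_filter: at most 20 entries");
  }
  // Entries prefixed with "-" are deny entries; a request must be all-allow or all-deny.
  const denied = domains.filter((d) => d.startsWith("-")).length;
  if (denied > 0 && denied < domains.length) {
    problems.push("search_domain_filter: cannot mix allow and deny entries");
  }

  const dateFields: Array<[string, string | undefined]> = [
    ["search_after_date_filter", req.search_after_date_filter],
    ["search_before_date_filter", req.search_before_date_filter],
  ];
  for (const [field, value] of dateFields) {
    if (value !== undefined && !DATE_PATTERN.test(value)) {
      problems.push(`${field}: expected MM/DD/YYYY`);
    }
  }

  return problems;
}
```

A caller might run `validateSearchRequest` on the payload and skip the request (and its per-request charge) when the returned list is non-empty.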
+ +## Examples + +### Basic search + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "drizzle orm sqlite migration guide", + "max_results": 5 + }' +``` + +### Domain-constrained with recency + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "breaking changes v5", + "max_results": 5, + "search_domain_filter": ["docs.stripe.com"], + "search_recency_filter": "month" + }' +``` + +### Date-filtered search + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "vite 7 release", + "max_results": 5, + "search_after_date_filter": "01/01/2026" + }' +``` + +### Lightweight extraction + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "latest AI developments", + "max_results": 3, + "max_tokens": 3000, + "max_tokens_per_page": 512 + }' +``` + +### Multi-query batch + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": [ + "temporal workflow determinism typescript", + "temporal activity heartbeat pattern" + ], + "max_results": 5, + "search_domain_filter": ["docs.temporal.io"] + }' +``` + +### Denylist with language filter + +```bash +curl -X POST https://api.perplexity.ai/search \ + -H "Authorization: Bearer $PERPLEXITY_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "query": "react server components best practices", + "max_results": 5, + "search_domain_filter": ["-pinterest.com", "-w3schools.com"], + "search_language_filter": ["en"] + }' +``` + +## Guardrails + +- **Date format is 
`MM/DD/YYYY`** — not ISO 8601. +- **Domain filter is one array** — allow or deny, not both simultaneously. +- **No score field** — trust result order; do not threshold on numeric scores. +- **`max_tokens` is content budget, not pricing** — controls how much text is extracted, not cost. +- **Batch limit is 5 queries** per request. +- **Deduplicate by URL** when combining results from multiple searches. +- **Cite sources** — include URLs in any answer derived from search results. + +## Suggested Workflow + +1. Decide if the task needs live web retrieval (current info, external docs). +2. Formulate 1–3 focused queries. Prefer domain filters over broad text. +3. Run search with appropriate `max_results` and token budgets. +4. Inspect top results — check URL relevance and snippet quality. +5. If results are thin, refine: adjust query, broaden domains, relax recency. +6. Synthesize answer from retained evidence. Cite URLs. Flag uncertainty when + only one source supports a claim. diff --git a/memory/PLAN.md b/memory/PLAN.md index cf8bf3c9..886af036 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -91,26 +91,49 @@ - Acceptance: 61 tests pass (10 new: 6 SSE adapter, 3 core, 1 app integration); tool-call-streaming-start/delta/tool-call SSE events emitted for SDK tool_use blocks; client renders dynamic-tool parts with state labels - Branch: `ln/fe-541-rich-chat-ui` -4. **Structured interview: scope phase** — Replace flat chat with structured turns. Implement the scope phase as an agent skill — the agent generates a question with options, grounding ("why this matters"), and impact signal. User selects an option or types a response. Turn persists with phase provenance. UI renders the turn card (question + options + grounding). `not-started` +4. **Structured interview: scope phase (server)** `FE-554` — Replace flat chat with structured turns. 
Implement the scope phase as an agent skill — the agent generates a question with options, grounding ("why this matters"), and impact signal via `ask_question` MCP tool. Turn persists with phase provenance (question, why, impact, options). `done` - Requirements: → SPEC.md §Requirements #2, #3 - - Assumptions: → SPEC.md §Assumptions A7, A13 - - Invariants to respect: → SPEC.md §Invariants I1, I2, I3, I5, I6 - - Acceptance: start a project, agent asks structured scope questions with options and grounding, user answers, turns persist with parent chain + - Assumptions: → SPEC.md §Assumptions A7, A13 (validated) + - Invariants established: → SPEC.md §Invariants I16 + - Invariants respected: → SPEC.md §Invariants I1, I2, I3, I5, I6, I12, I13 + - Acceptance: 90 tests pass (16 new interview tests, 2 new app integration); `ask_question` MCP tool validates agent output via Zod schema; per-turn MCP server created via closure over db + turnId; `getSystemPrompt(phase)` returns phase-specific prompt; structured turn fields (question, why, impact, options) persist correctly + - Branch: `ln/fe-554-structured-interview` - **Verification approach**: inner — schema validation on agent tool output (Zod parse, establishes I16); unit tests for phase-tagged turn persistence. Middle — round-trip: structured turn → persist → active path query → verify phase provenance intact. Outer — manual interview walkthrough, assess question quality. → SPEC.md §Oracle Strategy, §Acknowledged Blind Spots (interview quality) -5. **Observer agent + entity persistence** — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent; web adapter signals client to refetch entities. `not-started` +4a. **Parts-based persistence + context builders** — Schema migration: add `user_parts` and `assistant_parts` JSON columns to turn table. 
Server-side: assemble final assistant `parts[]` from DomainEvents on stream finish, persist alongside scalars. Define `BrunchUIMessage` type with custom Data Parts (`data-option-selection`, `data-confirmation`). Extract `formatHistory()` into typed context builders (`buildInterviewerContext`, `buildObserverContext`). No backward-compatible fallback — DB can be re-initialized if needed. `not-started` + - Requirements: → SPEC.md §Requirements #4, #14 + - Assumptions: → SPEC.md §Assumptions A22, A23 + - Decisions: → SPEC.md §Decisions D23, D24, D25 + - Invariants established: → SPEC.md §Invariants I17, I18, I19 + - Invariants respected: → SPEC.md §Invariants I1, I5, I6, I11, I12, I13, I16 + - Acceptance: schema migration adds parts columns; assistant parts persisted on stream finish (reasoning, tool-call states, text); Data Part schemas validated via Zod on write/read (I17); parts round-trip: DomainEvents → assemble → persist → load → hydrate matches original (I18); `buildInterviewerContext()` produces equivalent output to current `formatHistory()` (I19); observer context builder produces extraction-optimized projection + - Branch: `ln/fe-554-structured-interview` (continues current branch) + - **Verification approach**: inner — round-trip oracle for parts fidelity (I18); Zod schema validation on Data Parts (I17); unit tests for context builder output shape and equivalence (I19). → SPEC.md §Oracle Strategy (inner: fast unit tests — parts). Middle — integration: full `conductTurn()` → parts persisted → reload → hydration matches live state. Outer — manual resume test via `/cli-cdp` (reasoning + tool states visible on refresh). → SPEC.md §Acknowledged Blind Spots (parts/scalar consistency). + +4b. **Structured interview: client UI** — Turn card rendering (question + options + grounding + impact badge). Option selection UI using `data-option-selection` Data Part (persist `is_selected` + structured answer). Outer-loop visual verification via `/cli-cdp`. 
`not-started` + - Requirements: → SPEC.md §Requirements #2, #3 + - Assumptions: → SPEC.md §Assumptions A22, A23 + - Decisions: → SPEC.md §Decisions D23, D24 + - Invariants respected: → SPEC.md §Invariants I1, I16 + - Acceptance: turn card rendering (question text, option list, grounding block, impact badge); option selection persists as `data-option-selection` Data Part in `user_parts` + `is_selected` on option row; outer-loop visual verification via `/cli-cdp` + - Branch: `ln/fe-554-structured-interview` (continues current branch) + - **Verification approach**: outer — manual interview walkthrough, assess rendering quality and option selection flow. → SPEC.md §Acknowledged Blind Spots (interview quality) + +5. **Observer agent + entity persistence** — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent **post-commit** (after SQLite transaction); SSE adapter emits as typed data part on existing chat stream (in-band sync per D22). `not-started` - Requirements: → SPEC.md §Requirements #5 - - Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike) - - Acceptance: answer a scope question, observer extracts decision + assumptions, dependency edges in DB, extraction within user think time, sidebar refetch triggered - - **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission. Middle — differential oracle from spike fixtures (observer extraction vs golden master, ≥80% capture). Outer — debug mode: raw observer extraction visible per-turn in UI; fixture capture from confirmed-good manual runs. 
→ SPEC.md §Oracle Strategy, §Observer History Projection, §Acknowledged Blind Spots (extraction variance, cumulative graph integrity) + - Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike), A20 + - Decisions: → SPEC.md §Decisions D22 (in-band sync — observer-complete as data part) + - Acceptance: answer a scope question, observer extracts decision + assumptions, dependency edges in DB, `observer-complete` event emitted post-commit with entity IDs, extraction within user think time + - **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission post-commit, SSE adapter data-part encoding. Middle — differential oracle from spike fixtures (observer extraction vs golden master, ≥80% capture). Outer — debug mode: raw observer extraction visible per-turn in UI; fixture capture from confirmed-good manual runs. → SPEC.md §Oracle Strategy, §Observer History Projection, §Acknowledged Blind Spots (extraction variance, cumulative graph integrity) -6. **Entity sidebar (read-only)** — React sidebar in interview workspace showing decisions, assumptions, requirements, and criteria on the active path. Tabbed display. Updates after each observer extraction via `observer-complete` event. Dependency edges visible. Stale badges for soft-invalidated entities. `not-started` +6. **Entity sidebar (read-only)** — React sidebar in interview workspace showing decisions, assumptions, requirements, and criteria on the active path. Tabbed display. TanStack Query (`useQuery`) for entity data; cache populated via `queryClient.setQueryData` from `useChat`'s `onData` callback when `observer-complete` data parts arrive (in-band sync per D22). Dependency edges visible. Stale badges for soft-invalidated entities. 
`not-started` - Requirements: → SPEC.md §Requirements #6 - - Assumptions: — + - Assumptions: → SPEC.md §Assumptions A21 + - Decisions: → SPEC.md §Decisions D22 (TanStack Query + in-band sync) - Invariants to respect: → SPEC.md §Invariants I9, I10 - - Acceptance: entities appear in categorized tabs as interview progresses, dependency links navigable, stale badges render correctly + - Acceptance: entities appear in categorized tabs as interview progresses, `onData` → `setQueryData` reactively updates sidebar, dependency links navigable, stale badges render correctly - Ref: → docs/design/BREADBOARD.md §UI Affordances → P2 Entity sidebar - - **Verification approach**: inner — unit tests for entity query on active path, stale badge computation. Outer — manual visual inspection (entities render correctly, tabs work, stale badges appear). Debug mode overlay (observer extraction detail per-turn) should land here or in slice 5. → SPEC.md §Oracle Strategy (outer loop), §Acknowledged Blind Spots (cumulative graph integrity) + - **Verification approach**: inner — unit tests for entity query on active path, stale badge computation. Middle — validate A21: `onData` → `setQueryData` updates sidebar without stale closure (if stale, fall back to parallel `EventSource`). Outer — manual visual inspection (entities render correctly, tabs work, stale badges appear). Debug mode overlay (observer extraction detail per-turn) should land here or in slice 5. 
→ SPEC.md §Oracle Strategy (outer loop), §Acknowledged Blind Spots (cumulative graph integrity) ## Phase 4: Full Interview @@ -196,7 +219,7 @@ ``` Phase 1: 1 (skeleton) ──→ 2 (SQLite) Phase 2: 2 ──→ 3 (turn schema) ──→ 3c (Drizzle+core) ──→ 3d (routing) -Phase 3: 3c ──→ 3b (rich chat UI) ──→ 4 (scope interview) ──→ 5 (observer) +Phase 3: 3c ──→ 3b (rich chat UI) ──→ 4 (scope server) ──→ 4a (parts+context) ──→ 4b (client UI) ──→ 5 (observer) spike (observer fidelity) ──→ 5 3d + 5 ──→ 6 (entity sidebar) Phase 4: 6 ──→ 7 (transitions) ──→ 8 (design) ──→ 9 (requirements) ──→ 10 (criteria) @@ -209,7 +232,8 @@ Phase 6: 13 ──→ 14 (npx + CLI) ### Parallelism opportunities - ~~Slice 3b and 3d can proceed in parallel after 3c~~ (done — both landed) -- Observer spike and slice 4 can proceed in parallel now — spike is independent, slice 4 is on the critical path +- ~~Observer spike and slice 4 can proceed in parallel~~ (slice 4 server done — spike is next on critical path) +- Observer spike can proceed in parallel with 4a (parts persistence) - Slice 7 (transitions) and 11 (branching) can start in parallel once slice 6 lands - Slice 12 (entity lifecycle API) can proceed in parallel with slice 11 - Slice 14 (npx) can start early with a basic launcher, completing after slice 13 diff --git a/memory/SPEC.md b/memory/SPEC.md index ca258e4f..926cff2b 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -40,6 +40,7 @@ The architecture (layered: db → core → adapters): - **No Dolt** — replaced by SQLite with turn-tree versioning - **No AG-UI / CopilotKit** — AI SDK SSE protocol is sufficient - **No assistant-ui** — its runtime abstraction layer (`AssistantRuntimeProvider`) adds unnecessary indirection over `useChat`; brunch emits custom SSE from Express, not from AI SDK server-side, so the adapter chain (useChat → useChatRuntime → AssistantRuntimeProvider) is overhead without benefit +- **No TanStack DB** — designed for local-first client-side collections with sync engines (ElectricSQL, 
PowerSync); brunch is server-authoritative, single-user, with no offline or multi-tab requirements. TanStack Query + SSE-driven invalidation is sufficient. Re-evaluate if offline, multi-tab, or complex cross-collection client queries become requirements ## Requirements @@ -75,13 +76,17 @@ The architecture (layered: db → core → adapters): | A10 | The `useChat` hook can consume custom SSE without AI SDK server runtime | **validated** | D9 | Walking skeleton | Validated: useChat consumes custom SSE via DefaultChatTransport | | A11 | Stateless `query()` with prompt-stuffed history is sufficient for multi-turn interviewing — SDK session persistence is unnecessary and undesirable | **validated** | D8, D12 | SQLite foundation | Validated: formatting history into prompt works. SDK sessions rejected as competing source of truth — opaque, machine-local, incompatible with portable data goals (atomic YAML / git-versionable). Turn tree is sole session model. | | A12 | `useChat` hook accepts initial messages to hydrate conversation state from server-stored history | **validated** | D9 | SQLite foundation | Validated: `useChat` doesn't have `initialMessages` prop but `setMessages` works for hydration | -| A13 | Claude Agent SDK supports defining interview phases as agent skills with distinct system prompts and tool sets | medium | D2 | Interview phases | Test SDK skill/agent configuration API | +| A13 | Phase-specific interview behavior is achievable via system prompt switching + in-process MCP tools on `query()` — the SDK's formal `AgentDefinition` skill system is unnecessary | **validated** | D2 | Interview phases | Validated: slice 4 uses `getSystemPrompt(phase)` + `createInterviewMcpServer()` per turn; 88 tests pass. SDK `AgentDefinition` subagent system not used — simpler approach with less indirection. 
| | A14 | A second-thread observer agent can reliably extract decisions, assumptions, and dependency edges from a single turn's Q&A | medium | D1 | Observer agent | Probe with realistic interview exchanges; measure extraction fidelity | | A15 | The LLM can reliably judge when a phase interview has reached sufficient understanding (is_resolution) | medium | D3 | Phase resolution | Probe across varied project types; measure false-positive resolution rate | | A16 | AI SDK `useChat` hook's `ToolUIPart` state machine (`input-streaming` → `input-available` → `output-available` / `output-error` / `approval-requested` → `approval-responded` / `output-denied`) models all permutations of pending, error, and success for both interim (thinking, tool calls) and final (response) data | high | D14 | Rich chat UI | Partially validated: SSE adapter emits tool-call events, client renders `dynamic-tool` parts with state labels (input-streaming, input-available, output-available, output-error). Browser outer-loop pending. 
| | A17 | AI Elements copy-paste components can be restyled without forking — they are ownable source files, not npm-locked dependencies | high | D14 | Rich chat UI | Install via CLI, inspect source, confirm no hidden npm runtime dependency | | A18 | Drizzle ORM migration runner reliably auto-applies schema changes from a migrations folder at startup with better-sqlite3 | **validated** | D18 | Drizzle refactor | Validated: migrate() auto-applies at startup in createDb(); all 39 existing tests pass against Drizzle-managed schema | | A19 | `AsyncIterable` from core can be consumed by both SSE streaming (web) and line-by-line terminal output (CLI) without buffering issues | **validated** | D19 | Core extraction | Validated: conductTurn() yields DomainEvents consumed by Express SSE adapter; 12 new core tests + 9 app integration tests pass | +| A20 | Observer results can be delivered as typed data parts on the existing chat SSE stream without holding the connection open unacceptably long — observer is synchronous, runs within the same `conductTurn()` request, completes during user read time | high | D22 | Observer agent, Entity sidebar | Measure observer latency in slice 5; if >5s, fall back to out-of-band SSE (Option 2 in research doc) | +| A21 | `useChat` `onData` callback reliably bridges to `queryClient.setQueryData` without stale-closure issues — known `onFinish` stale-closure bug (ai-sdk#550) may or may not affect `onData` | medium | D22 | Entity sidebar | Test in slice 6: verify `setQueryData` from `onData` updates sidebar reactively; if stale, use parallel `EventSource` instead | +| A22 | AI SDK `UIMessage.parts[]` with custom Data Parts (typed via `dataPartsSchema`) persisted as JSON on the turn table is sufficient for faithful UI resume — no separate `turn_message` table needed for current scope | high | D23, D24 | Parts persistence | Validate by implementing parts persistence in slice 4a: hydrate `useChat` from persisted parts, verify reasoning + tool-call 
state round-trip on refresh | +| A23 | Custom Data Parts for structured user input (option selection, confirmation) can replace scalar `turn.answer` as the primary user-response model without breaking `formatHistory()` or observer context | high | D24 | Parts persistence | Validate in slice 4a: structured user input round-trips through persistence → hydration → re-rendering | ## Decisions @@ -100,7 +105,13 @@ The architecture (layered: db → core → adapters): 13. **Observer captures derived intelligence** — The observer agent's extraction mandate extends beyond decisions and assumptions to include derived observations (e.g. codebase analysis, domain insights) that the interviewer surfaced through tool use during a turn. These are persisted so subsequent stateless `query()` calls can inject them as context. The exact entity model is TBD — candidates include a dedicated `observation` table, enriched `decision.rationale`, or a `notes` field on `turn`. Depends on: A14, D12. Supersedes: —. 14. **Part-type rendering for rich chat UI** — Client renders message parts by type: `reasoning` (collapsible `<details>`),
`text` (paragraph), `dynamic-tool` (tool name + state indicator with lifecycle labels). AI Elements (copy-paste components via `npx ai-elements`) deferred — hand-built rendering is sufficient for current slices. AI Elements remain the target for when richer tool-call state rendering (7 states) is needed. Depends on: A16. Supersedes: hand-rolled message rendering in App.tsx. -15. **Transitional turn-field inversion** — During the pre-structured-interview phase (slices 1–3), `turn.answer` holds the user's chat message and `turn.question` holds the agent's streamed response. This inverts the canonical interview semantics where the agent asks (`question`) and the user answers (`answer`). The inversion is temporary — slice 4 (structured interview) populates turns in their canonical direction. No schema change needed; the fields carry correct types, just with flipped temporal ordering. Client hydrates `useChat` by mapping each turn to two `UIMessage` entries (answer → user, question → assistant). Depends on: D1. Supersedes: flat `message` table with `role` field from slice 2. +15. **~~Transitional turn-field inversion~~** — **Superseded by D23 (parts-based persistence)**. Previously: `turn.answer` held user text, `turn.question` held assistant text with inverted semantics during slices 1–3. This was always marked transitional. D23 replaces both scalar fields with persisted `UIMessage.parts[]` as the source of truth for UI rendering and resume. Scalar fields (`question`, `why`, `impact`, `answer`) retained for queryability only — domain queries (active path, phase filtering, entity joins) read scalars; UI hydration reads parts. Depends on: D1. Supersedes: flat `message` table with `role` field from slice 2. + +23.
**Parts-based persistence model (UIMessage/ModelMessage split)** — Two separate data layers: (1) **UI render state** (`UIMessage.parts[]` JSON) persisted per turn for faithful resume — captures reasoning blocks, tool-call lifecycle states, text, and custom Data Parts. (2) **Inference context** (`ModelMessage`-equivalent) derived at call time by typed context builders, never persisted. Turn table gains `user_parts` and `assistant_parts` JSON columns (nullable). On stream finish, core assembles final assistant `parts[]` from DomainEvents and persists alongside scalar fields. Hydration reads persisted parts when available, falls back to scalar synthesis for older turns. The turn tree remains canonical for domain semantics (branching, phase, entity joins); parts are the source of truth for rendering. Research: `docs/research/Chat Application Data Models Conversation Turns, Structured Data & Generative UI Persistence.md`. Depends on: A22. Supersedes: D15's scalar-only persistence model. + +24. **Custom Data Parts for structured user input** — User responses are not always plain text. AI SDK Data Parts (`data-{name}` typed via Zod schema) model structured user input: `data-option-selection` (`{ turnId, selectedOptionId, rationale? }`), `data-confirmation` (`{ turnId, confirmed: boolean }`), plain `text` for freeform responses. Defined as a `BrunchDataParts` type passed as generic to `UIMessage` for full-stack type safety. Assistant messages use the same mechanism for domain-specific content not covered by built-in part types: `data-phase-summary`, `data-observer-result`, `data-entity-snapshot`. Depends on: A22, A23. Supersedes: implicit assumption that `turn.answer` is always a text string. + +25. **Typed context builders replace monolithic `formatHistory()`** — Different consumers of the turn tree need different projections of the same data. `buildInterviewerContext(activePath, currentInput, entities, phase)` for conversational continuity. 
`buildObserverContext(turn, activePathSummary, linkedEntities)` for extraction-optimized context (see §Observer History Projection). Future: `buildPhaseResolutionContext(...)`, `buildRequirementsReviewContext(...)`. Each builder reads from the domain model (turn scalars + entity tables), NOT from persisted `UIMessage.parts[]`. The parts are for rendering; context builders are for inference. Depends on: D23, D12. Supersedes: single `formatHistory()` function in core.ts. ### Technical stack @@ -111,9 +122,10 @@ The architecture (layered: db → core → adapters): 11. **Drop list** — Dolt/mysql2, OpenCode sidecar, Preact, both existing frontend implementations, NDJSON protocol, JSON Schema definitions (→ Zod), @tanstack/react-table, @dnd-kit/, dompurify, marked, four streaming functions in claude.js, dispatch.js. Depends on: —. Supersedes: —. 16. **Integer autoincrement primary keys** — All entity tables use `INTEGER PRIMARY KEY AUTOINCREMENT` instead of `TEXT` UUIDs. SQLite ROWID alias is simpler, matches the original DBML design, avoids UUID generation. No external systems reference these IDs. Client coerces to strings for `useChat` hydration (`turn-${id}-answer`, `turn-${id}-question`). Depends on: D7. Supersedes: `randomUUID()` TEXT PKs from slice 2. 18. **Drizzle ORM replaces raw DDL** — TypeScript schema definition (`drizzle/schema.ts`) is single source of truth for types, DDL, and migrations. Auto-applies from `drizzle/migrations/` at startup. Drizzle Studio available for DB inspection during development. Depends on: A18, D7. Supersedes: raw DDL strings in db.ts, DBML design document, hand-written TypeScript interfaces. -19. **Layered architecture with DomainEvent streaming** — Core interview orchestration extracted from Express handlers into interface-agnostic service layer. 
Core operations: turn tree (createProject, conductTurn, getActivePath, branch, checkout), entity lifecycle (revisitDecision, falsifyAssumption, verifyAssumption, CRUD for requirements/criteria, reviewRequirement/reviewCriterion), observer (runObserver), phase (getPhaseStatus), export (exportSpec). `conductTurn()` returns `AsyncIterable` — domain events (`stream-start`, `thinking`, `text-delta`, `tool-call-start`, `tool-call-delta`, `tool-call-end`, `stream-end`, `turn-created`, `error`; future: `observer-complete`, `phase-resolved`) that each adapter translates to its transport format. Web (Express+SSE), CLI, and MCP adapters are thin transport layers. Depends on: A19, D8, D12. Supersedes: interview logic embedded in Express POST handler. +19. **Layered architecture with DomainEvent streaming** — Core interview orchestration extracted from Express handlers into interface-agnostic service layer. Core operations: turn tree (createProject, conductTurn, getActivePath, branch, checkout), entity lifecycle (revisitDecision, falsifyAssumption, verifyAssumption, CRUD for requirements/criteria, reviewRequirement/reviewCriterion), observer (runObserver), phase (getPhaseStatus), export (exportSpec). `conductTurn()` returns `AsyncIterable` — domain events (`stream-start`, `thinking`, `text-delta`, `tool-call-start`, `tool-call-delta`, `tool-call-end`, `stream-end`, `turn-created`, `error`, `observer-complete`; future: `phase-resolved`) that each adapter translates to its transport format. `observer-complete` is emitted post-commit (after SQLite transaction) and carries created entity IDs for cache coherence (see D22). Web (Express+SSE), CLI, and MCP adapters are thin transport layers. Depends on: A19, D8, D12. Supersedes: interview logic embedded in Express POST handler. 21. 
**oxlint + oxfmt + tsgolint replaces eslint + tsc** — oxlint for linting (including 59 type-aware rules via tsgolint, the Go-based TypeScript backend), oxfmt for formatting (single quotes, 110 width, sorted imports). `npm run fix` (lint:fix + fmt) is the fast inner loop; `npm run verify` (check + test + build) is the commit gate. `--type-check` flag replaces `tsc --noEmit`. Depends on: —. Supersedes: eslint (removed), separate `tsc --noEmit` step. 20. **CLI executable with subcommands** — `npx brunch` launches web UI (default). `npx brunch [command]` for CLI operations on the same DB. Future: sidecar MCP server. Depends on: D10, D19. Supersedes: web-only distribution model in D10. +22. **TanStack Query + SSE-driven invalidation for observer entity sync** — Observer-created entities (decisions, assumptions, edges) sync to the React UI via two mechanisms: (1) **In-band data parts** (default): `conductTurn()` yields `observer-complete` DomainEvents after the SQLite transaction commits; the Express SSE adapter emits these as typed data parts on the existing chat stream; `useChat`'s `onData` callback bridges to `queryClient.setQueryData` for instant sidebar updates. (2) **Out-of-band SSE** (fallback): if the observer moves to async post-processing, a dedicated `/api/events/:projectId` `EventSource` in a React context drives `queryClient.invalidateQueries`. TanStack Query owns all persisted entity state; a small Zustand store handles transient UI state only (observer-running indicator, phase progress). TanStack DB evaluated and rejected — overkill for server-authoritative single-user app without offline, multi-tab, or complex cross-collection query needs. Research: `docs/research/Async Server-State to UI Sync for Chat + Observer Agents.md`. Depends on: A20, A21, D4, D9, D19. Supersedes: —. 
## Invariants @@ -140,6 +152,10 @@ The architecture (layered: db → core → adapters): | I13 | Core/adapter separation | Slice 3c (Drizzle) | core.test.ts, app.test.ts | D19 | | I14 | Project-scoped API routes | Slice 3d (routing) | app.test.ts | D9 | | I15 | Route loader hydration | Slice 3d (routing) | manual (outer loop) | D9 | +| I16 | Schema validation on agent tool output | Slice 4 (scope interview) | interview.test.ts | D2, A13 | +| I17 | Data Part schema validation | Slice 4a (parts persistence) | parts.test.ts | D24 | +| I18 | Parts round-trip fidelity | Slice 4a (parts persistence) | parts.test.ts | D23 | +| I19 | Context builder equivalence | Slice 4a (parts persistence) | context.test.ts | D25 | ## Lexicon @@ -172,17 +188,27 @@ The architecture (layered: db → core → adapters): | **active path** | The branch from HEAD to root in the turn tree. Determines which turns, decisions, and assumptions are currently active | | **branch** (verb) | Fork the turn tree from a given turn, creating a new path and moving HEAD. Analogous to git branch + checkout | | **checkout** (verb) | Move HEAD to an existing turn on a different branch without creating new turns. Analogous to git checkout | -| **phase** | A stage of the interview: `scope`, `design`, `requirements`, `criteria`. Immutable provenance on each turn. Each phase is backed by an agent skill | +| **phase** | A stage of the interview: `scope`, `design`, `requirements`, `criteria`. Immutable provenance on each turn. Each phase is implemented via `getSystemPrompt(phase)` + a per-turn MCP tool server (`createInterviewMcpServer`). See D2, A13 | | **phase resolution** | LLM judgment that shared understanding has been reached for a phase. Marked by `turn.is_resolution = true` on the last turn of a phase | -| **interviewer** | The primary agent role: conducts the interview with structured questions, grounding, and impact signals. 
Does not extract entities | +| **ask_question tool** | The MCP tool the interviewer must use each turn. Accepts `{ question, why, impact, options[] }`, validated by `structuredQuestionSchema` (Zod). The tool handler persists structured data to the turn and options tables via closure over `db` + `turnId`. Defined in `interview.ts` | +| **interview MCP server** | A per-turn MCP server created by `createInterviewMcpServer(db, turnId)`. Exposes the `ask_question` tool. The closure captures the current turn ID so the tool handler writes to the correct row. Passed to `query()` via `mcpServers` option. Defined in `interview.ts` | +| **interviewer** | The primary agent role: conducts the interview with structured questions, grounding, and impact signals. Must use the `ask_question` tool every turn. Does not extract entities | | **observer** | The secondary agent role: extracts decisions, assumptions, and dependency edges from each answered turn. Runs post-answer during user read time | | **core** | The interface-agnostic service layer between the database and transport adapters. Owns interview orchestration, entity lifecycle, observer invocation. Returns `AsyncIterable` for streaming | -| **domain event** | A typed event yielded by `conductTurn()` — `stream-start`, `thinking`, `text-delta`, `tool-call-start`, `tool-call-delta`, `tool-call-end`, `stream-end`, `turn-created`, `error`. Future: `observer-complete`, `phase-resolved`. Each adapter translates to its transport format (SSE, terminal, MCP) | +| **domain event** | A typed event yielded by `conductTurn()` — `stream-start`, `thinking`, `text-delta`, `tool-call-start`, `tool-call-delta`, `tool-call-end`, `stream-end`, `turn-created`, `error`, `observer-complete`. Future: `phase-resolved`. Each adapter translates to its transport format (SSE, terminal, MCP). 
`observer-complete` is emitted post-commit and drives cache coherence (D22) | | **decision graph** | The DAG of decisions and their dependencies (on prior decisions and assumptions). Revisiting a decision forks the turn tree | | **path exclusion** | Invalidation by moving HEAD so entities on the abandoned branch leave the active path. Lazy — computed by the active-path query, no eager writes. Triggered by `revisitDecision` / `branch` | | **flag propagation** | Invalidation by walking dependency graph edges and marking entities stale (nulling `reviewed_at`). Eager — triggered by `falsifyAssumption` or `updateRequirement` | | **soft invalidation** | Umbrella term for both path exclusion and flag propagation. Entities are flagged for re-review but never deleted or modified. See D17 | | **spec readiness** | Compound predicate: all four phases resolved AND requirements reviewed AND criteria confirmed. Only then is export enabled | +| **UIMessage** | AI SDK source of truth for UI state. `{ id, role, parts[], metadata? }`. Persisted for faithful resume. Reconstructed from stored `user_parts`/`assistant_parts` JSON on hydration. See D23 | +| **ModelMessage** | AI SDK representation optimized for LLM inference. Derived at call time by context builders (D25), never persisted. Leaner than `UIMessage` — no tool states, no reasoning, no custom data parts | +| **parts[]** | Ordered array of typed content blocks in a `UIMessage`. Built-in types: `text`, `reasoning`, `tool-{name}` (4 states), `file`. Custom types via Data Parts: `data-option-selection`, `data-confirmation`, `data-phase-summary`, etc. Source of truth for rendering. See D23, D24 | +| **Data Part** | Custom typed `UIMessage` part (`data-{name}`) defined via Zod schema. Enables structured user input (option selection, confirmation) and domain-specific assistant output (phase summary, observer result). Persisted in `parts[]` JSON. 
See D24 | +| **context builder** | A typed function that projects turn-tree + entity data into inference context for a specific consumer (interviewer, observer, phase judge). Reads from domain model, not from persisted parts. See D25 | +| **in-band sync** | Observer entity updates delivered as typed data parts on the existing `conductTurn()` SSE stream. Default mechanism — zero additional infrastructure (D22) | +| **out-of-band sync** | Observer entity updates delivered via a dedicated `EventSource` SSE channel (`/api/events/:projectId`). Fallback mechanism if observer becomes async (D22) | +| **cache invalidation** | Signaling TanStack Query that cached data is stale. Two forms: `queryClient.setQueryData` (push new data directly into cache) and `queryClient.invalidateQueries` (trigger background refetch). Driven by `observer-complete` events (D22) | ## Verification Design @@ -242,6 +268,8 @@ End-to-end slices must be **user-testable**, not just programmatically tested. E | Fast unit tests — SSE | `SDKMessage` → correct SSE event strings | I1, I3, I7 | ms | | Fast unit tests — DB | Turn persistence with phase provenance, entity writes with dependency edges | I5, I6, I9, I10, I11 | ms | | Fast unit tests — core | DomainEvent streaming, core/adapter separation, structured turn creation | I12, I13 | ms | +| Fast unit tests — parts | Parts round-trip (DomainEvents → assemble → persist JSON → load → hydrate); Data Part schema validation (Zod parse on structured user input); context builder output shape | I17, I18, I19 | ms | +| Fast unit tests — observer sync | `observer-complete` emitted post-commit with entity IDs matching DB state; SSE adapter encodes as typed data part | D22, A20 | ms | | Type-aware linting | Semantic static checks (oxlint + tsgolint) | All | ms | **Middle loop** (seconds–minutes): regression gates @@ -251,6 +279,7 @@ End-to-end slices must be **user-testable**, not just programmatically tested. 
E | Differential testing (observer) | Observer extraction meets ≥80% entity capture rate against golden master fixtures | A14 | seconds per fixture; requires Claude API | | Round-trip oracle (turn tree) | Structured turns → active path → entity resolution intact | I6, I9, I10 | ms | | Integration tests | SSE stream contains expected event types in order; DB lifecycle survives close/reopen | I2, I5, I13, I14 | seconds | +| Round-trip oracle (observer sync) | Full `conductTurn()` with observer → `observer-complete` is last event before `stream-end` → entity IDs in event match committed DB rows | D22 | seconds; requires Claude API | **Outer loop** (minutes–hours): human observer @@ -261,6 +290,7 @@ End-to-end slices must be **user-testable**, not just programmatically tested. E | Fixture capture from manual runs | Bootstrap golden master fixtures by querying DB after confirmed-good sessions | Human judgment + SQL query | | Rich chat rendering | Tool call states, reasoning collapse, message parts render by type | Human + `/cli-cdp` | | Resume test | Close/reopen browser, verify state intact | Human + browser | +| Observer → sidebar reactivity | `onData` → `setQueryData` bridge updates sidebar after observer extraction; validates A21 | Human + `/cli-cdp` (slice 6) | ### Observer History Projection @@ -282,6 +312,8 @@ This projection difference is a deliberate design choice, not an implementation | Cumulative entity graph integrity | Individual extractions may be correct but compose into an incoherent graph over 15-20 turns. No programmatic check for drift. | Debug mode (human eyeballs the growing graph). Future: structural property tests (no orphaned edges, no DAG cycles, monotonic entity count). | After observer slice lands and manual testing reveals graph-level issues. | | Phase transition UX | Summary quality, resolution timing, confirmation flow. Fully visual. | Manual testing during slices 7-10. | If phase transitions feel wrong during testing. 
| | Performance under realistic load | 20+ turns, growing history summaries, observer latency. No budget oracle. | Acceptable for single-user tool. | If latency becomes noticeable during manual testing. | +| `onData` stale-closure correctness | Client-side `useChat` `onData` → `queryClient.setQueryData` bridge cannot be tested in inner/middle loop (requires browser runtime). Known `onFinish` stale-closure bug (ai-sdk#550) may affect `onData`. | Manual outer-loop validation in slice 6; if broken, fall back to parallel `EventSource` (D22 Option 2). | If sidebar fails to update after observer extraction during manual testing. | +| Parts/scalar consistency | Persisted `assistant_parts` and scalar fields (`question`, `why`, `impact`, options) are two representations of the same turn content. No programmatic check that they agree. | Acceptable for initial delivery — scalars are written by MCP tool handler, parts assembled from stream. Both derive from the same `query()` call. Future: metamorphic oracle (text in parts matches scalars). | If turns appear correct in one view (parts-based UI) but wrong in another (scalar-based entity queries or export). 
| ### Current Coverage @@ -291,8 +323,9 @@ This projection difference is a deliberate design choice, not an implementation | ------------------- | ----- | --------------------------- | | sse-adapter.test.ts | 18 | I1, I3, I7 | | db.test.ts | 24 | I5, I6, I9, I10, I11 | -| app.test.ts | 15 | I2, I3, I6, I7, I13, I14 | +| app.test.ts | 17 | I2, I3, I6, I7, I13, I14 | | core.test.ts | 15 | I12, I13 | +| interview.test.ts | 16 | I16 | ## Acceptance Criteria (exit conditions) diff --git a/package-lock.json b/package-lock.json index b0d27266..6add99a0 100644 --- a/package-lock.json +++ b/package-lock.json @@ -14,7 +14,8 @@ "drizzle-orm": "^0.45.2", "express": "^5.2.1", "react": "^19.2.4", - "react-dom": "^19.2.4" + "react-dom": "^19.2.4", + "zod": "^4.3.6" }, "devDependencies": { "@types/better-sqlite3": "^7.6.13", diff --git a/package.json b/package.json index 99935f00..46589dc3 100644 --- a/package.json +++ b/package.json @@ -24,7 +24,8 @@ "drizzle-orm": "^0.45.2", "express": "^5.2.1", "react": "^19.2.4", - "react-dom": "^19.2.4" + "react-dom": "^19.2.4", + "zod": "^4.3.6" }, "devDependencies": { "@types/better-sqlite3": "^7.6.13", diff --git a/src/server/app.test.ts b/src/server/app.test.ts index 2589e430..673eb127 100644 --- a/src/server/app.test.ts +++ b/src/server/app.test.ts @@ -7,6 +7,13 @@ import type { DB } from './db.js'; const mockQuery = vi.fn(); vi.mock('@anthropic-ai/claude-agent-sdk', () => ({ query: mockQuery, + createSdkMcpServer: () => ({ name: 'interview', instance: {} }), + tool: (name: string, desc: string, schema: any, handler: any) => ({ + name, + description: desc, + inputSchema: schema, + handler, + }), })); // Import app factory after mocking diff --git a/src/server/core.test.ts b/src/server/core.test.ts index c54bbdb5..7602138e 100644 --- a/src/server/core.test.ts +++ b/src/server/core.test.ts @@ -6,6 +6,13 @@ import type { DB } from './db.js'; const mockQuery = vi.fn(); vi.mock('@anthropic-ai/claude-agent-sdk', () => ({ query: mockQuery, + 
createSdkMcpServer: () => ({ name: 'interview', instance: {} }), + tool: (name: string, desc: string, schema: any, handler: any) => ({ + name, + description: desc, + inputSchema: schema, + handler, + }), })); const { conductTurn, extractPrompt, formatHistory } = await import('./core.js'); @@ -52,8 +59,8 @@ describe('formatHistory', () => { it('formats turns into conversation history', () => { const turns = [{ answer: 'Hi', question: 'Hello back' }] as any[]; const result = formatHistory(turns, 'next'); - expect(result).toContain('User: Hi'); - expect(result).toContain('Assistant: Hello back'); + expect(result).toContain('Answer: Hi'); + expect(result).toContain('Question: Hello back'); expect(result).toContain('User: next'); }); }); diff --git a/src/server/core.ts b/src/server/core.ts index d1da2fed..ae823d98 100644 --- a/src/server/core.ts +++ b/src/server/core.ts @@ -3,6 +3,8 @@ import { query } from '@anthropic-ai/claude-agent-sdk'; import { getProject, getActivePath, + getOptionsForTurn, + getTurn, createTurn, updateTurn, advanceHead, @@ -12,6 +14,7 @@ import { type DB, type Project, } from './db.js'; +import { getSystemPrompt, createInterviewMcpServer } from './interview.js'; /** Domain events yielded by conductTurn(). Transport-agnostic. */ export type DomainEvent = @@ -39,13 +42,33 @@ export function extractPrompt(messages: unknown[]): string { ); } +/** Turn with optional options for richer history formatting. */ +export interface TurnWithOptions extends Turn { + options?: Array<{ content: string; is_recommended: boolean; is_selected: boolean }>; +} + /** Format conversation history from active-path turns for multi-turn context. 
*/ -export function formatHistory(turns: Turn[], currentPrompt: string): string { +export function formatHistory(turns: TurnWithOptions[], currentPrompt: string): string { if (turns.length === 0) return currentPrompt; const lines: string[] = []; for (const turn of turns) { - if (turn.answer) lines.push(`User: ${turn.answer}`); - if (turn.question) lines.push(`Assistant: ${turn.question}`); + if (turn.question) { + let questionLine = `Question: ${turn.question}`; + if (turn.why) questionLine += `\n Why it matters: ${turn.why}`; + if (turn.impact) questionLine += `\n Impact: ${turn.impact}`; + if (turn.options?.length) { + const optionList = turn.options + .map((o, i) => { + const rec = o.is_recommended ? ' (recommended)' : ''; + const sel = o.is_selected ? ' [selected]' : ''; + return ` ${i + 1}. ${o.content}${rec}${sel}`; + }) + .join('\n'); + questionLine += `\n Options:\n${optionList}`; + } + lines.push(questionLine); + } + if (turn.answer) lines.push(`Answer: ${turn.answer}`); } if (lines.length === 0) return currentPrompt; return `Previous conversation:\n${lines.join('\n')}\n\n---\nUser: ${currentPrompt}`; @@ -71,14 +94,19 @@ export async function* conductTurn( db: DB, projectId: number, userMessage: string, + phase: Turn['phase'] = 'scope', ): AsyncGenerator { const project = getProject(db, projectId); if (!project) throw new Error(`Project ${projectId} not found`); - const activePath = getActivePath(db, projectId); + const rawActivePath = getActivePath(db, projectId); + const activePath = rawActivePath.map((t) => ({ + ...t, + options: getOptionsForTurn(db, t.id), + })); const turn = createTurn(db, projectId, { parent_turn_id: project.active_turn_id, - phase: 'scope', + phase, question: '', answer: userMessage, }); @@ -89,6 +117,9 @@ export async function* conductTurn( let assistantText = ''; let errored = false; + // Create per-turn MCP server — tool handler persists structured data via closure + const interviewServer = createInterviewMcpServer(db, turn.id); 
+ try { const stream = query({ prompt: fullPrompt, @@ -96,7 +127,8 @@ export async function* conductTurn( model: process.env.ANTHROPIC_MODEL || 'claude-sonnet-4-20250514', maxTurns: 1, includePartialMessages: true, - systemPrompt: 'You are a helpful assistant.', + systemPrompt: getSystemPrompt(phase), + mcpServers: { interview: interviewServer }, }, }); @@ -163,7 +195,9 @@ export async function* conductTurn( } if (!errored) { - if (assistantText) { + // Only persist raw text if no structured question was set via MCP tool handler + const currentTurn = getTurn(db, turn.id); + if (assistantText && (!currentTurn?.question || currentTurn.question === '')) { updateTurn(db, turn.id, { question: assistantText }); } advanceHead(db, projectId, turn.id); diff --git a/src/server/db.ts b/src/server/db.ts index a4974a04..fa90f222 100644 --- a/src/server/db.ts +++ b/src/server/db.ts @@ -58,6 +58,10 @@ export function getProject(db: DB, id: number): Project | undefined { return db.select().from(schema.project).where(eq(schema.project.id, id)).get() as Project | undefined; } +export function getTurn(db: DB, turnId: number): Turn | undefined { + return db.select().from(schema.turn).where(eq(schema.turn.id, turnId)).get() as Turn | undefined; +} + export function createTurn(db: DB, projectId: number, input: CreateTurnInput): Turn { const result = db .insert(schema.turn) @@ -76,11 +80,26 @@ export function createTurn(db: DB, projectId: number, input: CreateTurnInput): T return result as Turn; } -export function updateTurn(db: DB, turnId: number, updates: { question?: string; answer?: string }): void { - if (updates.question === undefined && updates.answer === undefined) return; - const values: Record = {}; +export interface UpdateTurnInput { + question?: string; + answer?: string; + why?: string | null; + impact?: Impact | null; +} + +export function updateTurn(db: DB, turnId: number, updates: UpdateTurnInput): void { + if ( + updates.question === undefined && + updates.answer === 
undefined && + updates.why === undefined && + updates.impact === undefined + ) + return; + const values: Record = {}; if (updates.question !== undefined) values.question = updates.question; if (updates.answer !== undefined) values.answer = updates.answer; + if (updates.why !== undefined) values.why = updates.why; + if (updates.impact !== undefined) values.impact = updates.impact; db.update(schema.turn).set(values).where(eq(schema.turn.id, turnId)).run(); } @@ -119,6 +138,25 @@ export function getActivePath(db: DB, projectId: number): Turn[] { return rows as Turn[]; } +export function getOptionsForTurn(db: DB, turnId: number): Option[] { + return db + .select() + .from(schema.option) + .where(eq(schema.option.turn_id, turnId)) + .orderBy(schema.option.position) + .all() as Option[]; +} + +export function selectOption(db: DB, turnId: number, position: number): void { + // Clear any previous selection for this turn + db.update(schema.option).set({ is_selected: false }).where(eq(schema.option.turn_id, turnId)).run(); + // Select the chosen option + db.update(schema.option) + .set({ is_selected: true }) + .where(sql`${schema.option.turn_id} = ${turnId} AND ${schema.option.position} = ${position}`) + .run(); +} + export function advanceHead(db: DB, projectId: number, turnId: number): void { db.update(schema.project) .set({ active_turn_id: turnId, updated_at: sql`datetime('now')` }) diff --git a/src/server/interview.test.ts b/src/server/interview.test.ts new file mode 100644 index 00000000..05f4b537 --- /dev/null +++ b/src/server/interview.test.ts @@ -0,0 +1,342 @@ +import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; + +import type { DB, Turn } from './db.js'; +import { structuredQuestionSchema, getSystemPrompt, createInterviewMcpServer } from './interview.js'; +import type { StructuredQuestion } from './interview.js'; + +// Mock the Claude Agent SDK — hoisted, so no local variable references in factory +const { mockQuery, mockCreateSdkMcpServer } = 
vi.hoisted(() => ({ + mockQuery: vi.fn(), + mockCreateSdkMcpServer: vi.fn().mockReturnValue({ name: 'interview', instance: {} }), +})); +vi.mock('@anthropic-ai/claude-agent-sdk', () => ({ + query: mockQuery, + createSdkMcpServer: (...args: any[]) => mockCreateSdkMcpServer(...args), + tool: (name: string, desc: string, schema: any, handler: any) => ({ + name, + description: desc, + inputSchema: schema, + handler, + }), +})); + +const { conductTurn, formatHistory } = await import('./core.js'); +const { createDb, getOrCreateProject, createTurn } = await import('./db.js'); + +let db: DB; + +beforeEach(() => { + mockQuery.mockReset(); + mockCreateSdkMcpServer.mockClear(); + db = createDb(); +}); + +afterEach(() => { + db.$client.close(); +}); + +/** Create a mock async generator of SDK messages */ +async function* makeMockStream(messages: Record[]) { + for (const msg of messages) { + yield msg; + } +} + +// --- Acceptance criterion: structured-turn-schema --- + +describe('structuredQuestionSchema', () => { + it('parses a valid structured question', () => { + const valid: StructuredQuestion = { + question: 'What is the primary goal of your project?', + why: 'Understanding the goal shapes all downstream decisions.', + impact: 'high', + options: [ + { content: 'Build a new product from scratch', is_recommended: false }, + { content: 'Improve an existing product', is_recommended: true }, + ], + }; + expect(structuredQuestionSchema.parse(valid)).toEqual(valid); + }); + + it('rejects a question with no options', () => { + const invalid = { + question: 'What?', + why: 'Because.', + impact: 'high', + options: [], + }; + expect(() => structuredQuestionSchema.parse(invalid)).toThrow(); + }); + + it('rejects a question with only one option', () => { + const invalid = { + question: 'What?', + why: 'Because.', + impact: 'high', + options: [{ content: 'Only one', is_recommended: false }], + }; + expect(() => structuredQuestionSchema.parse(invalid)).toThrow(); + }); + + it('rejects a 
question with empty question text', () => { + const invalid = { + question: '', + why: 'Because.', + impact: 'high', + options: [ + { content: 'A', is_recommended: false }, + { content: 'B', is_recommended: false }, + ], + }; + expect(() => structuredQuestionSchema.parse(invalid)).toThrow(); + }); + + it('rejects an invalid impact level', () => { + const invalid = { + question: 'What?', + why: 'Because.', + impact: 'critical', + options: [ + { content: 'A', is_recommended: false }, + { content: 'B', is_recommended: false }, + ], + }; + expect(() => structuredQuestionSchema.parse(invalid)).toThrow(); + }); +}); + +// --- Acceptance criterion: scope-system-prompt --- + +describe('getSystemPrompt', () => { + it('returns a scope-specific system prompt', () => { + const prompt = getSystemPrompt('scope'); + expect(prompt).toContain('scope'); + expect(prompt).not.toBe('You are a helpful assistant.'); + expect(prompt.length).toBeGreaterThan(50); + }); + + it('returns different prompts for different phases', () => { + const scope = getSystemPrompt('scope'); + const design = getSystemPrompt('design'); + expect(scope).not.toBe(design); + }); +}); + +// --- Acceptance criterion: tool definition --- + +describe('createInterviewMcpServer', () => { + it('creates an MCP server with an ask_question tool', () => { + const project = getOrCreateProject(db); + const turn = createTurn(db, project.id, { phase: 'scope', question: '' }); + + createInterviewMcpServer(db, turn.id); + + expect(mockCreateSdkMcpServer).toHaveBeenCalledOnce(); + const opts = mockCreateSdkMcpServer.mock.calls[0][0]; + expect(opts.name).toBe('interview'); + expect(opts.tools).toHaveLength(1); + expect(opts.tools[0].name).toBe('ask_question'); + }); + + it('tool handler persists structured data to the turn', async () => { + const { getOptionsForTurn } = await import('./db.js'); + const project = getOrCreateProject(db); + const turn = createTurn(db, project.id, { phase: 'scope', question: '' }); + + 
createInterviewMcpServer(db, turn.id); + + // Extract and invoke the handler from the mock call + const toolDef = mockCreateSdkMcpServer.mock.calls[0][0].tools[0]; + const result = await toolDef.handler({ + question: 'What is the primary goal?', + why: 'Understanding the goal shapes all downstream decisions.', + impact: 'high', + options: [ + { content: 'Build a new product', is_recommended: false }, + { content: 'Improve an existing product', is_recommended: true }, + ], + }); + + // Verify persistence — read turn directly (HEAD not advanced) + const { eq } = await import('drizzle-orm'); + const { turn: turnTable } = await import('./schema.js'); + const updatedTurn = db.select().from(turnTable).where(eq(turnTable.id, turn.id)).get(); + expect(updatedTurn?.question).toBe('What is the primary goal?'); + expect(updatedTurn?.why).toBe('Understanding the goal shapes all downstream decisions.'); + expect(updatedTurn?.impact).toBe('high'); + + const options = getOptionsForTurn(db, turn.id); + expect(options).toHaveLength(2); + expect(options[0].content).toBe('Build a new product'); + expect(options[1].content).toBe('Improve an existing product'); + expect(options[1].is_recommended).toBe(true); + + // Tool returns a result + expect(result.content[0].text).toBe('Question presented to user.'); + }); +}); + +// --- Acceptance criterion: conductTurn uses interview config --- + +describe('conductTurn with interview config', () => { + function mockMinimalStream() { + return makeMockStream([ + { type: 'stream_event', event: { type: 'message_start', message: { id: 'msg-1' } } }, + { + type: 'stream_event', + event: { + type: 'content_block_delta', + index: 0, + delta: { type: 'text_delta', text: 'Hello' }, + }, + }, + { type: 'stream_event', event: { type: 'message_stop' } }, + ]); + } + + it('passes scope system prompt to SDK query', async () => { + mockQuery.mockReturnValue(mockMinimalStream()); + + const project = getOrCreateProject(db); + for await (const _ of conductTurn(db, 
project.id, 'hello')) { + /* consume */ + } + + expect(mockQuery).toHaveBeenCalledOnce(); + const callArgs = mockQuery.mock.calls[0][0]; + expect(callArgs.options.systemPrompt).toContain('scope'); + expect(callArgs.options.systemPrompt).not.toBe('You are a helpful assistant.'); + }); + + it('passes interview MCP server to SDK query', async () => { + mockQuery.mockReturnValue(mockMinimalStream()); + + const project = getOrCreateProject(db); + for await (const _ of conductTurn(db, project.id, 'hello')) { + /* consume */ + } + + expect(mockQuery).toHaveBeenCalledOnce(); + const callArgs = mockQuery.mock.calls[0][0]; + expect(callArgs.options.mcpServers).toBeDefined(); + expect(callArgs.options.mcpServers.interview).toBeDefined(); + }); +}); + +// --- Acceptance criterion: history-includes-structure --- + +describe('formatHistory with structured turns', () => { + it('includes grounding and impact in history', () => { + const turns = [ + { + question: 'What is the primary goal?', + answer: 'Build a new product', + why: 'Understanding the goal shapes all downstream decisions.', + impact: 'high', + }, + ] as Turn[]; + const result = formatHistory(turns, 'next question'); + expect(result).toContain('Build a new product'); + expect(result).toContain('What is the primary goal?'); + expect(result).toContain('Understanding the goal'); + expect(result).toContain('Impact: high'); + }); + + it('includes options with recommendation and selection markers', () => { + const turns = [ + { + question: 'What is the primary goal?', + answer: 'Build a new product', + why: 'Shapes downstream decisions.', + impact: 'high', + options: [ + { content: 'Build a new product', is_recommended: false, is_selected: true }, + { content: 'Improve an existing product', is_recommended: true, is_selected: false }, + ], + }, + ] as any[]; + const result = formatHistory(turns, 'next'); + expect(result).toContain('Build a new product'); + expect(result).toContain('[selected]'); + 
expect(result).toContain('(recommended)'); + }); +}); + +// --- Round-trip oracle: structured turn → persist → active path --- + +describe('round-trip: structured turn persistence', () => { + it('persisted structured turn is retrievable via active path', async () => { + const { createOption, getOptionsForTurn, advanceHead: advance } = await import('./db.js'); + const project = getOrCreateProject(db); + const turn = createTurn(db, project.id, { + phase: 'scope', + question: 'What is the primary goal?', + why: 'Understanding the goal shapes all downstream decisions.', + impact: 'high', + answer: 'Build a new product', + }); + createOption(db, turn.id, { + position: 0, + content: 'Build a new product', + is_recommended: false, + is_selected: true, + }); + createOption(db, turn.id, { position: 1, content: 'Improve existing', is_recommended: true }); + advance(db, project.id, turn.id); + + const { getActivePath } = await import('./db.js'); + const turns = getActivePath(db, project.id); + expect(turns).toHaveLength(1); + expect(turns[0].question).toBe('What is the primary goal?'); + expect(turns[0].why).toBe('Understanding the goal shapes all downstream decisions.'); + expect(turns[0].impact).toBe('high'); + expect(turns[0].phase).toBe('scope'); + expect(turns[0].answer).toBe('Build a new product'); + + const options = getOptionsForTurn(db, turns[0].id); + expect(options).toHaveLength(2); + expect(options[0].is_selected).toBe(true); + expect(options[1].is_recommended).toBe(true); + }); +}); + +// --- Acceptance criterion: option-selection (DB layer) --- + +describe('option selection persistence', () => { + it('getOptionsForTurn returns options for a turn', async () => { + const { createOption, getOptionsForTurn } = await import('./db.js'); + const project = getOrCreateProject(db); + const turn = createTurn(db, project.id, { + phase: 'scope', + question: 'What?', + answer: 'Something', + }); + + createOption(db, turn.id, { position: 0, content: 'Option A', is_recommended: 
true }); + createOption(db, turn.id, { position: 1, content: 'Option B' }); + + const options = getOptionsForTurn(db, turn.id); + expect(options).toHaveLength(2); + expect(options[0].position).toBe(0); + expect(options[1].position).toBe(1); + }); + + it('selectOption marks an option as selected', async () => { + const { createOption, selectOption, getOptionsForTurn } = await import('./db.js'); + const project = getOrCreateProject(db); + const turn = createTurn(db, project.id, { + phase: 'scope', + question: 'What?', + answer: 'Something', + }); + + createOption(db, turn.id, { position: 0, content: 'Option A' }); + createOption(db, turn.id, { position: 1, content: 'Option B' }); + + selectOption(db, turn.id, 1); + + const options = getOptionsForTurn(db, turn.id); + expect(options[0].is_selected).toBe(false); + expect(options[1].is_selected).toBe(true); + }); +}); diff --git a/src/server/interview.ts b/src/server/interview.ts new file mode 100644 index 00000000..eb3b9a1d --- /dev/null +++ b/src/server/interview.ts @@ -0,0 +1,105 @@ +import { createSdkMcpServer, tool } from '@anthropic-ai/claude-agent-sdk'; +/** + * Interview module — structured question schema, phase prompts, and MCP tool server. + * + * Pure domain: structuredQuestionSchema, getSystemPrompt, SYSTEM_PROMPTS. + * Shell boundary: createInterviewMcpServer — the tool handler captures db + turnId + * via closure and persists structured data when the agent uses ask_question. + */ +import { z } from 'zod'; + +import { createOption, updateTurn, type DB, type Impact, type Phase } from './db.js'; + +/** Zod schema for the ask_question tool output. 
*/ +export const structuredQuestionSchema = z.object({ + question: z.string().min(1), + why: z.string().min(1), + impact: z.enum(['high', 'medium', 'low']), + options: z + .array( + z.object({ + content: z.string().min(1), + is_recommended: z.boolean(), + }), + ) + .min(2), +}); + +export type StructuredQuestion = z.infer<typeof structuredQuestionSchema>; + +const SYSTEM_PROMPTS: Record<Phase, string> = { + scope: `You are a spec elicitation interviewer conducting the SCOPE phase. + +Your job is to understand the user's project goal, target audience, and high-level constraints through structured questions. Work from broad framing questions toward specific scope boundaries. + +For every turn, you MUST use the ask_question tool to generate your question. Never respond with plain text — always use the tool. + +Each question should: +- Be clear and specific, not vague or open-ended +- Include 2-4 options that represent meaningfully different directions +- Mark exactly one option as recommended based on what you know so far +- Include a "why" field explaining why this question matters for the spec +- Include an impact level (high/medium/low) reflecting how much this decision affects downstream choices + +Ask one question at a time. Build on previous answers to go deeper.`, + + design: `You are a spec elicitation interviewer conducting the DESIGN phase. + +Your job is to walk the design decision tree — exploring architectural choices, module boundaries, data models, and integration points. Each question drills into a branch of the design space. + +For every turn, you MUST use the ask_question tool. Never respond with plain text. + +Each question should present meaningfully different design alternatives with clear tradeoffs in the options.`, + + requirements: `You are a spec elicitation interviewer conducting the REQUIREMENTS REVIEW phase. + +Your job is to walk the accumulated requirements, check for gaps, suggest additions, and confirm completeness. 
Present requirements for the user to confirm, modify, or flag as missing. + +For every turn, you MUST use the ask_question tool. Never respond with plain text.`, + + criteria: `You are a spec elicitation interviewer conducting the CRITERIA phase. + +Your job is to propose testable acceptance criteria for each confirmed requirement. Criteria should be specific, observable, and verifiable. + +For every turn, you MUST use the ask_question tool. Never respond with plain text.`, +}; + +/** Phase-specific system prompts. */ +export function getSystemPrompt(phase: Phase): string { + return SYSTEM_PROMPTS[phase]; +} + +/** + * Create an in-process MCP server with the ask_question tool. + * The tool handler persists structured data to the given turn. + */ +export function createInterviewMcpServer(db: DB, turnId: number) { + return createSdkMcpServer({ + name: 'interview', + tools: [ + tool( + 'ask_question', + 'Ask the user a structured interview question with options, strategic grounding, and impact signal.', + structuredQuestionSchema.shape, + async (args) => { + // Persist structured data to the turn + updateTurn(db, turnId, { + question: args.question, + why: args.why, + impact: args.impact as Impact, + }); + for (let i = 0; i < args.options.length; i++) { + createOption(db, turnId, { + position: i, + content: args.options[i].content, + is_recommended: args.options[i].is_recommended, + }); + } + return { + content: [{ type: 'text' as const, text: 'Question presented to user.' }], + }; + }, + ), + ], + }); +}
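
The per-turn closure in `createInterviewMcpServer` and the raw-text fallback in `conductTurn` interact in a way that is easier to see in isolation. The sketch below is a simplified model under stated assumptions: the in-memory `Store`, `makeAskQuestionHandler`, and `persistFallbackText` are hypothetical stand-ins for the Drizzle-backed `updateTurn`/`createOption` calls and are not part of the diff.

```typescript
// In-memory stand-ins for the turn and option tables (hypothetical; the diff uses Drizzle + SQLite).
type TurnRow = { id: number; question: string; why: string | null; impact: string | null };
type OptionRow = { turnId: number; position: number; content: string; isRecommended: boolean };
type Store = { turns: TurnRow[]; options: OptionRow[] };

type AskQuestionArgs = {
  question: string;
  why: string;
  impact: 'high' | 'medium' | 'low';
  options: { content: string; is_recommended: boolean }[];
};

/** Build a handler bound to one turn; the closure fixes which row it may write. */
function makeAskQuestionHandler(store: Store, turnId: number) {
  return function handle(args: AskQuestionArgs): string {
    const turn = store.turns.filter((t) => t.id === turnId)[0];
    if (!turn) throw new Error('Turn ' + turnId + ' not found');
    turn.question = args.question;
    turn.why = args.why;
    turn.impact = args.impact;
    // Options keep the order the agent supplied, recorded as `position`.
    args.options.forEach((o, i) => {
      store.options.push({
        turnId,
        position: i,
        content: o.content,
        isRecommended: o.is_recommended,
      });
    });
    return 'Question presented to user.';
  };
}

/** Fallback: keep raw assistant text only when the tool handler set no question. */
function persistFallbackText(store: Store, turnId: number, assistantText: string): void {
  const turn = store.turns.filter((t) => t.id === turnId)[0];
  if (turn && assistantText && !turn.question) turn.question = assistantText;
}
```

Binding the turn ID at construction time means the agent never supplies a row identifier, so a tool call cannot write to a different turn. The fallback then only fills `question` for turns where the agent answered in plain text instead of calling the tool.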