diff --git a/docs/design/AGENT_MUTATION_SURFACE.md b/docs/design/AGENT_MUTATION_SURFACE.md new file mode 100644 index 00000000..f79e1b30 --- /dev/null +++ b/docs/design/AGENT_MUTATION_SURFACE.md @@ -0,0 +1,243 @@ +# Agent Mutation Surface Audit + +Status: FE-698 design audit, 2026-05-07. + +## Purpose + +Requirement 42 and D143 establish a hard boundary: durable Brunch data mutations initiated by agents must enter through Brunch-owned handlers, not direct ORM access or harness-specific tool implementations. This document inventories the current mutation paths that are agent-originated or agent-adjacent, names the semantic operations behind them, and identifies holes before implementing an agent capability / mutation-surface registry. + +This is a boundary map, not a registry implementation. + +## Terms used here + +- **Agent-originated**: an LLM/tool loop chooses content or an action that causes a durable write. +- **Agent-adjacent**: a user action, route, or runtime step persists agent-produced artifacts or operations intended to become agent-addressable later. +- **Authority class**: + - `read_only`: no durable mutation. + - `provisional_artifact`: durable or replayed context that is not accepted graph truth. + - `proposal_only`: model/user proposes a change, but separate acceptance owns truth. + - `commit_truth`: writes durable semantic or workflow truth. + - `commit_process_debt`: writes obligations such as reconciliation needs. + - `runtime_replay`: writes replay/status artifacts tied to an existing durable unit. +- **Boundary quality**: + - `strong`: named application handler/transition owns validation and write semantics. + - `mixed`: some semantic grouping exists, but DB helpers remain exposed at agent/tool call sites. + - `thin`: route or agent code directly orchestrates DB helper calls. + - `missing`: projected capability has no handler yet. 
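The authority classes and boundary qualities defined above could be carried in code as closed string unions. The TypeScript sketch below is illustrative only: `InventoryRow` and `needsHandlerExtraction` are hypothetical names that do not appear in the audited code, and the rule that every row which is neither `read_only` nor `strong` is a wrapping candidate is an assumption drawn from this audit's notes, not an established policy.

```typescript
// Authority classes exactly as listed under "Terms used here".
const AUTHORITY_CLASSES = [
  "read_only",
  "provisional_artifact",
  "proposal_only",
  "commit_truth",
  "commit_process_debt",
  "runtime_replay",
] as const;
type AuthorityClass = (typeof AUTHORITY_CLASSES)[number];

// Boundary qualities exactly as listed under "Terms used here".
const BOUNDARY_QUALITIES = ["strong", "mixed", "thin", "missing"] as const;
type BoundaryQuality = (typeof BOUNDARY_QUALITIES)[number];

// Hypothetical shape for one inventory row; not an existing Brunch type.
interface InventoryRow {
  area: string;
  authority: AuthorityClass;
  boundary: BoundaryQuality;
}

// Assumed rule: a path that is not read-only and is not owned by a
// named handler is a candidate for handler extraction.
function needsHandlerExtraction(row: InventoryRow): boolean {
  return row.authority !== "read_only" && row.boundary !== "strong";
}

// Two rows whose values are copied from the inventory table in this audit.
const observerCapture: InventoryRow = {
  area: "Observer capture",
  authority: "commit_truth",
  boundary: "mixed",
};
const closureConfirmation: InventoryRow = {
  area: "Phase closure confirmation",
  authority: "commit_truth",
  boundary: "strong",
};
```

Under this assumed rule, `observerCapture` is flagged and `closureConfirmation` is not, which matches the audit's own notes on those two rows.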
+ +## Current mutation inventory + +| Area | Current entry points | Initiator | Tables touched | Semantic operation | Authority | Boundary quality | Notes | +| -------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Specification creation | `createNewSpecification()` in `src/server/core.ts`; `POST /api/specifications` in `src/server/app.ts`; `createSpecification()` in `src/server/db.ts` | user/system | `specification` | Create specification workspace record | `commit_truth` | `strong` | Not agent-originated today, but future CLI/TUI harnesses may need it. Keep as a product handler, not a raw DB tool. | +| Phase entry / projected start | `submitPhaseIntentWithRuntimeCompatibility()` via `POST /api/specifications/:id/phase-intent`; chat route `phase-entry` command via `applyChatRouteTransition()` | user/system, future harness | `turn`, `specification.active_turn_id` after finalization | Start or continue a workflow phase by creating a successor frontier turn | `commit_truth` | `strong` | Good candidate for a future mutation-surface contract because route code delegates to runtime/transition helpers. 
| +| Chat continuation / answering frontier | `applyChatRouteTransition()` in `src/server/chat-route-transition.ts`; `prepareTurn()`, `resolveTurn()`, `prepareSuccessorTurn()`, `finalizeTurn()` in `src/server/core.ts` | user message plus interviewer runtime | `turn`, `specification.active_turn_id`, possibly `phase_outcome` supersession | Resolve current turn, advance interview head, create next frontier | `commit_truth` | `strong` | This is the main workflow-write seam today. Future agents should call a handler with chat-command semantics, not `createTurn()` / `advanceHead()` directly. | +| Interviewer question persistence | AI SDK tool `ask_question` from `createAskQuestionTool()`; `persistStructuredQuestion()` in `src/server/interview.ts` | internal interviewer agent | `turn`, `option` | Populate prepared assistant turn with question, rationale, impact, options, and review metadata | `commit_truth` / `proposal_only` for review-set content until accepted | `mixed` | Agent tool execution directly calls persistence helper. Semantics are named, but this should become a Brunch-owned mutation handler before exposing interviewer-like tools to external harnesses. | +| Interviewer preface presentation | AI SDK tool `present_preface`; `materializeTurnArtifacts()` in `src/server/turn-artifacts.ts`; `updateTurn()` in `app.ts` on stream finish | internal interviewer agent | `turn.assistant_parts` | Persist provisional context/preface and activity artifacts for replay | `provisional_artifact` / `runtime_replay` | `mixed` | Tool itself returns success only; durable write happens later from response artifacts. Future contract should preserve the rule that prefaces do not directly mutate graph truth. 
| +| Phase closure proposal | AI SDK tool `propose_phase_closure`; `createPhaseOutcome()` in `src/server/interview.ts` | internal interviewer agent | `phase_outcome` | Propose closing grounding/design for user confirmation | `proposal_only` | `mixed` | Current tool writes proposed workflow state directly. Future mutation surface should expose `phase.proposeClosure`, with confirmation as a separate handler. | +| Phase closure confirmation | `applyChatRouteTransition()` confirm branch; `confirmPhaseOutcome()`, `finalizeTurn()` | user, future harness | `turn`, `phase_outcome`, `specification.active_turn_id` | Accept interviewer-proposed phase closure | `commit_truth` | `strong` | Already a coherent transition handler. Good model for future agent mutation handlers. | +| Forced phase closure | `applyChatRouteTransition()` force-close branch; `createConfirmedPhaseOutcome()` | user, future harness | `turn`, `phase_outcome`, `specification.active_turn_id` | Close phase without interviewer recommendation | `commit_truth` | `strong` | User-authority only today. External agents should not get this by default; if exposed, authority class must remain explicit. | +| Structured response submission | `submitTurnResponseTransition()` via `POST /turns/:turnId/response` | user, future harness | `option`, `specification.mode`, `turn`, possibly `knowledge_item`, `turn_knowledge_item`, `knowledge_edge`, `phase_outcome` | Persist selected options/free text, grounding mode, and accepted review decisions | `commit_truth` | `strong` | Good existing product handler. It also materializes review truth on accept, so future tools should not bypass it by writing accepted requirements/criteria directly. 
| +| Requirements/criteria review materialization | `materializeAcceptedRequirementsReviewSet()`, `materializeAcceptedCriteriaReviewSet()` from `submitTurnResponseTransition()` | user acceptance of agent-generated review set | `knowledge_item`, `turn_knowledge_item`, `knowledge_edge`, `phase_outcome` | Convert accepted review set into durable requirements/criteria and grounding edges | `commit_truth` | `strong` when reached through response transition; `thin` if called directly | The semantic operation is acceptance-gated materialization. Future agents may propose review sets but must not commit them without acceptance. | +| Observer capture | `runObserver()` via `ensureObserverCapture()` / `POST /observer-capture` and trailing runtime capture | internal observer agent | `knowledge_item`, `turn_knowledge_item`, `knowledge_edge` | Extract intent items and supported intent edges from validated turns | `commit_truth` for captured intent-graph truth | `mixed` | Agent runtime directly creates intent items and edges through DB helpers (`knowledge_item` / `knowledge_edge` today). This is the most important current agent-originated write surface to wrap before external harness access. | +| Observer result attachment / replay | `ensureObserverCapture()` in `src/server/app.ts`; observer result data parts on originating turn | internal observer runtime | `turn.assistant_parts` plus graph tables via `runObserver()` | Attach observer status/results to originating turn for replay | `runtime_replay` plus `commit_truth` | `mixed` | Needs separation between the graph mutation handler and the replay/status handler. The current endpoint deduplicates runtime execution but is not yet a general mutation contract. 
| +| Intent item edit | `handlePatchKnowledgeItem()` in `src/server/edit-route.ts`; `updateKnowledgeItemContent()` | user graph edit today; future agent proposal/commit | `knowledge_item` | Edit accepted intent-item content after impact classification | `commit_truth` when soft; `proposal_only` when hard impact | `thin` | Route handler owns policy directly. Future agents must not call `updateKnowledgeItemContent()`; this should become a named mutation handler with reconciliation semantics. | +| Knowledge edge create/delete | `handleCreateKnowledgeEdge()`, `handleDeleteKnowledgeEdge()`; `addKnowledgeRelationship()`, `removeKnowledgeRelationship()` | user graph edit today; observer agent for create only | `knowledge_edge` | Add or remove semantic relationship | `commit_truth` | `thin` for route; `mixed` for observer | Edge writes need a single semantic handler that applies relation policy, provenance, support/status, and future reconciliation behavior. | +| Edge validation | `handleValidateKnowledgeEdge()` | user/UI, future agent/harness | none | Check relation policy before edge mutation | `read_only` | `strong enough` | Should become a read-only capability contract available to probes/harnesses. | +| Annotation create/delete | `handleCreateAnnotation()`, `handleDeleteAnnotation()` in `src/server/annotation-route.ts` | user side-chat/selection surface today; future agent notes possible | `annotation` | Attach or remove human annotation anchored to intent item/span | `commit_truth` but commentary, not intent-graph truth | `thin` | User-authored today. If agents can annotate, authority should likely be `proposal_only` or visibly agent-authored. | +| Side-chat response | `handleSideChatRequest()` in `src/server/side-chat-route.ts` | side-chat assistant agent | none durable today | Generate refinement discussion around pinned graph item | `read_only` / non-durable | `strong enough` | It does not persist chat messages today. 
Future multi-chat substrate will convert this into durable chat turns and likely graph proposals. | +| Workspace exploration tools | `src/server/tools/*` via interviewer tool set | internal interviewer agent | none durable directly | Read files, grep, find, list directory, optionally present preface | `read_only` plus provisional preface artifact | `strong enough` for read-only | These are harness-like tools already. They should be adapted from read-only capability contracts if exposed to CLI/TUI/Pi. | +| Scenario runner artifacts | `src/server/scenario-runner.ts` | developer/probe harness | none durable today | Capture rendered prompt/context/model/output placeholders | `read_only` / artifact outside product state | `strong enough` | Future artifact persistence should use a schema and remain outside product truth unless explicitly imported. | + +## Functional set vs semantic set + +Current code exposes many low-level DB helpers (`createTurn`, `updateTurn`, `advanceHead`, `createKnowledgeItem`, `addKnowledgeRelationship`, etc.). These are functional primitives, not agent-addressable operations. The mutation surface should instead expose semantic handlers such as: + +| Semantic operation | Current functional primitives | Notes for future handler | +| ----------------------------------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------------------------------- | +| `workflow.startPhase` | `prepareSuccessorTurn`, `createTurn` | Must check landing/runtime availability and active path. | +| `workflow.answerFrontier` | `resolveTurn`, `finalizeTurn`, `prepareSuccessorTurn` | Must preserve turn lineage and observer-capture scheduling. | +| `interviewer.persistQuestion` | `persistStructuredQuestion`, `createOption`, `updateTurn` | Agent-originated but production-internal today. 
| +| `workflow.proposePhaseClosure` | `createPhaseOutcome` | Proposal-only; separate from confirmation. | +| `workflow.confirmPhaseClosure` | `confirmPhaseOutcome`, `finalizeTurn` | User/harness authority gate. | +| `review.submitResponse` | `submitTurnResponseTransition` | Already a good handler; accepts review sets only through user action. | +| `observer.captureTurnIntent` | `runObserver`, `createKnowledgeItem`, relationship helpers | Should split model execution from intent-graph write application eventually. | +| `changeset.submit` with `intentItem.updateContent` | `handlePatchKnowledgeItem`, `updateKnowledgeItemContent` | Needs reconciliation/changeset-ledger integration before external agent writes. | +| `changeset.submit` with `intentEdge.create` / `intentEdge.delete` | edge route handlers, relationship helpers | Needs relation support/status/provenance semantics. | +| `annotation.create` / `annotation.delete` | annotation route handlers | Not core graph truth, but still durable state. | +| `workspace.read*` | `src/server/tools/*` | Read-only capability family; useful first adapter target. | + +## Holes and pressure points + +1. **Observer graph writes are agent-originated and still DB-helper-shaped.** `runObserver()` directly creates items and edges. Before external or scenario-driven agents can write graph truth, this should become an application handler that accepts validated observer output and applies graph mutations with provenance and relation policy. + +2. **Interviewer tools are already tools, but not Brunch capability contracts.** `ask_question` and `propose_phase_closure` are AI SDK tools whose `execute` functions write durable state. This is acceptable internally, but external harnesses should not copy those tool definitions; they should adapt Brunch-owned handlers. + +3. 
**Graph edit routes are product handlers but not agent-safe mutation contracts.** They validate IDs and relation support, but they do not yet create reconciliation needs, changeset history, support/status metadata, or agent provenance. They should be considered UI handlers awaiting migration into a mutation surface. + +4. **Review acceptance is a strong existing pattern.** `submitTurnResponseTransition()` shows the desired shape: one semantic handler validates a user action and materializes durable truth. Future proposal-generating agents should feed this kind of acceptance path rather than directly creating requirements/criteria. + +5. **Read-only tools are safe first registry candidates.** Workspace read/grep/find/list and relation validation can prove registry/adapters without mutation authority risk. + +6. **No durable side-chat substrate yet.** Side-chat is currently SSE-only for assistant output. Multi-chat will create new durable chat mutation needs; those should be designed through the same mutation surface rather than as side-chat-specific tools. + +7. **Scenario runner has no tool/capability inventory.** Probe artifacts should eventually record available capability ids and authority classes even when execution is not run, so prompt reviews can see what the agent was allowed to do. + +## Operation nomenclature + +The candidate capability registry should use product-operation names, not implementation-lineage names. Current functions such as `createAskQuestionTool`, `applyChatRouteTransition`, and `submitTurnResponseTransition` describe how the code got here: AI SDK tool creation, Express chat-route plumbing, or transition helper extraction. Canonical capability ids should instead name the durable product noun being acted on and the semantic verb being requested. 
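The noun-plus-verb naming rule above can be checked mechanically. The following TypeScript sketch assumes ids are a lowerCamelCase noun, a dot, and a lowerCamelCase verb, as in the dotted ids used throughout this audit. `parseOperationId`, `SEGMENT`, and `LINEAGE_TO_CANONICAL` are hypothetical names introduced for illustration; the lineage-to-canonical pairs are taken from the current-to-target name map in this document.

```typescript
interface ParsedOperationId {
  noun: string;
  verb: string;
}

// Assumed segment shape: starts lowercase, letters only (lowerCamelCase).
const SEGMENT = /^[a-z][A-Za-z]*$/;

// Split a candidate id into noun and verb, or return null when it does
// not have exactly two well-formed, dot-separated segments.
function parseOperationId(id: string): ParsedOperationId | null {
  const parts = id.split(".");
  if (parts.length !== 2) return null;
  const [noun, verb] = parts as [string, string];
  if (!SEGMENT.test(noun) || !SEGMENT.test(verb)) return null;
  return { noun, verb };
}

// Implementation-lineage names mapped to the canonical ids this
// document proposes for them.
const LINEAGE_TO_CANONICAL: Record<string, string> = {
  createAskQuestionTool: "turn.attachQuestion",
  persistStructuredQuestion: "turn.attachQuestion",
  createProposePhaseClosureTool: "phase.proposeClosure",
};
```

A lineage name such as `createAskQuestionTool` fails to parse because it has no noun/verb split, while its canonical counterpart `turn.attachQuestion` parses into `turn` and `attachQuestion`.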
+ +### Canonical nouns + +Use these nouns for operation ids and handler names unless a later spec decision renames the underlying product entity: + +- `specification` — the workspace-scoped intent-spec container. +- `chat` — a durable conversation container below a specification once multi-chat lands. +- `turn` — a branch-bearing conversational lineage node. +- `phase` — workflow phase state and phase-outcome decisions. +- `intentItem` — a durable typed claim in the intent graph. Current storage is `knowledge_item`; new operation vocabulary should not inherit that table name. +- `intentEdge` — a durable semantic relation in the intent graph. Current storage is `knowledge_edge`. +- `reviewSet` — interviewer-generated requirements/criteria set awaiting user action. +- `annotation` — durable commentary anchored to an intent item/span. +- `changeset` — one semantic mutation bundle in the future changeset ledger. +- `change` — one atomic semantic mutation inside a changeset. +- `reconciliationNeed` — process debt saying existing truth may need renewed judgment. +- `workspace` — read-only project filesystem context. +- `scenario` — pre-UI prompt/context probe execution or artifact capture. + +Use `changeset` / `change` as canonical future schema and operation vocabulary. `patch` remains a historical design-doc synonym only. + +### Canonical verbs + +Use verbs by authority level: + +| Authority level | Preferred verbs | Notes | +| --- | --- | --- | +| Read-only | `get`, `list`, `query`, `render`, `validate` | No durable mutation. | +| Provisional/generated | `draft`, `propose`, `capture`, `render` | Produces candidate or replayable artifact, not accepted truth by itself. | +| User/handler submission | `submit` | Entry point for a caller request that may validate, route, or produce a proposal. | +| Durable transition | `apply`, `accept`, `reject`, `supersede`, `resolve`, `close`, `advance` | Changes durable product truth or process state. 
| +| Persistence primitive | `insert`, `update`, `delete` | Keep inside DB/repository helpers; do not expose as agent capability verbs. | + +Rule of thumb: agent-addressable operations should almost never be named `create`, `update`, or `delete`. Those are persistence verbs. Capability ids should name product intent: `turn.answer`, `phase.proposeClosure`, `reviewSet.accept`, `changeset.submit`. + +### Operation id grammar + +Use dotted ids: + +```text +<noun>.<verb> +``` + +Examples: + +```text +specification.create +specification.get +specification.export +chat.start +chat.submitMessage +turn.answer +turn.attachQuestion +turn.attachArtifact +phase.proposeClosure +phase.confirmClosure +phase.forceClose +reviewSet.submitResponse +reviewSet.accept +observer.captureTurnIntent +observer.applyCapture +intentGraph.query +intentGraph.validateEdge +changeset.submit +changeset.apply +changeset.reject +reconciliationNeed.list +reconciliationNeed.proposeResolution +reconciliationNeed.applyResolution +workspace.readFile +workspace.search +scenario.render +scenario.captureArtifact +``` + +Adapter-specific tool names may differ to satisfy AI SDK, Pi, CLI/TUI, or external-agent conventions, but those names are projections over canonical Brunch operation ids. + +### Changeset-centered graph mutation design + +Most future intent-graph mutations should not become separate top-level tools. 
Instead, they should become `change.kind` variants submitted through a small number of changeset operations: + +```text +changeset.submit +changeset.apply +changeset.reject +changeset.listPending +``` + +Candidate `change.kind` values: + +```text +intentItem.create +intentItem.updateContent +intentItem.retire +intentEdge.create +intentEdge.delete +annotation.create +annotation.delete +reconciliationNeed.create +reconciliationNeed.resolve +``` + +A future changeset payload should carry origin (`user`, `internal-agent`, `external-agent`), harness (`ui`, `cli`, `pi`, `scenario-runner`), provenance (`turnId`, `chatId`, or prior `changesetId`), purpose (`graph-edit`, `observer-capture`, `architect-proposal`, `reconciliation`), and one or more atomic changes. This lets architect proposals, graph edits, reconciliation resolutions, and external-agent edits share one semantic mutation entry point while preserving user/HITL acceptance where required. + +Conversational/workflow operations should remain explicit rather than being forced into changesets. `turn.answer`, `phase.proposeClosure`, and `reviewSet.accept` manipulate lineage, workflow, and replay state; their side effects may eventually create changesets, but the requested operation is still workflow/turn/review-domain behavior. + +### Current-to-target name map + +| Current name | Target operation vocabulary | +| --- | --- | +| `createAskQuestionTool` | `turn.attachQuestion` as the handler; AI SDK `ask_question` as an adapter tool. | +| `persistStructuredQuestion` | `turn.attachQuestion`. | +| `createProposePhaseClosureTool` | `phase.proposeClosure`. | +| `applyChatRouteTransition` | Split across `chat.submitMessage`, `turn.answer`, `phase.confirmClosure`, and `phase.forceClose`. | +| `submitTurnResponseTransition` | `turn.submitResponse`; review-specific branches become `reviewSet.submitResponse` / `reviewSet.accept`. 
| +| `materializeAcceptedRequirementsReviewSet` / `materializeAcceptedCriteriaReviewSet` | `reviewSet.accept`. | +| `runObserver` | Split into `observer.captureTurnIntent` for model execution and `observer.applyCapture` for durable graph writes. | +| `handlePatchKnowledgeItem` | `changeset.submit` / `changeset.apply` with `intentItem.updateContent`. | +| `handleCreateKnowledgeEdge` / `handleDeleteKnowledgeEdge` | `changeset.submit` / `changeset.apply` with `intentEdge.create` / `intentEdge.delete`. | +| `handleCreateAnnotation` / `handleDeleteAnnotation` | `annotation.create` / `annotation.delete`, or changeset variants if annotations join the ledger. | +| `createExplorationTools` | `workspace.*` read-only capabilities adapted as tools. | + +## Projected future capability holes + +| Future scenario | Needed capability contracts | Authority concerns | +| ------------------------------ | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | +| CLI/TUI harness driving Brunch | create/list specs, start phase, answer frontier, read graph, maybe export | Must use workflow handlers; no ORM access. Mutations should be user-commanded or explicit. | +| Pi harness prompt probes | read graph/context packs, workspace read tools, scenario artifact capture | Keep Pi adapter read-only/proposal-only until mutation surface exists. | +| Web research probe | web search/fetch, attach provisional research preface, propose intent items/sources | Research output should be provisional until accepted/observed; avoid direct graph writes initially. | +| Behavioral kernels | read graph neighborhoods, propose disambiguating questions/examples/invariants | Proposal-only until ontology/checkability handlers exist. 
| +| Architect proposals | read graph, propose changesets, create reconciliation needs | Must wait for changeset-ledger/reconciliation semantics before committing truth. | +| Reconciliation review | list needs, propose resolution, apply accepted resolution | Requires process-debt handlers and user/HITL acceptance boundary. | +| External agent graph edit | edit intent item, add intent edge, retire intent item, create example/invariant | Needs mutation handlers with provenance, support/status, reconciliation, and changeset history. | + +## Recommended next slices + +1. **Agent capability registry skeleton** — Define stable ids, descriptions, input/output schemas, authority classes, and adapter-neutral metadata. Seed it with read-only capabilities only (`workspace.readFile`-style contracts if reused, relation validation, graph read projections) plus non-executable placeholders for mutating handlers discovered here. + +2. **Observer graph mutation handler extraction** — Split `runObserver()` into model execution and `applyObserverCaptureOutput()` so the graph write operation is named, testable, provenance-aware, and eventually reusable by scenario/harness adapters. + +3. **Interviewer tool handler extraction** — Move `persistStructuredQuestion()` / `createPhaseOutcome()` tool execution behind Brunch-owned handlers, then make AI SDK tools adapters over those handlers. + +4. **Graph edit mutation surface design** — Before exposing graph edit tools to agents, align intent-item / intent-edge edit handlers with reconciliation needs and changeset-ledger direction. + +5. **Scenario artifact capability inventory** — Extend no-provider probe artifacts to record which capability ids and authority classes were available for a run, without executing them yet. 
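The first slice above (a registry seeded with read-only capabilities plus non-executable placeholders) might look roughly like the following TypeScript sketch. `CapabilityContract` and `CapabilityRegistry` are hypothetical names, the descriptions are placeholder text, and input/output schemas are omitted to keep the example dependency-free. The rule that only `read_only` entries may be executable is an assumption based on the slice description, not existing behavior.

```typescript
type AuthorityClass =
  | "read_only"
  | "provisional_artifact"
  | "proposal_only"
  | "commit_truth"
  | "commit_process_debt"
  | "runtime_replay";

// Hypothetical adapter-neutral contract; schemas are intentionally left out.
interface CapabilityContract {
  id: string;
  description: string;
  authority: AuthorityClass;
  executable: boolean;
}

class CapabilityRegistry {
  private readonly contracts = new Map<string, CapabilityContract>();

  register(contract: CapabilityContract): void {
    if (this.contracts.has(contract.id)) {
      throw new Error(`duplicate capability id: ${contract.id}`);
    }
    // Assumed first-slice rule: anything beyond read-only is a placeholder.
    if (contract.executable && contract.authority !== "read_only") {
      throw new Error(`${contract.id} must be a non-executable placeholder`);
    }
    this.contracts.set(contract.id, contract);
  }

  get(id: string): CapabilityContract | undefined {
    return this.contracts.get(id);
  }

  // List all contracts, optionally filtered by authority class.
  list(authority?: AuthorityClass): CapabilityContract[] {
    const all = Array.from(this.contracts.values());
    return authority === undefined ? all : all.filter((c) => c.authority === authority);
  }
}

// Seed data: two read-only capabilities and one mutating placeholder.
const registry = new CapabilityRegistry();
registry.register({ id: "workspace.readFile", description: "Read a workspace file", authority: "read_only", executable: true });
registry.register({ id: "intentGraph.validateEdge", description: "Check relation policy", authority: "read_only", executable: true });
registry.register({ id: "observer.applyCapture", description: "Apply observer output", authority: "commit_truth", executable: false });
```

Keeping mutating entries registered but non-executable lets scenario artifacts record what an agent could eventually do without granting any write path in the first slice.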
+ +## Verification notes + +Code-search cross-checks used for this audit: + +- DB mutation helpers: `rg "export function (create|add|update|delete|set|link|record|advance|apply|insert|save|start|complete)" src/server src/shared`. +- ORM write calls: `rg "insert\\(|update\\(|delete\\(" src/server src/shared`. +- Agent/runtime seams: `runObserver`, `createAskQuestionTool`, `createProposePhaseClosureTool`, `applyChatRouteTransition`, `submitTurnResponseTransition`, side-chat and edit route handlers. + +The inventory should be refreshed after the multi-chat/reconciliation substrate lands, because chat containers and `reconciliation_need` rows will add new write families. diff --git a/memory/PLAN.md b/memory/PLAN.md index dd8df51c..e7a4f095 100644 --- a/memory/PLAN.md +++ b/memory/PLAN.md @@ -4,17 +4,17 @@ # Plan -The interaction model is mature: four-phase interview, interviewer-autonomous question format, phase-agnostic preface cards with workspace exploration, structured review with per-item commenting, observer knowledge extraction, workflow ownership extraction, distribution hardening, graph view's structured-list peer route, and the first relation-first observer capture seam all ship as working product. The live frontier now centers on the **multi-chat substrate**: introducing chat containers and reconciliation needs as the first durable foundation for side-chats, direct graph edits, revisit/cascade, and future semantic patch history. +The interaction model is mature: four-phase interview, interviewer-autonomous question format, phase-agnostic preface cards with workspace exploration, structured review with per-item commenting, observer knowledge extraction, workflow ownership extraction, distribution hardening, graph view's structured-list peer route, and the first relation-first observer capture seam all ship as working product. 
The live frontier now centers on the **multi-chat substrate**: introducing chat containers and reconciliation needs as the first durable foundation for side-chats, direct graph edits, revisit/cascade, and future semantic changeset history. -The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reconciled into one direction. `docs/design/MULTI_CHAT.md` is the concrete phase-one substrate proposal; `docs/design/PATCH_LEDGER.md` remains deeper design pressure for semantic mutation history; `docs/design/INTENT_SPEC_EVOLUTION.md` carries the broader synthesis. The product-layer ontology trajectory is split out as `docs/design/INTENT_GRAPH_SEMANTICS.md` (canonical reference for the FE-700 frontier) and `docs/design/BEHAVIORAL_KERNELS.md` (canonical reference for the FE-702 kernel probes). The dev-layer self-tooling trajectory — the `ln-*` skill family, the proposed file-backed spec registry, and the long-horizon convergence between dev and product ontologies — lives in `docs/design/DEV_WORKFLOW_EVOLUTION.md`. Older portability work remains a future-facing boundary map rather than a live roadmap item until a hosted, remote, or adapter-backed substrate becomes a product goal. +The May 2026 intent-spec, multi-chat, and changeset-ledger design notes are now reconciled into one direction. `docs/design/MULTI_CHAT.md` is the concrete phase-one substrate proposal; `docs/design/PATCH_LEDGER.md` remains historical deeper design pressure for semantic mutation history, but canonical future-facing vocabulary is `changeset` / `change`; `docs/design/INTENT_SPEC_EVOLUTION.md` carries the broader synthesis. The product-layer ontology trajectory is split out as `docs/design/INTENT_GRAPH_SEMANTICS.md` (canonical reference for the FE-700 frontier) and `docs/design/BEHAVIORAL_KERNELS.md` (canonical reference for the FE-702 kernel probes). 
The dev-layer self-tooling trajectory — the `ln-*` skill family, the proposed file-backed spec registry, and the long-horizon convergence between dev and product ontologies — lives in `docs/design/DEV_WORKFLOW_EVOLUTION.md`. Older portability work remains a future-facing boundary map rather than a live roadmap item until a hosted, remote, or adapter-backed substrate becomes a product goal. ## Active ### Track B — Infrastructure 1. **Multi-chat substrate + reconciliation needs** — add durable `chat` containers, transitional `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, and a minimal `reconciliation_need` queue while keeping legacy spec-scoped pointers during the first slice. - - Why now / unlocks: side-chats, direct graph edits, revisit/cascade, and architect-style proposals all need a substrate below `specification` before a full patch ledger exists. This slice relieves the one-rope-per-spec pressure without making semantic changesets first-class yet. - - Recommended shape: follow `docs/design/MULTI_CHAT.md`; keep `turn.specification_id` and `specification.active_turn_id` during phase one; populate both legacy and chat pointers on new writes; add application assertions for same-spec and same-chat ancestry; create item-to-item reconciliation needs from semantic edge traversal first; carry `caused_by_turn_id` now and nullable `caused_by_patch_id` as a future placeholder. + - Why now / unlocks: side-chats, direct graph edits, revisit/cascade, and architect-style proposals all need a substrate below `specification` before a full changeset ledger exists. This slice relieves the one-rope-per-spec pressure without making semantic changesets first-class yet. 
+ - Recommended shape: follow `docs/design/MULTI_CHAT.md`; keep `turn.specification_id` and `specification.active_turn_id` during phase one; populate both legacy and chat pointers on new writes; add application assertions for same-spec and same-chat ancestry; create item-to-item reconciliation needs from semantic edge traversal first; carry `caused_by_turn_id` now and nullable `caused_by_changeset_id` as the future provenance placeholder. - Traceability: Requirement 39; A71, A82, A83; D135, D137, D138; I111. - Design doc: `docs/design/MULTI_CHAT.md`. @@ -23,15 +23,15 @@ The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reco 2. **Prompt/context scenario substrate** — externalize server-side prompts and reusable agent doctrines into markdown assets; add typed prompt loading/composition, graph context-pack builders, and a lightweight scenario runner for pre-UI prompt probes. Include a Pi SDK/RPC spike as a candidate harness adapter for tool and agent-flow experiments, without adopting Pi as product runtime truth. - Linear: FE-698. Pi harness spike: FE-635. - Why now / unlocks: multi-chat removes the single transcript spine as default agent context, while ontology, observer, candidate-spec, web research, behavioral-kernel, architect, and post-spec decomposition work all need shared prompt/context machinery. This prevents every future agent feature from inventing its own prompt-context hack and lets LLM-heavy flows be tested before UI work. 
- - Recommended shape: inventory current interviewer/observer prompts; move prompt text and reusable policies into packaged markdown; define scenario-specific context packs for observer capture, next-question generation, candidate-spec synthesis, criteria/witness generation, web research, reconciliation, architect proposals, and decomposition/oracle probes; build a CLI/test runner that captures rendered prompt, context pack, model/provider settings, raw output, structured parse status, and review notes; add a Brunch-owned agent capability / mutation-surface registry with stable ids, schemas, authority metadata, and adapter-neutral contracts that scenario probes and future CLI/TUI/Pi harnesses can reference, while keeping execution adapters and durable mutating handlers out of the first slice unless they are read-only/proposal-only. The key rule is that future agent-originated writes must go through Brunch-owned handlers rather than direct ORM access. + - Recommended shape: inventory current interviewer/observer prompts; move prompt text and reusable policies into packaged markdown; define scenario-specific context packs for observer capture, next-question generation, candidate-spec synthesis, criteria/witness generation, web research, reconciliation, architect proposals, and decomposition/oracle probes; build a CLI/test runner that captures rendered prompt, context pack, model/provider settings, raw output, structured parse status, and review notes; add a Brunch-owned agent capability / mutation-surface registry with stable ids, schemas, authority metadata, and adapter-neutral contracts that scenario probes and future CLI/TUI/Pi harnesses can reference, while keeping execution adapters and durable mutating handlers out of the first slice unless they are read-only/proposal-only. The key rule is that future agent-originated writes must go through Brunch-owned handlers rather than direct ORM access. 
Registry naming should follow `docs/design/AGENT_MUTATION_SURFACE.md`: product nouns plus semantic verbs, with intent-graph mutations converging on `changeset.submit` / `changeset.apply` and atomic `change` variants rather than many ad hoc mutating tools. - Verification approach: inner-loop prompt-loader/context-pack unit tests plus seeded scenario snapshots; middle-loop multi-run prompt probes should be designed before judging generative quality. - Traceability: Requirements 40, 41, 42; A84, A85, A86, A87; D139, D140, D141, D142, D143; I112. - - Design docs: `docs/design/INTENT_SPEC_EVOLUTION.md`; `docs/design/MULTI_CHAT.md`; Pi SDK docs as spike input. + - Design docs: `docs/design/INTENT_SPEC_EVOLUTION.md`; `docs/design/MULTI_CHAT.md`; `docs/design/AGENT_MUTATION_SURFACE.md` (agent-originated mutation audit and registry input); Pi SDK docs as spike input. 3. **Intent graph semantics + progressive checkability foundation** — refine the ontology and relation policy so the graph can represent invariants, examples/counterexamples, constraint subtypes, narrowed decisions, witness strength, and checkability gaps as source/destination material for future generative features. - Linear: FE-700. - Why now / unlocks: candidate generation, behavioral kernels, architect proposals, and downstream verification-aware decomposition need a sharper semantic target than the current exploration/review ontology. 
- - Recommended shape: add `invariant` and `example` as first-class durable kinds; subtype examples (positive / negative / edge-case / trace / not-relevant); narrow `decision` per the decision-capture criteria; enrich `constraint` subtypes (non_goal / scope / technical / policy / resource / compatibility / environmental); add `criterion` subtypes (acceptance / test / manual_review / runtime_check / proof / observability) and `invariant` subtypes (state / transition / authority / provenance / consistency / security / data_integrity); add `checkability` and `witness strength` fields on claims per the progressive-checkability ladder; introduce the five-family relation taxonomy (justification / dependency / boundary / refinement / verification) plus first-class negative relations (`rules_out`, `counterexample_for`); add edge epistemic metadata (`support`, `status`, `provenanceTurnId`, `rationale`); land a relation-policy registry whose axes distinguish `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, and `weak_suggestion` participation. Full enumerations and worked examples in `docs/design/INTENT_GRAPH_SEMANTICS.md`. 
+ - Recommended shape: add `invariant` and `example` as first-class durable kinds; subtype examples (positive / negative / edge-case / trace / not-relevant); narrow `decision` per the decision-capture criteria; enrich `constraint` subtypes (non_goal / scope / technical / policy / resource / compatibility / environmental); add `criterion` subtypes (acceptance / test / manual_review / runtime_check / proof / observability) and `invariant` subtypes (state / transition / authority / provenance / consistency / security / data_integrity); add `checkability` and `witness strength` fields on intent items per the progressive-checkability ladder; introduce the five-family relation taxonomy (justification / dependency / boundary / refinement / verification) plus first-class negative relations (`rules_out`, `counterexample_for`); add edge epistemic metadata (`support`, `status`, `provenanceTurnId`, `rationale`); land a relation-policy registry whose axes distinguish `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, and `weak_suggestion` participation. Full enumerations and worked examples in `docs/design/INTENT_GRAPH_SEMANTICS.md`. - Verification approach: corpus/fixture observer probes comparing old vs refined ontology; graph-review manual assessment for precision/noise; context-pack probe outputs must show authority and witness labels. - Traceability: Requirement 38; A77, A78, A80, A81, A84; D134, D136, D137, D139, D140. - Design docs: `docs/design/INTENT_GRAPH_SEMANTICS.md` (canonical reference); `docs/design/INTENT_SPEC_EVOLUTION.md` (broader synthesis context). @@ -39,7 +39,7 @@ The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reco 4. **Generative prompt probes before UI** — use the scenario substrate to prototype web research, behavioral kernels, candidate-spec completion, and post-spec design/oracle/decomposition flows against intent-graph fixtures before committing product surfaces. 
    - Linear: FE-702 for post-spec decomposition probes; FE-649 and FE-640 are productization children under FE-698.
    - Why now / unlocks: proves whether progressive checkability and graph-first context can be taught to agents, and de-risks the next generation of UI features.
-   - Recommended shape: start with one web-research context/query scenario, the first three behavioral kernels (`state & lifecycle`, `containment & topology`, `authority & capability`) per the v0.1 kernel ontology, candidate-spec set generation, and exploratory oracle/decomposition scenarios inspired by `.agents/skills/ln-design/` and `.agents/skills/ln-oracles/`. Each kernel probe should follow the kernel-card structure (detection signals, contrastive question templates, artifact schema, validators) and emit typed claims/edges per `docs/design/INTENT_GRAPH_SEMANTICS.md`. Outputs remain probe artifacts or proposal-only structures, not committed graph mutations.
+   - Recommended shape: start with one web-research context/query scenario, the first three behavioral kernels (`state & lifecycle`, `containment & topology`, `authority & capability`) per the v0.1 kernel ontology, candidate-spec set generation, and exploratory oracle/decomposition scenarios inspired by `.agents/skills/ln-design/` and `.agents/skills/ln-oracles/`. Each kernel probe should follow the kernel-card structure (detection signals, contrastive question templates, artifact schema, validators) and emit typed intent items / intent edges per `docs/design/INTENT_GRAPH_SEMANTICS.md`. Outputs remain probe artifacts or proposal-only structures, not committed graph mutations.
    - Verification approach: scenario-runner fixtures, raw output review, structured parse validation, and qualitative scorecards before product UI.
    - Traceability: Requirements 20, 21, 31, 32, 40, 41; A67, A68, A80, A85, A87; D126, D127, D139, D141.
- Design docs: `docs/design/BEHAVIORAL_KERNELS.md` (kernel ontology + cards); `docs/design/INTENT_GRAPH_SEMANTICS.md` (artifact target). @@ -53,12 +53,12 @@ The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reco ### Intent graph and reconciliation -- **Semantic changeset / patch ledger** — make semantic mutations first-class once non-primary surfaces can change graph truth. +- **Semantic changeset ledger** — make semantic mutations first-class once non-primary surfaces can change intent-graph truth. - Linear: FE-701. - - Recommended shape: prefer the invariant "one semantic mutation set contains one or more atomic changes"; naming remains open between `changeset` / `change` and `patch` / `patch_change`. Connect `reconciliation_need.caused_by_patch_id` once patches exist. + - Recommended shape: one `changeset` contains one or more atomic `change` records. Use `changeset` / `change` as canonical schema and operation vocabulary; `patch` / `patch_change` remain historical design-doc terms only. Connect `reconciliation_need.caused_by_changeset_id` once changesets exist. - Depends on: multi-chat substrate + reconciliation needs; prompt/context context packs for reconciliation scenarios. - Traceability: A71, A79, A83; D135, D138, D140. - - Design doc: `docs/design/PATCH_LEDGER.md`. + - Design doc: `docs/design/PATCH_LEDGER.md` (historical file name; future vocabulary is changeset/change). - **Relation-first observer capture enrichment** — after the next ontology/relation-policy probes, broaden observer relationship extraction across the refined ontology where edge support and operational participation are understood. - Recommended shape: keep `runObserver()` as the public turn-owned seam, but feed it scenario-specific context packs and validate output through the relation-policy registry. The FE-639 first cut has landed; remaining work should be driven by corpus/manual proving. 
@@ -67,7 +67,7 @@ The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reco
 
 - **Architect / generator loop** — autonomous agent that iterates over the intent graph and proposes semantic changes for HITL review through the same future changeset / reconciliation pathway as user-driven edits.
   - Recommended shape: keep productized architect proposals behind multi-chat + reconciliation + semantic changesets; use the scenario substrate for shadow/proposal-only probes first.
-  - Traceability: A73, A85, A87; D139, D141; depends on chat containers + reconciliation needs and semantic changeset / patch ledger.
+  - Traceability: A73, A85, A87; D139, D141; depends on chat containers + reconciliation needs and semantic changeset ledger.
 
 ### User-facing capabilities
 
@@ -130,7 +130,7 @@ The May 2026 intent-spec, multi-chat, and patch-ledger design notes are now reco
 
 ## Recently Completed
 
-- [2026-05-07] FE-698 prompt/context scenario substrate — Packaged markdown prompt registry + observer context-pack foundation + scenario runner capture skeleton/composition. Server interviewer, observer, and side-chat role prompts now load from markdown assets through a typed prompt registry, observer capture renders its existing prompt context through the first typed scenario-specific context pack, and seeded observer-capture prompt scenarios now compose the production observer prompt with typed context-pack output into deterministic no-provider probe artifacts. Review fixes moved observer prompt composition into a pure module and made prompt scenario prompt sources explicit. Verified: `npm run verify`. Watch: next FE-698 slices still need broader context-pack scenarios, real provider/harness execution probes, and/or Pi adapter spike work.
+- [2026-05-07] FE-698 prompt/context scenario substrate — Packaged markdown prompt registry + observer context-pack foundation + scenario runner capture skeleton/composition + agent mutation-surface audit.
Server interviewer, observer, and side-chat role prompts now load from markdown assets through a typed prompt registry, observer capture renders its existing prompt context through the first typed scenario-specific context pack, and seeded observer-capture prompt scenarios now compose the production observer prompt with typed context-pack output into deterministic no-provider probe artifacts. Review fixes moved observer prompt composition into a pure module and made prompt scenario prompt sources explicit. The agent mutation-surface audit inventories current and projected agent-originated write paths as input to the registry/handler slices. Verified: `npm run verify` for code slices; audit verified by code-search/document consistency. Watch: next FE-698 slices still need the capability registry skeleton, broader context-pack scenarios, real provider/harness execution probes, and/or Pi adapter spike work. - [2026-05-01] Side-chat V1.1 — Explore vertical slice. End-to-end graph-launched chat interaction shipped: prompt builder, POST `/side-chat` SSE endpoint, popover host, graph-view wiring, SSE consumer, and active-button activation. Follow-up refactor collapsed pending assistant text into the message list and extracted `SideChatHost` so activation is a tree-mount fact. This is complete implementation history; future conceptual work is multi-chat / reconciliation, not Side-chat V2/V3. - [2026-05-04] Graph view structured-list peer route — `/specification/$id/graph` now renders project-wide entities through the structured-list layout with relationship subsections, relation chips, empty state, row controls, and a back-to-chat affordance. Follow-up active-path filtering and spatial canvas remain horizon work. Verified: `npm run verify` in the FE-643 slice family. 
- [2026-04-30] FE-639 relation-first observer capture first cut — eligible answered turns now enter one background observer-capture backlog, observer prompts use compact existing-knowledge anchors, observer output persists validated graph-delta relationship candidates, and accepted review grounding refs reuse the same conservative relation policy. Verified: `npm run verify`. Watch: A66 remains open until corpus/manual graph-review proves edge precision and density are useful. @@ -151,9 +151,9 @@ multi-chat-substrate + reconciliation-needs (active) │ │ ├──→ productized candidate-spec completion assist (horizon) │ │ └──→ post-spec oracle/decomposition frontier (probe/future product) │ └──→ continuous-workspace (next, independent UI track but graph-context aware) - └──→ semantic-changeset / patch-ledger (horizon) + └──→ semantic-changeset ledger (horizon) ├──→ relation-first observer enrichment (horizon, after ontology/policy probes) - └──→ architect-loop (horizon, proposal-only until patch/reconciliation path) + └──→ architect-loop (horizon, proposal-only until changeset/reconciliation path) TRACK B — Graph/workspace surfaces graph-view-structured-list (completed) @@ -166,7 +166,7 @@ workspace hygiene gitignore assist (bounded, dashboard-surface candidate) dashboard metrics two-axis interview framing progressive detail / recursive deflation -revisit / edit-mode (reshaped by reconciliation needs + patch ledger) +revisit / edit-mode (reshaped by reconciliation needs + changeset ledger) structured development spec registry (tooling experiment) portability boundaries (deferred until substrate goal exists) ``` diff --git a/memory/SPEC.md b/memory/SPEC.md index 0374bd2c..06f18a92 100644 --- a/memory/SPEC.md +++ b/memory/SPEC.md @@ -24,7 +24,7 @@ Brunch is an AI-guided spec elicitation tool that turns natural-language goals i - **requirements** — capability review and gap-finding - **criteria** — verification coverage -An interviewer agent conducts the conversation. 
A separate observer agent extracts typed knowledge items from each answered turn and links them into a knowledge graph. The interviewer may also invoke context-gathering capabilities when it lacks enough orientation for the next move; their visible outputs appear in the stream as preface cards. The workspace stream is turn-centered rather than message-shaped: durable conversational turns provide the branch-bearing lineage spine, while projected control cards, phase markers, and activity cards frame them. An open phase should always bottom out in one visible next action — a projected kickoff card, actionable frontier turn, visible generation state, projected recovery card, or closed-phase handoff / completion control. +An interviewer agent conducts the conversation. A separate observer agent extracts typed intent items from each answered turn and links them into an intent graph. The interviewer may also invoke context-gathering capabilities when it lacks enough orientation for the next move; their visible outputs appear in the stream as preface cards. The workspace stream is turn-centered rather than message-shaped: durable conversational turns provide the branch-bearing lineage spine, while projected control cards, phase markers, and activity cards frame them. An open phase should always bottom out in one visible next action — a projected kickoff card, actionable frontier turn, visible generation state, projected recovery card, or closed-phase handoff / completion control. Brunch is strongest while certainty is still being formed: when the real work is clarifying the target, surfacing commitments, and making unresolvedness legible before downstream implementation decomposition takes over. Its output is a calibrated handoff, not fake closure — a truthful starting point for implementation that makes visible what is known, chosen, constrained, required, and still open. 
Export is therefore built from the active path's accepted review outputs plus reviewed knowledge, not from laundering unresolved uncertainty into a prematurely final document. @@ -32,7 +32,7 @@ The product direction is from **planning specs** toward **intent specs**. Planni Brunch operates inside a **workspace**: the cwd-backed software context whose local `.brunch/` directory stores one or more specifications. Grounding supports two strategies: **elicitation-first** for greenfield work and **analysis-first** for brownfield work. Brownfield grounding begins with read-only workspace analysis that produces a visible preface card (grounding brief), and the interviewer may gather more context via preface cards in any phase when it needs orientation. -Post-launch, Brunch should support specification work across two axes rather than one: `greenfield <> brownfield` and `end-to-end build <> incremental feature`. That means the interview cannot assume one long whole-product drill-down. It should be able to start broad, deepen recursively where needed, synthesize candidate directions when the user wants help filling in the gaps, and let the knowledge graph itself become a working surface for refinement instead of only a sidebar summary. +Post-launch, Brunch should support specification work across two axes rather than one: `greenfield <> brownfield` and `end-to-end build <> incremental feature`. That means the interview cannot assume one long whole-product drill-down. It should be able to start broad, deepen recursively where needed, synthesize candidate directions when the user wants help filling in the gaps, and let the intent graph itself become a working surface for refinement instead of only a sidebar summary. ## Constraints & Non-goals @@ -53,7 +53,7 @@ Post-launch, Brunch should support specification work across two axes rather tha 3. Brownfield grounding can use read-only workspace analysis to ground the opening flow and the first substantive question. 4. 
Structured responses support turn-appropriate option selections or explicit action submissions, an explicit `none of the above` path where relevant, and one attached response note. The interviewer autonomously chooses whether to include options on each question based on conversational trajectory; grounding requires free-text on every submission (options, when present, are optional enrichment), while design preserves the current selection-required gate with a structural "none of the above" path. A single turn may carry multiple assistant-part artifacts (e.g. a preface card followed by a question card, or a revision card followed by a review set) rendered as stacked cards with one unified response submission. 5. Users can see thinking, tool usage, and streaming progress in real time; if live-only artifacts are shown, replay keeps concise durable activity metadata (at minimum elapsed thinking time plus a coarse tool-use summary / placeholder seam) instead of dropping them completely. -6. The observer extracts typed knowledge items and graph edges from answered turns. +6. The observer extracts typed intent items and intent edges from answered turns. 7. The accumulated knowledge layer and readiness state stay visible during the interview. 8. Each workflow mode has deterministic closeability plus a separate readiness signal. 9. Phase close records summary text and closure basis. @@ -78,18 +78,18 @@ Post-launch, Brunch should support specification work across two axes rather tha 28. Observer capture treats the full turn — including any turn-internal preface card or revision card plus the question or review set plus the user response — as one atomic validated unit for knowledge extraction. 29. Grounding captures both workspace novelty (`greenfield` / `brownfield`) and delivery posture (`end-to-end build` / `incremental feature`), and interviewer behavior adapts to any point in that matrix rather than assuming a whole-product greenfield interview. 30. 
Observer extraction treats typed relationships as first-class across the ontology and records them whenever they can be reasonably traced from a turn or accepted review state, while abstaining when support is weak. Relationship extraction must stay prompt-budgeted: existing entities should be presented as compact identity anchors, not full Markdown inventories or graph dumps. -31. Users can request a turn-owned candidate-spec set during grounding or design instead of only skipping the remainder of a phase; each candidate direction includes implications, tradeoffs, likely generated knowledge, and what it rules out, and the user can accept a direction, request refinement, reject, or regenerate candidates. Accepting a candidate direction may steer the next interview move and materialize knowledge items, but does not itself close the phase. +31. Users can request a turn-owned candidate-spec set during grounding or design instead of only skipping the remainder of a phase; each candidate direction includes implications, tradeoffs, likely generated knowledge, and what it rules out, and the user can accept a direction, request refinement, reject, or regenerate candidates. Accepting a candidate direction may steer the next interview move and materialize intent items, but does not itself close the phase. 32. Interview detail can proceed as a progressive broad-pass-to-detail flow with explicit `next level of detail` actions, rather than only as one monolithic linear drill-down. -33. Graph view is a first-class alternative to chat view, accessed as a peer route, and projects the knowledge graph as a navigable workspace with visible relationship topology and supports launching refinement side-chats from graph selections. The first ship is a structured-list layout; a spatial canvas layout follows as a layout switch inside graph mode. +33. 
Graph view is a first-class alternative to chat view, accessed as a peer route, and projects the intent graph as a navigable workspace with visible relationship topology and supports launching refinement side-chats from graph selections. The first ship is a structured-list layout; a spatial canvas layout follows as a layout switch inside graph mode. 34. First-run setup detects missing expected LLM provider credentials before the user starts a specification, makes the missing-key state visible on the dashboard, and offers a guided setup path rather than requiring README / shell-env debugging. 35. If Brunch accepts an API key through the UI, it stores credentials outside the project workspace in XDG-compliant user auth/config state; project `.env` files and `.brunch/` never become the default secret-storage target. 36. LLM provider configuration is owned by a shared AI runtime provider seam, so interviewer and observer model creation do not encode direct provider imports or environment-variable reads as product truth. That seam must preserve provider-specific capabilities such as Anthropic thinking / reasoning options or degrade them explicitly. 37. Workspace hygiene detects whether the local `.brunch/` directory is git-ignored and, with explicit user confirmation, can add an idempotent `.gitignore` entry, creating `.gitignore` when absent. 38. The product ontology should expand beyond the current exploration + review kinds to support `invariant` and `example` as first-class durable knowledge kinds, with observer prompts and promotion rules that distinguish descriptive context, constraints, decisions, assumptions, requirements, invariants, criteria, and examples without treating every answer as a decision. -39. Specifications can own multiple durable chat containers below the specification, with turns gradually moving toward chat ownership while preserving current spec-scoped compatibility during transition. 
The same substrate records directed `reconciliation_need` process debt when changed knowledge may affect other graph truth; semantic `knowledge_edge` remains separate. +39. Specifications can own multiple durable chat containers below the specification, with turns gradually moving toward chat ownership while preserving current spec-scoped compatibility during transition. The same substrate records directed `reconciliation_need` process debt when changed intent items may affect other graph truth; semantic intent edges remain separate (currently persisted as `knowledge_edge` rows during transition). 40. Prompt and context engineering are first-class server subsystems: prompts and reusable policy doctrines live as inspectable markdown assets, while typed context-pack builders derive scenario-specific intent-graph renderings for interviewer, observer, research, candidate synthesis, behavioral kernels, reconciliation, architect, and downstream decomposition probes. 41. Agent-heavy future capabilities can be tested before product UI exists through a lightweight scenario substrate that runs prompt/context packs against seeded graphs or transcript fixtures, captures raw and structured outputs, and supports provider/harness comparison. Pi may be evaluated as the initial lower-level agent harness, especially for tool experiments and pre-UI probes, but Brunch product authority over durable workflow, replay, graph mutation, and reconciliation remains explicit. -42. Agent-originated mutations of Brunch data use one typed server-owned mutation surface regardless of caller. Internal interviewer/observer flows, scenario probes, CLI/TUI harnesses, Pi or other harness adapters, and future external agents may not mutate durable Brunch state by calling the ORM directly; they must invoke stable mutation handlers with input/output schemas, authority metadata, replay policy, and reconciliation/patch-ledger semantics. 
Read-only capability contracts may share the same registry shape, but the hard invariant is single-entry mutation authority. +42. Agent-originated mutations of Brunch data use one typed server-owned mutation surface regardless of caller. Internal interviewer/observer flows, scenario probes, CLI/TUI harnesses, Pi or other harness adapters, and future external agents may not mutate durable Brunch state by calling the ORM directly; they must invoke stable mutation handlers with input/output schemas, authority metadata, replay policy, and reconciliation/changeset-ledger semantics. Read-only capability contracts may share the same registry shape, but the hard invariant is single-entry mutation authority. ## Assumptions @@ -116,20 +116,20 @@ Post-launch, Brunch should support specification work across two axes rather tha | A67 | Users who are tired, rushed, or under-informed will converge faster by reacting to synthesized candidate directions than by continuing a long direct interview or force-closing early. | medium | open | D126, D127 | Manual user-flow comparison between direct questioning, skip-close, and candidate-spec reaction flows. | | A68 | Broad-pass interviewing followed by explicit deepen-detail actions will preserve coherence better than a single depth-first drill-down while still producing export-worthy specifications. | medium | open | D127 | Prototype broad-pass-first flows and compare resulting knowledge completeness and user comprehension. | | A69 | A graph-centric refinement surface can launch side-chats without splitting durable specification truth, so chat view and graph view stay two projections over one evolving graph. | medium | open | D128, D114 | Prototype graph-launched refinement with reload/resume checks to ensure side-chat state and graph state stay coherent. 
| -| A70 | The structured-list graph-view layout provides standalone enumeration value beyond relationship density: users benefit from seeing all knowledge items grouped by kind even when most have no edges yet, and graceful degradation (collapse the relations footer when zero edges) keeps the view honest while relation-first observer capture matures. | medium | open | A66, D128, D129 | Manual walkthroughs at low and high edge density once the structured list ships; check whether the layout still feels valuable when most items have empty relations footers, and whether observer-density growth visibly improves the view over time. | -| A71 | Semantic mutations will eventually need a changeset / patch-ledger history distinct from conversational turn ancestry, but the first implementation should prove chat containers and reconciliation needs before committing the full ledger shape. | medium | open | D135 | Build chat containers plus reconciliation needs first; revisit whether turn-linked provenance remains sufficient before adding full semantic changesets. | -| A72 | Knowledge-graph items can carry version history without breaking the active-path durable-truth contract: each version is the result of an applied semantic mutation, prior versions are queryable for diff / comparison / audit, and the active-path projection always reflects the latest version for each item. | low | future | A71, D135 | Prototype item versioning behind the patch ledger; verify that revisit cascades, span-anchored annotations, and soft-edit audit trails behave correctly across versions. | +| A70 | The structured-list graph-view layout provides standalone enumeration value beyond relationship density: users benefit from seeing all intent items grouped by kind even when most have no edges yet, and graceful degradation (collapse the relations footer when zero edges) keeps the view honest while relation-first observer capture matures. 
| medium | open | A66, D128, D129 | Manual walkthroughs at low and high edge density once the structured list ships; check whether the layout still feels valuable when most items have empty relations footers, and whether observer-density growth visibly improves the view over time. | +| A71 | Semantic mutations will eventually need a changeset-ledger history distinct from conversational turn ancestry, but the first implementation should prove chat containers and reconciliation needs before committing the full ledger shape. | medium | open | D135 | Build chat containers plus reconciliation needs first; revisit whether turn-linked provenance remains sufficient before adding full semantic changesets. | +| A72 | Intent items can carry version history without breaking the active-path durable-truth contract: each version is the result of an applied semantic mutation, prior versions are queryable for diff / comparison / audit, and the active-path projection always reflects the latest version for each item. | low | future | A71, D135 | Prototype item versioning behind the changeset ledger; verify that revisit cascades, span-anchored annotations, and soft-edit audit trails behave correctly across versions. | | A73 | Autonomous architect / generator loops can propose useful graph mutations only after human-driven multi-chat and reconciliation surfaces prove the shared mutation pipeline. | low | future | A71, D135 | Run architect proposals in shadow mode after multi-chat / reconciliation seams stabilize, then compare proposed changes against user-driven edits. | | A74 | OpenRouter will reduce first-run friction for Brunch's likely users compared with requiring direct Anthropic keys, but model capability parity and AI SDK support need proof before making it the default provider path. | medium | open | D130, D131 | Spike provider configuration against interviewer/observer calls, especially model naming, structured output, tool use, and reasoning/thinking support. 
| | A75 | XDG-compliant user-scoped auth/config storage is acceptable for UI-entered API keys and safer than writing secrets to the project workspace, while environment variables remain useful for automation and CI. | medium | open | D130, D132 | Prototype key save/load/delete precedence and inspect OS/XDG paths; manual first-run walkthrough verifies users understand where the key is stored. | | A76 | Users will accept Brunch editing `.gitignore` when the action is explicit, previewable, and idempotent; doing so should reduce accidental commits of `.brunch/` without feeling like surprising repo mutation. | high | open | D133 | Unit-test ignore detection / append behavior and manual dashboard walkthrough with absent, present, and already-covering `.gitignore` states. | -| A77 | Progressive checkability will improve generated specs more than a binary "formal / not formal" framing, because the weakest sufficient witness may be prose, example, test, runtime contract, invariant, proof obligation, or explicit unresolved ambiguity depending on the claim. | medium | open | D134 | Prototype claim-to-witness review on a small corpus and compare whether users can validate meaning without being forced into formal-methods terminology. | +| A77 | Progressive checkability will improve generated specs more than a binary "formal / not formal" framing, because the weakest sufficient witness may be prose, example, test, runtime contract, invariant, proof obligation, or explicit unresolved ambiguity depending on the intent item. | medium | open | D134 | Prototype intent-item-to-witness review on a small corpus and compare whether users can validate meaning without being forced into formal-methods terminology. 
| | A78 | Adding `invariant` and `example` as product ontology candidates will make intent drift easier to detect without overwhelming early interviews, provided examples carry subtypes such as positive, negative / counterexample, edge-case, and not-relevant rather than expanding into many top-level kinds. | medium | open | D134 | Run transcript probes for examples, counterexamples, not-relevant cases, and state/transition rules; check whether items improve export and review quality or create noisy capture. | | A79 | Once semantic truth can change through graph edits, side-chats, reconciliation, verifier feedback, or implementation feedback, turn ancestry alone will be insufficient as the semantic history spine. | medium | open | D135 | Prototype chat containers and reconciliation needs before full patch history; revisit if turn-linked provenance remains enough for first-class graph editing. | | A80 | Behavioral kernels can generate higher-yield disambiguating questions than generic elicitation prompts, but only if kernels stay as interviewer / architect / wizard machinery that emits checkable artifacts rather than user-visible formalism. | low | open | D134 | Try state/lifecycle and containment/topology prototypes first, and compare question value against current prompt-only interviewing. | | A81 | Knowledge edges can carry intent semantics without becoming too noisy only if relation policy distinguishes semantic relations from reconciliation needs, and distinguishes display edges, cascade-participating edges, export-relevant edges, staleness-producing edges, and low-confidence suggestions. | medium | open | D137 | Design relation-policy semantics before broad observer edge expansion; test low- and high-density graphs for user trust and operational noise. 
| | A82 | A soft dual-pointer migration can introduce chat containers without destabilizing current spec-scoped reads: `turn.specification_id` and `specification.active_turn_id` can remain temporarily while `turn.chat_id`, `specification.primary_chat_id`, and `chat.active_turn_id` become the future ownership path. A separate `active_chat_id` is deferred until multiple active chat surfaces need an explicit UI-level pointer. | medium | open | D138 | Migration/read-path tests compare legacy and chat-derived active heads; application assertions prove `turn.specification_id === chat.specification_id` and parent turns remain chat-scoped. | -| A83 | A minimal item-to-item `reconciliation_need` table is enough for the first queue if it carries narrow kind/status values plus nullable provenance placeholders, and if future relation targets / patch provenance can extend the shape without renaming the concept. | medium | open | D137, D138 | Implement deterministic need creation from changed items plus `knowledge_edge` traversal; review whether relation-targeted needs or richer basis/strength fields are required before adding them. | +| A83 | A minimal item-to-item `reconciliation_need` table is enough for the first queue if it carries narrow kind/status values plus nullable provenance placeholders, and if future relation targets / changeset provenance can extend the shape without renaming the concept. | medium | open | D137, D138 | Implement deterministic need creation from changed items plus `knowledge_edge` traversal; review whether relation-targeted needs or richer basis/strength fields are required before adding them. | | A84 | Scenario-specific graph context packs can replace transcript-as-default prompt context without losing conversational nuance, provided packs preserve authority, provenance, unresolvedness, relation neighborhoods, and recency where relevant. 
| medium | open | D139, D140 | Build prompt/context probes over seeded graphs and compare generated observer, interviewer, candidate, and oracle/decomposition outputs against transcript-heavy baselines. | | A85 | A lightweight prompt scenario substrate will let Brunch validate LLM-heavy product directions faster than building UI first, if it captures rendered prompts, context packs, model settings, raw outputs, structured parses, and human review notes as repeatable artifacts. | medium | open | D139 | Run multi-scenario prompt probes for observer ontology, behavioral kernels, candidate-spec assist, and downstream oracle/decomposition before productizing their UI. | | A86 | Pi can serve as a useful pre-UI agent harness or tool-spike backend without forcing Brunch to adopt Pi as its production agent runtime, as long as integration remains adapter-shaped and Brunch-owned authority/replay/mutation semantics stay outside the harness. | low | open | D142 | Spike Pi SDK or RPC with in-memory sessions, custom tools, controlled prompts, and Brunch graph context packs; evaluate event capture, tool ergonomics, provider handling, packaging, and isolation. | @@ -144,7 +144,7 @@ Post-launch, Brunch should support specification work across two axes rather tha 50. **Knowledge relationships live behind one typed graph seam** — persisted graph edges are first-class and drive dependency, derivation, and revisit behavior. 65. **Phase outcomes are explicit durable records** — workflow status, closeability, readiness, and closure provenance project from durable phase outcomes on the active path. 66. **Interviewer-recommended and user-forced closes share one transcript-friendly seam** — one phase-close transport handles both paths, with explicit closure basis. -80. **Knowledge-graph revisit replaces hard turn-tree branching for V1** — revisit starts from edit mode on knowledge items, traces cascade through graph edges, and resolves through a secondary thread. 
**Updated 2026-05-07 (D135):** the older modal secondary-thread and side-chat V2/V3 persistence shapes are superseded by the multi-chat + reconciliation-need direction; the user-facing revisit/cascade goal remains live. **Chat-level branching note:** the no-turn-tree-branching invariant remains in force at the *turn* level, but multiple chats per spec are explicitly allowed at the *chat* level once the multi-chat substrate lands. Branching at the chat level is not user-surfaced as a generic `branch this thread` affordance by default; it manifests through graph-anchored refinement / reconciliation surfaces. +80. **Intent-graph revisit replaces hard turn-tree branching for V1** — revisit starts from edit mode on intent items, traces cascade through intent edges, and resolves through a secondary thread. **Updated 2026-05-07 (D135):** the older modal secondary-thread and side-chat V2/V3 persistence shapes are superseded by the multi-chat + reconciliation-need direction; the user-facing revisit/cascade goal remains live. **Chat-level branching note:** the no-turn-tree-branching invariant remains in force at the *turn* level, but multiple chats per spec are explicitly allowed at the *chat* level once the multi-chat substrate lands. Branching at the chat level is not user-surfaced as a generic `branch this thread` affordance by default; it manifests through graph-anchored refinement / reconciliation surfaces. 86. **The client is organized by phase-addressable routing and three concentric layout shells** — AppLayout, SpecificationWorkspaceLayout, and ViewLayout own the user-facing route structure. Interview phases remain router-addressable for deep links, gating, and sibling route composition even if the center pane later renders them inside one continuous workspace surface. 87. 
**Layout-level data ownership partitions invalidation** — the specification bundle and entity collections subscribe through separately owned query domains / route surfaces instead of one monolithic refresh boundary, so entity refreshes do not remount or tear down the transcript-owning surface. 89. **Primary grounding/design input is workspace-owned and card-owned** — substantive elicitation in grounding and design proceeds through durable turn cards inside the workspace stream, while structural phase-entry, recovery, and handoff affordances project as control cards in that same stream; the global bottom composer is not the canonical input seam. Preface cards accept optional comment + continue, while question cards collect substantive answers. Depends on: A51. Supersedes: —. @@ -186,20 +186,22 @@ Post-launch, Brunch should support specification work across two axes rather tha 132. **UI-entered credentials are user-scoped auth state, not workspace state** — if the app collects an API key, it writes to an XDG-compliant user auth/config location, never to `.brunch/` or the project `.env` by default. Existing environment-variable configuration remains supported as an override path for scripted use. Depends on: A75. Supersedes: project-local `.env` as the only persistent setup mechanism. 133. **`.brunch/` gitignore support is confirm-gated deterministic workspace mutation** — Brunch may inspect the workspace repository and offer to add `.brunch/` to `.gitignore`, but it must not mutate repository files without explicit confirmation. The mutation should be idempotent, preserve existing file content, and create `.gitignore` only when the user accepts. Depends on: A76. Supersedes: relying solely on user memory / docs to ignore the generated workspace directory. -134. 
**Brunch specs evolve toward recognition-first intent graphs with progressive checkability** — the product direction is to preserve meaning as typed claims, semantic edges, examples / counterexamples, verification witnesses, unresolved ambiguity, and user validation status rather than treating the spec as a planning document or prose inventory. Requirements and criteria remain distinct product items for now: a requirement is a commitment and a criterion is an oracle / witness. `invariant` and `example` should become first-class product ontology kinds, with positive, negative / counterexample, edge-case, and not-relevant examples represented as subtypes rather than separate top-level kinds. A shared `Property`-like claim primitive remains a design candidate rather than a committed storage or UI surface. Behavioral kernels are hidden interviewer / architect / wizard machinery for surfacing latent state, containment, authority, concurrency, migration, and evidence questions while emitting the weakest useful checkable artifact for the claim. Depends on: A77, A78, A80, D50, D125, Requirement 38. Supersedes: the implicit framing that requirements / criteria review is the terminal semantic model of product intent. +134. **Brunch specs evolve toward recognition-first intent graphs with progressive checkability** — the product direction is to preserve meaning as typed intent items, semantic edges, examples / counterexamples, verification witnesses, unresolved ambiguity, and user validation status rather than treating the spec as a planning document or prose inventory. Requirements and criteria remain distinct product items for now: a requirement is a commitment and a criterion is an oracle / witness. `invariant` and `example` should become first-class product ontology kinds, with positive, negative / counterexample, edge-case, and not-relevant examples represented as subtypes rather than separate top-level kinds. 
A shared `Property`-like intent primitive remains a design candidate rather than a committed storage or UI surface. Behavioral kernels are hidden interviewer / architect / wizard machinery for surfacing latent state, containment, authority, concurrency, migration, and evidence questions while emitting the weakest useful checkable artifact for the intent item. Depends on: A77, A78, A80, D50, D125, Requirement 38. Supersedes: the implicit framing that requirements / criteria review is the terminal semantic model of product intent. -135. **Semantic mutation history should split from conversational turn history when graph editing becomes first-class** — turns remain conversational provenance and replay; the intent graph remains current semantic truth; a future changeset / patch ledger records semantic mutation history; and reconciliation needs record semantic debt caused by changes that may stale existing graph truth. The first implementation should follow the multi-chat substrate in D138: chat containers plus durable reconciliation needs before a full patch ledger, keeping turn-linked provenance and legacy spec-scoped pointers as compatibility while making room for patch-backed provenance later. User-direct-edit mode should be allowed to land a committed group of knowledge-item changes immediately, synchronously create reconciliation needs from existing dependency and historical relations, then queue an asynchronous observer pass that may immediately add newly implied edges and additional reconciliation needs as a later interpretive-structure patch. That observer pass may not silently rewrite, retire, or weaken existing accepted intent; content changes that require judgment go through reconciliation review. This explicitly reshapes the older revisit-session draft: revisit / cascade remains a product capability, but `revisit_session` is no longer the preferred persistence foundation once multiple chats, direct graph edits, and reconciliation review sets are in scope. 
Depends on: A71, A79, D80, D110, D112, D125, D128, D134, D138. Supersedes: turn ancestry as the only plausible semantic history spine, and the `docs/archive/design/REVISIT_MODULE.md` table shape as canonical persistence design. +135. **Semantic mutation history should split from conversational turn history when graph editing becomes first-class** — turns remain conversational provenance and replay; the intent graph remains current semantic truth; a future changeset ledger records semantic mutation history; and reconciliation needs record semantic debt caused by changes that may stale existing graph truth. The first implementation should follow the multi-chat substrate in D138: chat containers plus durable reconciliation needs before a full changeset ledger, keeping turn-linked provenance and legacy spec-scoped pointers as compatibility while making room for changeset-backed provenance later. User-direct-edit mode should be allowed to land a committed group of intent-item changes immediately, synchronously create reconciliation needs from existing dependency and historical relations, then queue an asynchronous observer pass that may immediately add newly implied intent edges and additional reconciliation needs as a later interpretive-structure changeset. That observer pass may not silently rewrite, retire, or weaken existing accepted intent; content changes that require judgment go through reconciliation review. This explicitly reshapes the older revisit-session draft: revisit / cascade remains a product capability, but `revisit_session` is no longer the preferred persistence foundation once multiple chats, direct graph edits, and reconciliation review sets are in scope. Depends on: A71, A79, D80, D110, D112, D125, D128, D134, D138. Supersedes: turn ancestry as the only plausible semantic history spine, and the `docs/archive/design/REVISIT_MODULE.md` table shape as canonical persistence design. -136. 
**Observer ontology should classify claims by modality, not answer shape** — observer capture should distinguish value / outcome claims (`goal`), descriptive claims (`context`), boundary claims (`constraint`), uncertainty claims (`assumption`), choice claims (`decision`), obligation claims (`requirement`), preservation claims (`invariant`), oracle claims (`criterion`), and concrete witness claims (`example`). `Decision` should narrow to chosen directions among plausible alternatives with durable consequences; `constraint` should remain top-level but gain subtypes such as `non_goal`, `scope`, `technical`, `policy`, `resource`, `compatibility`, and `environmental`. Generic `context` should be promoted when the content carries stronger semantics: success condition -> requirement or invariant, solution boundary -> constraint, uncertain material belief -> assumption, chosen alternative -> decision, mere interpretation aid -> context. Depends on: D134, Requirement 38. Supersedes: treating all user commitments or selected options as decisions by default. +136. **Observer ontology should classify intent items by modality, not answer shape** — observer capture should distinguish value / outcome items (`goal`), descriptive items (`context`), boundary items (`constraint`), uncertainty items (`assumption`), choice items (`decision`), obligation items (`requirement`), preservation items (`invariant`), oracle items (`criterion`), and concrete witness items (`example`). `Decision` should narrow to chosen directions among plausible alternatives with durable consequences; `constraint` should remain top-level but gain subtypes such as `non_goal`, `scope`, `technical`, `policy`, `resource`, `compatibility`, and `environmental`. 
Generic `context` should be promoted when the content carries stronger semantics: success condition -> requirement or invariant, solution boundary -> constraint, uncertain material belief -> assumption, chosen alternative -> decision, mere interpretation aid -> context. Depends on: D134, Requirement 38. Supersedes: treating all user commitments or selected options as decisions by default. -137. **Knowledge edges are intent semantics, while reconciliation needs are process debt** — item kinds say what claims exist; edge kinds say how claims justify, constrain, depend on, refine, illustrate, and verify one another. A negative example is intent content; a boundary relation such as `rules_out`, `excludes`, or `counterexample_for` is intent semantics; a `reconciliation_need` is directed process obligation saying existing semantic truth may require renewed judgment because a change, contradiction, verifier result, or historical premise may affect it. The observer and future graph tools should provide edge-local neighborhoods around active claims, but not every inferred edge should drive cascade, staleness, export explanation, criteria generation, or reconciliation. Relation policy should classify edge support (`explicit`, strong inference, weak candidate) and operational participation before relation-first capture broadens beyond today's limited edge set. Observer-created interpretive structure may land immediately when it adds supported edges, examples, or reconciliation needs; rewriting accepted intent remains reconciliation-review work. Depends on: A66, A81, D50, D125, D128, D134, D135, D138. Supersedes: treating graph edges as only display infrastructure, and also supersedes treating every visible edge as equally authoritative process truth or work queue state. +137. 
**Intent edges are semantic relations, while reconciliation needs are process debt** — intent-item kinds say what semantic units exist; intent-edge kinds say how items justify, constrain, depend on, refine, illustrate, and verify one another. A negative example is intent content; a boundary relation such as `rules_out`, `excludes`, or `counterexample_for` is intent semantics; a `reconciliation_need` is a directed process obligation saying existing semantic truth may require renewed judgment because a change, contradiction, verifier result, or historical premise may affect it. The observer and future graph tools should provide edge-local neighborhoods around active intent items, but not every inferred edge should drive cascade, staleness, export explanation, criteria generation, or reconciliation. Relation policy should classify edge support (`explicit`, strong inference, weak candidate) and operational participation before relation-first capture broadens beyond today's limited edge set. Observer-created interpretive structure may land immediately when it adds supported edges, examples, or reconciliation needs; rewriting accepted intent remains reconciliation-review work. Depends on: A66, A81, D50, D125, D128, D134, D135, D138. Supersedes: treating graph edges as only display infrastructure, and also supersedes treating every visible edge as equally authoritative process truth or work queue state. -138. **Multi-chat substrate is the first concrete persistence slice before the full patch ledger** — add `chat`, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, and a minimal `reconciliation_need` table while keeping legacy `turn.specification_id` and `specification.active_turn_id` during transition. Do not add `active_chat_id` in phase one; `primary_chat_id -> chat.active_turn_id` covers the interview head until multiple active chat surfaces need their own pointer.
New writes populate both legacy and chat pointers; application assertions preserve same-spec and same-chat ancestry; later cleanup can make chat ownership canonical and remove the legacy pointers. `reconciliation_need` uses directed item-to-item source / target fields, narrow `kind` and `status`, free-text reason, immediate `caused_by_turn_id`, and nullable `caused_by_patch_id` as a future patch-ledger placeholder. This supersedes older side-chat substrate assumptions and makes `docs/design/MULTI_CHAT.md` the concrete phase-one design while `docs/design/PATCH_LEDGER.md` remains the deeper semantic mutation history. Depends on: A71, A82, A83, D135, D137, Requirement 39. Supersedes: implementing multi-chat by preserving an in-memory-only side-chat patch list as the durable substrate, and supersedes naming the process-debt table `reconciliation_edge`. +138. **Multi-chat substrate is the first concrete persistence slice before the full changeset ledger** — add `chat`, nullable `turn.chat_id`, `specification.primary_chat_id`, mirrored `chat.active_turn_id`, and a minimal `reconciliation_need` table while keeping legacy `turn.specification_id` and `specification.active_turn_id` during transition. Do not add `active_chat_id` in phase one; `primary_chat_id -> chat.active_turn_id` covers the interview head until multiple active chat surfaces need their own pointer. New writes populate both legacy and chat pointers; application assertions preserve same-spec and same-chat ancestry; later cleanup can make chat ownership canonical and remove the legacy pointers. `reconciliation_need` uses directed item-to-item source / target fields, narrow `kind` and `status`, free-text reason, immediate `caused_by_turn_id`, and nullable `caused_by_changeset_id` as a future changeset-ledger placeholder. 
This supersedes older side-chat substrate assumptions and makes `docs/design/MULTI_CHAT.md` the concrete phase-one design while `docs/design/PATCH_LEDGER.md` remains the historical, deeper semantic mutation history. Depends on: A71, A82, A83, D135, D137, Requirement 39. Supersedes: implementing multi-chat by preserving an in-memory-only side-chat patch list as the durable substrate, and supersedes naming the process-debt table `reconciliation_edge`. 139. **Prompt/context scenario substrate is a first-class foundation** — Brunch should externalize server-side prompts and reusable agent doctrines into inspectable markdown assets, load and compose them through a typed server seam, and introduce context-pack builders that render the current intent graph for a specific generative scenario rather than letting each call site hand-roll prompt context. The same substrate should support lightweight prompt probes over seeded graphs and transcripts before UI surfaces are built. Depends on: A84, A85, D134, D136, D137, Requirement 40, Requirement 41. Supersedes: scattered TypeScript prompt strings and transcript-dump context as the default mechanism for new agent features. 140. **Intent graph context packs are scenario-specific semantic briefings** — a context pack is an explicit rendering of graph truth, workflow state, relevant provenance, unresolved ambiguity, relation neighborhoods, and authority labels for one agent task. Packs should exist for observer capture, next-question generation, candidate-spec synthesis, criteria/witness generation, web research query framing, reconciliation review, architect proposals, and downstream decomposition/oracle probes. They should be bounded, ranked, and typed rather than raw graph dumps. Depends on: A84, D125, D134, D137, D138, Requirement 40. Supersedes: assuming the active chat transcript is the canonical prompt context after multi-chat. 141.
**Post-spec decomposition remains a probe frontier, not a committed Brunch UI** — the next-after-spec direction is to derive design alternatives, oracle strategy, execution slices, and verification-aware orchestration constraints from the intent graph and its checkability implications. This should first run through the prompt/context scenario substrate, borrowing cognitive patterns from `ln-design` and `ln-oracles`, before deciding whether it belongs inside Brunch or a successor product. Depends on: A87, D134, D139, D140, Requirement 41. Supersedes: treating export prose as the only meaningful handoff target. -142. **Pi is a candidate harness adapter, not current product runtime truth** — Pi may be evaluated via SDK or RPC as the first lower-level agent harness for prompt probes, web/tool experiments, and future decomposition scenarios because it already provides sessions, custom tools, provider support, event streams, and embedding modes. Brunch should not assume Pi owns product workflow, durable replay, graph mutation authority, reconciliation review, or credential UX unless a later spike proves and explicitly adopts those boundaries. Depends on: A86, D139, Requirement 41. Supersedes: deciding the web-research tool spike only at the individual tool API level. -143. **Brunch owns the agent mutation surface; harnesses adapt it as tools** — Any mutation of durable Brunch data initiated by an agent must route through Brunch-owned mutation handlers, not direct ORM access or harness-specific tool implementations. Those handlers define the product operation: stable id, input/output schemas, description, authority class, replay policy, and reconciliation/patch-ledger behavior. AI SDK, Pi, CLI/TUI, or future adapters may expose the handlers as tools, but adapters only translate transport and tool shape; they do not define mutation authority. 
Read-only capabilities can use the same contract registry for consistency, but the binding rule is that agent-originated writes enter through one server-owned surface. Depends on: Requirement 42, D138, D139, D142. Supersedes: defining separate mutating tool surfaces inside each agent harness or letting agent flows bypass application handlers to call the ORM. +142. **Pi is a candidate harness adapter, not current product runtime truth** — Pi may be evaluated via SDK or RPC as the first lower-level agent harness for prompt probes, web/tool experiments, and future decomposition scenarios because it already provides sessions, custom tools, provider support, event streams, and embedding modes. Brunch should not assume Pi owns product workflow, durable replay, intent-graph mutation authority, reconciliation review, or credential UX unless a later spike proves and explicitly adopts those boundaries. Depends on: A86, D139, Requirement 41. Supersedes: deciding the web-research tool spike only at the individual tool API level. +143. **Brunch owns the agent mutation surface; harnesses adapt it as tools** — Any mutation of durable Brunch data initiated by an agent must route through Brunch-owned mutation handlers, not direct ORM access or harness-specific tool implementations. Those handlers define the product operation: stable id, input/output schemas, description, authority class, replay policy, and reconciliation/changeset-ledger behavior. AI SDK, Pi, CLI/TUI, or future adapters may expose the handlers as tools, but adapters only translate transport and tool shape; they do not define mutation authority. Read-only capabilities can use the same contract registry for consistency, but the binding rule is that agent-originated writes enter through one server-owned surface. Depends on: Requirement 42, D138, D139, D142. Supersedes: defining separate mutating tool surfaces inside each agent harness or letting agent flows bypass application handlers to call the ORM. +144. 
**Intent graph vocabulary supersedes knowledge graph vocabulary** — Canonical product vocabulary is `intent graph`, made of `intent items` and `intent edges`. Current schema/code may still use `knowledge_item` and `knowledge_edge` as implementation names during transition, but new planning, agent capability contracts, context packs, operation ids, and user-facing design should prefer intent vocabulary unless referring to current persistence/API names. `Claim` may remain an explanatory generic for natural-language content, but it is not a product/schema noun. Depends on: D134, D136, D137. Supersedes: using `knowledge graph`, `knowledge item`, `knowledge edge`, or `claim` as future-facing product nouns. +145. **Changeset/change supersedes patch/patch_change** — Semantic mutation history uses `changeset` for one submitted semantic mutation bundle and `change` for one atomic mutation inside it. `Patch` and `patch_change` remain historical design-doc vocabulary and may appear in older file names, but new schema, capability contracts, operation ids, and planning language should use `changeset` / `change` unless this decision is explicitly reversed. Depends on: D135, D138, D143. Supersedes: treating naming as open between patch and changeset. ## Interaction Stream Model @@ -287,7 +289,7 @@ Scroll container: ChatScroll (ScrollArea + useStickToBottom). - "Knowledge Graph" title - Item count + connection count -**Body — Grouped knowledge items:** +**Body — Grouped intent items:** | Group label | Kinds | Visible | | ----------------------- | -------------------------------------------------------- | ------- | @@ -375,10 +377,10 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **workspace** | The cwd-backed software context whose local `.brunch/` directory stores specifications and runtime state. 
| | **prompt/context scenario substrate** | The server-side and test-harness foundation for loading markdown prompts, composing reusable doctrines, deriving typed intent-graph context packs, and running prompt probes before UI commitment. | | **context pack** | A scenario-specific semantic briefing derived from intent graph truth, workflow state, provenance, unresolvedness, relation neighborhoods, and authority labels for one agent task. It is bounded and typed, not a raw graph or transcript dump. | -| **progressive checkability** | The discipline of representing claims at the weakest useful witness level today — prose, example, counterexample, criterion, executable test, runtime invariant, state/transition property, or formal model — while preserving paths toward stronger witnesses where valuable. | +| **progressive checkability** | The discipline of representing intent items at the weakest useful witness level today — prose, example, counterexample, criterion, executable test, runtime invariant, state/transition property, or formal model — while preserving paths toward stronger witnesses where valuable. | | **behavioral kernel** | Hidden interviewer / architect machinery that recognizes recurring correctness patterns such as lifecycle, containment, authority, concurrency, migration, and evidence, then elicits checkable artifacts without exposing formalism as product ceremony. | | **scenario runner** | A lightweight pre-UI harness that runs a selected prompt scenario against fixtures, context packs, tools, and model/provider settings, then records outputs for qualitative and structural review. | -| **agent mutation surface** | The Brunch-owned typed handler layer for any durable data mutation initiated by an agent, internal or external. It is the only write entry point agents may use; handlers own schemas, authority, replay behavior, and reconciliation/patch-ledger semantics rather than letting agents call the ORM directly. 
| +| **agent mutation surface** | The Brunch-owned typed handler layer for any durable data mutation initiated by an agent, internal or external. It is the only write entry point agents may use; handlers own schemas, authority, replay behavior, and reconciliation/changeset-ledger semantics rather than letting agents call the ORM directly. | | **agent capability contract** | A Brunch-owned typed contract addressable by agents or harnesses, with a stable id, description, input/output schemas, authority class, and replay policy. Read-only capabilities and mutating handlers can share this registry shape, but mutating contracts must route through the agent mutation surface. | | **tool adapter** | A provider- or harness-specific projection of an agent capability contract into a concrete tool format such as AI SDK tools, Pi tools, CLI/TUI commands, or a future external-agent API. Adapters translate shape and transport while preserving Brunch-owned authority semantics. | | **authority class** | The contract metadata that says whether an agent capability is read-only, proposal-only, or commits durable product truth, and therefore which replay, reconciliation, and mutation boundaries govern it. | @@ -443,34 +445,37 @@ Each row in this table is a **formalization candidate** ascending the progressiv | **non-goal** | A `constraint` subtype expressing an explicit exclusion from the current specification scope. | | **decision** | A chosen direction among plausible alternatives, with durable consequences for future design, implementation, or interpretation. Not every user answer or option selection is a decision. | | **assumption** | A durable material belief supporting a direction or decision that could later prove false. | -| **knowledge item** | Typed semantic record in the durable ontology. Before review acceptance this means exploration knowledge; durable `requirement` / `criterion` items arise only from accepted review outputs. 
| -| **knowledge graph** | Typed relationships among knowledge items, including `depends_on`, `derived_from`, `constrains`, `verifies`, and `refines`. | -| **intent graph** | The semantic product truth formed by knowledge items, review-authoritative requirements / criteria, graph edges, examples / counterexamples, validation status, and semantic mutation state. Chat and graph views are projections over this truth; reconciliation needs are process state attached to the graph, not intent content. | -| **progressive checkability** | The stance that each claim should receive the weakest sufficient witness: human-readable claim, concrete example, counterexample, regression test, runtime contract, state-machine rule, invariant, proof obligation, or explicit unresolved ambiguity. | -| **property** *(candidate ontology)* | A normalized claim primitive that requirements could commit to and criteria could observe. It is a design candidate, not a committed storage or UI surface. | +| **intent graph** | Canonical product term for Brunch's semantic substrate: typed intent items, intent edges, examples / counterexamples, validation status, and semantic mutation state. Chat and graph views are projections over this truth; reconciliation needs are process state attached to the graph, not intent content. Supersedes `knowledge graph` as future-facing product vocabulary. | +| **intent item** | Canonical product term for one durable typed semantic unit in the intent graph. Current schema/code may still persist these as `knowledge_item` rows during transition. Use `knowledge item` only when referring to current implementation names. | +| **intent edge** | Canonical product term for one durable typed semantic relation between intent items. Current schema/code may still persist these as `knowledge_edge` rows during transition. Use `knowledge edge` only when referring to current implementation names. 
| +| **knowledge item / knowledge edge** | Legacy implementation names for current persistence/API records backing intent items and intent edges. Avoid these in new product concepts, capability contracts, and operation ids unless referring to existing code or database schema. | +| **progressive checkability** | The stance that each intent item should receive the weakest sufficient witness: human review, concrete example, counterexample, regression test, runtime contract, state-machine rule, invariant, proof obligation, or explicit unresolved ambiguity. | +| **property** *(candidate ontology)* | A normalized intent primitive that requirements could commit to and criteria could observe. It is a design candidate, not a committed storage or UI surface. | | **invariant** *(planned ontology kind)* | A property that must remain true across relevant states, transitions, executions, versions, or semantic revisions. | | **example** *(planned ontology kind)* | A concrete scenario, trace, input/output, edge case, approved example, rejected example, not-relevant label, or counterexample that disambiguates or witnesses intent. Expected subtypes include positive, negative / counterexample, edge-case, and not-relevant. | -| **edge-local neighborhood** | The focused relation context around one claim: incoming and outgoing edges with nearby item summaries, support strength, and relation semantics. Used by interviewer / observer prompts and graph refinement instead of dumping all grouped knowledge. | +| **edge-local neighborhood** | The focused relation context around one intent item: incoming and outgoing intent edges with nearby item summaries, support strength, and relation semantics. Used by interviewer / observer prompts and graph refinement instead of dumping all grouped knowledge. 
| | **behavioral kernel** | Reusable interviewer machinery for one class of latent correctness question, such as state/lifecycle, containment, authority, concurrency, transactionality, migration, or evidence. Kernels are not user-facing formalism by default. | -| **intent spec** | The complementary framing to a planning spec: a specification optimized for preserving and validating meaning rather than sequencing downstream work. Carries typed claims, examples and counterexamples, witness strength, unresolved ambiguity, and validation status. The intent graph is the durable substrate; an intent spec is the human-facing projection of that graph. Contrast with `planning spec`. | +| **intent spec** | The complementary framing to a planning spec: a specification optimized for preserving and validating meaning rather than sequencing downstream work. Carries typed intent items, examples and counterexamples, witness strength, unresolved ambiguity, and validation status. The intent graph is the durable substrate; an intent spec is the human-facing projection of that graph. Contrast with `planning spec`. | | **planning spec** | A specification optimized for downstream work sequencing — what to build, what scope is in or out, which slices follow. Brunch's product direction is for planning to remain a useful projection from the intent graph rather than the source artifact. | -| **checkability** | A typed field on a claim describing the strongest oracle that currently witnesses it, drawn from the progressive-checkability ladder: `human_review` / `example` / `counterexample` / `regression_test` / `runtime_contract` / `state_machine_rule` / `invariant` / `proof_obligation` / `unresolved_ambiguity`. The discipline is `progressive checkability`; the field is `checkability`. | -| **witness strength** | The breadth of what a claim's oracle actually covers, distinct from which oracle exists. 
"Checked on three examples" and "proved for all reachable states" can both be `checkability: invariant`, but they have very different `strength`. The pairing forces honesty about what is actually verified. | -| **formalization candidate** | A Brunch-internal claim that is worth promoting along the progressive-checkability ladder. Critical invariants are formalization candidates: each one states a property currently witnessed by a regression test, with stronger oracles (state-machine model, runtime contract, proof obligation) as deliberate future moves rather than implicit expectations. | +| **checkability** | A typed field on an intent item describing the strongest oracle that currently witnesses it, drawn from the progressive-checkability ladder: `human_review` / `example` / `counterexample` / `regression_test` / `runtime_contract` / `state_machine_rule` / `invariant` / `proof_obligation` / `unresolved_ambiguity`. The discipline is `progressive checkability`; the field is `checkability`. | +| **witness strength** | The breadth of an intent item's oracle coverage, distinct from which oracle exists. "Checked on three examples" and "proved for all reachable states" can both be `checkability: invariant`, but they have very different `strength`. The pairing forces honesty about what is actually verified. | +| **formalization candidate** | A Brunch-internal intent item that is worth promoting along the progressive-checkability ladder. Critical invariants are formalization candidates: each one states a property currently witnessed by a regression test, with stronger oracles (state-machine model, runtime contract, proof obligation) as deliberate future moves rather than implicit expectations. | | **disambiguating example** | An `example` whose primary purpose is to settle ambiguity between plausible interpretations of a requirement, invariant, or decision. Linked through the `disambiguates` relation. 
Generalizes the TiCoder move beyond test cases: the interviewer generates cases where interpretations diverge, and the user's classification settles the meaning. | -| **spec drift** | A divergence between a claim's recorded intent and the artifact (criterion, generated requirement, candidate spec, export bundle, or downstream implementation behavior) meant to satisfy it. Surfaced in human terms — "original intent vs generated behavior vs potential mismatch" — so the user can validate meaning at the point where it could have changed, rather than after the divergence has been laundered into a final document. | +| **spec drift** | A divergence between an intent item's recorded meaning and the artifact (criterion, generated requirement, candidate spec, export bundle, or downstream implementation behavior) meant to satisfy it. Surfaced in human terms — "original intent vs generated behavior vs potential mismatch" — so the user can validate meaning at the point where it could have changed, rather than after the divergence has been laundered into a final document. | | **relation family** | One of five semantic groupings that organize the relation kinds in the intent graph: `justification`, `dependency`, `boundary`, `refinement`, and `verification`. Distinct from the relation `kind` itself; a single kind belongs to exactly one family. Drives prompt grouping, default policy, and observer classification heuristics. | | **relation policy** | The per-relation, per-axis registry that decides whether each edge participates in `visible`, `cascade`, `export_trace`, `staleness`, `reconciliation`, `criteria_help`, or `weak_suggestion` capabilities. Replaces the implicit assumption that every edge is equally authoritative. Gated by edge `support` (`explicit` / `strong_inference` / `weak_candidate`) and `status` (`proposed` / `accepted` / `rejected` / `stale`). 
| | **structured list** | The first-ship graph-view layout: kind-grouped item rows with a relations footer of Outgoing / Incoming relation chips. Item-first; relationships visible inline. It currently renders the whole-spec entity set because D129 ships the whole-spec fetch first; the intended default becomes active-path items over whole-spec data once the active-path membership seam and `Show all` toggle land. | -| **spatial canvas** | A deferred future graph-view layout where knowledge items render as nodes with visible edges in a 2D scene. Shares the projection seam and intent contract of D128 with the structured-list layout. | -| **relation chip** | A compact UI element representing one knowledge-graph edge endpoint inside a relations footer, carrying the target item's reference code and content snippet. Hover reveals a preview card; click navigates to the target item via hash anchor. | +| **spatial canvas** | A deferred future graph-view layout where intent items render as nodes with visible edges in a 2D scene. Shares the projection seam and intent contract of D128 with the structured-list layout. | +| **relation chip** | A compact UI element representing one intent-edge endpoint inside a relations footer, carrying the target item's reference code and content snippet. Hover reveals a preview card; click navigates to the target item via hash anchor. | | **relations footer** | The grouped Outgoing / Incoming subsections beneath an item row in the structured list, listing relation chips for that item's incoming and outgoing edges. Soft-truncates at 6 chips per direction with an inline `+N more` expander; collapses to nothing when an item has zero edges. | | **action rail** | The per-row right-aligned slot in graph view's structured list reserved for node-level action affordances. Actions emit intents into the existing workspace lifecycle rather than owning their own state. The first ship reserves the slot with one disabled `chat-with` placeholder. 
| | **secondary thread** | Modal revisit conversation anchored to a primary-path turn and used to resolve cascade implications. | | **needs-revisit** | Flag meaning an item is affected by upstream invalidation and must be explicitly resolved before the specification is whole again. | | **chat** *(planned persistence seam)* | A conversation container inside one specification. The primary interview, side-chats, reconciliation chats, verifier feedback, and review discussions may all own turns without owning semantic truth directly. Phase one adds the table and transitional pointers before making chat ownership canonical. | -| **semantic changeset / patch** *(future persistence seam)* | An atomic set of semantic mutations to the intent graph. It records what changed and why, separate from the conversational turn that may have initiated it. Naming remains open between `changeset` / `change` and `patch` / `patch_change`. | -| **reconciliation need** *(planned persistence seam)* | Durable semantic debt saying existing graph truth may require renewed judgment because an upstream item, relation, verifier, contradiction, or historical premise changed. Phase one stores directed item-to-item needs with narrow kind/status and provenance placeholders; later phases may add relation targets and patch-backed cause/resolution. It is process state, not a knowledge edge or intent content. | +| **changeset** *(future persistence seam)* | Canonical term for one submitted semantic mutation bundle against the intent graph. It records what changed and why, separate from the conversational turn that may have initiated it. Supersedes `patch` as the future-facing schema/contract noun. | +| **change** *(future persistence seam)* | Canonical term for one atomic semantic mutation inside a changeset, such as `intentItem.create`, `intentItem.updateContent`, `intentEdge.create`, or `intentEdge.delete`. Supersedes `patch_change`. 
| +| **patch / patch_change** | Historical design-doc vocabulary for changeset/change. Avoid in new schema, capability contracts, and operation ids unless referring to older docs or source-control-style analogy. | +| **reconciliation need** *(planned persistence seam)* | Durable semantic debt saying existing intent-graph truth may require renewed judgment because an upstream item, relation, verifier, contradiction, or historical premise changed. Phase one stores directed item-to-item needs with narrow kind/status and provenance placeholders; later phases may add relation targets and changeset-backed cause/resolution. It is process state, not an intent edge or intent content. | | **DrawerCard** | Shared card primitive with header/summary/children slots that supports static, summary-peeking, and toggleable (minimized ↔ maximized) render modes. A `locked` prop disables toggle for controlled-state cards. | | **ChatScroll** | Composite scroll container that wires Radix ScrollArea (custom scrollbar) with `useStickToBottom` (auto-scroll-to-bottom + scroll-down indicator). Used for the center pane transcript. | | **phase stepper** | The vertical timeline navigation in the left sidebar showing phases as sequential steps with connecting line, status, readiness, and turn count. 
| @@ -565,7 +570,7 @@ Every meaningful code change should pass `npm run fix` in the inner loop and `np | Real browser scroll behavior under JSDOM | `scrollIntoView` is shimmed in JSDOM — component tests cannot prove real scroll happens after chip click | Outer-loop manual walkthrough explicitly checks scroll-into-view + highlight on chip click | Reports of chip click "doing nothing" or scroll behaving inconsistently across browsers | | Hover-card timing and popover positioning feel | Animation delay and placement perception are not text-observable | Outer-loop manual review with shadcn defaults (~300ms open, ~150ms close) | Users report flicker, misplaced popovers, or unintended dismissal | | Mobile / touch / keyboard-only ergonomics for relation chips | HoverCard pattern is mouse-biased; long-press fallback is designed but has no automated test surface | Manual walkthrough on touch device once per slice family | Touch users report missing or undiscoverable preview | -| Performance under large knowledge graphs | No render or memory budget yet; relation-first observer expansion (A66) will increase edge density | Defer until specs with hundreds of items + dense edges become common | Render lag visible on representative manual walkthroughs | +| Performance under large intent graphs | No render or memory budget yet; relation-first observer expansion (A66) will increase edge density | Defer until specs with hundreds of items + dense edges become common | Render lag visible on representative manual walkthroughs | | Cross-session "Back to chat" target persistence | sessionStorage clears on tab close so the deep-linked entry to graph view has no remembered chat origin | Falls back to current reachable phase via workflow state | Users report "Back to chat" landing in the wrong phase after a fresh tab | | Visual regression infrastructure | Manual-heavy stance accepted across the project; no Chromatic/Playwright-screenshot seam yet | Outer-loop manual walkthrough on the named 
graph-view fixture scenarios | Three or more visual regressions caught only after merge | @@ -613,9 +618,9 @@ Every meaningful code change should pass `npm run fix` in the inner loop and `np 15. Grounding and elicitation persist only the durable exploration ontology, with `non-goal` represented as a `constraint` subtype rather than a separate top-level kind. 16. Observer prompt, shared kind registry, schema / API types, fixtures, and UI copy describe the same ontology and accepted-review semantics without per-layer language drift. 17. The interview can orient itself anywhere in the `greenfield <> brownfield` by `end-to-end build <> incremental feature` matrix without forcing whole-project assumptions. -18. Observer capture records graph relationships broadly enough that most durable knowledge items link to upstream or downstream context whenever that relation is reasonably traceable. +18. Observer capture records intent edges broadly enough that most durable intent items link to upstream or downstream context whenever that relation is reasonably traceable. 19. Users who cannot complete a long interview can request candidate directions with explained tradeoffs and refine by reacting to them. 20. The interview can stop at a broad pass and deepen selected areas incrementally through explicit next-detail actions. -21. Graph view renders the knowledge graph as a navigable workspace with visible edges and node-launched refinement flows, not just a grouped list. +21. Graph view renders the intent graph as a navigable workspace with visible edges and node-launched refinement flows, not just a grouped list. 22. First-run setup makes missing provider credentials visible and recoverable from the dashboard without requiring users to hand-edit project `.env` files. 23. Brunch can help users keep `.brunch/` out of version control through an explicit, idempotent `.gitignore` confirmation flow.
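
The idempotent `.gitignore` step in requirement 23 could be sketched as a pure text transform, leaving the explicit confirmation and the file write to the caller. This is a minimal sketch under stated assumptions: the names `ensureBrunchIgnored` and `ignoresBrunch` are hypothetical and not taken from the Brunch codebase, and only four literal spellings of the entry are recognized rather than full gitignore pattern semantics.

```typescript
// Hypothetical helper names; illustrative only, not Brunch's actual implementation.
const BRUNCH_ENTRY = ".brunch/";

/** True when a line is one of the literal spellings that ignore the .brunch directory. */
function ignoresBrunch(line: string): boolean {
  const trimmed = line.trim();
  return (
    trimmed === ".brunch" ||
    trimmed === ".brunch/" ||
    trimmed === "/.brunch" ||
    trimmed === "/.brunch/"
  );
}

/**
 * Returns .gitignore content that contains a `.brunch/` entry.
 * Idempotent: feeding the result back in returns the same string.
 */
function ensureBrunchIgnored(content: string): string {
  if (content.split(/\r?\n/).some(ignoresBrunch)) {
    return content; // already ignored, so nothing changes
  }
  // Add a newline first only when the file does not already end with one.
  const separator = content === "" || content.endsWith("\n") ? "" : "\n";
  return `${content}${separator}${BRUNCH_ENTRY}\n`;
}
```

Because the function returns its input unchanged once the entry is present, a caller can compare input and output to decide whether there is anything to confirm with the user before writing.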