diff --git a/.config/wt.toml b/.config/wt.toml
new file mode 100644
index 00000000..6ac8dde5
--- /dev/null
+++ b/.config/wt.toml
@@ -0,0 +1,12 @@
+# Koan project worktree hooks
+# Docs: https://worktrunk.dev/hook/
+
+[post-create]
+deps = "uv sync --dev"
+
+[post-start]
+copy = "wt step copy-ignored"
+
+[pre-merge]
+check = "uv run ruff check ."
+test = "uv run pytest"
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
new file mode 100644
index 00000000..1c572e32
--- /dev/null
+++ b/.github/workflows/ci.yml
@@ -0,0 +1,28 @@
+name: CI
+
+on:
+  push:
+    branches: ["main"]
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+
+      - name: Install dependencies
+        run: uv sync --dev
+
+      - name: Run tests
+        run: uv run pytest
diff --git a/.gitignore b/.gitignore
index 4909416f..804a7d93 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,14 @@
-node_modules/
-dist/
 .pi/
 .DS_Store
+
+.claude/
+plans/
+.env
+.env.*
+*.log
+
+# Frontend build output (committed source lives in frontend/src/)
+koan/web/static/app/
+frontend/node_modules/
+frontend/dist/
+__pycache__/
diff --git a/.koan/memory/.gitignore b/.koan/memory/.gitignore
new file mode 100644
index 00000000..fc5578ba
--- /dev/null
+++ b/.koan/memory/.gitignore
@@ -0,0 +1,2 @@
+.index/
+summary.md
diff --git a/.koan/memory/0001-persistent-orchestrator-over-per-phase-cli.md b/.koan/memory/0001-persistent-orchestrator-over-per-phase-cli.md
new file mode 100644
index 00000000..510c996f
--- /dev/null
+++ b/.koan/memory/0001-persistent-orchestrator-over-per-phase-cli.md
@@ -0,0 +1,8 @@
+---
+title: Persistent orchestrator over per-phase CLI spawning
+type: decision
+created: '2026-04-16T07:13:41Z'
+modified: '2026-04-16T07:13:41Z'
+---
+
+This entry documents the orchestrator spawn
architecture decision in koan's workflow engine (`koan/driver.py`). On 2026-04-02, Leon redesigned the system to replace per-phase CLI process spawning with a single long-lived orchestrator process running the entire workflow in one continuous session. Previously, each planning phase spawned a fresh `claude`, `codex`, or `gemini` CLI process; a separate `workflow-orchestrator` subagent was then spawned to present the user with a phase-selection decision after each phase completed. Leon's rationale: per-phase spawning caused compounding context loss (each new process re-derived what the previous had explored), and the workflow-orchestrator role was architecturally wasteful -- "a process-boot just to ask a question." Two alternatives were explicitly rejected: (1) API-based conversation (driver calling the LLM API directly) -- would have bypassed the runner abstraction handling model selection, MCP config, output streaming, and thinking mode; (2) context injection into fresh processes per phase -- cheaper but fails to provide a persistent reasoning chain and does not eliminate the workflow-orchestrator overhead. The redesign landed in `koan/driver.py` as a single `spawn_subagent()` call awaiting the orchestrator's exit, and added `koan_set_phase` as the new phase-transition tool replacing the two-tool `koan_propose_workflow` / `koan_set_next_phase` dance. diff --git a/.koan/memory/0002-step-first-workflow-pattern-boot-prompt-is.md b/.koan/memory/0002-step-first-workflow-pattern-boot-prompt-is.md new file mode 100644 index 00000000..9e3a2ea0 --- /dev/null +++ b/.koan/memory/0002-step-first-workflow-pattern-boot-prompt-is.md @@ -0,0 +1,8 @@ +--- +title: Step-first workflow pattern -- boot prompt is exactly one sentence +type: decision +created: '2026-04-16T07:13:50Z' +modified: '2026-04-16T07:13:50Z' +--- + +The step-first workflow pattern governs how all LLM subagent CLI processes in koan receive task instructions. 
On 2026-02-10, Leon established this as a load-bearing architectural invariant in the koan initial design (documented in `docs/architecture.md` as Invariant 2 and enforced in `koan/web/mcp_endpoint.py`). The rule: every subagent's boot prompt is exactly one sentence -- role identity plus "Call koan_complete_step to receive your instructions." Task details, phase guidance, and tool lists arrive exclusively as the return value of the first `koan_complete_step` MCP call. The pattern was motivated by a failure mode observed with haiku-class (weaker) models: complex task instructions in the boot prompt caused these models to produce text output on the first turn and exit without ever entering the tool-calling loop. Three reinforcement mechanisms make the pattern robust across model capability levels: primacy (boot prompt is the LLM's very first message), recency (`format_step()` in `koan/phases/format_step.py` always appends "WHEN DONE: Call koan_complete_step..." last), and muscle memory (by step 2 the LLM has called the tool multiple times, locking in the pattern). diff --git a/.koan/memory/0003-server-authoritative-projection-via-json-patch.md b/.koan/memory/0003-server-authoritative-projection-via-json-patch.md new file mode 100644 index 00000000..4c2adec3 --- /dev/null +++ b/.koan/memory/0003-server-authoritative-projection-via-json-patch.md @@ -0,0 +1,8 @@ +--- +title: Server-authoritative projection via JSON Patch over symmetric dual fold +type: decision +created: '2026-04-16T07:13:57Z' +modified: '2026-04-16T07:13:57Z' +--- + +The koan projection system maintains frontend-visible workflow state for the browser dashboard, served via Server-Sent Events from `koan/projections.py`. On 2026-03-29, Leon decided to replace a dual fold architecture with a server-authoritative JSON Patch model. 
The prior design maintained two independent fold implementations -- one in Python (`koan/projections.py`) and one in TypeScript (`frontend/src/sse/connect.ts`) -- required to produce identical projections from the same event sequence. Two production bugs traced directly to these folds diverging: fragmented thinking cards in the activity feed, and scout events appearing incorrectly in the primary agent's conversation feed. Leon's decision: Python computes the fold and the RFC 6902 JSON Patch diff after each event; the browser applies patches mechanically via `fast-json-patch` with no fold logic, no event interpretation, and no business rules. Simultaneously, Leon adopted camelCase for all wire-format keys so patches apply directly to the Zustand store without a field-renaming layer. The correctness guarantee is now structural: one fold in one place. diff --git a/.koan/memory/0004-file-boundary-invariant-llms-write-markdown.md b/.koan/memory/0004-file-boundary-invariant-llms-write-markdown.md new file mode 100644 index 00000000..e464edca --- /dev/null +++ b/.koan/memory/0004-file-boundary-invariant-llms-write-markdown.md @@ -0,0 +1,8 @@ +--- +title: File boundary invariant -- LLMs write markdown, driver writes JSON +type: decision +created: '2026-04-16T07:14:03Z' +modified: '2026-04-16T07:14:03Z' +--- + +The file boundary invariant is a load-bearing architectural constraint in koan governing file ownership across the system's actors. On 2026-02-10, Leon established this rule in the koan initial design (documented in `docs/architecture.md` as Invariant 1). The rule: LLM subagents write markdown files only; the koan driver (`koan/driver.py`) reads and writes JSON state files exclusively; tool code in `koan/web/mcp_endpoint.py` bridges both worlds by writing JSON state (for the driver) and templated markdown status files (for LLMs) in the same operation. 
Leon's stated rationale: if an LLM writes a JSON file, schema drift and parse errors in the payload become runtime failures in the deterministic driver, while markdown is forgiving. The invariant is enforced structurally -- planning-role subagents have write access scoped to the run directory (`~/.koan/runs//`) but no mechanism to produce JSON state files, and the driver reads JSON state files and exit codes only, never parsing markdown. diff --git a/.koan/memory/0005-phase-trust-model-plan-review-as-designated.md b/.koan/memory/0005-phase-trust-model-plan-review-as-designated.md new file mode 100644 index 00000000..c9177085 --- /dev/null +++ b/.koan/memory/0005-phase-trust-model-plan-review-as-designated.md @@ -0,0 +1,10 @@ +--- +title: Phase trust model -- plan-review as designated adversarial verifier +type: decision +created: '2026-04-16T07:35:13Z' +modified: '2026-04-16T07:35:13Z' +related: +- 0001-persistent-orchestrator-over-per-phase-cli.md +--- + +The plan workflow's phase trust architecture in koan (`docs/phase-trust.md`, `koan/lib/workflows.py`) was designed around an asymmetric verification model. On 2026-02-10, Leon formalized this as part of the initial koan design: phases in the plan pipeline (intake, plan-spec, execute) were built to trust each other's outputs without re-verification; only plan-review was designated as the adversarial verifier. Leon documented the rationale in `docs/phase-trust.md`: cross-phase re-verification is the "intrinsic self-correction" anti-pattern -- research shows the same LLM re-checking its own prior work is more likely to change correct conclusions to incorrect ones than the reverse. Leon gave plan-review the CRITIC role: it uses the actual codebase as an external tool to check every file path, function name, signature, and type claim in `plan.md` against reality. 
Leon also decided that plan-review would be advisory only -- it reports findings with severity classification and may suggest looping back to plan-spec for critical or major issues, but it does not modify `plan.md` itself. diff --git a/.koan/memory/0006-directory-as-contract-taskjson-over-cli-flags-for.md b/.koan/memory/0006-directory-as-contract-taskjson-over-cli-flags-for.md new file mode 100644 index 00000000..f7211454 --- /dev/null +++ b/.koan/memory/0006-directory-as-contract-taskjson-over-cli-flags-for.md @@ -0,0 +1,10 @@ +--- +title: Directory-as-contract -- task.json over CLI flags for subagent configuration +type: decision +created: '2026-04-16T07:35:24Z' +modified: '2026-04-16T07:35:24Z' +related: +- 0004-file-boundary-invariant-llms-write-markdown.md +--- + +The subagent configuration mechanism in koan (`koan/subagent.py`, `docs/subagents.md`) was redesigned on 2026-02-10 when Leon replaced a 9-CLI-flag approach with a task.json file convention, later documented as Invariant 6 (Directory-as-contract) in `docs/architecture.md`. The previous design passed task configuration as 9 CLI arguments; Leon replaced it after identifying four problems: (1) the flat flag namespace caused naming collisions (`--koan-role` vs `--koan-scout-role`); (2) role-specific fields mixed with common fields without structure; (3) `--koan-retry-context` needed to carry multi-paragraph summaries exceeding practical CLI limits; (4) after a crash, reconstructing what a subagent had been asked required parsing process arguments from system logs. Leon adopted the convention that the driver would write `task.json` atomically (tmp + `os.rename()`) to the subagent directory before spawn. The subagent discovers its MCP endpoint by reading `mcp_url` from that file. No structured configuration flows through CLI flags, environment variables, or other process-level channels. 
Leon designated `task.json` as write-once by the parent before spawn and read-once by the parent at agent registration, never modified afterward. diff --git a/.koan/memory/0007-dual-fold-system-audit-fold-per-subagent-disk-vs.md b/.koan/memory/0007-dual-fold-system-audit-fold-per-subagent-disk-vs.md new file mode 100644 index 00000000..964122cc --- /dev/null +++ b/.koan/memory/0007-dual-fold-system-audit-fold-per-subagent-disk-vs.md @@ -0,0 +1,11 @@ +--- +title: Dual fold system -- audit fold (per-subagent disk) vs projection fold (workflow + SSE) +type: decision +created: '2026-04-16T07:35:36Z' +modified: '2026-04-16T07:35:36Z' +related: +- 0003-server-authoritative-projection-via-json-patch.md +--- + +The state-management layer of koan (`koan/audit/fold.py`, `koan/projections.py`) was designed around two independent fold systems. On 2026-03-29, Leon documented the distinction in `docs/architecture.md` (section "Two Fold Systems"). Leon designed the audit fold to process per-subagent audit events from each subagent's `events.jsonl`, materialize a per-subagent `Projection` object written to `state.json` on disk after every event, and serve debugging and post-mortem consumers. Leon designed the projection fold to process workflow-level projection events emitted by `ProjectionStore.push_event()`, maintain a single in-memory `Projection` covering all agents and run state for the entire workflow, and serve the browser frontend via SSE. Leon chose to keep the two systems independent rather than merging them: the audit fold needed per-event disk writes for durability, while the projection fold needed to stay in-memory for SSE streaming throughput. Leon established the rule that state visible only in logs belongs to the audit fold, while state visible in the browser UI belongs to the projection fold. 
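The split described in entry 0007 can be sketched roughly as follows. This is an illustrative sketch only, not the code in `koan/audit/fold.py` or `koan/projections.py`: the reducer bodies, the `event_count`, `last_event`, `agents`, and `agent_id` fields, the `replay_audit` helper, and the `agent_started` event type are all hypothetical. Only `events.jsonl`, `state.json`, `ProjectionStore.push_event()`, and `agent_exited` are names taken from the entries.

```python
import json
import tempfile
from pathlib import Path


def fold_audit(state: dict, event: dict) -> dict:
    """Pure reducer for one subagent's audit events (hypothetical fields)."""
    return {
        **state,
        "event_count": state.get("event_count", 0) + 1,
        "last_event": event["type"],
    }


def replay_audit(subagent_dir: Path) -> dict:
    """Audit fold: fold events.jsonl and persist state.json after every event."""
    state: dict = {}
    with (subagent_dir / "events.jsonl").open() as fh:
        for line in fh:
            state = fold_audit(state, json.loads(line))
            # Per-event disk write: durability for post-mortem consumers.
            (subagent_dir / "state.json").write_text(json.dumps(state))
    return state


def fold_projection(projection: dict, event: dict) -> dict:
    """Pure reducer for workflow-level events (hypothetical fields)."""
    agents = dict(projection.get("agents", {}))
    agents[event["agent_id"]] = event["type"]
    return {**projection, "agents": agents}


class ProjectionStore:
    """Projection fold: one in-memory projection for the whole workflow."""

    def __init__(self) -> None:
        self.projection: dict = {}

    def push_event(self, event: dict) -> dict:
        # No disk I/O here; the result would be streamed to the browser.
        self.projection = fold_projection(self.projection, event)
        return self.projection


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp)
        (run_dir / "events.jsonl").write_text(
            '{"type": "agent_started"}\n{"type": "agent_exited"}\n'
        )
        print(replay_audit(run_dir))
```

The two folds share nothing but the reducer shape, which mirrors the entry's point that durability (audit) and streaming throughput (projection) pull in different directions.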
diff --git a/.koan/memory/0008-three-tier-model-system-strongstandardcheap-over.md b/.koan/memory/0008-three-tier-model-system-strongstandardcheap-over.md new file mode 100644 index 00000000..f5ac0691 --- /dev/null +++ b/.koan/memory/0008-three-tier-model-system-strongstandardcheap-over.md @@ -0,0 +1,10 @@ +--- +title: Three-tier model system (strong/standard/cheap) over per-role model configuration +type: decision +created: '2026-04-16T07:35:45Z' +modified: '2026-04-16T07:35:45Z' +related: +- 0001-persistent-orchestrator-over-per-phase-cli.md +--- + +The model selection system in koan (`koan/config.py`, `docs/subagents.md` -- Model Tiers section) was designed on 2026-02-10 when Leon grouped the 6+ agent roles into three capability tiers rather than mapping each role to an individual model. Leon defined the tiers as: `strong` (orchestrator -- complex multi-step reasoning), `standard` (executor -- reliable tool use for code implementation), and `cheap` (scout -- narrow codebase investigation). Leon encoded the role-to-tier mapping in `koan/config.py`. Leon adopted a profile-based configuration system persisted to `~/.koan/config.json` that binds each tier to a specific runner type and model name; switching profiles changes all three tier bindings at once without touching role definitions. Leon rejected per-role model configuration because, with 6+ roles, each model change would require updating 6+ bindings; the tier system reduces that to 3 bindings per profile switch. 
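A minimal sketch of the tier indirection from entry 0008, under stated assumptions: the three tier names and the orchestrator/executor/scout pairings come from the entry, while the `Binding` type, the `resolve` helper, the profile names, and the model names are hypothetical placeholders. The real structures in `koan/config.py` and `~/.koan/config.json` may look different.

```python
from dataclasses import dataclass

# Role -> tier mapping. These three pairings are the ones named in the entry.
ROLE_TIER = {
    "orchestrator": "strong",
    "executor": "standard",
    "scout": "cheap",
}


@dataclass(frozen=True)
class Binding:
    runner: str  # runner type, e.g. one of the CLI runners
    model: str   # placeholder model name, not a real model identifier


# Hypothetical profiles: each one binds all three tiers at once.
PROFILES = {
    "default": {
        "strong": Binding("claude", "model-large"),
        "standard": Binding("claude", "model-medium"),
        "cheap": Binding("claude", "model-small"),
    },
    "alt": {
        "strong": Binding("codex", "alt-large"),
        "standard": Binding("codex", "alt-medium"),
        "cheap": Binding("gemini", "alt-small"),
    },
}


def resolve(role: str, profile: str) -> Binding:
    """Look up a role's binding through its tier, never directly by role."""
    return PROFILES[profile][ROLE_TIER[role]]
```

Switching from `"default"` to `"alt"` changes three bindings and leaves `ROLE_TIER` untouched, which is the property the entry cites for rejecting per-role configuration.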
diff --git a/.koan/memory/0009-permission-fence-impractical-across-llm-backends.md b/.koan/memory/0009-permission-fence-impractical-across-llm-backends.md new file mode 100644 index 00000000..1552cc94 --- /dev/null +++ b/.koan/memory/0009-permission-fence-impractical-across-llm-backends.md @@ -0,0 +1,10 @@ +--- +title: Permission fence impractical across LLM backends; planned for removal +type: lesson +created: '2026-04-16T08:34:06Z' +modified: '2026-04-16T08:34:06Z' +related: +- 0001-persistent-orchestrator-over-per-phase-cli.md +--- + +The permission fence in koan (`koan/lib/permissions.py`) was initially designed as a load-bearing default-deny gate enforced on every MCP tool call. On 2026-02-10, Leon established it as Invariant 4 in `docs/architecture.md`, describing it as a load-bearing rule that blocked unknown roles and tools. By approximately 2026-04-08, Leon reversed this assessment, stating in a Claude Code project memory note that the fence is "probably not worth maintaining" because many coding agents do not support accurately disabling tool features, making the gate impractical to enforce reliably across different LLM backends. Leon identified the root cause: enforcement does not work reliably across LLM backends, and the maintenance cost outweighs the benefit. Leon directed that no effort should be invested in extending or hardening the permission fence and that it may be completely removed in a future update. The fence still exists in the codebase as of 2026-04-16, but is deprioritized; the architecture documentation was not updated to reflect this direction change and still describes it as load-bearing. 
diff --git a/.koan/memory/0010-curation-phase-3-step-layout-collapsed-to-2-to.md b/.koan/memory/0010-curation-phase-3-step-layout-collapsed-to-2-to.md new file mode 100644 index 00000000..53f01d45 --- /dev/null +++ b/.koan/memory/0010-curation-phase-3-step-layout-collapsed-to-2-to.md @@ -0,0 +1,10 @@ +--- +title: 'Curation phase: 3-step layout collapsed to 2 to prevent meaty-step skip failure' +type: lesson +created: '2026-04-16T08:34:15Z' +modified: '2026-04-16T08:34:15Z' +related: +- 0002-step-first-workflow-pattern-boot-prompt-is.md +--- + +The curation phase module in koan (`koan/phases/curation.py`) was originally implemented as a 3-step workflow with step names "Survey", "Curate", and "Finalize/Reporting". During a curation run whose output Leon reviewed in screenshots, the orchestrator was observed to confuse "Survey" with intake-style exploration and then reach "phase complete" without ever calling `koan_memorize` -- a failure mode where the curation phase ended with zero memory writes. Leon identified two root causes: (1) the name "Survey" triggered intake-like behavior; (2) there was no per-step structural framing (no workflow_shape, goal, or tools list) visible at the moment the LLM decided whether to advance. On 2026-04-16, Leon approved a redesign that collapsed the 3 steps to 2 (Inventory and Memorize), named after their primary tool effects (`koan_memory_status` and `koan_memorize`) to make step-skipping visible, and added ``, ``, and `` XML blocks to every step, re-injected at each `koan_complete_step` call so the phase structure is visible at the moment of use rather than only at step 1. 
diff --git a/.koan/memory/0011-intake-confidence-loop-removed-unnecessary-scout.md b/.koan/memory/0011-intake-confidence-loop-removed-unnecessary-scout.md new file mode 100644 index 00000000..795e4f7b --- /dev/null +++ b/.koan/memory/0011-intake-confidence-loop-removed-unnecessary-scout.md @@ -0,0 +1,15 @@ +--- +title: 'Intake confidence loop removed: unnecessary scout batches and intrinsic self-correction + risk' +type: lesson +created: '2026-04-16T08:34:26Z' +modified: '2026-04-18T16:21:49Z' +related: +- 0002-step-first-workflow-pattern-boot-prompt-is.md +- 0005-phase-trust-model-plan-review-as-designated.md +- 0013-single-cognitive-goal-per-step-prevents-simulated.md +--- + +The intake phase in koan (koan/phases/intake.py) previously included a confidence-gated loop where steps 2-4 would repeat based on a structured confidence value. On 2026-04-12, Leon collapsed intake to a focused 2-step design (Gather + Deepen), removing the loop for three reasons: (a) it produced unnecessary second scout batches; (b) the Reflect step risked intrinsic self-correction -- the same LLM verifying its own prior reasoning rather than checking against actual codebase files; (c) a single thorough Deepen pass was sufficient when that step was well-scoped. Phase completion was redefined by depth of understanding, not iteration count. + +On 2026-04-17, Leon extracted a dedicated Summarize step from Deepen's conclusion, bringing intake to 3 steps total: Gather, Deepen, Summarize. The split applies the single-cognitive-goal-per-step principle (entry 0013): Deepen stays focused on dialogue and codebase verification; Summarize is a distinct step for synthesizing findings into a planning handoff. The confidence-loop removal rationale is unchanged -- the step count change only separates concerns that were already happening at the end of step 2. Note: docs/intake-loop.md still describes the older 2-step design as of 2026-04-18 and requires a separate update. 
diff --git a/.koan/memory/0012-koan-is-dog-fooded-on-its-own-development-meta.md b/.koan/memory/0012-koan-is-dog-fooded-on-its-own-development-meta.md new file mode 100644 index 00000000..4cff905e --- /dev/null +++ b/.koan/memory/0012-koan-is-dog-fooded-on-its-own-development-meta.md @@ -0,0 +1,8 @@ +--- +title: Koan is dog-fooded on its own development -- meta-context for agents +type: context +created: '2026-04-16T08:34:35Z' +modified: '2026-04-16T08:34:35Z' +--- + +Koan is a solo project maintained by Leon Mergen, as confirmed by Leon in a curation run on 2026-04-16. Since the initial koan design on 2026-02-10, Leon adopted a practice of using koan's own plan workflow to develop koan itself -- dog-fooding the system as its own first user. This creates a meta-context constraint for any agent working on the koan codebase: workflow instructions and phase prompts in `koan/phases/*.py` and `koan/lib/workflows.py` are runtime instructions for koan's orchestrator subagents to execute, not instructions for the agent currently editing the source files. For example, the `SYSTEM_PROMPT` strings in `koan/phases/intake.py` are the intake orchestrator's role instructions; `koan/phases/curation.py` contains the step guidance that koan's curation orchestrator follows. An agent must not conflate "a prompt being analyzed as source material" with "a prompt being given as a direct instruction." Leon named this the "meta use of koan" and stated it explicitly in the task prompt for the 2026-04-16 curation run. 
diff --git a/.koan/memory/0013-single-cognitive-goal-per-step-prevents-simulated.md b/.koan/memory/0013-single-cognitive-goal-per-step-prevents-simulated.md new file mode 100644 index 00000000..3987abf3 --- /dev/null +++ b/.koan/memory/0013-single-cognitive-goal-per-step-prevents-simulated.md @@ -0,0 +1,11 @@ +--- +title: Single cognitive goal per step -- prevents simulated refinement +type: decision +created: '2026-04-16T08:37:25Z' +modified: '2026-04-16T08:37:25Z' +related: +- 0002-step-first-workflow-pattern-boot-prompt-is.md +- 0010-curation-phase-3-step-layout-collapsed-to-2-to.md +--- + +The step design constraint for koan phases (`docs/architecture.md` -- Pitfalls section, "Don't give a step multiple cognitive goals") was established on 2026-02-10 when Leon set a rule: each `koan_complete_step` call must correspond to exactly one cognitive goal. Leon identified the failure mode that motivated this rule: when a single step combines multiple goals ("do A, then B, then C"), the LLM can engage in "simulated refinement" -- artificially downgrading its output for A in order to manufacture visible improvement in C, without genuinely improving anything. Leon documented this as a design constraint: when adding a new phase, each step must answer "what is the single thing this step accomplishes?" and if the answer requires "and then," the step must be split. Leon's reference designs in `koan/phases/plan_spec.py` (Analyze + Write), `koan/phases/intake.py` (Gather + Deepen), and `koan/phases/curation.py` (Inventory + Memorize) each place cognitively distinct operations into separate `koan_complete_step` calls. 
diff --git a/.koan/memory/0014-camelcase-wire-format-eliminates-renaming-layer.md b/.koan/memory/0014-camelcase-wire-format-eliminates-renaming-layer.md new file mode 100644 index 00000000..ff095134 --- /dev/null +++ b/.koan/memory/0014-camelcase-wire-format-eliminates-renaming-layer.md @@ -0,0 +1,12 @@ +--- +title: 'CamelCase wire format: eliminates renaming layer between projection and Zustand + store' +type: decision +created: '2026-04-16T08:37:35Z' +modified: '2026-04-16T08:37:35Z' +related: +- 0003-server-authoritative-projection-via-json-patch.md +- 0007-dual-fold-system-audit-fold-per-subagent-disk-vs.md +--- + +The SSE wire format for koan's projection system (`koan/projections.py`, `frontend/src/sse/connect.ts`) was designed to use camelCase keys for all serialized projection fields. On 2026-03-29, Leon documented this decision in `docs/projections.md` (Design Decisions -- "Why camelCase on the wire"). Leon's rationale: emitting snake_case from the server would require a `mapProjectionToStore()` renaming function in the frontend TypeScript plus a `projectionState` shadow object for patch application (patches must apply to the pre-renamed dict, not the renamed Zustand store); every new projection field would require a rename entry in that mapping. Leon identified this mapping layer as frontend business logic, contradicting his "frontend has zero business logic" principle. By adopting camelCase -- via Pydantic's `alias_generator=to_camel` in `KoanBaseModel` (`koan/projections.py`) -- patches produced by `jsonpatch.make_patch()` apply directly to the Zustand store in `frontend/src/store/`, and snapshot state spreads directly into the store at reconnect with no field renaming. 
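To illustrate why camelCase on the wire removes the renaming layer described in entry 0014, here is a standard-library-only sketch. It is a stand-in for the Pydantic alias generator the entry mentions, not a copy of it: `to_camel` and `to_wire` here are hand-written hypothetical helpers, and every field name other than `primary_agent` is invented for the example.

```python
from typing import Any


def to_camel(name: str) -> str:
    """Convert a snake_case key to camelCase (assumes lowercase input)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_wire(value: Any) -> Any:
    """Recursively rename dict keys so the payload matches the store's keys."""
    if isinstance(value, dict):
        return {to_camel(key): to_wire(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_wire(item) for item in value]
    return value


# Server-side projection with Python-style keys (hypothetical fields).
projection = {
    "primary_agent": None,
    "run_state": {"phase_name": "intake", "step_index": 1},
}
wire = to_wire(projection)
```

Because the rename happens once on the server, a patch path such as `/runState/phaseName` already addresses the key the frontend store uses, so no per-field mapping is needed in TypeScript.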
diff --git a/.koan/memory/0015-three-active-workflows-plan-milestones-stub.md b/.koan/memory/0015-three-active-workflows-plan-milestones-stub.md new file mode 100644 index 00000000..99e77204 --- /dev/null +++ b/.koan/memory/0015-three-active-workflows-plan-milestones-stub.md @@ -0,0 +1,10 @@ +--- +title: 'Three active workflows: plan, milestones (stub), curation' +type: context +created: '2026-04-16T08:37:42Z' +modified: '2026-04-16T08:37:42Z' +related: +- 0001-persistent-orchestrator-over-per-phase-cli.md +--- + +The koan workflow registry (`koan/lib/workflows.py`) defined three workflows as of 2026-04-16: `plan` (the primary active pipeline), `milestones` (a stub), and `curation` (standalone memory maintenance). Leon added the `curation` workflow when implementing the koan memory system, giving it its own `Workflow` dataclass in the `WORKFLOWS` dict in `koan/lib/workflows.py`. The `plan` workflow runs: intake -> plan-spec -> plan-review -> execute -> curation (postmortem). The `milestones` workflow ran intake only and was a stub as of 2026-04-16, intended for broad multi-subsystem initiatives but not yet implemented beyond the intake phase. The `curation` workflow runs a single curation phase using the `_STANDALONE_DIRECTIVE` string defined in `koan/lib/workflows.py` and is invoked when the user wants to maintain project memory outside of a development workflow run. Note: an earlier Claude Code project memory entry (written approximately 2026-04-08) listed only two workflows (plan and milestones); the curation workflow was added after that entry was written. 
diff --git a/.koan/memory/0016-steering-vs-phase-boundary-message-routing-dual.md b/.koan/memory/0016-steering-vs-phase-boundary-message-routing-dual.md new file mode 100644 index 00000000..a12348e4 --- /dev/null +++ b/.koan/memory/0016-steering-vs-phase-boundary-message-routing-dual.md @@ -0,0 +1,11 @@ +--- +title: 'Steering vs phase-boundary message routing: dual-queue design' +type: decision +created: '2026-04-16T08:37:51Z' +modified: '2026-04-16T08:37:51Z' +related: +- 0001-persistent-orchestrator-over-per-phase-cli.md +- 0002-step-first-workflow-pattern-boot-prompt-is.md +--- + +The user message routing system in koan (`koan/web/mcp_endpoint.py`, `docs/ipc.md` -- Chat Message Delivery section) was designed around two independent message queues. On 2026-03-29, Leon documented the distinction in `docs/ipc.md`. Leon designed phase-boundary messages (sent while `koan_yield` is blocking and `app_state.yield_future` is set) to be routed to `user_message_buffer` and returned directly as the `koan_yield` MCP tool result when the future resolves. Leon designed steering messages (sent while the orchestrator is mid-step and `yield_future` is `None`) to be routed to `steering_queue` and appended to the next outgoing tool response via `_drain_and_append_steering()`, so the LLM integrates them without abandoning the current step. Leon designated both queues as atomically drained and independent to prevent double-delivery: `drain_user_messages()` clears `user_message_buffer` and `drain_steering_messages()` clears `steering_queue`. The `POST /api/chat` endpoint inspects `yield_future` at the moment of message receipt to determine which queue to route to. 
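The routing rule in entry 0016 can be sketched as below. This is a simplified illustration, not the handler in `koan/web/mcp_endpoint.py`: the `AppState` dataclass and the `route_chat_message` function are hypothetical, and the drain functions take the state as an explicit argument, which may not match the real signatures. The names `yield_future`, `user_message_buffer`, `steering_queue`, `drain_user_messages`, and `drain_steering_messages` are taken from the entry.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AppState:
    # Set while koan_yield is blocking; None while the orchestrator is mid-step.
    yield_future: Optional[asyncio.Future] = None
    user_message_buffer: List[str] = field(default_factory=list)
    steering_queue: List[str] = field(default_factory=list)


def route_chat_message(state: AppState, text: str) -> str:
    """Pick a queue by inspecting yield_future at the moment of receipt."""
    if state.yield_future is not None:
        state.user_message_buffer.append(text)
        return "phase-boundary"
    state.steering_queue.append(text)
    return "steering"


def drain_user_messages(state: AppState) -> List[str]:
    """Return and clear the phase-boundary buffer in one step."""
    drained, state.user_message_buffer = state.user_message_buffer, []
    return drained


def drain_steering_messages(state: AppState) -> List[str]:
    """Return and clear the steering queue in one step."""
    drained, state.steering_queue = state.steering_queue, []
    return drained
```

Each message lands in exactly one queue and each drain empties exactly one queue, so a message cannot be delivered twice even if both drains run back to back.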
diff --git a/.koan/memory/0017-thoughts-parameter-escape-hatch-only-task-output.md b/.koan/memory/0017-thoughts-parameter-escape-hatch-only-task-output.md new file mode 100644 index 00000000..57119f85 --- /dev/null +++ b/.koan/memory/0017-thoughts-parameter-escape-hatch-only-task-output.md @@ -0,0 +1,11 @@ +--- +title: '`thoughts` parameter -- escape hatch only; task output goes to files' +type: procedure +created: '2026-04-16T09:00:44Z' +modified: '2026-04-16T09:00:44Z' +related: +- 0004-file-boundary-invariant-llms-write-markdown.md +- 0002-step-first-workflow-pattern-boot-prompt-is.md +--- + +The `koan_complete_step` tool in the koan orchestration system accepts a `thoughts` parameter. On 2026-04-16, the architecture documentation in `docs/subagents.md` established that `thoughts` must never be used to capture task output: the `thoughts` parameter is an escape hatch only. The rationale recorded in that document: some models (particularly weaker ones) cannot produce text output and a tool call in the same response turn; `thoughts` gives those models a way to call the tool without exiting the workflow. Task output -- summaries, reports, structured data, findings -- was established to go exclusively to files such as `findings.md`, `landscape.md`, and `plan.md` in the run directory at `~/.koan/runs//`. The driver, which runs in `koan/driver.py`, reads those files after the subagent exits; it does not read `thoughts` content, and `thoughts` values are not preserved in the audit log (`events.jsonl`). Any subagent that extracts output through `thoughts` rather than file writes creates a silent data loss path. 
diff --git a/.koan/memory/0018-behavioral-constraints-require-both-a-prompt.md b/.koan/memory/0018-behavioral-constraints-require-both-a-prompt.md new file mode 100644 index 00000000..8a08505e --- /dev/null +++ b/.koan/memory/0018-behavioral-constraints-require-both-a-prompt.md @@ -0,0 +1,10 @@ +--- +title: Behavioral constraints require both a prompt instruction and a mechanical gate +type: decision +created: '2026-04-16T09:00:52Z' +modified: '2026-04-18T05:06:44Z' +related: +- 0009-permission-fence-impractical-across-llm-backends.md +--- + +The koan orchestration system uses `koan/web/mcp_endpoint.py` and `koan/lib/permissions.py` to enforce behavioral constraints for subagent roles. On 2026-04-16, the architecture documentation in `docs/architecture.md` established that behavioral constraints require both a prompt instruction and a mechanical gate. The maintainer recorded the rationale: prompt instructions alone were found insufficient because LLMs can ignore them without error; mechanical gates alone were found insufficient because they produce cryptic "blocked" tool errors with no context for the model to self-correct and retry. The document identified three enforcement mechanisms: (1) the permission fence (`check_permission` in `koan/lib/permissions.py`), which blocks disallowed tool calls and returns a rejection message; (2) `validate_step_completion()`, which blocks `koan_complete_step` advancement until required pre-calls have been made; and (3) tool descriptions, which provide soft guidance only and cannot be enforced. The maintainer established the rule that any constraint mattering for correctness requires both a prompt instruction (so the LLM understands the requirement) and a mechanical gate (so non-compliance is caught and corrected rather than silently propagated). Caveat recorded on 2026-04-17: the permission-fence exemplar's future is under active consideration. 
As documented in entry 0009, Leon found the call-level fence impractical to enforce reliably across LLM backends; Leon stated on 2026-04-17 that a final decision has not been made and that an alternative worth considering is simply not exposing the tool to the model at all (tool-vocabulary control at the MCP layer) rather than blocking calls at the handler. The underlying principle (prompt + gate) remains in force via `validate_step_completion` and similar handshake-level gates; only the permission-fence example is unsettled. diff --git a/.koan/memory/0019-projection-events-record-facts-derived-state.md b/.koan/memory/0019-projection-events-record-facts-derived-state.md new file mode 100644 index 00000000..ed1dc04f --- /dev/null +++ b/.koan/memory/0019-projection-events-record-facts-derived-state.md @@ -0,0 +1,11 @@ +--- +title: Projection events record facts; derived state belongs in the fold function +type: decision +created: '2026-04-16T09:01:03Z' +modified: '2026-04-16T09:01:03Z' +related: +- 0007-dual-fold-system-audit-fold-per-subagent-disk-vs.md +- 0003-server-authoritative-projection-via-json-patch.md +--- + +The koan projection system in `koan/projections.py` uses an event-sourced fold architecture shared with the audit system in `koan/audit/fold.py`. On 2026-04-16, the architecture documentation in `docs/architecture.md` established the invariant that events record facts -- things that happened -- while derived state belongs in the fold function, not in the event log. The maintainer documented a specific anti-pattern to avoid: emitting a `subagent_idle` event to signal "no agent is currently running." The maintainer recorded that "no agent" is derived from the `agent_exited` event, not a fact in itself, and that emitting it as a separate event conflates the audit log with the projection. The documented correct pattern was: emit `agent_exited`, and let the fold function derive `primary_agent = None` from that event. 
The architecture documentation also established that `fold()` is required to be a pure function -- the maintainer specified that given the same event sequence it must produce the same projection with no I/O, randomness, or side effects, and that this purity guarantee is broken when derived state appears as events. diff --git a/.koan/memory/0020-memory-retrieval-static-directive-mechanical.md b/.koan/memory/0020-memory-retrieval-static-directive-mechanical.md new file mode 100644 index 00000000..58ce1a09 --- /dev/null +++ b/.koan/memory/0020-memory-retrieval-static-directive-mechanical.md @@ -0,0 +1,11 @@ +--- +title: 'Memory retrieval: static-directive mechanical injection handles unknown unknowns; + agent tools handle known unknowns' +type: decision +created: '2026-04-16T09:01:12Z' +modified: '2026-04-18T05:06:31Z' +related: +- 0012-koan-is-dog-fooded-on-its-own-development-meta.md +--- + +The koan memory system, documented in `docs/memory-system.md`, implements two retrieval mechanisms. On 2026-04-16, the memory system specification established an asymmetric design: mechanical context injection (automatic, at phase boundaries) using static retrieval directives authored by the workflow designer, and agent-invoked tools called on-demand during reasoning. The maintainer recorded the rationale: the two mechanisms were designed to solve different problems. Mechanical injection was designed to handle unknown unknowns -- knowledge the agent does not know to search for (a procedure about credential handling, a lesson about a past failure); since the agent cannot formulate a query for what it does not know exists, the injection must run without relying on agent reasoning. Agent-invoked tools were designed to handle known unknowns -- gaps the agent recognizes during reasoning and can formulate targeted queries for. 
The specification explicitly rejected LLM-generated retrieval directives (having the orchestrator generate directives at runtime) because the maintainer established that such directives would produce queries biased toward what the orchestrator already knows, collapsing both mechanisms into one and leaving unknown unknowns uncovered. The static directive was documented as encoding structural knowledge about each phase type's typical needs, independent of any particular agent's reasoning state. Implementation status as of 2026-04-17: the mechanical injection path is wired in `koan/web/mcp_endpoint.py`, and the agent-invoked side exposes one MCP tool, `koan_search` (registered at line 1215 of that file). A second agent-invoked tool `koan_reflect` was specified in the original design -- a CLI stub exists at `koan/cli/memory.py:201` printing "not yet implemented" -- but has not been wired as an MCP tool; it remains a planned surface for reflection-style queries over the memory store. diff --git a/.koan/memory/0021-memory-entry-writing-discipline-temporally.md b/.koan/memory/0021-memory-entry-writing-discipline-temporally.md new file mode 100644 index 00000000..79d40f6a --- /dev/null +++ b/.koan/memory/0021-memory-entry-writing-discipline-temporally.md @@ -0,0 +1,8 @@ +--- +title: Memory entry writing discipline -- temporally grounded, event-style prose +type: procedure +created: '2026-04-16T09:02:41Z' +modified: '2026-04-16T09:02:41Z' +--- + +The koan memory system stores entries in `.koan/memory/NNNN-*.md` files within the project repository. On 2026-04-16, the memory system specification in `docs/memory-system.md` established five writing discipline rules for all memory entry bodies. The maintainer recorded the rationale as grounded in SimpleMem's finding (Liu et al., 2026) that removing temporal normalization reduced Temporal F1 by 56.7%. Rule 1: every statement includes a date in YYYY-MM-DD form -- the date the fact became true or was observed. 
Rule 2: claims are attributed to their source ("user stated", "LLM inferred", "post-mortem identified"); user-stated facts carry higher trust than LLM-inferred facts. Rule 3: no forward-looking language ("we will", "should") -- instead write "On [date], user stated the plan was to..." Rule 4: name things concretely -- not "the database" but "PostgreSQL 16.2" or "the auth service's primary data store." Rule 5: each entry must stand alone, interpretable without any other file, true regardless of when it is read. The specification further established that the first 1-3 sentences situate the entry in the project by naming a specific subsystem, following Anthropic's contextual retrieval technique to reduce retrieval failures by 35%. diff --git a/.koan/memory/0022-blocking-artifact-review-gate-removed-from-plan.md b/.koan/memory/0022-blocking-artifact-review-gate-removed-from-plan.md new file mode 100644 index 00000000..8653d9f1 --- /dev/null +++ b/.koan/memory/0022-blocking-artifact-review-gate-removed-from-plan.md @@ -0,0 +1,12 @@ +--- +title: Blocking artifact review gate removed from plan workflow; chat-based phase + transitions replace it +type: decision +created: '2026-04-16T09:02:51Z' +modified: '2026-04-16T09:02:51Z' +related: +- 0015-three-active-workflows-plan-milestones-stub.md +- 0016-steering-vs-phase-boundary-message-routing-dual.md +--- + +The koan plan workflow in `koan/lib/workflows.py` includes four phases: intake, plan-spec, plan-review, and execute. On 2026-04-03, the workflow redesign plan (`plans/2026-04-03-workflow-types-and-plan-mode.md`) documented the removal of the blocking `koan_review_artifact` tool and the `POST /api/artifact-review` backend route. The maintainer recorded the rationale (Decision D1): artifact review and "what do I do next?" should be one conversation, not two sequential blocking gates. 
Under the removed design, the orchestrator wrote an artifact, then a blocking modal required Accept/Reject before phase transition suggestions appeared -- two sequential pauses for what is conceptually one moment: "here's what I did -- what should we do next?" The replacement pattern was established as: the orchestrator writes an artifact, gives a progress update in chat via `koan_yield`, and presents suggested next phases. The user reviews the artifact in the artifacts panel and responds conversationally. The maintainer noted this aligned with Traycer's design (the reverse-engineered origin system), which had no blocking modal -- artifacts appeared in a sidebar and the "what's next?" conversation implicitly covered feedback. diff --git a/.koan/memory/0023-phase-guidance-workflow-scope-framing-is-injected.md b/.koan/memory/0023-phase-guidance-workflow-scope-framing-is-injected.md new file mode 100644 index 00000000..ec42b72a --- /dev/null +++ b/.koan/memory/0023-phase-guidance-workflow-scope-framing-is-injected.md @@ -0,0 +1,12 @@ +--- +title: Phase guidance (workflow scope framing) is injected at the top of step 1, before + procedural instructions +type: decision +created: '2026-04-16T09:03:03Z' +modified: '2026-04-16T09:03:03Z' +related: +- 0002-step-first-workflow-pattern-boot-prompt-is.md +- 0015-three-active-workflows-plan-milestones-stub.md +--- + +The koan orchestration system injects per-workflow scope framing into each phase transition via the `phase_guidance` dict in `koan/lib/workflows.py`. On 2026-04-03, the workflow redesign plan (`plans/2026-04-03-workflow-types-and-plan-mode.md`) established Decision D8: the `phase_guidance` injection must appear at the TOP of step 1 guidance, before procedural instructions, not appended at the bottom. The maintainer recorded the rationale: scope framing is the strongest lever for controlling LLM posture -- "this is a focused change" produces fundamentally different behavior than "this is a broad initiative." 
If the LLM reads procedural instructions before scope framing, it begins reasoning from the wrong posture and receives the correction too late. The injection contract established by the maintainer specified five required sections per `phase_guidance` entry: Scope, Downstream consumer, Investigation posture, Question posture, and User override (always present, always last). In `koan/web/mcp_endpoint.py`, the `koan_set_phase` handler was designed to store `workflow.phase_guidance.get(phase, "")` in `PhaseContext.phase_instructions`, which step 1 of each phase module renders at the top of the returned guidance string. diff --git a/.koan/memory/0024-getnextstep-must-be-a-pure-query-side-effects-of.md b/.koan/memory/0024-getnextstep-must-be-a-pure-query-side-effects-of.md new file mode 100644 index 00000000..6eafa284 --- /dev/null +++ b/.koan/memory/0024-getnextstep-must-be-a-pure-query-side-effects-of.md @@ -0,0 +1,11 @@ +--- +title: '`get_next_step()` must be a pure query; side effects of loop-backs belong + in `on_loop_back()`' +type: procedure +created: '2026-04-16T09:03:11Z' +modified: '2026-04-16T09:03:11Z' +related: +- 0013-single-cognitive-goal-per-step-prevents-simulated.md +--- + +The koan phase module protocol, defined in `koan/phases/__init__.py`, requires phase modules to implement `get_next_step(step, ctx)` and optionally `on_loop_back(from_step, to_step, ctx)`. On 2026-04-16, the architecture documentation in `docs/architecture.md` established the invariant that `get_next_step()` must be a pure query -- it returns the next step number and nothing else. The maintainer documented the anti-pattern: placing state mutations (counter increments, setting `ctx.confidence = None`), event emissions, or I/O inside `get_next_step()` violates the contract because the function may be called multiple times in a single step transition. 
The documented correct pattern was: `get_next_step()` returns a step number only; any state changes that must accompany a backward step transition belong in `on_loop_back(from_step, to_step, ctx)`. The maintainer provided a concrete example: `get_next_step(4)` returning `2` for a loop-back is correct; incrementing `self.iteration` inside that call is wrong -- `self.iteration += 1` belongs in `on_loop_back(4, 2, ctx)`. diff --git a/.koan/memory/0025-scout-success-is-determined-by-exit-code-and.md b/.koan/memory/0025-scout-success-is-determined-by-exit-code-and.md new file mode 100644 index 00000000..55571e51 --- /dev/null +++ b/.koan/memory/0025-scout-success-is-determined-by-exit-code-and.md @@ -0,0 +1,10 @@ +--- +title: Scout success is determined by exit code and final_response, not by file existence +type: procedure +created: '2026-04-16T09:25:55Z' +modified: '2026-04-16T09:25:55Z' +related: +- 0006-directory-as-contract-taskjson-over-cli-flags-for.md +--- + +Koan scouts are spawned via `koan_request_scouts` in `koan/web/mcp_endpoint.py` and each produces a `findings.md` output file in their subagent directory under `~/.koan/runs//subagents/`. On 2026-04-16, the architecture documentation in `docs/architecture.md` established that scout success must be derived from the subagent's exit code and final response, not from checking whether `findings.md` exists. The maintainer recorded the rationale: a scout can write a partial `findings.md` and then crash -- file existence is not proof of completion. The documented success check in `koan/web/mcp_endpoint.py` was: `succeeded = result.exit_code == 0; findings = result.final_response or None`. Failed scouts (non-zero exit code) return `None` from the scout runner and are omitted from the concatenated findings returned to the parent orchestrator. 
The maintainer established that scout failures must be non-fatal -- a failed scout does not abort the parent's workflow; its task ID is reported in the `failures` array and its findings are simply omitted. diff --git a/.koan/memory/0026-recoverable-vs-unrecoverable-error-classification.md b/.koan/memory/0026-recoverable-vs-unrecoverable-error-classification.md new file mode 100644 index 00000000..453c704f --- /dev/null +++ b/.koan/memory/0026-recoverable-vs-unrecoverable-error-classification.md @@ -0,0 +1,11 @@ +--- +title: Recoverable vs unrecoverable error classification for model-output failures + in the MCP endpoint +type: decision +created: '2026-04-16T09:25:58Z' +modified: '2026-04-16T09:25:58Z' +related: +- 0002-step-first-workflow-pattern-boot-prompt-is.md +--- + +The koan MCP endpoint in `koan/web/mcp_endpoint.py` handles tool calls from LLM subagents. On 2026-04-16, the architecture documentation in `docs/architecture.md` established a two-category error classification. The maintainer recorded the rule: fail-fast is scoped to unrecoverable conditions only. Unrecoverable conditions were defined as: invariant/contract violations (e.g., missing or malformed `task.json` at subagent startup), unexpected states where there is no safe deterministic next action, and failures with no simple local recovery path. Recoverable conditions were defined as: malformed tool-call JSON or arguments from the LLM, tool argument schema validation failures, and disallowed or unknown tool calls. The documented handling for recoverable errors was: return a structured tool error so the model can self-correct and retry in the same subagent process. The maintainer noted the rationale: once an LLM subagent process exits due to a parse error, the workflow cannot resume from mid-step -- keeping the process alive for recoverable errors is the only way to maintain continuity. 
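A minimal sketch of the two-category classification described in entry 0026, under stated assumptions. It is not the code in `koan/web/mcp_endpoint.py`: `KNOWN_TOOLS`, `UnrecoverableError`, `load_task`, `handle_tool_call`, the `role` field, and the error payload shape are all hypothetical names chosen for illustration. Model-output mistakes come back as structured errors so the subagent can retry, while a malformed `task.json` raises.

```python
import json

# Hypothetical registry mapping tool name -> required argument names.
KNOWN_TOOLS = {"koan_search": {"query"}}


class UnrecoverableError(Exception):
    """Contract violation with no safe local recovery; the caller fails fast."""


def load_task(raw: str) -> dict:
    """Parse task.json content; any defect here is treated as unrecoverable."""
    try:
        task = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise UnrecoverableError(f"malformed task.json: {exc}") from exc
    # The 'role' key is an assumed required field, used only for illustration.
    if not isinstance(task, dict) or "role" not in task:
        raise UnrecoverableError("task.json is missing the 'role' field")
    return task


def handle_tool_call(name: str, raw_args: str) -> dict:
    """Return a result or a structured error; never raise for model mistakes."""
    if name not in KNOWN_TOOLS:
        return {"ok": False, "error": "unknown_tool",
                "message": f"no tool named {name!r}"}
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": "malformed_json", "message": str(exc)}
    if not isinstance(args, dict):
        return {"ok": False, "error": "invalid_arguments",
                "message": "arguments must be a JSON object"}
    missing = KNOWN_TOOLS[name] - set(args)
    if missing:
        return {"ok": False, "error": "invalid_arguments",
                "message": f"missing arguments: {sorted(missing)}"}
    return {"ok": True, "args": args}
```

The design point the sketch tries to show is the asymmetry: `handle_tool_call` keeps the process alive for anything the model can fix on its next attempt, and only startup contract violations propagate as exceptions.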
diff --git a/.koan/memory/0027-all-state-file-writes-use-atomic-tmp-file.md b/.koan/memory/0027-all-state-file-writes-use-atomic-tmp-file.md new file mode 100644 index 00000000..6d838f94 --- /dev/null +++ b/.koan/memory/0027-all-state-file-writes-use-atomic-tmp-file.md @@ -0,0 +1,11 @@ +--- +title: All state file writes use atomic tmp-file + os.rename() to prevent partial + reads under concurrent access +type: procedure +created: '2026-04-16T09:26:07Z' +modified: '2026-04-16T09:26:07Z' +related: +- 0004-file-boundary-invariant-llms-write-markdown.md +--- + +The koan driver in `koan/driver.py` and orchestrator tools in `koan/web/mcp_endpoint.py` write state files concurrently with a running web server and SSE subscribers. On 2026-04-16, the architecture documentation in `docs/architecture.md` established the atomic write pattern for all persistent state writes: write to a `.tmp` file, then call `os.rename()` to atomically replace the target. The maintainer recorded the rationale: a partial read of `state.json` caused by a mid-write concurrent access causes silent data corruption or spurious errors. The documented pattern was: `tmp = f"{file_path}.tmp"; json.dump(data, open(tmp, "w")); os.rename(tmp, file_path)`. This pattern was established as mandatory for: `run-state.json` in `~/.koan/runs//`, per-story `state.json` and `status.md` in `stories/{story_id}/`, per-subagent `task.json` written before spawn, and per-subagent `state.json` in the audit projection. The `koan/audit/event_log.py` module was documented as the canonical implementation of this pattern. 
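The atomic write pattern in entry 0027 could be wrapped in a helper along these lines. This is a sketch under assumptions, not the implementation in `koan/audit/event_log.py`: the function name `atomic_write_json` is hypothetical, the explicit `flush`/`fsync` is an added durability step the entry does not mention, and `os.replace()` is swapped in for the entry's `os.rename()` because it also overwrites an existing target on Windows (on POSIX the two behave the same for this case).

```python
import json
import os


def atomic_write_json(file_path: str, data: dict) -> None:
    """Write JSON to a sibling .tmp file, then atomically swap it into place."""
    tmp = f"{file_path}.tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
        fh.flush()
        # Assumption: durability is wanted in addition to atomicity.
        os.fsync(fh.fileno())
    # Readers see either the old file or the new one, never a partial write.
    os.replace(tmp, file_path)
```

Closing the file before the rename matters: the one-line form quoted in the entry relies on the interpreter closing the handle promptly, whereas the `with` block makes the ordering explicit.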
diff --git a/.koan/memory/0028-frontend-css-token-promotion-hardcode-single-use.md b/.koan/memory/0028-frontend-css-token-promotion-hardcode-single-use.md new file mode 100644 index 00000000..766979d5 --- /dev/null +++ b/.koan/memory/0028-frontend-css-token-promotion-hardcode-single-use.md @@ -0,0 +1,9 @@ +--- +title: Frontend CSS token promotion -- hardcode single-use values, flag multi-component + values, never modify variables.css unilaterally +type: procedure +created: '2026-04-16T09:26:15Z' +modified: '2026-04-16T09:26:15Z' +--- + +The koan frontend design system uses CSS custom properties defined in `frontend/src/styles/variables.css` as the sole source of design tokens. On 2026-04-16, the component development rules in `frontend/src/components/AGENTS.md` established the token promotion rule for agents implementing frontend components. The maintainer established three tiers of handling for CSS values: (1) values used by exactly one component -- hardcode in that component's colocated `.css` file with a descriptive comment explaining what the value represents; (2) values used by multiple components, or clearly about to be -- flag for token promotion in the response to the user, do not add the token yourself; (3) `variables.css` is a protected file requiring explicit user approval before any modification -- agents must never add, rename, or remove tokens unilaterally. The class naming convention was also established: prefix CSS class names with a short component abbreviation to avoid collisions (e.g., `.tcr-` for ToolCallRow, `.hb-` for HeaderBar, `.ep-` for ElicitationPanel). The maintainer further established that `npx tsc --noEmit` must be run after any TypeScript/TSX changes to verify zero compilation errors before considering frontend work done. 
diff --git a/.koan/memory/0029-headerbar-rendered-phantom-opus-model-label.md b/.koan/memory/0029-headerbar-rendered-phantom-opus-model-label.md new file mode 100644 index 00000000..991d01de --- /dev/null +++ b/.koan/memory/0029-headerbar-rendered-phantom-opus-model-label.md @@ -0,0 +1,9 @@ +--- +title: HeaderBar rendered phantom 'opus' model label -- destructuring default masked + absent orchestrator state +type: lesson +created: '2026-04-16T11:29:34Z' +modified: '2026-04-16T11:29:34Z' +--- + +The `HeaderBar` organism in `frontend/src/components/organisms/HeaderBar.tsx` was found on 2026-04-16 to display 'opus' in the titlebar whenever no orchestrator was running. The user reported that when no primary agent was active, the model section should be empty. Investigation identified the root cause: the `orchestratorModel` parameter used a destructuring default `= 'opus'` (line 49). `App.tsx` correctly computed `orchestratorModel: primary?.model ?? undefined` -- returning `undefined` when `agents` contained no entry with `isPrimary: true`. However, the destructuring default silently substituted `'opus'` for `undefined`, making the prop appear present in all cases. The prop was already typed `orchestratorModel?: string` in `HeaderBarProps`, making the optionality semantically correct but defeated at the call site. On 2026-04-16, the fix was applied: the `= 'opus'` default was removed from the destructuring parameter, and the `hb-orchestrator` div was wrapped in `{orchestratorModel && (...)}` to suppress the entire section (both the `StatusDot` atom and the `hb-model` span) when no model was known. 
diff --git a/.koan/memory/0030-do-not-use-destructuring-defaults-as-display.md b/.koan/memory/0030-do-not-use-destructuring-defaults-as-display.md new file mode 100644 index 00000000..54e21519 --- /dev/null +++ b/.koan/memory/0030-do-not-use-destructuring-defaults-as-display.md @@ -0,0 +1,11 @@ +--- +title: Do not use destructuring defaults as display-value fallbacks for potentially + absent React props +type: procedure +created: '2026-04-16T11:29:39Z' +modified: '2026-04-16T11:29:39Z' +related: +- 0029-headerbar-rendered-phantom-opus-model-label.md +--- + +Koan's frontend component conventions, established on 2026-04-16 during the fix of a phantom model label in `frontend/src/components/organisms/HeaderBar.tsx`, include the following rule for handling potentially absent props. When a React component prop is typed `T | undefined` and `undefined` means 'data is genuinely absent' (not just unspecified), using a destructuring default that provides a display-value string -- e.g., `orchestratorModel = 'opus'` -- masks the absence and causes the UI element to render when it should not. The correct pattern, confirmed by the user on 2026-04-16, is: (a) omit the destructuring default so the parameter retains `undefined`; (b) use conditional rendering (`{prop && (...)}`) to suppress the UI element entirely. This rule applies to any component where an absent prop signals 'nothing to show' rather than 'use a sensible default'.
diff --git a/.koan/memory/0031-voyage-ai-as-sole-retrieval-provider-voyage-4.md b/.koan/memory/0031-voyage-ai-as-sole-retrieval-provider-voyage-4.md new file mode 100644 index 00000000..3ec15233 --- /dev/null +++ b/.koan/memory/0031-voyage-ai-as-sole-retrieval-provider-voyage-4.md @@ -0,0 +1,11 @@ +--- +title: Voyage AI as sole retrieval provider -- voyage-4-large embedding + rerank-2.5 + reranking, single VOYAGE_API_KEY +type: decision +created: '2026-04-16T13:30:42Z' +modified: '2026-04-16T13:30:42Z' +--- + +The koan memory retrieval backend (`koan/memory/retrieval/`) chose Voyage AI as its sole external provider for both embedding and reranking. On 2026-04-16, when planning the retrieval backend implementation, the task description had specified Cohere `rerank-v3.5` for reranking (with `COHERE_API_KEY`) alongside Voyage for embedding. The user overrode this during plan-spec and directed consolidation onto Voyage only. Voyage AI's `voyage-4-large` model handles dense embeddings; `rerank-2.5` handles cross-encoder reranking after RRF fusion. Both are accessed via `voyageai.AsyncClient(api_key=VOYAGE_API_KEY)`, requiring only one environment variable. + +The user's rationale: a single provider simplifies credential management (one `VOYAGE_API_KEY` instead of two), reduces the Python dependency count (no `cohere` package required alongside `voyageai`), and keeps the full retrieval pipeline within one vendor relationship. The `voyageai` package provides both `AsyncClient.embed()` and `AsyncClient.rerank()` under the same API key. 
diff --git a/.koan/memory/0032-plan-review-produced-unverified-critical-finding.md b/.koan/memory/0032-plan-review-produced-unverified-critical-finding.md new file mode 100644 index 00000000..1f62fc60 --- /dev/null +++ b/.koan/memory/0032-plan-review-produced-unverified-critical-finding.md @@ -0,0 +1,15 @@ +--- +title: Plan-review produced unverified Critical finding about voyage-4-large; unverified + bold claims during review cause unnecessary work +type: lesson +created: '2026-04-16T13:30:54Z' +modified: '2026-04-16T13:30:54Z' +--- + +The plan-review phase for the koan retrieval backend (`koan/memory/retrieval/`) produced an incorrect critical finding on 2026-04-16. The review agent flagged `VOYAGE_DIM = 1024` in `koan/memory/retrieval/index.py` as "Critical," asserting that `voyage-4-large` outputs 2048 dimensions and would cause PyArrow schema mismatches on first index write. The assertion was based on inference from the model name ("large" suggesting larger output size), with no documentation check performed. + +The user verified against the Voyage AI documentation and confirmed the constant was correct: `voyage-4-large` supports 256, 512, 1024 (default), and 2048 output dimensions. The plan proceeded unchanged. + +Root cause: the reviewer treated an assumption as a verified fact and labeled it "Critical." Unverified bold claims during adversarial review are particularly harmful because high-severity labels override the planner's judgment, create unnecessary revision cycles, and erode trust in the review phase itself. The cascade effect: if the planner had accepted the finding without checking, the schema would have been changed to 2048 dims, breaking compatibility with the voyage-4-large default output. + +A review agent should cite the specific documentation, test result, or source code reference that grounds a critical claim. An unverified inference stated at high confidence is worse than a verified minor finding. 
diff --git a/.koan/memory/0033-new-mcp-tool-handlers-in-koanwebmcpendpointpy.md b/.koan/memory/0033-new-mcp-tool-handlers-in-koanwebmcpendpointpy.md new file mode 100644 index 00000000..fb517ee0 --- /dev/null +++ b/.koan/memory/0033-new-mcp-tool-handlers-in-koanwebmcpendpointpy.md @@ -0,0 +1,24 @@ +--- +title: 'New MCP tool handlers in koan/web/mcp_endpoint.py must use try/finally with + result_str: str | None = None' +type: procedure +created: '2026-04-16T13:31:07Z' +modified: '2026-04-16T13:31:07Z' +--- + +When adding any new `@mcp.tool(name="...")` handler to `koan/web/mcp_endpoint.py`, follow the established lifecycle pattern. On 2026-04-16, the plan-review phase caught a deviation in the initial `koan_search` draft: the draft called `end_tool_call` inside both the except block and after the try/except, and placed `_drain_and_append_steering` outside the try block. The user-approved correction, verified against `koan_memorize` at line 906, `koan_forget` at line 966, and `koan_memory_status` at line 1001, uses this structure: + +``` +result_str: str | None = None +try: + # ... do work ... + result_str = json.dumps(...) + result_str = _drain_and_append_steering(result_str, agent) + return result_str +except SpecificError as e: + raise ToolError(json.dumps({"error": "...", "message": str(e)})) +finally: + end_tool_call(agent, call_id, tool_name, result_str) +``` + +`result_str` initialized to `None` before the try block ensures `end_tool_call` receives `None` when an exception occurs before the result is assembled. `_drain_and_append_steering` executes inside the try block, not after it. The decorator uses `@mcp.tool(name="koan_...")` with an explicit name string, not the bare `@mcp.tool()` form. 
diff --git a/.koan/memory/0034-koan-memory-sync-uses-sha-256-content-hash-not.md b/.koan/memory/0034-koan-memory-sync-uses-sha-256-content-hash-not.md new file mode 100644 index 00000000..8bdcd5c9 --- /dev/null +++ b/.koan/memory/0034-koan-memory-sync-uses-sha-256-content-hash-not.md @@ -0,0 +1,15 @@ +--- +title: koan memory sync uses SHA-256 content hash, not mtime, as change-detection + invariant +type: decision +created: '2026-04-16T13:32:18Z' +modified: '2026-04-16T13:32:18Z' +--- + +The sync layer in `koan/memory/retrieval/index.py` was designed on 2026-04-16, with the user confirming the design in plan-review, to detect changes to `.koan/memory/NNNN-*.md` entry files using SHA-256 content hashes stored as a `content_hash` column in the LanceDB table, rather than file modification timestamps (mtime). + +Two alternatives were considered and rejected by the plan author: +- **mtime**: git operations (branch checkout, `git pull`, `git stash`) update file mtimes without changing content; `touch` changes mtime without changing content. An mtime-based sync would spuriously re-embed files after routine git operations, wasting Voyage AI embedding API calls. +- **A separate metadata sidecar file** (e.g., a JSON file tracking hashes alongside the index): rejected in favor of storing hashes as a LanceDB column, keeping the index fully self-contained with no external tracking file. + +The hash computation uses `hashlib.sha256(path.read_bytes()).hexdigest()` stored in the `content_hash` column of the LanceDB `entries` table. 
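The change-detection rule in entry 0034 can be illustrated with a small sketch. It assumes a plain dict (`indexed`, mapping file name to stored hash) as a stand-in for the `content_hash` column of the LanceDB `entries` table; the function names and the glob pattern are hypothetical and not taken from `koan/memory/retrieval/index.py`. Only the hash expression itself comes from the entry.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file bytes, as described in the entry."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def entries_to_reembed(memory_dir: Path, indexed: dict[str, str]) -> list[Path]:
    """Return entry files whose current hash differs from the indexed hash.

    `indexed` is a hypothetical stand-in for the stored content_hash column.
    """
    changed = []
    for path in sorted(memory_dir.glob("[0-9][0-9][0-9][0-9]-*.md")):
        if indexed.get(path.name) != content_hash(path):
            changed.append(path)
    return changed
```

Because the comparison never consults file timestamps, a `git checkout` or `touch` that rewrites mtimes without changing bytes would leave the returned list empty, which is the property the entry cites for avoiding spurious embedding calls.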
diff --git a/.koan/memory/0037-code-comment-vs-memory-entry-filter-comment.md b/.koan/memory/0037-code-comment-vs-memory-entry-filter-comment.md new file mode 100644 index 00000000..b46f6ce7 --- /dev/null +++ b/.koan/memory/0037-code-comment-vs-memory-entry-filter-comment.md @@ -0,0 +1,9 @@ +--- +title: Code-comment vs memory-entry filter -- COMMENT classification and executor + rationale comments +type: decision +created: '2026-04-17T04:22:11Z' +modified: '2026-04-17T04:22:11Z' +--- + +The curation phase's COMMENT classification in `koan/phases/curation.py` was added on 2026-04-17 to filter implementation-specific knowledge out of the koan memory store. The user identified that entries like "backend.py exposes search_candidates and rerank_results separately" recorded knowledge that would serve agents better as code comments next to the relevant functions. The design introduced a two-part strategy: (1) a COMMENT classification in the curation phase's `PHASE_ROLE_CONTEXT` that applies a test question -- "Would a code comment next to the relevant function give a future agent the same benefit?" -- to filter candidates that describe single-function rationale, parameter defaults, or single-module patterns; (2) a "Rationale comments" directive in `koan/phases/executor.py` step 3 instructing executors to write brief 1-3 line "why" comments at code locations when making implementation choices. Alternative considered: relying solely on the existing "What not to capture" guidance without a formal classification -- rejected because it lacked a mechanical discrimination test and did not redirect the knowledge to code comments. 
diff --git a/.koan/memory/0038-cross-reference-repetition-in-prompt-instructions.md b/.koan/memory/0038-cross-reference-repetition-in-prompt-instructions.md new file mode 100644 index 00000000..1dec09bc --- /dev/null +++ b/.koan/memory/0038-cross-reference-repetition-in-prompt-instructions.md @@ -0,0 +1,8 @@ +--- +title: Cross-reference repetition in prompt instructions aids LLM instruction following +type: procedure +created: '2026-04-17T04:22:19Z' +modified: '2026-04-17T04:22:19Z' +--- + +The koan phase prompt system (`koan/phases/*.py`) was confirmed on 2026-04-17 to follow a cross-reference repetition principle for LLM instruction following. When the plan proposed adding the COMMENT classification to step 2 substep E's "Apply" list (even though COMMENT was already defined in the classification schema earlier in the prompt), the user confirmed this was correct, stating "these type of cross-references and repetitions work well" for optimizing instruction following. The user described this as fitting koan's existing conventions. The rule: when writing phase prompt instructions, repeat classifications, rules, and categories at each point of use rather than referencing earlier definitions once. The model recognizes the repeated information from earlier context, and the repetition reinforces the expected behavior at the moment of action. 
diff --git a/.koan/memory/0039-memory-store-content-policy-rag-serves-unknown.md b/.koan/memory/0039-memory-store-content-policy-rag-serves-unknown.md new file mode 100644 index 00000000..ae7bffe2 --- /dev/null +++ b/.koan/memory/0039-memory-store-content-policy-rag-serves-unknown.md @@ -0,0 +1,11 @@ +--- +title: Memory store content policy -- RAG serves "unknown unknowns," implementation + details go near code +type: decision +created: '2026-04-17T04:33:46Z' +modified: '2026-04-17T04:33:46Z' +related: +- 0037-code-comment-vs-memory-entry-filter-comment.md +--- + +The koan memory store's content policy was clarified by the user on 2026-04-17. The RAG system is intended for "unknown unknown" knowledge -- cross-cutting architecture decisions and constraints that do not have a coherent single location in the codebase. When an LLM is extremely likely to open a file anyway, implementation details should be placed as comments in close proximity to the actual implementation; this approach works well for both humans and LLMs. The memory store should not contain knowledge that an agent would encounter through normal file reading. This principle motivated the COMMENT classification added to koan/phases/curation.py on the same date, which filters single-function and single-module rationale out of memory candidates and into code comments. diff --git a/.koan/memory/0040-memory-captures-persistent-always-true.md b/.koan/memory/0040-memory-captures-persistent-always-true.md new file mode 100644 index 00000000..a5e93a08 --- /dev/null +++ b/.koan/memory/0040-memory-captures-persistent-always-true.md @@ -0,0 +1,9 @@ +--- +title: Memory captures persistent "always true" information, not future plans or speculative + principles +type: lesson +created: '2026-04-17T04:33:48Z' +modified: '2026-04-17T04:33:48Z' +--- + +The koan memory system's scope boundary was corrected by the user on 2026-04-17. 
During the curation phase of a workflow run, the curation agent attempted to elicit prompt engineering principles by asking speculative questions about future design patterns. The user identified this as a non-goal: the memory system is intended to contain persistent, "always true" information -- facts that have already been established through project experience. Speculative knowledge, future plans, and principles not yet grounded in concrete decisions or incidents do not belong in memory. Root cause: the curation agent conflated "potentially useful knowledge" with "established project knowledge." The memory store's value comes from capturing what HAS happened, not what MIGHT be useful. diff --git a/.koan/memory/0041-per-phase-summary-capture-rides-on-orchestrators.md b/.koan/memory/0041-per-phase-summary-capture-rides-on-orchestrators.md new file mode 100644 index 00000000..67def665 --- /dev/null +++ b/.koan/memory/0041-per-phase-summary-capture-rides-on-orchestrators.md @@ -0,0 +1,9 @@ +--- +title: Per-phase summary capture rides on orchestrator's last prose turn before first + koan_yield +type: decision +created: '2026-04-17T09:37:08Z' +modified: '2026-04-17T09:37:08Z' +--- + +This entry documents the per-phase summary capture mechanism for koan's mechanical RAG injection pipeline. On 2026-04-17, user decided that the orchestrator's last assistant text immediately preceding the first `koan_yield` of each phase is captured as that phase's summary, written into `Run.phase_summaries[phase]` via the `phase_summary_captured` event. Subsequent yields within the same phase do not overwrite. Rationale: the orchestrator already writes prose summaries informally before yielding at phase boundaries, so the contract piggybacks on existing behavior with zero new tool calls. 
Alternative rejected: a dedicated `koan_phase_summary` MCP tool that would have produced cleaner audit artifacts but would have forced the summary to render BOTH as a tool call and as chat text, duplicating the rendering surface and complicating the conversation entry types. Known limitation: runner buffering may deliver the tool call before the final text deltas have been folded into the projection; the user accepted this risk, and captures shorter than 50 characters are logged as warnings via `_extract_last_orchestrator_text` in `koan/web/mcp_endpoint.py` so that truncated captures can be identified post-mortem. Implementation surfaced during the 2026-04-17 plan workflow that wired RAG injection into phase transitions. diff --git a/.koan/memory/0042-phasesummaries-dict-stored-on-run-projection-wire.md b/.koan/memory/0042-phasesummaries-dict-stored-on-run-projection-wire.md new file mode 100644 index 00000000..500c587d --- /dev/null +++ b/.koan/memory/0042-phasesummaries-dict-stored-on-run-projection-wire.md @@ -0,0 +1,11 @@ +--- +title: phase_summaries dict stored on Run projection, wire-visible +type: decision +created: '2026-04-17T09:37:19Z' +modified: '2026-04-17T09:37:19Z' +related: +- 0007-dual-fold-system-audit-fold-per-subagent-disk-vs-projection-fold-workflow-sse.md +- 0041-per-phase-summary-capture-rides-on-orchestrators.md +--- + +This entry documents the storage location for koan's per-phase summary state used by mechanical RAG injection. On 2026-04-17, user decided that `phase_summaries: dict[str, str]` lives on the `Run` projection model in `koan/projections.py` and is serialized to the SSE wire alongside every other Run field. Frontend ignores the field for now; future UI work may surface it. 
Alternatives rejected: storing on `AppState` only (would lose event-log restorability -- the projection is reconstructable from events but AppState is not), or storing on the projection but excluding from `to_wire()` (would break the invariant that the projection IS what the frontend sees, regressing the symmetric fold design captured in entry 7). User stated the wire-visibility "is not a secret, it's just not necessary right now" -- the field is data-only and exposing it carries no risk. Decision surfaced during intake of the RAG-wiring workflow on 2026-04-17. diff --git a/.koan/memory/0043-mechanical-rag-injection-anchor-task-run-dir.md b/.koan/memory/0043-mechanical-rag-injection-anchor-task-run-dir.md new file mode 100644 index 00000000..9fc90887 --- /dev/null +++ b/.koan/memory/0043-mechanical-rag-injection-anchor-task-run-dir.md @@ -0,0 +1,12 @@ +--- +title: 'Mechanical RAG injection anchor: task + run-dir markdown (mtime asc) + immediate + prior phase summary' +type: decision +created: '2026-04-17T09:37:31Z' +modified: '2026-04-17T09:37:31Z' +related: +- 0020-memory-retrieval-static-directive-mechanical-injection.md +- 0041-per-phase-summary-capture-rides-on-orchestrators.md +--- + +This entry documents the anchor composition rule for koan's mechanical RAG injection pipeline. On 2026-04-17, user decided that `_compose_rag_anchor` in `koan/web/mcp_endpoint.py` produces a single anchor string from three sources concatenated in a fixed order: (1) the workflow task description, (2) every `*.md` file in the run directory sorted by mtime ascending (oldest first), (3) the immediate prior phase's summary read from `Run.phase_summaries[completed_phase]`. The cheap query-generation LLM receives this single anchor plus the per-phase `retrieval_directive` and produces 1-3 search queries combined and reranked against the directive. 
Alternatives rejected: separate RAG queries per source (more LLM calls, harder reranking -- user noted "it's more common to do a single context and guide the cheap LLM into writing useful queries based on all provided context"), and including all prior phase summaries (would dilute anchor topics -- relies on summary-chain compaction: if intake facts still matter in plan-review, plan-spec's summary repeats them). Chronological mtime ordering puts the most recent artifact closest to the prior summary, placing the most directly relevant content where attention is strongest. Decision surfaced during 2026-04-17 intake when user clarified the anchor should be "first task description, then all artifacts in chronological order, then summary of previous phase". diff --git a/.koan/memory/0044-mechanical-rag-injection-scope-orchestrator.md b/.koan/memory/0044-mechanical-rag-injection-scope-orchestrator.md new file mode 100644 index 00000000..3a743b83 --- /dev/null +++ b/.koan/memory/0044-mechanical-rag-injection-scope-orchestrator.md @@ -0,0 +1,10 @@ +--- +title: 'Mechanical RAG injection scope: orchestrator phases only; curation excluded' +type: decision +created: '2026-04-17T09:37:43Z' +modified: '2026-04-17T09:37:43Z' +related: +- 0020-memory-retrieval-static-directive-mechanical-injection.md +--- + +This entry documents the agent-type scope for koan's mechanical RAG memory injection. On 2026-04-17, user decided that mechanical injection runs ONLY for orchestrator phases that declare a non-empty `retrieval_directive` on their `PhaseBinding` in `koan/lib/workflows.py`. Scouts and executors are excluded from injection. The curation phase's binding sets `retrieval_directive=""` explicitly, disabling injection. 
Rationale: scouts receive a narrow single-shot prompt where memory entries would be noise; executors have richer artifacts to read and benefit less from cross-cutting memory; curation already calls `koan_memory_status` which surfaces the full project summary and entry listing, making mechanical injection redundant for it. Alternatives rejected: include executors with a directive keyed to artifact subsystems (deferred to a future workflow because executors don't yet have a clear directive vocabulary), and emit a non-empty curation directive (rejected because `koan_memory_status` already covers the duplicate-detection use case). Scope surfaced during 2026-04-17 intake when user explicitly answered "orchestrator_only" to the agent-scope question. diff --git a/.koan/memory/0045-end-of-phase-summary-must-be-a-dense-paragraph-it.md b/.koan/memory/0045-end-of-phase-summary-must-be-a-dense-paragraph-it.md new file mode 100644 index 00000000..4b1447e6 --- /dev/null +++ b/.koan/memory/0045-end-of-phase-summary-must-be-a-dense-paragraph-it.md @@ -0,0 +1,12 @@ +--- +title: End-of-phase summary must be a dense paragraph -- it becomes the next phase's + RAG anchor +type: procedure +created: '2026-04-17T09:37:55Z' +modified: '2026-04-17T09:37:55Z' +related: +- 0041-per-phase-summary-capture-rides-on-orchestrators.md +- 0043-mechanical-rag-injection-anchor-task-run-dir.md +--- + +This entry records a behavioral rule for koan orchestrator agents at phase boundaries. On 2026-04-17, the team established the procedure: when the orchestrator is about to call its first `koan_yield` of a phase boundary (the `Phase Complete` boundary that follows the final `koan_complete_step` of a phase), the assistant text immediately preceding that yield must be a standalone, dense, information-rich paragraph that names the decisions made, constraints discovered, artifacts produced, and any unresolved items of the just-finished phase. 
The rule exists because that text is automatically captured into `Run.phase_summaries[phase]` and fed as the prior-phase summary anchor for the next phase's mechanical RAG injection (see `_extract_last_orchestrator_text` in `koan/web/mcp_endpoint.py`). A terse "done" or single-sentence acknowledgement degrades the next phase's RAG retrieval quality and degrades the user-facing brief. Procedure derived during the 2026-04-17 RAG-wiring workflow. Mechanical reinforcement: an `IMPORTANT:` paragraph in the orchestrator system prompt (`koan/prompts/orchestrator.py`) instructs the orchestrator about the contract; future drift is observable via warning logs when `len(summary_text) < 50` chars or when no text is captured at all. diff --git a/.koan/memory/0046-mechanical-rag-injection-is-fail-soft-log-warning.md b/.koan/memory/0046-mechanical-rag-injection-is-fail-soft-log-warning.md new file mode 100644 index 00000000..fdca3478 --- /dev/null +++ b/.koan/memory/0046-mechanical-rag-injection-is-fail-soft-log-warning.md @@ -0,0 +1,10 @@ +--- +title: 'Mechanical RAG injection is fail-soft: log warning, never block phase handshake' +type: procedure +created: '2026-04-17T09:38:05Z' +modified: '2026-04-17T09:38:05Z' +related: +- 0041-per-phase-summary-capture-rides-on-orchestrators.md +--- + +This entry records a behavioral rule for koan's mechanical memory injection pipeline at the orchestrator's phase handshake. On 2026-04-17, the team established the procedure: when `_compute_memory_injection` in `koan/web/mcp_endpoint.py` raises any exception (missing `VOYAGE_API_KEY`, empty `.koan/memory/`, LanceDB I/O error, embedding API failure, etc.), the helper catches the exception, logs it at `warning` level via `log.warning("mechanical memory injection failed for phase %r ...", exc_info=True)`, and returns an empty injection block. The phase handshake proceeds without the `## Relevant memory` section. 
The rule exists because retrieval quality is best-effort and never load-bearing -- the orchestrator can complete its phase from the directive + task + artifacts alone. A blocking handshake on retrieval failure would couple workflow correctness to optional infrastructure. Same posture is applied in `koan_yield`: short or missing summary captures emit warnings but never block the yield. Procedure surfaced during 2026-04-17 plan-spec when the user accepted the fail-soft design decision and the executor wired warning log lines per the plan. diff --git a/.koan/memory/0047-docsmemory-systemmd-described-an-unimplemented.md b/.koan/memory/0047-docsmemory-systemmd-described-an-unimplemented.md new file mode 100644 index 00000000..818f68ab --- /dev/null +++ b/.koan/memory/0047-docsmemory-systemmd-described-an-unimplemented.md @@ -0,0 +1,11 @@ +--- +title: docs/memory-system.md described an unimplemented summary.md load step in the + injection pipeline +type: lesson +created: '2026-04-17T09:38:19Z' +modified: '2026-04-17T09:38:19Z' +related: +- 0032-plan-review-produced-unverified-critical-finding.md +--- + +This entry records a documentation-versus-code drift in koan's memory system. On 2026-04-17, plan-review discovered that `docs/memory-system.md` described a 5-step mechanical injection pipeline whose first step was "Load project summary -- summary.md is loaded in full ... runs only at intake" -- a step that was never wired into the orchestrator's phase handshake. The `_step_phase_handshake` code path in `koan/web/mcp_endpoint.py` had no summary.md load at any point; the injection helper composed the anchor from task + artifacts + prior phase summary alone. Root cause: the design spec was authored aspirationally during the memory system design phase and never reconciled when the partial wiring landed. Plan-review caught the drift only because the reviewer cross-checked the doc claim against the actual code path, which is not a routine review move. 
Correction applied during the 2026-04-17 RAG-wiring workflow: the doc was rewritten to describe a 4-step pipeline (drop the summary.md load) and an "Implementation mapping" subsection was appended pinning the doc to specific file/function names. Adding the summary.md load is left to a future workflow if the user wants it. diff --git a/.koan/memory/0048-inspect-ai-uk-aisi-selected-as-koan-eval.md b/.koan/memory/0048-inspect-ai-uk-aisi-selected-as-koan-eval.md new file mode 100644 index 00000000..31342baa --- /dev/null +++ b/.koan/memory/0048-inspect-ai-uk-aisi-selected-as-koan-eval.md @@ -0,0 +1,8 @@ +--- +title: Inspect AI (UK AISI) selected as koan eval framework over deepeval +type: decision +created: '2026-04-17T12:06:09Z' +modified: '2026-04-17T12:06:09Z' +--- + +The koan eval framework selection covers the choice of evaluation tooling for the test suite overhaul. On 2026-04-17, Leon evaluated Inspect AI (UK AISI) and deepeval as candidate frameworks and selected Inspect AI. Leon's stated rationale: Inspect AI supports black-box subprocess testing as a first-class concept and provides four clean primitives -- Dataset (frozen fixture collection), Task (spec combining Dataset + Solver + Scorers), Solver (a function that transforms TaskState; koan runs as a subprocess here), and Scorer (grades output; LLM-as-judge supported natively via `model_graded_qa`). deepeval was rejected: it lacks the black-box subprocess model that koan's eval approach requires. The eval framework lives under `evals/` in the koan repository root. The four primitives map to koan as follows: Dataset = frozen koan project snapshots, Solver = subprocess runner against a frozen snapshot, Scorer = LLM-as-judge grading plan artifacts. 
diff --git a/.koan/memory/0049-eval-solver-answers-all-koan-interactive-gates.md b/.koan/memory/0049-eval-solver-answers-all-koan-interactive-gates.md new file mode 100644 index 00000000..3f942d16 --- /dev/null +++ b/.koan/memory/0049-eval-solver-answers-all-koan-interactive-gates.md @@ -0,0 +1,9 @@ +--- +title: Eval Solver answers all koan interactive gates with a fixed message, not a + surrogate LLM +type: decision +created: '2026-04-17T12:06:18Z' +modified: '2026-04-17T12:06:18Z' +--- + +The koan eval Solver's approach to interactive phase handling was established on 2026-04-17 during the test suite overhaul planning session. During a live koan workflow run, the orchestrator calls `koan_yield` (which blocks until a user message arrives via `POST /api/chat`) and `koan_ask_question` (which blocks until answers arrive via `POST /api/answer`). In the eval context these gates would block indefinitely. Leon decided that the Solver in `evals/solver.py` answers every interactive gate with a fixed message: "Please use your best judgment and pick whichever option you think is best." The orchestrator self-selects from available options. Leon rejected the alternative of a surrogate-user LLM -- a second LLM impersonating the user and answering questions on the fly -- because it would add LLM API cost and non-determinism to the eval without proportional signal gain at this early stage of the framework. 
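The fixed-message gate handling described in entry 0049 can be sketched roughly as follows. This is a minimal illustration, not the contents of `evals/solver.py`: the endpoint paths and the reply text come from the entry, while `build_gate_reply`, `GATE_ENDPOINTS`, and the `{"message": ...}` payload shape are hypothetical names and assumptions introduced here. The request is only constructed, never sent.

```python
import json
from urllib.request import Request

# Reply text quoted in the memory entry above.
FIXED_REPLY = "Please use your best judgment and pick whichever option you think is best."

# Gate name -> endpoint path, as named in the entry.
GATE_ENDPOINTS = {
    "koan_yield": "/api/chat",
    "koan_ask_question": "/api/answer",
}


def build_gate_reply(gate: str, base_url: str) -> Request:
    """Build (but do not send) the POST request that would unblock one gate."""
    path = GATE_ENDPOINTS[gate]  # an unknown gate raises KeyError on purpose
    # Assumed payload shape: the real request schema is not shown in the entry.
    body = json.dumps({"message": FIXED_REPLY}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because every gate receives the same text, the Solver side stays deterministic; any variation in the run comes from the orchestrator's own choices.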
diff --git a/.koan/memory/0050-eval-benchmark-fixtures-are-manual-git-snapshots.md b/.koan/memory/0050-eval-benchmark-fixtures-are-manual-git-snapshots.md new file mode 100644 index 00000000..39faeee9 --- /dev/null +++ b/.koan/memory/0050-eval-benchmark-fixtures-are-manual-git-snapshots.md @@ -0,0 +1,8 @@ +--- +title: Eval benchmark fixtures are manual git snapshots of koan at specific commits +type: decision +created: '2026-04-17T12:06:26Z' +modified: '2026-04-17T12:06:26Z' +--- + +The koan eval benchmark fixture format was established on 2026-04-17 during the test suite overhaul planning session. Leon decided that the reference benchmark corpus would be the koan project itself, captured manually at specific git commits. Each fixture directory under `evals/fixtures/` contains three artifacts: `task.md` (the task description as UTF-8 plain text), `snapshot.tar.gz` (a `git archive HEAD --format=tar.gz` of the target project at a specific commit), and `memory/` (a copy of `.koan/memory/` at that commit). Leon's rationale: using koan itself as the reference corpus captures real-world complexity; re-capture is simple (take a new snapshot at a new commit). Leon rejected two alternatives: fully synthetic task descriptions against a fictional codebase (risk: synthetic inputs may not expose real failure modes) and live session capture from actual koan runs (concern: fragile and labor-intensive to re-capture). 
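The three-artifact fixture layout in entry 0050 lends itself to a small sanity check. The sketch below assumes only the artifact names given in the entry (`task.md`, `snapshot.tar.gz`, `memory/`); the helper name `missing_fixture_artifacts` is hypothetical and is not claimed to exist in the repository.

```python
from pathlib import Path

# Artifact names taken from the fixture description above.
REQUIRED_FILES = ("task.md", "snapshot.tar.gz")
REQUIRED_DIRS = ("memory",)


def missing_fixture_artifacts(fixture_dir: Path) -> list[str]:
    """Return the names of required artifacts absent from one fixture directory."""
    missing = [name for name in REQUIRED_FILES if not (fixture_dir / name).is_file()]
    # Directories are reported with a trailing slash to distinguish them from files.
    missing += [name + "/" for name in REQUIRED_DIRS if not (fixture_dir / name).is_dir()]
    return missing
```

An empty list means the fixture has all three artifacts; the check does not inspect the archive contents or the commit it was taken from.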
diff --git a/.koan/memory/0051-unit-tests-asserting-llm-prompt-content-deleted.md b/.koan/memory/0051-unit-tests-asserting-llm-prompt-content-deleted.md new file mode 100644 index 00000000..7c7b5988 --- /dev/null +++ b/.koan/memory/0051-unit-tests-asserting-llm-prompt-content-deleted.md @@ -0,0 +1,8 @@ +--- +title: Unit tests asserting LLM prompt content deleted; pure-logic tests retained +type: decision +created: '2026-04-17T12:14:23Z' +modified: '2026-04-17T12:14:23Z' +--- + +The koan test suite retention policy was established on 2026-04-17 during the test suite overhaul planning session. Leon stated that tests asserting on LLM prompt content (hardcoded strings in step guidance text, workflow dataclass structure, phase shape) are low-value because they break whenever prompt engineering changes and provide no signal about actual LLM behavior. Leon decided to delete the following test files entirely: `tests/test_phases.py` (286 lines of step-progression and prompt-text content tests), `tests/test_workflows.py` (288 lines of workflow dataclass structure tests), `tests/phases/test_curation.py` (phase shape and SYSTEM_PROMPT content checks), and `tests/test_driver.py` (17-line import smoke test). Leon decided to retain tests that cover deterministic pure-logic algorithms: `tests/test_permissions.py` (permission gate logic), `tests/test_projections.py` (projection fold), `tests/test_audit_fold.py` (audit fold), `tests/test_runners.py` (stream event parsing), `tests/test_probe.py` (runner probing), `tests/test_mcp_check_or_raise.py` (permission check), `tests/test_interactions.py` (interaction queue FIFO logic), `tests/test_subagent.py`, and all twelve files under `tests/memory/`. 
diff --git a/.koan/memory/0052-eval-dataset-uses-full-run-fixtures-first-per.md b/.koan/memory/0052-eval-dataset-uses-full-run-fixtures-first-per.md new file mode 100644 index 00000000..32f9cc2e --- /dev/null +++ b/.koan/memory/0052-eval-dataset-uses-full-run-fixtures-first-per.md @@ -0,0 +1,8 @@ +--- +title: Eval dataset uses full-run fixtures first; per-phase checkpoint freeze deferred +type: decision +created: '2026-04-17T12:14:31Z' +modified: '2026-04-17T12:14:31Z' +--- + +The koan eval dataset granularity decision was made on 2026-04-17 during the test suite overhaul planning session. Leon decided that the first iteration of the `evals/` framework would use full-run fixtures only: each `Sample` in the Inspect AI Dataset corresponds to one complete koan workflow run from task description to final artifact set. Leon explicitly deferred per-phase and per-step fixture checkpointing. Leon's stated reason: mid-run resume requires the `--resume` flag on the orchestrator CLI, which Leon described as fragile and not ready to instrument. The design direction (per-phase and per-step freeze points) was documented in the plan at `plan.md` for a future iteration but excluded from the initial implementation scope. diff --git a/.koan/memory/0053-new-read-only-memory-tools-must-be-added-to.md b/.koan/memory/0053-new-read-only-memory-tools-must-be-added-to.md new file mode 100644 index 00000000..9ed6020c --- /dev/null +++ b/.koan/memory/0053-new-read-only-memory-tools-must-be-added-to.md @@ -0,0 +1,8 @@ +--- +title: New read-only memory tools must be added to _UNIVERSAL_MEMORY_TOOLS in koan/lib/permissions.py +type: procedure +created: '2026-04-18T14:36:10Z' +modified: '2026-04-18T14:36:10Z' +--- + +The permission gate in `koan/lib/permissions.py` provides a universal fast-path for read-only memory query tools via the `_UNIVERSAL_MEMORY_TOOLS` frozenset. 
On 2026-04-18, Leon identified that `koan_memory_status` and `koan_search` had been accidentally scoped to the orchestrator role only -- they appeared in `_ORCHESTRATOR_MEMORY_TOOLS` but were absent from the non-orchestrator `ROLE_PERMISSIONS` dicts (`scout`, `executor`, `intake`, `planner`), causing scouts and executors to be silently blocked from querying memory. Leon directed the fix: add both tools to a new `_UNIVERSAL_MEMORY_TOOLS` frozenset placed between the `_NON_BASH_READ_TOOLS` fast-path and the orchestrator branch in `check_permission()`. The resulting behavioral rule: any new read-only memory tool added to the koan MCP endpoint must also be registered in `_UNIVERSAL_MEMORY_TOOLS` to be available for all agent roles. Placing a new memory read tool only in `_ORCHESTRATOR_MEMORY_TOOLS` will silently restrict it to the orchestrator with no error. diff --git a/.koan/memory/0054-intake-summarize-step-step-3-extracted-to-provide.md b/.koan/memory/0054-intake-summarize-step-step-3-extracted-to-provide.md new file mode 100644 index 00000000..e5cfb616 --- /dev/null +++ b/.koan/memory/0054-intake-summarize-step-step-3-extracted-to-provide.md @@ -0,0 +1,16 @@ +--- +title: Intake Summarize step (step 3) extracted to provide a clean RAG-injection anchor + at phase boundary +type: decision +created: '2026-04-18T16:28:03Z' +modified: '2026-04-18T16:28:03Z' +related: +- 0011-intake-confidence-loop-removed-unnecessary-scout.md +- 0013-single-cognitive-goal-per-step-prevents-simulated.md +- 0041-per-phase-summary-capture-rides-on-orchestrators.md +- 0045-end-of-phase-summary-must-be-a-dense-paragraph-it.md +--- + +The intake phase in koan (koan/phases/intake.py) has a dedicated step 3 (Summarize, TOTAL_STEPS = 3) that was extracted from the end of the Deepen step on 2026-04-17. 
On 2026-04-18, Leon confirmed the primary rationale: the RAG injection pipeline (entries 0041, 0045) captures the orchestrator's last prose turn before the first koan_yield of each phase as the phase summary. When the synthesis was embedded at the end of step 2 (Deepen), any koan_complete_step call for remaining Deepen work would follow the synthesis text, potentially displacing it as the final text before yield and leaving the RAG capture with noisy or incomplete content. + +The dedicated Summarize step forces synthesis to happen as its own distinct act immediately before the phase boundary, so the prose written between the phase-complete koan_complete_step response and the first koan_yield is an unambiguous summary -- the form the RAG pipeline expects. Secondary rationale: the single-cognitive-goal-per-step principle (entry 0013) -- Deepen stays focused on dialogue and verification; Summarize is a distinct cognitive act. Alternative rejected: embedding the summary at the end of step 2 and relying on step discipline alone, because the RAG capture mechanism has no way to enforce which portion of step 2's output is the synthesis. diff --git a/.koan/memory/0055-opus-47-requires-thinking-display-summarized-to.md b/.koan/memory/0055-opus-47-requires-thinking-display-summarized-to.md new file mode 100644 index 00000000..742d7d25 --- /dev/null +++ b/.koan/memory/0055-opus-47-requires-thinking-display-summarized-to.md @@ -0,0 +1,9 @@ +--- +title: Opus 4.7+ requires --thinking-display summarized to restore thinking tokens + in Claude CLI stream-json +type: context +created: '2026-04-19T06:21:20Z' +modified: '2026-04-19T06:21:20Z' +--- + +This entry records the addition of `--thinking-display summarized` to Claude launch arguments in koan. 
On 2026-04-19, Leon reported that Anthropic's Opus 4.7 release changed Claude Code CLI behavior: the default thinking-display mode changed to "off", omitting thinking tokens from the stream-json output entirely unless the consumer opts in via the undocumented `--thinking-display` flag. In response, Leon directed that the Claude runner in `koan/runners/claude.py` append `--thinking-display summarized` to the command whenever the selected model alias contained "opus" (case-insensitive substring match, inserted between `--model X` and the `installation.extra_args` spread). The `summarized` value produced condensed reasoning summaries suitable for the projection store's thinking-block rendering path, while the Opus 4.7+ default produced no thinking content at all. Leon chose substring matching over exact or prefix matching on the rationale that users were assumed to run the latest Opus model; earlier Opus releases (4.6 and prior) accepted the flag as a no-op, so the substring match was safe across versions. The flag was not present in Claude Code's published CLI reference at that date; Leon discovered it empirically after the 4.7 upgrade broke thinking-block visibility in the koan frontend. diff --git a/.koan/memory/0056-permission-mode-acceptedits-auto-approves-a-fixed.md b/.koan/memory/0056-permission-mode-acceptedits-auto-approves-a-fixed.md new file mode 100644 index 00000000..75c1114d --- /dev/null +++ b/.koan/memory/0056-permission-mode-acceptedits-auto-approves-a-fixed.md @@ -0,0 +1,9 @@ +--- +title: --permission-mode acceptEdits auto-approves a fixed Bash subset inside --add-dir + scope +type: context +created: '2026-04-19T06:21:23Z' +modified: '2026-04-19T06:21:23Z' +--- + +This entry documents permission-mode behavior for Claude subagents in koan. 
On 2026-04-19, during the plan-workflow Claude-CLI-flags change, Leon cited Anthropic's documentation establishing that `--permission-mode acceptEdits` auto-approves two categories of Claude Code tool calls: (1) all Write and Edit tool calls, and (2) a fixed set of filesystem Bash commands -- `mkdir`, `touch`, `rm`, `rmdir`, `mv`, `cp`, `sed`, optionally prefixed with safe environment variables (`LANG=C`, `NO_COLOR=1`) or wrapped in process wrappers (`timeout`, `nice`, `nohup`). Auto-approval applied only to paths inside the CLI's working directory and any directories added via `--add-dir`. Koan's Claude subagents ran in headless mode (`-p` + `--output-format stream-json`), which could not respond to interactive permission prompts, so Bash commands outside this safe subset would hang indefinitely when invoked. Leon accepted this tradeoff for the change, and the koan design chose `acceptEdits` as the unconditional permission mode for every Claude subagent (orchestrator, executor, scout). Leon also stated an intent to repurpose the `--yolo` flag into a separate non-interactive mode at a later date, at which point workflows requiring broader Bash execution could bypass the safe-subset restriction. diff --git a/.koan/memory/0057-claude-permissiondirectory-flags-composed-at.md b/.koan/memory/0057-claude-permissiondirectory-flags-composed-at.md new file mode 100644 index 00000000..797928e2 --- /dev/null +++ b/.koan/memory/0057-claude-permissiondirectory-flags-composed-at.md @@ -0,0 +1,9 @@ +--- +title: Claude permission/directory flags composed at spawn time in subagent.py, not + in build_command or extra_args +type: decision +created: '2026-04-19T06:21:28Z' +modified: '2026-04-19T06:21:28Z' +--- + +This entry records the argv-composition split for Claude subagents in koan. 
On 2026-04-19, during the plan-workflow Claude-CLI-flags change, Leon decided that Claude-specific flags split across two sites: `ClaudeRunner.build_command` in `koan/runners/claude.py` produced a stable command skeleton (model-only-dependent flags like `--thinking-display summarized` lived here, gated by `"opus" in model.lower()`), and `spawn_subagent` in `koan/subagent.py` appended Claude-specific argv extensions afterward via the new `_claude_post_build_args(role, run_dir, project_dir)` helper (run-context-dependent flags: `--add-dir `, `--add-dir `, `--permission-mode acceptEdits`, alongside the previously-existing `--tools`, `--disable-slash-commands`, `--strict-mcp-config`). Two alternatives were explicitly rejected during the intake phase that preceded the decision. First, extending the `Runner.build_command` protocol with `run_dir` and `project_dir` parameters: rejected because it forced signature churn on `CodexRunner` and `GeminiRunner` for flags they did not need, violating the "codex and gemini untouched" scope constraint. Second, writing the flags into `AgentInstallation.extra_args` as done for the old `--dangerously-skip-permissions`: rejected because `run_dir` is per-run and `AgentInstallation.extra_args` was serialized to `KoanConfig` on disk, so run-specific paths could not persist there. The spawn-time helper read `task["run_dir"]` and `task["project_dir"]`, both already populated by every spawn site (`koan/driver.py` for the orchestrator, scout and executor spawners in `koan/web/mcp_endpoint.py`). Leon also removed the `"claude"` entry from `_YOLO_ARGS` in `koan/web/app.py` in the same change, making the `--yolo` flag a no-op for Claude while leaving codex and gemini yolo entries unchanged. 
diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..001d0380 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,176 @@ +# Koan Architecture Invariants + +Full architecture documentation: **[docs/architecture.md](docs/architecture.md)** + +## Frontend Design System (read before any frontend work) + +The frontend uses a strict token-driven component system. Visual identity +is user-controlled — agents implement it but do not change it without +approval. Violations compound: a misplaced color becomes a wrong token +becomes an inconsistent component becomes a broken design language. + +**When touching any file under `frontend/`**, read +**[frontend/AGENTS.md](frontend/AGENTS.md)** first. It defines protected +files, the component hierarchy (atoms → molecules → organisms), and CSS +conventions. + +**When building or modifying a UI component**, also read +**[frontend/src/components/AGENTS.md](frontend/src/components/AGENTS.md)**. +It contains the development rules, the tier decision tree, and the +verification checklist. 
+ +--- + +Spoke documents: + +- [docs/subagents.md](docs/subagents.md) -- spawn lifecycle, task manifest, step-first workflow, permissions +- [docs/ipc.md](docs/ipc.md) -- HTTP MCP tool calls, blocking interactions, scout spawning, koan_yield blocking +- [docs/state.md](docs/state.md) -- driver/LLM boundary, run state, orchestrator state +- [docs/intake-loop.md](docs/intake-loop.md) -- two-step intake design, prompt engineering +- [docs/phase-trust.md](docs/phase-trust.md) -- phase trust model, verification boundaries, adversarial review +- [docs/projections.md](docs/projections.md) -- versioned event log, fold function, projection shape, SSE protocol, version-negotiated catch-up +- [docs/token-streaming.md](docs/token-streaming.md) -- runner stdout parsing, SSE delta path + +**Workflow types:** `plan` (intake → plan-spec → plan-review → execute) · `milestones` (stub: intake only) + +--- + +The six core invariants (see architecture.md for full detail + pitfalls): + +## 1. File Boundary + +LLMs write **markdown files only**. The driver maintains **JSON state files** +internally -- no LLM ever reads or writes a `.json` file. Tool code bridges +both worlds. + +## 2. Step-First Workflow Pattern (critical) + +The orchestrator is a single long-lived CLI process (`claude`, `codex`, or +`gemini`) that runs the entire workflow. It connects to the driver's HTTP MCP +endpoint at `http://localhost:{port}/mcp?agent_id={id}` and receives tools via +MCP. The driver handles all tool logic in-process. + +**The first thing the orchestrator does is call `koan_complete_step`.** The +spawn prompt contains _only_ this directive. The tool returns step 1 +instructions. This establishes the calling pattern before the LLM sees complex +instructions. + +``` +Boot prompt: "You are a koan orchestrator agent. Call koan_complete_step to receive your instructions." 
+ | LLM calls koan_complete_step (step 0 -> 1 transition) +Tool returns: Step 1 instructions (phase role context + task details) + | LLM does work... + | LLM calls koan_complete_step +Tool returns: Step 2 instructions (or phase-boundary response) +``` + +When a phase ends, `koan_complete_step` returns a **non-blocking** response +telling the orchestrator to summarize and call `koan_yield`. `koan_yield` is +the generic conversation primitive — it blocks until the user sends a message, +then returns that message as the tool result. The orchestrator calls `koan_yield` +repeatedly for multi-turn conversation, then calls `koan_set_phase` to commit +the transition. Passing `koan_set_phase("done")` ends the workflow (tombstone). +The step counter resets to 0 on each `koan_set_phase` call, then advances to 1 +on the next `koan_complete_step`. Phase-specific role context (`SYSTEM_PROMPT`) +is injected into that step-1 response. + +Step progression is normally linear within a phase, but phase modules may +override `get_next_step()` to implement non-linear flows. See +[docs/intake-loop.md](docs/intake-loop.md). + +Executor subagents are spawned by the orchestrator via `koan_request_executor`. +Scout subagents are spawned via `koan_request_scouts`. + +## 3. Driver Determinism (partially relaxed) + +The driver (`koan/driver.py`) spawns the orchestrator and awaits its exit. +Phase routing is driven by the orchestrator via `koan_set_phase` rather than +the driver's routing loop. + +The driver still: + +- Validates every phase transition (`is_valid_transition()` in the tool handler) +- Updates `run-state.json` atomically +- Emits projection events +- Enforces the permission fence + +The driver does **not** decide which phase runs next. Invalid phase strings +raise `ToolError`; valid transitions are committed. All routing decisions flow +through typed tool parameters, not free text. 
+ +`is_valid_transition(workflow, from_phase, to_phase)` checks that `to_phase` is +in the active workflow's `available_phases` and is not equal to `from_phase`. +Any phase in the workflow is reachable from any other — there is no DAG of +required successors. + +## 4. Default-Deny Permissions + +Two enforcement layers restrict what tools each agent can use: + +1. **CLI tool whitelist** (`CLAUDE_TOOL_WHITELISTS` in `subagent.py`) -- + controls which built-in tools exist in the model's context. Unlisted tools + are not presented to the model; it cannot call them. Agents should not have + access to tools they are never intended to need. +2. **MCP permission fence** (`check_permission()` in `permissions.py`) -- + gates koan MCP tool calls per role and phase. Unknown roles and tools are + blocked. + +The fence also supports step-level gating: `write` and `edit` are blocked +during brief-generation step 1 (the read step). + +**CLI tool whitelists (per agent type):** + +| Role | Built-in tools | +| ------------ | ---------------------------------------------------------------------------------------------------------------------------- | +| orchestrator | `Read`, `Write`, `Edit`, `Bash`, `Glob`, `Grep`, `WebFetch`, `WebSearch` | +| executor | `Read`, `Write`, `Edit`, `Bash`, `Glob`, `Grep`, `TaskCreate`, `TaskUpdate`, `TaskList`, `TaskGet`, `TaskStop`, `TaskOutput` | +| scout | `Read`, `Bash`, `Glob`, `Grep` | + +**MCP permission fence -- orchestrator tool availability by phase:** + +| Tool | Available phases | +| --------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------- | +| `koan_complete_step` | All phases | +| `koan_set_phase` | All phases (blocked mid-story during execution); accepts `"done"` as tombstone | +| `koan_yield` | All phases | +| `koan_ask_question` | All phases | +| `koan_request_scouts` | `intake`, `core-flows`, 
`tech-plan`, `ticket-breakdown`, `cross-artifact-validation`, `plan-spec`, `plan-review` | +| `koan_request_executor` | `execution`, `execute` | +| `koan_select_story`, `koan_complete_story`, `koan_retry_story`, `koan_skip_story` | `execution` only | +| `write`, `edit` (run_dir scoped) | All phases except `brief-generation` step 1 | +| `bash` | `execution`, `implementation-validation` | +| `koan_memorize` | All phases | +| `koan_forget` | All phases | +| `koan_memory_status` | All phases | +| `koan_search` | All phases | + +## 5. Need-to-Know Prompts + +Boot prompt is one sentence. System prompt is minimal (orchestrator identity +only). Phase-specific role context arrives via step 1 guidance after +`koan_set_phase` is called -- the orchestrator doesn't know its next role until +`koan_complete_step` tells it. + +Each workflow provides a `phase_guidance` injection for the phases it defines. +This injection appears at the top of step 1 guidance and sets workflow-specific +posture (investigation depth, question aggressiveness, what to hand off to the +executor). See [docs/architecture.md](docs/architecture.md) for the injection contract. + +## 6. Directory-as-Contract + +The orchestrator has one subagent directory for the entire run. Executor and +scout subagents each get their own directory per the standard contract: + +| File | Writer | Reader | Purpose | +| -------------- | ------------------------- | ------------------------------ | ------------------ | +| `task.json` | Parent (before spawn) | Parent (at agent registration) | What to do | +| `state.json` | Parent (audit projection) | Available for debugging | What has been done | +| `events.jsonl` | Parent (audit log) | Available for replay | Full event history | + +The `mcp_url` field in `task.json` tells the child where to connect for tool +calls. No structured configuration flows through CLI flags. The spawn command +carries the directory path and the MCP config pointing at the driver's HTTP +endpoint. 
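As a rough illustration of the contract, the parent could prepare a child directory as follows. The `write_task_manifest` helper and the `role` field are assumptions made for this sketch; only `task.json` and its `mcp_url` field are named above.

```python
import json
from pathlib import Path


def write_task_manifest(agent_dir: Path, agent_id: str, port: int, role: str) -> Path:
    """Write task.json before spawning the child (hypothetical helper)."""
    agent_dir.mkdir(parents=True, exist_ok=True)
    manifest = {
        "role": role,  # assumed field, not taken from the real schema
        "mcp_url": f"http://localhost:{port}/mcp?agent_id={agent_id}",
    }
    path = agent_dir / "task.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```

Everything structured travels in the file, so the spawn command only needs the directory path.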
+ +The `task.json` for every subagent includes `run_dir` — the path to the current +workflow run directory (`~/.koan/runs//`). diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..43c994c2 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +@AGENTS.md diff --git a/README.md b/README.md index 10e3e1dc..eb4fde44 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,81 @@ -# Koan Pi Package +# Koan -This repository is structured as a [pi](https://github.com/badlogic/pi-mono/tree/main/packages/coding-agent) package. +Koan runs opinionated multi-turn workflows for LLM-assisted engineering. A local Python process hosts a web dashboard and an MCP endpoint; subagents are invocations of the vendor CLIs (`claude`, `codex`, or `gemini`) that connect back over HTTP MCP and advance through a fixed sequence of phases under user direction. Decisions are written to markdown artifacts during the run, and a per-project memory is carried across runs. + +Koan invokes the vendor CLIs as subprocesses and uses whatever authentication they already have on your machine. It does not call provider APIs directly, does not touch OAuth credentials, and does not proxy traffic, so it runs under your existing CLI subscription and within each CLI's terms of service. + +Koan is alpha. Interfaces, phase names, state files, and the memory schema will change without migration. + +## The Problem + +Long-lived LLM-assisted projects accumulate what I call knowledge debt. The developer no longer knows what is in the code, and the LLM never knew to begin with. Utilities get reimplemented, conventions diverge, and architecture drifts. + +LLMs are good at retrieval, synthesis, and presentation. They are bad at reasoning under uncertainty and at noticing drift. Larger context windows do not help -- attention is finite and uneven regardless of window size. 
What helps is giving the model a narrower slice of the right information at each step, and writing the decisions down so the next run does not re-derive them. + +## What It Does + +### Workflows + +A workflow is a fixed sequence of phases with narrow roles. The plan workflow runs intake, plan-spec, plan-review, execute. The orchestrator is a long-lived LLM process that advances by calling typed tools; it cannot skip phases or improvise structure. Each phase ends with a summary and a pause for user direction. + +### Decision capture + +Agents write markdown artifacts during the run: landscape.md, plan.md, scout findings, review notes. These are the durable record of what was considered and why. + +### Memory + +A per-project memory lives under `.koan/memory/`. Each entry is a short markdown file with YAML frontmatter, typed as decision, context, lesson, or procedure. Entries are proposed by the orchestrator at the end of a workflow and approved by the user before being committed. + +Three read modes are exposed to agents: + +- `status`: a broad project summary, injected at workflow start +- `query`: hybrid semantic + keyword retrieval +- `reflect`: the agent poses a question and receives a synthesized briefing drawn from multiple entries + +Phase entry also retrieves a contextual slice of memory, scoped to the role and phase. + +### Machine-readable code + +Function docstrings are written for LLM consumption. They include usage examples and explicit "use when..." triggers, so an agent reading the docstring can decide whether the function applies without reading the body. + +### Docs in code + +Architecture decisions and invariants live next to the code they constrain. Workflows read and update these during execution rather than deferring to a cleanup pass. + +### Conventions + +Project conventions are declared, not inferred. Agents check against them during review. 
+ +## Quick Start + +```bash +uv sync +uv run koan +``` + +Open the dashboard, select a workflow, and describe the task. + +## How a Run Looks + +The plan workflow phases: + +- `intake`: explore the codebase, ask clarifying questions, produce landscape.md +- `plan-spec`: produce plan.md +- `plan-review`: adversarial review of plan.md against landscape.md +- `execute`: spawn an executor with plan.md + +Memory is consulted at intake and at each phase boundary. At the end of the run, curation proposes memory additions for user approval. + +## Design Notes + +A single Python process (`koan/driver.py`) hosts the dashboard and the MCP endpoint. Subagents are CLI processes speaking HTTP MCP. The driver validates phase transitions, enforces a default-deny permission fence, and maintains run state. It does not parse LLM output. Agents write markdown; the driver writes JSON. + +Three roles: + +- `orchestrator`: runs the workflow and delegates +- `scout`: parallel, read-only investigator +- `executor`: implements from an approved plan + +## Status + +Alpha. I use it daily across several projects. Interfaces will change. diff --git a/docs/AGENTS.md b/docs/AGENTS.md new file mode 100644 index 00000000..bc5dd232 --- /dev/null +++ b/docs/AGENTS.md @@ -0,0 +1,100 @@ +# docs/ Conventions + +Conventions for agents writing or editing files in this directory. + +--- + +## No temporal contamination + +Documentation describes the current state of the system as if it has always +been this way. It is not a changelog. + +**Forbidden patterns:** + +| Pattern | Example violation | Fix | +|---|---|---| +| "replaces X" (historical) | "This replaces the old polling design" | Describe what the system does | +| "previously" | "Previously, events were cached in a dict" | Delete — describe current state only | +| "the old X" | "the old model's problem was..." 
| Describe the design principle instead | +| "used to" | "scouts used to be top-level phases" | Delete or restructure | +| "was changed from" | "the event was renamed from pipeline-end" | Delete | +| "we switched to" | "we switched to asyncio.Future" | Delete | +| "ported from" | "ported verbatim from the old CSS" | Delete | +| "formerly" | "formerly called pipeline-end" | Delete | + +**Permitted uses of "replaces":** + +"Replaces" describing a logical operation on data is fine — it is not temporal: + +- ✓ `applySnapshot` atomically replaces store state +- ✓ `artifacts_changed` sets the `artifacts` list wholesale +- ✗ "The projection system replaces the ad-hoc dict" (historical) + +**Plans are exempt.** Files under `plans/` are inherently temporal — they +document what to change and why. The rule applies only to `docs/`, code +comments, and docstrings. + +**Design decisions mentioning rejected alternatives are fine.** A comment +explaining "X was considered but Y is used because Z" documents a design +choice. The framing must be about the decision rationale, not a migration +narrative: + +- ✓ "`python-eventsourcing` was considered but is designed for database persistence, not in-memory UI state" +- ✗ "We tried `python-eventsourcing` but switched to a custom implementation" + +--- + +## Spoke document structure + +Spoke documents cover a subsystem in depth. Every spoke document follows this +structure: + +```markdown +# Title + +One sentence: what this document covers. + +> Parent doc: [architecture.md](./architecture.md) + +--- + +## Overview + +One paragraph: the problem this subsystem solves and the high-level approach. + +**Key invariant (if any):** Bold sentence capturing the non-negotiable rule. + +--- + +## [Concept sections] + +Technical detail organized by concept, not by implementation order. + +--- + +## Design Decisions + +Named subsections, one per decision. 
Each captures: +- The choice made +- Why (first-principles rationale, not migration history) +- Alternatives considered and why they were not chosen +``` + +**Formatting conventions:** + +- Section separators: `---` on its own line +- Parent doc reference: `> Parent doc: [name](./path.md)` immediately after + the opening description, before the first `---` +- Tables: GFM pipe tables with `|---|---|` separator row +- Code blocks: fenced with language tag (` ```python `, ` ```typescript `, etc.) +- Cross-references: `[section-name](./file.md#anchor)` using lowercase-hyphenated anchors +- Bold for key terms on first use: `**design invariant**`, `**materialized projection**` + +--- + +## Full documentation conventions + +For invisible knowledge, README vs CLAUDE.md, in-code documentation tiers, +and module documentation standards, see: + +[resources/conventions/documentation.md](../resources/conventions/documentation.md) diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..392a7dd6 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,578 @@ +# Koan Architecture + +Koan coordinates coding task planning and execution through a single long-lived +orchestrator LLM process that runs the entire workflow in one continuous session. This document captures the design invariants, +principles, and pitfalls that govern the codebase. 
+ +**Spoke documents** cover subsystems in depth: + +- [Subagents](./subagents.md) -- spawn lifecycle, boot protocol, step-first + workflow, phase dispatch, permissions, model tiers +- [IPC](./ipc.md) -- HTTP MCP inter-process communication, blocking tool calls, + scout spawning, koan_yield blocking, chat message delivery +- [Token Streaming](./token-streaming.md) -- runner stdout parsing, SSE delta path +- [State & Driver](./state.md) -- the driver/LLM boundary, JSON vs markdown + ownership, run state, orchestrator state +- [Projections](./projections.md) -- versioned event log, pure fold, JSON Patch + protocol, projection model, camelCase wire format +- [Intake Loop](./intake-loop.md) -- two-step intake design, prompt engineering principles +- [Memory System](./memory-system.md) -- project memory, curation, and the RAG injection wired into phase transitions + +--- + +## Core Invariants + +These are load-bearing rules. Violating any one of them breaks the system in +ways that are difficult to diagnose. + +### 1. File boundary + +LLMs write **markdown files only**. The driver maintains **JSON state files** +internally -- no LLM ever reads or writes a `.json` file. + +Tool code bridges both worlds: orchestrator tools write JSON state (for the +driver) and templated `status.md` (for LLMs). The driver reads JSON and exit +codes; it never parses markdown. + +``` +Orchestrator calls koan_complete_story(story_id) + -> tool code writes state.json + status.md + -> driver reads state.json to route next action + -> LLM reads status.md if it needs to reference the decision +``` + +**Why:** If an LLM writes JSON, schema drift and parse errors become runtime +failures in the deterministic driver. Markdown is forgiving; JSON is not. + +### 2. Step-first workflow + +Every subagent is a CLI process (`claude`, `codex`, or `gemini`) that connects +to the driver's HTTP MCP endpoint at `http://localhost:{port}/mcp?agent_id={id}`. 
+The subagent receives tools via MCP and calls them over HTTP. Once the LLM +produces text without a tool call, the process may exit -- there is no stdin to +recover. The entire workflow depends on the LLM calling `koan_complete_step` +reliably. + +**The first thing any subagent does is call `koan_complete_step`.** The spawn +prompt contains _only_ this directive. The tool returns step 1 instructions. +This establishes the calling pattern before the LLM sees complex instructions. + +``` +Boot prompt: "You are a koan {role} agent. Call koan_complete_step to receive your instructions." + | LLM calls koan_complete_step (step 0 -> 1 transition) +Tool returns: Step 1 instructions (rich context, task details, guidance) + | LLM does work... + | LLM calls koan_complete_step +Tool returns: Step 2 instructions (or "Phase complete. Call koan_yield.") +``` + +Three reinforcement mechanisms make this robust across model capability levels: + +| Mechanism | Where | Why | +| ----------------- | -------------------------------------------------------------------- | ------------------------------------------------------------ | +| **Primacy** | Boot prompt is the LLM's very first message | First action = tool call, at the top of conversation history | +| **Recency** | `format_step()` appends "WHEN DONE: Call koan_complete_step..." last | LLMs weight end-of-context instructions heavily | +| **Muscle memory** | By step 2+ the LLM has called the tool N times | Pattern is locked in through repetition | + +#### Phase boundaries and koan_yield + +When a phase's final step completes, `koan_complete_step` returns a **non-blocking** +response (`format_phase_complete`) that tells the orchestrator to summarize its +work and call `koan_yield`. The orchestrator then generates a summary and calls +`koan_yield` with structured suggestions. 
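The calling pattern can be sketched as a small state machine. This is an illustration under assumptions, not the driver's code: `PhaseSteps`, `complete_step`, and the instruction strings are hypothetical. Only the step 0 -> 1 transition, the trailing "WHEN DONE" reminder, and the phase-complete response follow the description above.

```python
from dataclasses import dataclass

REMINDER = "WHEN DONE: Call koan_complete_step."


@dataclass
class PhaseSteps:
    # Hypothetical per-phase state: ordered step instructions and a counter.
    instructions: list[str]
    step: int = 0  # 0 means the boot prompt has not been answered yet


def complete_step(phase: PhaseSteps) -> str:
    """Return the next step's instructions, or the phase-complete response."""
    if phase.step >= len(phase.instructions):
        # Non-blocking: the orchestrator is told to summarize and yield.
        return "Phase complete. Summarize and call koan_yield."
    phase.step += 1
    body = phase.instructions[phase.step - 1]
    # Recency: the reminder is always the last thing the model reads.
    return f"Step {phase.step}: {body}\n\n{REMINDER}"
```

The first call carries no work product at all; it exists only to move the counter from 0 to 1 and hand back the first real instructions.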
+ +`koan_yield` is the **generic conversation primitive** — it blocks the +orchestrator process until the user sends a message, then returns that message +as the tool result. The orchestrator can call `koan_yield` repeatedly for +multi-turn conversation before committing a phase transition. + +``` +koan_complete_step (last step) + -> returns: "Phase complete. Summarize and call koan_yield." + | LLM writes summary, constructs suggestions + | LLM calls koan_yield(suggestions=[{id, label, command}, ...]) +Tool blocks until user sends message + | user types in chat or clicks a suggestion pill +Tool returns: user message text + | LLM responds conversationally + | LLM calls koan_yield again (or calls koan_set_phase if direction confirmed) +... + | LLM calls koan_set_phase("plan-spec") -- or "done" to end the workflow +``` + +`koan_yield` is phase-agnostic — it knows nothing about workflow structure. +Suggestions are constructed by the orchestrator at each yield point; the UI +renders them as clickable pills that pre-fill the chat input. + +#### Ending the workflow + +Passing `"done"` to `koan_set_phase` acts as a tombstone: + +``` +koan_set_phase("done") + -> emits workflow_completed + -> sets AppState.workflow_done = True + -> returns "Workflow complete. Call koan_complete_step to finish." + | LLM calls koan_complete_step +Tool returns: "All phases complete. You may now exit." + | LLM exits (no more tool calls) +``` + +`"done"` is detected before the normal `is_valid_transition()` check and is +not a member of any workflow's `available_phases`. The driver treats the +orchestrator's process exit as the actual workflow end signal. + +### 3. Driver determinism (partially relaxed) + +The driver (`koan/driver.py`) spawns the orchestrator and awaits its exit. +Phase routing is driven by the orchestrator via `koan_set_phase` rather than +the driver's routing loop. 
The driver still validates every transition +(`is_valid_transition()` in the tool handler), updates `run-state.json` +atomically, emits projection events, and enforces the permission fence. It +never parses free text or makes judgment calls. All routing decisions flow +through typed tool parameters. + +`is_valid_transition(workflow, from_phase, to_phase)` validates that `to_phase` +is a member of the active workflow's `available_phases` and is not equal to +`from_phase`. The special value `"done"` bypasses this check entirely. Any +real phase in the workflow is reachable from any other — suggested transitions +guide the orchestrator's default recommendations at phase boundaries, but the +user can request any available phase. Invalid phase strings raise `ToolError`. + +### 4. Default-deny permissions + +Two enforcement layers restrict what tools each agent can use: + +1. **CLI tool whitelist** (`CLAUDE_TOOL_WHITELISTS` in `subagent.py`) -- + controls which Claude Code built-in tools exist in the model's context. + Unlisted tools are not presented to the model; it cannot call them. +2. **MCP permission fence** (`check_permission()` in `permissions.py`) -- + gates koan MCP tool calls per role and phase. Unknown roles and tools are + blocked. Planning roles can only write inside the run directory. + +Agents should not have access to tools they are never intended to need. +Restricting the tool vocabulary prevents the model from drifting toward +irrelevant capabilities (autonomous scheduling, subagent spawning, plan mode) +that compete with koan's step-first workflow. + +The one accepted limitation: `READ_TOOLS` (bash, read, grep, glob, find, ls) +are always allowed because distinguishing "read bash" from "write bash" is +intractable at the permission layer. **Prompt engineering constrains intended +bash use; enforcement does not.** + +See [subagents.md -- Permissions](./subagents.md#permissions) for per-role +whitelists and the full MCP permission matrix. + +### 5. 
Need-to-know prompts + +Each subagent receives only the minimum context for its task: + +- The **boot prompt** is one sentence (role identity + "call koan_complete_step") +- The **system prompt** establishes role identity and rules, but no task details +- **Task details** arrive via step 1 guidance (returned by the first tool call) + +This is not just tidiness -- it is load-bearing. Injecting step 1 guidance +into the first user message front-loads complex instructions before the LLM has +established the `koan_complete_step` calling pattern. Weaker models produce +text output and exit without entering the workflow. Step guidance is delivered +exclusively through `koan_complete_step` return values. + +**Phase guidance injection.** Each workflow provides a `phase_guidance` dict +mapping phase names to scope-framing text. When the orchestrator calls +`koan_set_phase(phase)`, the workflow's guidance for that phase is stored in +`PhaseContext.phase_instructions`. The step 1 response renders this injection +at the top of the guidance, before procedural instructions, so scope framing +reaches the LLM before it reads task details. + +The injection contract every `phase_guidance` entry must cover: + +| Section | Purpose | +| ------------------------- | ------------------------------------------------------- | +| **Scope** | What kind of task this workflow targets | +| **Downstream consumer** | What phase reads the output, what detail level it needs | +| **Investigation posture** | Direct reading vs. scouts, typical scout count | +| **Question posture** | How aggressively to ask, typical round count | +| **User override** | Always present, always last: "follow their lead" | + +**Memory injection.** At step 1 of every orchestrator phase, the +`_step_phase_handshake` response may include a `## Relevant memory` +block of top-5 memory entries retrieved by a per-phase static directive. 
+The mechanism is described in [memory-system.md](./memory-system.md); +the directive for each phase lives on its `PhaseBinding.retrieval_directive` +in `koan/lib/workflows.py`. + +### 6. Directory-as-contract + +The subagent directory is the **sole interface** between parent and child. +Everything a subagent needs -- its task, its observable state -- lives in +well-known files inside that directory. + +Two JSON files and an MCP URL: + +| File | Writer | Reader | Lifecycle | +| ------------------ | ------------------------- | ------------------------ | ------------------------------------------- | +| **`task.json`** | Parent (before spawn) | Parent (at registration) | Write-once, never modified | +| **`state.json`** | Parent (audit projection) | Available for debugging | Eagerly materialized after each audit event | +| **`events.jsonl`** | Parent (audit log) | Available for replay | Append-only event log | + +The `task.json` includes an `mcp_url` field pointing at +`http://localhost:{port}/mcp?agent_id={id}`. The child reads this to discover +its MCP endpoint. No structured configuration flows through CLI flags, +environment variables, or other process-level channels. + +**Why:** CLI flags are a flat namespace -- they cause naming collisions, cannot +represent nested structure, are visible in process listings, and are subject to +`ARG_MAX` limits for large values like retry context. Files are structured, +inspectable (`cat task.json`), typed, and consistent with how we handle +observation (audit). + +See [subagents.md -- Task Manifest](./subagents.md#task-manifest) for the +`task.json` schema and spawn flow. + +### 7. Server-authoritative projection + +The fold runs only in Python. The frontend applies server-computed JSON Patches +mechanically -- it has no fold logic, no event interpretation, and no business +rules. When the frontend's view of state differs from the backend's, the bug is +in the fold or the patch computation -- not in the frontend. 
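A deliberately small sketch of that division of labour, assuming a flat projection and a hand-rolled diff that only emits top-level `add` and `replace` operations. The `phase_started` event and the `fold`, `make_patch`, and `apply_patch` functions here are illustrative stand-ins, not the project's implementations.

```python
def fold(state: dict, event: dict) -> dict:
    """Pure fold: returns a new projection, never mutates the input."""
    if event["type"] == "phase_started":
        return {**state, "phase": event["phase"]}
    if event["type"] == "workflow_completed":
        return {**state, "done": True}
    return state  # unknown events are ignored


def make_patch(old: dict, new: dict) -> list[dict]:
    """Naive top-level diff expressed as JSON Patch operations."""
    patch = []
    for key, value in new.items():
        if key not in old:
            patch.append({"op": "add", "path": f"/{key}", "value": value})
        elif old[key] != value:
            patch.append({"op": "replace", "path": f"/{key}", "value": value})
    return patch


def apply_patch(store: dict, patch: list[dict]) -> dict:
    """What the client does: apply operations without interpreting events."""
    result = dict(store)
    for op in patch:
        result[op["path"].lstrip("/")] = op["value"]
    return result
```

The client-side function has no knowledge of event types, which is the point: any disagreement between the two views has to originate in `fold` or `make_patch`.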
+ +``` +push_event() -> fold() -> to_wire() -> make_patch() -> broadcast to subscribers + | + Browser receives patch, + applies applyPatch(store, patch) +``` + +**Why:** Maintaining two fold implementations (Python + TypeScript) requires +disciplined synchronization. Any divergence produces subtle display bugs that +are hard to trace. JSON Patch makes correctness structural: one fold, one +source of truth, mechanical application on the client. + +--- + +## Workflow System + +### Workflow definitions + +A `Workflow` defines the set of phases available for a run, the initial phase, +and suggested transitions between phases. Two workflows are defined in +`koan/lib/workflows.py`: + +**plan** -- intake → plan-spec → plan-review → execute + +| Phase | Role | Steps | Artifact | +| ------------- | ---------------------- | ------------------------------- | ------------------------- | +| `intake` | Requirement gathering | 3 (Gather → Deepen → Summarize) | Chat summary only | +| `plan-spec` | Technical planning | 2 (Analyze → Write) | `plan.md` | +| `plan-review` | Quality review | 2 (Read → Evaluate) | Chat report only | +| `execute` | Implementation handoff | 2 (Compose → Request) | Code changes via executor | + +**milestones** -- stub workflow; runs intake only, then yields with a single +"done" suggestion. + +### Workflow selection + +The user selects a workflow at run start. The selection is stored in +`AppState.workflow` and used throughout the run for: + +- Phase transition validation (`is_valid_transition`) +- Phase boundary suggestions (`get_suggested_phases`) +- Phase guidance injection (`workflow.phase_guidance[phase]`) + +### Phase transition validation + +```python +def is_valid_transition(workflow: Workflow, from_phase: str, to_phase: str) -> bool: + return ( + to_phase in workflow.available_phases + and to_phase != from_phase + ) +``` + +The special value `"done"` bypasses this function -- it is handled before the +validation call in `koan_set_phase`. 
For real phases, suggested transitions +from `workflow.suggested_transitions[current_phase]` guide the orchestrator's +default `koan_yield` suggestions. These are recommendations, not constraints — +the user can request any phase in `workflow.available_phases`. + +--- + +## Atomic Writes + +All persistent writes (JSON state, status.md, audit state.json) use the same +pattern: write to a `.tmp` file, then `os.rename()` to the target. This +prevents partial reads during concurrent access. + +The `koan/audit/event_log.py` module uses this pattern for all state writes. +This is not optional -- the web server and audit system access files +concurrently. A partial read of `state.json` would cause silent data +corruption or spurious errors. + +--- + +## Tool Registration + +Tools are registered as `fastmcp` tool handlers in `koan/web/mcp_endpoint.py`. +When a tool call arrives via HTTP, the MCP endpoint: + +1. Extracts `agent_id` from the URL query parameter +2. Looks up the agent's state (role, step counter, permissions) in the in-process registry +3. Calls `check_permission()` from `koan/lib/permissions.py` +4. If allowed, dispatches to the tool handler +5. Returns the result as the MCP tool response + +Tools are HTTP handlers; permissions are checked per-call. + +--- + +## Two Fold Systems + +Koan uses two independent fold systems that share the same structural pattern +(pure fold function, append-only log) but serve different purposes: + +### Audit fold (`koan/audit/fold.py`) + +Tracks the internal execution of each individual subagent. Input: per-subagent +audit events written to `events.jsonl`. Output: per-subagent `Projection` +materialized to `state.json`. One fold instance per running subagent. +Consumed by debugging and post-mortem analysis. + +### Projection fold (`koan/projections.py`) + +Tracks the complete frontend-visible state of the entire workflow run. Input: +workflow-level projection events emitted by `ProjectionStore.push_event()`. 
+Output: a single in-memory `Projection` covering all agents, run state, and +UI interactions. Consumed by the browser frontend via SSE. + +When adding new observable state, decide which system it belongs to: + +- State visible only in logs/debugging → audit fold +- State visible in the browser UI → projection fold + +See [projections.md](./projections.md) for the full event model, fold +specification, and SSE protocol. + +### Rules for both folds + +- **`fold()` is pure** -- given the same event sequence, it must produce the same + projection. No I/O, no randomness, no side effects inside `fold()`. +- **New event types require a fold handler.** Unknown events are silently ignored + (forward compatibility), but a new event that is not folded contributes nothing + to the projection. +- **Projection is eagerly materialized.** Updated after every `push_event()`. +- **Events are facts, not snapshots.** Events record what happened; the fold + derives current state from those facts. Do not store derived state as an event. + +--- + +## SSE Event Lifecycle + +State flows from LLM tool calls to the browser through the projection system. 
+ +``` +[LLM calls tool via HTTP MCP] + | +[MCP endpoint handles call, emits audit event] + | +[fold() updates audit projection, state.json written atomically] + | +[push_event() called with workflow-level event] + | +[ProjectionStore: fold projection, compute JSON Patch, broadcast to subscribers] + | +[Browser receives patch, applies applyPatch(store, patch) — no interpretation] +``` + +### Concrete example: `koan_yield` + +``` +LLM calls koan_yield({ suggestions: [{id:"plan-spec", label:"Write plan", command:"..."}] }) + -> MCP endpoint checks permissions + -> push_event("yield_started", {suggestions: [...]}, agent_id="abc") + -> fold: appends YieldEntry to agent conversation, sets run.active_yield + -> patch: [{op:"add", path:"/run/agents/abc/conversation/entries/-", value:{type:"yield",...}}, + {op:"replace", path:"/run/activeYield", value:{suggestions:[...]}}] + -> broadcast patch to SSE subscribers + -> browser renders suggestion pills in activity feed and above chat input + -> tool handler creates asyncio.Future, stores in app_state.yield_future, awaits it + -> (HTTP connection held open) + +user clicks suggestion pill "Write plan" in the browser + -> YieldCard.onClick -> setChatDraft("write dashboard redesign implementation plan") + -> FeedbackInput useEffect fires -> textarea pre-filled + -> user reviews, presses Enter + -> POST /api/chat { message: "write dashboard redesign implementation plan" } + -> api_chat: yield_future is set -> append to user_message_buffer -> set_result(True) + -> yield_future resolves + -> drain_user_messages -> "write dashboard redesign implementation plan" + -> returns message text as MCP tool result +LLM receives user's message, responds, calls koan_set_phase("plan-spec") +``` + +### Snapshot on reconnect + +The `/events` endpoint accepts `?since=N`. If `since` matches the server's +current version, the client is up to date and only live patches are streamed. 
+Otherwise — on first connect, page reload, connection drop, or server restart +— a fresh snapshot is sent, then live patches follow. + +``` +event: snapshot +data: {"version": 42, "state": { ...full projection in camelCase... }} + +event: patch +data: {"type": "patch", "version": 43, "patch": [{...}, ...]} +``` + +All reconnect scenarios are handled identically. The client does not distinguish +between a brief disconnect and a server restart — it receives a snapshot and +renders from it. + +--- + +## Pitfalls + +Known invariant violations and their consequences. Check new changes against these. + +### Don't put task content in spawn prompts + +The boot prompt must be exactly one sentence: role identity + "call +koan_complete_step". Putting task content (file paths, instructions, context) +risks the LLM producing text output on the first turn and exiting. This has +happened with haiku-class models and is not recoverable. + +### Don't add `escalated` as a story status + +Escalation flows through `koan_ask_question` (MCP tool call -> web UI -> user +answers -> MCP response). A separate `escalated` status creates a dead routing +path -- the driver has nowhere clean to send it without duplicating the ask UI +flow. + +### Don't add `scouting` as a workflow phase + +Scouts run inside the `koan_request_scouts` tool handler during +intake/planning phases, not as a top-level driver phase. Adding +`scouting` to `WorkflowPhase` would imply a driver state that never exists, +creating dead code paths. + +### Don't rely on file existence for scout success + +Scout success is derived from the JSON projection (`status === "completed"`), +not from checking whether `findings.md` exists. A scout can write a partial +findings file and then crash -- file existence is not proof of completion. 
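A sketch of that check, assuming the scout's `state.json` exposes a top-level `status` field. The helper name and the exact file layout are assumptions.

```python
import json
from pathlib import Path


def scout_succeeded(scout_dir: Path) -> bool:
    """Decide success from the projection, never from findings.md existing."""
    state_path = scout_dir / "state.json"
    try:
        state = json.loads(state_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return False  # no readable projection means no proof of completion
    return isinstance(state, dict) and state.get("status") == "completed"
```

Note that the function never looks at `findings.md`, so a partial findings file left behind by a crashed scout cannot be mistaken for a finished one.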
+ +### Don't crash on recoverable model-output parse errors + +Fail-fast is scoped to **unrecoverable conditions**: + +- invariant/contract violations (e.g., broken `task.json` bootstrap contract) +- unexpected states where there is no safe deterministic next action +- failures with no simple local recovery path + +If a model emits malformed tool-call payloads (invalid JSON/args) or other +per-turn formatting errors, treat them as recoverable execution errors: +return a structured tool error so the model can self-correct and retry in +the same subagent process. + +| Condition | Classification | Expected handling | +| ------------------------------------------------------------- | -------------- | ---------------------------------------- | +| Malformed tool-call JSON/args from LLM | Recoverable | Return tool error, keep process alive | +| Tool argument schema validation failure | Recoverable | Return validation error, let model retry | +| Disallowed/unknown tool call | Recoverable | Return blocked tool error, continue turn | +| Missing/malformed `task.json` at subagent startup | Unrecoverable | Fail fast (bootstrap contract broken) | +| Impossible phase routing / internal invariant breach | Unrecoverable | Fail fast | +| Unexpected runtime state with no clear deterministic recovery | Unrecoverable | Fail fast | + +### Don't assume bash is restricted per role + +`bash` is in `READ_TOOLS` and always allowed. The permission layer cannot +distinguish a read-bash from a write-bash. Prompt engineering is the only +constraint. Do not assume bash calls are blocked for planning roles. + +### Don't rely on prompt instructions alone to restrict step behavior + +**The pattern: prompt expresses intent; mechanical gate catches non-compliance. +Neither alone is sufficient.** + +- **Prompt alone** -- the LLM can ignore it. +- **Gate alone** -- the LLM receives a cryptic "blocked" error with no context. 
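
One way to avoid the "cryptic blocked error" failure mode is for the gate to state what is missing and what to do next. A minimal Python sketch, assuming a hypothetical `check_step_gate` helper and `GateResult` type; the names and signature are illustrative and not koan's actual gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    message: str  # returned to the model as the tool result when blocked


def check_step_gate(required_calls: set[str], calls_made: set[str]) -> GateResult:
    """Block step completion until every required tool call has been made.

    Hypothetical helper: names and signature are illustrative assumptions.
    """
    missing = sorted(required_calls - calls_made)
    if not missing:
        return GateResult(True, "ok")
    # Explain what is missing so the model can self-correct instead of guessing.
    return GateResult(
        False,
        "Step cannot complete yet. Call these tools first: " + ", ".join(missing),
    )
```

The prompt still tells the model which calls the step requires; the gate only catches the cases where that instruction was ignored, and its message repeats the requirement in actionable form.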
+ +Three enforcement mechanisms are available -- use the appropriate one for the +constraint: + +| Mechanism | What it enforces | How | +| ----------------------------------------- | ------------------------------------------ | ------------------------------------------------------------- | +| **Permission fence** (`check_permission`) | Which tools a role (or step) can use | Block at MCP endpoint; LLM sees a rejection message | +| **`validate_step_completion()`** | Required pre-calls before step advancement | Block `koan_complete_step`; LLM sees an error and must comply | +| **Tool description** | Soft guidance on when to call | Cannot be enforced; LLM can ignore it | + +Any behavioral constraint that matters for correctness needs **both** a prompt +instruction (so the LLM knows what to do) and a mechanical gate (so +non-compliance is caught and corrected, not silently propagated). + +See [intake-loop.md -- Step-Aware Permission Gating](./intake-loop.md#step-aware-permission-gating). + +### Don't give a step multiple cognitive goals + +Each step should have exactly one cognitive goal. Grouping multiple goals into +a single step ("do A, then B, then C") enables **simulated refinement**: the +LLM artificially downgrades its output for A to manufacture visible improvement +in C. Separate `koan_complete_step` calls enforce genuinely isolated reasoning. + +When designing a new phase, each step should answer: "What is the single thing +this step accomplishes?" If the answer requires "and then", split the step. + +See [intake-loop.md -- Prompt Chaining over Stepwise](./intake-loop.md#prompt-engineering-principles) +for the detailed rationale. + +### Don't parse free-text for loop control decisions + +Confidence (the gate that controls the intake loop) is a structured enum +value set via a dedicated tool call, not a sentiment extracted from the LLM's +`thoughts` text. The driver determinism invariant prohibits parsing free-text +for routing decisions. 
Any loop gate must flow through a typed tool parameter +and a structured context field. + +### Don't put side effects in get_next_step() + +`get_next_step()` must be a pure query -- it returns the next step number and +nothing else. Putting state mutations, counter increments, or event emission +inside `get_next_step()` violates this contract. + +Side effects that accompany a loop-back belong in `on_loop_back()`: + +``` +BAD: get_next_step(4) { self.iteration += 1; self.confidence = None; return 2 } +GOOD: get_next_step(4) { return 2 } + on_loop_back(4, 2) { self.iteration += 1; self.confidence = None } +``` + +### Don't pass structured data through CLI flags + +If information is needed by a subagent, write it to `task.json` in the +subagent directory before spawning. CLI flags are for bootstrap only. The +directory-as-contract invariant exists specifically to prevent this. + +### Don't store derived state as an event + +Events record facts — things that happened. Derived state belongs in the fold +function, not in the event log. + +**Bad:** Emitting a `subagent_idle` event to signal "no agent is running." +"No agent" is derived from `agent_exited`, not a fact in itself. Storing it as +an event conflates the log with the projection. + +**Good:** Emitting `agent_exited`. The fold derives `primary_agent = None`. + +### Don't put high-frequency ephemeral data through the audit pipeline + +Token deltas and similar high-frequency signals arrive at hundreds of events +per second. Routing them through the audit pipeline would mean hundreds of +append + fold + atomic-write cycles per second for data that has no persistence +value. The runner stdout parsing path exists for exactly this case. See +[token-streaming.md](./token-streaming.md). + +Note: `stream_delta` events (token deltas) DO go through the projection fold, +but the fold only updates an in-memory string (`pending_text` on the agent's +conversation) — no disk I/O. 
The distinction is between the audit pipeline +(disk writes per event) and the projection fold (in-memory only). diff --git a/docs/design-system.md b/docs/design-system.md new file mode 100644 index 00000000..93ad956a --- /dev/null +++ b/docs/design-system.md @@ -0,0 +1,863 @@ +# Koan Design System + +The single source of truth for koan's visual design. `src/styles/variables.css` is a mechanical translation of the token tables below. The doc changes first, then the CSS follows. + +--- + +## Tokens + +### Background surfaces + +| Token | Hex | Usage | +| ----------------- | --------- | ------------------------------------------------------------------------- | +| `--bg-danger` | `#fce8e8` | Destructive confirmation backgrounds. Red-family tint. | +| `--bg-toggle-off` | `#d3d1c7` | Toggle track off state. Neutral warm gray, lighter than `--border-input`. | + +### Text colors + +| Token | Hex | Usage | +| -------------------- | --------- | --------------------------------------------------- | +| `--text-danger` | `#791f1f` | Destructive confirmation heading text. Darkest red. | +| `--text-danger-body` | `#a03030` | Destructive confirmation body text. | + +### Border colors + +| Token | Hex | Usage | +| ----------------- | --------- | ------------------------------------------------------------- | +| `--border-danger` | `#e8c8c8` | Danger button borders, destructive confirmation card borders. | +| `--border-teal` | `#b8d8cc` | Teal-accented button borders (Detect, Explore actions). | + +### Interactive colors + +| Token | Hex | Usage | +| ---------------------- | --------- | ------------------------------------------------------------------------ | +| `--color-orange-hover` | `#c06a4f` | Hover state for orange interactive elements (ReviewBlock gutter button). 
| + +### Component gaps + +| Token | Value | Usage | +| --------------------- | ----- | -------------------------------------------------------------------- | +| `--gap-entity-rows` | 8px | Between entity rows within a settings section card. | +| `--gap-form-rows` | 12px | Between form rows inside an inline form. | +| `--gap-form-controls` | 8px | Between controls in a single form row (e.g., three cascade selects). | + +### Component internal padding + +| Token | Value | Usage | +| ------------------------- | --------- | ----------------------------------------------------------------------------- | +| `--padding-card-settings` | 22px 26px | Settings section cards. | +| `--padding-entity-row` | 12px 16px | Entity rows (profile rows, installation rows). | +| `--padding-inline-form` | 22px 26px | Inline edit/create forms. Matches settings card padding for visual alignment. | + +### Page-level spacing + +| Token | Value | Usage | +| ---------------------- | ----- | ----------------------------------------------------------------- | +| `--settings-nav-width` | 152px | Side navigation column width on the Settings page. | +| `--settings-max-width` | 960px | Max width for the Settings page layout container (nav + content). | + +### Tool family indicator colors + +| Token | Hex | Usage | +| ------------ | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `--dot-read` | `#5a9a8a` | `StatusDot` `status="read"`. Identifies `read` operations in tool aggregate cards. Aliases `--color-teal`; the alias pattern matches `--status-done`. | +| `--dot-grep` | `#7ab0a0` | `StatusDot` `status="grep"`. Identifies `grep` operations. Slightly lighter teal than `--dot-read`; distinguishable from `--dot-read` at 8px stat-block size, secondary to the command text at 6px log-row size. 
| +| `--dot-ls` | `#4a8878` | `StatusDot` `status="ls"`. Identifies `ls` operations. Slightly darker teal than `--dot-read`. | + +All three tokens belong to the teal family because all three tools are +read-only exploration operations. Orange is reserved for active state +(`--color-orange`) and must not appear in tool-family indicator colors. + +--- + +## Atoms + +### StatusDot + +A small colored circle indicating either an operational state or a tool family. + +Container: `display: inline-block`, `border-radius: var(--radius-circle)`, +`flex-shrink: 0`. All variants are static — no animation. In-flight activity +indicators in consuming molecules are implemented inline (see `ToolCallRow`'s +`.tcr-running-dot` pattern) rather than through `StatusDot`, so that +`StatusDot` stays a pure visual primitive and adjacent features that already +use `StatusDot` (e.g., `ScoutRow`) are not affected by changes in this area. + +**Sizes:** + +- `sm`: 6px × 6px. Used inside `ToolLogRow` log rows where vertical density + matters. +- `md`: 8px × 8px. Default. Used in `ToolStatBlock` stat blocks, scout tables, + artifact cards, and the header orchestrator indicator. + +**Status variants — operational state:** + +- `running`: `background: var(--status-running)` (orange). Static. +- `done`: `background: var(--status-done)` (teal). Static. +- `queued`: `background: var(--status-queued)` (neutral warm gray). Static. +- `failed`: `background: var(--status-failed)` (red). Static. + +**Status variants — tool family:** + +- `read`: `background: var(--dot-read)`. Static. +- `grep`: `background: var(--dot-grep)`. Static. +- `ls`: `background: var(--dot-ls)`. Static. + +The tool-family variants share the `status` prop with the operational variants +intentionally — the geometry and usage pattern are identical, and a single +`status` prop keeps consumers' call sites readable. + +Type: `Status = 'running' | 'done' | 'queued' | 'failed' | 'read' | 'grep' | 'ls'`, +`Size = 'sm' | 'md'`. 
+ +Props: `status: Status`, `size?: Size` (default `'md'`). + +### TextInput + +Shared text input used in settings forms, NewRunForm textarea, NewRunForm concurrency input, RadioOption/CheckboxOption custom text input, and FeedbackInput textarea. + +**Field variant (default):** Background `--bg-base`, `1.5px solid --border-input`, `--radius-lg`. Padding: 8px 12px. Font: `--font-body`, 13px, `--text-primary`. Placeholder: `--text-placeholder`. Focus: border-color `--color-orange`, box-shadow `0 0 0 3px var(--focus-ring)`. Error state: border-color `--status-failed`. Disabled: opacity 0.5. + +**Inline variant:** Transparent background, no side/top borders, `border-bottom: 1px solid --border-card`. Padding: 8px 0. Focus: border-bottom-color `--border-input`. Used inside RadioOption and CheckboxOption for the custom "Other" text input. + +**Mono modifier:** When `mono` is true, uses `--font-mono` at 13px. For file paths, extra args, and technical identifiers. + +**Textarea mode:** When rendered as `