hashintel · lunelson · Apr 7, 2026 · Apr 2, 2026 · Apr 2, 2026
diff --git a/memory/PLAN.md b/memory/PLAN.md
@@ -129,12 +129,15 @@
     - Branch: `ln/fe-558-ui-foundation`
     - **Verification approach**: inner — `npm run verify` (lint, format, type-check, all tests, build). Outer — manual visual inspection of interview workspace and project list in dev mode.
 
-5. **Observer agent + entity persistence** — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent **post-commit** (after SQLite transaction); SSE adapter emits as typed data part on existing chat stream (in-band sync per D22). Context builders upgraded to use `md-pen` for structured entity rendering (tables, checklists) in observer context. `not-started`
+5. **Observer agent + entity persistence** `FE-537` — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent **post-commit** (after SQLite transaction); SSE adapter emits as typed data part on existing chat stream (in-band sync per D22). Context builders upgraded to use `md-pen` for structured entity rendering (tables, checklists) in observer context. Agent pattern refactored: `conductTurn()` is a thin sequencer, and each agent is an async generator composed via `yield*` (D27). Observer uses `outputFormat` for structured JSON extraction (D28). `ResultMessage` is inspected for agent metrics (D29). `done`
    - Requirements: → SPEC.md §Requirements #5
-   - Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike), A20
-   - Decisions: → SPEC.md §Decisions D22 (in-band sync — observer-complete as data part), D26 (md-pen for markdown rendering)
-   - Acceptance: answer a scope question, observer extracts decision + assumptions, dependency edges in DB, `observer-complete` event emitted post-commit with entity IDs, extraction within user think time
-   - **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission post-commit, SSE adapter data-part encoding. Middle — differential oracle from spike fixtures (observer extraction vs golden master, ≥80% capture). Outer — debug mode: raw observer extraction visible per-turn in UI; fixture capture from confirmed-good manual runs. → SPEC.md §Oracle Strategy, §Observer History Projection, §Acknowledged Blind Spots (extraction variance, cumulative graph integrity)
+   - Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike), A20, A24, A25
+   - Decisions: → SPEC.md §Decisions D22 (in-band sync — observer-complete as data part), D26 (md-pen), D27 (agent generator composition), D28 (outputFormat), D29 (ResultMessage metrics)
+   - Invariants established: → SPEC.md §Invariants I20, I21, I22
+   - Invariants respected: → SPEC.md §Invariants I1, I5, I6, I9, I10, I12, I13, I17, I19
+   - Acceptance: 147 tests pass (24 new); agent pattern refactored; observer persists entities with turn linkage and dependency edges; observer-complete emitted post-commit; SSE adapter encodes as data-observer-result; observer errors non-fatal; context uses md-pen; agent-metrics emitted
+   - Branch: `ln/fe-537-observer-agent`
+   - **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission post-commit, SSE adapter data-part encoding, sdk translateStreamEvents parity, observer-error non-fatality, agent-metrics shape. Middle — differential oracle from spike fixtures (deferred to manual testing). Outer — debug mode and fixture capture (deferred to slice 6). → SPEC.md §Oracle Strategy
 
 6. **Entity sidebar (read-only)** — React sidebar in interview workspace showing decisions, assumptions, requirements, and criteria on the active path. Tabbed display. TanStack Query (`useQuery`) for entity data; cache populated via `queryClient.setQueryData` from `useChat`'s `onData` callback when `observer-complete` data parts arrive (in-band sync per D22). Dependency edges visible. Stale badges for soft-invalidated entities. `not-started`
    - Requirements: → SPEC.md §Requirements #6
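
The slice 5 refactor (thin `conductTurn()` sequencer, agents composed via `yield*`, observer errors non-fatal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this branch: the `DomainEvent` variants, the `nonFatal` wrapper, the `collect` helper, and the stubbed agent bodies are all hypothetical, and the interviewer-then-observer ordering is assumed from the plan text.

```typescript
// Hypothetical event shapes; the real DomainEvent union lives in the codebase.
type DomainEvent =
  | { type: 'text-delta'; text: string }
  | { type: 'observer-complete'; decisionIds: number[]; assumptionIds: number[] }
  | { type: 'observer-error'; message: string };

// Streaming agent (stubbed): yields events as they arrive.
async function* interviewer(answer: string): AsyncGenerator<DomainEvent> {
  yield { type: 'text-delta', text: `Noted: ${answer}` };
}

// Silent agent (stubbed): does its work, then yields one completion event.
async function* observer(answer: string): AsyncGenerator<DomainEvent> {
  if (answer.trim() === '') throw new Error('nothing to extract');
  yield { type: 'observer-complete', decisionIds: [1], assumptionIds: [2] };
}

// Hypothetical wrapper: a failing agent becomes an event instead of aborting the turn.
async function* nonFatal(agent: AsyncGenerator<DomainEvent>): AsyncGenerator<DomainEvent> {
  try {
    yield* agent;
  } catch (err) {
    yield { type: 'observer-error', message: err instanceof Error ? err.message : String(err) };
  }
}

// Thin sequencer: no stream parsing here, only ordering of agents.
async function* conductTurn(answer: string): AsyncGenerator<DomainEvent> {
  yield* interviewer(answer);
  yield* nonFatal(observer(answer));
}

// Helper for callers that want the whole turn as an array.
async function collect(gen: AsyncGenerator<DomainEvent>): Promise<DomainEvent[]> {
  const events: DomainEvent[] = [];
  for await (const event of gen) events.push(event);
  return events;
}
```

Because `yield*` rethrows a delegated generator's error at the delegation point, the `try`/`catch` in the wrapper is enough to keep the interviewer's events intact when the observer fails.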

diff --git a/memory/SPEC.md b/memory/SPEC.md
@@ -87,9 +87,17 @@ The architecture (layered: db → core → adapters):
 | A21 | `useChat` `onData` callback reliably bridges to `queryClient.setQueryData` without stale-closure issues — known `onFinish` stale-closure bug (ai-sdk#550) may or may not affect `onData`                                                                                                                                   | medium        | D22                 | Entity sidebar    | Test in slice 6: verify `setQueryData` from `onData` updates sidebar reactively; if stale, use parallel `EventSource` instead                                                                                                        |
 | A22 | AI SDK `UIMessage.parts[]` with custom Data Parts (typed via `dataPartsSchema`) persisted as JSON on the turn table is sufficient for faithful UI resume — no separate `turn_message` table needed for current scope                                                                                                         | **validated** | D23, D24            | Parts persistence | Validated: parts assembler converts DomainEvents to typed parts, round-trips through JSON persistence (I18). Client hydration from parts deferred to 4b (outer-loop). |
 | A23 | Custom Data Parts for structured user input (option selection, confirmation) can replace scalar `turn.answer` as the primary user-response model without breaking `formatHistory()` or observer context                                                                                                                      | **validated** | D24                 | Parts persistence | Validated: Data Part schemas defined with Zod (I17), context builders read scalars not parts (I19), structured user input round-trip tested. Full UI wiring deferred to 4b. |
+| A24 | SDK `outputFormat` with JSON schema produces entity extraction as reliable as MCP tool-based extraction — structurally simpler (one API call, no tool round-trip), schema validation built into SDK response via `structured_output` field on `SDKResultMessage`                                                               | high          | D28                 | Observer agent    | Validate in slice 5: compare extraction quality with `outputFormat` vs spike's MCP tool approach. If `outputFormat` produces malformed or lower-quality extraction, fall back to MCP tool pattern |
+| A25 | `SDKResultMessage` provides accurate `duration_ms`, `total_cost_usd`, and `usage` for per-agent observability — types confirmed in TS SDK (`SDKResultSuccess`, `SDKResultError`)                                                                                                                                             | high          | D29                 | Observer agent    | Validate in slice 5: inspect ResultMessage after query() iteration, confirm fields are populated |
 
 ## Decisions
 
+27. **Agent module pattern — generator composition** — Each agent (interviewer, observer, future phase agents) is an async generator function yielding `DomainEvent`s. `conductTurn()` is a thin sequencer composing agents via `yield*`. No wrapper around `query()` — each agent calls the SDK directly with whatever options it needs (`outputFormat`, `effort`, `mcpServers`, etc.). A shared `translateStreamEvents()` utility in `sdk.ts` maps SDK `stream_event` messages to DomainEvents; streaming agents use it, silent agents don't. File layout: `interviewer.ts` (evolves from `interview.ts`), `observer.ts` (new), `sdk.ts` (new). Research: `docs/research/claude-agent-sdk-cookbook-patterns-vs-brunch-usage.md`. Depends on: D19. Supersedes: monolithic `conductTurn()` with inline `query()` call and stream parsing.
+
+28. **Observer uses `outputFormat` (structured JSON output)** — The observer agent returns extracted entities via SDK `outputFormat` with a Zod-derived JSON schema, not via MCP tools. The SDK validates the response and places the parsed result in `SDKResultMessage.structured_output`. This is simpler than tool-based extraction (one API call, no tool round-trip) and better suited to the observer's pure-extraction job (no side effects during the call). The interviewer retains MCP tools because `ask_question` has side effects (DB writes during the call). Depends on: A24. Supersedes: MCP tool-based observer extraction from spike.
+
+29. **`ResultMessage` inspection for agent observability** — After each `query()` call, the agent inspects `SDKResultMessage` for `duration_ms`, `duration_api_ms`, `total_cost_usd`, and `usage`. Emitted as `agent-metrics` DomainEvent. Primary use: validate A4 (observer latency) and track cost per turn. Secondary: surface in future debug mode overlay. Depends on: A25. Supersedes: discarding `ResultMessage` (gap #1 in cookbook research).
+
 26. **`md-pen` for programmatic markdown rendering** — Structured data (entity tables, dependency graphs, checklists) rendered to markdown via `md-pen` rather than hand-rolled string concatenation. Pure string-return functions (`table()`, `taskList()`, `mermaid()`, `heading()`, `alert()`, `details()`) compose by nesting — no AST, no intermediate representation. Escaping is context-aware per function (table cells, URLs, code fences), eliminating a class of bugs when rendering user-supplied text from interviews. Primary use cases: (1) observer context builders presenting growing entity graphs to agents (`table()` for decisions/assumptions with metadata, `taskList()` for reviewed/unreviewed items), (2) spec export rendering active-path entities into downloadable markdown (slice 13), (3) any future agent-facing or user-facing projection of structured data. Zero dependencies, ESM-only, TypeScript-first. Depends on: —. Supersedes: hand-rolled string assembly in context builders.
 
 ### Domain model
@@ -158,6 +166,9 @@ The architecture (layered: db → core → adapters):
 | I17 | Data Part schema validation | Slice 4a (parts persistence) | parts.test.ts (7 tests) | D24 |
 | I18 | Parts round-trip fidelity | Slice 4a (parts persistence) | parts.test.ts (8 tests), core.test.ts | D23 |
 | I19 | Context builder equivalence | Slice 4a (parts persistence) | context.test.ts (7 tests) | D25 |
+| I20 | Entity persistence with turn linkage | Slice 5 (observer) | db.test.ts (7 tests), observer.test.ts | D4, D5 |
+| I21 | Observer-complete post-commit | Slice 5 (observer) | observer.test.ts (6 tests), sse-adapter.test.ts (3 tests) | D22 |
+| I22 | Agent generator composition | Slice 5 (observer) | core.test.ts, sdk.test.ts (7 tests) | D27 |
 
 ## Lexicon
 
@@ -323,13 +334,15 @@ This projection difference is a deliberate design choice, not an implementation
 
 | File                | Tests | Protects                    |
 | ------------------- | ----- | --------------------------- |
-| sse-adapter.test.ts | 18    | I1, I3, I7                  |
-| db.test.ts          | 25    | I5, I6, I9, I10, I11, I18   |
+| sse-adapter.test.ts | 21    | I1, I3, I7, I21             |
+| db.test.ts          | 32    | I5, I6, I9, I10, I11, I18, I20 |
 | app.test.ts         | 22    | I2, I3, I6, I7, I13, I14    |
-| core.test.ts        | 16    | I12, I13, I18               |
+| core.test.ts        | 16    | I12, I13, I18, I22          |
 | interview.test.ts   | 16    | I16                         |
 | parts.test.ts       | 23    | I17, I18                    |
-| context.test.ts     | 7     | I19                         |
+| context.test.ts     | 8     | I19                         |
+| sdk.test.ts         | 7     | I22                         |
+| observer.test.ts    | 6     | I20, I21                    |
 
 ## Acceptance Criteria (exit conditions)
 

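D28 and D29 both read from the final result message of a `query()` call. The sketch below shows one way that read could look, assuming a result shaped like the mock in `app.test.ts`. `ResultLike`, `AgentMetricsEvent`, `toAgentMetrics`, and `readExtraction` are hypothetical names introduced for illustration; `ResultLike` is a local stand-in and not the SDK's own type, and the element shape of `decisions` and `assumptions` is deliberately left as `unknown`.

```typescript
// Local stand-in for a successful result message; field names follow D29 and the test mock.
interface ResultLike {
  type: 'result';
  subtype: string;
  duration_ms: number;
  duration_api_ms: number;
  total_cost_usd: number;
  usage: { input_tokens: number; output_tokens: number };
  structured_output?: unknown;
}

// Hypothetical shape for the agent-metrics DomainEvent.
interface AgentMetricsEvent {
  type: 'agent-metrics';
  agent: string;
  durationMs: number;
  durationApiMs: number;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

// Copies the observability fields out of the result message.
function toAgentMetrics(agent: string, result: ResultLike): AgentMetricsEvent {
  return {
    type: 'agent-metrics',
    agent,
    durationMs: result.duration_ms,
    durationApiMs: result.duration_api_ms,
    costUsd: result.total_cost_usd,
    inputTokens: result.usage.input_tokens,
    outputTokens: result.usage.output_tokens,
  };
}

// Defensive read of structured_output: returns null unless both arrays are present.
function readExtraction(
  result: ResultLike,
): { decisions: unknown[]; assumptions: unknown[] } | null {
  const out = result.structured_output;
  if (typeof out !== 'object' || out === null) return null;
  const { decisions, assumptions } = out as Record<string, unknown>;
  if (!Array.isArray(decisions) || !Array.isArray(assumptions)) return null;
  return { decisions, assumptions };
}

// Same values as the default mock in app.test.ts.
const sample: ResultLike = {
  type: 'result',
  subtype: 'success',
  duration_ms: 500,
  duration_api_ms: 300,
  total_cost_usd: 0.0005,
  usage: { input_tokens: 100, output_tokens: 50 },
  structured_output: { decisions: [], assumptions: [] },
};

const metrics = toAgentMetrics('observer', sample);
const extraction = readExtraction(sample);
```

Returning `null` rather than throwing on a malformed payload is one design option that fits the "observer errors non-fatal" acceptance item; the caller decides whether to emit an error event.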
diff --git a/src/server/app.test.ts b/src/server/app.test.ts
@@ -24,6 +24,23 @@ let db: DB;
 
 beforeEach(() => {
   mockQuery.mockReset();
+  // Default: observer gets empty result for any call not covered by mockReturnValueOnce
+  mockQuery.mockImplementation(() =>
+    makeMockStream([
+      {
+        type: 'result',
+        subtype: 'success',
+        duration_ms: 500,
+        duration_api_ms: 300,
+        total_cost_usd: 0.0005,
+        is_error: false,
+        num_turns: 1,
+        usage: { input_tokens: 100, output_tokens: 50 },
+        result: '',
+        structured_output: { decisions: [], assumptions: [] },
+      },
+    ]),
+  );
   const result = createApp();
   app = result.app;
   db = result.db;
@@ -138,7 +155,7 @@ describe('GET /api/projects/:id', () => {
 
   it('returns turns on active path after a chat exchange', async () => {
     const projectId = await createTestProject('Chat Test');
-    mockQuery.mockReturnValue(mockTextStream('Hi'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));
 
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -155,7 +172,7 @@ describe('GET /api/projects/:id', () => {
 describe('POST /api/projects/:id/chat', () => {
   it('returns Content-Type text/event-stream', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream());
+    mockQuery.mockReturnValueOnce(mockTextStream());
 
     const res = await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -167,7 +184,7 @@ describe('POST /api/projects/:id/chat', () => {
 
   it('produces well-formed SSE lines with data: prefix and double newline delimiters', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(
+    mockQuery.mockReturnValueOnce(
       makeMockStream([
         {
           type: 'stream_event',
@@ -204,7 +221,7 @@ describe('POST /api/projects/:id/chat', () => {
 
   it('contains at least one text-delta event with non-empty text', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('Hello!'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hello!'));
 
     const res = await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -218,7 +235,7 @@ describe('POST /api/projects/:id/chat', () => {
 
   it('ends with finish event and [DONE]', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream());
+    mockQuery.mockReturnValueOnce(mockTextStream());
 
     const res = await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -233,7 +250,7 @@ describe('POST /api/projects/:id/chat', () => {
 
   it('emits reasoning-delta events for thinking content', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(
+    mockQuery.mockReturnValueOnce(
       makeMockStream([
         {
           type: 'stream_event',
@@ -312,7 +329,7 @@ describe('POST /api/projects/:id/chat', () => {
 describe('POST /api/projects/:id/chat — tool calls', () => {
   it('emits tool-call SSE events for tool-using mock stream', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(
+    mockQuery.mockReturnValueOnce(
       makeMockStream([
         {
           type: 'stream_event',
@@ -395,7 +412,7 @@ describe('POST /api/projects/:id/chat — tool calls', () => {
 describe('GET /api/projects/:id — enriched state', () => {
   it('returns turns with options after structured question', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('Hi'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));
 
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -419,7 +436,7 @@ describe('GET /api/projects/:id — enriched state', () => {
 describe('POST /api/projects/:id/turns/:turnId/select', () => {
   it('persists option selection and sets answer', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('Hi'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));
 
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -451,7 +468,7 @@ describe('POST /api/projects/:id/turns/:turnId/select', () => {
 
   it('returns 400 for missing position', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('Hi'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));
 
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -473,7 +490,7 @@ describe('POST /api/projects/:id/turns/:turnId/select', () => {
 describe('POST /api/projects/:id/chat — turn persistence', () => {
   it('creates a turn with user answer and advances HEAD', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('Hi there'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Hi there'));
 
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
@@ -492,12 +509,12 @@ describe('POST /api/projects/:id/chat — turn persistence', () => {
 
   it('chains turns with parent pointers across exchanges', async () => {
     const projectId = await createTestProject();
-    mockQuery.mockReturnValue(mockTextStream('First response'));
+    mockQuery.mockReturnValueOnce(mockTextStream('First response'));
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
       .send({ messages: [{ role: 'user', content: 'first' }] });
 
-    mockQuery.mockReturnValue(mockTextStream('Second response'));
+    mockQuery.mockReturnValueOnce(mockTextStream('Second response'));
     await request(app)
       .post(`/api/projects/${projectId}/chat`)
       .send({ messages: [{ role: 'user', content: 'second' }] });
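
The switch from `mockReturnValue` to `mockReturnValueOnce` matters because each chat request now triggers two `query()` calls. A one-shot value is consumed by the first (interviewer) call, and the second (observer) call falls through to the default set with `mockImplementation` in `beforeEach`; `mockReturnValue` would instead replace that default for every call. The miniature below illustrates the queue-then-fallback behaviour. It is a hand-written stand-in with hypothetical names (`makeMock`, `returnValueOnce`), not how the test framework implements mocks.

```typescript
// Miniature stand-in: one-shot return values take priority over a default implementation.
function makeMock<T>(fallback: () => T) {
  const queue: T[] = [];
  const fn = (): T => (queue.length > 0 ? (queue.shift() as T) : fallback());
  return Object.assign(fn, {
    // Queue a value for exactly one future call.
    returnValueOnce(value: T): void {
      queue.push(value);
    },
  });
}

// Default plays the role of the empty observer result from beforeEach.
const query = makeMock<string>(() => 'empty observer result');
query.returnValueOnce('interviewer stream');

const firstCall = query(); // interviewer call takes the queued value
const secondCall = query(); // observer call falls through to the default
```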

diff --git a/src/server/app.ts b/src/server/app.ts
@@ -113,6 +113,9 @@ export function createApp(dbPath?: string) {
       res.write(formatSSE({ type: 'error', errorText: message }));
     }
 
+    // Protocol termination: finish-step + finish after all events (including observer)
+    res.write(formatSSE({ type: 'finish-step' }));
+    res.write(formatSSE({ type: 'finish', finishReason: 'stop' }));
     res.write(formatSSE('[DONE]'));
     res.end();
   });
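
The termination sequence added above can be pictured as three SSE frames written after all domain events, including the observer's. The sketch below assumes the framing asserted in `app.test.ts` (a `data: ` prefix and a double-newline delimiter); `formatSSESketch` and `terminationFrames` are hypothetical names, and the real `formatSSE` may differ.

```typescript
type SsePayload = Record<string, unknown> | string;

// Assumed framing: objects are JSON-encoded, strings pass through verbatim.
function formatSSESketch(payload: SsePayload): string {
  const data = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${data}\n\n`;
}

// The three closing frames, in the order the handler writes them.
function terminationFrames(): string[] {
  return [
    formatSSESketch({ type: 'finish-step' }),
    formatSSESketch({ type: 'finish', finishReason: 'stop' }),
    formatSSESketch('[DONE]'),
  ];
}

const tail = terminationFrames().join('');
```

Keeping termination in the route handler rather than in an individual agent means the stream closes exactly once, however many agents the sequencer runs.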

diff --git a/src/server/context.test.ts b/src/server/context.test.ts
@@ -184,4 +184,38 @@ describe('observer-context-projection', () => {
     // Should NOT contain the full Q&A pairs from earlier turns
     expect(result).not.toContain('Previous conversation:');
   });
+
+  it('renders entity tables with md-pen (not hand-rolled strings)', () => {
+    const turn: Turn = {
+      id: 5,
+      project_id: 1,
+      parent_turn_id: 4,
+      phase: 'scope',
+      question: 'Q5',
+      answer: 'A5',
+      why: null,
+      impact: null,
+      is_resolution: false,
+      user_parts: null,
+      assistant_parts: null,
+      created_at: '2026-01-01',
+    };
+
+    const result = buildObserverContext({
+      turn,
+      activePathSummary: '',
+      entities: {
+        decisions: [{ id: 1, content: 'Use React' }],
+        assumptions: [{ id: 2, content: 'Users have browsers' }],
+      },
+    });
+
+    // md-pen table() produces pipe-separated markdown tables
+    expect(result).toContain('| ID | Content |');
+    expect(result).toContain('| 1 | Use React |');
+    expect(result).toContain('| 2 | Users have browsers |');
+    // md-pen h3() produces ### headings
+    expect(result).toContain('### Existing Decisions');
+    expect(result).toContain('### Existing Assumptions');
+  });
 });
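
For readers unfamiliar with the output shape these assertions target, the sketch below renders the same pipe-separated rows and shows the kind of cell escaping that D26 cites as the reason to avoid string concatenation. It is a hand-rolled stand-in with hypothetical names (`escapeCell`, `renderTable`), not `md-pen`'s API, and the separator row it emits is an assumption about typical markdown tables.

```typescript
// Hand-rolled stand-in, NOT md-pen: mirrors only the row shape asserted in the test.
function escapeCell(value: string | number): string {
  // Escape pipes and flatten newlines so user text cannot break the row.
  return String(value).replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
}

function renderTable(headers: string[], rows: Array<Array<string | number>>): string {
  const line = (cells: Array<string | number>): string =>
    `| ${cells.map(escapeCell).join(' | ')} |`;
  const separator = `| ${headers.map(() => '---').join(' | ')} |`;
  return [line(headers), separator, ...rows.map(line)].join('\n');
}

const decisionsTable = renderTable(
  ['ID', 'Content'],
  [
    [1, 'Use React'],
    [3, 'REST | GraphQL'], // user-supplied text containing a pipe
  ],
);
```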