13 changes: 8 additions & 5 deletions memory/PLAN.md
@@ -129,12 +129,15 @@
- Branch: `ln/fe-558-ui-foundation`
- **Verification approach**: inner — `npm run verify` (lint, format, type-check, all tests, build). Outer — manual visual inspection of interview workspace and project list in dev mode.

5. **Observer agent + entity persistence** — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent **post-commit** (after SQLite transaction); SSE adapter emits as typed data part on existing chat stream (in-band sync per D22). Context builders upgraded to use `md-pen` for structured entity rendering (tables, checklists) in observer context. `not-started`
5. **Observer agent + entity persistence** `FE-537` — After each answered turn, core invokes a second agent call that extracts decisions and assumptions. Writes to decision/assumption tables with turn linkage and dependency edges. Core yields `observer-complete` DomainEvent **post-commit** (after SQLite transaction); SSE adapter emits as typed data part on existing chat stream (in-band sync per D22). Context builders upgraded to use `md-pen` for structured entity rendering (tables, checklists) in observer context. Agent pattern refactored: conductTurn() is thin sequencer, each agent is async generator composed via yield* (D27). Observer uses outputFormat for structured JSON extraction (D28). ResultMessage inspection for agent metrics (D29). `done`
- Requirements: → SPEC.md §Requirements #5
- Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike), A20
- Decisions: → SPEC.md §Decisions D22 (in-band sync — observer-complete as data part), D26 (md-pen for markdown rendering)
- Acceptance: answer a scope question, observer extracts decision + assumptions, dependency edges in DB, `observer-complete` event emitted post-commit with entity IDs, extraction within user think time
- **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission post-commit, SSE adapter data-part encoding. Middle — differential oracle from spike fixtures (observer extraction vs golden master, ≥80% capture). Outer — debug mode: raw observer extraction visible per-turn in UI; fixture capture from confirmed-good manual runs. → SPEC.md §Oracle Strategy, §Observer History Projection, §Acknowledged Blind Spots (extraction variance, cumulative graph integrity)
- Assumptions: → SPEC.md §Assumptions A3, A4, A14 (validated by spike), A20, A24, A25
- Decisions: → SPEC.md §Decisions D22 (in-band sync — observer-complete as data part), D26 (md-pen), D27 (agent generator composition), D28 (outputFormat), D29 (ResultMessage metrics)
- Invariants established: → SPEC.md §Invariants I20, I21, I22
- Invariants respected: → SPEC.md §Invariants I1, I5, I6, I9, I10, I12, I13, I17, I19
- Acceptance: 147 tests pass (24 new); agent pattern refactored; observer persists entities with turn linkage and dependency edges; observer-complete emitted post-commit; SSE adapter encodes as data-observer-result; observer errors non-fatal; context uses md-pen; agent-metrics emitted
- Branch: `ln/fe-537-observer-agent`
- **Verification approach**: inner — unit tests for entity writes with dependency edges, observer-complete DomainEvent emission post-commit, SSE adapter data-part encoding, sdk translateStreamEvents parity, observer-error non-fatality, agent-metrics shape. Middle — differential oracle from spike fixtures (deferred to manual testing). Outer — debug mode and fixture capture (deferred to slice 6). → SPEC.md §Oracle Strategy

6. **Entity sidebar (read-only)** — React sidebar in interview workspace showing decisions, assumptions, requirements, and criteria on the active path. Tabbed display. TanStack Query (`useQuery`) for entity data; cache populated via `queryClient.setQueryData` from `useChat`'s `onData` callback when `observer-complete` data parts arrive (in-band sync per D22). Dependency edges visible. Stale badges for soft-invalidated entities. `not-started`
- Requirements: → SPEC.md §Requirements #6
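
The in-band sync that slices 5 and 6 rely on (an `observer-complete` event encoded as a `data-observer-result` part on the existing chat stream) could look roughly like the sketch below. The event fields (`turnId`, `decisionIds`, `assumptionIds`) and the helper `encodeObserverComplete` are assumptions for illustration; only `formatSSE`, the `data:` prefix, the double-newline delimiter, and the part name come from this change.

```typescript
// Hypothetical event shape; the real DomainEvent fields are not shown in this diff.
type ObserverComplete = {
  type: 'observer-complete';
  turnId: number;
  decisionIds: number[];
  assumptionIds: number[];
};

// One SSE frame: a `data:` line terminated by a blank line.
function formatSSE(payload: object | string): string {
  const body = typeof payload === 'string' ? payload : JSON.stringify(payload);
  return `data: ${body}\n\n`;
}

// Map the domain event to a typed data part carried on the chat stream.
function encodeObserverComplete(event: ObserverComplete): string {
  return formatSSE({
    type: 'data-observer-result',
    data: {
      turnId: event.turnId,
      decisionIds: event.decisionIds,
      assumptionIds: event.assumptionIds,
    },
  });
}
```

Keeping the payload to entity IDs is one possible design: the client can then update or refetch cached entities without the stream carrying full rows.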
21 changes: 17 additions & 4 deletions memory/SPEC.md
@@ -87,9 +87,17 @@ The architecture (layered: db → core → adapters):
| A21 | `useChat` `onData` callback reliably bridges to `queryClient.setQueryData` without stale-closure issues — known `onFinish` stale-closure bug (ai-sdk#550) may or may not affect `onData` | medium | D22 | Entity sidebar | Test in slice 6: verify `setQueryData` from `onData` updates sidebar reactively; if stale, use parallel `EventSource` instead |
| A22 | AI SDK `UIMessage.parts[]` with custom Data Parts (typed via `dataPartsSchema`) persisted as JSON on the turn table is sufficient for faithful UI resume — no separate `turn_message` table needed for current scope | **validated** | D23, D24 | Parts persistence | Validated: parts assembler converts DomainEvents to typed parts, round-trips through JSON persistence (I18). Client hydration from parts deferred to 4b (outer-loop). |
| A23 | Custom Data Parts for structured user input (option selection, confirmation) can replace scalar `turn.answer` as the primary user-response model without breaking `formatHistory()` or observer context | **validated** | D24 | Parts persistence | Validated: Data Part schemas defined with Zod (I17), context builders read scalars not parts (I19), structured user input round-trip tested. Full UI wiring deferred to 4b. |
| A24 | SDK `outputFormat` with a JSON schema produces entity extraction as reliable as MCP tool-based extraction while being structurally simpler (one API call, no tool round-trip), with schema validation built into the SDK response via the `structured_output` field on `SDKResultMessage` | high | D28 | Observer agent | Validate in slice 5: compare extraction quality with `outputFormat` vs the spike's MCP tool approach. If `outputFormat` produces malformed or lower-quality extraction, fall back to the MCP tool pattern |
| A25 | `SDKResultMessage` provides accurate `duration_ms`, `total_cost_usd`, and `usage` for per-agent observability — types confirmed in TS SDK (`SDKResultSuccess`, `SDKResultError`) | high | D29 | Observer agent | Validate in slice 5: inspect ResultMessage after query() iteration, confirm fields are populated |
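
A24 and A25 both come down to reading one result message after the observer call. A minimal sketch of that inspection follows, assuming a message shaped like the mock used in `app.test.ts` in this change. `ResultLike`, `AgentMetrics`, `isExtraction`, and `readObserverResult` are illustrative names, not the SDK's own types or the project's actual code.

```typescript
// Assumed shape, mirroring the fields named in A24/A25; not the SDK's type definitions.
interface ResultLike {
  type: 'result';
  subtype: string;
  duration_ms: number;
  duration_api_ms: number;
  total_cost_usd: number;
  usage: { input_tokens: number; output_tokens: number };
  structured_output?: unknown;
}

interface Extraction {
  decisions: unknown[];
  assumptions: unknown[];
}

// Hypothetical payload for the `agent-metrics` DomainEvent.
interface AgentMetrics {
  type: 'agent-metrics';
  agent: string;
  durationMs: number;
  durationApiMs: number;
  costUsd: number;
  inputTokens: number;
  outputTokens: number;
}

// Defensive shape check in case structured output is missing or malformed.
function isExtraction(value: unknown): value is Extraction {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return Array.isArray(v.decisions) && Array.isArray(v.assumptions);
}

function readObserverResult(msg: ResultLike): {
  extraction: Extraction | null;
  metrics: AgentMetrics;
} {
  const output = msg.structured_output;
  const extraction = msg.subtype === 'success' && isExtraction(output) ? output : null;
  return {
    extraction,
    metrics: {
      type: 'agent-metrics',
      agent: 'observer',
      durationMs: msg.duration_ms,
      durationApiMs: msg.duration_api_ms,
      costUsd: msg.total_cost_usd,
      inputTokens: msg.usage.input_tokens,
      outputTokens: msg.usage.output_tokens,
    },
  };
}
```

Returning `null` for a failed or malformed extraction keeps the caller free to treat observer problems as non-fatal while still emitting metrics.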

## Decisions

27. **Agent module pattern — generator composition** — Each agent (interviewer, observer, future phase agents) is an async generator function yielding `DomainEvent`s. `conductTurn()` is a thin sequencer composing agents via `yield*`. No wrapper around `query()` — each agent calls the SDK directly with whatever options it needs (`outputFormat`, `effort`, `mcpServers`, etc.). A shared `translateStreamEvents()` utility in `sdk.ts` maps SDK `stream_event` messages to DomainEvents; streaming agents use it, silent agents don't. File layout: `interviewer.ts` (evolves from `interview.ts`), `observer.ts` (new), `sdk.ts` (new). Research: `docs/research/claude-agent-sdk-cookbook-patterns-vs-brunch-usage.md`. Depends on: D19. Supersedes: monolithic `conductTurn()` with inline `query()` call and stream parsing.

28. **Observer uses `outputFormat` (structured JSON output)** — The observer agent returns extracted entities via SDK `outputFormat` with a Zod-derived JSON schema, not via MCP tools. The SDK validates the response and places the parsed result in `SDKResultMessage.structured_output`. This is simpler than tool-based extraction (one API call, no tool round-trip) and better suited to the observer's pure-extraction job (no side effects during the call). The interviewer retains MCP tools because `ask_question` has side effects (DB writes during the call). Depends on: A24. Supersedes: MCP tool-based observer extraction from spike.

29. **`ResultMessage` inspection for agent observability** — After each `query()` call, the agent inspects `SDKResultMessage` for `duration_ms`, `duration_api_ms`, `total_cost_usd`, and `usage`. Emitted as `agent-metrics` DomainEvent. Primary use: validate A4 (observer latency) and track cost per turn. Secondary: surface in future debug mode overlay. Depends on: A25. Supersedes: discarding `ResultMessage` (gap #1 in cookbook research).

26. **`md-pen` for programmatic markdown rendering** — Structured data (entity tables, dependency graphs, checklists) rendered to markdown via `md-pen` rather than hand-rolled string concatenation. Pure string-return functions (`table()`, `taskList()`, `mermaid()`, `heading()`, `alert()`, `details()`) compose by nesting — no AST, no intermediate representation. Escaping is context-aware per function (table cells, URLs, code fences), eliminating a class of bugs when rendering user-supplied text from interviews. Primary use cases: (1) observer context builders presenting growing entity graphs to agents (`table()` for decisions/assumptions with metadata, `taskList()` for reviewed/unreviewed items), (2) spec export rendering active-path entities into downloadable markdown (slice 13), (3) any future agent-facing or user-facing projection of structured data. Zero dependencies, ESM-only, TypeScript-first. Depends on: —. Supersedes: hand-rolled string assembly in context builders.
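
The generator composition described in D27 can be sketched as follows. This is a simplified illustration under stated assumptions: the `DomainEvent` variants, the `extract` callback (standing in for the SDK call plus the committed DB write), and the function bodies are hypothetical. Only the shape (agents as async generators, `conductTurn()` sequencing them with `yield*`, observer errors being non-fatal) comes from this change.

```typescript
// Hypothetical event union; the project's real DomainEvent type is richer.
type DomainEvent =
  | { type: 'text-delta'; text: string }
  | { type: 'observer-complete'; decisionIds: number[] }
  | { type: 'observer-error'; message: string };

// Streaming agent: yields events as the response arrives.
async function* interviewer(answer: string): AsyncGenerator<DomainEvent> {
  yield { type: 'text-delta', text: `Noted: ${answer}` };
}

// Silent agent: runs extraction, then reports once the write has completed.
async function* observer(
  extract: () => Promise<number[]>, // stands in for SDK call + DB transaction
): AsyncGenerator<DomainEvent> {
  try {
    const decisionIds = await extract();
    // Yielded only after extract() resolves, i.e. after the commit.
    yield { type: 'observer-complete', decisionIds };
  } catch (err) {
    // Observer failures are surfaced as an event rather than thrown.
    const message = err instanceof Error ? err.message : String(err);
    yield { type: 'observer-error', message };
  }
}

// Thin sequencer: no stream parsing here, just composition.
async function* conductTurn(
  answer: string,
  extract: () => Promise<number[]>,
): AsyncGenerator<DomainEvent> {
  yield* interviewer(answer);
  yield* observer(extract);
}
```

A consumer such as the SSE adapter can then iterate `conductTurn()` with `for await` and encode each event without knowing which agent produced it.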

### Domain model
@@ -158,6 +166,9 @@ The architecture (layered: db → core → adapters):
| I17 | Data Part schema validation | Slice 4a (parts persistence) | parts.test.ts (7 tests) | D24 |
| I18 | Parts round-trip fidelity | Slice 4a (parts persistence) | parts.test.ts (8 tests), core.test.ts | D23 |
| I19 | Context builder equivalence | Slice 4a (parts persistence) | context.test.ts (7 tests) | D25 |
| I20 | Entity persistence with turn linkage | Slice 5 (observer) | db.test.ts (7 tests), observer.test.ts | D4, D5 |
| I21 | Observer-complete post-commit | Slice 5 (observer) | observer.test.ts (6 tests), sse-adapter.test.ts (3 tests) | D22 |
| I22 | Agent generator composition | Slice 5 (observer) | core.test.ts, sdk.test.ts (7 tests) | D27 |

## Lexicon

@@ -323,13 +334,15 @@ This projection difference is a deliberate design choice, not an implementation

| File | Tests | Protects |
| ------------------- | ----- | --------------------------- |
| sse-adapter.test.ts | 18 | I1, I3, I7 |
| db.test.ts | 25 | I5, I6, I9, I10, I11, I18 |
| sse-adapter.test.ts | 21 | I1, I3, I7, I21 |
| db.test.ts | 32 | I5, I6, I9, I10, I11, I18, I20 |
| app.test.ts | 22 | I2, I3, I6, I7, I13, I14 |
| core.test.ts | 16 | I12, I13, I18 |
| core.test.ts | 16 | I12, I13, I18, I22 |
| interview.test.ts | 16 | I16 |
| parts.test.ts | 23 | I17, I18 |
| context.test.ts | 7 | I19 |
| context.test.ts | 8 | I19 |
| sdk.test.ts | 7 | I22 |
| observer.test.ts | 6 | I20, I21 |

## Acceptance Criteria (exit conditions)

43 changes: 30 additions & 13 deletions src/server/app.test.ts
@@ -24,6 +24,23 @@ let db: DB;

beforeEach(() => {
  mockQuery.mockReset();
  // Default: observer gets empty result for any call not covered by mockReturnValueOnce
  mockQuery.mockImplementation(() =>
    makeMockStream([
      {
        type: 'result',
        subtype: 'success',
        duration_ms: 500,
        duration_api_ms: 300,
        total_cost_usd: 0.0005,
        is_error: false,
        num_turns: 1,
        usage: { input_tokens: 100, output_tokens: 50 },
        result: '',
        structured_output: { decisions: [], assumptions: [] },
      },
    ]),
  );
  const result = createApp();
  app = result.app;
  db = result.db;
@@ -138,7 +155,7 @@ describe('GET /api/projects/:id', () => {

  it('returns turns on active path after a chat exchange', async () => {
    const projectId = await createTestProject('Chat Test');
    mockQuery.mockReturnValue(mockTextStream('Hi'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));

    await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -155,7 +172,7 @@ describe('GET /api/projects/:id', () => {
describe('POST /api/projects/:id/chat', () => {
  it('returns Content-Type text/event-stream', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream());
    mockQuery.mockReturnValueOnce(mockTextStream());

    const res = await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -167,7 +184,7 @@ describe('POST /api/projects/:id/chat', () => {

  it('produces well-formed SSE lines with data: prefix and double newline delimiters', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(
    mockQuery.mockReturnValueOnce(
      makeMockStream([
        {
          type: 'stream_event',
@@ -204,7 +221,7 @@ describe('POST /api/projects/:id/chat', () => {

  it('contains at least one text-delta event with non-empty text', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('Hello!'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hello!'));

    const res = await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -218,7 +235,7 @@ describe('POST /api/projects/:id/chat', () => {

  it('ends with finish event and [DONE]', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream());
    mockQuery.mockReturnValueOnce(mockTextStream());

    const res = await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -233,7 +250,7 @@ describe('POST /api/projects/:id/chat', () => {

  it('emits reasoning-delta events for thinking content', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(
    mockQuery.mockReturnValueOnce(
      makeMockStream([
        {
          type: 'stream_event',
@@ -312,7 +329,7 @@ describe('POST /api/projects/:id/chat', () => {
describe('POST /api/projects/:id/chat — tool calls', () => {
  it('emits tool-call SSE events for tool-using mock stream', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(
    mockQuery.mockReturnValueOnce(
      makeMockStream([
        {
          type: 'stream_event',
@@ -395,7 +412,7 @@ describe('POST /api/projects/:id/chat — tool calls', () => {
describe('GET /api/projects/:id — enriched state', () => {
  it('returns turns with options after structured question', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('Hi'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));

    await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -419,7 +436,7 @@ describe('GET /api/projects/:id — enriched state', () => {
describe('POST /api/projects/:id/turns/:turnId/select', () => {
  it('persists option selection and sets answer', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('Hi'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));

    await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -451,7 +468,7 @@ describe('POST /api/projects/:id/turns/:turnId/select', () => {

  it('returns 400 for missing position', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('Hi'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hi'));

    await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -473,7 +490,7 @@ describe('POST /api/projects/:id/turns/:turnId/select', () => {
describe('POST /api/projects/:id/chat — turn persistence', () => {
  it('creates a turn with user answer and advances HEAD', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('Hi there'));
    mockQuery.mockReturnValueOnce(mockTextStream('Hi there'));

    await request(app)
      .post(`/api/projects/${projectId}/chat`)
@@ -492,12 +509,12 @@ describe('POST /api/projects/:id/chat — turn persistence', () => {

  it('chains turns with parent pointers across exchanges', async () => {
    const projectId = await createTestProject();
    mockQuery.mockReturnValue(mockTextStream('First response'));
    mockQuery.mockReturnValueOnce(mockTextStream('First response'));
    await request(app)
      .post(`/api/projects/${projectId}/chat`)
      .send({ messages: [{ role: 'user', content: 'first' }] });

    mockQuery.mockReturnValue(mockTextStream('Second response'));
    mockQuery.mockReturnValueOnce(mockTextStream('Second response'));
    await request(app)
      .post(`/api/projects/${projectId}/chat`)
      .send({ messages: [{ role: 'user', content: 'second' }] });
3 changes: 3 additions & 0 deletions src/server/app.ts
@@ -113,6 +113,9 @@ export function createApp(dbPath?: string) {
      res.write(formatSSE({ type: 'error', errorText: message }));
    }

    // Protocol termination: finish-step + finish after all events (including observer)
    res.write(formatSSE({ type: 'finish-step' }));
    res.write(formatSSE({ type: 'finish', finishReason: 'stop' }));
    res.write(formatSSE('[DONE]'));
    res.end();
  });
34 changes: 34 additions & 0 deletions src/server/context.test.ts
@@ -184,4 +184,38 @@ describe('observer-context-projection', () => {
    // Should NOT contain the full Q&A pairs from earlier turns
    expect(result).not.toContain('Previous conversation:');
  });

  it('renders entity tables with md-pen (not hand-rolled strings)', () => {
    const turn: Turn = {
      id: 5,
      project_id: 1,
      parent_turn_id: 4,
      phase: 'scope',
      question: 'Q5',
      answer: 'A5',
      why: null,
      impact: null,
      is_resolution: false,
      user_parts: null,
      assistant_parts: null,
      created_at: '2026-01-01',
    };

    const result = buildObserverContext({
      turn,
      activePathSummary: '',
      entities: {
        decisions: [{ id: 1, content: 'Use React' }],
        assumptions: [{ id: 2, content: 'Users have browsers' }],
      },
    });

    // md-pen table() produces pipe-separated markdown tables
    expect(result).toContain('| ID | Content |');
    expect(result).toContain('| 1 | Use React |');
    expect(result).toContain('| 2 | Users have browsers |');
    // md-pen h3() produces ### headings
    expect(result).toContain('### Existing Decisions');
    expect(result).toContain('### Existing Assumptions');
  });
});
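
For readers unfamiliar with the row format these assertions expect, the stand-in renderer below produces the same pipe-separated layout. It is an illustration only: `renderTable` and `escapeCell` are hypothetical helpers, and `md-pen`'s real `table()` signature and escaping rules are not shown in this diff and may differ.

```typescript
// Stand-in for illustration; this is NOT md-pen's implementation or API.
type Cell = string | number;

// Escape pipes and flatten newlines so user-supplied text cannot break a row.
function escapeCell(value: Cell): string {
  return String(value).replace(/\|/g, '\\|').replace(/\r?\n/g, ' ');
}

function renderTable(headers: string[], rows: Cell[][]): string {
  const line = (cells: Cell[]): string => `| ${cells.map(escapeCell).join(' | ')} |`;
  const separator = `| ${headers.map(() => '---').join(' | ')} |`;
  return [line(headers), separator, ...rows.map(line)].join('\n');
}
```

Calling `renderTable(['ID', 'Content'], [[1, 'Use React']])` yields a header row, a separator row, and one data row in the layout the test checks for.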