Episodic synthesis (per-episode LLM call) + general-scope reflections #25
Adds per-episode tracking columns to episodes:

- synthesized_at: NULL until the episode has been consolidated
- synth_failed_at: set on LLM-class failures for the 300 s cooldown

Backfills synthesized_at for closed episodes whose observations are all non-active. Episodes with leftover active observations stay NULL → picked up by the next synth run under the new design.

Drops synthesis_runs — the per-episode columns supersede it.

Includes test_migration_0006.py covering schema, index, and backfill correctness; updates test_schema.py to drop the synthesis_runs assertions and add coverage for the new columns and partial index.

Service / route / MCP changes follow in subsequent commits — service tests, UI synth tests, and MCP start_episode tests are expected to fail on this commit but pass after Commit 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
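The backfill described above can be sketched as a single UPDATE. This is a minimal sketch, not the actual migration: the table layouts, the `status` values, and the choice to stamp `synthesized_at` from `closed_at` are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal schema standing in for the real migration 0006 tables.
conn.executescript("""
    CREATE TABLE episodes (
        id INTEGER PRIMARY KEY,
        closed_at TEXT,            -- NULL while the episode is still open
        synthesized_at TEXT,       -- NULL until consolidated
        synth_failed_at TEXT       -- stamped on LLM-class failure
    );
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        episode_id INTEGER REFERENCES episodes(id),
        status TEXT NOT NULL       -- assumed values: 'active' | 'consumed'
    );
""")

# Episode 1: closed, all observations consumed -> gets backfilled.
# Episode 2: closed, one active observation left -> stays NULL (pending).
conn.executescript("""
    INSERT INTO episodes (id, closed_at) VALUES (1, '2026-05-01'), (2, '2026-05-02');
    INSERT INTO observations (episode_id, status) VALUES
        (1, 'consumed'), (2, 'consumed'), (2, 'active');
""")

# Backfill: mark closed episodes with no remaining active observations.
conn.execute("""
    UPDATE episodes
       SET synthesized_at = closed_at
     WHERE closed_at IS NOT NULL
       AND synthesized_at IS NULL
       AND NOT EXISTS (
           SELECT 1 FROM observations o
            WHERE o.episode_id = episodes.id AND o.status = 'active')
""")

rows = dict(conn.execute("SELECT id, synthesized_at FROM episodes"))
print(rows)  # {1: '2026-05-01', 2: None}
```

Episode 2 stays NULL, so the new per-episode synthesizer picks it up on its next run.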
…esize_next

Replaces the watermark-driven batch synthesize() with a per-episode synthesize_next() method. Each call processes the oldest closed-but-unsynthesized episode for the project: loads its observations and tech-filtered reflections, calls the LLM once, applies new/augment/merge/ignore actions inside a per-episode SAVEPOINT, and marks episodes.synthesized_at on success.

Failure handling splits LLM-class errors (ChatError, SynthesisResponseError) from structural/DB errors:

- LLM-class: ROLLBACK, stamp synth_failed_at for the 300 s cooldown, return a SynthesisStep with failure set. The caller (UI auto-chain or MCP drain loop) continues past it; the cooldown filter on _pick_oldest_pending prevents the just-failed episode from being immediately re-picked.
- Structural: ROLLBACK and propagate; the route surfaces a 500 and stops the chain.

New types: EpisodeForPrompt, EpisodeContext, EpisodeQueueCounts, SynthesisStep. ObservationForPrompt gains a 'status' field so the LLM sees consumed observations as historical context.

Removed: synthesize, load_context, build_prompt, _should_short_circuit, _upsert_watermark, _bucketed_reflections, last_run_counts, SynthesisContext. The apply layer (_apply_new / _apply_augment / _apply_merge / _apply_ignore / _auto_ignore_unused) and parse_response are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the batch-banner response with a per-episode step banner. Each POST processes ONE episode; if step.queue.pending > 0 the response includes an htmx auto-fire div that triggers the next request after 200 ms. The chain stops naturally when pending hits zero. The route's mutex / worker-thread / WorkerTimeout / BaseException handlers (PR #18 hardening) survive unchanged — only the inner _run coroutine and the success-banner branch change.

The new template fragments/synth_step_banner.html replaces observations_synth_banner.html with 4 states:

- success-with-pending: step card + auto-fire
- success-without-pending (queue empty): synth-done card, no auto-fire
- failure-with-pending: warning card + auto-fire (chain continues)
- failure-without-pending: warning card, no auto-fire

HX-Trigger: observations-synthesized fires on every step (live drain visualization). Banner counts come from step.queue (a single-connection snapshot in the worker), not a second SQL query — eliminating the race that the original spec draft had.

UI tests rewritten to exercise the four states + chain recovery: HX-Trigger fires every step, banner counts come from the service step, failure-with-pending continues the chain via auto-fire, failure-without-pending stops it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
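The four banner states above reduce to two independent decisions. This hypothetical helper illustrates the mapping; the real template branches directly in the fragment, and the card names here are shorthand, not the template's actual identifiers.

```python
def banner_state(succeeded: bool, pending: int) -> tuple[str, bool]:
    """Map one synthesis step to (card, auto_fire) for the step banner.

    The card depends on success and queue emptiness; auto-fire depends
    only on pending work, which is why a failed step with pending
    episodes still continues the chain.
    """
    if succeeded and pending > 0:
        card = "step"          # success-with-pending
    elif succeeded:
        card = "synth-done"    # success, queue empty
    else:
        card = "warning"       # failure, with or without pending
    return card, pending > 0   # auto-fire iff more episodes remain
```

Note that auto_fire is decoupled from success: the only terminal states are the ones where pending has hit zero.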
memory.start_episode now drains pending closed episodes via a tight while-loop over synthesize_next, then returns the new episode's reflection context via retrieve_reflections(tech=...).

The cooldown stamp on synth_failed_at means persistently-failing episodes are excluded from the next _pick_oldest_pending call, so the loop terminates even when an episode reliably raises ChatError — otherwise the drain would be a stop-the-world infinite loop on the first persistent failure.

New seed_pending_episodes() helper in tests/conftest.py, used by service, UI, and MCP tests to populate a closed-pending queue without duplicating fixture SQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
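The drain loop can be sketched as follows, assuming synthesize_next returns a step object per processed episode and None once nothing is pending (or everything is on cooldown) — that sentinel convention is an assumption of this sketch, not confirmed by the source.

```python
def drain_pending(service, project: str) -> None:
    """Drain closed-but-unsynthesized episodes before starting a new one.

    Sketch of the memory.start_episode drain; `service` stands in for the
    reflection service exposing synthesize_next(). Termination is safe
    even under a persistently-failing episode: a failed step stamps
    synth_failed_at, so the cooldown filter excludes that episode from
    the next pick instead of re-selecting it forever.
    """
    while True:
        step = service.synthesize_next(project=project)
        if step is None:           # queue empty, or all remaining on cooldown
            break
        # step with failure set => LLM-class error; keep draining the rest.
```

Because failures only skip forward rather than abort, one bad episode delays its own retry by 300 s instead of blocking every session start.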
Adds scope ('project' | 'general') to observations and reflections so
cross-project workflow rules surface in every project's memory_retrieve.
Schema (migration 0007):
- observations.scope NOT NULL DEFAULT 'project' CHECK
- reflections.scope NOT NULL DEFAULT 'project' CHECK
- partial index on reflections WHERE scope='general'
- one-shot fix-up: the workflow-rule observation recorded on 2026-05-04
gets scope='general'
Write path:
- ObservationService.create accepts scope kwarg (default 'project',
ValueError on invalid)
- memory.observe MCP tool schema gains optional scope enum field
Synthesis path:
- _derive_new_reflection_scope helper queries source observations'
scopes; returns 'general' iff every source is general
- _apply_new derives the new reflection's scope from sources and
INSERTs it
- _apply_augment / _apply_merge preserve existing reflection scope
- _load_episode_context's reflection query OR-merges scope='general'
rows from any project (still tech-filtered)
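The scope-derivation rule above can be sketched as a small helper mirroring _derive_new_reflection_scope. The function name and the defensive handling of an empty source list are assumptions of this sketch.

```python
def derive_new_reflection_scope(source_scopes: list[str]) -> str:
    """'general' iff every source observation is general; else 'project'.

    A new reflection only becomes cross-project when all of its evidence
    is cross-project; a single project-scoped source keeps it local.
    Treating an empty source list as 'project' is a defensive assumption.
    """
    if source_scopes and all(s == "general" for s in source_scopes):
        return "general"
    return "project"
```

Augment and merge never call this: they preserve the existing reflection's scope, so a project-scoped reflection cannot be silently promoted.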
Retrieval path:
- retrieve_reflections WHERE clause becomes (project = ? OR scope = 'general')
- memory.retrieve MCP tool flows through unchanged
The user's per-step-confidence workflow rule (recorded today as
project-scoped to better-memory) becomes scope='general' via the
migration's UPDATE, so every project session retrieves it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
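The retrieval-path change in the message above amounts to one extra OR branch in the WHERE clause. A minimal sketch, assuming a flat reflections table with content/project/scope columns (the real query is also tech-filtered and bucketed):

```python
import sqlite3


def retrieve_reflections(conn: sqlite3.Connection, project: str) -> list[str]:
    """Project-scoped rows plus general-scope rows from any project."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT content FROM reflections WHERE project = ? OR scope = 'general'",
            (project,),
        )
    ]
```

A general-scope row recorded under any other project now surfaces in every project's retrieval, which is exactly how the migrated workflow rule reaches all sessions.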
Two small fixes from the final code review:

- Migration 0007: a comment explains the magic observation id in the backfill
- tests/services/test_reflection.py: the TestArchivedObservationGuards docstring no longer references the removed load_context method

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔴 Claude BugBot Analysis
Found 2 potential bugs in this PR.
low: 2
The PR is overall well-structured and logically sound. Two low-severity defects were found: a dict.get() null-vs-absent trap in the MCP server that causes a ValueError when scope: null is passed, and a type annotation mismatch on EpisodeForPrompt.goal that could produce a misleading prompt for background episodes with a null goal.
Two low-severity findings from BugBot review:
1. memory.observe MCP handler used `args.get("scope", "project")` which
returns None (not the default) when the client sends {"scope": null}.
That None then propagated to ObservationService.create() which raised
ValueError. Fix: `args.get("scope") or "project"` — defaults on both
absent and explicit-null.
2. EpisodeForPrompt.goal was typed as `str` but episodes.goal is a
nullable TEXT column (background episodes from session_start markers
can have NULL goal). _build_episode_prompt would silently render
" goal: None" in the LLM prompt. Fix: type as `str | None` and
render `(unspecified)` when None, matching the existing tech-None
pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
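Finding 1 is worth seeing in two lines: dict.get's default only applies when the key is absent, not when the key is present with a None value.

```python
# What an MCP client sending {"scope": null} deserializes to:
args = {"scope": None}

buggy = args.get("scope", "project")    # None — default only covers ABSENT keys
fixed = args.get("scope") or "project"  # 'project' — covers absent AND explicit null
```

The `or` form is the fix applied here; it is safe because the only falsy values in play are None and absence, but it would also coerce an empty string to the default, which is fine for an enum-valued field.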
🟢 Claude BugBot Analysis
No new bugs found. Both previously reported open issues have been resolved: the scope=None propagation bug in server.py is fixed with or "project", and the EpisodeForPrompt.goal nullable type mismatch in reflection.py is fixed with str | None annotation and an explicit None guard in _build_episode_prompt.
No bugs were detected in this PR.
* chore(pyright): exclude .worktrees from main project scope

  When working in a git worktree under .worktrees/, the main worktree's pyright was scanning the worktree's source files but resolving imports against main's modules — producing false-positive "unknown attribute" errors for symbols on the worktree's branch that don't yet exist on main.

  Adds **/.worktrees/** and **/__pycache__ to the pyright exclude list so each worktree's own 'uv run pyright' is the canonical type-check for that branch. Also confirms the IDE's pyright respects the project config.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(superpowers): semantic memories spec + plan

  Spec: docs/superpowers/specs/2026-05-04-semantic-memories-design.md
  Plan: docs/superpowers/plans/2026-05-04-semantic-memories.md

  Brainstormed and approved 2026-05-04. Three commits follow:

  - migration 0008 (semantic_memories table)
  - SemanticMemoryService
  - 4 MCP tools (memory.semantic_observe / _retrieve / _update / _delete)

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): migration 0008 — semantic_memories table

  User-stated facts/preferences. Distinct from observations (episodic, recorded as work happens) and reflections (LLM-distilled). Same scope model as PR #25's reflections — 'project' rows are per-project; 'general' rows surface in every project's retrieval.

  Schema:

  - semantic_memories(id, content, project, scope DEFAULT 'project' CHECK IN ('project','general'), created_at, updated_at)
  - idx_semantic_memories_project for the per-project read path
  - partial idx_semantic_memories_general WHERE scope='general'

  Service + MCP tools follow in subsequent commits.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(semantic): SemanticMemoryService for user-stated facts/preferences

  Service for managing semantic memories — user-asserted current truths, distinct from observations (episodic) and reflections (LLM-distilled).
  API:

  - create(*, content, project, scope='project') -> id
  - update_text(*, id, content) — bumps updated_at; raises if id absent
  - delete(*, id) — idempotent (no error on missing id)
  - list_for_project(*, project) -> list[SemanticMemory] — returns project rows + general-scope rows from any project, ordered newest-first by created_at

  Validation:

  - scope must be 'project' or 'general' (ValueError before the DB hit; the CHECK constraint is the backstop)
  - content must not be empty/whitespace (ValueError)

  MCP wiring follows in the next commit.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): memory.semantic_observe / _retrieve / _update / _delete tools

  Wire SemanticMemoryService into the MCP server with four new tools:

  - memory.semantic_observe(content, scope='project') -> {id} — records a user-stated fact/preference; scope='general' surfaces it in every project's startup retrieval
  - memory.semantic_retrieve(project?) -> [memories...] — returns project rows + general-scope rows from any project, ordered newest-first; a flat list — they're facts, not lessons
  - memory.semantic_update(id, content) -> {ok} — edits content in place; bumps updated_at; raises if id absent
  - memory.semantic_delete(id) -> {ok} — idempotent; no error if id absent

  Both write tools accepting scope use `args.get("scope") or "project"` to defend against {"scope": null} from MCP clients (PR #25 BugBot finding).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot PR#34): rollback implicit BEGIN before raising in update_text

  BugBot finding (medium): SemanticMemoryService.update_text raised ValueError after the UPDATE found no rows, without first calling self._conn.rollback(). Python's sqlite3 with the default isolation_level opens an implicit BEGIN before any DML, so the UPDATE held the WAL write lock until the next commit() — blocking other writers for up to busy_timeout (5 s).
  Mirrors the existing ObservationService.set_outcome pattern at better_memory/services/observation.py:435. Adds a regression test that asserts conn.in_transaction is False after the failed update.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
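The rollback fix can be sketched against an in-memory database. This is a standalone-function sketch of the pattern, not the service's actual method; the SQL and table columns are simplified assumptions.

```python
import sqlite3


def update_text(conn: sqlite3.Connection, memory_id: int, content: str) -> None:
    """Update a semantic memory's content; roll back before raising.

    With sqlite3's default isolation_level, the UPDATE opens an implicit
    BEGIN and takes the write lock even when it matches zero rows.
    Raising without rollback() would hold that lock until the next
    commit(), blocking other writers for up to busy_timeout.
    """
    cur = conn.execute(
        "UPDATE semantic_memories SET content = ?, updated_at = datetime('now') WHERE id = ?",
        (content, memory_id),
    )
    if cur.rowcount == 0:
        conn.rollback()  # release the write lock before surfacing the error
        raise ValueError(f"no semantic memory with id {memory_id}")
    conn.commit()
```

The regression test mentioned above boils down to: after a failed update, conn.in_transaction must be False.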
Summary
Replaces batch synthesis with `synthesize_next` (one closed episode = one focused LLM call). Drops `synthesis_runs`; adds `episodes.synthesized_at` + `episodes.synth_failed_at` (300 s cooldown). The UI route is auto-chained via an htmx self-firing fragment until `queue.pending == 0`. MCP `start_episode` drains pending episodes then returns tech-filtered reflection buckets.

Adds `scope` ('project' | 'general') to observations + reflections so workflow rules surface in every project's `memory_retrieve`. Synthesis derives a new reflection's scope from its sources (all-general → general); augment/merge preserve it. A one-shot fix-up flips the workflow-rule observation recorded on 2026-05-04 to `scope='general'`.

Why
The current batch synth produces a ~14 K-token prompt for 67 closed-episode observations on this DB.
llama3:8B's 8 K context silently truncates it; combined with `format=json` constrained decoding, generations spend minutes thrashing without converging. Per-episode prompts are ~1-2 KB and fit any model. The 5-minute cooldown on per-episode failures means a single bad episode can no longer infinite-loop the MCP drain or spam the auto-fire chain.

Spec: `docs/superpowers/specs/2026-05-03-episodic-synthesis-design.md`
Plan: `docs/superpowers/plans/2026-05-04-episodic-synthesis.md`
Test Plan
Commits
🤖 Generated with Claude Code