
Episodic synthesis (per-episode LLM call) + general-scope reflections #25

Merged: emp3thy merged 7 commits into main from episodic-synthesis on May 4, 2026
Conversation

@emp3thy emp3thy commented May 4, 2026

Summary

  • Replaces watermark-driven batch synthesis with per-episode synthesize_next (one closed episode = one focused LLM call). Drops synthesis_runs; adds episodes.synthesized_at + episodes.synth_failed_at (300 s cooldown). UI route is auto-chained via htmx self-firing fragment until queue.pending == 0. MCP start_episode drains pending then returns tech-filtered reflection buckets.
  • Adds cross-project scope ('project' | 'general') to observations + reflections so workflow rules surface in every project's memory_retrieve. Synthesis derives a new reflection's scope from sources (all-general → general); augment/merge preserve. One-shot fix-up flips the workflow-rule observation recorded on 2026-05-04 to scope='general'.

Why

The current batch synthesis produces a ~14 K-token prompt for the 67 closed-episode observations on this DB. llama3:8B's 8 K context silently truncates it; combined with format=json constrained decoding, generations spend minutes thrashing without converging. Per-episode prompts are ~1-2 KB and fit any model. The 5-minute cooldown on per-episode failures means a single bad episode can no longer infinite-loop the MCP drain or spam the auto-fire chain.

Spec: `docs/superpowers/specs/2026-05-03-episodic-synthesis-design.md`
Plan: `docs/superpowers/plans/2026-05-04-episodic-synthesis.md`

Test Plan

  • All 6 commits pass tests individually
  • `uv run pytest -q` end-to-end: 629 passed, 22 skipped (was 597 / 22 on main; +32 new tests)
  • `uv run pyright` clean (0 errors)
  • No `synthesis_runs` references remain outside historical migration files
  • No bare `.synthesize(` calls (all callers use `.synthesize_next(`)
  • Smoke run on local DB: drain all pending episodes, verify reflections land, confirm `synthesized_at` is set per-episode

Commits

  • `25fc146` feat(db): migration 0006 — episodes.synthesized_at + drop synthesis_runs
  • `edd94a8` refactor(reflection): replace batch synthesize with per-episode synthesize_next
  • `464a52e` refactor(ui): /observations/synthesize is one-step + auto-chain banner
  • `f6afcde` refactor(mcp): start_episode drains pending then fetches buckets
  • `7ddfaf9` feat(scope): general-scope reflections surface across projects
  • `65bd413` chore(review): address minor review findings

🤖 Generated with Claude Code

emp3thy and others added 6 commits May 4, 2026 10:56
Adds per-episode tracking columns to episodes:
- synthesized_at: NULL until the episode has been consolidated
- synth_failed_at: set on LLM-class failures for 300s cooldown

Backfills synthesized_at for closed episodes whose observations are all
non-active. Episodes with leftover active observations stay NULL → picked
up by the next synth run under the new design.

Drops synthesis_runs — the per-episode columns supersede it.

Includes test_migration_0006.py covering schema, index, and backfill
correctness; updates test_schema.py to drop synthesis_runs assertions
and add coverage for the new columns and partial index.

Service / route / MCP changes follow in subsequent commits — service
tests, UI synth tests, and MCP start_episode tests are expected to fail
on this commit but pass after Commit 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
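A migration of this shape could look like the following sketch. The index name, `closed_at` column, `observations.status` values, and the choice of `closed_at` as the backfill value are assumptions for illustration; only `synthesized_at`, `synth_failed_at`, and the dropped `synthesis_runs` table come from the commit message:

```python
import sqlite3

# Hypothetical rendition of migration 0006; names other than
# synthesized_at / synth_failed_at / synthesis_runs are assumed.
MIGRATION_0006 = """
ALTER TABLE episodes ADD COLUMN synthesized_at TEXT;
ALTER TABLE episodes ADD COLUMN synth_failed_at TEXT;
CREATE INDEX idx_episodes_pending ON episodes (id)
    WHERE closed_at IS NOT NULL AND synthesized_at IS NULL;
-- Backfill: closed episodes whose observations are all non-active
UPDATE episodes
   SET synthesized_at = closed_at
 WHERE closed_at IS NOT NULL
   AND NOT EXISTS (SELECT 1 FROM observations o
                    WHERE o.episode_id = episodes.id
                      AND o.status = 'active');
DROP TABLE IF EXISTS synthesis_runs;
"""

def apply_migration_0006(conn: sqlite3.Connection) -> None:
    conn.executescript(MIGRATION_0006)
```

The partial index keeps the "oldest pending" scan cheap: only closed-but-unsynthesized rows are indexed.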
…esize_next

Replaces the watermark-driven batch synthesize() with a per-episode
synthesize_next() method. Each call processes the oldest closed-but-
unsynthesized episode for the project: loads its observations and
tech-filtered reflections, calls the LLM once, applies new/augment/
merge/ignore actions inside a per-episode SAVEPOINT, marks
episodes.synthesized_at on success.

Failure handling splits LLM-class errors (ChatError, SynthesisResponseError)
from structural/DB errors:
- LLM-class: ROLLBACK, stamp synth_failed_at for 300s cooldown, return
  SynthesisStep with failure set. Caller (UI auto-chain or MCP drain
  loop) continues past it; cooldown filter on _pick_oldest_pending
  prevents the just-failed episode from being immediately re-picked.
- Structural: ROLLBACK and propagate; route surfaces 500 and stops chain.

New types: EpisodeForPrompt, EpisodeContext, EpisodeQueueCounts,
SynthesisStep. ObservationForPrompt gains a 'status' field so the LLM
sees consumed observations as historical context.

Removed: synthesize, load_context, build_prompt, _should_short_circuit,
_upsert_watermark, _bucketed_reflections, last_run_counts,
SynthesisContext. Apply layer (_apply_new/_apply_augment/_apply_merge/
_apply_ignore/_auto_ignore_unused) and parse_response are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
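The control flow above can be sketched as follows. This is not the real implementation in better_memory/services/reflection.py: it assumes an autocommit connection, epoch-float timestamps, and hypothetical `pick_oldest_pending` / `run_llm_and_apply` callables standing in for the picker and apply layer:

```python
import sqlite3
import time

# Stand-ins for the real exception classes named in the commit message.
class ChatError(Exception): ...
class SynthesisResponseError(Exception): ...

def synthesize_next(conn: sqlite3.Connection, pick_oldest_pending, run_llm_and_apply):
    """One closed episode = one LLM call, wrapped in a per-episode SAVEPOINT."""
    episode_id = pick_oldest_pending(conn)
    if episode_id is None:
        return None  # queue drained
    conn.execute("SAVEPOINT episode_synth")
    try:
        run_llm_and_apply(conn, episode_id)  # new/augment/merge/ignore actions
        conn.execute("UPDATE episodes SET synthesized_at = ? WHERE id = ?",
                     (time.time(), episode_id))
        conn.execute("RELEASE episode_synth")
        return ("ok", episode_id)
    except (ChatError, SynthesisResponseError):
        # LLM-class failure: roll back this episode's work, stamp the
        # 300 s cooldown, and let the caller continue past it.
        conn.execute("ROLLBACK TO episode_synth")
        conn.execute("RELEASE episode_synth")
        conn.execute("UPDATE episodes SET synth_failed_at = ? WHERE id = ?",
                     (time.time(), episode_id))
        return ("failed", episode_id)
    except Exception:
        # Structural/DB error: roll back and propagate (route surfaces 500).
        conn.execute("ROLLBACK TO episode_synth")
        conn.execute("RELEASE episode_synth")
        raise
```

The key distinction is that both branches roll back the SAVEPOINT, but only the LLM-class branch swallows the error and stamps the cooldown.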
Replaces the batch-banner response with a per-episode step banner.
Each POST processes ONE episode; if step.queue.pending > 0 the response
includes an htmx auto-fire div that triggers the next request after
200ms. Chain stops naturally when pending hits zero.

The route's mutex / worker-thread / WorkerTimeout / BaseException
handlers (PR #18 hardening) survive unchanged — only the inner _run
coroutine and the success-banner branch change.

New template fragments/synth_step_banner.html replaces
observations_synth_banner.html with 4 states:
- success-with-pending: step card + auto-fire
- success-without-pending (queue empty): synth-done card, no auto-fire
- failure-with-pending: warning card + auto-fire (chain continues)
- failure-without-pending: warning card, no auto-fire

HX-Trigger: observations-synthesized fires on every step (live drain
visualization). Banner counts come from step.queue (single-connection
snapshot in the worker), not a second SQL query — eliminates the race
that the original spec draft had.

UI tests rewritten to exercise the four states + chain recovery:
HX-Trigger fires every step, banner counts come from the service step,
failure-with-pending continues the chain via auto-fire, failure-without-
pending stops it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
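The four banner states reduce to a two-axis decision (did the step fail? is anything still pending?). A minimal sketch of that mapping, with illustrative state names rather than the template's actual fragment ids:

```python
def banner_state(step_failed: bool, pending: int) -> tuple[str, bool]:
    """Map a synthesis step result onto (banner, auto_fire).
    State names here are illustrative, not the real template ids."""
    if step_failed:
        # Failure cards auto-fire only while the queue is non-empty,
        # so the chain continues past a bad episode but still stops.
        return ("failure", pending > 0)
    if pending > 0:
        return ("success", True)   # step card + auto-fire div
    return ("synth-done", False)   # queue empty: terminal card
```

Note that auto-fire depends solely on `pending`, which is why a failing episode cannot stall the drain: the chain keeps firing until the cooldown-filtered queue is empty.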
memory.start_episode now drains pending closed episodes via a tight
while-loop over synthesize_next, then returns the new episode's
reflection context via retrieve_reflections(tech=...).

The cooldown stamp on synth_failed_at means persistently-failing
episodes are excluded from the next _pick_oldest_pending call, so the
loop terminates even when an episode reliably raises ChatError —
otherwise the drain would be a stop-the-world infinite loop on the
first persistent failure.

New seed_pending_episodes() helper in tests/conftest.py used by service,
UI, and MCP tests to populate a closed-pending queue without
duplicating fixture SQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
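The drain loop itself is simple because termination is delegated to the picker: once every remaining episode is either synthesized or inside its cooldown window, `synthesize_next` returns `None`. A sketch (the `max_steps` cap is an extra illustrative guard, not something the commit message claims exists):

```python
def drain_pending(synthesize_next, max_steps: int = 1000):
    """Drain the closed-pending queue, collecting per-step results.
    Relies on synthesize_next returning None when no eligible episode
    remains -- including episodes excluded by the failure cooldown."""
    steps = []
    for _ in range(max_steps):  # belt-and-braces bound for illustration
        step = synthesize_next()
        if step is None:
            break
        steps.append(step)
    return steps
```

With the cooldown in place, a persistently failing episode contributes one `("failed", id)` step and is then filtered out, so the loop ends instead of spinning.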
Adds scope ('project' | 'general') to observations and reflections so
cross-project workflow rules surface in every project's memory_retrieve.

Schema (migration 0007):
- observations.scope NOT NULL DEFAULT 'project' CHECK
- reflections.scope NOT NULL DEFAULT 'project' CHECK
- partial index on reflections WHERE scope='general'
- one-shot fix-up: the workflow-rule observation recorded on 2026-05-04
  gets scope='general'

Write path:
- ObservationService.create accepts scope kwarg (default 'project',
  ValueError on invalid)
- memory.observe MCP tool schema gains optional scope enum field

Synthesis path:
- _derive_new_reflection_scope helper queries source observations'
  scopes; returns 'general' iff every source is general
- _apply_new derives the new reflection's scope from sources and
  INSERTs it
- _apply_augment / _apply_merge preserve existing reflection scope
- _load_episode_context's reflection query OR-merges scope='general'
  rows from any project (still tech-filtered)

Retrieval path:
- retrieve_reflections WHERE clause becomes (project = ? OR scope = 'general')
- memory.retrieve MCP tool flows through unchanged

The user's per-step-confidence workflow rule (recorded today as
project-scoped to better-memory) becomes scope='general' via the
migration's UPDATE, so every project session retrieves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
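The scope-derivation rule ('general' iff every source observation is general) can be sketched as a pure function. This mirrors the described `_derive_new_reflection_scope` behaviour but is not its actual code; the empty-sources default to 'project' is an assumption:

```python
def derive_new_reflection_scope(source_scopes) -> str:
    """Return 'general' iff every source observation is general-scoped.
    A mixed or all-project set yields 'project'; an empty set defaults
    to 'project' (assumed conservative behaviour)."""
    scopes = list(source_scopes)
    if scopes and all(s == "general" for s in scopes):
        return "general"
    return "project"
```

The all-or-nothing rule is conservative on purpose: one project-specific source is enough to keep the distilled reflection from leaking into every other project's retrieval.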
Two small fixes from the final code review:
- Migration 0007: comment explains the magic observation id in the backfill
- tests/services/test_reflection.py: TestArchivedObservationGuards docstring
  no longer references the removed load_context method

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions bot left a comment


🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

low: 2

The PR is overall well-structured and logically sound. Two low-severity defects were found: a dict.get() null-vs-absent trap in the MCP server that causes a ValueError when scope: null is passed, and a type annotation mismatch on EpisodeForPrompt.goal that could produce a misleading prompt for background episodes with a null goal.

Comment threads: better_memory/mcp/server.py (outdated), better_memory/services/reflection.py
Two low-severity findings from BugBot review:

1. memory.observe MCP handler used `args.get("scope", "project")` which
   returns None (not the default) when the client sends {"scope": null}.
   That None then propagated to ObservationService.create() which raised
   ValueError. Fix: `args.get("scope") or "project"` — defaults on both
   absent and explicit-null.

2. EpisodeForPrompt.goal was typed as `str` but episodes.goal is a
   nullable TEXT column (background episodes from session_start markers
   can have NULL goal). _build_episode_prompt would silently render
   "  goal:    None" in the LLM prompt. Fix: type as `str | None` and
   render `(unspecified)` when None, matching the existing tech-None
   pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions bot left a comment


🟢 Claude BugBot Analysis

No new bugs found. Both previously reported open issues have been resolved: the scope=None propagation bug in server.py is fixed with or "project", and the EpisodeForPrompt.goal nullable type mismatch in reflection.py is fixed with str | None annotation and an explicit None guard in _build_episode_prompt.

No bugs were detected in this PR.

@emp3thy emp3thy merged commit 5872387 into main May 4, 2026
3 checks passed
@emp3thy emp3thy deleted the episodic-synthesis branch May 4, 2026 11:58
emp3thy added a commit that referenced this pull request May 4, 2026
* chore(pyright): exclude .worktrees from main project scope

When working in a git worktree under .worktrees/, the main worktree's
pyright was scanning the worktree's source files but resolving imports
against main's modules — producing false-positive "unknown attribute"
errors for symbols on the worktree's branch that don't yet exist on main.

Adds **/.worktrees/** and **/__pycache__ to pyright exclude so each
worktree's own 'uv run pyright' is the canonical type-check for that
branch. Also confirms the IDE's pyright respects the project config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(superpowers): semantic memories spec + plan

Spec: docs/superpowers/specs/2026-05-04-semantic-memories-design.md
Plan: docs/superpowers/plans/2026-05-04-semantic-memories.md

Brainstormed and approved 2026-05-04. Three commits follow:
- migration 0008 (semantic_memories table)
- SemanticMemoryService
- 4 MCP tools (memory.semantic_observe / _retrieve / _update / _delete)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): migration 0008 — semantic_memories table

User-stated facts/preferences. Distinct from observations (episodic,
recorded as work happens) and reflections (LLM-distilled). Same scope
model as PR #25's reflections — 'project' rows are per-project;
'general' rows surface in every project's retrieval.

Schema:
- semantic_memories(id, content, project, scope DEFAULT 'project'
  CHECK IN ('project','general'), created_at, updated_at)
- idx_semantic_memories_project for the per-project read path
- partial idx_semantic_memories_general WHERE scope='general'

Service + MCP tools follow in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
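The schema described above could be rendered roughly as follows. Column types, the `datetime('now')` defaults, and the partial index's indexed column are assumptions; the table name, columns, scope CHECK, and the two index names come from the commit message:

```python
import sqlite3

# Hypothetical rendition of migration 0008.
SCHEMA_0008 = """
CREATE TABLE semantic_memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    project    TEXT NOT NULL,
    scope      TEXT NOT NULL DEFAULT 'project'
               CHECK (scope IN ('project', 'general')),
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_semantic_memories_project ON semantic_memories (project);
CREATE INDEX idx_semantic_memories_general ON semantic_memories (id)
    WHERE scope = 'general';
"""
```

The CHECK constraint is the backstop behind the service-level scope validation: even a caller that bypasses the service cannot persist an invalid scope.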

* feat(semantic): SemanticMemoryService for user-stated facts/preferences

Service for managing semantic memories — user-asserted current truths,
distinct from observations (episodic) and reflections (LLM-distilled).

API:
- create(*, content, project, scope='project') -> id
- update_text(*, id, content) — bumps updated_at; raises if id absent
- delete(*, id) — idempotent (no error on missing id)
- list_for_project(*, project) -> list[SemanticMemory]
  Returns project rows + general-scope rows from any project,
  ordered newest-first by created_at.

Validation:
- scope must be 'project' or 'general' (ValueError before DB hit;
  CHECK constraint is the backstop)
- content must not be empty/whitespace (ValueError)

MCP wiring follows in next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
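The `list_for_project` read path described above (project rows plus general-scope rows from any project, newest-first) reduces to a single OR-merged query. A sketch, assuming the column names from the schema commit and adding an `id DESC` tiebreak that is my own assumption, not stated in the commit:

```python
import sqlite3

def list_for_project(conn: sqlite3.Connection, project: str):
    """Project rows + general-scope rows from any project, newest-first.
    Sketch of the described read path, not the real service method."""
    return conn.execute(
        "SELECT id, content, project, scope FROM semantic_memories "
        "WHERE project = ? OR scope = 'general' "
        "ORDER BY created_at DESC, id DESC",
        (project,),
    ).fetchall()
```

Note the flat union: a general-scope fact recorded in project beta is returned verbatim to a caller asking about project alpha, which is exactly the cross-project surfacing the scope model is for.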

* feat(mcp): memory.semantic_observe / _retrieve / _update / _delete tools

Wire SemanticMemoryService into the MCP server with four new tools:

- memory.semantic_observe(content, scope='project') -> {id}
  Records a user-stated fact/preference. scope='general' surfaces it
  in every project's startup retrieval.

- memory.semantic_retrieve(project?) -> [memories...]
  Returns project rows + general-scope rows from any project,
  ordered newest-first. Flat list — they're facts, not lessons.

- memory.semantic_update(id, content) -> {ok}
  Edits content in place; bumps updated_at. Raises if id absent.

- memory.semantic_delete(id) -> {ok}
  Idempotent — no error if id absent.

Both write tools accepting scope use `args.get("scope") or "project"`
to defend against {"scope": null} from MCP clients (PR #25 BugBot
finding).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot PR#34): rollback implicit BEGIN before raising in update_text

BugBot finding (medium): SemanticMemoryService.update_text raised
ValueError after the UPDATE found no rows, without first calling
self._conn.rollback(). Python's sqlite3 with default isolation_level
opens an implicit BEGIN before any DML, so the UPDATE held the WAL
write lock until the next commit() — blocking other writers for up
to busy_timeout (5 s).

Mirrors the existing ObservationService.set_outcome pattern at
better_memory/services/observation.py:435.

Adds a regression test that asserts conn.in_transaction is False
after the failed update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
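The fix above hinges on a sqlite3 subtlety: with the default `isolation_level`, the module opens an implicit BEGIN before any DML, even an UPDATE that matches zero rows, so raising without `rollback()` leaves the write lock held. A minimal sketch of the corrected pattern (SQL and error message are illustrative, not the real method body):

```python
import sqlite3

def update_text(conn: sqlite3.Connection, memory_id: int, content: str) -> None:
    """Edit a semantic memory in place; raise if the id is absent.
    Sketch of the fixed pattern: roll back the implicit BEGIN before raising."""
    cur = conn.execute(
        "UPDATE semantic_memories "
        "SET content = ?, updated_at = datetime('now') WHERE id = ?",
        (content, memory_id),
    )
    if cur.rowcount == 0:
        # The UPDATE opened an implicit transaction even though it matched
        # nothing; release the write lock before surfacing the error.
        conn.rollback()
        raise ValueError(f"no semantic memory with id={memory_id}")
    conn.commit()
```

Without the `rollback()`, the connection would hold the WAL write lock until its next commit, blocking other writers for up to the configured busy_timeout.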

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
