
Episodic synthesis (per-episode LLM call) + general-scope reflections #25

Merged: emp3thy merged 7 commits into main from episodic-synthesis on May 4, 2026
Conversation

@emp3thy emp3thy commented May 4, 2026

Summary

  • Replaces watermark-driven batch synthesis with per-episode synthesize_next (one closed episode = one focused LLM call). Drops synthesis_runs; adds episodes.synthesized_at + episodes.synth_failed_at (300 s cooldown). UI route is auto-chained via htmx self-firing fragment until queue.pending == 0. MCP start_episode drains pending then returns tech-filtered reflection buckets.
  • Adds cross-project scope ('project' | 'general') to observations + reflections so workflow rules surface in every project's memory_retrieve. Synthesis derives a new reflection's scope from sources (all-general → general); augment/merge preserve. One-shot fix-up flips the workflow-rule observation recorded on 2026-05-04 to scope='general'.

Why

The current batch synthesis produces a ~14 K-token prompt for the 67 closed-episode observations on this DB. llama3:8B's 8 K context silently truncates it; combined with format=json constrained decoding, generations spend minutes thrashing without converging. Per-episode prompts are ~1-2 KB and fit any model. The 5-minute cooldown on per-episode failures means a single bad episode can no longer infinite-loop the MCP drain or spam the auto-fire chain.

Spec: `docs/superpowers/specs/2026-05-03-episodic-synthesis-design.md`
Plan: `docs/superpowers/plans/2026-05-04-episodic-synthesis.md`

Test Plan

  • All 6 commits pass tests individually
  • `uv run pytest -q` end-to-end: 629 passed, 22 skipped (was 597 / 22 on main; +32 new tests)
  • `uv run pyright` clean (0 errors)
  • No `synthesis_runs` references remain outside historical migration files
  • No bare `.synthesize(` calls (all callers use `.synthesize_next(`)
  • Smoke run on local DB: drain all pending episodes, verify reflections land, confirm `synthesized_at` is set per-episode

Commits

  • `25fc146` feat(db): migration 0006 — episodes.synthesized_at + drop synthesis_runs
  • `edd94a8` refactor(reflection): replace batch synthesize with per-episode synthesize_next
  • `464a52e` refactor(ui): /observations/synthesize is one-step + auto-chain banner
  • `f6afcde` refactor(mcp): start_episode drains pending then fetches buckets
  • `7ddfaf9` feat(scope): general-scope reflections surface across projects
  • `65bd413` chore(review): address minor review findings

🤖 Generated with Claude Code

emp3thy and others added 6 commits May 4, 2026 10:56
Adds per-episode tracking columns to episodes:
- synthesized_at: NULL until the episode has been consolidated
- synth_failed_at: set on LLM-class failures for 300s cooldown

Backfills synthesized_at for closed episodes whose observations are all
non-active. Episodes with leftover active observations stay NULL → picked
up by the next synth run under the new design.

Drops synthesis_runs — the per-episode columns supersede it.

Includes test_migration_0006.py covering schema, index, and backfill
correctness; updates test_schema.py to drop synthesis_runs assertions
and add coverage for the new columns and partial index.

Service / route / MCP changes follow in subsequent commits — service
tests, UI synth tests, and MCP start_episode tests are expected to fail
on this commit but pass after Commit 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
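A migration of this shape could look like the following sketch. The index name, `closed_at` column, `observations.status` values, and the choice of `closed_at` as the backfill value are assumptions for illustration; only `synthesized_at`, `synth_failed_at`, and the dropped `synthesis_runs` table come from the commit message:

```python
import sqlite3

# Hypothetical rendition of migration 0006; names other than
# synthesized_at / synth_failed_at / synthesis_runs are assumed.
MIGRATION_0006 = """
ALTER TABLE episodes ADD COLUMN synthesized_at TEXT;
ALTER TABLE episodes ADD COLUMN synth_failed_at TEXT;
CREATE INDEX idx_episodes_pending ON episodes (id)
    WHERE closed_at IS NOT NULL AND synthesized_at IS NULL;
-- Backfill: closed episodes whose observations are all non-active
UPDATE episodes
   SET synthesized_at = closed_at
 WHERE closed_at IS NOT NULL
   AND NOT EXISTS (SELECT 1 FROM observations o
                    WHERE o.episode_id = episodes.id
                      AND o.status = 'active');
DROP TABLE IF EXISTS synthesis_runs;
"""

def apply_migration_0006(conn: sqlite3.Connection) -> None:
    conn.executescript(MIGRATION_0006)
```

The partial index keeps the "oldest pending" scan cheap: only closed-but-unsynthesized rows are indexed.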
…esize_next

Replaces the watermark-driven batch synthesize() with a per-episode
synthesize_next() method. Each call processes the oldest closed-but-
unsynthesized episode for the project: loads its observations and
tech-filtered reflections, calls the LLM once, applies new/augment/
merge/ignore actions inside a per-episode SAVEPOINT, marks
episodes.synthesized_at on success.

Failure handling splits LLM-class errors (ChatError, SynthesisResponseError)
from structural/DB errors:
- LLM-class: ROLLBACK, stamp synth_failed_at for 300s cooldown, return
  SynthesisStep with failure set. Caller (UI auto-chain or MCP drain
  loop) continues past it; cooldown filter on _pick_oldest_pending
  prevents the just-failed episode from being immediately re-picked.
- Structural: ROLLBACK and propagate; route surfaces 500 and stops chain.

New types: EpisodeForPrompt, EpisodeContext, EpisodeQueueCounts,
SynthesisStep. ObservationForPrompt gains a 'status' field so the LLM
sees consumed observations as historical context.

Removed: synthesize, load_context, build_prompt, _should_short_circuit,
_upsert_watermark, _bucketed_reflections, last_run_counts,
SynthesisContext. Apply layer (_apply_new/_apply_augment/_apply_merge/
_apply_ignore/_auto_ignore_unused) and parse_response are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
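The control flow above can be sketched as follows. This is not the real implementation in better_memory/services/reflection.py: it assumes an autocommit connection, epoch-float timestamps, and hypothetical `pick_oldest_pending` / `run_llm_and_apply` callables standing in for the picker and apply layer:

```python
import sqlite3
import time

# Stand-ins for the real exception classes named in the commit message.
class ChatError(Exception): ...
class SynthesisResponseError(Exception): ...

def synthesize_next(conn: sqlite3.Connection, pick_oldest_pending, run_llm_and_apply):
    """One closed episode = one LLM call, wrapped in a per-episode SAVEPOINT."""
    episode_id = pick_oldest_pending(conn)
    if episode_id is None:
        return None  # queue drained
    conn.execute("SAVEPOINT episode_synth")
    try:
        run_llm_and_apply(conn, episode_id)  # new/augment/merge/ignore actions
        conn.execute("UPDATE episodes SET synthesized_at = ? WHERE id = ?",
                     (time.time(), episode_id))
        conn.execute("RELEASE episode_synth")
        return ("ok", episode_id)
    except (ChatError, SynthesisResponseError):
        # LLM-class failure: roll back this episode's work, stamp the
        # 300 s cooldown, and let the caller continue past it.
        conn.execute("ROLLBACK TO episode_synth")
        conn.execute("RELEASE episode_synth")
        conn.execute("UPDATE episodes SET synth_failed_at = ? WHERE id = ?",
                     (time.time(), episode_id))
        return ("failed", episode_id)
    except Exception:
        # Structural/DB error: roll back and propagate (route surfaces 500).
        conn.execute("ROLLBACK TO episode_synth")
        conn.execute("RELEASE episode_synth")
        raise
```

The key distinction is that both branches roll back the SAVEPOINT, but only the LLM-class branch swallows the error and stamps the cooldown.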
Replaces the batch-banner response with a per-episode step banner.
Each POST processes ONE episode; if step.queue.pending > 0 the response
includes an htmx auto-fire div that triggers the next request after
200ms. Chain stops naturally when pending hits zero.

The route's mutex / worker-thread / WorkerTimeout / BaseException
handlers (PR #18 hardening) survive unchanged — only the inner _run
coroutine and the success-banner branch change.

New template fragments/synth_step_banner.html replaces
observations_synth_banner.html with 4 states:
- success-with-pending: step card + auto-fire
- success-without-pending (queue empty): synth-done card, no auto-fire
- failure-with-pending: warning card + auto-fire (chain continues)
- failure-without-pending: warning card, no auto-fire

HX-Trigger: observations-synthesized fires on every step (live drain
visualization). Banner counts come from step.queue (single-connection
snapshot in the worker), not a second SQL query — eliminates the race
that the original spec draft had.

UI tests rewritten to exercise the four states + chain recovery:
HX-Trigger fires every step, banner counts come from the service step,
failure-with-pending continues the chain via auto-fire, failure-without-
pending stops it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
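The four banner states reduce to a two-axis decision (did the step fail? is anything still pending?). A minimal sketch of that mapping, with illustrative state names rather than the template's actual fragment ids:

```python
def banner_state(step_failed: bool, pending: int) -> tuple[str, bool]:
    """Map a synthesis step result onto (banner, auto_fire).
    State names here are illustrative, not the real template ids."""
    if step_failed:
        # Failure cards auto-fire only while the queue is non-empty,
        # so the chain continues past a bad episode but still stops.
        return ("failure", pending > 0)
    if pending > 0:
        return ("success", True)   # step card + auto-fire div
    return ("synth-done", False)   # queue empty: terminal card
```

Note that auto-fire depends solely on `pending`, which is why a failing episode cannot stall the drain: the chain keeps firing until the cooldown-filtered queue is empty.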
memory.start_episode now drains pending closed episodes via a tight
while-loop over synthesize_next, then returns the new episode's
reflection context via retrieve_reflections(tech=...).

The cooldown stamp on synth_failed_at means persistently-failing
episodes are excluded from the next _pick_oldest_pending call, so the
loop terminates even when an episode reliably raises ChatError —
otherwise the drain would be a stop-the-world infinite loop on the
first persistent failure.

New seed_pending_episodes() helper in tests/conftest.py used by service,
UI, and MCP tests to populate a closed-pending queue without
duplicating fixture SQL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
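The drain loop itself is simple because termination is delegated to the picker: once every remaining episode is either synthesized or inside its cooldown window, `synthesize_next` returns `None`. A sketch (the `max_steps` cap is an extra illustrative guard, not something the commit message claims exists):

```python
def drain_pending(synthesize_next, max_steps: int = 1000):
    """Drain the closed-pending queue, collecting per-step results.
    Relies on synthesize_next returning None when no eligible episode
    remains -- including episodes excluded by the failure cooldown."""
    steps = []
    for _ in range(max_steps):  # belt-and-braces bound for illustration
        step = synthesize_next()
        if step is None:
            break
        steps.append(step)
    return steps
```

With the cooldown in place, a persistently failing episode contributes one `("failed", id)` step and is then filtered out, so the loop ends instead of spinning.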
Adds scope ('project' | 'general') to observations and reflections so
cross-project workflow rules surface in every project's memory_retrieve.

Schema (migration 0007):
- observations.scope NOT NULL DEFAULT 'project' CHECK
- reflections.scope NOT NULL DEFAULT 'project' CHECK
- partial index on reflections WHERE scope='general'
- one-shot fix-up: the workflow-rule observation recorded on 2026-05-04
  gets scope='general'

Write path:
- ObservationService.create accepts scope kwarg (default 'project',
  ValueError on invalid)
- memory.observe MCP tool schema gains optional scope enum field

Synthesis path:
- _derive_new_reflection_scope helper queries source observations'
  scopes; returns 'general' iff every source is general
- _apply_new derives the new reflection's scope from sources and
  INSERTs it
- _apply_augment / _apply_merge preserve existing reflection scope
- _load_episode_context's reflection query OR-merges scope='general'
  rows from any project (still tech-filtered)

Retrieval path:
- retrieve_reflections WHERE clause becomes (project = ? OR scope = 'general')
- memory.retrieve MCP tool flows through unchanged

The user's per-step-confidence workflow rule (recorded today as
project-scoped to better-memory) becomes scope='general' via the
migration's UPDATE, so every project session retrieves it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
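The scope-derivation rule ('general' iff every source observation is general) can be sketched as a pure function. This mirrors the described `_derive_new_reflection_scope` behaviour but is not its actual code; the empty-sources default to 'project' is an assumption:

```python
def derive_new_reflection_scope(source_scopes) -> str:
    """Return 'general' iff every source observation is general-scoped.
    A mixed or all-project set yields 'project'; an empty set defaults
    to 'project' (assumed conservative behaviour)."""
    scopes = list(source_scopes)
    if scopes and all(s == "general" for s in scopes):
        return "general"
    return "project"
```

The all-or-nothing rule is conservative on purpose: one project-specific source is enough to keep the distilled reflection from leaking into every other project's retrieval.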
Two small fixes from the final code review:
- Migration 0007: comment explains the magic observation id in the backfill
- tests/services/test_reflection.py: TestArchivedObservationGuards docstring
  no longer references the removed load_context method

No behaviour change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions bot left a comment


🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

low: 2

The PR is overall well-structured and logically sound. Two low-severity defects were found: a dict.get() null-vs-absent trap in the MCP server that causes a ValueError when scope: null is passed, and a type annotation mismatch on EpisodeForPrompt.goal that could produce a misleading prompt for background episodes with a null goal.

Comment threads: better_memory/mcp/server.py (outdated), better_memory/services/reflection.py
Two low-severity findings from BugBot review:

1. memory.observe MCP handler used `args.get("scope", "project")` which
   returns None (not the default) when the client sends {"scope": null}.
   That None then propagated to ObservationService.create() which raised
   ValueError. Fix: `args.get("scope") or "project"` — defaults on both
   absent and explicit-null.

2. EpisodeForPrompt.goal was typed as `str` but episodes.goal is a
   nullable TEXT column (background episodes from session_start markers
   can have NULL goal). _build_episode_prompt would silently render
   "  goal:    None" in the LLM prompt. Fix: type as `str | None` and
   render `(unspecified)` when None, matching the existing tech-None
   pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions bot left a comment


🟢 Claude BugBot Analysis

No new bugs found. Both previously reported open issues have been resolved: the scope=None propagation bug in server.py is fixed with or "project", and the EpisodeForPrompt.goal nullable type mismatch in reflection.py is fixed with str | None annotation and an explicit None guard in _build_episode_prompt.

No bugs were detected in this PR.

@emp3thy emp3thy merged commit 5872387 into main May 4, 2026
3 checks passed
@emp3thy emp3thy deleted the episodic-synthesis branch May 4, 2026 11:58
emp3thy added a commit that referenced this pull request May 4, 2026
* chore(pyright): exclude .worktrees from main project scope

When working in a git worktree under .worktrees/, the main worktree's
pyright was scanning the worktree's source files but resolving imports
against main's modules — producing false-positive "unknown attribute"
errors for symbols on the worktree's branch that don't yet exist on main.

Adds **/.worktrees/** and **/__pycache__ to pyright exclude so each
worktree's own 'uv run pyright' is the canonical type-check for that
branch. Also confirms the IDE's pyright respects the project config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(superpowers): semantic memories spec + plan

Spec: docs/superpowers/specs/2026-05-04-semantic-memories-design.md
Plan: docs/superpowers/plans/2026-05-04-semantic-memories.md

Brainstormed and approved 2026-05-04. Three commits follow:
- migration 0008 (semantic_memories table)
- SemanticMemoryService
- 4 MCP tools (memory.semantic_observe / _retrieve / _update / _delete)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(db): migration 0008 — semantic_memories table

User-stated facts/preferences. Distinct from observations (episodic,
recorded as work happens) and reflections (LLM-distilled). Same scope
model as PR #25's reflections — 'project' rows are per-project;
'general' rows surface in every project's retrieval.

Schema:
- semantic_memories(id, content, project, scope DEFAULT 'project'
  CHECK IN ('project','general'), created_at, updated_at)
- idx_semantic_memories_project for the per-project read path
- partial idx_semantic_memories_general WHERE scope='general'

Service + MCP tools follow in subsequent commits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
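The schema described above could be rendered roughly as follows. Column types, the `datetime('now')` defaults, and the partial index's indexed column are assumptions; the table name, columns, scope CHECK, and the two index names come from the commit message:

```python
import sqlite3

# Hypothetical rendition of migration 0008.
SCHEMA_0008 = """
CREATE TABLE semantic_memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    project    TEXT NOT NULL,
    scope      TEXT NOT NULL DEFAULT 'project'
               CHECK (scope IN ('project', 'general')),
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX idx_semantic_memories_project ON semantic_memories (project);
CREATE INDEX idx_semantic_memories_general ON semantic_memories (id)
    WHERE scope = 'general';
"""
```

The CHECK constraint is the backstop behind the service-level scope validation: even a caller that bypasses the service cannot persist an invalid scope.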

* feat(semantic): SemanticMemoryService for user-stated facts/preferences

Service for managing semantic memories — user-asserted current truths,
distinct from observations (episodic) and reflections (LLM-distilled).

API:
- create(*, content, project, scope='project') -> id
- update_text(*, id, content) — bumps updated_at; raises if id absent
- delete(*, id) — idempotent (no error on missing id)
- list_for_project(*, project) -> list[SemanticMemory]
  Returns project rows + general-scope rows from any project,
  ordered newest-first by created_at.

Validation:
- scope must be 'project' or 'general' (ValueError before DB hit;
  CHECK constraint is the backstop)
- content must not be empty/whitespace (ValueError)

MCP wiring follows in next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
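The `list_for_project` read path described above (project rows plus general-scope rows from any project, newest-first) reduces to a single OR-merged query. A sketch, assuming the column names from the schema commit and adding an `id DESC` tiebreak that is my own assumption, not stated in the commit:

```python
import sqlite3

def list_for_project(conn: sqlite3.Connection, project: str):
    """Project rows + general-scope rows from any project, newest-first.
    Sketch of the described read path, not the real service method."""
    return conn.execute(
        "SELECT id, content, project, scope FROM semantic_memories "
        "WHERE project = ? OR scope = 'general' "
        "ORDER BY created_at DESC, id DESC",
        (project,),
    ).fetchall()
```

Note the flat union: a general-scope fact recorded in project beta is returned verbatim to a caller asking about project alpha, which is exactly the cross-project surfacing the scope model is for.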

* feat(mcp): memory.semantic_observe / _retrieve / _update / _delete tools

Wire SemanticMemoryService into the MCP server with four new tools:

- memory.semantic_observe(content, scope='project') -> {id}
  Records a user-stated fact/preference. scope='general' surfaces it
  in every project's startup retrieval.

- memory.semantic_retrieve(project?) -> [memories...]
  Returns project rows + general-scope rows from any project,
  ordered newest-first. Flat list — they're facts, not lessons.

- memory.semantic_update(id, content) -> {ok}
  Edits content in place; bumps updated_at. Raises if id absent.

- memory.semantic_delete(id) -> {ok}
  Idempotent — no error if id absent.

Both write tools accepting scope use `args.get("scope") or "project"`
to defend against {"scope": null} from MCP clients (PR #25 BugBot
finding).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(bugbot PR#34): rollback implicit BEGIN before raising in update_text

BugBot finding (medium): SemanticMemoryService.update_text raised
ValueError after the UPDATE found no rows, without first calling
self._conn.rollback(). Python's sqlite3 with default isolation_level
opens an implicit BEGIN before any DML, so the UPDATE held the WAL
write lock until the next commit() — blocking other writers for up
to busy_timeout (5 s).

Mirrors the existing ObservationService.set_outcome pattern at
better_memory/services/observation.py:435.

Adds a regression test that asserts conn.in_transaction is False
after the failed update.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
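The fix above hinges on a sqlite3 subtlety: with the default `isolation_level`, the module opens an implicit BEGIN before any DML, even an UPDATE that matches zero rows, so raising without `rollback()` leaves the write lock held. A minimal sketch of the corrected pattern (SQL and error message are illustrative, not the real method body):

```python
import sqlite3

def update_text(conn: sqlite3.Connection, memory_id: int, content: str) -> None:
    """Edit a semantic memory in place; raise if the id is absent.
    Sketch of the fixed pattern: roll back the implicit BEGIN before raising."""
    cur = conn.execute(
        "UPDATE semantic_memories "
        "SET content = ?, updated_at = datetime('now') WHERE id = ?",
        (content, memory_id),
    )
    if cur.rowcount == 0:
        # The UPDATE opened an implicit transaction even though it matched
        # nothing; release the write lock before surfacing the error.
        conn.rollback()
        raise ValueError(f"no semantic memory with id={memory_id}")
    conn.commit()
```

Without the `rollback()`, the connection would hold the WAL write lock until its next commit, blocking other writers for up to the configured busy_timeout.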

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
