
Synthesis driven by IDE-LLM via MCP — remove Ollama chat client#40

Merged
emp3thy merged 3 commits into main from synthesis-mcp-driven on May 5, 2026

Conversation

@emp3thy (Owner) commented May 5, 2026

Summary

  • Replaces the local-Ollama-driven ReflectionSynthesisService.synthesize_next with two new MCP tools (memory.synthesize_next_get_context / _apply) so the session's IDE-LLM does the per-episode consolidation. Atomic, validation-aware, queue-driven.
  • Deletes the entire better_memory/llm/ package, the consolidate_model config field, the UI's synth button + /observations/synthesize route + busy-flag mutex, and the synth_step_banner.html fragment.
  • Adds a project-scoped Claude skill (.claude/skills/better-memory-synthesize/SKILL.md) that walks the IDE-LLM through the drain loop, decision schema, and a worked example.
  • 19 new MCP-tool tests; full suite remains green.

Why

Local llama3.2:3b dropped required JSON fields (source_observation_ids, merge.source_id set to null) too often to be reliable for the per-episode synthesis schema. Tightening the prompt helped but the structural answer was to drive synthesis from the IDE-LLM that's already in-session and follows complex schemas without issue. Keeps the unit at one episode, atomic apply, queue-driven loop.

Notable behavioural changes

  • memory.start_episode no longer auto-drains synthesis. Instead it returns pending_synthesis: {pending, in_cooldown, done} so the IDE-LLM (via the new skill) knows to invoke the synth tools.
  • Validation errors on submitted decisions return {ok: false, error: "validation", message} instead of stamping synth_failed_at. Caller retries.
  • The synth_failed_at cooldown column is retained for future skip tooling but is no longer auto-stamped.
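Taken together, the behavioural changes above define a get-decide-apply protocol for the IDE-LLM. A minimal sketch of that drain loop follows; tool wiring and payload field names beyond those quoted in this PR (`ok`, `error: "validation"`, the pending counts) are assumptions, not the real skill.

```python
# Hypothetical sketch of the queue-driven drain loop the synthesis skill
# walks the IDE-LLM through. get_context / apply_decision stand in for the
# memory.synthesize_next_get_context / _apply MCP tools; field names beyond
# those quoted in the PR are assumptions.

def drain_synthesis(get_context, apply_decision, decide, max_retries=3):
    """Consolidate pending episodes one at a time until the queue is empty."""
    while True:
        ctx = get_context()  # next pending episode + observations + reflections
        episode = ctx.get("episode")
        if episode is None:
            return  # nothing pending (or everything is in cooldown)
        for _ in range(max_retries):
            decision = decide(ctx)  # the IDE-LLM builds the decision JSON
            result = apply_decision(episode_id=episode["id"], decision=decision)
            if result.get("ok"):
                break  # applied atomically; fetch the next episode
            if result.get("error") != "validation":
                break  # unexpected error: refetch context instead of retrying
```

Validation failures retry in place with the same context; anything else falls through to a fresh `get_context` call, which is why the apply side never needs to stamp `synth_failed_at` itself.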

Test plan

  • tests/services/test_reflection.py — TestGetNextPendingContext, TestApplyDecision, TestCooldown (replace TestSynthesizeNextHappyPath/FailurePaths).
  • tests/mcp/test_synthesize_tools.py — registration, pure-helper shape tests, end-to-end via service + helpers.
  • tests/mcp/test_episode_tools.py — drop TestStartEpisodeDrainsPending; update test_create_server_wires_reflection_service to assert new tool registration.
  • tests/ui/test_observations.py — remove TestObservationsSynthesize/TestSynthMutex/TestServiceWiring; assert button is gone.
  • Full suite passes locally.

Follow-ups (not in this PR)

  • Dead CSS rules for .btn-synth / .observations-synth-banner etc. in app.css — harmless, can be cleaned up later.
  • Consider a memory.synthesize_next_skip MCP tool that explicitly stamps synth_failed_at for episodes the LLM declines to process (currently the LLM submits an all-empty decision instead).

🤖 Generated with Claude Code

Replace the local-Ollama-driven `ReflectionSynthesisService.synthesize_next`
with two MCP tools that hand episode context to the session's IDE-LLM and
accept its decision JSON back atomically:

- `memory.synthesize_next_get_context` — returns the next pending episode
  + observations + tech-filtered reflections + queue counts
- `memory.synthesize_next_apply` — parses + applies a decision atomically;
  validation errors return `{ok: false, error: "validation"}` without
  stamping the episode failed (caller retries)

Service split (better_memory/services/reflection.py):
- Drop `chat: ChatCompleter` constructor param
- Replace `synthesize_next(*, project)` with `get_next_pending_context(*,
  project)` + `apply_decision(*, episode_id, response, project)`
- Drop automatic `synth_failed_at` stamping (no LLM-class errors here;
  the column stays for future skip tooling)

Removed:
- `better_memory/llm/` directory (`ollama.py`, `fake.py`, `__init__.py`)
- `Config.consolidate_model` + `_DEFAULT_CONSOLIDATE_MODEL` + the
  `CONSOLIDATE_MODEL` env handling
- UI synth button + `/observations/synthesize` route + busy-flag mutex +
  `synth_timeout` param + `ollama_host`/`consolidate_model` extensions
- `synth_step_banner.html` fragment + `observations-synthesized` HX-Trigger
- The `start_episode` MCP auto-drain loop (was: `while
  synthesize_next.processed: pass`). `start_episode` now returns
  `pending_synthesis: {pending, in_cooldown, done}` so the IDE-LLM knows
  to invoke the synthesis skill.

Added:
- `.claude/skills/better-memory-synthesize/SKILL.md` — guides Claude Code
  through the per-episode drain loop with decision-schema rubric and a
  worked example
- `tests/mcp/test_synthesize_tools.py` (19 tests): tool registration,
  pure JSON-payload helpers, end-to-end seeded-DB fixtures
- JSON-payload helpers in `better_memory/mcp/server.py`
  (`_serialize_synth_get_context`, `_serialize_synth_apply_ok`,
  `_serialize_synth_apply_validation_error`, `_serialize_queue`) so
  tests and the handler closure share a single source of truth

Why: llama3.2:3b dropped required JSON fields (`source_observation_ids`,
`merge.source_id` set to null) often enough to be unreliable; tightening
the prompt helped but the structural answer was to drive synthesis from
the IDE-LLM that's already in-session and follows complex schemas without
issue. Keeps the unit at one episode, atomic apply, queue-driven loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

medium: 2

The new apply_decision method, introduced to replace the atomic synthesize_next, accepts a caller-supplied episode_id without verifying it belongs to the given project or that it is still pending synthesis. This allows cross-project contamination (reflections created for the wrong project while the wrong episode is marked synthesized) and silent double-application that creates duplicate reflections. Both bugs stem from the same missing precondition check at the start of the method.

Comment thread better_memory/services/reflection.py
…apply_decision

BugBot flagged two MEDIUM precondition gaps in the new public
`apply_decision` API (PRRT_kwDOSGIXI85_vo36 and PRRT_kwDOSGIXI85_vo3_).
The old `synthesize_next` was protected because it sourced the episode
through `_pick_oldest_pending(project)` — that filter implicitly
guaranteed (a) the episode existed, (b) it belonged to the right
project, and (c) it was not yet synthesized. The split into
`get_next_pending_context` + `apply_decision` removed all three
protections from the apply side.

Without these guards:
- A caller passing project A's `episode_id` with `project=B` would
  silently get reflections written under project B while
  `_mark_synthesized` stamped project A's row.
- An accidental retry on the same `episode_id` would silently create
  duplicate reflections (active observations are already consumed by
  the first call, but `_apply_new` has no idempotency on title).

Fix: add an upfront row lookup in `apply_decision` that raises
`ValueError` for not-found / wrong-project / already-synthesized,
BEFORE entering the SAVEPOINT. The MCP handler catches `ValueError`
and surfaces a structured `{ok: false, error: "state", message}`
payload (paralleling the existing `error: "validation"` path) so the
IDE-LLM can refetch context instead of blindly retrying.
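A minimal sketch of the guard and handler shape described above, assuming an illustrative row dict (the real lookup, column names, and error messages in `reflection.py` may differ):

```python
# Hypothetical sketch of the upfront precondition check and the MCP-handler
# error surfacing. Row keys ("project", "synthesized_at") and messages are
# assumptions for illustration.

def check_apply_preconditions(row, episode_id, project):
    """Raise ValueError before entering the SAVEPOINT if apply cannot proceed."""
    if row is None:
        raise ValueError(f"episode {episode_id} not found")
    if row["project"] != project:
        raise ValueError(f"episode {episode_id} belongs to another project")
    if row["synthesized_at"] is not None:
        raise ValueError(f"episode {episode_id} already synthesized")

def handle_apply(apply_decision, episode_id, project, decision):
    """MCP-handler shape: ValueError becomes a structured 'state' payload."""
    try:
        return apply_decision(episode_id=episode_id, response=decision,
                              project=project)
    except ValueError as exc:
        return {"ok": False, "error": "state", "message": str(exc)}
```

Raising before the SAVEPOINT keeps the failure side-effect-free, so the IDE-LLM can safely refetch context and move on to the next pending episode.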

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

high: 1 | low: 1

The implementation code itself (reflection.py, server.py, config.py) is structurally sound, with proper SAVEPOINT handling, ownership validation, and error surfacing. The primary concern is the newly added plan document: it is marked as a required sub-skill for agentic workers but describes the superseded Ollama-driven architecture, containing test code that imports from the deleted better_memory.llm package and passes a constructor parameter (chat=) that no longer exists. Any agent following the plan will hit import failures immediately.

Comment thread docs/superpowers/plans/2026-05-04-episodic-synthesis.md
Comment thread better_memory/mcp/server.py
… skip JSON round-trip

Two BugBot findings on the previous fix commit:

HIGH (PRRT_kwDOSGIXI85_v4-w): docs/superpowers/plans/2026-05-04-episodic-synthesis.md
references the deleted `better_memory.llm.fake` module and the removed
`chat=` constructor parameter — an agent following the plan would crash
with ImportError or TypeError. Add a prominent SUPERSEDED notice at the
top pointing at PR #40 and the live skill / test patterns. The plan stays
on disk because the design rationale (per-episode atomicity, SAVEPOINT
pattern, queue-driven loop, scope propagation) is still referenced by
the live design doc.

LOW (PRRT_kwDOSGIXI85_v4-6): the apply MCP handler called
`json.dumps(decision)` to satisfy `parse_response(raw: str)`, which then
called `json.loads` immediately — a redundant encode/decode round-trip
since the MCP framework already decoded the JSON before dispatch. Split
the parser:

- `parse_response_dict(data: object)` — pure dict-shape validation,
  raises SynthesisResponseError on non-dict / missing keys / shape errors
- `parse_response(raw: str)` — thin wrapper, calls `json.loads` then
  delegates to `parse_response_dict`

The MCP handler now calls `parse_response_dict` directly. The string API
is kept for any caller that needs it (and is exercised by the existing
TestParseResponse tests).
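The split described above might look like the following sketch; the required keys checked here are illustrative placeholders, not the real decision schema:

```python
# Hypothetical sketch of the parse_response split. The key set checked in
# parse_response_dict is an assumption for illustration.
import json

class SynthesisResponseError(ValueError):
    """Decision payload failed shape validation."""

def parse_response_dict(data):
    """Pure dict-shape validation; no JSON decoding here."""
    if not isinstance(data, dict):
        raise SynthesisResponseError("decision must be a JSON object")
    missing = {"new", "merge"} - data.keys()  # required keys are assumptions
    if missing:
        raise SynthesisResponseError(f"missing keys: {sorted(missing)}")
    return data

def parse_response(raw):
    """Thin string wrapper: json.loads, then delegate to parse_response_dict."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SynthesisResponseError(f"invalid JSON: {exc}") from exc
    return parse_response_dict(data)
```

With this shape, an MCP handler that already holds a decoded dict calls `parse_response_dict` directly, while string-holding callers keep the old `parse_response` entry point; nothing round-trips through `json.dumps`.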

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🟢 Claude BugBot Analysis

Both previously reported bugs are fixed: the plan document gains a prominent SUPERSEDED notice explicitly prohibiting execution and noting all removed APIs, and the MCP apply handler now calls parse_response_dict() directly instead of round-tripping through json.dumps/json.loads. No new bugs were found in the added code.

No bugs were detected in this PR.

@emp3thy emp3thy merged commit f7adaae into main May 5, 2026
3 checks passed
@emp3thy emp3thy deleted the synthesis-mcp-driven branch May 8, 2026 05:54