
Synthesis driven by IDE-LLM via MCP — remove Ollama chat client#40

Merged
emp3thy merged 3 commits into main from synthesis-mcp-driven on May 5, 2026

Conversation

@emp3thy (Owner) commented May 5, 2026

Summary

  • Replaces the local-Ollama-driven ReflectionSynthesisService.synthesize_next with two new MCP tools (memory.synthesize_next_get_context / _apply) so the session's IDE-LLM does the per-episode consolidation. Atomic, validation-aware, queue-driven.
  • Deletes the entire better_memory/llm/ package, the consolidate_model config field, the UI's synth button + /observations/synthesize route + busy-flag mutex, and the synth_step_banner.html fragment.
  • Adds a project-scoped Claude skill (.claude/skills/better-memory-synthesize/SKILL.md) that walks the IDE-LLM through the drain loop, decision schema, and a worked example.
  • 19 new MCP-tool tests; full suite remains green.

Why

Local llama3.2:3b dropped required JSON fields (source_observation_ids, merge.source_id set to null) too often to be reliable for the per-episode synthesis schema. Tightening the prompt helped but the structural answer was to drive synthesis from the IDE-LLM that's already in-session and follows complex schemas without issue. Keeps the unit at one episode, atomic apply, queue-driven loop.

Notable behavioural changes

  • memory.start_episode no longer auto-drains synthesis. Instead it returns pending_synthesis: {pending, in_cooldown, done} so the IDE-LLM (via the new skill) knows to invoke the synth tools.
  • Validation errors on submitted decisions return {ok: false, error: "validation", message} instead of stamping synth_failed_at. Caller retries.
  • The synth_failed_at cooldown column is retained for future skip tooling but is no longer auto-stamped.
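Taken together, the behavioural changes above define a get-decide-apply protocol for the IDE-LLM. A minimal sketch of that drain loop follows; tool wiring and payload field names beyond those quoted in this PR (`ok`, `error: "validation"`, the pending counts) are assumptions, not the real skill.

```python
# Hypothetical sketch of the queue-driven drain loop the synthesis skill
# walks the IDE-LLM through. get_context / apply_decision stand in for the
# memory.synthesize_next_get_context / _apply MCP tools; field names beyond
# those quoted in the PR are assumptions.

def drain_synthesis(get_context, apply_decision, decide, max_retries=3):
    """Consolidate pending episodes one at a time until the queue is empty."""
    while True:
        ctx = get_context()  # next pending episode + observations + reflections
        episode = ctx.get("episode")
        if episode is None:
            return  # nothing pending (or everything is in cooldown)
        for _ in range(max_retries):
            decision = decide(ctx)  # the IDE-LLM builds the decision JSON
            result = apply_decision(episode_id=episode["id"], decision=decision)
            if result.get("ok"):
                break  # applied atomically; fetch the next episode
            if result.get("error") != "validation":
                break  # unexpected error: refetch context instead of retrying
```

Validation failures retry in place with the same context; anything else falls through to a fresh `get_context` call, which is why the apply side never needs to stamp `synth_failed_at` itself.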

Test plan

  • tests/services/test_reflection.py — TestGetNextPendingContext, TestApplyDecision, TestCooldown (replace TestSynthesizeNextHappyPath/FailurePaths).
  • tests/mcp/test_synthesize_tools.py — registration, pure-helper shape tests, end-to-end via service + helpers.
  • tests/mcp/test_episode_tools.py — drop TestStartEpisodeDrainsPending; update test_create_server_wires_reflection_service to assert new tool registration.
  • tests/ui/test_observations.py — remove TestObservationsSynthesize/TestSynthMutex/TestServiceWiring; assert button is gone.
  • Full suite passes locally.

Follow-ups (not in this PR)

  • Dead CSS rules for .btn-synth / .observations-synth-banner etc. in app.css — harmless, can be cleaned up later.
  • Consider a memory.synthesize_next_skip MCP tool that explicitly stamps synth_failed_at for episodes the LLM declines to process (currently the LLM submits an all-empty decision instead).

🤖 Generated with Claude Code

Replace the local-Ollama-driven `ReflectionSynthesisService.synthesize_next`
with two MCP tools that hand episode context to the session's IDE-LLM and
accept its decision JSON back atomically:

- `memory.synthesize_next_get_context` — returns the next pending episode
  + observations + tech-filtered reflections + queue counts
- `memory.synthesize_next_apply` — parses + applies a decision atomically;
  validation errors return `{ok: false, error: "validation"}` without
  stamping the episode failed (caller retries)

Service split (better_memory/services/reflection.py):
- Drop `chat: ChatCompleter` constructor param
- Replace `synthesize_next(*, project)` with `get_next_pending_context(*,
  project)` + `apply_decision(*, episode_id, response, project)`
- Drop automatic `synth_failed_at` stamping (no LLM-class errors here;
  the column stays for future skip tooling)

Removed:
- `better_memory/llm/` directory (`ollama.py`, `fake.py`, `__init__.py`)
- `Config.consolidate_model` + `_DEFAULT_CONSOLIDATE_MODEL` + the
  `CONSOLIDATE_MODEL` env handling
- UI synth button + `/observations/synthesize` route + busy-flag mutex +
  `synth_timeout` param + `ollama_host`/`consolidate_model` extensions
- `synth_step_banner.html` fragment + `observations-synthesized` HX-Trigger
- The `start_episode` MCP auto-drain loop (was: `while
  synthesize_next.processed: pass`). `start_episode` now returns
  `pending_synthesis: {pending, in_cooldown, done}` so the IDE-LLM knows
  to invoke the synthesis skill.

Added:
- `.claude/skills/better-memory-synthesize/SKILL.md` — guides Claude Code
  through the per-episode drain loop with decision-schema rubric and a
  worked example
- `tests/mcp/test_synthesize_tools.py` (19 tests): tool registration,
  pure JSON-payload helpers, end-to-end seeded-DB fixtures
- JSON-payload helpers in `better_memory/mcp/server.py`
  (`_serialize_synth_get_context`, `_serialize_synth_apply_ok`,
  `_serialize_synth_apply_validation_error`, `_serialize_queue`) so
  tests and the handler closure share a single source of truth

Why: llama3.2:3b dropped required JSON fields (`source_observation_ids`,
`merge.source_id` set to null) often enough to be unreliable; tightening
the prompt helped but the structural answer was to drive synthesis from
the IDE-LLM that's already in-session and follows complex schemas without
issue. Keeps the unit at one episode, atomic apply, queue-driven loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

medium: 2

The new apply_decision method, introduced to replace the atomic synthesize_next, accepts a caller-supplied episode_id without verifying it belongs to the given project or that it is still pending synthesis. This allows cross-project contamination (reflections created for the wrong project while the wrong episode is marked synthesized) and silent double-application that creates duplicate reflections. Both bugs stem from the same missing precondition check at the start of the method.

Comment thread better_memory/services/reflection.py
…apply_decision

BugBot flagged two MEDIUM precondition gaps in the new public
`apply_decision` API (PRRT_kwDOSGIXI85_vo36 and PRRT_kwDOSGIXI85_vo3_).
The old `synthesize_next` was protected because it sourced the episode
through `_pick_oldest_pending(project)` — that filter implicitly
guaranteed (a) the episode existed, (b) it belonged to the right
project, and (c) it was not yet synthesized. The split into
`get_next_pending_context` + `apply_decision` removed all three
protections from the apply side.

Without these guards:
- A caller passing project A's `episode_id` with `project=B` would
  silently get reflections written under project B while
  `_mark_synthesized` stamped project A's row.
- An accidental retry on the same `episode_id` would silently create
  duplicate reflections (active observations are already consumed by
  the first call, but `_apply_new` has no idempotency on title).

Fix: add an upfront row lookup in `apply_decision` that raises
`ValueError` for not-found / wrong-project / already-synthesized,
BEFORE entering the SAVEPOINT. The MCP handler catches `ValueError`
and surfaces a structured `{ok: false, error: "state", message}`
payload (paralleling the existing `error: "validation"` path) so the
IDE-LLM can refetch context instead of blindly retrying.
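A minimal sketch of the guard and handler shape described above, assuming an illustrative row dict (the real lookup, column names, and error messages in `reflection.py` may differ):

```python
# Hypothetical sketch of the upfront precondition check and the MCP-handler
# error surfacing. Row keys ("project", "synthesized_at") and messages are
# assumptions for illustration.

def check_apply_preconditions(row, episode_id, project):
    """Raise ValueError before entering the SAVEPOINT if apply cannot proceed."""
    if row is None:
        raise ValueError(f"episode {episode_id} not found")
    if row["project"] != project:
        raise ValueError(f"episode {episode_id} belongs to another project")
    if row["synthesized_at"] is not None:
        raise ValueError(f"episode {episode_id} already synthesized")

def handle_apply(apply_decision, episode_id, project, decision):
    """MCP-handler shape: ValueError becomes a structured 'state' payload."""
    try:
        return apply_decision(episode_id=episode_id, response=decision,
                              project=project)
    except ValueError as exc:
        return {"ok": False, "error": "state", "message": str(exc)}
```

Raising before the SAVEPOINT keeps the failure side-effect-free, so the IDE-LLM can safely refetch context and move on to the next pending episode.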

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🔴 Claude BugBot Analysis

Found 2 potential bugs in this PR.

high: 1 | low: 1

The implementation code itself (reflection.py, server.py, config.py) is structurally sound, with proper SAVEPOINT handling, ownership validation, and error surfacing. The primary concern is the newly added plan document: it is marked as a required sub-skill for agentic workers but describes the superseded Ollama-driven architecture, containing test code that imports from the deleted better_memory.llm package and passes a constructor parameter (chat=) that no longer exists. Any agent following the plan will hit import failures immediately.

Comment thread docs/superpowers/plans/2026-05-04-episodic-synthesis.md
Comment thread better_memory/mcp/server.py
… skip JSON round-trip

Two BugBot findings on the previous fix commit:

HIGH (PRRT_kwDOSGIXI85_v4-w): docs/superpowers/plans/2026-05-04-episodic-synthesis.md
references the deleted `better_memory.llm.fake` module and the removed
`chat=` constructor parameter — an agent following the plan would crash
with ImportError or TypeError. Add a prominent SUPERSEDED notice at the
top pointing at PR #40 and the live skill / test patterns. The plan stays
on disk because the design rationale (per-episode atomicity, SAVEPOINT
pattern, queue-driven loop, scope propagation) is still referenced by
the live design doc.

LOW (PRRT_kwDOSGIXI85_v4-6): the apply MCP handler called
`json.dumps(decision)` to satisfy `parse_response(raw: str)`, which then
called `json.loads` immediately — a redundant encode/decode round-trip
since the MCP framework already decoded the JSON before dispatch. Split
the parser:

- `parse_response_dict(data: object)` — pure dict-shape validation,
  raises SynthesisResponseError on non-dict / missing keys / shape errors
- `parse_response(raw: str)` — thin wrapper, calls `json.loads` then
  delegates to `parse_response_dict`

The MCP handler now calls `parse_response_dict` directly. The string API
is kept for any caller that needs it (and is exercised by the existing
TestParseResponse tests).
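The split described above might look like the following sketch; the required keys checked here are illustrative placeholders, not the real decision schema:

```python
# Hypothetical sketch of the parse_response split. The key set checked in
# parse_response_dict is an assumption for illustration.
import json

class SynthesisResponseError(ValueError):
    """Decision payload failed shape validation."""

def parse_response_dict(data):
    """Pure dict-shape validation; no JSON decoding here."""
    if not isinstance(data, dict):
        raise SynthesisResponseError("decision must be a JSON object")
    missing = {"new", "merge"} - data.keys()  # required keys are assumptions
    if missing:
        raise SynthesisResponseError(f"missing keys: {sorted(missing)}")
    return data

def parse_response(raw):
    """Thin string wrapper: json.loads, then delegate to parse_response_dict."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SynthesisResponseError(f"invalid JSON: {exc}") from exc
    return parse_response_dict(data)
```

With this shape, an MCP handler that already holds a decoded dict calls `parse_response_dict` directly, while string-holding callers keep the old `parse_response` entry point; nothing round-trips through `json.dumps`.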

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions (bot) commented

🟢 Claude BugBot Analysis

Both previously reported bugs are fixed: the plan document gains a prominent SUPERSEDED notice explicitly prohibiting execution and noting all removed APIs, and the MCP apply handler now calls parse_response_dict() directly instead of round-tripping through json.dumps/json.loads. No new bugs were found in the added code.

No bugs were detected in this PR.

@emp3thy emp3thy merged commit f7adaae into main May 5, 2026
3 checks passed
@emp3thy emp3thy deleted the synthesis-mcp-driven branch May 8, 2026 05:54