
feat(memory): claude-mem feature parity — 5-step series consolidated #7

Merged

fazleelahhee merged 21 commits into main from ai-memory on Apr 28, 2026

Conversation

@fazleelahhee
Contributor

Summary

Consolidates the previously stacked PRs #2–#6 into a single PR against main, with Copilot's review feedback addressed in fixup commit 353dc27.

Adds full conversation memory to cce: per-project SQLite store, lifecycle capture hooks, background extractive compression, FTS5-prefiltered recall, and a dashboard view — feature parity with claude-mem, no extra runtime deps.

Step · Commit · Scope
1. Foundation · 994edb2 · memory.db schema (sessions, prompts, tool_events, turn_summaries, decisions, code_areas, pending_compressions, FTS5 + triggers) + cce sessions migrate (idempotent JSON-to-SQLite importer)
2. Capture · eb0dacc · 5 lifecycle hooks (UserPromptSubmit, PreToolUse, PostToolUse, Stop, SessionEnd) + loopback HTTP server in cce serve
3. Compress · 82ce5b1 · Extractive worker drains pending_compressions using bge-small (already loaded for the index — no Ollama, no extra model)
4. Recall · 255eb53 · session_recall(topic) extended; new session_timeline(session_id) and session_event(event_id) MCP tools
5. Dashboard · 4a7370f · Sessions list + timeline + decisions search panels
Copilot fixes · 353dc27 · 11 of 14 review items applied — see below

Design + brainstorm decisions: docs/specs/2026-04-28-memory-claude-mem-parity-design.md.

Copilot review items addressed

migrate.py — archive-before-mark ordering with rollback (no more imported-but-unarchived files); preserve session_id linkage from decisions_log.json; use `timestamp is not None` so legacy 0 timestamps keep their original ordering; `_session_exists` memoised per file.

compressor.py — drop unused Any; cap raw_input at 4 KB before json.loads so huge patches don't stall the worker; yield the asyncio loop after every drained item with a 50 ms breath every 5 items so a backlog doesn't monopolise mcp.run_stdio().
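The yield-and-breathe pacing described in that item can be sketched as follows; `drain_queue` and its default values are illustrative, not the actual compressor.py code:

```python
import asyncio

async def drain_queue(items, process, breath_every=5, breath_s=0.05):
    """Drain a backlog without monopolising the event loop (sketch)."""
    for n, item in enumerate(items, start=1):
        process(item)
        await asyncio.sleep(0)            # hand the loop back after every item
        if n % breath_every == 0:
            await asyncio.sleep(breath_s)  # longer breath every few items

done = []
asyncio.run(drain_queue(range(7), done.append))
print(done)  # [0, 1, 2, 3, 4, 5, 6]
```

The `asyncio.sleep(0)` is what lets other coroutines (like the MCP stdio loop) run between drained items; the periodic longer sleep keeps a large backlog from consuming every scheduling slot.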

mcp_server.py — replace LIMIT-200 sweeps with FTS5 MATCH prefilter on decisions_fts / turn_summaries_fts (LIKE on code_areas, which has no FTS); clamp limit in session_timeline to 1..200; wrap sessions metadata + event payload queries in try/except; NULL-safe raw_input/raw_output in session_event.

design spec — replaced "Eight tables" with the actual v1 description (support tables + FTS5 + triggers).

Deferred (nice-to-have, not bugs): three robustness/perf items already covered conceptually elsewhere — happy to address in a follow-up if reviewers want them in.

Test plan

  • pytest tests/memory tests/dashboard/test_memory_endpoints.py — 43/43 pass
  • pytest tests/integration — 30/30 pass (mcp_server.py shared paths)
  • Full suite — 344/345 pass (the one failure is test_ollama_client hitting httpx.ReadTimeout because no Ollama is running locally; unrelated to this change)
  • Smoke: `cce sessions migrate --help` confirms the command registers
  • Smoke: open dashboard against a project that has run cce serve for a few prompts; verify timeline + decisions search render

Stats

35 files changed · +4,089 / −111 LOC

Closes

This supersedes the 5-PR stack — closing #2, #3, #4, #5, #6 with pointers to this PR.

🤖 Generated with Claude Code

fazleelahhee and others added 7 commits April 28, 2026 00:22
Captures the 6 design decisions from brainstorming: auto+explicit
capture with source column, background compression worker in cce
serve, per-turn + per-session rollup granularity, extended
session_recall + new session_timeline/session_event MCP tools,
one-shot cce sessions migrate command, three dashboard panels.

Compressor is extractive using BAAI/bge-small-en-v1.5 already loaded
for the index — no new dependencies, no extra RAM, no Ollama
required.

Ships as 5 sequential PRs off this branch; each independently
reviewable.
PR 1 of 5 against feature/memory-claude-mem-parity. Lays the storage
foundation: per-project SQLite at ~/.cce/projects/<name>/memory.db
with the v1 schema (sessions, prompts, tool_events, tool_event_payloads,
turn_summaries, decisions, code_areas, pending_compressions,
migrated_files, schema_versions) plus FTS5 virtual tables and triggers
for prompts / decisions / turn_summaries.
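A minimal sketch of the external-content FTS5 + trigger pattern described above, reduced to a single illustrative table (the shipped schema applies it to prompts, decisions, and turn_summaries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decisions (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
-- External-content FTS5 index, kept in sync by AFTER INSERT/DELETE triggers.
CREATE VIRTUAL TABLE decisions_fts USING fts5(
    content, content='decisions', content_rowid='id');
CREATE TRIGGER decisions_ai AFTER INSERT ON decisions BEGIN
    INSERT INTO decisions_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER decisions_ad AFTER DELETE ON decisions BEGIN
    INSERT INTO decisions_fts(decisions_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;
""")
conn.execute("INSERT INTO decisions(content) VALUES ('use sqlite for memory')")
rows = conn.execute(
    "SELECT content FROM decisions_fts WHERE decisions_fts MATCH 'sqlite'"
).fetchall()
print(rows)  # [('use sqlite for memory',)]
```

With triggers doing the sync, writers only ever touch the source table and the FTS index can never drift out of date.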

Adds `cce sessions migrate` — idempotent one-shot importer for legacy
per-session JSON files (current path and pre-rebrand
~/.claude-context-engine/...). Imported decisions and code areas are
tagged source='migrated' so future session_recall can rank them.
Consumed JSONs are archived into sessions/migrated.zip and removed.

No behaviour change to the existing JSON capture path. Hooks and the
compression worker land in PR 2 and PR 3.

10 new unit tests cover: schema bootstrap, idempotent reconnection,
foreign-key enforcement, FTS triggers, migration of session JSON +
decisions_log archive, idempotent rerun, archive-and-remove. Full
suite: 311 passed, 1 skipped.

See docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
PR 2 of 5 against feature/memory-claude-mem-parity, stacked on PR 1.

Adds the auto-capture pipeline:

- src/context_engine/memory/hooks.py — aiohttp handlers for
  /hooks/SessionStart, /hooks/UserPromptSubmit, /hooks/PostToolUse,
  /hooks/Stop, /hooks/SessionEnd. Each writes the appropriate row(s)
  to memory.db. Compression for the just-ended turn is enqueued in
  pending_compressions (UNIQUE constraint dedupes Stop + next-prompt
  double-fire). All errors are logged and return 202 — capture is
  best-effort and must never block the user's flow.

- src/context_engine/memory/hook_server.py — loopback aiohttp listener
  on a random free port, started as a background asyncio task from
  cce serve's _run_serve. Port written to <storage_base>/serve.port.
  Cleanly shut down on MCP server exit.

- src/context_engine/memory/hook_installer.py — installs
  ~/.cce/hooks/cce_hook.sh (the thin shell shim that POSTs hook
  payloads to the local port) and wires .claude/settings.json with
  all 5 lifecycle entries. Idempotent. Preserves user-added hooks.
  Uninstall removes only entries whose command points at our script.
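The idempotent merge and marker-based uninstall behaviour might look roughly like this. This is a simplified sketch: `merge_hooks`, `uninstall_hooks`, and the flat settings shape are illustrative, and the real .claude/settings.json schema is richer than a list of command dicts:

```python
HOOK_MARKER = "cce_hook"  # assumed marker for "this entry is ours"

def merge_hooks(settings: dict, event: str, command: str) -> dict:
    """Idempotently add our entry while preserving user-added hooks."""
    hooks = settings.setdefault("hooks", {}).setdefault(event, [])
    if not any(h.get("command") == command for h in hooks):
        hooks.append({"command": command})
    return settings

def uninstall_hooks(settings: dict) -> dict:
    """Remove only entries whose command points at our script."""
    for event, hooks in settings.get("hooks", {}).items():
        settings["hooks"][event] = [
            h for h in hooks if HOOK_MARKER not in h.get("command", "")
        ]
    return settings

s = {"hooks": {"Stop": [{"command": "my-own-hook.sh"}]}}
merge_hooks(s, "Stop", "~/.cce/hooks/cce_hook.sh Stop")
merge_hooks(s, "Stop", "~/.cce/hooks/cce_hook.sh Stop")  # second call is a no-op
uninstall_hooks(s)
print(s["hooks"]["Stop"])  # [{'command': 'my-own-hook.sh'}] — user hook survives
```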

- cce init wires the installer in step 5; _run_serve spawns the
  hook server alongside the MCP stdio loop and watcher.

Tests (15 new, full suite 326 passed):
- tests/memory/test_hooks.py: integration tests via aiohttp_client
  for all 5 endpoints, including dedup, prompt-number assignment,
  payload sidecar table, and 400 on missing session_id.
- tests/memory/test_hook_installer.py: script write + chmod,
  idempotent reinstall, settings.json merge preserving user hooks,
  uninstall removing only ours.

pyproject.toml: pytest-aiohttp added to dev/dependency-groups.

No change to recall surface — session_recall still reads JSON.
PR 4 retires the JSON path and points recall at memory.db.
PR 3 of 5 against feature/memory-claude-mem-parity, stacked on PR 2.

Adds the background compression half:

- src/context_engine/memory/extractive.py — sentence splitter +
  centroid-based extractive summariser. No new dependencies; takes any
  embedder exposing embed_query(str) -> Iterable[float]. Pure source
  text; no synthesis means no hallucination.
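The centroid-based selection can be sketched as below, assuming only an `embed_query(str) -> Iterable[float]` callable as the text states. `extractive_summary` here is an illustrative reimplementation, not the shipped extractive.py:

```python
import math
import re

def _cos(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def extractive_summary(text, embed_query, k=3):
    """Keep the k sentences nearest the centroid, preserving source order."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sents) <= k:
        return " ".join(sents)
    vecs = [list(embed_query(s)) for s in sents]
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    ranked = sorted(range(len(sents)),
                    key=lambda i: _cos(vecs[i], centroid), reverse=True)
    # sort the chosen indices so output reads in original order
    return " ".join(sents[i] for i in sorted(ranked[:k]))
```

Because every output sentence is copied verbatim from the input, the summariser cannot invent facts; the only quality knob is which sentences it keeps.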

- src/context_engine/memory/compressor.py — compress_turn() and
  compress_session_rollup() build the candidate text from prompts +
  tool_events (+payloads), run the extractive summariser using the
  bge-small embedder already loaded for the index, and persist to
  turn_summaries / sessions.rollup_summary with tier='extractive'.
  Falls back to truncation when embedder is None or extractive raises.

  compression_loop() drains pending_compressions every 5 s, oldest
  first, single-flight by design. Queue rows that error are kept with
  attempts++ and last_error stamped for retry.

- _run_serve in cli.py spawns the compression worker alongside the
  hook server. Owns its own sqlite connection to avoid cross-thread
  use; gracefully cancelled and closed on MCP server exit.

Tests (14 new, full suite 340 passed):
- tests/memory/test_extractive.py: sentence split, centroid neighbour
  selection, source-order preservation, truncation fallback shape.
- tests/memory/test_compressor.py: turn compression with extractive +
  truncation tiers, session rollup combining turn summaries, empty
  rollup when no turns, _drain_one queue pop, compression_loop with
  stop_event for graceful shutdown.

Recall surface unchanged from main; PR 4 retires the JSON path.
… session_event

PR 4 of 5 against feature/memory-claude-mem-parity, stacked on PR 3.

Wires the MCP retrieval surface to memory.db while preserving the
existing JSON path for backward compatibility.

- ContextEngineMCP opens memory.db on startup and seeds an
  INSERT OR IGNORE sessions row so manual record_decision /
  record_code_area dual-writes don't fail the FK constraint when the
  SessionStart hook hasn't fired yet (test envs, future-non-CC clients).

- session_recall now folds three new candidate sources on top of the
  existing JSON sessions / consolidated decisions:
    - decisions       (manual + migrated, last 200 by recency)
    - code_areas      (manual + migrated, last 200 by recency)
    - turn_summaries  (last 200 turns, the layer-1 compact index)
  Tags include source and session_id so the agent can drill via the
  new tools.

- session_timeline(session_id, limit=20) — layer 2. Returns the
  session's turn_summaries with rollup + status header.

- session_event(event_id) — layer 3. Returns the raw input/output
  payload for one tool_event, with a dedicated "aged out" message
  when the payload row was pruned by retention.

- record_decision and record_code_area now dual-write to memory.db
  with source='manual'. Prior JSON write path remains active so a
  rollback to a previous PR doesn't lose recall coverage. The JSON
  write side can be retired once parity is confirmed in production.

Tests (10 new, full suite 350 passed):
- tests/memory/test_mcp_recall.py covers dual-write of decisions and
  code_areas, session_timeline with seeded summaries, session_event
  payload roundtrip + aged-out message + invalid id, session_recall
  surfacing memory.db decisions, and TOOL_NAMES registration.
PR 5 of 5 against feature/memory-claude-mem-parity, stacked on PR 4.

Adds three new memory.db-backed views to the existing CCE dashboard.
The dashboard never holds a memory.db handle — each request opens a
short-lived connection so the MCP server's writes are never blocked.

Backend (src/context_engine/dashboard/server.py):

- GET /api/memory/sessions[?limit] — list sessions, most-recent first.
- GET /api/memory/sessions/{id}/timeline — one session's metadata +
  ordered turn_summaries.
- GET /api/memory/decisions[?q&source] — FTS5 search over decisions
  with optional source facet (manual|auto|migrated). User input is
  phrase-quoted so FTS5 metacharacters like '-' are treated literally
  ('bge-small' matches its own substring rather than parsing as
  'bge AND NOT small'). On any FTS5 syntax fall-through, fall back to
  unranked recent listing — no 500s.
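The phrase-quoting fix amounts to wrapping user input in FTS5 double quotes, roughly as follows (illustrative helper name):

```python
def fts5_phrase(user_input: str) -> str:
    """Quote user text as one FTS5 phrase so metacharacters like '-' are
    treated literally; embedded double quotes are doubled per FTS5 syntax."""
    return '"' + user_input.replace('"', '""') + '"'

print(fts5_phrase("bge-small"))  # "bge-small"
```

Passed to MATCH, the quoted form searches for the literal phrase instead of letting FTS5 parse '-' as a NOT operator.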

Frontend (src/context_engine/dashboard/_page.py):

- New 'Memory' nav entry between Sessions and Analytics.
- New page-memory section with a Sessions / Decisions tab toggle.
- Sessions tab shows a list; clicking a row opens a turn-by-turn
  timeline panel below with rollup at the top.
- Decisions tab is a search box + source select, results in a table
  tagged by source.

Tests (9 new in tests/dashboard/test_memory_endpoints.py, full suite
359 passed, 1 skipped):

- Sessions list ordering, empty case when memory.db absent.
- Timeline returns header + turns for known session, null for unknown.
- Decisions FTS5 with and without query, source facet, combined
  filters, and the phrase-quoting fix for hyphenated input.

Closes the 5-PR series. Spec at
docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
Consolidates the 11 actionable items from the 5-PR memory stack into one
fixup commit on the unified ai-memory branch.

migrate.py
  - Archive *before* mark-imported + commit, with rollback on zip failure
    so a failed archive no longer leaves files stuck imported-but-not-archived.
  - Preserve session_id linkage from decisions_log.json when the referenced
    session already exists (was unconditionally NULL).
  - Memoise _session_exists per-file (constant per archive entry).
  - Use `timestamp is not None` so legacy 0/0.0 timestamps keep their original
    ordering instead of being stamped to "now".

compressor.py
  - Drop unused `Any` import.
  - Cap raw_input at _TOOL_INPUT_CHAR_CAP (4 KB) before json.loads so multi-MB
    patch payloads don't stall the compression worker.
  - Yield the asyncio loop after every drained item, with a short breath every
    5 items, so a backlog doesn't monopolise mcp.run_stdio().

mcp_server.py
  - Use FTS5 MATCH on decisions_fts / turn_summaries_fts (and a LIKE filter on
    code_areas) to prefilter recall candidates instead of embedding the latest
    600 rows on every session_recall call.
  - Clamp `limit` in session_timeline to 1..200 with a helper that handles bad
    input cleanly.
  - Wrap the sessions metadata query and the session_event payload query in
    try/except so DB errors return a specific message.
  - Handle NULL raw_input/raw_output in session_event without rendering the
    string "None".
  - Update the dual-write comment to reflect that recall now goes through FTS5.

design spec
  - Drop the inaccurate "Eight tables" sentence; describe the support and
    FTS5 tables that ship in v1 instead.

Tests: 43/43 memory + dashboard pass; 344/345 across the full suite (the
single failure is test_ollama_client hitting httpx.ReadTimeout because no
Ollama is running in this environment — unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a vec0 layer over the existing FTS5 store so session_recall can
match paraphrases and synonyms, not just lexical overlap. sqlite-vec
is already a hard dep (used by storage/vector_store.py for code chunks)
and bge-small is already loaded for the index, so no new model and no
new runtime cost beyond the embed_query() per write/query.

memory/db.py
  - CURRENT_VERSION → 2; bootstrap creates `decisions_vec` and
    `turn_summaries_vec` (vec0 float[384]) alongside the v1 tables.
  - `connect()` loads sqlite-vec via `enable_load_extension`; if the
    load fails the db still opens but vec tables are skipped (FTS-only
    fallback) and helpers no-op.
  - In-place v1 → v2 upgrade adds the empty vec tables; existing rows
    are populated by `backfill_vec_tables(conn, embedder)` on the next
    MCP-server start.
  - New helpers: `record_decision_vec`, `record_turn_summary_vec`,
    `search_decisions_vec`, `search_turn_summaries_vec`,
    `backfill_vec_tables`, `has_vec_tables`. `_write_vec_row` swallows
    dim mismatches so a swapped embedder doesn't break source-table
    inserts — the row simply isn't semantically searchable until vec
    tables are rebuilt.
  - We don't add vec tables for prompts (raw user text is rarely the
    right semantic anchor) or code_areas (file-path keyed; LIKE is
    enough).
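A sketch of the load-with-fallback behaviour, assuming the `sqlite_vec` Python bindings; the `connect_with_vec` name and single vec table are illustrative:

```python
import sqlite3

def connect_with_vec(path):
    """Open the db, loading sqlite-vec if possible; degrade to FTS-only
    when the extension can't be loaded (sketch of the fallback path)."""
    conn = sqlite3.connect(path)
    has_vec = False
    try:
        conn.enable_load_extension(True)
        import sqlite_vec            # may be absent, or loading may fail
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS decisions_vec "
            "USING vec0(embedding float[384])"
        )
        has_vec = True
    except Exception:
        pass                          # vec tables skipped; vec helpers no-op
    return conn, has_vec

conn, has_vec = connect_with_vec(":memory:")
print(has_vec)  # True only when sqlite-vec is importable and loadable
```

The key property is that a failed extension load still returns a working connection, so the FTS5 path is never blocked by a missing loadable extension.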

memory/compressor.py
  - `compress_turn` writes the new turn summary's embedding to
    turn_summaries_vec right after persisting the row.

integration/mcp_server.py
  - At MCP-server startup, run `backfill_vec_tables` so projects that
    ran on v1 pick up semantic recall on next launch.
  - `record_decision` dual-write now also writes to decisions_vec.
  - `_search_sessions` is now hybrid: it unions FTS5 hits and vec hits
    by row id (no double-formatting), then runs the existing cosine
    rank over the merged candidate pool. Empty vec hits (extension
    missing or tables empty) leave it FTS-only — no behaviour change
    in the degraded path.

tests/memory/test_db.py
  - Five new tests cover v2 bootstrap, decision/turn vec write+search,
    backfill on a v1-shaped db, and the v1→v2 upgrade-in-place path.
  - `_FakeEmbedder` produces 384-dim deterministic vectors so tests
    don't pay fastembed init cost.

Tests: 78/78 memory + dashboard + integration. The single Ollama timeout
in the full suite is unrelated and predates this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fazleelahhee fazleelahhee self-assigned this Apr 28, 2026
fazleelahhee and others added 7 commits April 28, 2026 00:30
Substantial follow-up to the sqlite-vec layer. Five concrete improvements,
each addressing a real issue from the deep review:

session_recall now returns a TL;DR header
  Top extractive sentences (3) pulled from the top-N matches via the same
  bge-small extractive worker the compressor uses — no LLM call, no
  hallucination, ~50 ms on the asyncio thread. Header is suppressed when
  there are <3 matches (would just echo them).

  Format:
    TL;DR (N matches for 'topic'):
      <2-3 extracted sentences>

    Source matches:
      - [decision src=...|sid:...] ...
      - [turn sid:...|n:...] ...

  Provenance tags survive so callers can drill via session_event /
  session_timeline; tag prefix is stripped before summarisation so the
  summariser sees content, not metadata.

Hybrid recall via reciprocal rank fusion
  The previous "embed every candidate" pipeline is gone. Each source
  produces its own ranked list (FTS5 decisions, FTS5 turns, vec
  decisions, vec turns, JSON-cosine, code_areas LIKE), then `_rrf_merge`
  fuses them via 1/(60+rank). Items found by multiple sources rise.
  Vec hits no longer get re-embedded — sqlite-vec's rank is preserved.
  RRF k=60 is the canonical Cormack/Clarke/Buettcher 2009 value.
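The fusion step can be sketched in a few lines (illustrative `rrf_merge`, matching the 1/(60+rank) formula above):

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score(item) = sum over lists of 1/(k + rank)."""
    scores = {}
    for lst in ranked_lists:
        for rank, item in enumerate(lst, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in all three lists, so fusion promotes it to the top.
merged = rrf_merge([["a", "b", "c"], ["b", "d"], ["b", "a"]])
print(merged)  # ['b', 'a', 'd', 'c']
```

Because only ranks are consumed, sources with incomparable scores (FTS5 bm25, vec L2 distance, JSON cosine) can be merged without score normalisation.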

Backfill on a daemon thread
  `_spawn_vec_backfill` opens its own connection (sqlite3 enforces
  check_same_thread) and embeds historical decisions/turns out of band,
  so MCP startup no longer stalls on a many-second embed-everything
  sweep for projects that ran on v1.

Cleanup triggers for orphaned vec rows
  decisions_vec_ad / turn_summaries_vec_ad fire AFTER DELETE on the
  source tables and drop the matching vec rowid. Without these, FK
  cascades / explicit deletes leaked rows in the vec tables. Triggers
  are added on bootstrap and on v1→v2 upgrade.

Refactor: _search_sessions split into focused methods
  _collect_json_candidates / _rank_json_candidates /
  _collect_memory_db_candidates / _format_decisions_in_id_order /
  _format_turns_in_id_order. Each is independently testable; the top-
  level method is now a 20-line composition.

Tests
  +7 new tests (RRF behaviour, tag stripping, TL;DR present/absent,
  vec-on-source-delete cleanup for both decisions and turn_summaries).
  85/85 memory + dashboard + integration pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small, independent fixes that shave the per-call token budget the
agent pays for memory.

mcp_server.py
  - _rrf_merge now dedupes by stripped content, so the same decision
    showing up as both [decision src=manual] and [decision src=migrated]
    during the JSON↔memory.db dual-write window collapses into one
    boosted entry instead of inflating recall by ~10–20%.
  - _format_decisions_in_id_order / _format_turns_in_id_order append
    a relative-time hint ("3d ago") and a callable drill affordance
    ("→ session_timeline(\"<sid>\")" / "→ session_event(id=<n>)").
    Saves the agent a follow-up call most of the time and gives the
    model a temporal signal it previously had to infer.
  - session_event applies a read-time cap (_EVENT_PAYLOAD_READ_CAP=4 KB)
    via the new _truncate_payload helper. Inputs already had a 4 KB
    write cap; outputs were stored uncapped, so a captured 50 KB Bash
    stdout previously re-fed ~12 k tokens on every fetch.
  - New helper: _humanise_relative_time (just-now / 5m / 3h / 4d /
    2mo / 1y), defensive about None and bad input.
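The relative-time helper might look roughly like this (an illustrative sketch, not the shipped `_humanise_relative_time`):

```python
import time

def humanise_relative_time(ts, now=None):
    """Rough "3d ago"-style hint; defensive about None and bad input."""
    try:
        delta = (now if now is not None else time.time()) - float(ts)
    except (TypeError, ValueError):
        return ""
    if delta < 60:
        return "just now"
    for unit, secs in (("y", 31536000), ("mo", 2592000), ("d", 86400),
                       ("h", 3600), ("m", 60)):
        if delta >= secs:
            return f"{int(delta // secs)}{unit} ago"
    return "just now"

print(humanise_relative_time(0, now=3 * 86400))  # 3d ago
```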

Tests
  - +5 tests covering RRF dedup, recency formatting, payload
    truncation, and the new recall-line drill affordance.
  - 114/114 memory + dashboard + integration pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each compression_loop iteration now runs the heavy work — embed_query
calls + SQLite INSERT — on a worker thread via asyncio.to_thread, with
a thread-local sqlite3 connection (sqlite3 enforces same-thread).
Previously a 50-turn backlog could freeze mcp.run_stdio() for ~30 s
while the loop drained synchronously. Now the asyncio thread only runs
the queue peek + sleep pacing.

compressor.py
  - _drain_one_sync(conn, embedder) — pure-sync; called from either the
    main thread (tests) or a worker thread (production).
  - _drain_one_threaded(db_path) — opens worker-local conn, calls
    _drain_one_sync, closes. Reads the embedder off the function's
    attribute set by compression_loop, which keeps the to_thread
    closure-free (no risk of capturing the asyncio loop).
  - _drain_one(conn, embedder) — async test shim around _drain_one_sync.
  - compression_loop now takes db_path; passes it through to_thread.
    Still accepts a sqlite3.Connection for back-compat with the
    existing test that drives the loop directly.

cli.py
  - cce serve no longer opens a long-lived compression_conn; passes
    memory_db_path(storage_base) to compression_loop.

Tests: 11/11 compressor + memory_loop pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caps memory.db growth and finally makes the "raw payload aged out"
branch in session_event reachable.

memory/db.py
  - prune_old_payloads(conn, days=30) — finds payloads referenced
    only by tool_events older than `days`, NULL-out raw_output and
    set raw_input='' (raw_input has NOT NULL on the v1 schema, so
    '' is its aged-out sentinel). size_bytes -> 0. Returns counts.
  - tool_events.summary stays — the gist of an aged event is still
    available via session_timeline / session_recall.

cli.py
  - cce sessions prune now does two jobs:
      1. JSON sessions consolidation (existing)
      2. memory.db raw-payload retention (new, default 30d)
  - --retain-payloads-days flag for the second job.
  - Output is split per-job so it's clear what ran.

mcp_server.py
  - _handle_session_event aged-out check accepts the new sentinel
    (`not row["raw_input"]`) in addition to NULL — previously the
    check was unreachable because nothing wrote NULL.

Tests: 115/115. New test verifies (a) old payloads are aged out,
(b) recent payloads are kept, (c) tool_events.summary is untouched,
(d) the prune is idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Answers "is memory broken?" in one command, without `cce serve`. Replaces
the previous workflow of opening sqlite3 against memory.db.

cli.py
  - New `cce sessions status` group entry. Reports:
      project + storage path
      memory.db path + size in KB
      schema version + sqlite-vec availability
      sessions count by status (active/completed/failed)
      decisions count by source (manual/auto/migrated)
      compressed turn_summaries count
      pending_compressions queue depth + max attempts (flagged when stuck)
      vec coverage: decisions=N/total, turns=N/total
      retained raw payload count + estimated MB

  - Falls back to a "not initialised" message with a how-to-bootstrap
    hint when memory.db doesn't exist yet.

Tests: 3 new in tests/test_cli_sessions_status.py covering missing-db,
populated-db rendering (schema, counts, queue drained), and stuck-queue
warning surfacing. 118/118 across the memory + dashboard + integration
+ CLI suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three production-affecting defects surfaced during a deep wire-up audit
of the ai-memory branch.

pyproject.toml
  - Promote aiohttp from optional [http] extra to a core dependency.
    The hook server (memory/hook_server.py) and the hook handlers
    (memory/hooks.py) import it unconditionally — the optional gate
    was a footgun that left default installs unable to capture.
  - Keep [http] as an empty back-compat marker so existing extras
    references don't break.

memory/db.py — backfill_vec_tables is now incremental
  - Was: "only run if vec table is empty". Effect: the moment a single
    decision was recorded manually, all subsequently-migrated rows
    were permanently invisible to semantic recall — startup backfill
    skipped because vec was no longer empty.
  - Now: embed any source rows missing from vec, regardless of vec
    population. Idempotent and safe to run on every MCP startup.
    Picks up rows imported by `cce sessions migrate` (which has no
    embedder), rows captured while the vec extension was unavailable,
    and the original v1→v2 upgrade backfill.

integration/mcp_server.py — _handle_session_event triage
  - Three states are now distinguished:
      a) payload_id IS NULL  → "no captured payload — only the descriptor"
      b) payload_id present, raws cleared → "aged out of retention window"
      c) raws populated → normal payload render
  - Previously (a) and (b) collapsed into the "aged out" branch — the
    user would be told their payload was retention-pruned even when
    no payload row had ever existed.

Tests
  - test_session_event_returns_no_payload_message_when_payload_id_null:
    keeps the historical NULL-payload coverage but renames + asserts
    on the correct (non-aged-out) message.
  - test_session_event_returns_aged_out_message_after_prune: new test
    that creates a real payload row, runs prune_old_payloads(days=0),
    then exercises the actual aged-out branch.
  - 119/119 across memory + dashboard + integration + CLI.
  - 15/15 across the previously-skipped tests/memory/test_hooks.py
    and test_hook_installer.py (pytest-aiohttp + aiohttp now in env).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test against /home/fazle/trading/v3 with bge-small + 4 real
trading decisions surfaced three real bugs the unit tests didn't catch.

mcp_server.py — RRF dedup widened
  - _content_key strips both the [tag] prefix *and* the trailing
    " · 5m ago · → session_timeline(...)" affordance, so the same
    decision rendered through different paths (memory.db with hints
    + JSON history without) collapses to one entry instead of
    appearing twice. Previously the unit dedup test passed because
    both inputs had the same suffix; the live test caught the gap.
  - When two paths produce the same content key, the richer-rendered
    form (with the affordance hints) wins as the visible label.

mcp_server.py — TL;DR rewritten as bullets
  - Was: extractive_summary returned a space-joined paragraph because
    the underlying joiner is " ". With short matches and no
    sentence-ending punctuation, the result was a wall of text.
  - Now: embed each match, score by cosine to the centroid, render
    the top-3 most central matches as bullet points. Same algorithm,
    readable output.

mcp_server.py — JSON-cosine embeds clean content
  - Was: embed_query("[decision] Roll positions at expiry-2 — Avoids
    assignment risk on Friday") — the [tag] prefix is metadata noise
    that inflates similarity for unrelated topics.
  - Now: embed_query(_content_key(text)) — clean signal, no metadata.

memory/db.py — vec distance threshold + tuning
  - _VEC_MAX_DISTANCE = 0.92 (was 1.0). bge-small's noise floor on
    short English is cosine ≈ 0.50 — random off-topic queries
    against the trading corpus hit 0.535. So 0.58 (L2 ≤ 0.92) is the
    threshold that keeps real paraphrase matches and rejects noise.
  - search_*_vec accept a max_distance kwarg so deterministic test
    embedders (which don't satisfy bge-small-tuned thresholds) can
    pass max_distance=99.0.

mcp_server.py — _SESSION_RECALL_MIN_SIM 0.35 → 0.55
  - Aligned with the same noise-floor reasoning. 0.55 catches the
    real "risk management" → "Risk limit at 2%" paraphrase (0.638)
    while rejecting "how is the weather today" against trading.

Live test results (against /home/fazle/trading/v3 with 4 decisions):
  · "risk management"          → 2 relevant matches (Risk limit,
                                  Roll positions w/ "assignment risk")
  · "machine learning model"   → 2 matches (XGBoost #1 + Risk limit)
  · "sqlite"                   → 2 matches (SQLite #1)
  · "how is the weather today" → 0 matches (clean rejection)

Persistence verified: fresh MCP restart recalls decisions from the
prior session via the existing memory.db, with the original session_id
preserved in the affordance hint.

Tests: 134/134 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fazleelahhee and others added 5 commits April 28, 2026 04:30
Three independent quality-of-life wins that surfaced from the live test
+ post-mortem review.

memory/hook_installer.py — shell-quote the hook command path
  - Was: f"{HOOK_PATH} {hook_name}" — Claude Code passes this through
    sh -c, which tokenises on whitespace. Any user with a space in
    HOME (very common on macOS: "/Users/Firstname Lastname") would
    get a shell error and silent-broken capture.
  - Now: shlex.quote(str(HOOK_PATH)) — handles spaces and any other
    shell metacharacters. New regression test creates a path
    containing "Alice Smith" and asserts the quoting.
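The fix is essentially the following (the path shown is a hypothetical example of the macOS case):

```python
import shlex
from pathlib import Path

# A home directory with a space, the case that broke the old f-string.
hook_path = Path("/Users/Alice Smith/.cce/hooks/cce_hook.sh")
cmd = f"{shlex.quote(str(hook_path))} UserPromptSubmit"
print(cmd)  # '/Users/Alice Smith/.cce/hooks/cce_hook.sh' UserPromptSubmit
```

`shlex.quote` single-quotes any string containing shell metacharacters, so `sh -c` sees the path as one token regardless of spaces.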

cli.py — cce init memory-capture reachability probe
  - After install_settings() wires the hooks, _check_memory_capture_reachable
    looks for the storage_base/serve.port file and tries a TCP connect
    on 127.0.0.1:<port>. Three states reported:
      a) port file missing → "cce serve hasn't been started" with
         clear next-steps text (warns it's silently dropped otherwise)
      b) port file present but nothing listening → "stale" warning
      c) reachable → ✓ confirmation
  - Hooks fail closed (curl ... || true), so this is the only place
    the user gets told that capture isn't actually working before
    they restart Claude Code expecting it to.
  - 4 unit tests cover all four code paths (missing, stale, live,
    unparsable port file).
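The probe's states can be sketched as below (illustrative `check_capture_reachable`; the real probe reads storage_base/serve.port and prints user-facing guidance instead of returning a label):

```python
import socket
import tempfile
from pathlib import Path

def check_capture_reachable(port_file: Path) -> str:
    """Classify hook-server reachability from the serve.port file."""
    if not port_file.exists():
        return "missing"        # cce serve hasn't been started yet
    try:
        port = int(port_file.read_text().strip())
    except ValueError:
        return "unparsable"     # corrupt port file
    try:
        # Probe the loopback listener the hook shim would POST to.
        with socket.create_connection(("127.0.0.1", port), timeout=0.5):
            return "live"
    except OSError:
        return "stale"          # port file present, nothing listening

print(check_capture_reachable(Path(tempfile.mkdtemp()) / "serve.port"))  # missing
```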

scripts/bench_recall.py — recall quality benchmark
  - Seeds a tmp memory.db with 46 known decisions across 7 topics
    (auth, db, ml, infra, perf, testing, frontend) and runs 12 known
    queries — including 2 deliberately off-topic ("how is the weather
    today", "best ice cream flavour") to measure rejection.
  - Reports recall@k, precision@k, MRR per query + aggregate.
  - --min-sim and --vec-max flags let the caller sweep thresholds
    without code changes. Run it whenever bge-small swaps, the
    corpus shape changes, or the recall pipeline is touched.
  - Current defaults (min_sim=0.55, vec_max=0.92): R@5=0.75 P@5=0.40
    MRR=0.67 over the 12-query suite. Tightening to 0.60/0.85 lifts
    precision slightly with no recall loss; looser values lose both.
  - Documents an FTS stop-word leakage in the corpus ("how is the
    weather today" returns 7 because FTS5 OR-matches on "is/the/today"
    substrings) — left as a follow-up since the fix is non-trivial.

Tests: 139/139 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four independent fixes that surfaced from the live test + post-mortem.
All measured against the recall benchmark; aggregate metrics improved
(R=0.68→0.75, P=0.39→0.46, MRR=0.68→0.74) — see scripts/bench_recall.py.

mcp_server.py — FTS5 stop-word filter
  - _FTS_STOP_WORDS: conservative function-word list (~150 entries:
    articles, auxiliaries, pronouns, prepositions, conjunctions,
    interrogatives). Topic words (code, auth, database, improve,
    scale) are deliberately NOT in the list.
  - _strip_stop_words used in three places:
      a) _fts_match_query — FTS5 OR-match no longer hits "is/the/today"
      b) _rank_json_candidates topic embed — sharper topic vector on
         conversational queries
      c) search_decisions_vec / search_turn_summaries_vec topic embed
  - "how is the weather today" now returns 0 false positives (was 7).
    "how can we improve code quality" still returns 39 because
    bge-small finds half an engineering corpus semantically related
    to that phrase — a model limit, not a filter limit. R@5 unchanged
    on real queries; off-topic queries reject cleanly.

hook_server.py — rendezvous port file at default location
  - Authoritative port file still lives at <storage_base>/serve.port.
  - When storage_path is customised in config.yaml (anything other
    than ~/.cce/projects), also write the port to the *default*
    rendezvous location ~/.cce/projects/<name>/serve.port. The hook
    shell script always reads from the default location because it
    has no way to read config.yaml — this is what keeps capture
    wired up for users with custom storage paths.
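The dual-write can be sketched like this — function name and signature are hypothetical; only the two paths and the "mirror when they differ" rule come from the commit:

```python
from pathlib import Path

def write_port_files(port: int, storage_base: Path, project: str,
                     default_base: Path = Path.home() / ".cce" / "projects") -> None:
    """Write the authoritative port file under the configured storage
    base, then mirror it to the default rendezvous path when the two
    differ, so the hook script (which cannot read config.yaml) can
    always find the server."""
    primary = storage_base / "serve.port"
    primary.parent.mkdir(parents=True, exist_ok=True)
    primary.write_text(str(port))

    rendezvous = default_base / project / "serve.port"
    if rendezvous != primary:
        rendezvous.parent.mkdir(parents=True, exist_ok=True)
        rendezvous.write_text(str(port))
```

For users on the default storage path the two locations coincide and only one file is written.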

cli.py — _check_memory_capture_reachable also checks rendezvous
  - Falls back to the default-path rendezvous when the storage-local
    file isn't there. So `cce init` reports "active" correctly even
    for users with custom storage who already have `cce serve`
    running.

cli.py — auto-prune background task in `cce serve`
  - _auto_prune_loop runs prune_old_payloads(days=30) once daily on
    a daemon-thread executor. Staggered start (120s) so it doesn't
    compete with vec-backfill / compression-loop on cold-start.
    Cancellation wired into the existing serve cleanup. Users who
    never invoke `cce sessions prune` manually now get bounded
    memory.db growth automatically.

hook_installer.py — Windows .cmd hook script
  - HOOK_SCRIPT_NAME is now platform-aware: cce_hook.cmd on Windows,
    cce_hook.sh elsewhere. New _hook_script_body() returns the
    matching body. The .cmd version uses cmd.exe syntax, looks up
    the port via %USERPROFILE%\.cce\projects\<name>\serve.port,
    and exits 0 on every error path so capture never blocks the
    user. HOOK_MARKER widened from "cce_hook.sh" to "cce_hook" so
    the uninstall path matches both extensions.
  - +2 tests covering both platform branches via monkeypatch.

bench_recall.py — conversational-query test cases
  - Added "how can we improve code quality" and "what should we do
    about database performance" so future stop-word tweaks can be
    measured against real conversational phrasing instead of just
    keyword queries.

Tests: 141/141 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Warm fastembed model cache" step pre-downloads bge-small to avoid
the xdist worker race on first model load. But four tests in the suite
also use `all-MiniLM-L6-v2` (test_embedder.py, test_retriever.py,
test_embedding_cache.py × 4 cases) — those races weren't covered, so
under sufficient timing pressure CI sees:

  ONNXRuntimeError NO_SUCHFILE — Load model from /tmp/fastembed_cache/
  models--qdrant--all-MiniLM-L6-v2-onnx/.../model.onnx failed.
  File doesn't exist

It manifests intermittently (the run on PR #7 hit it; main was passing).
Cure is the same as for bge-small: warm the model in a single process
before pytest spawns the four xdist workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isions"

The headline problem the memory feature is supposed to solve: you sit
down today, Claude Code opens, and you DON'T have to re-explain what
you decided last week.

Until this commit the data was captured + queryable but not
*surfaced*. The SessionStart hook only inserted the new sessions row;
prior decisions stayed behind a session_recall tool call the agent had
to remember to make. CLAUDE.md said it should, but instruction-following
isn't a guarantee.

memory/hooks.py — handle_session_start now returns plain text
  - New build_session_resume(conn, project) builds a markdown block:
      ## CCE memory · resuming <project>

      **Previous session** (<ended_at>):
        <rollup_summary lines>

      **Recent decisions** (most-recent first):
        - <decision> — <reason> (session: `<sid>`)
        ... (top 5)

      Call session_recall("<topic>") for more, or
      session_timeline("<sid>") to drill in.
  - Empty string for a brand-new project (no awkward header on the
    first session).
  - Reasons truncated at 200 chars so a single rambling decision can't
    blow up the resume.
  - Wrapped in try/except so a bad query never breaks SessionStart.
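The decisions-only branch of the builder can be sketched as below — column names are assumptions inferred from the table names in this PR (the real schema lives in memory/db.py), and the rollup branch is omitted:

```python
import sqlite3

def build_session_resume(conn: sqlite3.Connection, project: str) -> str:
    """Assemble the markdown resume block injected at SessionStart.
    Returns "" for a brand-new project; never raises."""
    try:
        decisions = conn.execute(
            # Illustrative columns; the real query also joins the
            # prior session's rollup_summary.
            "SELECT decision, reason, session_id FROM decisions "
            "ORDER BY created_at DESC LIMIT 5"
        ).fetchall()
        if not decisions:
            return ""  # no awkward header on the first session
        lines = [f"## CCE memory · resuming {project}", "",
                 "**Recent decisions** (most-recent first):"]
        for decision, reason, sid in decisions:
            reason = (reason or "")[:200]  # cap rambling reasons
            lines.append(f"  - {decision} — {reason} (session: `{sid}`)")
        return "\n".join(lines)
    except sqlite3.Error:
        return ""  # a bad query must never break SessionStart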

memory/hook_installer.py — shell script captures SessionStart stdout
  - POSIX: branches on $HOOK_NAME. SessionStart's curl response is
    captured into RESPONSE and printed to stdout. Other hooks keep
    their stdout/stderr discarded as before. Timeout bumped to 2s for
    SessionStart since the resume query is a hair slower than a write.
  - Windows .cmd: same shape using a temp file (cmd can't easily
    capture command output to a variable across multiple lines).
    Cleans the temp file afterwards.

Why this fixes the problem
  Claude Code captures the SessionStart hook's stdout and injects it
  into the model's context at the start of the conversation. So at
  prompt 0 the model already sees the prior rollup + recent decisions
  — no tool call required, no "what did we decide about auth last
  week?" round-trip with the user.

  Live-tested against /home/fazle/trading/v3: 4 trading decisions
  recorded yesterday surface verbatim in the SessionStart response
  body, ready to be injected.

Tests: +1 (test_session_start_returns_resume_with_prior_rollup_and_decisions)
covering the rollup + decisions path. 142/142 across memory + dashboard
+ integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this widening, the resume context that handle_session_start
returns is only injected on cold-start. After the user runs `/clear`
(wipes context) or `/compact` (trims context), Claude Code re-issues a
SessionStart event — but the matcher="" we had before only matched the
default (startup) variant on some Claude Code versions, leaving us
silent exactly when re-injection matters most.

hook_installer.py
  - New HOOK_MATCHERS dict — per-hook matcher overrides keyed by name.
    SessionStart gets "startup|clear|compact"; everything else stays "".
  - install_settings() reads from this dict instead of hardcoding "".
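The dict-plus-fallback shape is small enough to sketch directly; the lookup helper is hypothetical, but the matcher values and the empty-string default are from this commit:

```python
# Per-hook matcher overrides; "" means "match every subtype".
HOOK_MATCHERS = {
    "SessionStart": "startup|clear|compact",
}

def matcher_for(hook_name: str) -> str:
    """install_settings() reads from HOOK_MATCHERS instead of
    hardcoding ""; hooks without an override keep the empty matcher."""
    return HOOK_MATCHERS.get(hook_name, "")
```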

Trade-off
  Matcher is now a regex-like alternation, but Claude Code accepts
  that natively (see claude-mem's hooks.json which uses the same
  three triggers). On a Claude Code version that doesn't recognise
  one of the subtypes, the hook just won't fire for that subtype —
  the others still work, so no behavioural regression.

Tests
  +1 (test_session_start_matcher_covers_clear_and_compact) asserts
  all three triggers are in the SessionStart matcher and the other
  four hooks keep their empty matcher. 143/143 across the suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-merge code-audit found 4 actionable items. Fixed; CI to confirm.

hook_installer.py — Windows hook command quoting (production bug)
  - Was: shlex.quote on every platform, including Windows. shlex emits
    POSIX single quotes — cmd.exe doesn't dequote those, so any Windows
    user with a space in %USERPROFILE% (e.g. C:\Users\Alice Smith\)
    got a silently-broken hook command. Hooks fail closed; no error
    surfaces to the user.
  - Now: new _quote_hook_path branches on _is_windows() and emits
    cmd.exe double quotes on Windows, sh single quotes on POSIX.
  - +1 test (test_install_settings_uses_cmd_quoting_on_windows) using
    monkeypatch to force the Windows branch.

cli.py — CLAUDE.md template now documents session_timeline / session_event
  - The SessionStart resume body's affordance line literally tells the
    model to "Call session_timeline(\"<sid>\")", but CLAUDE.md (which
    primes the agent) didn't document either tool. Bumped block version
    to "3" so cce init re-renders. New "Drilling deeper from a recall
    hit" subsection covers both tools and when to prefer each.

memory/db.py — auto_prune_loop extracted to module level (testable)
  - Was: inline closure inside cli._run_serve. Untestable without
    spinning up the whole MCP server.
  - Now: auto_prune_loop(storage_base, days, initial_delay, interval,
    stop_event) at module level. Test suite injects 0.0/0.05 timing
    instead of waiting 120s + 86400s. Same behaviour; same defaults.
  - +2 tests:
      test_auto_prune_loop_runs_one_iteration — old payload gets
      pruned on the first pass.
      test_auto_prune_loop_stop_event_short_circuits_initial_delay
      — clean exit during the stagger.
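The injectable-timing shape can be sketched as below; `prune_fn` stands in for the real `prune_old_payloads(storage_base, days)` call, and the return value is an addition for illustration:

```python
import threading

def auto_prune_loop(prune_fn, initial_delay: float, interval: float,
                    stop_event: threading.Event) -> int:
    """Run prune_fn once per interval after a staggered start.
    Event.wait() doubles as an interruptible sleep, so setting
    stop_event exits promptly even mid-delay — this is what lets the
    test suite inject 0.0/0.05 timing instead of waiting out
    120s + 86400s. Returns the iteration count."""
    runs = 0
    if stop_event.wait(initial_delay):
        return runs  # stopped during the stagger
    while not stop_event.is_set():
        prune_fn()
        runs += 1
        if stop_event.wait(interval):
            break
    return runs
```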

tests/memory/test_hooks.py — build_session_resume edge cases
  - Pre-merge audit found only the "rollup + decisions" branch was
    tested; the two more-common branches (decisions only, rollup only)
    weren't. Added both as separate tests so a regression that breaks
    the week-1 path (decisions only, no completed-session rollup yet)
    can't pass CI.

tests/test_cli_init_probe.py — rendezvous-fallback path
  - Pre-merge audit found that 8ed9f70's "fall back to default-path
    rendezvous when storage-local is missing" had no test covering
    that branch. Added one — monkeypatches Path.home() and verifies
    the probe finds the listening port via the rendezvous file alone.

Tests: 149/149 across memory + dashboard + integration + CLI.

Deferred to follow-ups (not merge-blockers per the audit):
  · _drain_one_threaded._embedder closure — only matters if two
    compression_loop instances run concurrently, which never happens
    in production today.
  · Rendezvous-write-failure escalation to status — quiet edge case.
  · Windows .cmd %TEMP% with parentheses — already partly mitigated
    by setlocal enabledelayedexpansion; revisit if a real Windows
    user reports it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fazleelahhee fazleelahhee merged commit 674923e into main Apr 28, 2026
3 checks passed
fazleelahhee added a commit that referenced this pull request Apr 28, 2026
Reconciles two independent callbacks for showing motion during long
indexing runs on large repos. They serve different timescales and are
complementary, so this commit keeps both rather than picking one.

pipeline.run_indexing now accepts:
  · embed_progress_fn(current, total) — per-batch numeric ticks during
    the embed phase. Already wired through cli.py to a live progress bar.
  · phase_fn(msg) — string status before each major phase
    ("Embedding 32k chunks (CPU-bound, can take several minutes)…",
    "Writing 32k chunks to vector + FTS + graph index…").
    Closes the in-place chunking bar first so the message doesn't get
    overwritten via \r.

cli.py defines both callbacks; non-verbose TTY runs see a chunking bar
→ phase line → embed bar → phase line → embed bar's final tick.

embedder.py keeps the canonical (current, total) numeric API. The WIP's
alternate string-message API has been dropped — cli.py's bar already
delivers the "still alive" intent through chunks/N motion.
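The callback contract can be sketched as below — a stripped-down stand-in, not the real run_indexing (which also chunks, writes the vector/FTS/graph index, and batches embeds):

```python
def run_indexing(chunks, embed_fn, embed_progress_fn=None, phase_fn=None):
    """Drive the embed phase with both callback styles: a string phase
    announcement up front, then numeric (current, total) ticks so the
    caller can render a live bar."""
    total = len(chunks)
    if phase_fn:
        phase_fn(f"Embedding {total} chunks…")
    embedded = []
    for i, chunk in enumerate(chunks, start=1):
        embedded.append(embed_fn(chunk))
        if embed_progress_fn:
            embed_progress_fn(i, total)  # final tick reports (total, total)
    if phase_fn:
        phase_fn(f"Writing {total} chunks to the index…")
    return embedded
```

Both callbacks default to None, which is how the two branches could add them independently without breaking existing callers.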

tests/indexer/test_pipeline_phase_progress.py
  · test_phase_fn_announces_embedding_and_ingest — pins that
    "Embedding…" and "Writing…" phase markers fire from run_indexing
    so a 10-30 min embed phase on a 7035-file repo doesn't look hung.
  · test_embedder_calls_progress_fn_during_inference — rewritten to
    use the canonical numeric callback; asserts the final tick reports
    full chunk count and embeddings actually attached.

Resolution context: PR #7 (memory feature) merged into main while these
indexer-progress changes were stashed. Both branches independently added
new keyword args to run_indexing's signature, creating a parameter-list
conflict that resolved cleanly by keeping both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>