feat(memory): claude-mem feature parity — 5-step series consolidated #7
Captures the 6 design decisions from brainstorming: auto + explicit capture with a source column, a background compression worker in cce serve, per-turn + per-session rollup granularity, an extended session_recall plus new session_timeline/session_event MCP tools, a one-shot cce sessions migrate command, and three dashboard panels. The compressor is extractive, using the BAAI/bge-small-en-v1.5 model already loaded for the index — no new dependencies, no extra RAM, no Ollama required. Ships as 5 sequential PRs off this branch; each is independently reviewable.
PR 1 of 5 against feature/memory-claude-mem-parity. Lays the storage foundation: per-project SQLite at ~/.cce/projects/<name>/memory.db with the v1 schema (sessions, prompts, tool_events, tool_event_payloads, turn_summaries, decisions, code_areas, pending_compressions, migrated_files, schema_versions) plus FTS5 virtual tables and triggers for prompts / decisions / turn_summaries.
Adds `cce sessions migrate` — an idempotent one-shot importer for legacy per-session JSON files (current path and pre-rebrand ~/.claude-context-engine/...). Imported decisions and code areas are tagged source='migrated' so future session_recall can rank them. Consumed JSONs are archived into sessions/migrated.zip and then removed. No behaviour change to the existing JSON capture path; hooks and the compression worker land in PR 2 and PR 3.
10 new unit tests cover: schema bootstrap, idempotent reconnection, foreign-key enforcement, FTS triggers, migration of session JSON + decisions_log archive, idempotent rerun, archive-and-remove. Full suite: 311 passed, 1 skipped. See docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
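The external-content FTS5 table + trigger pattern described above can be sketched as follows. This is a minimal illustration for the decisions table only; the real schema's column names and the prompts / turn_summaries variants are assumptions, not the shipped DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decisions (
    id INTEGER PRIMARY KEY,
    session_id TEXT,
    decision TEXT NOT NULL,
    reason TEXT,
    source TEXT DEFAULT 'manual'
);
-- External-content FTS5 table: stores only the index, reads text from decisions.
CREATE VIRTUAL TABLE decisions_fts USING fts5(
    decision, reason, content='decisions', content_rowid='id'
);
-- Triggers keep the FTS index in sync with the source table.
CREATE TRIGGER decisions_ai AFTER INSERT ON decisions BEGIN
    INSERT INTO decisions_fts(rowid, decision, reason)
    VALUES (new.id, new.decision, new.reason);
END;
CREATE TRIGGER decisions_ad AFTER DELETE ON decisions BEGIN
    INSERT INTO decisions_fts(decisions_fts, rowid, decision, reason)
    VALUES ('delete', old.id, old.decision, old.reason);
END;
""")

conn.execute(
    "INSERT INTO decisions (decision, reason, source) VALUES (?, ?, ?)",
    ("Use SQLite WAL mode", "concurrent reader-friendly", "migrated"),
)
conn.commit()
# External-content FTS5 fetches column values from the content table on read.
rows = conn.execute(
    "SELECT decision FROM decisions_fts WHERE decisions_fts MATCH ?", ("sqlite",)
).fetchall()
```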
PR 2 of 5 against feature/memory-claude-mem-parity, stacked on PR 1. Adds the auto-capture pipeline:
- src/context_engine/memory/hooks.py — aiohttp handlers for /hooks/SessionStart, /hooks/UserPromptSubmit, /hooks/PostToolUse, /hooks/Stop, /hooks/SessionEnd. Each writes the appropriate row(s) to memory.db. Compression for the just-ended turn is enqueued in pending_compressions (a UNIQUE constraint dedupes the Stop + next-prompt double-fire). All errors are logged and return 202 — capture is best-effort and must never block the user's flow.
- src/context_engine/memory/hook_server.py — loopback aiohttp listener on a random free port, started as a background asyncio task from cce serve's _run_serve. The port is written to <storage_base>/serve.port. Cleanly shut down on MCP-server exit.
- src/context_engine/memory/hook_installer.py — installs ~/.cce/hooks/cce_hook.sh (the thin shell shim that POSTs hook payloads to the local port) and wires .claude/settings.json with all 5 lifecycle entries. Idempotent. Preserves user-added hooks. Uninstall removes only entries whose command points at our script.
- cce init wires the installer in step 5; _run_serve spawns the hook server alongside the MCP stdio loop and watcher.
Tests (15 new, full suite 326 passed):
- tests/memory/test_hooks.py: integration tests via aiohttp_client for all 5 endpoints, including dedup, prompt-number assignment, the payload sidecar table, and 400 on missing session_id.
- tests/memory/test_hook_installer.py: script write + chmod, idempotent reinstall, settings.json merge preserving user hooks, uninstall removing only ours.
pyproject.toml: pytest-aiohttp added to dev/dependency-groups.
No change to the recall surface — session_recall still reads JSON. PR 4 retires the JSON path and points recall at memory.db.
PR 3 of 5 against feature/memory-claude-mem-parity, stacked on PR 2. Adds the background compression half:
- src/context_engine/memory/extractive.py — sentence splitter + centroid-based extractive summariser. No new dependencies; it takes any embedder exposing embed_query(str) -> Iterable[float]. Output is pure source text; no synthesis means no hallucination.
- src/context_engine/memory/compressor.py — compress_turn() and compress_session_rollup() build the candidate text from prompts + tool_events (+ payloads), run the extractive summariser using the bge-small embedder already loaded for the index, and persist to turn_summaries / sessions.rollup_summary with tier='extractive'. Falls back to truncation when the embedder is None or extractive raises. compression_loop() drains pending_compressions every 5 s, oldest first, single-flight by design. Queue rows that error are kept with attempts++ and last_error stamped for retry.
- _run_serve in cli.py spawns the compression worker alongside the hook server. It owns its own sqlite connection to avoid cross-thread use; gracefully cancelled and closed on MCP-server exit.
Tests (14 new, full suite 340 passed):
- tests/memory/test_extractive.py: sentence split, centroid neighbour selection, source-order preservation, truncation-fallback shape.
- tests/memory/test_compressor.py: turn compression with extractive + truncation tiers, session rollup combining turn summaries, empty rollup when no turns, _drain_one queue pop, compression_loop with stop_event for graceful shutdown.
Recall surface unchanged from main; PR 4 retires the JSON path.
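The centroid-based extractive summariser can be sketched as below. This is a simplified stand-in, not the shipped extractive.py: the naive regex sentence splitter, the toy letter-frequency embedder, and the function names here are illustrative assumptions; only the embed_query contract and the rank-by-centroid, emit-in-source-order shape come from the description above.

```python
import math
import re
from typing import Callable, Iterable, List

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def extractive_summary(text: str,
                       embed_query: Callable[[str], Iterable[float]],
                       max_sentences: int = 3) -> str:
    # Naive sentence split; the real splitter is presumably more careful.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    vecs = [list(embed_query(s)) for s in sentences]
    dim = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    # Rank by closeness to the centroid, then emit in original source order,
    # so the summary is pure extracted text with no synthesis.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: _cosine(vecs[i], centroid), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

def toy_embed(s: str) -> List[float]:
    # Deterministic toy embedder standing in for bge-small's embed_query.
    v = [0.0] * 26
    for ch in s.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - 97] += 1.0
    return v
```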
PR 4 of 5 against feature/memory-claude-mem-parity, stacked on PR 3.
Wires the MCP retrieval surface to memory.db while preserving the
existing JSON path for backward compatibility.
- ContextEngineMCP opens memory.db on startup and seeds an
INSERT OR IGNORE sessions row so manual record_decision /
record_code_area dual-writes don't fail the FK constraint when the
SessionStart hook hasn't fired yet (test envs, future-non-CC clients).
- session_recall now folds three new candidate sources on top of the
existing JSON sessions / consolidated decisions:
- decisions (manual + migrated, last 200 by recency)
- code_areas (manual + migrated, last 200 by recency)
- turn_summaries (last 200 turns, the layer-1 compact index)
Tags include source and session_id so the agent can drill via the
new tools.
- session_timeline(session_id, limit=20) — layer 2. Returns the
session's turn_summaries with rollup + status header.
- session_event(event_id) — layer 3. Returns the raw input/output
payload for one tool_event, with a dedicated "aged out" message
when the payload row was pruned by retention.
- record_decision and record_code_area now dual-write to memory.db
with source='manual'. Prior JSON write path remains active so a
rollback to a previous PR doesn't lose recall coverage. The JSON
write side can be retired once parity is confirmed in production.
Tests (10 new, full suite 350 passed):
- tests/memory/test_mcp_recall.py covers dual-write of decisions and
code_areas, session_timeline with seeded summaries, session_event
payload roundtrip + aged-out message + invalid id, session_recall
surfacing memory.db decisions, and TOOL_NAMES registration.
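The FK-safe seeding described above (an INSERT OR IGNORE sessions row before manual dual-writes) can be sketched as follows. Table and column names are assumptions loosely following the v1 schema; the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sessions (session_id TEXT PRIMARY KEY, status TEXT DEFAULT 'active');
CREATE TABLE decisions (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(session_id),
    decision TEXT NOT NULL
);
""")

def seed_session(conn, session_id):
    # Idempotent: a no-op when the SessionStart hook already inserted the row.
    conn.execute("INSERT OR IGNORE INTO sessions (session_id) VALUES (?)",
                 (session_id,))

def record_decision(conn, session_id, decision):
    seed_session(conn, session_id)   # FK parent guaranteed to exist
    conn.execute("INSERT INTO decisions (session_id, decision) VALUES (?, ?)",
                 (session_id, decision))
    conn.commit()

# Works even though no hook ever fired for "s1" (test envs, non-CC clients).
record_decision(conn, "s1", "first")
record_decision(conn, "s1", "second")   # second seed is a no-op
```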
PR 5 of 5 against feature/memory-claude-mem-parity, stacked on PR 4.
Adds three new memory.db-backed views to the existing CCE dashboard.
The dashboard never holds a memory.db handle — each request opens a
short-lived connection so the MCP server's writes are never blocked.
Backend (src/context_engine/dashboard/server.py):
- GET /api/memory/sessions[?limit] — list sessions, most-recent first.
- GET /api/memory/sessions/{id}/timeline — one session's metadata +
ordered turn_summaries.
- GET /api/memory/decisions[?q&source] — FTS5 search over decisions
with optional source facet (manual|auto|migrated). User input is
phrase-quoted so FTS5 metacharacters like '-' are treated literally
('bge-small' matches its own substring rather than parsing as
'bge AND NOT small'). On any FTS5 syntax fall-through, fall back to
unranked recent listing — no 500s.
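The phrase-quoting fix can be sketched as a small helper; the function name here is hypothetical, but the escaping rule (double any embedded double quote, then wrap the whole input in double quotes) is standard FTS5 string syntax.

```python
import sqlite3

def fts_phrase(user_input: str) -> str:
    # Wrap user input in double quotes so FTS5 treats it as a literal phrase;
    # embedded double quotes are escaped by doubling them. Without this,
    # metacharacters like '-' in 'bge-small' hit FTS5's query parser.
    return '"' + user_input.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE decisions_fts USING fts5(decision)")
conn.execute("INSERT INTO decisions_fts VALUES ('Use bge-small for embeddings')")
hits = conn.execute(
    "SELECT decision FROM decisions_fts WHERE decisions_fts MATCH ?",
    (fts_phrase("bge-small"),),
).fetchall()
```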
Frontend (src/context_engine/dashboard/_page.py):
- New 'Memory' nav entry between Sessions and Analytics.
- New page-memory section with a Sessions / Decisions tab toggle.
- Sessions tab shows a list; clicking a row opens a turn-by-turn
timeline panel below with rollup at the top.
- Decisions tab is a search box + source select, results in a table
tagged by source.
Tests (9 new in tests/dashboard/test_memory_endpoints.py, full suite
359 passed, 1 skipped):
- Sessions list ordering, empty case when memory.db absent.
- Timeline returns header + turns for known session, null for unknown.
- Decisions FTS5 with and without query, source facet, combined
filters, and the phrase-quoting fix for hyphenated input.
Closes the 5-PR series. Spec at
docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
Consolidates the 11 actionable items from the 5-PR memory stack into one
fixup commit on the unified ai-memory branch.
migrate.py
- Archive *before* mark-imported + commit, with rollback on zip failure
so a failed archive no longer leaves files stuck imported-but-not-archived.
- Preserve session_id linkage from decisions_log.json when the referenced
session already exists (was unconditionally NULL).
- Memoise _session_exists per-file (constant per archive entry).
- Use `timestamp is not None` so legacy 0/0.0 timestamps keep their original
ordering instead of being stamped to "now".
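The truthiness pitfall behind the `timestamp is not None` fix is worth a two-line sketch. The helper name is hypothetical; the behaviour (legacy 0/0.0 epoch timestamps survive, only a genuinely missing timestamp gets stamped to "now") is the fix described above.

```python
import time

def coerce_timestamp(ts):
    # `ts or time.time()` would treat legacy 0 / 0.0 timestamps as falsy and
    # restamp them to "now", scrambling archive ordering. The explicit None
    # check keeps them.
    return ts if ts is not None else time.time()
```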
compressor.py
- Drop unused `Any` import.
- Cap raw_input at _TOOL_INPUT_CHAR_CAP (4 KB) before json.loads so multi-MB
patch payloads don't stall the compression worker.
- Yield the asyncio loop after every drained item, with a short breath every
5 items, so a backlog doesn't monopolise mcp.run_stdio().
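The pacing scheme above (yield every item, short breath every 5) can be sketched like this. The function and its signature are illustrative, not the real compression_loop; `drain_one` stands in for whatever pops one queue item and returns False on empty.

```python
import asyncio

async def drain_queue(drain_one, pending, breath_every=5, breath_s=0.05):
    """Drain a backlog without monopolising the event loop.

    drain_one: sync callable that processes one item from `pending`,
    returning False when the queue is empty. A sketch of the pacing
    described above, not the shipped loop.
    """
    drained = 0
    while drain_one(pending):
        drained += 1
        if drained % breath_every == 0:
            await asyncio.sleep(breath_s)   # short breath every N items
        else:
            await asyncio.sleep(0)          # yield to other tasks every item
    return drained
```

`asyncio.sleep(0)` is the idiomatic zero-cost yield point: it suspends the coroutine just long enough for mcp.run_stdio() and other tasks to run.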
mcp_server.py
- Use FTS5 MATCH on decisions_fts / turn_summaries_fts (and a LIKE filter on
code_areas) to prefilter recall candidates instead of embedding the latest
600 rows on every session_recall call.
- Clamp `limit` in session_timeline to 1..200 with a helper that handles bad
input cleanly.
- Wrap the sessions metadata query and the session_event payload query in
try/except so DB errors return a specific message.
- Handle NULL raw_input/raw_output in session_event without rendering the
string "None".
- Update the dual-write comment to reflect that recall now goes through FTS5.
design spec
- Drop the inaccurate "Eight tables" sentence; describe the support and
FTS5 tables that ship in v1 instead.
Tests: 43/43 memory + dashboard pass; 344/345 across the full suite (the
single failure is test_ollama_client hitting httpx.ReadTimeout because no
Ollama is running in this environment — unrelated).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 28, 2026
Closed
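The graceful-degradation check behind the FTS-only fallback below can be sketched via a sqlite_master probe; the probe SQL and the stub body are assumptions, while the helper names (`has_vec_tables`, `search_decisions_vec`) and the "helpers no-op when the extension is missing" behaviour come from the description.

```python
import sqlite3

def has_vec_tables(conn: sqlite3.Connection) -> bool:
    # If sqlite-vec failed to load, the vec0 virtual tables were never
    # created; every semantic-search helper checks this and no-ops.
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' "
        "AND name IN ('decisions_vec', 'turn_summaries_vec')"
    ).fetchone()
    return row[0] == 2

def search_decisions_vec(conn, embedding, limit=10):
    # Graceful degradation: with no vec tables, recall stays FTS-only.
    if not has_vec_tables(conn):
        return []
    # The real helper (sqlite-vec extension loaded) would run:
    #   SELECT rowid, distance FROM decisions_vec
    #   WHERE embedding MATCH ? ORDER BY distance LIMIT ?
    raise NotImplementedError("requires the sqlite-vec extension")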
Adds a vec0 layer over the existing FTS5 store so session_recall can
match paraphrases and synonyms, not just lexical overlap. sqlite-vec
is already a hard dep (used by storage/vector_store.py for code chunks)
and bge-small is already loaded for the index, so no new model and no
new runtime cost beyond the embed_query() per write/query.
memory/db.py
- CURRENT_VERSION → 2; bootstrap creates `decisions_vec` and
`turn_summaries_vec` (vec0 float[384]) alongside the v1 tables.
- `connect()` loads sqlite-vec via `enable_load_extension`; if the
load fails the db still opens but vec tables are skipped (FTS-only
fallback) and helpers no-op.
- In-place v1 → v2 upgrade adds the empty vec tables; existing rows
are populated by `backfill_vec_tables(conn, embedder)` on the next
MCP-server start.
- New helpers: `record_decision_vec`, `record_turn_summary_vec`,
`search_decisions_vec`, `search_turn_summaries_vec`,
`backfill_vec_tables`, `has_vec_tables`. `_write_vec_row` swallows
dim mismatches so a swapped embedder doesn't break source-table
inserts — the row simply isn't semantically searchable until vec
tables are rebuilt.
- We don't add vec tables for prompts (raw user text is rarely the
right semantic anchor) or code_areas (file-path keyed; LIKE is
enough).
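The graceful-degradation path above (db opens, vec tables skipped, helpers no-op) can be sketched via a sqlite_master probe. The probe SQL and stub body are assumptions; the helper names (`has_vec_tables`, `search_decisions_vec`) and the FTS-only fallback behaviour come from the description.

```python
import sqlite3

def has_vec_tables(conn: sqlite3.Connection) -> bool:
    # If sqlite-vec failed to load, the vec0 virtual tables were never
    # created; every semantic-search helper checks this and no-ops.
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' "
        "AND name IN ('decisions_vec', 'turn_summaries_vec')"
    ).fetchone()
    return row[0] == 2

def search_decisions_vec(conn, embedding, limit=10):
    # Graceful degradation: with no vec tables, recall stays FTS-only.
    if not has_vec_tables(conn):
        return []
    # The real helper (sqlite-vec extension loaded) would run something like:
    #   SELECT rowid, distance FROM decisions_vec
    #   WHERE embedding MATCH ? ORDER BY distance LIMIT ?
    raise NotImplementedError("requires the sqlite-vec extension")
```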
memory/compressor.py
- `compress_turn` writes the new turn summary's embedding to
turn_summaries_vec right after persisting the row.
integration/mcp_server.py
- At MCP-server startup, run `backfill_vec_tables` so projects that
ran on v1 pick up semantic recall on next launch.
- `record_decision` dual-write now also writes to decisions_vec.
- `_search_sessions` is now hybrid: it unions FTS5 hits and vec hits
by row id (no double-formatting), then runs the existing cosine
rank over the merged candidate pool. Empty vec hits (extension
missing or tables empty) leave it FTS-only — no behaviour change
in the degraded path.
tests/memory/test_db.py
- Five new tests cover v2 bootstrap, decision/turn vec write+search,
backfill on a v1-shaped db, and the v1→v2 upgrade-in-place path.
- `_FakeEmbedder` produces 384-dim deterministic vectors so tests
don't pay fastembed init cost.
Tests: 78/78 memory + dashboard + integration. The single Ollama timeout
in the full suite is unrelated and predates this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Substantial follow-up to the sqlite-vec layer. Five concrete improvements,
each addressing a real issue from the deep review:
session_recall now returns a TL;DR header
Top extractive sentences (3) pulled from the top-N matches via the same
bge-small extractive worker the compressor uses — no LLM call, no
hallucination, ~50 ms on the asyncio thread. Header is suppressed when
there are <3 matches (would just echo them).
Format:
TL;DR (N matches for 'topic'):
<2-3 extracted sentences>
Source matches:
- [decision src=...|sid:...] ...
- [turn sid:...|n:...] ...
Provenance tags survive so callers can drill via session_event /
session_timeline; tag prefix is stripped before summarisation so the
summariser sees content, not metadata.
Hybrid recall via reciprocal rank fusion
The previous "embed every candidate" pipeline is gone. Each source
produces its own ranked list (FTS5 decisions, FTS5 turns, vec
decisions, vec turns, JSON-cosine, code_areas LIKE), then `_rrf_merge`
fuses them via 1/(60+rank). Items found by multiple sources rise.
Vec hits no longer get re-embedded — sqlite-vec's rank is preserved.
RRF k=60 is the canonical Cormack/Clarke/Buettcher 2009 value.
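The fusion step can be sketched in a few lines; the function name mirrors `_rrf_merge` but the signature is an assumption (ranked lists of hashable ids, best first).

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score(item) = sum over lists of 1/(k + rank).

    Items surfaced by several sources accumulate score and rise; each
    source's internal rank is preserved, so vec hits never need
    re-embedding. k=60 per Cormack, Clarke & Buettcher (2009).
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, an item ranked 2nd by FTS5 and 1st by vec search outranks every single-source item near the top of either list.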
Backfill on a daemon thread
`_spawn_vec_backfill` opens its own connection (sqlite3 enforces
check_same_thread) and embeds historical decisions/turns out of band,
so MCP startup no longer stalls on a many-second embed-everything
sweep for projects that ran on v1.
Cleanup triggers for orphaned vec rows
decisions_vec_ad / turn_summaries_vec_ad fire AFTER DELETE on the
source tables and drop the matching vec rowid. Without these, FK
cascades / explicit deletes leaked rows in the vec tables. Triggers
are added on bootstrap and on v1→v2 upgrade.
Refactor: _search_sessions split into focused methods
_collect_json_candidates / _rank_json_candidates /
_collect_memory_db_candidates / _format_decisions_in_id_order /
_format_turns_in_id_order. Each is independently testable; the
top-level method is now a 20-line composition.
Tests
+7 new tests (RRF behaviour, tag stripping, TL;DR present/absent,
vec-on-source-delete cleanup for both decisions and turn_summaries).
85/85 memory + dashboard + integration pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small, independent fixes that shave the per-call token budget the
agent pays for memory.
mcp_server.py
- _rrf_merge now dedupes by stripped content, so the same decision
showing up as both [decision src=manual] and [decision src=migrated]
during the JSON↔memory.db dual-write window collapses into one
boosted entry instead of inflating recall by ~10–20%.
- _format_decisions_in_id_order / _format_turns_in_id_order append
a relative-time hint ("3d ago") and a callable drill affordance
("→ session_timeline(\"<sid>\")" / "→ session_event(id=<n>)").
Saves the agent a follow-up call most of the time and gives the
model a temporal signal it previously had to infer.
- session_event applies a read-time cap (_EVENT_PAYLOAD_READ_CAP=4 KB)
via the new _truncate_payload helper. Inputs already had a 4 KB
write cap; outputs were stored uncapped, so a captured 50 KB Bash
stdout previously re-fed ~12 k tokens on every fetch.
- New helper: _humanise_relative_time (just-now / 5m / 3h / 4d /
2mo / 1y), defensive about None and bad input.
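A sketch of the humaniser, assuming bucket boundaries of 60 s / 1 h / 1 d / 30 d / 365 d (the exact thresholds in the real `_humanise_relative_time` may differ):

```python
import time

def humanise_relative_time(ts, now=None):
    # Defensive about None and bad input: returns "" rather than raising.
    try:
        delta = (now if now is not None else time.time()) - float(ts)
    except (TypeError, ValueError):
        return ""
    delta = max(delta, 0.0)
    if delta < 60:
        return "just now"
    # Largest unit that fits wins: 1y > 2mo > 4d > 3h > 5m.
    for unit, seconds in (("y", 31_536_000), ("mo", 2_592_000),
                          ("d", 86_400), ("h", 3_600), ("m", 60)):
        if delta >= seconds:
            return f"{int(delta // seconds)}{unit} ago"
    return "just now"
```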
Tests
- +5 tests covering RRF dedup, recency formatting, payload
truncation, and the new recall-line drill affordance.
- 114/114 memory + dashboard + integration pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each compression_loop iteration now runs the heavy work — embed_query
calls + SQLite INSERT — on a worker thread via asyncio.to_thread, with
a thread-local sqlite3 connection (sqlite3 enforces same-thread).
Previously a 50-turn backlog could freeze mcp.run_stdio() for ~30 s
while the loop drained synchronously. Now the asyncio thread only runs
the queue peek + sleep pacing.
compressor.py
- _drain_one_sync(conn, embedder) — pure-sync; called from either the
main thread (tests) or a worker thread (production).
- _drain_one_threaded(db_path) — opens worker-local conn, calls
_drain_one_sync, closes. Reads the embedder off the function's
attribute set by compression_loop, which keeps the to_thread
closure-free (no risk of capturing the asyncio loop).
- _drain_one(conn, embedder) — async test shim around _drain_one_sync.
- compression_loop now takes db_path; passes it through to_thread.
Still accepts a sqlite3.Connection for back-compat with the
existing test that drives the loop directly.
cli.py
- cce serve no longer opens a long-lived compression_conn; passes
memory_db_path(storage_base) to compression_loop.
Tests: 11/11 compressor + memory_loop pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caps memory.db growth and finally makes the "raw payload aged out"
branch in session_event reachable.
memory/db.py
- prune_old_payloads(conn, days=30) — finds payloads referenced
only by tool_events older than `days`, NULLs out raw_output and
sets raw_input='' (raw_input has NOT NULL on the v1 schema, so
'' is its aged-out sentinel). size_bytes -> 0. Returns counts.
- tool_events.summary stays — the gist of an aged event is still
available via session_timeline / session_recall.
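A sketch of the retention pass. Column names (payload_id, created_at, size_bytes) and the one-event-per-payload simplification are assumptions; the '' sentinel for raw_input and the NULLed raw_output come from the description above.

```python
import sqlite3
import time

def prune_old_payloads(conn: sqlite3.Connection, days: int = 30) -> int:
    # Age out raw payloads whose tool_events are older than the window.
    # raw_input is NOT NULL in the schema, so '' is its aged-out sentinel;
    # raw_output can go NULL. The raw_input != '' guard makes the pass
    # idempotent (second run touches nothing).
    cutoff = time.time() - days * 86_400
    cur = conn.execute(
        """
        UPDATE tool_event_payloads
        SET raw_input = '', raw_output = NULL, size_bytes = 0
        WHERE id IN (
            SELECT payload_id FROM tool_events
            WHERE payload_id IS NOT NULL AND created_at < ?
        ) AND raw_input != ''
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```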
cli.py
- cce sessions prune now does two jobs:
1. JSON sessions consolidation (existing)
2. memory.db raw-payload retention (new, default 30d)
- --retain-payloads-days flag for the second job.
- Output is split per-job so it's clear what ran.
mcp_server.py
- _handle_session_event aged-out check accepts the new sentinel
(`not row["raw_input"]`) in addition to NULL — previously the
check was unreachable because nothing wrote NULL.
Tests: 115/115. New test verifies (a) old payloads are aged out,
(b) recent payloads are kept, (c) tool_events.summary is untouched,
(d) the prune is idempotent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Answers "is memory broken?" in one command, without `cce serve`. Replaces
the previous workflow of opening sqlite3 against memory.db.
cli.py
- New `cce sessions status` group entry. Reports:
project + storage path
memory.db path + size in KB
schema version + sqlite-vec availability
sessions count by status (active/completed/failed)
decisions count by source (manual/auto/migrated)
compressed turn_summaries count
pending_compressions queue depth + max attempts (✗ if stuck)
vec coverage: decisions=N/total, turns=N/total
retained raw payload count + estimated MB
- Falls back to a "not initialised" message with a how-to-bootstrap
hint when memory.db doesn't exist yet.
Tests: 3 new in tests/test_cli_sessions_status.py covering missing-db,
populated-db rendering (schema, counts, queue drained), and stuck-queue
warning surfacing. 118/118 across the memory + dashboard + integration
+ CLI suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three production-affecting defects surfaced during a deep wire-up audit
of the ai-memory branch.
pyproject.toml
- Promote aiohttp from optional [http] extra to a core dependency.
The hook server (memory/hook_server.py) and the hook handlers
(memory/hooks.py) import it unconditionally — the optional gate
was a footgun that left default installs unable to capture.
- Keep [http] as an empty back-compat marker so existing extras
references don't break.
memory/db.py — backfill_vec_tables is now incremental
- Was: "only run if vec table is empty". Effect: the moment a single
decision was recorded manually, all subsequently-migrated rows
were permanently invisible to semantic recall — startup backfill
skipped because vec was no longer empty.
- Now: embed any source rows missing from vec, regardless of vec
population. Idempotent and safe to run on every MCP startup.
Picks up rows imported by `cce sessions migrate` (which has no
embedder), rows captured while the vec extension was unavailable,
and the original v1→v2 upgrade backfill.
integration/mcp_server.py — _handle_session_event triage
- Three states are now distinguished:
a) payload_id IS NULL → "no captured payload — only the descriptor"
b) payload_id present, raws cleared → "aged out of retention window"
c) raws populated → normal payload render
- Previously (a) and (b) collapsed into the "aged out" branch — the
user would be told their payload was retention-pruned even when
no payload row had ever existed.
Tests
- test_session_event_returns_no_payload_message_when_payload_id_null:
keeps the historical NULL-payload coverage but renames + asserts
on the correct (non-aged-out) message.
- test_session_event_returns_aged_out_message_after_prune: new test
that creates a real payload row, runs prune_old_payloads(days=0),
then exercises the actual aged-out branch.
- 119/119 across memory + dashboard + integration + CLI.
- 15/15 across the previously-skipped tests/memory/test_hooks.py
and test_hook_installer.py (pytest-aiohttp + aiohttp now in env).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test against /home/fazle/trading/v3 with bge-small + 4 real
trading decisions surfaced three real bugs the unit tests didn't catch.
mcp_server.py — RRF dedup widened
- _content_key strips both the [tag] prefix *and* the trailing
" · 5m ago · → session_timeline(...)" affordance, so the same
decision rendered through different paths (memory.db with hints
+ JSON history without) collapses to one entry instead of
appearing twice. Previously the unit dedup test passed because
both inputs had the same suffix; the live test caught the gap.
- When two paths produce the same content key, the richer-rendered
form (with the affordance hints) wins as the visible label.
mcp_server.py — TL;DR rewritten as bullets
- Was: extractive_summary returned a space-joined paragraph because
the underlying joiner is " ". With short matches and no
sentence-ending punctuation, the result was a wall of text.
- Now: embed each match, score by cosine to the centroid, render
the top-3 most central matches as bullet points. Same algorithm,
readable output.
mcp_server.py — JSON-cosine embeds clean content
- Was: embed_query("[decision] Roll positions at expiry-2 — Avoids
assignment risk on Friday") — the [tag] prefix is metadata noise
that inflates similarity for unrelated topics.
- Now: embed_query(_content_key(text)) — clean signal, no metadata.
memory/db.py — vec distance threshold + tuning
- _VEC_MAX_DISTANCE = 0.92 (was 1.0). bge-small's noise floor on
short English is cosine ≈ 0.50 — random off-topic queries
against the trading corpus hit 0.535. So 0.58 (L2 ≤ 0.92) is the
threshold that keeps real paraphrase matches and rejects noise.
- search_*_vec accept a max_distance kwarg so deterministic test
embedders (which don't satisfy bge-small-tuned thresholds) can
pass max_distance=99.0.
mcp_server.py — _SESSION_RECALL_MIN_SIM 0.35 → 0.55
- Aligned with the same noise-floor reasoning. 0.55 catches the
real "risk management" → "Risk limit at 2%" paraphrase (0.638)
while rejecting "how is the weather today" against trading.
Live test results (against /home/fazle/trading/v3 with 4 decisions):
· "risk management" → 2 relevant matches (Risk limit,
Roll positions w/ "assignment risk")
· "machine learning model" → 2 matches (XGBoost #1 + Risk limit)
· "sqlite" → 2 matches (SQLite #1)
· "how is the weather today" → 0 matches (clean rejection)
Persistence verified: fresh MCP restart recalls decisions from the
prior session via the existing memory.db, with the original session_id
preserved in the affordance hint.
Tests: 134/134 across memory + dashboard + integration + CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three independent quality-of-life wins that surfaced from the live test
+ post-mortem review.
memory/hook_installer.py — shell-quote the hook command path
- Was: f"{HOOK_PATH} {hook_name}" — Claude Code passes this through
sh -c, which tokenises on whitespace. Any user with a space in
HOME (very common on macOS: "/Users/Firstname Lastname") would
get a shell error and silent-broken capture.
- Now: shlex.quote(str(HOOK_PATH)) — handles spaces and any other
shell metacharacters. New regression test creates a path
containing "Alice Smith" and asserts the quoting.
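The quoting fix is small enough to show whole; the helper name is hypothetical, but shlex.quote is stdlib and this is exactly the spaces-in-HOME case described above.

```python
import shlex
from pathlib import Path

def hook_command(hook_path: Path, hook_name: str) -> str:
    # shlex.quote survives spaces and other shell metacharacters when
    # Claude Code passes the command through sh -c.
    return f"{shlex.quote(str(hook_path))} {hook_name}"
```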
cli.py — cce init memory-capture reachability probe
- After install_settings() wires the hooks, _check_memory_capture_reachable
looks for the storage_base/serve.port file and tries a TCP connect
on 127.0.0.1:<port>. Three states reported:
a) port file missing → "cce serve hasn't been started" with
clear next-steps text (warns it's silently dropped otherwise)
b) port file present but nothing listening → "stale" warning
c) reachable → ✓ confirmation
- Hooks fail silently by design (curl ... || true), so this probe is
the only place the user gets told that capture isn't actually
working before they restart Claude Code expecting it to.
- 4 unit tests cover all four code paths (missing, stale, live,
unparsable port file).
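The four-state probe can be sketched with stdlib sockets; the function name and string return values are illustrative, not the real _check_memory_capture_reachable API.

```python
import socket
from pathlib import Path

def check_capture_reachable(port_file: Path) -> str:
    # Four states: missing port file, unparsable contents, nothing
    # listening (stale), or a live listener.
    if not port_file.exists():
        return "missing"        # cce serve hasn't been started
    try:
        port = int(port_file.read_text().strip())
    except ValueError:
        return "unparsable"
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=0.5):
            return "live"
    except OSError:
        return "stale"          # port file present, nothing listening
```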
scripts/bench_recall.py — recall quality benchmark
- Seeds a tmp memory.db with 46 known decisions across 7 topics
(auth, db, ml, infra, perf, testing, frontend) and runs 12 known
queries — including 2 deliberately off-topic ("how is the weather
today", "best ice cream flavour") to measure rejection.
- Reports recall@k, precision@k, MRR per query + aggregate.
- --min-sim and --vec-max flags let the caller sweep thresholds
without code changes. Run it whenever bge-small swaps, the
corpus shape changes, or the recall pipeline is touched.
- Current defaults (min_sim=0.55, vec_max=0.92): R@5=0.75 P@5=0.40
MRR=0.67 over the 12-query suite. Tightening to 0.60/0.85 lifts
precision slightly with no recall loss; looser values lose both.
- Documents an FTS stop-word leakage in the corpus ("how is the
weather today" returns 7 because FTS5 OR-matches on "is/the/today"
substrings) — left as a follow-up since the fix is non-trivial.
Tests: 139/139 across memory + dashboard + integration + CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four independent fixes that surfaced from the live test + post-mortem.
All measured against the recall benchmark; aggregate metrics improved
(R=0.68→0.75, P=0.39→0.46, MRR=0.68→0.74) — see scripts/bench_recall.py.
mcp_server.py — FTS5 stop-word filter
- _FTS_STOP_WORDS: conservative function-word list (~150 entries:
articles, auxiliaries, pronouns, prepositions, conjunctions,
interrogatives). Topic words (code, auth, database, improve,
scale) are deliberately NOT in the list.
- _strip_stop_words used in three places:
a) _fts_match_query — FTS5 OR-match no longer hits "is/the/today"
b) _rank_json_candidates topic embed — sharper topic vector on
conversational queries
c) search_decisions_vec / search_turn_summaries_vec topic embed
- "how is the weather today" now returns 0 false positives (was 7).
"how can we improve code quality" still returns 39 because
bge-small finds half an engineering corpus semantically related
to that phrase — a model limit, not a filter limit. R@5 unchanged
on real queries; off-topic queries reject cleanly.
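The filter can be sketched as below. The stop-word set here is a heavily abbreviated illustration of the ~150-entry _FTS_STOP_WORDS (topic words like "improve" and "code" deliberately excluded); the keep-original-on-empty fallback is an assumption so a pure-function-word query still matches something.

```python
# Abbreviated stand-in for the real ~150-entry conservative list.
_STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "be", "been",
    "how", "what", "when", "where", "which", "who", "why",
    "i", "we", "you", "it", "they", "this", "that",
    "and", "or", "but", "of", "in", "on", "at", "to", "for", "with",
    "do", "does", "did", "can", "could", "should", "would", "today",
}

def strip_stop_words(query: str) -> str:
    kept = [w for w in query.split() if w.lower() not in _STOP_WORDS]
    # Assumed fallback: keep the original query rather than emit an
    # empty FTS match when every token is a function word.
    return " ".join(kept) if kept else query
```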
hook_server.py — rendezvous port file at default location
- Authoritative port file still lives at <storage_base>/serve.port.
- When storage_path is customised in config.yaml (anything other
than ~/.cce/projects), also write the port to the *default*
rendezvous location ~/.cce/projects/<name>/serve.port. The hook
shell script always reads from the default location because it
has no way to read config.yaml — this is what keeps capture
wired up for users with custom storage paths.
cli.py — _check_memory_capture_reachable also checks rendezvous
- Falls back to the default-path rendezvous when the storage-local
file isn't there. So `cce init` reports "active" correctly even
for users with custom storage who already have `cce serve`
running.
cli.py — auto-prune background task in `cce serve`
- _auto_prune_loop runs prune_old_payloads(days=30) once daily on
a daemon-thread executor. Staggered start (120s) so it doesn't
compete with vec-backfill / compression-loop on cold-start.
Cancellation wired into the existing serve cleanup. Users who
never invoke `cce sessions prune` manually now get bounded
memory.db growth automatically.
hook_installer.py — Windows .cmd hook script
- HOOK_SCRIPT_NAME is now platform-aware: cce_hook.cmd on Windows,
cce_hook.sh elsewhere. New _hook_script_body() returns the
matching body. The .cmd version uses cmd.exe syntax, looks up
the port via %USERPROFILE%\.cce\projects\<name>\serve.port,
and exits 0 on every error path so capture never blocks the
user. HOOK_MARKER widened from "cce_hook.sh" to "cce_hook" so
the uninstall path matches both extensions.
- +2 tests covering both platform branches via monkeypatch.
bench_recall.py — conversational-query test cases
- Added "how can we improve code quality" and "what should we do
about database performance" so future stop-word tweaks can be
measured against real conversational phrasing instead of just
keyword queries.
Tests: 141/141 across memory + dashboard + integration + CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Warm fastembed model cache" step pre-downloads bge-small to avoid the xdist worker race on first model load. But four tests in the suite also use `all-MiniLM-L6-v2` (test_embedder.py, test_retriever.py, test_embedding_cache.py × 4 cases) — those races weren't covered, so under sufficient timing pressure CI sees:
ONNXRuntimeError NO_SUCHFILE — Load model from /tmp/fastembed_cache/models--qdrant--all-MiniLM-L6-v2-onnx/.../model.onnx failed. File doesn't exist
It manifests intermittently (the run on PR #7 hit it; main was passing). The cure is the same as for bge-small: warm the model in a single process before pytest spawns the four xdist workers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The headline problem the memory feature is supposed to solve: you sit
down today, Claude Code opens, and you DON'T have to re-explain what
you decided last week.
Until this commit the data was captured + queryable but not
*surfaced*. The SessionStart hook only inserted the new sessions row;
prior decisions stayed behind a session_recall tool call the agent had
to remember to make. CLAUDE.md said it should, but instruction-following
isn't a guarantee.
memory/hooks.py — handle_session_start now returns plain text
- New build_session_resume(conn, project) builds a markdown block:
## CCE memory · resuming <project>
**Previous session** (<ended_at>):
<rollup_summary lines>
**Recent decisions** (most-recent first):
- <decision> — <reason> (session: `<sid>`)
... (top 5)
Call session_recall("<topic>") for more, or
session_timeline("<sid>") to drill in.
- Empty string for a brand-new project (no awkward header on the
first session).
- Reasons truncated at 200 chars so a single rambling decision can't
blow up the resume.
- Wrapped in try/except so a bad query never breaks SessionStart.
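A sketch of the resume builder, following the markdown template above. Table and column names (ended_at, rollup_summary, the decisions columns) are assumptions loosely based on the schema described earlier; only the template, the 200-char reason cap, and the empty-string-for-new-project behaviour come from the commit.

```python
import sqlite3

def build_session_resume(conn: sqlite3.Connection, project: str) -> str:
    prev = conn.execute(
        "SELECT ended_at, rollup_summary FROM sessions "
        "WHERE rollup_summary IS NOT NULL ORDER BY ended_at DESC LIMIT 1"
    ).fetchone()
    decisions = conn.execute(
        "SELECT decision, reason, session_id FROM decisions "
        "ORDER BY id DESC LIMIT 5"
    ).fetchall()
    # Brand-new project: no awkward header on the first session.
    if prev is None and not decisions:
        return ""
    lines = [f"## CCE memory · resuming {project}", ""]
    if prev is not None:
        lines += [f"**Previous session** ({prev[0]}):", prev[1], ""]
    if decisions:
        lines.append("**Recent decisions** (most-recent first):")
        for decision, reason, sid in decisions:
            reason = (reason or "")[:200]   # a rambling reason can't blow up the resume
            lines.append(f"- {decision} — {reason} (session: `{sid}`)")
    return "\n".join(lines)
```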
memory/hook_installer.py — shell script captures SessionStart stdout
- POSIX: branches on $HOOK_NAME. SessionStart's curl response is
captured into RESPONSE and printed to stdout. Other hooks keep
their stdout/stderr discarded as before. Timeout bumped to 2s for
SessionStart since the resume query is a hair slower than a write.
- Windows .cmd: same shape using a temp file (cmd can't easily
capture command output to a variable across multiple lines).
Cleans the temp file afterwards.
Why this fixes the problem
Claude Code captures the SessionStart hook's stdout and injects it
into the model's context at the start of the conversation. So at
prompt 0 the model already sees the prior rollup + recent decisions
— no tool call required, no "what did we decide about auth last
week?" round-trip with the user.
Live-tested against /home/fazle/trading/v3: 4 trading decisions
recorded yesterday surface verbatim in the SessionStart response
body, ready to be injected.
Tests: +1 (test_session_start_returns_resume_with_prior_rollup_and_decisions)
covering the rollup + decisions path. 142/142 across memory + dashboard
+ integration + CLI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this widening, the resume context that handle_session_start
returns is only injected on cold-start. After the user runs `/clear`
(wipes context) or `/compact` (trims context), Claude Code re-issues a
SessionStart event — but the matcher="" we had before only matched the
default (startup) variant on some Claude Code versions, leaving us
silent exactly when re-injection matters most.
hook_installer.py
- New HOOK_MATCHERS dict — per-hook matcher overrides keyed by name.
SessionStart gets "startup|clear|compact"; everything else stays "".
- install_settings() reads from this dict instead of hardcoding "".
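The shape of that override table is simple; a sketch following the commit message (exact values live in hook_installer.py):

```python
# Per-hook matcher overrides; any hook not listed keeps the default "".
HOOK_MATCHERS: dict[str, str] = {
    # SessionStart must also fire after /clear and /compact so the resume
    # context is re-injected into the freshly-emptied window.
    "SessionStart": "startup|clear|compact",
}

def matcher_for(hook_name: str) -> str:
    """What install_settings() writes into .claude/settings.json for a hook."""
    return HOOK_MATCHERS.get(hook_name, "")
```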
Trade-off
Matcher is now a regex-like alternation, but Claude Code accepts
that natively (see claude-mem's hooks.json which uses the same
three triggers). On a Claude Code version that doesn't recognise
one of the subtypes, the hook just won't fire for that subtype —
the others still work, so no behavioural regression.
Tests
+1 (test_session_start_matcher_covers_clear_and_compact) asserts
all three triggers are in the SessionStart matcher and the other
four hooks keep their empty matcher. 143/143 across the suite.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 28, 2026
Pre-merge code-audit found 4 actionable items. Fixed; CI to confirm.
hook_installer.py — Windows hook command quoting (production bug)
- Was: shlex.quote on every platform, including Windows. shlex emits
POSIX single quotes — cmd.exe doesn't dequote those, so any Windows
user with a space in %USERPROFILE% (e.g. C:\\Users\\Alice Smith\\)
got a silently-broken hook command. Hooks fail closed; no error
surfaces to the user.
- Now: new _quote_hook_path branches on _is_windows() and emits
cmd.exe double quotes on Windows, sh single quotes on POSIX.
- +1 test (test_install_settings_uses_cmd_quoting_on_windows) using
monkeypatch to force the Windows branch.
cli.py — CLAUDE.md template now documents session_timeline / session_event
- The SessionStart resume body's affordance line literally tells the
model to "Call session_timeline(\"<sid>\")", but CLAUDE.md (which
primes the agent) didn't document either tool. Bumped block version
to "3" so cce init re-renders. New "Drilling deeper from a recall
hit" subsection covers both tools and when to prefer each.
memory/db.py — auto_prune_loop extracted to module level (testable)
- Was: inline closure inside cli._run_serve. Untestable without
spinning up the whole MCP server.
- Now: auto_prune_loop(storage_base, days, initial_delay, interval,
stop_event) at module level. Test suite injects 0.0/0.05 timing
instead of waiting 120s + 86400s. Same behaviour; same defaults.
- +2 tests:
test_auto_prune_loop_runs_one_iteration — old payload gets
pruned on the first pass.
test_auto_prune_loop_stop_event_short_circuits_initial_delay
— clean exit during the stagger.
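The extracted loop looks roughly like this — a sketch, with `prune_fn` standing in for the real payload pruning and the pass count returned purely so tests can assert on it:

```python
import asyncio

async def auto_prune_loop(prune_fn, *, initial_delay: float = 120.0,
                          interval: float = 86400.0,
                          stop_event: asyncio.Event) -> int:
    """Run prune_fn once per interval after a startup stagger.

    Every sleep races against stop_event so shutdown is prompt; tests
    inject tiny delays instead of waiting 120s + 86400s.
    """
    passes = 0

    async def _wait(seconds: float) -> bool:
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=seconds)
            return True   # stop requested during the wait
        except asyncio.TimeoutError:
            return False

    if await _wait(initial_delay):
        return passes     # stop during the stagger: clean exit, zero passes
    while True:
        prune_fn()
        passes += 1
        if await _wait(interval):
            return passes
```

Racing `stop_event.wait()` against the timeout is the design choice that makes both tests fast: a pre-set event short-circuits the stagger immediately instead of sleeping it out.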
tests/memory/test_hooks.py — build_session_resume edge cases
- Pre-merge audit found only the "rollup + decisions" branch was
tested; the two more-common branches (decisions only, rollup only)
weren't. Added both as separate tests so a regression that breaks
the week-1 path (decisions only, no completed-session rollup yet)
can't pass CI.
tests/test_cli_init_probe.py — rendezvous-fallback path
- Pre-merge audit found that 8ed9f70's "fall back to default-path
rendezvous when storage-local is missing" had no test covering
that branch. Added one — monkeypatches Path.home() and verifies
the probe finds the listening port via the rendezvous file alone.
Tests: 149/149 across memory + dashboard + integration + CLI.
Deferred to follow-ups (not merge-blockers per the audit):
· _drain_one_threaded._embedder closure — only matters if two
compression_loop instances run concurrently, which never happens
in production today.
· Rendezvous-write-failure escalation to status — quiet edge case.
· Windows .cmd %TEMP% with parentheses — already partly mitigated
by setlocal enabledelayedexpansion; revisit if a real Windows
user reports it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fazleelahhee added a commit that referenced this pull request · Apr 28, 2026
Reconciles two independent callbacks for showing motion during long
indexing runs on large repos. They serve different timescales and are
complementary, so this commit keeps both rather than picking one.
pipeline.run_indexing now accepts:
· embed_progress_fn(current, total) — per-batch numeric ticks during
the embed phase. Already wired through cli.py to a live progress bar.
· phase_fn(msg) — string status before each major phase
("Embedding 32k chunks (CPU-bound, can take several minutes)…",
"Writing 32k chunks to vector + FTS + graph index…").
Closes the in-place chunking bar first so the message doesn't get
overwritten via \\r.
cli.py defines both callbacks; non-verbose TTY runs see a chunking bar
→ phase line → embed bar → phase line → embed bar's final tick.
embedder.py keeps the canonical (current, total) numeric API. The WIP's
alternate string-message API has been dropped — cli.py's bar already
delivers the "still alive" intent through chunks/N motion.
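The two callbacks can be sketched in miniature (names come from the commit message; `run_indexing`'s real signature carries more parameters, and the embed call here is elided):

```python
from typing import Callable, Optional

EmbedProgressFn = Callable[[int, int], None]  # (current, total) per-batch ticks
PhaseFn = Callable[[str], None]               # coarse phase announcements

def run_indexing(chunks: list[str],
                 embed_progress_fn: Optional[EmbedProgressFn] = None,
                 phase_fn: Optional[PhaseFn] = None,
                 batch_size: int = 8) -> int:
    """Two complementary timescales: phase_fn before each major phase,
    embed_progress_fn once per embedded batch."""
    if phase_fn:
        phase_fn(f"Embedding {len(chunks)} chunks "
                 "(CPU-bound, can take several minutes)…")
    done = 0
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        done += len(batch)                    # embed(batch) would go here
        if embed_progress_fn:
            embed_progress_fn(done, len(chunks))
    if phase_fn:
        phase_fn(f"Writing {len(chunks)} chunks to vector + FTS + graph index…")
    return done
```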
tests/indexer/test_pipeline_phase_progress.py
· test_phase_fn_announces_embedding_and_ingest — pins that
"Embedding…" and "Writing…" phase markers fire from run_indexing
so a 10-30 min embed phase on a 7035-file repo doesn't look hung.
· test_embedder_calls_progress_fn_during_inference — rewritten to
use the canonical numeric callback; asserts the final tick reports
full chunk count and embeddings actually attached.
Resolution context: PR #7 (memory feature) merged into main while these
indexer-progress changes were stashed. Both branches independently added
new keyword args to run_indexing's signature, creating a parameter-list
conflict that resolved cleanly by keeping both.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Consolidates the previously-stacked PRs #2–#6 into a single PR against
main, with Copilot's review feedback addressed in fixup commit 353dc27.

Adds full conversation memory to cce: per-project SQLite store, lifecycle
capture hooks, background extractive compression, FTS5-prefiltered recall,
and a dashboard view — feature parity with claude-mem, no extra runtime deps.

Commits:
- 994edb2 — memory.db schema (sessions, prompts, tool_events,
  turn_summaries, decisions, code_areas, pending_compressions, FTS5 +
  triggers) + cce sessions migrate (idempotent JSON-to-SQLite importer)
- eb0dacc — cce serve
- 82ce5b1 — pending_compressions using bge-small (already loaded for the
  index — no Ollama, no extra model)
- 255eb53 — session_recall(topic) extended; new session_timeline(session_id)
  and session_event(event_id) MCP tools
- 4a7370f
- 353dc27 — fixup addressing the Copilot review

Design + brainstorm decisions:
docs/specs/2026-04-28-memory-claude-mem-parity-design.md.

Copilot review items addressed
- migrate.py — archive-before-mark ordering with rollback (no more
  imported-but-unarchived files); preserve session_id linkage from
  decisions_log.json; `timestamp is not None` so legacy 0 timestamps keep
  ordering; _session_exists memoised per file.
- compressor.py — drop unused Any; cap raw_input at 4 KB before
  json.loads so huge patches don't stall the worker; yield the asyncio
  loop after every drained item with a 50 ms breath every 5 items so a
  backlog doesn't monopolise mcp.run_stdio().
- mcp_server.py — replace LIMIT-200 sweeps with FTS5 MATCH prefilter on
  decisions_fts / turn_summaries_fts (LIKE on code_areas, which has no
  FTS); clamp limit in session_timeline to 1..200; wrap sessions metadata
  + event payload queries in try/except; NULL-safe raw_input/raw_output
  in session_event.
- design spec — replaced "Eight tables" with the actual v1 description
  (support tables + FTS5 + triggers).
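The LIMIT-200-sweep → FTS5 prefilter change can be illustrated in miniature. This is a self-contained toy (the real query in mcp_server.py joins back to the base tables and covers turn_summaries_fts too):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE decisions_fts USING fts5(decision, reason)")
conn.executemany(
    "INSERT INTO decisions_fts VALUES (?, ?)",
    [("use sqlite for memory", "zero-dep store"),
     ("switch auth to JWT", "stateless sessions"),
     ("cap reasons at 200 chars", "keep resume compact")],
)

def recall(topic: str, limit: int = 20) -> list:
    """FTS5 MATCH prefilter: let the index find candidates instead of
    sweeping the newest 200 rows and substring-matching in Python."""
    limit = max(1, min(limit, 200))  # clamp, as session_timeline now does
    return conn.execute(
        "SELECT decision, reason FROM decisions_fts "
        "WHERE decisions_fts MATCH ? ORDER BY rank LIMIT ?",
        (topic, limit),
    ).fetchall()
```

`ORDER BY rank` uses FTS5's built-in BM25 ordering, so the prefilter also returns the best matches first rather than merely the newest.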
Deferred (nice-to-have, not bugs): three robustness/perf items already covered conceptually elsewhere — happy to address in a follow-up if reviewers want them in.
Test plan
- pytest tests/memory tests/dashboard/test_memory_endpoints.py — 43/43 pass
- pytest tests/integration — 30/30 pass (mcp_server.py shared paths)
- (test_ollama_client hitting httpx.ReadTimeout because no Ollama is
  running locally; unrelated to this change)
- cce sessions migrate --help registers
- cce serve for a few prompts; verify timeline + decisions search render

Stats
35 files changed · +4,089 / −111 LOC
Closes
This supersedes the 5-PR stack — closing #2, #3, #4, #5, #6 with pointers to this PR.
🤖 Generated with Claude Code