
feat(memory): claude-mem feature parity — 5-step series consolidated #7

Merged

fazleelahhee merged 21 commits into main from ai-memory on Apr 28, 2026

Conversation

@fazleelahhee
Contributor

Summary

Consolidates the previously stacked PRs #2–#6 into a single PR against main, with Copilot's review feedback addressed in fixup commit 353dc27.

Adds full conversation memory to cce: per-project SQLite store, lifecycle capture hooks, background extractive compression, FTS5-prefiltered recall, and a dashboard view — feature parity with claude-mem, no extra runtime deps.

Step · Commit · Scope
1. Foundation · 994edb2 · memory.db schema (sessions, prompts, tool_events, turn_summaries, decisions, code_areas, pending_compressions, FTS5 + triggers) + cce sessions migrate (idempotent JSON-to-SQLite importer)
2. Capture · eb0dacc · 5 lifecycle hooks (UserPromptSubmit, PreToolUse, PostToolUse, Stop, SessionEnd) + loopback HTTP server in cce serve
3. Compress · 82ce5b1 · Extractive worker drains pending_compressions using bge-small (already loaded for the index — no Ollama, no extra model)
4. Recall · 255eb53 · session_recall(topic) extended; new session_timeline(session_id) and session_event(event_id) MCP tools
5. Dashboard · 4a7370f · Sessions list + timeline + decisions search panels
Copilot fixes · 353dc27 · 11 of 14 review items applied — see below

Design + brainstorm decisions: docs/specs/2026-04-28-memory-claude-mem-parity-design.md.

Copilot review items addressed

migrate.py — archive-before-mark ordering with rollback (no more imported-but-unarchived files); preserve session_id linkage from decisions_log.json; use `timestamp is not None` so legacy 0 timestamps keep their original ordering; `_session_exists` memoised per file.

compressor.py — drop unused Any; cap raw_input at 4 KB before json.loads so huge patches don't stall the worker; yield the asyncio loop after every drained item with a 50 ms breath every 5 items so a backlog doesn't monopolise mcp.run_stdio().
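The yield-and-breathe pacing described in that item can be sketched as follows; `drain_queue` and its default values are illustrative, not the actual compressor.py code:

```python
import asyncio

async def drain_queue(items, process, breath_every=5, breath_s=0.05):
    """Drain a backlog without monopolising the event loop (sketch)."""
    for n, item in enumerate(items, start=1):
        process(item)
        await asyncio.sleep(0)            # hand the loop back after every item
        if n % breath_every == 0:
            await asyncio.sleep(breath_s)  # longer breath every few items

done = []
asyncio.run(drain_queue(range(7), done.append))
print(done)  # [0, 1, 2, 3, 4, 5, 6]
```

The `asyncio.sleep(0)` is what lets other coroutines (like the MCP stdio loop) run between drained items; the periodic longer sleep keeps a large backlog from consuming every scheduling slot.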

mcp_server.py — replace LIMIT-200 sweeps with FTS5 MATCH prefilter on decisions_fts / turn_summaries_fts (LIKE on code_areas, which has no FTS); clamp limit in session_timeline to 1..200; wrap sessions metadata + event payload queries in try/except; NULL-safe raw_input/raw_output in session_event.

design spec — replaced "Eight tables" with the actual v1 description (support tables + FTS5 + triggers).

Deferred (nice-to-have, not bugs): three robustness/perf items already covered conceptually elsewhere — happy to address in a follow-up if reviewers want them in.

Test plan

  • pytest tests/memory tests/dashboard/test_memory_endpoints.py — 43/43 pass
  • pytest tests/integration — 30/30 pass (mcp_server.py shared paths)
  • Full suite — 344/345 pass (the one failure is test_ollama_client hitting httpx.ReadTimeout because no Ollama is running locally; unrelated to this change)
  • Smoke: `cce sessions migrate --help` confirms the command registers
  • Smoke: open dashboard against a project that has run cce serve for a few prompts; verify timeline + decisions search render

Stats

35 files changed · +4,089 / −111 LOC

Closes

This supersedes the 5-PR stack — closing #2, #3, #4, #5, #6 with pointers to this PR.

🤖 Generated with Claude Code

fazleelahhee and others added 7 commits April 28, 2026 00:22
Captures the 6 design decisions from brainstorming: auto+explicit
capture with source column, background compression worker in cce
serve, per-turn + per-session rollup granularity, extended
session_recall + new session_timeline/session_event MCP tools,
one-shot cce sessions migrate command, three dashboard panels.

Compressor is extractive using BAAI/bge-small-en-v1.5 already loaded
for the index — no new dependencies, no extra RAM, no Ollama
required.

Ships as 5 sequential PRs off this branch; each independently
reviewable.
PR 1 of 5 against feature/memory-claude-mem-parity. Lays the storage
foundation: per-project SQLite at ~/.cce/projects/<name>/memory.db
with the v1 schema (sessions, prompts, tool_events, tool_event_payloads,
turn_summaries, decisions, code_areas, pending_compressions,
migrated_files, schema_versions) plus FTS5 virtual tables and triggers
for prompts / decisions / turn_summaries.
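A minimal sketch of the external-content FTS5 + trigger pattern described above, reduced to a single illustrative table (the shipped schema applies it to prompts, decisions, and turn_summaries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decisions (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
-- External-content FTS5 index, kept in sync by AFTER INSERT/DELETE triggers.
CREATE VIRTUAL TABLE decisions_fts USING fts5(
    content, content='decisions', content_rowid='id');
CREATE TRIGGER decisions_ai AFTER INSERT ON decisions BEGIN
    INSERT INTO decisions_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER decisions_ad AFTER DELETE ON decisions BEGIN
    INSERT INTO decisions_fts(decisions_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;
""")
conn.execute("INSERT INTO decisions(content) VALUES ('use sqlite for memory')")
rows = conn.execute(
    "SELECT content FROM decisions_fts WHERE decisions_fts MATCH 'sqlite'"
).fetchall()
print(rows)  # [('use sqlite for memory',)]
```

With triggers doing the sync, writers only ever touch the source table and the FTS index can never drift out of date.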

Adds `cce sessions migrate` — idempotent one-shot importer for legacy
per-session JSON files (current path and pre-rebrand
~/.claude-context-engine/...). Imported decisions and code areas are
tagged source='migrated' so future session_recall can rank them.
Consumed JSONs are archived into sessions/migrated.zip and removed.

No behaviour change to the existing JSON capture path. Hooks and the
compression worker land in PR 2 and PR 3.

10 new unit tests cover: schema bootstrap, idempotent reconnection,
foreign-key enforcement, FTS triggers, migration of session JSON +
decisions_log archive, idempotent rerun, archive-and-remove. Full
suite: 311 passed, 1 skipped.

See docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
PR 2 of 5 against feature/memory-claude-mem-parity, stacked on PR 1.

Adds the auto-capture pipeline:

- src/context_engine/memory/hooks.py — aiohttp handlers for
  /hooks/SessionStart, /hooks/UserPromptSubmit, /hooks/PostToolUse,
  /hooks/Stop, /hooks/SessionEnd. Each writes the appropriate row(s)
  to memory.db. Compression for the just-ended turn is enqueued in
  pending_compressions (UNIQUE constraint dedupes Stop + next-prompt
  double-fire). All errors are logged and return 202 — capture is
  best-effort and must never block the user's flow.

- src/context_engine/memory/hook_server.py — loopback aiohttp listener
  on a random free port, started as a background asyncio task from
  cce serve's _run_serve. Port written to <storage_base>/serve.port.
  Cleanly shut down on MCP server exit.

- src/context_engine/memory/hook_installer.py — installs
  ~/.cce/hooks/cce_hook.sh (the thin shell shim that POSTs hook
  payloads to the local port) and wires .claude/settings.json with
  all 5 lifecycle entries. Idempotent. Preserves user-added hooks.
  Uninstall removes only entries whose command points at our script.
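The idempotent merge and marker-based uninstall behaviour might look roughly like this. This is a simplified sketch: `merge_hooks`, `uninstall_hooks`, and the flat settings shape are illustrative, and the real .claude/settings.json schema is richer than a list of command dicts:

```python
HOOK_MARKER = "cce_hook"  # assumed marker for "this entry is ours"

def merge_hooks(settings: dict, event: str, command: str) -> dict:
    """Idempotently add our entry while preserving user-added hooks."""
    hooks = settings.setdefault("hooks", {}).setdefault(event, [])
    if not any(h.get("command") == command for h in hooks):
        hooks.append({"command": command})
    return settings

def uninstall_hooks(settings: dict) -> dict:
    """Remove only entries whose command points at our script."""
    for event, hooks in settings.get("hooks", {}).items():
        settings["hooks"][event] = [
            h for h in hooks if HOOK_MARKER not in h.get("command", "")
        ]
    return settings

s = {"hooks": {"Stop": [{"command": "my-own-hook.sh"}]}}
merge_hooks(s, "Stop", "~/.cce/hooks/cce_hook.sh Stop")
merge_hooks(s, "Stop", "~/.cce/hooks/cce_hook.sh Stop")  # second call is a no-op
uninstall_hooks(s)
print(s["hooks"]["Stop"])  # [{'command': 'my-own-hook.sh'}] — user hook survives
```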

- cce init wires the installer in step 5; _run_serve spawns the
  hook server alongside the MCP stdio loop and watcher.

Tests (15 new, full suite 326 passed):
- tests/memory/test_hooks.py: integration tests via aiohttp_client
  for all 5 endpoints, including dedup, prompt-number assignment,
  payload sidecar table, and 400 on missing session_id.
- tests/memory/test_hook_installer.py: script write + chmod,
  idempotent reinstall, settings.json merge preserving user hooks,
  uninstall removing only ours.

pyproject.toml: pytest-aiohttp added to dev/dependency-groups.

No change to recall surface — session_recall still reads JSON.
PR 4 retires the JSON path and points recall at memory.db.
PR 3 of 5 against feature/memory-claude-mem-parity, stacked on PR 2.

Adds the background compression half:

- src/context_engine/memory/extractive.py — sentence splitter +
  centroid-based extractive summariser. No new dependencies; takes any
  embedder exposing embed_query(str) -> Iterable[float]. Pure source
  text; no synthesis means no hallucination.
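The centroid-based selection can be sketched as below, assuming only an `embed_query(str) -> Iterable[float]` callable as the text states. `extractive_summary` here is an illustrative reimplementation, not the shipped extractive.py:

```python
import math
import re

def _cos(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def extractive_summary(text, embed_query, k=3):
    """Keep the k sentences nearest the centroid, preserving source order."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sents) <= k:
        return " ".join(sents)
    vecs = [list(embed_query(s)) for s in sents]
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    ranked = sorted(range(len(sents)),
                    key=lambda i: _cos(vecs[i], centroid), reverse=True)
    # sort the chosen indices so output reads in original order
    return " ".join(sents[i] for i in sorted(ranked[:k]))
```

Because every output sentence is copied verbatim from the input, the summariser cannot invent facts; the only quality knob is which sentences it keeps.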

- src/context_engine/memory/compressor.py — compress_turn() and
  compress_session_rollup() build the candidate text from prompts +
  tool_events (+payloads), run the extractive summariser using the
  bge-small embedder already loaded for the index, and persist to
  turn_summaries / sessions.rollup_summary with tier='extractive'.
  Falls back to truncation when embedder is None or extractive raises.

  compression_loop() drains pending_compressions every 5 s, oldest
  first, single-flight by design. Queue rows that error are kept with
  attempts++ and last_error stamped for retry.

- _run_serve in cli.py spawns the compression worker alongside the
  hook server. Owns its own sqlite connection to avoid cross-thread
  use; gracefully cancelled and closed on MCP server exit.

Tests (14 new, full suite 340 passed):
- tests/memory/test_extractive.py: sentence split, centroid neighbour
  selection, source-order preservation, truncation fallback shape.
- tests/memory/test_compressor.py: turn compression with extractive +
  truncation tiers, session rollup combining turn summaries, empty
  rollup when no turns, _drain_one queue pop, compression_loop with
  stop_event for graceful shutdown.

Recall surface unchanged from main; PR 4 retires the JSON path.
… session_event

PR 4 of 5 against feature/memory-claude-mem-parity, stacked on PR 3.

Wires the MCP retrieval surface to memory.db while preserving the
existing JSON path for backward compatibility.

- ContextEngineMCP opens memory.db on startup and seeds an
  INSERT OR IGNORE sessions row so manual record_decision /
  record_code_area dual-writes don't fail the FK constraint when the
  SessionStart hook hasn't fired yet (test envs, future-non-CC clients).

- session_recall now folds three new candidate sources on top of the
  existing JSON sessions / consolidated decisions:
    - decisions       (manual + migrated, last 200 by recency)
    - code_areas      (manual + migrated, last 200 by recency)
    - turn_summaries  (last 200 turns, the layer-1 compact index)
  Tags include source and session_id so the agent can drill via the
  new tools.

- session_timeline(session_id, limit=20) — layer 2. Returns the
  session's turn_summaries with rollup + status header.

- session_event(event_id) — layer 3. Returns the raw input/output
  payload for one tool_event, with a dedicated "aged out" message
  when the payload row was pruned by retention.

- record_decision and record_code_area now dual-write to memory.db
  with source='manual'. Prior JSON write path remains active so a
  rollback to a previous PR doesn't lose recall coverage. The JSON
  write side can be retired once parity is confirmed in production.

Tests (10 new, full suite 350 passed):
- tests/memory/test_mcp_recall.py covers dual-write of decisions and
  code_areas, session_timeline with seeded summaries, session_event
  payload roundtrip + aged-out message + invalid id, session_recall
  surfacing memory.db decisions, and TOOL_NAMES registration.
PR 5 of 5 against feature/memory-claude-mem-parity, stacked on PR 4.

Adds three new memory.db-backed views to the existing CCE dashboard.
The dashboard never holds a memory.db handle — each request opens a
short-lived connection so the MCP server's writes are never blocked.

Backend (src/context_engine/dashboard/server.py):

- GET /api/memory/sessions[?limit] — list sessions, most-recent first.
- GET /api/memory/sessions/{id}/timeline — one session's metadata +
  ordered turn_summaries.
- GET /api/memory/decisions[?q&source] — FTS5 search over decisions
  with optional source facet (manual|auto|migrated). User input is
  phrase-quoted so FTS5 metacharacters like '-' are treated literally
  ('bge-small' matches its own substring rather than parsing as
  'bge AND NOT small'). On any FTS5 syntax fall-through, fall back to
  unranked recent listing — no 500s.
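The phrase-quoting fix amounts to wrapping user input in FTS5 double quotes, roughly as follows (illustrative helper name):

```python
def fts5_phrase(user_input: str) -> str:
    """Quote user text as one FTS5 phrase so metacharacters like '-' are
    treated literally; embedded double quotes are doubled per FTS5 syntax."""
    return '"' + user_input.replace('"', '""') + '"'

print(fts5_phrase("bge-small"))  # "bge-small"
```

Passed to MATCH, the quoted form searches for the literal phrase instead of letting FTS5 parse '-' as a NOT operator.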

Frontend (src/context_engine/dashboard/_page.py):

- New 'Memory' nav entry between Sessions and Analytics.
- New page-memory section with a Sessions / Decisions tab toggle.
- Sessions tab shows a list; clicking a row opens a turn-by-turn
  timeline panel below with rollup at the top.
- Decisions tab is a search box + source select, results in a table
  tagged by source.

Tests (9 new in tests/dashboard/test_memory_endpoints.py, full suite
359 passed, 1 skipped):

- Sessions list ordering, empty case when memory.db absent.
- Timeline returns header + turns for known session, null for unknown.
- Decisions FTS5 with and without query, source facet, combined
  filters, and the phrase-quoting fix for hyphenated input.

Closes the 5-PR series. Spec at
docs/specs/2026-04-28-memory-claude-mem-parity-design.md.
Consolidates the 11 actionable items from the 5-PR memory stack into one
fixup commit on the unified ai-memory branch.

migrate.py
  - Archive *before* mark-imported + commit, with rollback on zip failure
    so a failed archive no longer leaves files stuck imported-but-not-archived.
  - Preserve session_id linkage from decisions_log.json when the referenced
    session already exists (was unconditionally NULL).
  - Memoise _session_exists per-file (constant per archive entry).
  - Use `timestamp is not None` so legacy 0/0.0 timestamps keep their original
    ordering instead of being stamped to "now".

compressor.py
  - Drop unused `Any` import.
  - Cap raw_input at _TOOL_INPUT_CHAR_CAP (4 KB) before json.loads so multi-MB
    patch payloads don't stall the compression worker.
  - Yield the asyncio loop after every drained item, with a short breath every
    5 items, so a backlog doesn't monopolise mcp.run_stdio().

mcp_server.py
  - Use FTS5 MATCH on decisions_fts / turn_summaries_fts (and a LIKE filter on
    code_areas) to prefilter recall candidates instead of embedding the latest
    600 rows on every session_recall call.
  - Clamp `limit` in session_timeline to 1..200 with a helper that handles bad
    input cleanly.
  - Wrap the sessions metadata query and the session_event payload query in
    try/except so DB errors return a specific message.
  - Handle NULL raw_input/raw_output in session_event without rendering the
    string "None".
  - Update the dual-write comment to reflect that recall now goes through FTS5.

design spec
  - Drop the inaccurate "Eight tables" sentence; describe the support and
    FTS5 tables that ship in v1 instead.

Tests: 43/43 memory + dashboard pass; 344/345 across the full suite (the
single failure is test_ollama_client hitting httpx.ReadTimeout because no
Ollama is running in this environment — unrelated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a vec0 layer over the existing FTS5 store so session_recall can
match paraphrases and synonyms, not just lexical overlap. sqlite-vec
is already a hard dep (used by storage/vector_store.py for code chunks)
and bge-small is already loaded for the index, so no new model and no
new runtime cost beyond the embed_query() per write/query.

memory/db.py
  - CURRENT_VERSION → 2; bootstrap creates `decisions_vec` and
    `turn_summaries_vec` (vec0 float[384]) alongside the v1 tables.
  - `connect()` loads sqlite-vec via `enable_load_extension`; if the
    load fails the db still opens but vec tables are skipped (FTS-only
    fallback) and helpers no-op.
  - In-place v1 → v2 upgrade adds the empty vec tables; existing rows
    are populated by `backfill_vec_tables(conn, embedder)` on the next
    MCP-server start.
  - New helpers: `record_decision_vec`, `record_turn_summary_vec`,
    `search_decisions_vec`, `search_turn_summaries_vec`,
    `backfill_vec_tables`, `has_vec_tables`. `_write_vec_row` swallows
    dim mismatches so a swapped embedder doesn't break source-table
    inserts — the row simply isn't semantically searchable until vec
    tables are rebuilt.
  - We don't add vec tables for prompts (raw user text is rarely the
    right semantic anchor) or code_areas (file-path keyed; LIKE is
    enough).
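A sketch of the load-with-fallback behaviour, assuming the `sqlite_vec` Python bindings; the `connect_with_vec` name and single vec table are illustrative:

```python
import sqlite3

def connect_with_vec(path):
    """Open the db, loading sqlite-vec if possible; degrade to FTS-only
    when the extension can't be loaded (sketch of the fallback path)."""
    conn = sqlite3.connect(path)
    has_vec = False
    try:
        conn.enable_load_extension(True)
        import sqlite_vec            # may be absent, or loading may fail
        sqlite_vec.load(conn)
        conn.enable_load_extension(False)
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS decisions_vec "
            "USING vec0(embedding float[384])"
        )
        has_vec = True
    except Exception:
        pass                          # vec tables skipped; vec helpers no-op
    return conn, has_vec

conn, has_vec = connect_with_vec(":memory:")
print(has_vec)  # True only when sqlite-vec is importable and loadable
```

The key property is that a failed extension load still returns a working connection, so the FTS5 path is never blocked by a missing loadable extension.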

memory/compressor.py
  - `compress_turn` writes the new turn summary's embedding to
    turn_summaries_vec right after persisting the row.

integration/mcp_server.py
  - At MCP-server startup, run `backfill_vec_tables` so projects that
    ran on v1 pick up semantic recall on next launch.
  - `record_decision` dual-write now also writes to decisions_vec.
  - `_search_sessions` is now hybrid: it unions FTS5 hits and vec hits
    by row id (no double-formatting), then runs the existing cosine
    rank over the merged candidate pool. Empty vec hits (extension
    missing or tables empty) leave it FTS-only — no behaviour change
    in the degraded path.

tests/memory/test_db.py
  - Five new tests cover v2 bootstrap, decision/turn vec write+search,
    backfill on a v1-shaped db, and the v1→v2 upgrade-in-place path.
  - `_FakeEmbedder` produces 384-dim deterministic vectors so tests
    don't pay fastembed init cost.

Tests: 78/78 memory + dashboard + integration. The single Ollama timeout
in the full suite is unrelated and predates this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fazleelahhee fazleelahhee self-assigned this Apr 28, 2026
fazleelahhee and others added 7 commits April 28, 2026 00:30
Substantial follow-up to the sqlite-vec layer. Five concrete improvements,
each addressing a real issue from the deep review:

session_recall now returns a TL;DR header
  Top extractive sentences (3) pulled from the top-N matches via the same
  bge-small extractive worker the compressor uses — no LLM call, no
  hallucination, ~50 ms on the asyncio thread. Header is suppressed when
  there are <3 matches (would just echo them).

  Format:
    TL;DR (N matches for 'topic'):
      <2-3 extracted sentences>

    Source matches:
      - [decision src=...|sid:...] ...
      - [turn sid:...|n:...] ...

  Provenance tags survive so callers can drill via session_event /
  session_timeline; tag prefix is stripped before summarisation so the
  summariser sees content, not metadata.

Hybrid recall via reciprocal rank fusion
  The previous "embed every candidate" pipeline is gone. Each source
  produces its own ranked list (FTS5 decisions, FTS5 turns, vec
  decisions, vec turns, JSON-cosine, code_areas LIKE), then `_rrf_merge`
  fuses them via 1/(60+rank). Items found by multiple sources rise.
  Vec hits no longer get re-embedded — sqlite-vec's rank is preserved.
  RRF k=60 is the canonical Cormack/Clarke/Buettcher 2009 value.
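The fusion step can be sketched in a few lines (illustrative `rrf_merge`, matching the 1/(60+rank) formula above):

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal rank fusion: score(item) = sum over lists of 1/(k + rank)."""
    scores = {}
    for lst in ranked_lists:
        for rank, item in enumerate(lst, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" appears in all three lists, so fusion promotes it to the top.
merged = rrf_merge([["a", "b", "c"], ["b", "d"], ["b", "a"]])
print(merged)  # ['b', 'a', 'd', 'c']
```

Because only ranks are consumed, sources with incomparable scores (FTS5 bm25, vec L2 distance, JSON cosine) can be merged without score normalisation.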

Backfill on a daemon thread
  `_spawn_vec_backfill` opens its own connection (sqlite3 enforces
  check_same_thread) and embeds historical decisions/turns out of band,
  so MCP startup no longer stalls on a many-second embed-everything
  sweep for projects that ran on v1.

Cleanup triggers for orphaned vec rows
  decisions_vec_ad / turn_summaries_vec_ad fire AFTER DELETE on the
  source tables and drop the matching vec rowid. Without these, FK
  cascades / explicit deletes leaked rows in the vec tables. Triggers
  are added on bootstrap and on v1→v2 upgrade.

Refactor: _search_sessions split into focused methods
  _collect_json_candidates / _rank_json_candidates /
  _collect_memory_db_candidates / _format_decisions_in_id_order /
  _format_turns_in_id_order. Each is independently testable; the top-
  level method is now a 20-line composition.

Tests
  +7 new tests (RRF behaviour, tag stripping, TL;DR present/absent,
  vec-on-source-delete cleanup for both decisions and turn_summaries).
  85/85 memory + dashboard + integration pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small, independent fixes that shave the per-call token budget the
agent pays for memory.

mcp_server.py
  - _rrf_merge now dedupes by stripped content, so the same decision
    showing up as both [decision src=manual] and [decision src=migrated]
    during the JSON↔memory.db dual-write window collapses into one
    boosted entry instead of inflating recall by ~10–20%.
  - _format_decisions_in_id_order / _format_turns_in_id_order append
    a relative-time hint ("3d ago") and a callable drill affordance
    ("→ session_timeline(\"<sid>\")" / "→ session_event(id=<n>)").
    Saves the agent a follow-up call most of the time and gives the
    model a temporal signal it previously had to infer.
  - session_event applies a read-time cap (_EVENT_PAYLOAD_READ_CAP=4 KB)
    via the new _truncate_payload helper. Inputs already had a 4 KB
    write cap; outputs were stored uncapped, so a captured 50 KB Bash
    stdout previously re-fed ~12 k tokens on every fetch.
  - New helper: _humanise_relative_time (just-now / 5m / 3h / 4d /
    2mo / 1y), defensive about None and bad input.
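The relative-time helper might look roughly like this (an illustrative sketch, not the shipped `_humanise_relative_time`):

```python
import time

def humanise_relative_time(ts, now=None):
    """Rough "3d ago"-style hint; defensive about None and bad input."""
    try:
        delta = (now if now is not None else time.time()) - float(ts)
    except (TypeError, ValueError):
        return ""
    if delta < 60:
        return "just now"
    for unit, secs in (("y", 31536000), ("mo", 2592000), ("d", 86400),
                       ("h", 3600), ("m", 60)):
        if delta >= secs:
            return f"{int(delta // secs)}{unit} ago"
    return "just now"

print(humanise_relative_time(0, now=3 * 86400))  # 3d ago
```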

Tests
  - +5 tests covering RRF dedup, recency formatting, payload
    truncation, and the new recall-line drill affordance.
  - 114/114 memory + dashboard + integration pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each compression_loop iteration now runs the heavy work — embed_query
calls + SQLite INSERT — on a worker thread via asyncio.to_thread, with
a thread-local sqlite3 connection (sqlite3 enforces same-thread).
Previously a 50-turn backlog could freeze mcp.run_stdio() for ~30 s
while the loop drained synchronously. Now the asyncio thread only runs
the queue peek + sleep pacing.

compressor.py
  - _drain_one_sync(conn, embedder) — pure-sync; called from either the
    main thread (tests) or a worker thread (production).
  - _drain_one_threaded(db_path) — opens worker-local conn, calls
    _drain_one_sync, closes. Reads the embedder off the function's
    attribute set by compression_loop, which keeps the to_thread
    closure-free (no risk of capturing the asyncio loop).
  - _drain_one(conn, embedder) — async test shim around _drain_one_sync.
  - compression_loop now takes db_path; passes it through to_thread.
    Still accepts a sqlite3.Connection for back-compat with the
    existing test that drives the loop directly.

cli.py
  - cce serve no longer opens a long-lived compression_conn; passes
    memory_db_path(storage_base) to compression_loop.

Tests: 11/11 compressor + memory_loop pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Caps memory.db growth and finally makes the "raw payload aged out"
branch in session_event reachable.

memory/db.py
  - prune_old_payloads(conn, days=30) — finds payloads referenced
    only by tool_events older than `days`, NULL-out raw_output and
    set raw_input='' (raw_input has NOT NULL on the v1 schema, so
    '' is its aged-out sentinel). size_bytes -> 0. Returns counts.
  - tool_events.summary stays — the gist of an aged event is still
    available via session_timeline / session_recall.

cli.py
  - cce sessions prune now does two jobs:
      1. JSON sessions consolidation (existing)
      2. memory.db raw-payload retention (new, default 30d)
  - --retain-payloads-days flag for the second job.
  - Output is split per-job so it's clear what ran.

mcp_server.py
  - _handle_session_event aged-out check accepts the new sentinel
    (`not row["raw_input"]`) in addition to NULL — previously the
    check was unreachable because nothing wrote NULL.

Tests: 115/115. New test verifies (a) old payloads are aged out,
(b) recent payloads are kept, (c) tool_events.summary is untouched,
(d) the prune is idempotent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Answers "is memory broken?" in one command, without `cce serve`. Replaces
the previous workflow of opening sqlite3 against memory.db.

cli.py
  - New `cce sessions status` group entry. Reports:
      project + storage path
      memory.db path + size in KB
      schema version + sqlite-vec availability
      sessions count by status (active/completed/failed)
      decisions count by source (manual/auto/migrated)
      compressed turn_summaries count
      pending_compressions queue depth + max attempts (flagged when stuck)
      vec coverage: decisions=N/total, turns=N/total
      retained raw payload count + estimated MB

  - Falls back to a "not initialised" message with a how-to-bootstrap
    hint when memory.db doesn't exist yet.

Tests: 3 new in tests/test_cli_sessions_status.py covering missing-db,
populated-db rendering (schema, counts, queue drained), and stuck-queue
warning surfacing. 118/118 across the memory + dashboard + integration
+ CLI suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three production-affecting defects surfaced during a deep wire-up audit
of the ai-memory branch.

pyproject.toml
  - Promote aiohttp from optional [http] extra to a core dependency.
    The hook server (memory/hook_server.py) and the hook handlers
    (memory/hooks.py) import it unconditionally — the optional gate
    was a footgun that left default installs unable to capture.
  - Keep [http] as an empty back-compat marker so existing extras
    references don't break.

memory/db.py — backfill_vec_tables is now incremental
  - Was: "only run if vec table is empty". Effect: the moment a single
    decision was recorded manually, all subsequently-migrated rows
    were permanently invisible to semantic recall — startup backfill
    skipped because vec was no longer empty.
  - Now: embed any source rows missing from vec, regardless of vec
    population. Idempotent and safe to run on every MCP startup.
    Picks up rows imported by `cce sessions migrate` (which has no
    embedder), rows captured while the vec extension was unavailable,
    and the original v1→v2 upgrade backfill.

integration/mcp_server.py — _handle_session_event triage
  - Three states are now distinguished:
      a) payload_id IS NULL  → "no captured payload — only the descriptor"
      b) payload_id present, raws cleared → "aged out of retention window"
      c) raws populated → normal payload render
  - Previously (a) and (b) collapsed into the "aged out" branch — the
    user would be told their payload was retention-pruned even when
    no payload row had ever existed.

Tests
  - test_session_event_returns_no_payload_message_when_payload_id_null:
    keeps the historical NULL-payload coverage but renames + asserts
    on the correct (non-aged-out) message.
  - test_session_event_returns_aged_out_message_after_prune: new test
    that creates a real payload row, runs prune_old_payloads(days=0),
    then exercises the actual aged-out branch.
  - 119/119 across memory + dashboard + integration + CLI.
  - 15/15 across the previously-skipped tests/memory/test_hooks.py
    and test_hook_installer.py (pytest-aiohttp + aiohttp now in env).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test against /home/fazle/trading/v3 with bge-small + 4 real
trading decisions surfaced three real bugs the unit tests didn't catch.

mcp_server.py — RRF dedup widened
  - _content_key strips both the [tag] prefix *and* the trailing
    " · 5m ago · → session_timeline(...)" affordance, so the same
    decision rendered through different paths (memory.db with hints
    + JSON history without) collapses to one entry instead of
    appearing twice. Previously the unit dedup test passed because
    both inputs had the same suffix; the live test caught the gap.
  - When two paths produce the same content key, the richer-rendered
    form (with the affordance hints) wins as the visible label.

mcp_server.py — TL;DR rewritten as bullets
  - Was: extractive_summary returned a space-joined paragraph because
    the underlying joiner is " ". With short matches and no
    sentence-ending punctuation, the result was a wall of text.
  - Now: embed each match, score by cosine to the centroid, render
    the top-3 most central matches as bullet points. Same algorithm,
    readable output.

mcp_server.py — JSON-cosine embeds clean content
  - Was: embed_query("[decision] Roll positions at expiry-2 — Avoids
    assignment risk on Friday") — the [tag] prefix is metadata noise
    that inflates similarity for unrelated topics.
  - Now: embed_query(_content_key(text)) — clean signal, no metadata.

memory/db.py — vec distance threshold + tuning
  - _VEC_MAX_DISTANCE = 0.92 (was 1.0). bge-small's noise floor on
    short English is cosine ≈ 0.50 — random off-topic queries
    against the trading corpus hit 0.535. So 0.58 (L2 ≤ 0.92) is the
    threshold that keeps real paraphrase matches and rejects noise.
  - search_*_vec accept a max_distance kwarg so deterministic test
    embedders (which don't satisfy bge-small-tuned thresholds) can
    pass max_distance=99.0.

mcp_server.py — _SESSION_RECALL_MIN_SIM 0.35 → 0.55
  - Aligned with the same noise-floor reasoning. 0.55 catches the
    real "risk management" → "Risk limit at 2%" paraphrase (0.638)
    while rejecting "how is the weather today" against trading.

Live test results (against /home/fazle/trading/v3 with 4 decisions):
  · "risk management"          → 2 relevant matches (Risk limit,
                                  Roll positions w/ "assignment risk")
  · "machine learning model"   → 2 matches (XGBoost #1 + Risk limit)
  · "sqlite"                   → 2 matches (SQLite #1)
  · "how is the weather today" → 0 matches (clean rejection)

Persistence verified: fresh MCP restart recalls decisions from the
prior session via the existing memory.db, with the original session_id
preserved in the affordance hint.

Tests: 134/134 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fazleelahhee and others added 5 commits April 28, 2026 04:30
Three independent quality-of-life wins that surfaced from the live test
+ post-mortem review.

memory/hook_installer.py — shell-quote the hook command path
  - Was: f"{HOOK_PATH} {hook_name}" — Claude Code passes this through
    sh -c, which tokenises on whitespace. Any user with a space in
    HOME (very common on macOS: "/Users/Firstname Lastname") would
    get a shell error and silent-broken capture.
  - Now: shlex.quote(str(HOOK_PATH)) — handles spaces and any other
    shell metacharacters. New regression test creates a path
    containing "Alice Smith" and asserts the quoting.
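The fix is essentially the following (the path shown is a hypothetical example of the macOS case):

```python
import shlex
from pathlib import Path

# A home directory with a space, the case that broke the old f-string.
hook_path = Path("/Users/Alice Smith/.cce/hooks/cce_hook.sh")
cmd = f"{shlex.quote(str(hook_path))} UserPromptSubmit"
print(cmd)  # '/Users/Alice Smith/.cce/hooks/cce_hook.sh' UserPromptSubmit
```

`shlex.quote` single-quotes any string containing shell metacharacters, so `sh -c` sees the path as one token regardless of spaces.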

cli.py — cce init memory-capture reachability probe
  - After install_settings() wires the hooks, _check_memory_capture_reachable
    looks for the storage_base/serve.port file and tries a TCP connect
    on 127.0.0.1:<port>. Three states reported:
      a) port file missing → "cce serve hasn't been started" with
         clear next-steps text (warns it's silently dropped otherwise)
      b) port file present but nothing listening → "stale" warning
      c) reachable → ✓ confirmation
  - Hooks fail closed (curl ... || true), so this is the only place
    the user gets told that capture isn't actually working before
    they restart Claude Code expecting it to.
  - 4 unit tests cover all four code paths (missing, stale, live,
    unparsable port file).
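The probe's states can be sketched as below (illustrative `check_capture_reachable`; the real probe reads storage_base/serve.port and prints user-facing guidance instead of returning a label):

```python
import socket
import tempfile
from pathlib import Path

def check_capture_reachable(port_file: Path) -> str:
    """Classify hook-server reachability from the serve.port file."""
    if not port_file.exists():
        return "missing"        # cce serve hasn't been started yet
    try:
        port = int(port_file.read_text().strip())
    except ValueError:
        return "unparsable"     # corrupt port file
    try:
        # Probe the loopback listener the hook shim would POST to.
        with socket.create_connection(("127.0.0.1", port), timeout=0.5):
            return "live"
    except OSError:
        return "stale"          # port file present, nothing listening

print(check_capture_reachable(Path(tempfile.mkdtemp()) / "serve.port"))  # missing
```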

scripts/bench_recall.py — recall quality benchmark
  - Seeds a tmp memory.db with 46 known decisions across 7 topics
    (auth, db, ml, infra, perf, testing, frontend) and runs 12 known
    queries — including 2 deliberately off-topic ("how is the weather
    today", "best ice cream flavour") to measure rejection.
  - Reports recall@k, precision@k, MRR per query + aggregate.
  - --min-sim and --vec-max flags let the caller sweep thresholds
    without code changes. Run it whenever bge-small swaps, the
    corpus shape changes, or the recall pipeline is touched.
  - Current defaults (min_sim=0.55, vec_max=0.92): R@5=0.75 P@5=0.40
    MRR=0.67 over the 12-query suite. Tightening to 0.60/0.85 lifts
    precision slightly with no recall loss; looser values lose both.
  - Documents an FTS stop-word leakage in the corpus ("how is the
    weather today" returns 7 because FTS5 OR-matches on "is/the/today"
    substrings) — left as a follow-up since the fix is non-trivial.

Tests: 139/139 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four independent fixes that surfaced from the live test + post-mortem.
All measured against the recall benchmark; aggregate metrics improved
(R=0.68→0.75, P=0.39→0.46, MRR=0.68→0.74) — see scripts/bench_recall.py.

mcp_server.py — FTS5 stop-word filter
  - _FTS_STOP_WORDS: conservative function-word list (~150 entries:
    articles, auxiliaries, pronouns, prepositions, conjunctions,
    interrogatives). Topic words (code, auth, database, improve,
    scale) are deliberately NOT in the list.
  - _strip_stop_words used in three places:
      a) _fts_match_query — FTS5 OR-match no longer hits "is/the/today"
      b) _rank_json_candidates topic embed — sharper topic vector on
         conversational queries
      c) search_decisions_vec / search_turn_summaries_vec topic embed
  - "how is the weather today" now returns 0 false positives (was 7).
    "how can we improve code quality" still returns 39 because
    bge-small finds half an engineering corpus semantically related
    to that phrase — a model limit, not a filter limit. R@5 unchanged
    on real queries; off-topic queries reject cleanly.

hook_server.py — rendezvous port file at default location
  - Authoritative port file still lives at <storage_base>/serve.port.
  - When storage_path is customised in config.yaml (anything other
    than ~/.cce/projects), also write the port to the *default*
    rendezvous location ~/.cce/projects/<name>/serve.port. The hook
    shell script always reads from the default location because it
    has no way to read config.yaml — this is what keeps capture
    wired up for users with custom storage paths.
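The dual-write can be sketched like this — function name and signature are hypothetical; only the two paths and the "mirror when they differ" rule come from the commit:

```python
from pathlib import Path

def write_port_files(port: int, storage_base: Path, project: str,
                     default_base: Path = Path.home() / ".cce" / "projects") -> None:
    """Write the authoritative port file under the configured storage
    base, then mirror it to the default rendezvous path when the two
    differ, so the hook script (which cannot read config.yaml) can
    always find the server."""
    primary = storage_base / "serve.port"
    primary.parent.mkdir(parents=True, exist_ok=True)
    primary.write_text(str(port))

    rendezvous = default_base / project / "serve.port"
    if rendezvous != primary:
        rendezvous.parent.mkdir(parents=True, exist_ok=True)
        rendezvous.write_text(str(port))
```

For users on the default storage path the two locations coincide and only one file is written.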

cli.py — _check_memory_capture_reachable also checks rendezvous
  - Falls back to the default-path rendezvous when the storage-local
    file isn't there. So `cce init` reports "active" correctly even
    for users with custom storage who already have `cce serve`
    running.

cli.py — auto-prune background task in `cce serve`
  - _auto_prune_loop runs prune_old_payloads(days=30) once daily on
    a daemon-thread executor. Staggered start (120s) so it doesn't
    compete with vec-backfill / compression-loop on cold-start.
    Cancellation wired into the existing serve cleanup. Users who
    never invoke `cce sessions prune` manually now get bounded
    memory.db growth automatically.

hook_installer.py — Windows .cmd hook script
  - HOOK_SCRIPT_NAME is now platform-aware: cce_hook.cmd on Windows,
    cce_hook.sh elsewhere. New _hook_script_body() returns the
    matching body. The .cmd version uses cmd.exe syntax, looks up
    the port via %USERPROFILE%\.cce\projects\<name>\serve.port,
    and exits 0 on every error path so capture never blocks the
    user. HOOK_MARKER widened from "cce_hook.sh" to "cce_hook" so
    the uninstall path matches both extensions.
  - +2 tests covering both platform branches via monkeypatch.

bench_recall.py — conversational-query test cases
  - Added "how can we improve code quality" and "what should we do
    about database performance" so future stop-word tweaks can be
    measured against real conversational phrasing instead of just
    keyword queries.

Tests: 141/141 across memory + dashboard + integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "Warm fastembed model cache" step pre-downloads bge-small to avoid
the xdist worker race on first model load. But four tests in the suite
also use `all-MiniLM-L6-v2` (test_embedder.py, test_retriever.py,
test_embedding_cache.py × 4 cases) — those races weren't covered, so
under sufficient timing pressure CI sees:

  ONNXRuntimeError NO_SUCHFILE — Load model from /tmp/fastembed_cache/
  models--qdrant--all-MiniLM-L6-v2-onnx/.../model.onnx failed.
  File doesn't exist

It manifests intermittently (the run on PR #7 hit it; main was passing).
Cure is the same as for bge-small: warm the model in a single process
before pytest spawns the four xdist workers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isions"

The headline problem the memory feature is supposed to solve: you sit
down today, Claude Code opens, and you DON'T have to re-explain what
you decided last week.

Until this commit the data was captured + queryable but not
*surfaced*. The SessionStart hook only inserted the new sessions row;
prior decisions stayed behind a session_recall tool call the agent had
to remember to make. CLAUDE.md said it should, but instruction-following
isn't a guarantee.

memory/hooks.py — handle_session_start now returns plain text
  - New build_session_resume(conn, project) builds a markdown block:
      ## CCE memory · resuming <project>

      **Previous session** (<ended_at>):
        <rollup_summary lines>

      **Recent decisions** (most-recent first):
        - <decision> — <reason> (session: `<sid>`)
        ... (top 5)

      Call session_recall("<topic>") for more, or
      session_timeline("<sid>") to drill in.
  - Empty string for a brand-new project (no awkward header on the
    first session).
  - Reasons truncated at 200 chars so a single rambling decision can't
    blow up the resume.
  - Wrapped in try/except so a bad query never breaks SessionStart.
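The decisions-only branch of the builder can be sketched as below — column names are assumptions inferred from the table names in this PR (the real schema lives in memory/db.py), and the rollup branch is omitted:

```python
import sqlite3

def build_session_resume(conn: sqlite3.Connection, project: str) -> str:
    """Assemble the markdown resume block injected at SessionStart.
    Returns "" for a brand-new project; never raises."""
    try:
        decisions = conn.execute(
            # Illustrative columns; the real query also joins the
            # prior session's rollup_summary.
            "SELECT decision, reason, session_id FROM decisions "
            "ORDER BY created_at DESC LIMIT 5"
        ).fetchall()
        if not decisions:
            return ""  # no awkward header on the first session
        lines = [f"## CCE memory · resuming {project}", "",
                 "**Recent decisions** (most-recent first):"]
        for decision, reason, sid in decisions:
            reason = (reason or "")[:200]  # cap rambling reasons
            lines.append(f"  - {decision} — {reason} (session: `{sid}`)")
        return "\n".join(lines)
    except sqlite3.Error:
        return ""  # a bad query must never break SessionStart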

memory/hook_installer.py — shell script captures SessionStart stdout
  - POSIX: branches on $HOOK_NAME. SessionStart's curl response is
    captured into RESPONSE and printed to stdout. Other hooks keep
    their stdout/stderr discarded as before. Timeout bumped to 2s for
    SessionStart since the resume query is a hair slower than a write.
  - Windows .cmd: same shape using a temp file (cmd can't easily
    capture command output to a variable across multiple lines).
    Cleans the temp file afterwards.

Why this fixes the problem
  Claude Code captures the SessionStart hook's stdout and injects it
  into the model's context at the start of the conversation. So at
  prompt 0 the model already sees the prior rollup + recent decisions
  — no tool call required, no "what did we decide about auth last
  week?" round-trip with the user.

  Live-tested against /home/fazle/trading/v3: 4 trading decisions
  recorded yesterday surface verbatim in the SessionStart response
  body, ready to be injected.

Tests: +1 (test_session_start_returns_resume_with_prior_rollup_and_decisions)
covering the rollup + decisions path. 142/142 across memory + dashboard
+ integration + CLI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this widening, the resume context that handle_session_start
returns is only injected on cold-start. After the user runs `/clear`
(wipes context) or `/compact` (trims context), Claude Code re-issues a
SessionStart event — but the matcher="" we had before only matched the
default (startup) variant on some Claude Code versions, leaving us
silent exactly when re-injection matters most.

hook_installer.py
  - New HOOK_MATCHERS dict — per-hook matcher overrides keyed by name.
    SessionStart gets "startup|clear|compact"; everything else stays "".
  - install_settings() reads from this dict instead of hardcoding "".
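The dict-plus-fallback shape is small enough to sketch directly; the lookup helper is hypothetical, but the matcher values and the empty-string default are from this commit:

```python
# Per-hook matcher overrides; "" means "match every subtype".
HOOK_MATCHERS = {
    "SessionStart": "startup|clear|compact",
}

def matcher_for(hook_name: str) -> str:
    """install_settings() reads from HOOK_MATCHERS instead of
    hardcoding ""; hooks without an override keep the empty matcher."""
    return HOOK_MATCHERS.get(hook_name, "")
```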

Trade-off
  Matcher is now a regex-like alternation, but Claude Code accepts
  that natively (see claude-mem's hooks.json which uses the same
  three triggers). On a Claude Code version that doesn't recognise
  one of the subtypes, the hook just won't fire for that subtype —
  the others still work, so no behavioural regression.

Tests
  +1 (test_session_start_matcher_covers_clear_and_compact) asserts
  all three triggers are in the SessionStart matcher and the other
  four hooks keep their empty matcher. 143/143 across the suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-merge code-audit found 4 actionable items. Fixed; CI to confirm.

hook_installer.py — Windows hook command quoting (production bug)
  - Was: shlex.quote on every platform, including Windows. shlex emits
    POSIX single quotes — cmd.exe doesn't dequote those, so any Windows
    user with a space in %USERPROFILE% (e.g. C:\Users\Alice Smith\)
    got a silently-broken hook command. Hooks fail closed; no error
    surfaces to the user.
  - Now: new _quote_hook_path branches on _is_windows() and emits
    cmd.exe double quotes on Windows, sh single quotes on POSIX.
  - +1 test (test_install_settings_uses_cmd_quoting_on_windows) using
    monkeypatch to force the Windows branch.

cli.py — CLAUDE.md template now documents session_timeline / session_event
  - The SessionStart resume body's affordance line literally tells the
    model to "Call session_timeline(\"<sid>\")", but CLAUDE.md (which
    primes the agent) didn't document either tool. Bumped block version
    to "3" so cce init re-renders. New "Drilling deeper from a recall
    hit" subsection covers both tools and when to prefer each.

memory/db.py — auto_prune_loop extracted to module level (testable)
  - Was: inline closure inside cli._run_serve. Untestable without
    spinning up the whole MCP server.
  - Now: auto_prune_loop(storage_base, days, initial_delay, interval,
    stop_event) at module level. Test suite injects 0.0/0.05 timing
    instead of waiting 120s + 86400s. Same behaviour; same defaults.
  - +2 tests:
      test_auto_prune_loop_runs_one_iteration — old payload gets
      pruned on the first pass.
      test_auto_prune_loop_stop_event_short_circuits_initial_delay
      — clean exit during the stagger.
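The injectable-timing shape can be sketched as below; `prune_fn` stands in for the real `prune_old_payloads(storage_base, days)` call, and the return value is an addition for illustration:

```python
import threading

def auto_prune_loop(prune_fn, initial_delay: float, interval: float,
                    stop_event: threading.Event) -> int:
    """Run prune_fn once per interval after a staggered start.
    Event.wait() doubles as an interruptible sleep, so setting
    stop_event exits promptly even mid-delay — this is what lets the
    test suite inject 0.0/0.05 timing instead of waiting out
    120s + 86400s. Returns the iteration count."""
    runs = 0
    if stop_event.wait(initial_delay):
        return runs  # stopped during the stagger
    while not stop_event.is_set():
        prune_fn()
        runs += 1
        if stop_event.wait(interval):
            break
    return runs
```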

tests/memory/test_hooks.py — build_session_resume edge cases
  - Pre-merge audit found only the "rollup + decisions" branch was
    tested; the two more-common branches (decisions only, rollup only)
    weren't. Added both as separate tests so a regression that breaks
    the week-1 path (decisions only, no completed-session rollup yet)
    can't pass CI.

tests/test_cli_init_probe.py — rendezvous-fallback path
  - Pre-merge audit found that 8ed9f70's "fall back to default-path
    rendezvous when storage-local is missing" had no test covering
    that branch. Added one — monkeypatches Path.home() and verifies
    the probe finds the listening port via the rendezvous file alone.

Tests: 149/149 across memory + dashboard + integration + CLI.

Deferred to follow-ups (not merge-blockers per the audit):
  · _drain_one_threaded._embedder closure — only matters if two
    compression_loop instances run concurrently, which never happens
    in production today.
  · Rendezvous-write-failure escalation to status — quiet edge case.
  · Windows .cmd %TEMP% with parentheses — already partly mitigated
    by setlocal enabledelayedexpansion; revisit if a real Windows
    user reports it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fazleelahhee fazleelahhee merged commit 674923e into main Apr 28, 2026
3 checks passed
fazleelahhee added a commit that referenced this pull request Apr 28, 2026
Reconciles two independent callbacks for showing motion during long
indexing runs on large repos. They serve different timescales and are
complementary, so this commit keeps both rather than picking one.

pipeline.run_indexing now accepts:
  · embed_progress_fn(current, total) — per-batch numeric ticks during
    the embed phase. Already wired through cli.py to a live progress bar.
  · phase_fn(msg) — string status before each major phase
    ("Embedding 32k chunks (CPU-bound, can take several minutes)…",
    "Writing 32k chunks to vector + FTS + graph index…").
    Closes the in-place chunking bar first so the message doesn't get
    overwritten via \r.

cli.py defines both callbacks; non-verbose TTY runs see a chunking bar
→ phase line → embed bar → phase line → embed bar's final tick.

embedder.py keeps the canonical (current, total) numeric API. The WIP's
alternate string-message API has been dropped — cli.py's bar already
delivers the "still alive" intent through chunks/N motion.
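The callback contract can be sketched as below — a stripped-down stand-in, not the real run_indexing (which also chunks, writes the vector/FTS/graph index, and batches embeds):

```python
def run_indexing(chunks, embed_fn, embed_progress_fn=None, phase_fn=None):
    """Drive the embed phase with both callback styles: a string phase
    announcement up front, then numeric (current, total) ticks so the
    caller can render a live bar."""
    total = len(chunks)
    if phase_fn:
        phase_fn(f"Embedding {total} chunks…")
    embedded = []
    for i, chunk in enumerate(chunks, start=1):
        embedded.append(embed_fn(chunk))
        if embed_progress_fn:
            embed_progress_fn(i, total)  # final tick reports (total, total)
    if phase_fn:
        phase_fn(f"Writing {total} chunks to the index…")
    return embedded
```

Both callbacks default to None, which is how the two branches could add them independently without breaking existing callers.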

tests/indexer/test_pipeline_phase_progress.py
  · test_phase_fn_announces_embedding_and_ingest — pins that
    "Embedding…" and "Writing…" phase markers fire from run_indexing
    so a 10-30 min embed phase on a 7035-file repo doesn't look hung.
  · test_embedder_calls_progress_fn_during_inference — rewritten to
    use the canonical numeric callback; asserts the final tick reports
    full chunk count and embeddings actually attached.

Resolution context: PR #7 (memory feature) merged into main while these
indexer-progress changes were stashed. Both branches independently added
new keyword args to run_indexing's signature, creating a parameter-list
conflict that resolved cleanly by keeping both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>