code-memory

A lightweight, local-first memory layer for coding agents.

Structural symbol graph · semantic vector recall · episodic task log

Jump to: Get it running · Plug it into your agent

Note

Using this with a coding agent? My personal Claude Code / OpenCode harness — hooks, agents, MCP wiring, and the code-memory integration — lives at dev/settings-opencode. Drop-in reference for plugging this memory layer into a real agent setup.

What is this?

code-memory gives a coding agent (Claude Code, OpenCode, Cursor, your own harness) a memory it can actually use:

Structural memory — a symbol graph of every file, function, import, and call (FalkorDB).
Semantic memory — dense embeddings of every symbol (Qdrant + Ollama-served bge-m3), with an opt-in dense+sparse hybrid path for symbol-heavy corpora.
Episodic memory — a task log of past prompts, plans, patches, and outcomes (SQLite + embedded recall).

It runs entirely on your machine. No OpenAI calls. No cloud. No vendor lock-in. Designed to be boring infrastructure you can wire into any harness via CLI, hooks, or MCP.

code-memory architecture — agent → orchestrator → extractor / embedder / episodic → FalkorDB + Qdrant + SQLite

Why it doesn't blow up your context window

Naive "give the LLM your whole repo" approaches die fast: context windows are finite, attention degrades with size, and tokens cost money. code-memory sidesteps that by keeping the bulk of the knowledge outside the prompt and injecting only a small, query-relevant slice when the agent actually needs it.

The trick is a two-phase split:

Offline — index everything, inject nothing. Ingest walks the repo once, chunks code with tree-sitter, embeds each chunk with bge-m3 (default: served by Ollama so the model stays warm across per-file save-hook re-ingests; opt-in EMBED_BACKEND=flagembed for true dense+sparse), and stores:
- vectors in Qdrant (semantic recall, hybrid-ready collection layout)
- symbol / import / call edges in FalkorDB (structural recall)
- past prompts / plans / patches / verdicts in SQLite (episodic recall)
None of this lives in the LLM context. It lives on disk.
Online — retrieve a small, focused Context Pack. When the agent has a question, it sends a natural-language query. The retriever:
- embeds the query and pulls top-k semantically similar chunks from Qdrant
- expands the neighborhood in FalkorDB (callers, callees, imports, types)
- pulls a few similar past episodes from SQLite
- returns a compact, ranked Context Pack — typically a handful of snippets, not the whole repo

The agent only ever sees the Context Pack, not the index behind it. The repo can be 5 MB or 500 MB — what hits the prompt is the same small budget of relevant chunks.

500 MB repo on disk     ─┐
                         │   index (Qdrant + FalkorDB + SQLite)
                         ▼
                  ┌──────────────┐
   query  ──────► │  retriever   │ ──►  ~few KB Context Pack  ──►  LLM prompt
                  └──────────────┘

Net effect: memory grows with the repo; context stays roughly constant.

Two-stage retrieval: bi-encoder + cross-encoder (+ hybrid opt-in)

Retrieval quality is the difference between an agent that hits the right file on the first try and one that flounders for ten tool calls. code-memory runs a two-stage pipeline with a third opt-in lexical layer:

Stage 1 — `bge-m3` bi-encoder (always on, default path)

Every code chunk is embedded once with bge-m3. The default backend is Ollama — the daemon keeps the model warm across short-lived processes, which matters because the Claude Code / OpenCode plugins fire a per-save code-memory reingest <path> after every Write|Edit. An in-process embedder would pay a multi-second cold load on every save; Ollama costs ~0 extra startup.

The query is embedded the same way, and Qdrant returns the top-N chunks by cosine similarity in milliseconds. Collection layout is named-vector (dense

sparse slots) so the door stays open for lexical fusion without re-ingestion; the sparse slot is simply empty under the Ollama backend.

Hybrid opt-in — `EMBED_BACKEND=flagembed` + `CODEMEMORY_HYBRID=1`

For corpora where the agent queries mostly by exact symbol name (JwtAuthGuard.canActivate, useCustomerStore), you can flip to the in-process FlagEmbedding backend, which emits both the dense vector and a learned sparse lexical vector from one forward pass:

uv sync --extra hybrid
export EMBED_BACKEND=flagembed
export CODEMEMORY_HYBRID=1
# then re-ingest so the sparse slot gets populated
code-memory ingest . --full

Queries then go through Qdrant's server-side Reciprocal Rank Fusion:

RRF(d) = Σ_branches  1 / (k + rank_branch(d))

Why this is opt-in, not default: benchmarks on a 2.6k-file Angular corpus (see docs/BENCHMARK.md) showed pure dense outperforming hybrid (RRF and DBSF) on realistic queries — m3's dense head is strong enough that adding sparse tends to surface generated API stubs and .spec.ts files that share identifier vocabulary with the query, without bringing new wins. And the FlagEmbedding backend trades Ollama's warm-daemon model for an in-process load (~5-15 s) which is a regression for hook-driven workflows. Re-run scripts/benchmark.py against your own corpus before flipping it.

Stage 2 — cross-encoder rerank (auto on Metal/CUDA, blended)

The top-N candidates from stage 1 are then rescored by a cross-encoder (BAAI/bge-reranker-v2-m3 by default): query and chunk are concatenated and fed through a transformer that produces a joint relevance score. Because the model attends to both sides simultaneously, it picks up semantic relationships a bi-encoder cosine sim misses. The tradeoff: one forward pass per pair, so it only makes sense on the already-narrow candidate set — never on the whole index.

The final score is a blend, not a replacement:

score = (1 - α) · bi_encoder_score + α · cross_encoder_score

with α = 0.5 by default. We picked blending after A/B testing replace-mode on a real Angular repo: the cross-encoder won on 2/5 queries (promoted the actual token-manager service over generic error interceptors; promoted the concrete phone.validator over a generic spec file), tied on 2/5, and regressed on 1/5 (promoted a mock CORS file to #1 for "authentication login flow"). Blending recovers from the regression while keeping the wins.

Policy

auto (default) — cross-encoder fires only when a Metal (Apple Silicon) or CUDA accelerator is detected. CPU-only hosts stay on bi-encoder alone — the latency hit isn't worth it.
CODEMEMORY_RERANK=1 — force on (even CPU).
CODEMEMORY_RERANK=0 — force off.
CODEMEMORY_RERANK_ALPHA=0.7 — push more weight to the cross-encoder (or 0.0 to fall back to pure bi-encoder without disabling the load).
CODEMEMORY_RERANK_MODEL — swap the cross-encoder (default BAAI/bge-reranker-v2-m3).

Install the optional dependency once:

uv sync --extra rerank        # or: pip install -e .[rerank]

On a first call the model warms up (~2-5 s, ~1.1 GB from HF cache); after that it's a singleton (~50-200 ms per query on MPS).

Benchmarks

All benchmarks run on the same anonymized corpus — acme/sample-webapp, ~2.6k files, 5.5k chunks of an Angular clean-architecture frontend — and the same 30 hand-crafted queries (scripts/benchmark_queries.json, mix of natural-language and PascalCase identifier searches). Hardware: Apple Silicon (MPS), fp16.

Benchmark 1 — code-memory vs no-code-memory baseline

Does plugging code-memory into an agent actually help, vs the default agent fallback (ripgrep keyword search)?

Mode	Recall@10	MRR	nDCG@10	p50 (ms)
grep (no code-memory)	0.367	0.169	0.218	176
code-memory (dense)	0.967	0.798	0.840	86
code-memory (dense + CE rerank)	0.967	0.573	0.672	7855

Headline: Recall@10 +164%, MRR +372%, nDCG@10 +285% — and faster than walking the filesystem (86 ms vs 176 ms) because Qdrant ANN beats rg -l over thousands of files. The cross-encoder regresses on this corpus; α=0.5 blending dampens but doesn't erase it.

Full per-query breakdown, top-5 paths, and latency p95 in docs/BENCHMARK_VS_BASELINE.md. Regenerate against your own corpus:

uv run python scripts/benchmark_vs_baseline.py \
    --project <slug> --corpus /path/to/repo \
    --out docs/BENCHMARK_VS_BASELINE.md

Benchmark 2 — dense vs hybrid vs cross-encoder rerank

Once you've decided to use code-memory, which retrieval mode wins?

Mode	Recall@5	MRR	nDCG@10	p50 (ms)
dense (default)	0.967	0.798	0.840	36
hybrid (`EMBED_BACKEND=flagembed` + `CODEMEMORY_HYBRID=1`)	0.900	0.680	0.750	34
dense + CE rerank	0.900	0.568	0.668	8853
hybrid + CE rerank	0.933	0.787	0.831	10446

Read it this way: the m3 dense head is strong enough on its own that adding the sparse view via RRF surfaces spec/generated noise without clear wins. Cross-encoder rerank improves hybrid (closes the gap to ~0.787 MRR) but never beats pure dense, and costs ~200× the latency. We ship dense via Ollama as the default and keep hybrid + rerank behind opt-in env vars for corpora where the tradeoffs flip.

Full breakdown in docs/BENCHMARK.md. Regenerate:

uv run python scripts/benchmark.py --project <slug> --out docs/BENCHMARK.md

Caveats

These results reflect one Angular codebase. Symbol-heavy or polyglot corpora may favor hybrid; LLM-style narrative corpora may favor the cross-encoder. Always re-run on your own corpus before changing defaults.
Gold paths are substring matches against hit.payload.path. Good for A/B comparison, not absolute IR.
Rerank latency includes a cold model load on the first query; warm calls drop to ~250 ms per query but stay dominated by N pairs × forward-pass cost.

Auto-learning and auto-query

Without MCP, the agent has to call code-memory retrieve and code-memory record explicitly (or via a hook). That works but pushes discipline onto the user and the agent. The goal is to make memory ambient: the agent uses it without thinking about it, and the memory updates itself as the agent works.

This is what the MCP server unlocks.

Auto-query

Exposed as an MCP server, code-memory advertises tools like:

memory.retrieve(query, k) — semantic + graph recall
memory.neighbors(symbol) — structural expansion around a symbol
memory.episodes(query) — similar past tasks

The agent's normal tool-selection loop picks memory.retrieve the same way it picks Read or Grep today — whenever the model judges that more context would help. No slash command, no hook, no human in the loop. The user just asks their question; the agent silently pulls the right chunks before answering.

In practice this means:

"How does the auth middleware work?" → agent calls memory.retrieve("auth middleware") → gets the relevant 3-5 chunks → answers.
"Refactor UserService to use the new repo pattern." → agent calls memory.neighbors("UserService") → sees every caller and dependency before touching code.
"We had this bug last month, what did we do?" → agent calls memory.episodes(...) → recalls the prior fix.

Auto-learning

The same MCP surface exposes write-side tools:

memory.record(prompt, plan, patch, verdict) — log a finished task
memory.reingest(path) — refresh the index for a changed file

Combined with editor / harness hooks (file save, commit, task completion), this closes the loop:

File saved → memory.reingest(path) keeps the index live, so retrieval never goes stale.
Task finished → memory.record(...) writes the episode (what was asked, what was planned, what was patched, did it pass) into SQLite + Qdrant.
Next similar task → that episode surfaces automatically via memory.episodes(...).

The result is a memory that gets smarter the more the agent uses the codebase, without ever bloating the context window. Each task teaches the system; each query pays back only the few KB that are actually relevant.

Between sessions, code-memory ingest uses a git delta to catch up anything the live hooks missed — see Git-aware incremental ingest below.

Status: an MCP server ships in the package (see MCP server below). The same primitives remain reachable via the CLI (retrieve, record, reingest) for shell-hook workflows.

Requirements

Component	Minimum version	Notes
Python	3.11	Used to build the orchestrator, CLI, and extractor.
Docker	20.x (Compose v2)	Runs FalkorDB and Qdrant locally.
Ollama	latest	Default embedding backend (long-running daemon keeps `bge-m3` warm across CLI hooks).
Embedding model	`bge-m3`	`ollama pull bge-m3` (~1.2 GB). Optional opt-in: `EMBED_BACKEND=flagembed` for in-process dense+sparse via FlagEmbedding (~2.3 GB).
Disk	~3 GB	Ollama model (~1.2 GB) + Docker volumes + Python deps.
RAM	8 GB+	16 GB+ recommended for large repos.
OS	macOS / Linux / Windows (WSL2)	Tested on Apple Silicon (M-series) and Linux x86_64.

Optional

Redis CLI (brew install redis / apt install redis-tools) for poking at FalkorDB from the terminal.
DB Browser for SQLite (brew install --cask db-browser-for-sqlite) for browsing episodic memory.

Installation

One-shot scripts

We provide cross-platform install scripts under scripts/.

macOS / Linux

git clone https://github.com/<you>/code-memory.git
cd code-memory
./scripts/install.sh

Windows (PowerShell)

git clone https://github.com/<you>/code-memory.git
cd code-memory
./scripts/install.ps1

Both scripts will:

Check prerequisites (python3, docker, ollama).
Create a Python virtual environment at ./.venv.
Install the package in editable mode.
Copy .env.example → .env (if not present).
Start FalkorDB and Qdrant via Docker Compose.
Pull the bge-m3 embedding model into Ollama.
Run the smoke test suite.
Optionally install the OpenCode and/or Claude Code harness plugins (see Harness plugins). Interactive by default; pass --plugins=all|opencode|claudecode|none to bypass the prompt.

# install everything plus both plugins (non-interactive)
./scripts/install.sh --plugins=all

# only the Claude Code plugin, project-local
./scripts/install.sh --plugins=claudecode --plugins-scope=project

Manual installation

If you prefer to run each step by hand:

# 1. Clone the repo
git clone https://github.com/<you>/code-memory.git
cd code-memory

# 2. Start infrastructure (FalkorDB + Qdrant)
docker compose -f docker/docker-compose.yml up -d

# 3. Pull the embedding model
ollama pull bge-m3

# 4. Create Python environment
python3 -m venv .venv
source .venv/bin/activate         # Windows: .venv\Scripts\Activate.ps1
pip install -e ".[dev]"

# 5. Configure environment
cp .env.example .env

# 6. Verify
pytest -q

You should now have a working code-memory CLI on your $PATH (inside the venv).

Usage

Ingest a repository

code-memory ingest /path/to/repo                 # auto: incremental if known, else full
code-memory ingest /path/to/repo --full          # force a complete re-walk
code-memory ingest /path/to/repo --since main    # diff main..HEAD
code-memory ingest /path/to/repo --dry-run       # show the plan, write nothing

This walks the repo, extracts symbols / imports / calls with tree-sitter, writes them to the FalkorDB graph, and indexes each symbol snippet into Qdrant via bge-m3.

The walker honors the repo's .gitignore (including nested ones) and skips obvious generated junk — minified bundles (*.min.js, *.min.css), sourcemaps, lockfiles, node_modules, build outputs — so embeddings aren't burned on noise. Override the defaults via the project config if you need to index something the walker normally drops.

Git-aware incremental ingest

After the first run, code-memory remembers the HEAD commit it last ingested (per repository, in the project's SQLite). Subsequent ingest calls run git diff <last_sha>..HEAD --name-status -M and only touch what actually changed:

Added / Modified files → re-extracted and re-embedded
Deleted files → records dropped from FalkorDB + Qdrant
Renamed files → old path purged, new path ingested
Dirty worktree (uncommitted edits, including untracked) → also re-ingested

When the stored SHA is no longer reachable (history rewrite, branch deletion), the next run automatically falls back to a full walk and re-records. Non-git checkouts always do a full walk.

Check what's pending:

code-memory ingest-status /path/to/repo
# -> { "last_sha": "...", "head_sha": "...", "drift": {"changed": 7, "deleted": 1}, "dirty": 3 }

Warning

The first ingest can take a while. Wall time is dominated by embedding every symbol through Ollama (bge-m3 on CPU is the bottleneck on most laptops) and scales roughly linearly with codebase size. Rough orders of magnitude on an M-series Mac:

Small repo (~1k symbols): seconds
Mid repo (~10k symbols): a few minutes
Large repo (100k+ symbols): tens of minutes to an hour+

Subsequent runs use the git delta described above, so they finish in seconds even on large repos. GPU-accelerated Ollama, fewer files (tune ignore globs), or a smaller embedding model (nomic-embed-text) all cut the first-run cost too.

Query the memory

code-memory retrieve "where is the auth middleware defined?"

Returns a structured Context Pack: top-k semantic hits, neighborhood expansion, and any similar past episodes.

Re-ingest a single file (for live workflows / hooks)

code-memory reingest src/auth.ts

Record a task episode

code-memory record \
  --prompt "fix the login timeout" \
  --plan "extend session, add retry" \
  --patch "$(git diff)" \
  --verdict pass

Navigate the call / import graph

The graph store backs five topology commands. Each prints JSON (--json) or a human table:

code-memory callers getBearerToken              # who calls this symbol?
code-memory callees UserService --depth 2       # what does UserService reach?
code-memory definitions UserService             # all defining files + lines
code-memory dependencies src/auth.ts            # what does this file import?
code-memory importers '@acme-ng/security'     # who imports this package?
code-memory resolve                             # rebuild call/import edges

These same five surfaces are exposed as MCP tools (codememory_callers, codememory_callees, codememory_definitions, codememory_dependencies, codememory_importers) so the agent can do impact analysis without shelling out.

Reset a project's index

code-memory reset                          # wipe current project (with confirm)
code-memory reset --yes                    # skip confirmation
code-memory reset --include-episodes       # also drop conversation history
code-memory reset --all                    # every known project

Default scope drops Qdrant vectors + FalkorDB graph + ingest_state for the project; episodic memory is preserved unless --include-episodes is passed. code-memory ingest --full triggers the same reset implicitly before re-walking.

MCP server

code-memory ships an MCP (Model Context Protocol) server that exposes the same primitives as native tools. The agent can call them in its normal tool-selection loop — no shell parsing, no slash command.

Tools advertised:

Core

Tool	Purpose
`codememory_retrieve`	Semantic + graph + episodic recall (returns a Context Pack).
`codememory_record`	Log a finished task (prompt / plan / patch / verdict).
`codememory_reingest`	Re-index a single file after edits.
`codememory_ingest`	Full / incremental repo ingest (long-running; requires explicit `confirmed=true`).

Topology (call/import graph)

Tool	Purpose
`codememory_callers`	Files that call a symbol (impact analysis: "what breaks if I rename X?").
`codememory_callees`	Symbols called from the file that defines a given symbol (outgoing dependencies).
`codememory_definitions`	All files + line ranges that define a given symbol name (disambiguate before callers/callees).
`codememory_dependencies`	Modules imported by a file (forward import graph).
`codememory_importers`	Files that import a module / package / path (reverse import graph).

.NET assembly surface

Tool	Purpose
`codememory_assembly_members`	List public methods of an indexed .NET Type, read on-demand from the referenced DLL (~tens of ms). Members aren't bulk-indexed to keep the graph small.

Temporal (time-travel queries)

Tool	Purpose
`codememory_drift`	Symbols whose `last_seen_sha` ≠ supplied HEAD — classified `tombstoned` / `drifted`.
`codememory_at_sha`	Nodes (Symbol or File) alive at a past commit. Caller passes `sha_ord` from `git rev-list --count --first-parent <sha>`.
`codememory_callers_at_sha`	Callers of a symbol as the graph looked at a past commit ("what called X before commit Y deleted it").

Every tool requires an explicit project argument — the silent cwd-fallback was hiding namespace bugs (see commit 3663772). Pass the slug printed by code-memory project (or code-memory projects to list all known slugs). Sentinel values like auto / default are rejected.

Transport: stdio. Script entrypoint: code-memory-mcp.

Recommended: `uvx` (no clone, no venv)

Requires uv (brew install uv / pipx install uv). uvx fetches, caches, and runs the package in an isolated environment.

Claude Code

claude mcp add code-memory \
  --scope user \
  -- uvx --from git+https://github.com/dev/code-memory code-memory-mcp

OpenCode

{
  "mcp": {
    "code-memory": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "git+https://github.com/dev/code-memory",
        "code-memory-mcp"
      ],
      "enabled": true
    }
  }
}

Pin a version by appending @<tag-or-sha> to the git URL, e.g. git+https://github.com/dev/code-memory@v0.1.0.

Once the package is published to PyPI the --from git+… part drops: command: ["uvx", "code-memory-mcp"].

Every tool call must include the project slug — the server no longer falls back to cwd-detection. The startup-detected slug is surfaced in each tool's schema description so the agent has the exact value to pass; the harness plugins (see Harness plugins) wire this automatically.

Alternative: pipx (persistent install on `$PATH`)

pipx install git+https://github.com/dev/code-memory
# then in any MCP client:
#   command: ["code-memory-mcp"]

Local checkout (development)

{
  "mcp": {
    "code-memory": {
      "type": "local",
      "command": ["/absolute/path/to/code-memory/.venv/bin/code-memory-mcp"],
      "enabled": true,
      "environment": { "CODE_MEMORY_PROJECT": "auto" }
    }
  }
}

Run directly (debug)

uvx --from git+https://github.com/dev/code-memory code-memory-mcp
# speaks JSON-RPC on stdio; useful with `npx @modelcontextprotocol/inspector`

Harness plugins

The MCP server above exposes manual tools. The harness plugins make the same backend ambient — auto-retrieving a Context Pack on every substantive user prompt, auto-reingesting on every Write / Edit, and recording sessions as episodes when the agent stops. The plugins are optional and live alongside the MCP server; install both for the best experience.

Plugin	Hook model
`plugins/opencode`	Bun-loaded TypeScript module; uses `chat.message`, `experimental.chat.system.transform`, `tool.execute.after`, `session.idle`.
`plugins/claude-code`	Plain Node scripts wired via `hooks.json`; uses `SessionStart`, `UserPromptSubmit`, `PostToolUse` (`Write`/`Edit`/`MultiEdit`), `Stop`.

Both plugins:

Detect substantive code intent (trivial follow-ups skip retrieval).
Dedup the same query within 60 s.
Debounce the cross-file resolver (~1.5 s after the last write) so a 20-file refactor collapses to exactly one resolver run.
Run a one-shot git-delta ingest at session start to catch out-of-band edits (vim, IDE, git pull).
Record the session as an episode on idle / stop with the first user message + git diff as the patch.

Install via the top-level installer:

./scripts/install.sh --plugins=all                       # both, global
./scripts/install.sh --plugins=claudecode                # Claude Code only
./scripts/install.sh --plugins=opencode,claudecode       # both, csv form
./scripts/install.sh --plugins=all --plugins-scope=project   # ./.opencode and ./.claude

Or install a single plugin directly without re-running the backend installer:

./plugins/opencode/install.sh        # ~/.config/opencode/plugins/
./plugins/claude-code/install.sh     # ~/.claude/plugins/code-memory/

Restart your agent after install.

Inspect the stores

Store	UI
FalkorDB	http://localhost:3000 (graph browser, Cypher)
Qdrant	http://localhost:6333/dashboard
SQLite	`sqlite3 ./data/episodic.db`

Team sync — share memory across developers

A single dev's full ingest is expensive. The team-sync layer makes that work shareable so dev B inherits dev A's index without re-running ingest, then keeps the index in lockstep with the code as branches move.

How it works

┌─────────────────────────────┐
│  main branch (your code)    │
├─────────────────────────────┤
│  feat/x, feat/y, ...        │
├─────────────────────────────┤
│  codemem-snapshots (orphan) │ ← per-commit snapshots, content-addressed
│   ├─ snapshots/<sha>.cmsnap │
│   ├─ manifests/<sha>.json   │
│   └─ index.json             │
└─────────────────────────────┘

A snapshot is a tar.gz containing the full vectors + graph + state for a single git SHA, plus a manifest with embed_model, embed_dim, and a sha256 content digest. Snapshots live on an orphan git branch named codemem-snapshots — no external storage, no CI, just git push/git fetch.

Snapshots are content-addressed by the SHA they represent. Two devs publishing for the same SHA produce identical bytes (deterministic dump + canonical sort) so concurrent publishes converge instead of conflicting.

Trust guarantees

Property	Mechanism
Reproducible	snapshot keyed by `git SHA` + `embed_model`
Tamper-evident	`manifest.content_sha256` recomputed on every load
Model-drift safe	`verify_snapshot()` rejects mismatched `embed_model` / `dim`
Disaster recovery	any dev can re-publish; no single point of failure
Offline-safe	local incremental works without `git fetch`

Concrete workflow

Dev A merges to main, publishes a snapshot

# after merging PR
git checkout main && git pull
code-memory sync . --publish
# -> ingests latest main, then pushes <sha>.cmsnap to codemem-snapshots

Or one-shot:

code-memory snapshot publish .

Dev B clones fresh — instant memory, no ingest

git clone <repo> && cd <repo>
code-memory hooks install        # one-time setup; installs hooks + autostart
code-memory sync                 # pulls snapshot for HEAD, applies it

Output:

{ "action": "pull_snapshot", "head_sha": "abc123...", "snapshot_sha": "abc123..." }

No tree-sitter walk. No embedding calls. Index ready in seconds.

Dev B pulls main with new commits

When git hooks are installed, git pull automatically triggers code-memory sync:

$ git pull
...
[sync] HEAD = ghi789
[sync] snapshot ghi789 found in codemem-snapshots
[sync] action=pull_snapshot

If no snapshot exists for that exact SHA, the syncer walks back via rev-list --first-parent to find the nearest ancestor snapshot and applies it, then runs an incremental ingest for the commits in between.

Dev B on a feature branch

git checkout -b feat/y         # post-checkout hook fires
# -> action=pull_then_incremental (snapshot=abc123, base=abc123)
# edits files...
# watcher (auto-running) detects edits, debounces 2s, runs incremental

Feature branches never publish (only commits on --canonical-branch main).

Automated sync — four layers of resilience

┌──────────────────────────┐
│ 1. Git hooks             │ ← post-checkout, post-merge, post-rewrite,
│                          │   post-commit, post-applypatch
│                          │   (run code-memory sync in background)
├──────────────────────────┤
│ 2. Filesystem watcher    │ ← watchdog (FSEvents/inotify/ReadDirChangesW),
│                          │   debounce 2s, catches saves between commits
├──────────────────────────┤
│ 3. MCP pre-query guard   │ ← every codememory_retrieve checks HEAD drift,
│                          │   syncs if stale (cheap noop when clean)
├──────────────────────────┤
│ 4. OS autostart service  │ ← runs watcher independent of MCP host;
│                          │   launchd / systemd --user / schtasks
└──────────────────────────┘

Each layer is a safety net for the others. Hooks bypassed? Watcher catches it. Watcher crashes? Autostart restarts. Autostart blocked? MCP guard catches drift on the next query.

Cross-platform autostart

The OS service runs code-memory watch <repo> at every user logon. Zero admin required, fully user-level:

OS	Mechanism	Unit location
macOS	launchd LaunchAgent	`~/Library/LaunchAgents/com.codememory.watch.<slug>.plist`
Linux	systemd --user	`~/.config/systemd/user/codememory-watch-<slug>.service`
Windows	Task Scheduler (logon)	`\CodeMemory\Watch\<slug>` (task name)

Bootstrap is automatic — the MCP server registers and starts the service the first time it's invoked in a repo. No manual install step. The Watcher class also runs in-process inside the MCP server as a belt-and-suspenders fallback for sessions where OS autostart is unavailable (corporate-locked machines, CI containers).

CLI surface

code-memory sync [ROOT]              # smart reconcile: pull / incremental / full
code-memory status [ROOT]            # unified view: autostart, hooks, snapshot, drift
code-memory watch [ROOT]             # foreground watcher (for tmux / screen)
code-memory hooks install            # git hooks + autostart (idempotent)
code-memory hooks uninstall          # clean removal
code-memory snapshot publish [ROOT]  # build snapshot for HEAD, push to branch
code-memory snapshot list            # list snapshots on the snapshot branch
code-memory snapshot gc --keep 20    # prune all but recent N snapshots
code-memory autostart status         # OS service health

Decision tree (what `sync` actually does)

State	Action
HEAD == state.sha, worktree clean	`noop`
HEAD == state.sha, worktree dirty	`dirty_only` (reingest dirty files)
No local state, snapshot exists for HEAD	`pull_snapshot`
No local state, snapshot exists for ancestor	`pull_then_incremental`
No local state, no snapshot	`full_ingest`
HEAD moved, snapshot exists for HEAD	`pull_snapshot`
HEAD moved, snapshot for ancestor	`pull_then_incremental`
HEAD moved, state.sha reachable	`incremental`
HEAD moved, state.sha rewritten	`full_ingest`

Environment variables

Var	Effect
`CODE_MEMORY_REPO`	Override repo root for MCP bootstrap (default: cwd / git top)
`CODE_MEMORY_NO_AUTOSTART`	Skip OS autostart registration on MCP boot
`CODE_MEMORY_NO_BOOT_SYNC`	Skip initial sync on MCP boot
`CODE_MEMORY_NO_INPROC_WATCHER`	Skip in-process watcher (use OS service only)
`CODE_MEMORY_NO_GUARD`	Skip pre-query freshness guard
`CODE_MEMORY_LOG_LEVEL`	`DEBUG` / `INFO` / `WARNING` (default `INFO`)
`CODEMEMORY_RERANK`	`auto` (default — on if Metal/CUDA), `1` (force on), `0` (force off)
`CODEMEMORY_RERANK_MODEL`	Cross-encoder model id (default `BAAI/bge-reranker-v2-m3`)
`CODEMEMORY_RERANK_ALPHA`	Blend weight: `score = (1-α)·bi + α·ce` (default `0.5`)

Failure modes & recovery

Symptom	Resolution
`embed_model mismatch` on apply	Re-run `code-memory ingest --full` and re-publish
`content digest mismatch`	Snapshot corrupt; `code-memory sync` falls back to incremental
Watcher not picking up changes	`code-memory status` → check `running: true`; reinstall via `code-memory hooks install`
Snapshot branch grew too large	`code-memory snapshot gc --keep 20`
Hook never fires after `git pull`	Repo has `core.hooksPath` set elsewhere; the installer follows that path — verify with `git config core.hooksPath`

What is not shared

Stored locally only	Why
Episodic memory (`episodic.db`)	Personal: your task history, plans, verdicts
Per-repo `ingest_state` SQLite row	Records the dev's local checkpoint
Working-tree-dirty incremental updates	Uncommitted code shouldn't pollute team snapshot

Snapshots only carry vectors + graph + state for committed code, scoped to a project slug. Two projects in the same Qdrant/Falkor instance can publish independently without colliding.

Project layout

src/code_memory/
├── embed/            # Ollama embeddings wrapper
├── vector/           # Qdrant store
├── graph/            # FalkorDB store (callers / callees / definitions / imports)
├── extractor/        # tree-sitter -> symbols / imports / calls
│   └── gitignore.py      # .gitignore + minified/generated skip rules
├── episodic/         # SQLite task log
├── orchestrator/     # ingest pipeline, retrieval, context pack
│   ├── pipeline.py       # ingest_repo / ingest_file / reingest_file
│   ├── retrieve.py       # Retriever + ContextPack rendering (rerank, episode filter)
│   ├── resolver.py       # bind raw CALLS / IMPORTS to actual Symbol / File nodes
│   ├── reset.py          # wipe vectors + graph + ingest_state per project
│   ├── ingest_state.py   # per-repo last_sha checkpoint (SQLite)
│   └── git_delta.py      # git diff -> changed / deleted / dirty
├── sync/             # team-shared memory (snapshots, watcher, autostart)
│   ├── snapshot.py       # tar.gz blob: build / verify / apply
│   ├── store.py          # orphan-branch git-backed snapshot storage
│   ├── sync.py           # decision tree: pull / incremental / full
│   ├── watcher.py        # cross-platform filesystem watcher (watchdog)
│   ├── hooks.py          # git hooks installer (idempotent marker blocks)
│   └── autostart/        # OS service adapters
│       ├── launchd.py        # macOS LaunchAgent
│       ├── systemd.py        # Linux systemd --user
│       └── schtasks.py       # Windows Task Scheduler
├── mcp_server.py     # stdio MCP server (`code-memory-mcp`)
└── cli.py            # typer-based CLI entrypoint

Temporal model — time-aware graph queries

The code graph is bi-temporal at the SHA level: every File, Symbol, and edge carries three lifecycle properties stamped at ingest time.

property	written when	answers
`first_seen_sha`	element is upserted for the first time	"when did this enter the codebase?"
`last_seen_sha`	every subsequent re-ingest of the same element	"is this still alive at HEAD?"
`invalid_sha`	element was deleted (file removed, symbol disappeared)	"when did this leave?"

Deletions don't DETACH DELETE anymore — they tombstone. The node stays in the graph with invalid_sha = <removal commit>. Topology queries (callers, callees, definitions, dependencies, importers) filter tombstones out by default, so the live view is unchanged. The history stays queryable.

Each lifecycle stamp pairs a SHA string with a topological ordinal (first_seen_ord, last_seen_ord, invalid_ord) computed via git rev-list --count --first-parent <sha>. The ordinal is a monotonic integer along the main branch — parent < child, always — which makes range comparisons across SHAs ("alive before SHA X") cheap. Tombstones additionally carry an invalid_at wall-clock timestamp.

What this unlocks

# Drift detection: symbols the latest ingest didn't confirm
store.drift(head_sha=current_head)
# -> [{'name': 'OldHelper', 'status': 'tombstoned', 'invalid_sha': '…'},
#     {'name': 'Renamed',  'status': 'drifted',    'last_seen_sha': '…'}]

# Time-travel: graph as of a past commit
store.at_sha(sha="abc123…", sha_ord=4271, label="Symbol")

# Who used to call this symbol before it was deleted?
store.callers_at_sha("getBearerToken", sha="abc123…", sha_ord=4271)

// Pre-deletion callers of a now-removed symbol (without ordinals)
MATCH (caller)-[c:CALLS]->(s:Symbol {name: 'getBearerToken'})
WHERE s.invalid_sha = $deletion_sha
RETURN caller.key

CLI

# Sanity-check a long-running watcher
code-memory drift -p my-project

# Bound monotonic graph growth
code-memory vacuum --before release/26.18      # by git ref
code-memory vacuum --older-than 30d            # by wall-clock age
code-memory vacuum --all --dry-run             # report counts without writing

MCP tools

Agents reach the same surface via three native tools:

tool	what
`codememory_drift(head_sha, project)`	symbols not last-seen at HEAD, classified `tombstoned` / `drifted`
`codememory_at_sha(sha, sha_ord, label?, limit?, project)`	nodes alive at a past commit (caller computes `sha_ord` with `git rev-list --count --first-parent <sha>`)
`codememory_callers_at_sha(symbol, sha, sha_ord, project)`	callers as the graph looked at that commit

Episode replay

Episode.head_sha records the git HEAD when the agent recorded the episode. Combined with the graph stamps, you can reconstruct "what the agent saw" without rewinding the working tree.

What this doesn't do (yet)

Edge versioning is coarse. An edge that disappears and reappears collapses into one row whose last_seen_sha jumps over the gap. Enough for "alive at SHA X" queries; not enough for "every commit that toggled this edge".
Vacuum is destructive and explicit. Once you vacuum --before X, the pre-X tombstones are gone — at_sha / callers_at_sha for SHAs before X stop returning the dropped rows. Tradeoff is intentional; the design favours bounded growth over unbounded history.

See CHANGELOG.md for the design rationale.

Roadmap

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
docker		docker
docs		docs
plugins		plugins
scripts		scripts
src/code_memory		src/code_memory
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

code-memory

What is this?

Why it doesn't blow up your context window

Two-stage retrieval: bi-encoder + cross-encoder (+ hybrid opt-in)

Stage 1 — bge-m3 bi-encoder (always on, default path)

Hybrid opt-in — EMBED_BACKEND=flagembed + CODEMEMORY_HYBRID=1

Stage 2 — cross-encoder rerank (auto on Metal/CUDA, blended)

Policy

Benchmarks

Benchmark 1 — code-memory vs no-code-memory baseline

Benchmark 2 — dense vs hybrid vs cross-encoder rerank

Caveats

Auto-learning and auto-query

Auto-query

Auto-learning

Requirements

Optional

Installation

One-shot scripts

macOS / Linux

Windows (PowerShell)

Manual installation

Usage

Ingest a repository

Git-aware incremental ingest

Query the memory

Re-ingest a single file (for live workflows / hooks)

Record a task episode

Navigate the call / import graph

Reset a project's index

MCP server

Recommended: uvx (no clone, no venv)

Claude Code

OpenCode

Alternative: pipx (persistent install on $PATH)

Local checkout (development)

Run directly (debug)

Harness plugins

Inspect the stores

Team sync — share memory across developers

How it works

Trust guarantees

Concrete workflow

Dev A merges to main, publishes a snapshot

Dev B clones fresh — instant memory, no ingest

Dev B pulls main with new commits

Dev B on a feature branch

Automated sync — four layers of resilience

Cross-platform autostart

CLI surface

Decision tree (what sync actually does)

Environment variables

Failure modes & recovery

What is not shared

Project layout

Temporal model — time-aware graph queries

What this unlocks

CLI

MCP tools

Episode replay

What this doesn't do (yet)

Roadmap

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Stage 1 — `bge-m3` bi-encoder (always on, default path)

Hybrid opt-in — `EMBED_BACKEND=flagembed` + `CODEMEMORY_HYBRID=1`

Recommended: `uvx` (no clone, no venv)

Alternative: pipx (persistent install on `$PATH`)

Decision tree (what `sync` actually does)

Packages