Skip to content

raya-ac/engram

Repository files navigation

engram

engram

PyPI Tests Docs License

a cognitive memory system that actually remembers things. built because flat markdown files don't scale and every "memory" tool i tried was either too simple (just embeddings) or too complex (needs redis + neo4j + a PhD).

engram sits in the middle. one sqlite file, hybrid retrieval that fuses five signals, memory layers that model how brains actually work, and a neural visualization that shows the whole thing firing in real time. 98.1% R@5 on LongMemEval (ICLR 2025) — highest published score, beating MemPalace (96.6%), Emergence AI (86%), and every other memory system benchmarked.

what it does

hybrid retrieval — most memory tools just do cosine similarity and call it a day. engram runs four retrieval channels in parallel (BM25 keywords, dense embeddings via HNSW approximate nearest neighbors, entity graph BFS with 1-hop traversal, Hopfield associative pattern completion), fuses them with intent-weighted reciprocal rank fusion (k=60, weights vary by query type — why/when/who/how/what), then applies temporal + importance boosting, cross-encoder reranking, deep MLP reranking, gaussian noise for beneficial variation, and a minimum score threshold gate. the full pipeline: dense (HNSW) + BM25 + graph + Hopfield → intent-weighted RRF → boost → cross-encoder → MLP reranker → noise + threshold.

memory layers — five layers modeled after atkinson-shiffrin: working (ephemeral, auto-promotes to episodic after 30 min), episodic (events, experiences), semantic (permanent knowledge), procedural (decisions, error patterns, how-to), and codebase (compressed code knowledge — file trees, function signatures, dependency graphs). memories promote upward when they prove useful and decay if nobody accesses them. 30-day half-life on episodic, infinite on semantic.

entity graph — extracts people, tools, projects, dates from every memory. builds a relationship graph with co-occurrence strength. multi-hop traversal via recursive SQL CTEs, no neo4j needed. backlinks let you trace which memories are connected to which.

dream cycle — consolidation pass that clusters similar memories (cosine > 0.8), summarizes the clusters, generates entity "peer cards" (biographical summaries), and archives the low-value old stuff. like sleep for your memory system.

semantic dedup — finds near-duplicate memories by embedding distance (default threshold 0.92), auto-merges them keeping the higher-importance version. transfers entity links, merges tags and access counts. run manually or as part of consolidation.

codebase scanning — point scan_codebase at a project directory and it extracts file trees, function/class signatures, import graphs, and config files into compressed codebase-layer memories. stores ~10x fewer tokens than raw code while keeping what you actually need to work with the project.

conversation ingest — auto-extracts memories from Claude Code JSONL session logs. parses exchanges into Q+A pairs, classifies them (decisions, corrections, errors, task completions), and stores them in the right layer with appropriate importance scores.

neural visualization — force-directed graph of entities organized in concentric rings by memory layer. neurons fire with traveling impulse particles when memories get accessed. polls the database so it works across processes. fire a query from the CLI or MCP server and watch the web UI light up.

drift detection — memories reference file paths, function names, commands, and dependencies. those references go stale when the codebase changes. drift_check extracts verifiable claims from memory content and validates them against the actual filesystem — dead paths, missing functions, broken npm scripts. returns a drift score (0-100) with per-issue breakdown. zero AI cost, pure filesystem checks. drift_fix auto-invalidates dead references and flags stale memories. inspired by mex's claim verification approach.

pattern extraction — after a session, extract_patterns analyzes recent activity (diary entries, new memories, events) and distills reusable procedural knowledge. classifies work into categories (workflow, gotcha, decision, integration, debug), checks novelty against existing procedural memories via embedding distance, and only stores patterns that are genuinely new. the GROW step from mex, automated.

negative knowledgeremember_negative stores explicit "what does NOT exist" claims: no caching layer, no Redux, the /admin endpoint was removed. these prevent future hallucinated recommendations. stored in the semantic layer with a NEGATIVE KNOWLEDGE prefix so they surface when you search for the thing that doesn't exist.

enriched embeddings — at write time, an LLM generates keywords, categorical tags, and a contextual summary for each memory. the embedding is computed over the concatenation of content + keywords + tags + summary, giving the vector richer semantic signal than raw content alone. inspired by A-Mem's zettelkasten approach, where enriched embeddings nearly doubled multi-hop retrieval F1.

memory evolution — memories aren't write-once. when a new memory arrives and near-neighbors are detected (via the surprise gate), the system asks an LLM whether existing memories should be updated with the new context. old memories get smarter over time instead of going stale. from A-Mem — removing evolution dropped multi-hop F1 from 45.85% to 31.24% in ablation.

intent-aware retrieval — queries are classified by intent (why/when/who/how/what) and retrieval signals are dynamically weighted. "why" queries boost graph traversal for causal reasoning. "when" queries boost BM25 for date matching. "who" queries boost entity graph lookup. from MAGMA's adaptive policy (+9% over static weighting).

trust-weighted decay — different sources decay at different rates. human-authored memories get full 30-day half-life. auto-extracted observations decay 3x faster. formula: λ_eff = λ · (1 + κ·(1 - trust)), κ=2.0. from SuperLocalMemory V3.3. also: confirmation count — memories corroborated by multiple independent sources get importance boost.

write-path CRUD — instead of always appending then deduplicating later, new memories are classified at write time as ADD/UPDATE/NOOP by comparing against existing neighbors. updates merge content in-place. noops skip storage entirely. from Mem0's production pipeline.

adversarial belief probing — during the dream cycle, randomly sample old semantic/procedural memories and challenge them: "is this still true?" beliefs that fail the probe get importance reduced. prevents fossilized false beliefs. from the March 2026 survey on autonomous agent memory.

63 MCP tools — plugs into claude code (or any MCP client) as a tool server. 72 tests. docker-ready. recall, remember, entity lookup, codebase scanning, conversation extraction, semantic dedup, drift detection, pattern extraction, negative knowledge, quality metrics, embedding compression, community detection, timeline queries, similarity search, backlinks, consolidation, batch operations, export, health checks, the works.

the retrieval pipeline

eight stages — four parallel channels, intent-weighted fusion, boosted, reranked, gated:

query
  │
  ├── intent classification (why/when/who/how/what)
  │         → dynamic signal weights per intent type
  │
  ├── dense HNSW search (bge-small-en-v1.5, 384-dim, hnswlib)  → top 3k candidates
  ├── BM25 via sqlite FTS5 (content + hypothetical queries)     → top 3k candidates
  ├── entity graph BFS (1-hop traversal, strength-weighted)     → top k candidates
  └── Hopfield associative (pattern completion, β=8.0)          → top k candidates
           │
           ▼
     intent-weighted reciprocal rank fusion (k=60)
     score = Σ w_intent · 1/(60 + rank) across 4 channels
           │
           ▼
     temporal + importance boosting
     retention regularization, access frequency, date matching
           │
           ▼
     cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
     joint (query, document) scoring — optional, adds ~200ms
           │
           ▼
     deep MLP reranker (optional, if trained)
     learned relevance from historical access patterns, <1ms
           │
           ▼
     gaussian noise (ACT-R, σ=0.02) + threshold gate
     beneficial variation + minimum score cutoff
           │
           ▼
     final top-k results

deep retrieval — optional 7th stage: a learned 2-layer MLP reranker trained on actual access patterns. which memories get accessed after being returned in search results? that signal teaches the reranker what's useful vs what's just semantically similar. takes 10 features (cosine similarity, importance, access count, age, layer one-hot, retention score) and outputs a relevance prediction. train with train_reranker, model persists to disk next to the database. runs automatically on every recall once trained. lightweight — adds <1ms per query. after the MLP, a small gaussian noise term (σ=0.02, ACT-R inspired) provides beneficial retrieval variation, and a configurable minimum score threshold gates out garbage results.

task-aware skill selectionget_skills decides whether to inject procedural knowledge and which 2-3 items to surface. three-stage gate: (1) need assessment via query surprise + domain coverage, (2) selection of top procedural memories by adaptive relevance threshold, (3) calibration with confidence scoring that filters borderline matches. based on SkillsBench finding that focused skills (+16.2pp) beat comprehensive docs (-2.9pp), and the AGENTS.md evaluation showing static context files reduce performance. the system knows when to inject (unfamiliar domain + relevant procedures = high confidence) and when to shut up (model already knows + no specific procedures = skip).

the hypothetical query part is from docTTTTTquery — at ingestion time, generate questions each memory might answer, index them alongside the content. fixes the vocabulary mismatch problem where your search terms don't match the stored text.

memory lifecycle

memories aren't static. they move between layers based on how useful they turn out to be.

surprise-based importance — at write time, every new memory is compared against existing embeddings using k-NN cosine distance (k=5). novel memories (far from anything stored) get their importance boosted up to +0.3. redundant memories (close to existing) get importance reduced and are flagged as potential duplicates. the surprise score is stored in metadata so you can audit it later. this is inspired by the Titans paper (Behrouz et al., Google) where memory updates are proportional to surprise — the gradient of the loss function. the remember tool now returns a surprise field (0-1) and warns when near-duplicates are detected.

importance scoring uses 9 factors:

  • base importance (set at creation, 0.0-1.0, adjusted by surprise at write time)
  • access frequency (log scale, how often it's been recalled)
  • recency (exponential decay, trust-weighted half-life)
  • emotional valence (strong emotions = more memorable)
  • stability (accessed consistently over time vs burst)
  • layer boost (semantic memories weighted higher)
  • source trust (human=1.0, AI=0.7, interaction=0.6, ingest=0.5, dream=0.4)
  • confirmation count (independently corroborated facts get boosted)
  • combined into a weighted composite score

promotion — episodic memories that hit importance >= 0.7 and access count >= 5 get promoted to semantic (permanent). working memories auto-promote to episodic after 30 minutes or 2 accesses. the sweep runs on every recall call so it's basically free.

pinning — pin any memory with the pin tool or the pin button in the web UI. pinned memories are immune to the dream cycle's forgetting pass. useful for memories that are important but accessed infrequently — the kind ebbinghaus would normally archive.

retention regularization — forgetting is reframed as retention regularization, inspired by Miras (Behrouz et al., Google). three modes, configurable via retention_mode in config:

  • l2 (classic ebbinghaus): smooth exponential decay, 50% at half-life. everything fades gradually.
  • huber (default): matches L2 near-term, transitions to linear for old memories. robust to burst-then-quiet access patterns — old-but-once-hot memories get a gentler transition instead of an infinite long tail. huber_delta controls the transition point.
  • elastic (L1+L2): sparse retention. strongly-held memories stay near full strength, weakly-held ones decay faster. produces cleaner separation between keepers and forgettables. elastic_l1_ratio controls the L1/L2 blend.

all modes include access reinforcement — each recall strengthens retention (spaced repetition effect, log-scaled, capped at +0.3). trust-weighted: low-trust sources (auto-extracted, dream-generated) decay up to 3x faster than human-authored memories (λ_eff = λ · (1 + κ·(1 - trust)), κ=2.0, from SuperLocalMemory V3.3). after 90 days, if retention < 0.15, importance < 0.3, and access count < 3, the memory gets soft-deleted. semantic, procedural, and pinned memories don't decay.

embedding compression — as memories age and retention drops, their embeddings can be quantized to save storage: active (R>0.8) = 32-bit float, warm = 8-bit (3.9x compression, 0.9999 cosine fidelity), cold = 4-bit (7.6x, 0.97 fidelity), archive = 2-bit (14.6x, 0.59 fidelity). uses Fisher-Rao Quantization-Aware Distance (FRQAD) for mixed-precision comparison — inflates variance proportional to quantization loss to prevent false similarity. run with compress_embeddings.

consolidation (dream cycle) — 7-step pipeline: (1) apply forgetting curve with trust-weighted retention, (2) cluster similar memories by embedding distance and merge clusters of 5+, (3) generate peer cards for entities with enough data, (4) cross-domain synthesis — find entity pairs in different contexts with moderate embedding similarity (0.75-0.90), LLM-confirm genuine connections, create SYNTHESIZED_WITH bridges, (5) adversarial belief probing — randomly sample old semantic/procedural memories and challenge them ("is this still true?"), reduce importance on invalidated beliefs, (6) drift detection — validate memory claims against filesystem, auto-invalidate dead references, (7) prune old access logs and events. run manually with engram consolidate or the MCP consolidate tool.

entity graph

every memory gets scanned for entities — people, tools, projects, dates, URLs, file paths. these go into an entity registry with canonical names, aliases, and types.

relationships form automatically through co-occurrence (entities mentioned in the same sentence get a CO_OCCURS link) and through pattern matching ("X uses Y" → USES, "X built Y" → CREATED, etc.). relationship strength increases with evidence count.

traversal uses recursive SQL CTEs for multi-hop queries — "show me everything connected to Ari within 2 hops" runs in a single SQL query, no graph database needed. the recall_related tool does this.

you can also manually link entities (link_memories), merge duplicates (merge_entities), add aliases (update_entity), find backlinks (backlinks), and fuzzy-search for entities by partial name (search_entities).

editing and annotating

memories aren't write-once. you can:

  • edit contentedit_memory changes the text and automatically re-embeds and rebuilds the FTS index. the memory keeps its ID, access history, and entity links.
  • annotateannotate attaches timestamped notes to a memory without touching its content. useful for adding context later ("this turned out to be wrong" or "confirmed by Ari on april 8").
  • invalidateinvalidate marks a fact as no longer true with a reason. the memory stays in the database (useful for audit) but gets flagged and shown with a strikethrough in the web UI.
  • tagtag adds or removes freeform tags. batch_tag applies tags to all memories matching a search query.

examples

the examples/ directory has ready-to-use setup guides:

setup guides:

file what it covers
claude-code-setup.md full walkthrough: install, wire into claude code, add CLAUDE.md instructions, seed memories
hooks-setup.md auto-extract memories from conversations via claude code hooks
agent-patterns.md common patterns: session orientation, learning from corrections, check-before-store, cognitive scaffolding, multi-agent setup

python examples:

file what it does
python-client.py standalone usage without MCP — store, search, surprise scoring, reranker training
custom-agent.py conversational agent with engram memory using the Anthropic SDK
openai-compatible.py same pattern for any OpenAI-compatible API (OpenAI, Ollama, vLLM, llama.cpp)
multi-agent.py 3 specialized agents sharing one database — cross-domain recall, surprise, dream cycle
api-embeddings.py switch between local and cloud embedding backends (Voyage, OpenAI, Gemini)
entity-graph.py build and traverse the entity relationship graph — multi-hop traversal
negative-knowledge.py store "what does NOT exist" — prevents hallucinated recommendations
drift-detection.py detect and fix stale memories referencing dead paths or missing functions
export-import.py export memories to portable JSON, import into fresh database
codebase-scan.py scan a project and extract compressed code knowledge

install

git clone https://github.com/raya-ac/engram.git
cd engram
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

needs python 3.11+. first run will download two small models (~100MB total):

  • BAAI/bge-small-en-v1.5 (33MB) — embeddings
  • cross-encoder/ms-marco-MiniLM-L-6-v2 (22MB) — reranking

optional: API embedding backends

use cloud embedding APIs instead of (or alongside) local models:

pip install -e ".[voyage]"   # voyage-3.5, voyage-3.5-lite, voyage-code-3
pip install -e ".[openai]"   # text-embedding-3-small, text-embedding-3-large
pip install -e ".[gemini]"   # gemini-embedding-001
pip install -e ".[api]"      # all three

set the API key and model in config.yaml or env vars:

export VOYAGE_API_KEY="your-key"    # get at https://dash.voyageai.com/
export OPENAI_API_KEY="your-key"
export GEMINI_API_KEY="your-key"
# config.yaml
embedding_model: voyage-3.5         # auto-detects backend from model name
embedding_dim: 1024                 # auto-detected if model is known

engram auto-detects the backend from the model name — voyage-* uses the Voyage API, text-embedding-* uses OpenAI, gemini-* uses Gemini. or set embedding_backend explicitly.

supported models:

model provider dim price/1M tokens notes
BAAI/bge-small-en-v1.5 local 384 free default, runs on CPU or Apple GPU
voyage-3.5 Voyage AI 1024 $0.18 best retrieval quality, Anthropic recommended
voyage-3.5-lite Voyage AI 1024 $0.02 94% of 3.5 quality, budget option
voyage-code-3 Voyage AI 1024 $0.18 optimized for code
text-embedding-3-small OpenAI 1536 $0.02 cheapest API option
text-embedding-3-large OpenAI 3072 $0.13 highest dim
gemini-embedding-001 Google 768 free tier top MTEB retrieval score

switching models requires re-embedding all memories — use engram reembed after changing the model.

docker

docker compose up -d
# → http://localhost:8420

mount your config and set API keys via environment variables. data persists in a docker volume.

quick start

ingest some files

engram ingest ~/notes/
engram ingest ~/projects/docs/ ~/journal/

supports markdown, plaintext, JSON (claude code JSONL, claude.ai JSON, chatgpt JSON tree, slack exports), PDF. extracts atomic facts via LLM, embeds them, indexes in FTS5, extracts entities and relationships.

search

engram search "what happened on march 28"
engram search "melee garden architecture" --debug  # shows retrieval stage breakdown
engram search "apple sandbox bypass" --rerank      # enables cross-encoder (slower, better)

remember something directly

engram remember "Ari prefers casual tone, swearing when it fits"
engram remember "deploy command: npm run build && rsync" --layer procedural

manage ANN index

engram index rebuild     # full rebuild from all embeddings
engram index status      # check index size, vector count, last built

check status

engram status

entity lookup

engram entity Ari --graph

check memory drift

engram drift                                    # full drift report
engram drift --search-roots ~/project/src       # also verify function names
engram drift --fix --dry-run                    # preview what would be fixed
engram drift --fix                              # auto-invalidate dead refs, flag stale

extract patterns from session

engram patterns                                 # extract from last 4 hours
engram patterns --hours 24 --dry-run            # preview from last 24 hours
engram patterns --threshold 0.5                 # only store highly novel patterns

re-embed after switching models

engram reembed                              # re-embed all memories with current model
engram reembed --dry-run                    # preview count without re-embedding
engram reembed --batch-size 128             # larger batches for API models

watch a directory for auto-ingest

engram watch ~/notes/                       # poll every 30s for new/changed files
engram watch ~/chats/ --interval 60         # poll every 60s

export and import

engram export backup.json                   # export memories + entities + relationships
engram export backup.json --include-embeddings  # include embedding vectors
engram export backup.jsonl --layer procedural   # filter by layer
engram import backup.json                   # restore from export
engram import backup.json --skip-duplicates # skip memories with matching content hash

run the dream cycle

engram consolidate

run tests

pytest tests/ -v                            # 72 tests, ~3s

start the web dashboard

engram serve --web
# → http://127.0.0.1:8420

start the MCP server

engram serve --mcp

MCP server

wire it into claude code by adding to ~/.claude/settings.json:

{
  "mcpServers": {
    "engram": {
      "command": "/path/to/engram/.venv/bin/python",
      "args": ["-m", "engram", "serve", "--mcp"]
    }
  }
}

restart claude code. you get 63 tools:

recall & search

tool what it does
recall hybrid search across all layers
recall_entity everything about a person/project/tool — memories, relationships, timeline
recall_timeline memories in a date range
recall_related multi-hop graph traversal from an entity
recall_recent last N memories by creation time
recall_layer search within a specific layer
recall_hints search memories but return only hints (truncated snippets + entity names) to trigger recognition without replacing cognition
get_skills task-aware skill selection — get focused procedural guidance only when injection would help, skip when it wouldn't
recall_code search the codebase layer for functions, classes, files
recall_context search and return a formatted context block with token budget
find_similar find memories most similar to a given one by embedding distance
compress summarize search results down to a token budget

store & organize

tool what it does
remember store a memory with layer and importance
remember_decision decision + rationale → procedural layer
remember_error error pattern + prevention → procedural layer
remember_interaction Q+A pair → episodic layer
remember_project structured project info → semantic layer
remember_negative store explicit negative knowledge — what does NOT exist, what should NOT be done
edit_memory edit content of an existing memory (re-embeds automatically)
annotate add a note to a memory without changing its content
pin / unpin pin a memory so it never gets forgotten by the dream cycle
forget soft-delete a memory
invalidate mark a fact as no longer true
tag add or remove tags on a memory
bulk_forget mass cleanup by source file, layer, or date

entities & graph

tool what it does
update_entity add aliases, change type
merge_entities combine two entities that are the same thing
search_entities fuzzy search for entities by partial name
entity_graph relationship subgraph as JSON
entity_timeline entity's memories ordered chronologically
link_memories manually relate two memories via their entities
backlinks find all memories linked to a specific memory via shared entities

codebase

tool what it does
scan_codebase extract compressed code knowledge from a project directory
recall_code search the codebase layer specifically
list_projects show all scanned projects with memory counts

drift detection

tool what it does
drift_check verify memories against filesystem reality — dead paths, missing functions, stale memories. returns drift score 0-100
drift_fix auto-fix drift issues — invalidate dead refs, flag stale memories. use dry_run=true first
extract_patterns extract reusable procedural patterns from recent session activity — only stores what's genuinely novel

dedup & maintenance

tool what it does
dedup find and merge near-duplicate memories by embedding similarity
find_duplicates preview duplicate pairs without merging
recompute_importance recalculate all importance scores with the 9-factor formula
batch_tag add tags to all memories matching a search query
train_reranker train the deep MLP reranker on access patterns
reranker_status check if the deep reranker is trained
compress_embeddings lifecycle-aware quantization (32/8/4/2-bit) with FRQAD distance metric
detect_communities label propagation over entity graph, optional LLM summaries
quality_metrics storage quality ratio, curation ratio, enrichment coverage

conversations & sessions

tool what it does
ingest_sessions auto-extract memories from recent Claude Code conversation logs
session_summary generate summary from diary entries + recent events

lifecycle & system

tool what it does
consolidate run dream cycle (cluster, summarize, peer cards, forget)
promote / demote move memories between layers
layers graduated L0-L3 context for prompt injection
status memory counts, entity counts, db size
health embedding cache, FTS index, orphaned entities, stale working memories
access_patterns most-recalled memories, hit rates
count_by group counts by layer, source type, entity, or month
export dump memories as markdown or JSON
ingest import files or directories
explain_importance break down a memory's importance score into 7 component factors
memory_map high-level map of the whole system — layer counts, top entities per layer, date range, recent activity
diary_write / diary_read session notes

benchmarks

43/43 tests across 20 subsystems. run on 446 embedded memories, Apple Silicon.

full test suite (446 vectors, 384-dim, Apple Silicon)

subsystem tests result
embedding 3/3 dim=384, norm=1.0, avg 5.1ms, batch OK
ANN index (HNSW) 7/7 0.09ms search, 100% recall@10, 5,304 inserts/sec
brute-force dense 2/2 0.016ms avg (100 runs)
intent classification 1/1 6/6 correct (why/when/who/how/what)
full pipeline (no rerank) 3/3 15.5ms avg, debug mode OK
full pipeline (+ cross-encoder) 1/1 252ms avg
cross-encoder 2/2 correct ranking, 2.9ms/doc
surprise gate 4/4 0.10ms avg, novel=0.85, duplicate=0.44
Hopfield channel 1/1 <1ms
BM25 / FTS5 2/2 3.5ms avg
entity graph 4/4 find, relationships, 2-hop traversal (161 related)
memory CRUD 2/2 write → read → ANN verify → forget
layers (L0-L3) 1/1 248ms, 4 layers
deep reranker 1/1 trained=True
importance scoring 1/1 9-factor composite OK
store internals 3/3 cache cold=0.4ms hot=0.001ms
diary 1/1 write + read OK
events 1/1 logging OK
index I/O 1/1 967 KB on disk, save=4ms load=10ms
config 2/2 ANN config + reload consistent

throughput (Apple Silicon, MLX GPU)

operation rate
embedding (MLX GPU) 1,879 texts/sec
embedding (CPU) 176 texts/sec
sqlite bulk insert 51,000 rows/sec
ANN insert 5,304 ops/sec
embed + store 100k ~3 min

latency (Apple Silicon)

operation time
ANN dense search 0.09ms avg
brute-force dense search 0.016ms avg
full pipeline (no rerank) 15.5ms avg
full pipeline (+ cross-encoder) 252ms avg
surprise gate (k-NN) 0.10ms avg
embedding 5.1ms avg
cross-encoder rerank 2.9ms/doc
BM25 / FTS5 3.5ms avg
Hopfield channel <1ms
ANN index save 4ms
ANN index load 10ms
embedding cache (cold) 0.4ms
embedding cache (hot) 0.001ms

ANN scaling projection

vectors brute-force ANN (HNSW) speedup
1k 0.1ms 0.12ms 1x
10k 0.9ms 0.16ms 5x
100k 8.7ms 0.20ms 45x
500k 43.7ms 0.22ms 198x
1M 87.3ms 0.23ms 377x

recall@10 accuracy: 100% (20/20 queries, ANN vs brute-force exact match)

LongMemEval (ICLR 2025)

LongMemEval — 500 questions testing 5 long-term memory abilities (information extraction, multi-session reasoning, knowledge updates, temporal reasoning, abstention) across ~40 conversation sessions per question (~115k tokens). the standard benchmark for chat assistant memory.

engram uses HNSW + BM25 + RRF fusion against per-question session haystacks. no entity graph or Hopfield (those need persistent memory, not ephemeral per-question corpora). run with benchmarks/longmemeval/run_engram.py.

system R@5 method
engram v2 98.1% HNSW + BM25 + assistant BM25 + temporal boost + cross-encoder
MemPalace (raw) 96.6% ChromaDB cosine, verbatim storage
engram v1 94.7% HNSW + BM25 + RRF
Emergence AI 86.0% RAG
MemPalace (AAAK) 84.2% compressed storage
EverMemOS 83.0%
TiMem 76.9% temporal hierarchical

per question type (470 non-abstention questions):

type n R@5 R@10
knowledge-update 72 100.0% 100.0%
single-session-user 64 100.0% 100.0%
multi-session 121 99.2% 99.2%
temporal-reasoning 127 96.9% 97.6%
single-session-assistant 56 96.4% 96.4%
single-session-preference 30 93.3% 96.7%

v2 adds three channels over v1: assistant-turn BM25 (weight 0.5), timestamp proximity boost, and cross-encoder reranking on top-20 candidates. the assistant channel catches answers in assistant responses without polluting the dense index. the temporal boost favors sessions closer to the question date. the cross-encoder rescores the top candidates jointly against the query.

retrieval quality (synthetic)

tested on synthetic memories with template-varied content (different topics, people, tools). queries use the first line of each memory verbatim — a strict exact-match test.

metric 500 memories 10k memories 100k memories
recall@1 10% 25% 0%
recall@5 55% 75% 20%
recall@10 95% 95% 40%
coverage (top 20) 100% 100% 60%

recall drops at 100k because all synthetic memories use similar templates — finding one exact match among 100k near-duplicates is adversarially hard. real-world diverse content scores much higher.

intent classification

accuracy 90% (9/10 test cases)

query intent (why/when/who/how/what) is classified and used to dynamically weight retrieval signals. "why" boosts graph edges, "when" boosts BM25 date matching, "who" boosts entity lookup.

system health metrics

metric description
storage quality fraction of stored memories ever recalled
curation ratio memories with updates/invalidations vs total
enrichment ratio memories with keywords+tags+summary metadata
evolution count memories updated by evolution on neighbor write
confirmation count memories independently corroborated

run quality_metrics via MCP or the web dashboard health panel.

hooks

engram ships with a shell hook for Claude Code that auto-extracts memories from your conversation sessions.

# hooks/save_hook.sh — run periodically or on session end
ENGRAM_VENV=~/path/to/engram/.venv ./hooks/save_hook.sh

the hook finds recent Claude Code JSONL files, parses exchanges into Q+A pairs, classifies them (decisions get stored in procedural, corrections become error patterns, etc.), and stores them with appropriate importance. it skips files it's already ingested via content hash.

you can also wire it into Claude Code's hook system by adding to your settings — check hooks/save_hook.sh for details.

web dashboard

full monitoring UI at http://127.0.0.1:8420. supports optional bearer token auth — set web.auth_token in config.yaml to lock it down.

  • neural map — force-directed entity graph with concentric layer rings (semantic core → procedural → episodic → working). neurons glow and fire impulse particles along synapses when memories are accessed. drag nodes, hover for details, click to inspect. polls the database every 2s so MCP queries show up in real time.
  • search — hybrid search with debug mode showing all 5 retrieval stages. filter chips for layer, importance slider. hint mode toggle returns truncated snippets with reveal buttons for cognitive scaffolding. search history saved to localStorage with dropdown.
  • memories — browse all memories, filter by layer (including codebase). layer-colored left borders, importance bars, slide-in animations. every card has inline actions: edit content, promote/demote, pin/unpin, find similar, explain importance, copy, invalidate, forget. select mode for bulk operations (promote/forget multiple at once). pinned memories show gold glow.
  • entities — entity chips with memory counts. click to open inspector with relationship graph, add aliases, change entity type.
  • timeline — date range queries with memory cards.
  • remember — tabbed forms: general (any layer/importance), decision (with rationale → procedural), error pattern (with prevention → procedural), Q+A interaction (→ episodic). now shows surprise score and adjusted importance after storing.
  • cognition — three tabs for the new memory science features:
    • surprise: paste text to preview novelty score before storing. radial gauge visualization (green=novel, red=duplicate), k-NN distance bars, nearest memory snippet.
    • retention: interactive canvas chart overlaying L2, Huber, and elastic net decay curves. sliders for half-life, huber delta, and L1 ratio — curves redraw client-side in real time.
    • reranker: deep MLP reranker status card, train button with epoch/LR inputs, training results display.
  • bridges — cross-domain synthesis viewer. shows entity pairs connected by the dream cycle's LLM-confirmed bridges, with similarity scores and connection descriptions.
  • analytics — donut chart for layer distribution, bar charts for most recalled memories, top entities by memory count, source type breakdown.
  • context — L0-L3 graduated context viewer with token counts per layer and copy buttons. query input for L3 search-based context.
  • health — system health dashboard with 10 status cards (embedding cache, orphaned entities, stale working memories, FTS index, embedding coverage, db size, etc). plus a memory map showing top entities per layer and full date range.
  • dedup — duplicate detection with adjustable similarity threshold slider. scan to preview duplicate pairs side by side, one-click auto-merge.
  • ingest — file/directory path ingestion with real backend processing, session ingest button, and recent ingestion log.
  • export — download memories as markdown or JSON from sidebar, with optional layer filter.
  • live events — real-time feed of all memory reads/writes across all processes (MCP, CLI, web). deduplicates events within 2-second windows and shows result counts.
  • session diary — quick note-taking input in the sidebar, timestamped entries.
  • inspector panel — right sidebar that shows memory details, entity graphs, similar memories (with similarity percentages), importance factor breakdowns (colored bar chart with 9 weighted factors), annotations with add-note input, and access history.
  • toast notifications — bottom-right toasts for all actions (promote, pin, copy, forget, dedup) with success/error/info styling and auto-dismiss.
  • keyboard shortcuts/ focus search, n neural map, s search, r remember, a analytics, c cognition, b bridges, Esc close inspector.

web API

the dashboard is backed by a full JSON API you can hit directly:

GET  /api/memories                    paginated list, optional ?layer= filter
GET  /api/memories/:id                full memory with hypothetical queries, entities, access history
GET  /api/memories/:id/similar        find similar memories by embedding distance
GET  /api/memories/:id/importance     9-factor importance score breakdown
GET  /api/search?q=...&debug=true     hybrid search with optional debug breakdown
GET  /api/search/filtered?q=...       search with layer, importance, date, source filters
GET  /api/entities                    all entities with memory counts
GET  /api/entities/:id/graph          entity relationship subgraph
GET  /api/entities/:id/timeline       entity memories ordered chronologically
GET  /api/neural                      full graph for neural visualization
GET  /api/neural/fires?since=...      recent access events (lightweight polling)
GET  /api/timeline?start=...&end=...  temporal query
GET  /api/analytics                   layer distribution, top accessed, top entities
GET  /api/health                      system health (cache, orphans, FTS, embeddings)
GET  /api/memory-map                  full system overview with per-layer top entities
GET  /api/context?query=...           L0-L3 graduated context with token counts
GET  /api/duplicates?threshold=0.92   preview near-duplicate memory pairs
GET  /api/stats                       system statistics
GET  /api/events                      recent events from all processes
GET  /api/diary                       session diary entries
GET  /api/ingest/log                  recent file ingestions
GET  /api/pulse                       hourly activity counters + sparkline
GET  /api/heatmap?days=30             github-style activity heatmap
GET  /api/memories/:id/history        importance score over time
GET  /api/retention/curves            L2/Huber/elastic curve data for chart
GET  /api/retention/scatter           real memory age vs retention scatter data
GET  /api/reranker/status             deep reranker training state
GET  /api/bridges                     cross-domain bridge memories
GET  /api/search/hints?q=...          truncated hints for cognitive scaffolding
GET  /api/skills?query=...            task-aware skill selection with confidence scoring
GET  /api/export?format=json          export memories as markdown or JSON
POST /api/remember                    store a memory (with surprise scoring)
POST /api/consolidate                 trigger dream cycle
POST /api/dedup                       auto-merge duplicate memories
POST /api/surprise/preview            compute surprise for text before storing
POST /api/reranker/train              trigger reranker training
POST /api/memories/:id/promote        change memory layer
POST /api/memories/:id/demote         demote to lower layer
POST /api/memories/:id/edit           edit content (re-embeds automatically)
POST /api/memories/:id/annotate       add timestamped note
POST /api/memories/:id/invalidate     mark as no longer true
POST /api/memories/:id/forget         soft-delete
POST /api/memories/:id/pin            pin (prevent forgetting)
POST /api/memories/:id/unpin          unpin
POST /api/memories/bulk               bulk promote/forget/tag/demote
POST /api/entities/:id/alias          add entity alias
POST /api/entities/:id/type           change entity type
POST /api/diary                       append diary entry
POST /api/ingest/path                 ingest a file or directory
POST /api/ingest/sessions             ingest recent Claude Code sessions

architecture

everything lives in one sqlite file (~/.local/share/engram/memory.db). no external services.

engram/
├── store.py          # sqlite schema, CRUD, FTS5, entity graph (recursive CTEs), ANN lifecycle
├── ann_index.py      # HNSW approximate nearest neighbor index (hnswlib wrapper)
├── embeddings.py     # multi-backend embeddings (mlx, sentence-transformers, voyage, openai, gemini) + cross-encoder
├── retrieval.py      # 5-stage hybrid pipeline (HNSW dense + BM25 + graph → RRF → boost → rerank → deep)
├── extractor.py      # LLM fact extraction + hypothetical query generation
├── entities.py       # regex entity extraction, relationship graph, co-occurrence
├── surprise.py       # k-NN novelty scoring at write time (Titans-inspired surprise gate, ANN-accelerated)
├── deep_retrieval.py # learned MLP reranker trained on access patterns
├── skill_select.py   # task-aware skill selection gate (SkillsBench-inspired)
├── lifecycle.py      # retention regularization (L2/Huber/elastic), 9-factor importance, promotion
├── consolidator.py   # dream cycle (clustering, summarization, peer cards, archival, belief probing)
├── codebase.py       # project scanner — file trees, signatures, deps → codebase layer
├── conversations.py  # claude code session ingest — exchange pairs, classification
├── dedup.py          # semantic deduplication — find and merge near-duplicates
├── layers.py         # L0-L3 graduated context retrieval
├── compress.py       # token-budget compression with entity codes
├── formats.py        # parsers for markdown, JSON chat exports, PDF, slack, email
├── llm.py            # claude CLI + mlx backend abstraction
├── evolution.py      # memory enrichment, evolution, CRUD classification, trust scoring, canonicalization
├── drift.py          # memory drift detection — claim extraction, filesystem verification
├── patterns.py       # session pattern extraction — distill procedural knowledge from work
├── quantize.py       # lifecycle embedding compression (32/8/4/2-bit) with FRQAD
├── communities.py    # label propagation community detection + LLM summaries
├── hopfield.py       # Hopfield associative retrieval — pattern completion via modern Hopfield network
├── benchmark.py      # self-benchmark suite — retrieval quality, latency, throughput
├── mcp_server.py     # 63-tool MCP server (JSON-RPC, stdio)
├── cli.py            # CLI — ingest, search, remember, reembed, watch, export, import, index, serve
├── config.py         # yaml config with env var overrides, auto-dim detection
├── demo.py           # interactive demo walkthrough
└── web/
    ├── app.py        # fastapi with model warmup, bearer token auth
    ├── routes.py     # 70+ REST endpoints
    ├── events.py     # SSE event stream (in-process)
    └── templates/
        └── index.html  # single-page dashboard — neural canvas, 17 panels, keyboard shortcuts

tests/
├── test_store.py       # CRUD, FTS, entities, events, cache
├── test_embeddings.py  # multi-backend, cosine search, dim detection
├── test_ann_index.py   # HNSW build, search, add/remove, persistence, recall
├── test_retrieval.py   # hybrid pipeline, intent classification, debug mode
├── test_surprise.py    # novelty scoring, importance adjustment
└── test_config.py      # config loading, auto-dim, env overrides

benchmarks/
└── longmemeval/
    └── run_engram.py   # LongMemEval benchmark adapter (98.1% R@5)

Dockerfile              # python 3.12-slim, all deps, port 8420
docker-compose.yml      # single service, data volume, config mount

supported formats

engram can ingest these file types:

format how it's handled
markdown (.md) split by headers into sections
plaintext (.txt) treated as single document
claude code (.jsonl) parsed as conversation exchanges, grouped into Q+A pairs
claude.ai (.json) parsed from chat_messages array
chatgpt (.json) parsed from mapping tree structure
slack (.json) parsed from messages array with user attribution
PDF (.pdf) text extracted via pymupdf
generic JSON each item or the whole object as a document

conversation formats get special treatment — exchanges are grouped into Q+A pairs and classified (decisions, corrections, errors, task completions) before storing.

what informed the design

i studied three existing memory systems and six IR papers before building this. took the best parts from each:

systems:

  • neuro-memory — my earlier memory system. atkinson-shiffrin 4-layer model, ebbinghaus forgetting curve, 7-factor importance scoring, procedural memory with pattern templates. engram takes the layer architecture and lifecycle model from here.
  • cmyui/ai-memory — LLM-extracted atomic facts, three-stage hybrid retrieval with RRF, dream cycle
  • mempalace — graduated layers (L0-L3), entity registry with disambiguation, exchange-pair chunking for conversations, AAAK compression

papers:

  • Reciprocal Rank Fusion (Cormack et al. 2009) — the RRF formula and k=60 constant
  • Memory in the Age of AI Agents (Hu et al. 2026) — forms/functions/dynamics taxonomy
  • docTTTTTquery (Nogueira & Lin 2019) — document expansion by query prediction
  • ColBERT-PRF (Wang et al. 2021) — pseudo-relevance feedback for dense retrieval
  • BM25 Query Augmentation (Chen & Wiseman 2023) — learned query expansion
  • Word Embedding GLM (Ganguly et al. 2015) — embedding-based language model for IR
  • Titans (Behrouz et al. 2025) — surprise-based memorization, memory updates proportional to loss gradient
  • Miras (Behrouz et al. 2025) — unifying framework for sequence models, forgetting as retention regularization
  • Your Brain on ChatGPT (Kosmyna et al. 2025) — cognitive scaffolding vs replacement, recall_hints design
  • SkillsBench (Li et al. 2026) — focused skills (+16.2pp) beat comprehensive docs (-2.9pp), get_skills gate design
  • Evaluating AGENTS.md (Gloaguen et al. 2026) — static context files reduce performance, validates dynamic retrieval over flat injection
  • A-Mem (Wu et al. 2025) — zettelkasten memory with enriched embeddings and memory evolution, enrichment doubled multi-hop F1
  • AgeMem (Chen et al. 2026) — RL-trained memory operations, quality_metrics reward decomposition
  • MAGMA (Zhao et al. 2026) — multi-graph architecture with intent-aware adaptive retrieval policy
  • Zep/Graphiti (Preston-Werner et al. 2025) — temporal knowledge graph with three-tier architecture
  • SuperLocalMemory V3.3 (2026) — trust-weighted decay, lifecycle embedding compression, confirmation count
  • Mem0 (Chhablani et al. 2025) — production write-path CRUD classification, temporal marked deletion
  • Memory for Autonomous Agents (2026) — latest comprehensive survey, adversarial belief probing, write-path canonicalization
  • Mem^p (2025) — procedural memory with dual representation and reflection-based updates
  • ACT-R Memory (HAI 2025) — base-level activation, retrieval noise, threshold gating

config

lives at config.yaml or ~/.config/engram/config.yaml. env vars override everything (prefix with ENGRAM_, e.g. ENGRAM_DB_PATH).

db_path: ~/.local/share/engram/memory.db

# embedding model — local or API
# local:  BAAI/bge-small-en-v1.5 (384d, free, default)
# voyage: voyage-3.5 (1024d, $0.18/1M), voyage-3.5-lite (1024d, $0.02/1M)
# openai: text-embedding-3-small (1536d, $0.02/1M), text-embedding-3-large (3072d)
# gemini: gemini-embedding-001 (768d, free tier)
embedding_model: BAAI/bge-small-en-v1.5
cross_encoder_model: cross-encoder/ms-marco-MiniLM-L-6-v2
embedding_backend: auto                   # auto | mlx | sentence_transformers | voyage | openai | gemini
embedding_dim: 384                        # auto-detected from model name if known

retrieval:
  top_k: 10
  rrf_k: 60
  min_confidence: 0.60
  rerank_candidates: 20
  dense_multiplier: 3          # candidates = top_k * multiplier
  bm25_multiplier: 3

lifecycle:
  forgetting_half_life_days: 30
  archive_after_days: 90
  archive_min_importance: 0.3  # below this + age + low access → forget
  archive_min_accesses: 3
  promote_importance: 0.7
  promote_accesses: 5
  cluster_threshold: 0.8
  cluster_min_size: 5
  retention_mode: huber        # l2 | huber | elastic
  huber_delta: 0.5             # transition point for huber (in half-lives)
  elastic_l1_ratio: 0.3        # L1 weight for elastic (0=pure L2, 1=pure L1)

llm:
  backend: claude_cli          # claude_cli | mlx | llamacpp
  model: claude-sonnet-4-20250514
  mlx_model: mlx-community/Qwen2.5-3B-Instruct-4bit

web:
  host: 127.0.0.1
  port: 8420
  auth_token: ""               # set to enable bearer token auth on the web UI

ann:
  enabled: true
  m: 32                        # HNSW graph connectivity (higher = better recall, more memory)
  ef_construction: 200         # build-time search depth (higher = better index quality)
  ef_search: 100               # query-time search depth (higher = better recall, slower)
  max_elements: 500000         # pre-allocated capacity
  index_path: ~/.local/share/engram/hnsw.index

license

MIT

Releases

No releases published

Packages

 
 
 

Contributors

Languages