
mem — Standalone Memory Palace for AI Agents


A pure Go memory palace system inspired by MemPalace, rewritten from scratch with zero Python dependencies. Single static binary, single SQLite file, no LLM required for core features.

What it is

mem organizes knowledge into a navigable palace structure:

Wing (person/project) → Hall (facts/events/...) → Room (topic) → Drawer (verbatim content)

Features:

  • Palace structure — wings, halls, rooms, drawers, tunnels (cross-wing links)
  • BM25 full-text search — pure Go implementation, no embeddings needed
  • Temporal knowledge graph — entity-relationship triples with validity windows + contradiction detection
  • 4-layer memory stack — L0 Identity → L1 Critical Facts → L2 On-demand → L3 Deep Search
  • Wake-up context — compact (~120-170 tokens) AAAK-like compression for AI session starts
  • MCP server — 8 tools for Claude Code / ChatGPT / Cursor integration
  • Mining — files (code, docs) and conversations (Claude, ChatGPT, Slack, plain text)
  • Auto-init — any command bootstraps the palace on first use
  • Zero LLM dependency for core features (everything works offline)

Install

Pre-built binaries

Download from Releases:

curl -fsSL https://github.com/snow-ghost/mem/releases/latest/download/mem-linux-amd64.tar.gz | tar xz
sudo mv mem-linux-amd64 /usr/local/bin/mem

From source

go install github.com/snow-ghost/mem/cmd/mem@latest

Docker

docker run --rm -v $(pwd):/project ghcr.io/snow-ghost/mem status

Go 1.26+ is required only for building from source; the resulting binary has zero runtime dependencies.

Quick Start

# Initialize the palace (auto-creates ~/.mempalace/palace.db)
mem init

# Mine a project into the palace
mem mine ~/projects/myapp --wing myapp

# Mine conversation exports
mem mine ~/chats --mode convos --wing conversations

# Search across all memories
mem search "why did we switch to GraphQL"

# Filter by wing and room
mem search "auth decision" --wing myapp --room auth

# Compact context for AI session start
mem wake-up

# Knowledge graph operations
mem kg add Kai works_on Orion --from 2025-06-01
mem kg query Kai
mem kg timeline Orion
mem kg invalidate Kai works_on Orion --ended 2026-03-01

# Status overview
mem status

# Optional: semantic search with an OpenAI-compatible embeddings API
export MEM_EMBEDDINGS_URL=https://api.openai.com/v1/embeddings
export MEM_EMBEDDINGS_MODEL=text-embedding-3-small
export MEM_EMBEDDINGS_API_KEY=sk-...
# `mem mine` now auto-embeds new drawers; `mem reindex` covers older ones.
mem mine ~/projects/myapp --wing myapp        # auto-embeds new drawers
mem reindex                                    # one-shot for older drawers
mem search "auth decision" --mode hybrid       # BM25 + cosine via RRF
# (use --no-embed on `mem mine` to skip the embedding step)

# Optional: cross-encoder reranking on top of hybrid for stronger top-1
export MEM_RERANK_URL=https://your-endpoint/v1/rerank
export MEM_RERANK_MODEL=BAAI/bge-reranker-v2-m3

# Recency boost — favor newer drawers (useful for facts that change over time)
mem search "current geo-targeting setting" --recency 0.5

# Query2Doc / HyDE — LLM writes pseudo-answer, embed + average with query
export MEM_LLM_URL=https://your-endpoint/v1/chat/completions
export MEM_LLM_MODEL=Qwen/Qwen3-Next-80B-A3B-Instruct
mem search "what should I cook tonight" --mode hybrid --query2doc

# Start MCP server (for Claude Code integration)
mem mcp

# Benchmark your config (BM25 / vector full-scan / HNSW / API latency)
mem benchmark --drawers 5000 --queries 200

How it works

Storage

Everything lives in a single SQLite database at ~/.mempalace/palace.db (override with MEM_PALACE env var). Schema includes:

  • wings, rooms, drawers, closets — palace hierarchy
  • search_terms, search_index, search_meta — BM25 inverted index
  • entities, triples — temporal knowledge graph

Search

Built-in BM25 Okapi implementation with our own inverted index:

  • Tokenization with stopword removal
  • TF computation with batch indexing (transactional)
  • Classic BM25 scoring (k1=1.5, b=0.75)
  • Filter by wing / room before scoring (for palace structure boost)

No vector embeddings required for the default mode — everything works offline.
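The scoring above can be sketched in a few lines of Go. This is an illustrative implementation of the classic Okapi formula with the parameters the section cites (k1=1.5, b=0.75), not mem's actual code; the function name and signature are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// bm25 scores one document for one query term using classic Okapi
// BM25. tf: term frequency in the document, df: number of documents
// containing the term, n: corpus size, dl/avgdl: document length and
// average document length. A full query sums this over its terms.
func bm25(tf, df, n, dl, avgdl float64) float64 {
	k1, b := 1.5, 0.75
	idf := math.Log(1 + (n-df+0.5)/(df+0.5))
	return idf * (tf * (k1 + 1)) / (tf + k1*(1-b+b*dl/avgdl))
}

func main() {
	// A rare term (df=2 of 100 docs) outscores a common one (df=50)
	// at the same term frequency, thanks to the IDF factor.
	rare := bm25(3, 2, 100, 120, 100)
	common := bm25(3, 50, 100, 120, 100)
	fmt.Println(rare > common) // true
}
```

The k1 term saturates repeated occurrences (the tenth hit of a word adds far less than the first), and b penalizes long documents relative to the corpus average.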

Optional: semantic embeddings (hybrid search)

Set MEM_EMBEDDINGS_URL + MEM_EMBEDDINGS_MODEL (+ MEM_EMBEDDINGS_API_KEY) to point at any OpenAI-compatible /v1/embeddings endpoint — OpenAI, Voyage AI, Cohere (compat mode), Together, Ollama, LM Studio, LocalAI, llama.cpp server. Once set, mem mine automatically embeds new drawers as it ingests them (opt out with --no-embed), and mem reindex covers any older drawers that predate the embeddings provider. mem search --mode hybrid fuses BM25 and cosine similarity via weighted Reciprocal Rank Fusion (k=60); pure vector search (--mode vector) is also available. Embeddings are stored as BLOBs in the same SQLite file — no second database. The entire feature is optional: with the variables unset, search behaves exactly as before, BM25 only.
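The fusion step can be sketched in a few lines of Go. This is a minimal weighted Reciprocal Rank Fusion with k=60 as stated above; the function name and signature are illustrative assumptions, not mem's internal API, and the 0.7/0.3 weights mirror the split quoted in the benchmark section.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked lists of drawer IDs with weighted
// Reciprocal Rank Fusion: score(d) = w_bm25/(k+rank_bm25(d)) +
// w_vec/(k+rank_vec(d)), using k=60. A drawer missing from one
// list simply contributes nothing from that list.
func rrfFuse(bm25, vector []string, wBM25, wVec float64) []string {
	const k = 60.0
	scores := map[string]float64{}
	for rank, id := range bm25 {
		scores[id] += wBM25 / (k + float64(rank+1))
	}
	for rank, id := range vector {
		scores[id] += wVec / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	fused := rrfFuse(
		[]string{"d1", "d2", "d3"}, // BM25 ranking
		[]string{"d3", "d1", "d4"}, // vector ranking
		0.7, 0.3)
	fmt.Println(fused[0]) // d1: first by BM25, second by vector
}
```

Rank fusion works on positions rather than raw scores, which is why BM25 and cosine similarity can be combined without normalizing their very different score scales.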

For stronger top-1 results, also set MEM_RERANK_URL + MEM_RERANK_MODEL (Cohere-compatible /v1/rerank endpoint, e.g. BAAI/bge-reranker-v2-m3). The MCP mem_search tool accepts a mode argument that selects between bm25, vector, and hybrid retrieval at call time.

For large palaces (or any palace you query repeatedly), pass --hnsw to mem search --mode vector to use the pure-Go HNSW index:

Drawers        Full scan   HNSW      Speedup
1k (real DB)   5.7 ms      0.7 ms    8.1×
10k (in-mem)   2.99 ms     0.56 ms   5.3×
50k (in-mem)   17.96 ms    0.75 ms   24×

In the real-DB path the speedup is even larger than the in-memory microbench suggests because SearchVector decodes the SQLite BLOB on every query while HNSW decodes once during build. The HNSW index is persisted in the hnsw_cache SQLite table — first query after a mine pays the build cost (~700ms / 1k vectors), subsequent queries load the graph in milliseconds. Recall@10 on 1k random vectors: 98.6%.

Local CPU embeddings via llama.cpp

For an air-gapped or cost-controlled setup, run llama.cpp's HTTP server with a small embedding model. Tested with BAAI/bge-small-en-v1.5 (~30MB GGUF, ~50 emb/sec on a modern CPU):

# 1. Download a small embedding model (one-time)
huggingface-cli download CompendiumLabs/bge-small-en-v1.5-gguf bge-small-en-v1.5-q8_0.gguf \
  --local-dir ~/models/

# 2. Run llama-server in embedding mode (note: --embeddings is REQUIRED)
llama-server -m ~/models/bge-small-en-v1.5-q8_0.gguf \
  --embeddings --port 8091 --pooling mean

# 3. Point mem at it
export MEM_EMBEDDINGS_URL=http://localhost:8091/v1/embeddings
export MEM_EMBEDDINGS_MODEL=bge-small-en-v1.5-q8_0
# (no API key needed for local server)

# 4. Verify
mem benchmark --drawers 100 --queries 20

For reranking on CPU (~5× slower than embedding, so use it sparingly):

llama-server -m ~/models/bge-reranker-v2-m3-q4.gguf \
  --reranking --port 8092

export MEM_RERANK_URL=http://localhost:8092/v1/rerank
export MEM_RERANK_MODEL=bge-reranker-v2-m3

Notes:

  • A general chat model (LFM2.5, Qwen, Llama, etc.) loaded without --embeddings will return 501 not_supported_error. The flag is required and changes the server's pooling/output behavior.
  • For a chat LLM you'd need a separate inference server anyway — embedding work is best on a dedicated tiny model.
  • On CPU, dimensions matter: bge-small-en-v1.5 is 384-d, roughly a third of bge-m3's 1024-d, so HNSW build/search are correspondingly faster.

Knowledge Graph

Entity-relationship triples with temporal validity:

  • add_triple(subject, predicate, object, valid_from, valid_to)
  • invalidate facts when they stop being true
  • query_entity with as_of date filtering
  • timeline for chronological entity story
  • Contradiction detection — flags conflicts when adding facts
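The shape of these operations can be sketched in Go. The field names, string-date handling, and the contradiction rule below are illustrative assumptions, not mem's actual schema:

```go
package main

import "fmt"

// triple models an entity-relationship fact with a validity window;
// an empty ValidTo means "still true". ISO dates compare correctly
// as strings, which keeps the sketch dependency-free.
type triple struct {
	Subject, Predicate, Object string
	ValidFrom, ValidTo         string
}

// activeAt reports whether the fact holds on the given date.
func (t triple) activeAt(asOf string) bool {
	return t.ValidFrom <= asOf && (t.ValidTo == "" || asOf < t.ValidTo)
}

// contradicts flags a new fact that assigns a different object to a
// subject+predicate pair while an existing fact is still valid —
// the general shape of check a contradiction detector performs.
func contradicts(existing []triple, n triple) bool {
	for _, t := range existing {
		if t.Subject == n.Subject && t.Predicate == n.Predicate &&
			t.Object != n.Object && t.activeAt(n.ValidFrom) {
			return true
		}
	}
	return false
}

func main() {
	kb := []triple{{"Kai", "works_on", "Orion", "2025-06-01", ""}}
	// Asserting a different, overlapping fact triggers the flag.
	fmt.Println(contradicts(kb, triple{"Kai", "works_on", "Vega", "2025-09-01", ""})) // true
}
```

Invalidation then amounts to setting ValidTo on the old triple rather than deleting it, which is what preserves the timeline view.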

4-Layer Memory Stack

  • L0 (Identity) — read from ~/.mempalace/identity.txt if present
  • L1 (Critical Facts) — auto-compressed AAAK-like summary from top drawers
  • L2 (On-demand) — filtered retrieval by wing/room
  • L3 (Deep Search) — full BM25 search across palace

mem wake-up outputs L0+L1 (~120-170 tokens) for AI session bootstrap.

MCP Integration

Register as an MCP server for Claude Code / ChatGPT / Cursor:

# Claude Code
claude mcp add mem -- mem mcp

# Available tools:
#   mem_search       — BM25 search with wing/room filters
#   mem_add_drawer   — Store content in the palace
#   mem_status       — Palace overview
#   mem_wake_up      — Compact context for AI
#   mem_kg_query     — Query the knowledge graph
#   mem_kg_add       — Add fact to the graph
#   mem_list_wings   — Enumerate wings
#   mem_list_rooms   — Enumerate rooms in a wing

Benchmarks

Evaluated on three public memory benchmarks. See benchmarks/README.md for reproduction steps.

LongMemEval (ICLR 2025) — 500 questions, 6 question types

Evaluated on longmemeval_oracle.json using two metrics: the answer-text heuristic (does any top-k result contain the answer string?) and the official session-id metric (is any top-k session in answer_session_ids?).

Config                                        R@1 (heur / sid)   R@5 (heur / sid)   R@10 (heur / sid)
BM25 + stemming (local)                       44.0 / 31.6        70.4 / 62.0        76.8 / 74.6
Hybrid RRF 0.7 (MiniLM local via llama.cpp)   47.2 / 37.0        72.0 / 65.2        79.4 / 76.8
Hybrid RRF 0.7 (bge-m3 cloud, historical)     48.0 / —           74.6 / —           79.2 / —
Hybrid + rerank bge-reranker-v2-m3 (cloud)    52.6 / —           74.6 / —           80.8 / —

Tokenizer applies Porter step 1a/1b stemming (+1.6 R@5 on BM25 alone, no external dependency). Hybrid adds weighted Reciprocal Rank Fusion (0.7 BM25 / 0.3 vector). Cross-encoder rerank (BAAI/bge-reranker-v2-m3) adds +4.6 R@1 for top-1-driven workflows.
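The step 1a part of that stemming is tiny. Here is a Go sketch of Porter's step 1a suffix rules (plural stripping); step 1b, which handles -ed/-ing, is omitted, and this is an illustration rather than mem's actual tokenizer code:

```go
package main

import (
	"fmt"
	"strings"
)

// step1a applies Porter's step 1a rules in order:
// SSES -> SS, IES -> I, SS -> SS (unchanged), S -> "".
// Only the first matching rule fires.
func step1a(w string) string {
	switch {
	case strings.HasSuffix(w, "sses"):
		return strings.TrimSuffix(w, "es") // caresses -> caress
	case strings.HasSuffix(w, "ies"):
		return strings.TrimSuffix(w, "es") // ponies -> poni
	case strings.HasSuffix(w, "ss"):
		return w // caress stays caress
	case strings.HasSuffix(w, "s"):
		return strings.TrimSuffix(w, "s") // cats -> cat
	}
	return w
}

func main() {
	fmt.Println(step1a("caresses"), step1a("ponies"), step1a("caress"), step1a("cats"))
	// caress poni caress cat
}
```

Because both query and index pass through the same stemmer, "decisions" and "decision" land on the same term, which is where the quoted +1.6 R@5 comes from.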

BM25 and MiniLM-hybrid numbers above are from the current code on the local stack. The bge-m3 cloud rows are kept for reference — they require an external embeddings/rerank endpoint. See benchmarks/README.md for the full sweep, including L# Cache (R@5 77.2%), Query2Doc (R@5 77.8%), and the reproduction that closes the gap to MemPalace's 96.6% on longmemeval_s_cleaned.

LoCoMo (Snap Research) — 10 long-form conversations, 1986 QAs

Metric              BM25 (offline)   Hybrid (BM25 + bge-m3)
Recall@1            60.0%            59.0%
Recall@5            88.2%            88.6%
Recall@10           93.7%            95.6%
Avg query latency   1.7 ms           4.5 ms

Hybrid's win is concentrated where it matters most: multi-hop +7.3 pp (hardest category, 59.4 → 66.7) and single-hop +5.7 pp (80.5 → 86.2). R@10 jumps +1.9 pp — embeddings rescue evidence that fell out of BM25 top-5.

ConvoMem (Salesforce, arXiv 2511.10523) — 7,021 test cases, sizes 1–6

Metric              Value
Recall@1            100.0%
Recall@5            100.0%
Avg query latency   1.4 ms

Confirms the ConvoMem paper's thesis: "your first 150 conversations don't need RAG". BM25 alone is sufficient at small haystacks. The harder regime (50–300 conversations with value-change tracking) is left as future work.

Architecture

cmd/mem/               CLI entry
internal/
  config/              Configuration (env vars, paths)
  db/                  SQLite schema + connection
  palace/              Wings, rooms, drawers, tunnels
  search/              BM25 + vector + hybrid (RRF) search, Porter stemmer,
                       HNSW index, heuristic question classifier
  embeddings/          Optional OpenAI-compatible client + blob serializer
  rerank/              Optional Cohere-compatible cross-encoder client
  kg/                  Temporal knowledge graph + contradiction detection
  layers/              4-layer memory stack (L0 identity, L1 compression, wake-up)
  miner/               File and conversation mining (Claude JSONL, ChatGPT, Slack, plain text)
  mcp/                 MCP server with 8 tools
benchmarks/
  longmemeval/         LongMemEval harness
  locomo/              LoCoMo harness
  convomem/            ConvoMem harness

Dependencies

Only two external dependencies (pure Go, no cgo):

  • modernc.org/sqlite — pure-Go SQLite driver (no cgo, so the binary is fully static)
  • github.com/modelcontextprotocol/go-sdk — official MCP SDK

Everything else is Go stdlib.

Previous code (LLM-dependent memory companion)

The previous version of mem (LLM-dependent extraction/consolidation with Claude/OpenCode/Codex backends) lives at github.com/snow-ghost/mem-agent.

License

MIT

