A pure Go memory palace system inspired by MemPalace, rewritten from scratch with zero Python dependencies. Single static binary, single SQLite file, no LLM required for core features.
mem organizes knowledge into a navigable palace structure:
Wing (person/project) → Hall (facts/events/...) → Room (topic) → Drawer (verbatim content)
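For orientation, the hierarchy maps onto a handful of record types. A minimal sketch (type and field names are illustrative assumptions, not the actual internal/palace schema):

```go
package palace

// Illustrative sketch of the palace hierarchy; names are assumptions,
// not the project's actual types or database schema.
type Wing struct {
	ID   int64
	Name string // person or project, e.g. "myapp"
}

type Hall struct {
	WingID int64
	Kind   string // "facts", "events", ...
}

type Room struct {
	HallID int64
	Topic  string // e.g. "auth"
}

type Drawer struct {
	RoomID  int64
	Content string // verbatim content
}
```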
Features:
- Palace structure — wings, halls, rooms, drawers, tunnels (cross-wing links)
- BM25 full-text search — pure Go implementation, no embeddings needed
- Temporal knowledge graph — entity-relationship triples with validity windows + contradiction detection
- 4-layer memory stack — L0 Identity → L1 Critical Facts → L2 On-demand → L3 Deep Search
- Wake-up context — compact (~120 token) AAAK-like compression for AI session starts
- MCP server — 8 tools for Claude Code / ChatGPT / Cursor integration
- Mining — files (code, docs) and conversations (Claude, ChatGPT, Slack, plain text)
- Auto-init — any command bootstraps the palace on first use
- Zero LLM dependency for core features (everything works offline)
Download from Releases:
curl -fsSL https://github.com/snow-ghost/mem/releases/latest/download/mem-linux-amd64.tar.gz | tar xz
sudo mv mem-linux-amd64 /usr/local/bin/mem

# Or: install with Go
go install github.com/snow-ghost/mem/cmd/mem@latest

# Or: run via Docker
docker run --rm -v $(pwd):/project ghcr.io/snow-ghost/mem status

Go 1.26+ is required only for building; at runtime there are zero dependencies.
# Initialize the palace (auto-creates ~/.mempalace/palace.db)
mem init
# Mine a project into the palace
mem mine ~/projects/myapp --wing myapp
# Mine conversation exports
mem mine ~/chats --mode convos --wing conversations
# Search across all memories
mem search "why did we switch to GraphQL"
# Filter by wing and room
mem search "auth decision" --wing myapp --room auth
# Compact context for AI session start
mem wake-up
# Knowledge graph operations
mem kg add Kai works_on Orion --from 2025-06-01
mem kg query Kai
mem kg timeline Orion
mem kg invalidate Kai works_on Orion --ended 2026-03-01
# Status overview
mem status
# Optional: semantic search with an OpenAI-compatible embeddings API
export MEM_EMBEDDINGS_URL=https://api.openai.com/v1/embeddings
export MEM_EMBEDDINGS_MODEL=text-embedding-3-small
export MEM_EMBEDDINGS_API_KEY=sk-...
# `mem mine` now auto-embeds new drawers; `mem reindex` covers older ones.
mem mine ~/projects/myapp --wing myapp # auto-embeds new drawers
mem reindex # one-shot for older drawers
mem search "auth decision" --mode hybrid # BM25 + cosine via RRF
# (use --no-embed on `mem mine` to skip the embedding step)
# Optional: cross-encoder reranking on top of hybrid for stronger top-1
export MEM_RERANK_URL=https://your-endpoint/v1/rerank
export MEM_RERANK_MODEL=BAAI/bge-reranker-v2-m3
# Recency boost — favour newer drawers (great for changing facts)
mem search "current geo-targeting setting" --recency 0.5
# Query2Doc / HyDE — LLM writes pseudo-answer, embed + average with query
export MEM_LLM_URL=https://your-endpoint/v1/chat/completions
export MEM_LLM_MODEL=Qwen/Qwen3-Next-80B-A3B-Instruct
mem search "what should I cook tonight" --mode hybrid --query2doc
# Start MCP server (for Claude Code integration)
mem mcp
# Benchmark your config (BM25 / vector full-scan / HNSW / API latency)
mem benchmark --drawers 5000 --queries 200

Everything lives in a single SQLite database at ~/.mempalace/palace.db
(override with MEM_PALACE env var). Schema includes:
- `wings`, `rooms`, `drawers`, `closets` — palace hierarchy
- `search_terms`, `search_index`, `search_meta` — BM25 inverted index
- `entities`, `triples` — temporal knowledge graph
Built-in BM25 Okapi implementation with our own inverted index:
- Tokenization with stopword removal
- TF computation with batch indexing (transactional)
- Classic BM25 scoring (k1=1.5, b=0.75)
- Filter by wing / room before scoring (for palace structure boost)
No vector embeddings required for the default mode — everything works offline.
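For reference, here is a minimal sketch of the classic BM25 scoring described above, using the same k1/b values; it is illustrative only, not the project's actual code:

```go
package main

import (
	"fmt"
	"math"
)

const (
	k1 = 1.5
	b  = 0.75
)

// bm25 scores one document for one query term.
// tf:    term frequency in the document
// df:    number of documents containing the term
// n:     total documents in the index
// dl:    document length in tokens
// avgdl: average document length across the index
func bm25(tf float64, df, n int, dl, avgdl float64) float64 {
	idf := math.Log(1 + (float64(n)-float64(df)+0.5)/(float64(df)+0.5))
	return idf * (tf * (k1 + 1)) / (tf + k1*(1-b+b*dl/avgdl))
}

func main() {
	// A term appearing twice in an average-length drawer,
	// present in 10 of 1000 drawers.
	fmt.Printf("%.3f\n", bm25(2, 10, 1000, 120, 120))
}
```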
Set MEM_EMBEDDINGS_URL + MEM_EMBEDDINGS_MODEL (+ MEM_EMBEDDINGS_API_KEY)
pointing at any OpenAI-compatible /v1/embeddings endpoint — OpenAI, Voyage AI,
Cohere (compat mode), Together, Ollama, LM Studio, LocalAI, llama.cpp server.
Once set, mem mine automatically embeds new drawers as it ingests them
(opt out with --no-embed). mem reindex covers any older drawers that
predate the embeddings provider. mem search --mode hybrid fuses BM25 +
cosine similarity via weighted Reciprocal Rank Fusion (k=60). Pure vector
search (--mode vector) is also available. Embeddings are stored as BLOBs
in the same SQLite file — no second database. The entire feature is
optional; unset vars = BM25-only behavior unchanged.
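A minimal sketch of the weighted Reciprocal Rank Fusion step (k=60) that hybrid mode uses to merge the two ranked lists; function and variable names here are illustrative, not the project's API:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked lists of drawer IDs with weighted
// Reciprocal Rank Fusion: score(d) = w1/(k+rank1) + w2/(k+rank2).
func rrfFuse(bm25Ranked, vectorRanked []int64, wBM25, wVector float64) []int64 {
	const k = 60.0
	scores := map[int64]float64{}
	for rank, id := range bm25Ranked {
		scores[id] += wBM25 / (k + float64(rank+1))
	}
	for rank, id := range vectorRanked {
		scores[id] += wVector / (k + float64(rank+1))
	}
	fused := make([]int64, 0, len(scores))
	for id := range scores {
		fused = append(fused, id)
	}
	sort.Slice(fused, func(i, j int) bool { return scores[fused[i]] > scores[fused[j]] })
	return fused
}

func main() {
	fmt.Println(rrfFuse([]int64{7, 3, 9}, []int64{3, 5, 7}, 0.7, 0.3))
}
```

With the 0.7 / 0.3 weights reported in the benchmarks below, a drawer ranked 1st by BM25 and 3rd by vector scores 0.7/61 + 0.3/63 ≈ 0.016.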
For stronger top-1 results, also set MEM_RERANK_URL + MEM_RERANK_MODEL
(Cohere-compatible /v1/rerank endpoint, e.g. BAAI/bge-reranker-v2-m3).
The MCP mem_search tool accepts a mode argument that selects between
bm25, vector, and hybrid retrieval at call time.
For large palaces (any scale where you do repeated queries), pass --hnsw
to mem search --mode vector to use a pure-Go HNSW index:
| Drawers | Full scan | HNSW | Speedup |
|---|---|---|---|
| 1k (real DB) | 5.7 ms | 0.7 ms | 8.1× |
| 10k (in-mem) | 2.99 ms | 0.56 ms | 5.3× |
| 50k (in-mem) | 17.96 ms | 0.75 ms | 24× |
In the real-DB path the speedup is even larger than the in-memory
microbench suggests because SearchVector decodes the SQLite BLOB on
every query while HNSW decodes once during build. The HNSW index is
persisted in the hnsw_cache SQLite table — first query after a
mine pays the build cost (~700ms / 1k vectors), subsequent queries
load the graph in milliseconds. Recall@10 on 1k random vectors: 98.6%.
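To make the per-query decode cost concrete, here is a sketch of one plausible BLOB layout (little-endian float32 per dimension); the actual serializer in internal/embeddings may differ:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeVec packs a float32 vector into a byte slice, 4 bytes per dimension.
func encodeVec(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeVec is the inverse; a full scan runs this for every stored drawer on
// every query, while an HNSW index decodes each vector once at build time.
func decodeVec(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v
}

func main() {
	fmt.Println(decodeVec(encodeVec([]float32{0.1, 0.2, 0.3})))
}
```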
For an air-gapped or cost-controlled setup, run llama.cpp's
HTTP server with a small embedding model. Tested with BAAI/bge-small-en-v1.5
(~30MB GGUF, ~50 emb/sec on a modern CPU):
# 1. Download a small embedding model (one-time)
huggingface-cli download CompendiumLabs/bge-small-en-v1.5-gguf bge-small-en-v1.5-q8_0.gguf \
--local-dir ~/models/
# 2. Run llama-server in embedding mode (note: --embeddings is REQUIRED)
llama-server -m ~/models/bge-small-en-v1.5-q8_0.gguf \
--embeddings --port 8091 --pooling mean
# 3. Point mem at it
export MEM_EMBEDDINGS_URL=http://localhost:8091/v1/embeddings
export MEM_EMBEDDINGS_MODEL=bge-small-en-v1.5-q8_0
# (no API key needed for local server)
# 4. Verify
mem benchmark --drawers 100 --queries 20

For reranking on CPU (~5× slower than embedding, so use sparingly):
llama-server -m ~/models/bge-reranker-v2-m3-q4.gguf \
--reranking --port 8092
export MEM_RERANK_URL=http://localhost:8092/v1/rerank
export MEM_RERANK_MODEL=bge-reranker-v2-m3

Notes:
- A general chat model (LFM2.5, Qwen, Llama, etc.) loaded without `--embeddings` will return `501 not_supported_error`. The flag is required and changes the server's pooling/output behavior.
- For a chat LLM you'd need a separate inference server anyway — embedding work is best on a dedicated tiny model.
- On CPU, dimensions matter: `bge-small-en-v1.5` is 384-d (5× smaller than `bge-m3`'s 1920-d), so HNSW build/search are also 5× faster.
Entity-relationship triples with temporal validity:
- `add_triple(subject, predicate, object, valid_from, valid_to)`
- `invalidate` facts when they stop being true
- `query_entity` with `as_of` date filtering
- `timeline` for chronological entity story
- Contradiction detection — flags conflicts when adding facts (see the sketch below)
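A minimal sketch of what a temporal triple and the contradiction check can look like; type and function names are assumptions, not the actual internal/kg API, and the second project name in the example is hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// Triple is an illustrative temporal fact; field names are assumptions.
type Triple struct {
	Subject, Predicate, Object string
	ValidFrom                  time.Time
	ValidTo                    *time.Time // nil = still valid
}

// overlaps reports whether two validity windows intersect (nil end = open-ended).
func overlaps(a, b Triple) bool {
	aBeforeBEnd := b.ValidTo == nil || a.ValidFrom.Before(*b.ValidTo)
	bBeforeAEnd := a.ValidTo == nil || b.ValidFrom.Before(*a.ValidTo)
	return aBeforeBEnd && bBeforeAEnd
}

// contradicts flags two facts giving the same subject+predicate different
// objects over overlapping time windows.
func contradicts(a, b Triple) bool {
	return a.Subject == b.Subject && a.Predicate == b.Predicate &&
		a.Object != b.Object && overlaps(a, b)
}

func main() {
	t := func(s string) time.Time { v, _ := time.Parse("2006-01-02", s); return v }
	current := Triple{"Kai", "works_on", "Orion", t("2025-06-01"), nil}
	updated := Triple{"Kai", "works_on", "Atlas", t("2026-01-01"), nil}
	fmt.Println(contradicts(current, updated)) // true: both open-ended, different objects
}
```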
- L0 (Identity) — read from `~/.mempalace/identity.txt` if present
- L1 (Critical Facts) — auto-compressed AAAK-like summary from top drawers
- L2 (On-demand) — filtered retrieval by wing/room
- L3 (Deep Search) — full BM25 search across palace
mem wake-up outputs L0+L1 (~120-170 tokens) for AI session bootstrap.
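A rough sketch of how L0 and L1 combine into the wake-up output; the compressTopDrawers helper is a hypothetical stand-in for the L1 compression, not the project's actual function:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// wakeUp assembles the compact L0 + L1 context described above.
func wakeUp(compressTopDrawers func() string) string {
	var parts []string
	// L0: identity file, included only if present.
	idPath := filepath.Join(os.Getenv("HOME"), ".mempalace", "identity.txt")
	if b, err := os.ReadFile(idPath); err == nil {
		parts = append(parts, strings.TrimSpace(string(b)))
	}
	// L1: AAAK-like compressed critical facts from the top drawers.
	parts = append(parts, compressTopDrawers())
	return strings.Join(parts, "\n\n")
}

func main() {
	fmt.Println(wakeUp(func() string { return "critical facts go here" }))
}
```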
Register as an MCP server for Claude Code / ChatGPT / Cursor:
# Claude Code
claude mcp add mem -- mem mcp
# Available tools:
# mem_search — BM25 search with wing/room filters
# mem_add_drawer — Store content in the palace
# mem_status — Palace overview
# mem_wake_up — Compact context for AI
# mem_kg_query — Query the knowledge graph
# mem_kg_add — Add fact to the graph
# mem_list_wings — Enumerate wings
# mem_list_rooms — Enumerate rooms in a wing

Evaluated on three public memory benchmarks. See benchmarks/README.md
for reproduction steps.
On longmemeval_oracle.json, two metrics are reported: the answer-text heuristic
(does any top-k result contain the answer string?) and the official
session-id metric (is any top-k session in answer_session_ids?).
| Config | R@1 (heur / sid) | R@5 (heur / sid) | R@10 (heur / sid) |
|---|---|---|---|
| BM25 + stemming (local) | 44.0 / 31.6 | 70.4 / 62.0 | 76.8 / 74.6 |
| Hybrid RRF 0.7 (MiniLM local via llama.cpp) | 47.2 / 37.0 | 72.0 / 65.2 | 79.4 / 76.8 |
| Hybrid RRF 0.7 (bge-m3 cloud, historical) | 48.0 / — | 74.6 / — | 79.2 / — |
| Hybrid + rerank bge-reranker-v2-m3 (cloud) | 52.6 / — | 74.6 / — | 80.8 / — |
Tokenizer applies Porter step 1a/1b stemming (+1.6 R@5 on BM25 alone,
no external dependency). Hybrid adds weighted Reciprocal Rank Fusion
(0.7 BM25 / 0.3 vector). Cross-encoder rerank (BAAI/bge-reranker-v2-m3)
adds +4.6 R@1 for top-1-driven workflows.
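For the curious, a sketch of Porter step 1a (the plural-stripping half of the stemming mentioned above); step 1b handles -ed/-ing and is omitted here, and this is illustrative rather than the project's tokenizer code:

```go
package main

import (
	"fmt"
	"strings"
)

// step1a applies Porter step 1a: strip plural suffixes, longest match first.
func step1a(w string) string {
	switch {
	case strings.HasSuffix(w, "sses"):
		return strings.TrimSuffix(w, "es") // caresses -> caress
	case strings.HasSuffix(w, "ies"):
		return strings.TrimSuffix(w, "es") // ponies -> poni
	case strings.HasSuffix(w, "ss"):
		return w // caress -> caress
	case strings.HasSuffix(w, "s"):
		return strings.TrimSuffix(w, "s") // cats -> cat
	}
	return w
}

func main() {
	fmt.Println(step1a("decisions"), step1a("queries")) // decision queri
}
```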
BM25 and MiniLM-hybrid numbers above are from the current code on the
local stack. The bge-m3 cloud rows are kept for reference — they
require an external embeddings/rerank endpoint. See
benchmarks/README.md for the full sweep,
including L# Cache (R@5 77.2%), Query2Doc (R@5 77.8%), and the
reproduction that closes the gap to MemPalace's 96.6% on
longmemeval_s_cleaned.
On LoCoMo:

| Metric | BM25 (offline) | Hybrid (BM25 + bge-m3) |
|---|---|---|
| Recall@1 | 60.0% | 59.0% |
| Recall@5 | 88.2% | 88.6% |
| Recall@10 | 93.7% | 95.6% |
| Avg query latency | 1.7 ms | 4.5 ms |
Hybrid's win is concentrated where it matters most: multi-hop +7.3 pp (hardest category, 59.4 → 66.7) and single-hop +5.7 pp (80.5 → 86.2). R@10 jumps +1.9 pp — embeddings rescue evidence that fell out of BM25 top-5.
On ConvoMem:

| Metric | Value |
|---|---|
| Recall@1 | 100.0% |
| Recall@5 | 100.0% |
| Avg query latency | 1.4 ms |
Confirms the ConvoMem paper's thesis: "your first 150 conversations don't need RAG". BM25 alone is sufficient at small haystacks. The harder regime (50–300 conversations with value-change tracking) is left as future work.
cmd/mem/ CLI entry
internal/
config/ Configuration (env vars, paths)
db/ SQLite schema + connection
palace/ Wings, rooms, drawers, tunnels
search/ BM25 + vector + hybrid (RRF) search, Porter stemmer,
HNSW index, heuristic question classifier
embeddings/ Optional OpenAI-compatible client + blob serializer
rerank/ Optional Cohere-compatible cross-encoder client
kg/ Temporal knowledge graph + contradiction detection
layers/ 4-layer memory stack (L0 identity, L1 compression, wake-up)
miner/ File and conversation mining (Claude JSONL, ChatGPT, Slack, plain text)
mcp/ MCP server with 8 tools
benchmarks/
longmemeval/ LongMemEval harness
locomo/ LoCoMo harness
convomem/ ConvoMem harness
Only 2 external dependencies (pure Go, no CGo):
- `modernc.org/sqlite` — pure-Go SQLite driver (no CGo = static binary)
- `github.com/modelcontextprotocol/go-sdk` — official MCP SDK
Everything else is Go stdlib.
The previous version of mem (LLM-dependent extraction/consolidation with Claude/OpenCode/Codex backends)
lives at github.com/snow-ghost/mem-agent.
MIT