claude-memory

Persistent hybrid-search memory system for Claude Code. Replaces markdown-based memory with selective retrieval, cross-project search, and agent-scoped memories that scale without context cost.

Architecture

SQLite — single-file backing store, portable, zero config
Tantivy — BM25 full-text search (file paths, function names, exact patterns)
fastembed-rs — local embeddings via all-MiniLM-L6-v2 ONNX model (semantic similarity, no API calls)
Hybrid ranking — BM25 + cosine similarity combined via Reciprocal Rank Fusion (RRF)
MCP server — stdio JSON-RPC server for native Claude Code tool integration
CLI — direct command-line interface for humans and scripts

Build

cargo build --release

Binary: target/release/claude-memory.exe (~33MB)

First run downloads the embedding model (~80MB, cached in ~/.claude/memory/models/).

Setup

1. Register the MCP server

claude mcp add claude-memory -- /path/to/claude-memory serve

Or add manually to ~/.claude.json:

{
  "mcpServers": {
    "claude-memory": {
      "type": "stdio",
      "command": "/path/to/claude-memory",
      "args": ["serve"]
    }
  }
}

This gives Claude Code six native tools: memory_store, memory_search, memory_recall, memory_forget, memory_prune, memory_context.

2. Install skills (optional)

Copy the skill directories to your Claude Code skills location:

# Personal skills (available in all projects)
cp -r skills/remember ~/.claude/skills/remember
cp -r skills/recall ~/.claude/skills/recall

# Or project-level skills (this project only)
cp -r skills/remember .claude/skills/remember
cp -r skills/recall .claude/skills/recall

This enables /remember and /recall slash commands that wrap the MCP tools with natural language handling.

Usage

CLI

# Store a memory
claude-memory store "User prefers terse responses" --tags "preference" --project "myapp" -m feedback

# Hybrid search (BM25 + vector)
claude-memory search "how does testing work"

# Filter by project/agent/tags
claude-memory recall --project myapp --memory-type feedback

# Task-relevant context
claude-memory context "refactoring the auth middleware" -k 5

# Delete by ID or search
claude-memory forget --id <uuid>
claude-memory forget --query "outdated preference"

# Clean up stale memories
claude-memory prune --max-age-days 90 --dry-run
claude-memory prune --max-age-days 90

Skills

/remember the user prefers single bundled PRs for refactors in this area
/recall testing preferences

MCP tools (used by Claude directly)

When registered as an MCP server, Claude Code can call these tools natively:

Tool	Purpose
`memory_store`	Save memory with auto-embedding + BM25 indexing
`memory_search`	Hybrid BM25 + vector search, returns ranked results
`memory_recall`	Filter by project/agent/tags/type
`memory_forget`	Remove specific memories
`memory_prune`	Decay stale/low-access memories
`memory_context`	Return top-K relevant memories for a task description

Storage

All data lives in ~/.claude/memory/:

~/.claude/memory/
  memory.db      # SQLite database (source of truth)
  tantivy/       # BM25 search index (rebuildable from SQLite)
  models/        # ONNX model cache (downloaded on first use)

Override with CLAUDE_MEMORY_DIR environment variable.

Memory types

Type	Purpose
`user`	Facts about the user — role, preferences, expertise
`feedback`	How to approach work — corrections and confirmed approaches
`project`	Ongoing work context — decisions, deadlines, constraints
`reference`	Pointers to external resources — URLs, dashboards, systems

How search works

Every query runs through two retrieval paths simultaneously:

BM25 (Tantivy) — term-frequency keyword matching, great for exact names and patterns
Vector (fastembed cosine similarity) — semantic similarity, great for "I vaguely remember something about..."

Results are combined via Reciprocal Rank Fusion (k=60), which merges ranked lists without requiring score normalization. A memory that ranks well in both paths gets a strong combined score.

Design decisions

SQLite is the source of truth. The Tantivy index is a derived cache — if corrupted, it rebuilds from SQLite on next startup.
Embeddings are brute-force cosine. For a personal memory system (<100K memories), this is fast enough and avoids ANN index complexity.
Model loads lazily. Commands that don't need embeddings (e.g., recall, forget --id) skip the ~200ms model load.
Access counts track usage. Every retrieval increments access_count, enabling prune to identify stale memories.
All logging goes to stderr. Stdout is reserved for JSON results (CLI) or JSON-RPC (MCP), so logging never pollutes the transport.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.github/workflows		.github/workflows
skills		skills
src		src
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
build.bat		build.bat
build.sh		build.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

claude-memory

Architecture

Build

Setup

1. Register the MCP server

2. Install skills (optional)

Usage

CLI

Skills

MCP tools (used by Claude directly)

Storage

Memory types

How search works

Design decisions

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

claude-memory

Architecture

Build

Setup

1. Register the MCP server

2. Install skills (optional)

Usage

CLI

Skills

MCP tools (used by Claude directly)

Storage

Memory types

How search works

Design decisions

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages