Your AI coding agent finally remembers things. RAM indexes your Claude Code (and other assistant) session transcripts into a local hybrid search database and gives the agent access to that history via MCP — so architectural decisions, bug patterns, and project conventions persist across sessions automatically.
No cloud. No daemons. No external services. A single ram binary, ~640 KB, that runs
entirely on your machine.
- Zero infrastructure — embedded LanceDB, local ONNX embeddings (80–274 MB depending on config), no external services
- Hybrid search — weighted BM25 + vector similarity fused via RRF, cross-encoder reranking enabled by default
- Temporal decay — recent memories rank higher; configurable half-life (default 30 days)
- Single binary — one
ramexecutable handles MCP server, indexer, CLI, and hooks - Automatic indexing — file watcher indexes new transcript content with a 5-minute cooldown on active sessions
- Explicit memorization — agents can write permanent memories via the
memorizeMCP tool - Async enrichment — optional background classifier enriches tags and metadata without blocking
- OpenTelemetry — traces and metrics exported to any OTLP endpoint
- CLI parity — every MCP capability is also available as a CLI command with
--output json
Without RAM:
You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[implements LanceDB integration]
You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "I don't have context from previous sessions. Can you remind me?"
With RAM:
You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[Claude calls memorize: "Chose LanceDB over Qdrant because..."]
You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "Based on our decision from 4 days ago, we chose LanceDB because
it embeds in-process with no daemon, has native BM25 support, and..."
Without RAM:
You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]
You (Week 3): "Different async task is panicking"
Claude: [starts from scratch, doesn't remember similar bug]
With RAM:
You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]
You (Week 3): "Different async task is panicking"
Claude: "This looks similar to the issue we fixed 2 weeks ago where
the Arc clone was inside the closure. Let me check if..."
[applies same fix pattern]
Without RAM:
You: "Use thiserror for library errors"
Claude: [implements with thiserror]
[New session]
You: "Add error handling to this function"
Claude: [uses anyhow instead of thiserror]
You: "No, we use thiserror in libraries"
Claude: "Sorry, I'll fix that"
With RAM:
You: "Use thiserror for library errors"
Claude: [implements with thiserror]
[Claude calls memorize: "Convention: thiserror for libraries, anyhow for binaries"]
[New session]
You: "Add error handling to this function"
Claude: "Based on our project convention, I'll use thiserror since
this is library code..."
Without RAM:
[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]
[Project B, new session] You: "How do we handle rate limiting?"
Claude: [implements different solution, doesn't remember Project A]
With RAM (global scope):
[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]
[Claude calls memorize with scope="global"]
[Project B] You: "How do we handle rate limiting?"
Claude: "I remember we implemented a token bucket rate limiter in
Project A. Would you like me to use the same pattern here?"
Good fit:
- Multi-session projects where context accumulates over days/weeks
- Projects with explicit architectural decisions worth remembering
- Teams with coding conventions that agents should follow
- Cross-project pattern reuse (with global scope)
Not needed:
- Single-session throwaway scripts
- Projects where all context fits in a single prompt
- When you prefer agents to start fresh each session
| Assistant | MCP Support | Transcript Indexing | Notes |
|---|---|---|---|
| Claude Code | ✅ Full | ✅ Automatic | All platforms; subagent sessions indexed |
| OpenCode | ✅ Automatic | SQLite DB polled every 5 min | |
| IBM Bob | ❌ CLI only | ✅ Automatic | Native format + VS Code extension |
| Cline / Roo Cline | ❌ CLI only | ✅ Automatic | VS Code globalStorage layout |
| OpenAI Codex | ❌ CLI only | ✅ Automatic | |
| Zed | ❌ CLI only | ✅ Automatic | Markdown conversation format |
| Aider | ❌ CLI only | ✅ Automatic | Chat history indexing |
| Cursor | ❌ Not yet | ❌ | Parser not implemented |
| GitHub Copilot | ❌ Not yet | ❌ | Parser not implemented |
Legend:
- ✅ Full: Complete MCP integration with all tools available
⚠️ TBD: MCP support status unclear, check integration guide- ❌ CLI only: No MCP support, use
ram search/ram memorizefrom the terminal
Download the latest release for your platform:
# macOS / Linux (via install script)
curl -sSf https://github.com/planetf1/ramem/releases/latest/download/ramem-installer.sh | sh
# Or download manually from:
# https://github.com/planetf1/ramem/releasesThe ram binary is approximately 640 KB (static build). Embedding models (~80 MB total) are downloaded on first run by fastembed.
Resource usage at runtime:
RAM uses a smart daemon model: the first ram serve call starts one real daemon process;
subsequent calls from other sessions attach as lightweight stdio proxies (~11 MB each).
Only the daemon loads the models and the LanceDB index.
Daemon RSS breakdown with default config (BGESmallENV15 + JinaRerankerV1TurboEn reranker enabled):
| Component | Approx RSS |
|---|---|
Embedding model in ORT (BGESmallENV15, 128 MB on disk) |
~250 MB |
Reranker model in ORT (JinaRerankerV1TurboEn, 146 MB on disk) |
~280 MB |
| LanceDB Arrow buffers (scales with index size) | ~2–4× on-disk size |
| Tokio runtime + allocator overhead | ~50 MB |
A typical installation with 100 K+ indexed chunks and both models loaded lands around 1–2 GB RSS — this is expected, not a leak.
To reduce memory footprint:
- Disable the cross-encoder reranker (saves ~280 MB, biggest single change):
[search] reranker_enabled = false
- Reduce the reranker candidate pool (less Arrow allocation per query, saves ~100–200 MB at peak):
reranker_candidate_pool = 20 # default is 50
- Use a smaller embedding model (
AllMiniLML6V2, 80 MB on disk vs 128 MB):After changing the model run[embedding] model = "AllMiniLML6V2"
ram wipe && ram initto re-index. - Use an external embedding server (no ONNX runtime in-process, ~50 MB for the daemon):
[embedding] backend = "local_api" local_api_url = "http://127.0.0.1:8095/v1" local_api_model = "your-model"
- Prune the index — a large number of indexed chunks (check with
ram status) drives LanceDB memory use. Runram wipe && ram initafter narrowingindex.transcript_pathsto recent content.
brew install planetf1/tap/ramemcargo install ramem# Or build from source
git clone https://github.com/planetf1/ramem
cd ramem
cargo build --release
# binary at target/release/ramChoose your AI coding assistant for detailed setup instructions:
- Claude Code — Full MCP integration with automatic hooks
- OpenCode — Transcript indexing with potential MCP support
- Zed — CLI-based workflow with markdown conversation parsing
- Aider — CLI-based workflow with chat history indexing
ram initThis scans well-known transcript locations, indexes existing content, and starts a background file watcher.
ram search "how did we fix the async error"
ram search "lancedb schema decisions" --output jsonSee your assistant's integration guide for MCP server configuration.
Once connected as an MCP server, the following tools are available to the agent:
Search indexed memories using hybrid BM25 + semantic similarity with temporal decay.
| Parameter | Type | Description |
|---|---|---|
query |
string |
Natural language query |
limit |
integer? |
Max results (default 5, max 50) |
scope |
string? |
Filter by scope (global, project:<name>) |
source |
string? |
Filter by indexed or explicit |
Store an explicit memory entry directly (bypasses passive indexing).
| Parameter | Type | Description |
|---|---|---|
concept |
string |
Content to store |
tags |
string[] |
Classification tags |
importance |
float? |
0.0–1.0, default 0.8 |
ram init Scan well-known locations, bulk-index, set up watcher
ram serve Start MCP stdio server
ram index Index transcripts (--watch for continuous, --path for custom dir)
ram search <query> Search memories (--limit N, --scope S, --output json|text)
ram memorize <text> Store explicit memory (--tags t1,t2 --importance 0.9)
ram status Index statistics and config summary (--output json|text)
ram backup Create dated .tar.gz snapshot (--dest <dir>)
ram restore Restore from snapshot (--file <path>)
ram config show Print current configuration
ram config set Update a configuration key
ram hook Claude Code hook handlers (session-start, post-tool-use, session-end)
Configuration file: ~/.config/ram/config.toml
[embedding]
backend = "fastembed" # fastembed | local_api | auto
model = "AllMiniLML6V2" # ~80 MB, downloads on first run
init_timeout_secs = 300 # Timeout for model download (default: 5 minutes)
embed_timeout_secs = 30 # Timeout for individual embed operations (default: 30 seconds)
[search]
rrf_k = 60 # Reciprocal Rank Fusion constant
decay_half_life_days = 7.0
[augmentation]
enabled = true
llm_endpoint = "" # optional: Ollama/MLX URL for richer classification
[otel]
endpoint = "" # OTLP gRPC endpoint; empty = disabledSee docs/spec.md for the full configuration reference.
For better embedding quality on Apple Silicon, point RAM at a local MLX or Ollama server:
[embedding]
backend = "local_api"
local_api_url = "http://127.0.0.1:8095/v1"
local_api_model = "qwen3-embedding"Any OpenAI-compatible /v1/embeddings endpoint works.
RAM automatically discovers transcripts in:
| Platform | Path |
|---|---|
| macOS / Linux | ~/.claude/projects/**/*.jsonl |
| macOS (legacy) | ~/Library/Application Support/claude-code/transcripts/**/*.jsonl |
| Linux (legacy) | ~/.local/share/claude-code/**/*.jsonl |
Additional paths can be added via ram config set index.transcript_paths '["/extra/path"]'.
- FAQ — Frequently asked questions about privacy, performance, and usage
- Technical Specification — Authoritative requirements, schema, and API reference
- Architecture — Threading model, dataflow, concurrency, and failure domains
- Troubleshooting — Common issues and solutions
- Comparisons — How RAM compares to other memory systems
We welcome contributions from the community! Please review our community guidelines:
- Contributing Guide — Development setup, code standards, and PR workflow
- Code of Conduct — Community standards and enforcement
- Security Policy — How to report vulnerabilities
For technical implementation details, see AGENTS.md.
RAM is positioned against a real ecosystem of memory / transcript-indexing tools.
See docs/comparisons.md for a full table covering Anthropic's
official offerings, community projects (rmcp-memex, opencode-memsearch,
code-session-memory, etc.), and state-of-the-art agent memory systems (Zep, Mem0,
Letta, Cognee).
Apache-2.0