ramem (RAM) — AI Agent Memory

Your AI coding agent finally remembers things. RAM indexes your Claude Code (and other assistant) session transcripts into a local hybrid search database and gives the agent access to that history via MCP — so architectural decisions, bug patterns, and project conventions persist across sessions automatically.

No cloud. No database daemon to manage. No external services. A single ram binary, ~640 KB, that runs entirely on your machine.

Features

Zero infrastructure — embedded LanceDB, local ONNX embeddings (80–274 MB depending on config), no external services
Hybrid search — weighted BM25 + vector similarity fused via RRF, cross-encoder reranking enabled by default
Temporal decay — recent memories rank higher; configurable half-life (default 30 days)
Single binary — one ram executable handles MCP server, indexer, CLI, and hooks
Automatic indexing — file watcher indexes new transcript content with a 5-minute cooldown on active sessions
Explicit memorization — agents can write permanent memories via the memorize MCP tool
Async enrichment — optional background classifier enriches tags and metadata without blocking
OpenTelemetry — traces and metrics exported to any OTLP endpoint
CLI parity — every MCP capability is also available as a CLI command with --output json
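
To make the hybrid search bullet concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion (RRF) over a keyword ranking and a vector ranking. The function name, the weights, and the document IDs are hypothetical, and RAM's actual scoring code may differ; `k=60` mirrors the `rrf_k` value shown in the Configuration section.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    weights:  one weight per ranking (illustrative values, not RAM's defaults).
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) for every document it contains.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_hits = ["a", "b", "c"]     # hypothetical keyword (BM25) ranking
vector_hits = ["c", "a", "d"]   # hypothetical embedding-similarity ranking
fused = weighted_rrf([bm25_hits, vector_hits], weights=[1.0, 1.0])
print(fused)
```

Documents that appear near the top of both lists accumulate the largest fused score, which is why RRF needs no score normalization between BM25 and vector similarity.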

Use Cases

Architectural Decisions

Without RAM:

You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[implements LanceDB integration]

You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "I don't have context from previous sessions. Can you remind me?"

With RAM:

You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[Claude calls memorize: "Chose LanceDB over Qdrant because..."]

You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "Based on our decision from 4 days ago, we chose LanceDB because
it embeds in-process with no daemon, has native BM25 support, and..."

Bug Pattern Recognition

Without RAM:

You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]

You (Week 3): "Different async task is panicking"
Claude: [starts from scratch, doesn't remember similar bug]

With RAM:

You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]

You (Week 3): "Different async task is panicking"
Claude: "This looks similar to the issue we fixed 2 weeks ago where
the Arc clone was inside the closure. Let me check if..."
[applies same fix pattern]

Project Conventions

Without RAM:

You: "Use thiserror for library errors"
Claude: [implements with thiserror]

[New session]
You: "Add error handling to this function"
Claude: [uses anyhow instead of thiserror]
You: "No, we use thiserror in libraries"
Claude: "Sorry, I'll fix that"

With RAM:

You: "Use thiserror for library errors"
Claude: [implements with thiserror]
[Claude calls memorize: "Convention: thiserror for libraries, anyhow for binaries"]

[New session]
You: "Add error handling to this function"
Claude: "Based on our project convention, I'll use thiserror since
this is library code..."

Cross-Project Learning

Without RAM:

[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]

[Project B, new session] You: "How do we handle rate limiting?"
Claude: [implements different solution, doesn't remember Project A]

With RAM (global scope):

[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]
[Claude calls memorize with scope="global"]

[Project B] You: "How do we handle rate limiting?"
Claude: "I remember we implemented a token bucket rate limiter in
Project A. Would you like me to use the same pattern here?"

When to Use RAM

Good fit:

Multi-session projects where context accumulates over days/weeks
Projects with explicit architectural decisions worth remembering
Teams with coding conventions that agents should follow
Cross-project pattern reuse (with global scope)

Not needed:

Single-session throwaway scripts
Projects where all context fits in a single prompt
When you prefer agents to start fresh each session

Supported AI Coding Assistants

| Assistant | MCP Support | Transcript Indexing | Notes |
| --- | --- | --- | --- |
| Claude Code | ✅ Full | ✅ Automatic | All platforms; subagent sessions indexed |
| OpenCode | ⚠️ TBD | ✅ Automatic | SQLite DB polled every 5 min |
| IBM Bob | ❌ CLI only | ✅ Automatic | Native format + VS Code extension |
| Cline / Roo Cline | ❌ CLI only | ✅ Automatic | VS Code globalStorage layout |
| OpenAI Codex | ❌ CLI only | ✅ Automatic | |
| Zed | ❌ CLI only | ✅ Automatic | Markdown conversation format |
| Aider | ❌ CLI only | ✅ Automatic | Chat history indexing |
| Cursor | ❌ Not yet | ❌ | Parser not implemented |
| GitHub Copilot | ❌ Not yet | ❌ | Parser not implemented |

Legend:

✅ Full: Complete MCP integration with all tools available
⚠️ TBD: MCP support status unclear; check the integration guide
❌ CLI only: No MCP support; use ram search / ram memorize from the terminal
❌ Not yet: Not supported; the transcript parser is not implemented

Installation

Prebuilt Binaries (Recommended)

Download the latest release for your platform:

# macOS / Linux (via install script)
curl -sSf https://github.com/planetf1/ramem/releases/latest/download/ramem-installer.sh | sh

# Or download manually from:
# https://github.com/planetf1/ramem/releases

The ram binary is approximately 640 KB (static build). Embedding and reranker models (80–274 MB on disk, depending on config) are downloaded on first run by fastembed.

Resource usage at runtime:

RAM uses a shared daemon model: the first ram serve call starts one real daemon process; subsequent calls from other sessions attach as lightweight stdio proxies (~11 MB each). Only the daemon loads the models and the LanceDB index.

Daemon RSS breakdown with default config (BGESmallENV15 + JinaRerankerV1TurboEn reranker enabled):

| Component | Approx RSS |
| --- | --- |
| Embedding model in ORT (`BGESmallENV15`, 128 MB on disk) | ~250 MB |
| Reranker model in ORT (`JinaRerankerV1TurboEn`, 146 MB on disk) | ~280 MB |
| LanceDB Arrow buffers (scales with index size) | ~2–4× on-disk size |
| Tokio runtime + allocator overhead | ~50 MB |

A typical installation with 100K+ indexed chunks and both models loaded lands around 1–2 GB RSS. This is expected, not a leak.
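
As a back-of-the-envelope check, the approximate figures in the table can be added up. The sketch below is only arithmetic over those published numbers; the function name and the 300 MB index size are hypothetical.

```python
def estimate_daemon_rss_mb(index_on_disk_mb, reranker_enabled=True):
    """Return a (low, high) daemon RSS estimate in MB from the approximate table figures."""
    embedding_mb = 250                              # BGESmallENV15 loaded in ORT
    reranker_mb = 280 if reranker_enabled else 0    # JinaRerankerV1TurboEn loaded in ORT
    runtime_mb = 50                                 # Tokio runtime + allocator overhead
    fixed = embedding_mb + reranker_mb + runtime_mb
    # LanceDB Arrow buffers are quoted as roughly 2-4x the on-disk index size.
    return fixed + 2 * index_on_disk_mb, fixed + 4 * index_on_disk_mb

low, high = estimate_daemon_rss_mb(300)   # hypothetical 300 MB on-disk index
print(f"{low}-{high} MB")
```

With a 300 MB index this gives 1180–1780 MB, which falls inside the 1–2 GB range quoted above.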

To reduce memory footprint:

Disable the cross-encoder reranker (saves ~280 MB, biggest single change):
```
[search]
reranker_enabled = false
```

Reduce the reranker candidate pool (less Arrow allocation per query, saves ~100–200 MB at peak):
```
reranker_candidate_pool = 20   # default is 50
```

Use a smaller embedding model (AllMiniLML6V2, 80 MB on disk vs 128 MB):
```
[embedding]
model = "AllMiniLML6V2"
```
After changing the model, run ram wipe && ram init to re-index.

Use an external embedding server (no ONNX runtime in-process, ~50 MB for the daemon):
```
[embedding]
backend = "local_api"
local_api_url = "http://127.0.0.1:8095/v1"
local_api_model = "your-model"
```

Prune the index: a large number of indexed chunks (check with ram status) drives LanceDB memory use. Narrow index.transcript_paths to recent content, then run ram wipe && ram init.

Homebrew (macOS / Linux)

brew install planetf1/tap/ramem

crates.io

cargo install ramem

From Source

# Clone the repository and build a release binary
git clone https://github.com/planetf1/ramem
cd ramem
cargo build --release
# binary at target/release/ram

Quick Start

Choose your AI coding assistant for detailed setup instructions:

Claude Code — Full MCP integration with automatic hooks
OpenCode — Transcript indexing with potential MCP support
Zed — CLI-based workflow with markdown conversation parsing
Aider — CLI-based workflow with chat history indexing

General Setup (All Assistants)

1. Initialize (index existing transcripts + start watcher)

ram init

This scans well-known transcript locations, indexes existing content, and starts a background file watcher.

2. Search from the terminal

ram search "how did we fix the async error"
ram search "lancedb schema decisions" --output json
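
For scripting, the JSON output can be consumed from another program. The sketch below builds the command from the flags listed in the CLI Reference and parses whatever JSON comes back; the helper names are hypothetical, and the shape of the JSON result is not documented in this README, so the code does not assume any particular fields.

```python
import json
import subprocess

def build_search_command(query, limit=None, scope=None):
    """Build the argv for `ram search`, using only flags listed in the CLI Reference."""
    cmd = ["ram", "search", query, "--output", "json"]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    if scope is not None:
        cmd += ["--scope", scope]
    return cmd

def search(query, **kwargs):
    """Run the search and return the parsed JSON (result schema is not assumed here)."""
    result = subprocess.run(
        build_search_command(query, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

# Show the command that would be executed; call search(...) once ram is installed.
print(build_search_command("how did we fix the async error", limit=3))
```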

3. Configure MCP (if supported)

See your assistant's integration guide for MCP server configuration.
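
As a rough illustration only, many MCP clients (Claude Code's project-level `.mcp.json` among them) register a stdio server with a command and an argument list. The sketch below prints such an entry for ram serve. The server name `ram` and the choice of file are assumptions; the integration guide for your assistant is the authority on the exact format.

```python
import json

def mcp_server_entry(command="ram", args=("serve",)):
    """Build an mcpServers entry that launches `ram serve` as a stdio MCP server.

    The key "ram" is an arbitrary, hypothetical server name.
    """
    return {"mcpServers": {"ram": {"command": command, "args": list(args)}}}

# Print the JSON so it can be pasted into the client's MCP configuration file.
print(json.dumps(mcp_server_entry(), indent=2))
```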

MCP Tools

Once connected as an MCP server, the following tools are available to the agent:

`search_memory`

Search indexed memories using hybrid BM25 + semantic similarity with temporal decay.

| Parameter | Type | Description |
| --- | --- | --- |
| `query` | `string` | Natural language query |
| `limit` | `integer?` | Max results (default 5, max 50) |
| `scope` | `string?` | Filter by scope (`global`, `project:<name>`) |
| `source` | `string?` | Filter by `indexed` or `explicit` |
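
MCP clients invoke tools with a JSON-RPC `tools/call` request. The following sketch assembles such a request for `search_memory` from the parameters above. The helper name is hypothetical, and the lower bound of 1 on `limit` is an assumption (the table only states the default and the maximum); agents normally issue this call themselves, so this is mainly useful for testing.

```python
import json

def search_memory_request(query, limit=5, scope=None, source=None, request_id=1):
    """Build a JSON-RPC tools/call payload for the search_memory tool."""
    if not 1 <= limit <= 50:          # max 50 per the table; minimum of 1 is assumed
        raise ValueError("limit must be between 1 and 50")
    if source not in (None, "indexed", "explicit"):
        raise ValueError("source must be 'indexed' or 'explicit'")
    arguments = {"query": query, "limit": limit}
    if scope is not None:
        arguments["scope"] = scope    # e.g. "global" or "project:<name>"
    if source is not None:
        arguments["source"] = source
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_memory", "arguments": arguments},
    }

request = search_memory_request("why did we choose LanceDB", scope="global")
print(json.dumps(request, indent=2))
```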

`memorize`

Store an explicit memory entry directly (bypasses passive indexing).

| Parameter | Type | Description |
| --- | --- | --- |
| `concept` | `string` | Content to store |
| `tags` | `string[]` | Classification tags |
| `importance` | `float?` | 0.0–1.0, default 0.8 |
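
A small sketch of how a client might validate `memorize` arguments before sending them, based only on the table above. The function name is hypothetical, and rejecting an empty `concept` is an assumption rather than documented behavior.

```python
def memorize_arguments(concept, tags, importance=0.8):
    """Validate and assemble the arguments object for the memorize tool."""
    if not concept.strip():
        # Assumption: storing empty content is not useful, so reject it client-side.
        raise ValueError("concept must not be empty")
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be within 0.0-1.0")
    return {"concept": concept, "tags": list(tags), "importance": importance}

args = memorize_arguments(
    "Convention: thiserror for libraries, anyhow for binaries",
    tags=["convention", "errors"],
)
print(args)
```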

CLI Reference

ram init              Scan well-known locations, bulk-index, set up watcher
ram serve             Start MCP stdio server
ram index             Index transcripts (--watch for continuous, --path for custom dir)
ram search <query>    Search memories (--limit N, --scope S, --output json|text)
ram memorize <text>   Store explicit memory (--tags t1,t2 --importance 0.9)
ram status            Index statistics and config summary (--output json|text)
ram backup            Create dated .tar.gz snapshot (--dest <dir>)
ram restore           Restore from snapshot (--file <path>)
ram config show       Print current configuration
ram config set        Update a configuration key
ram hook              Claude Code hook handlers (session-start, post-tool-use, session-end)

Configuration

Configuration file: ~/.config/ram/config.toml

[embedding]
backend = "fastembed"           # fastembed | local_api | auto
model = "AllMiniLML6V2"         # ~80 MB, downloads on first run
init_timeout_secs = 300         # Timeout for model download (default: 5 minutes)
embed_timeout_secs = 30         # Timeout for individual embed operations (default: 30 seconds)

[search]
rrf_k = 60                      # Reciprocal Rank Fusion constant
decay_half_life_days = 7.0      # Temporal decay half-life in days (the Features list gives 30 as the default)

[augmentation]
enabled = true
llm_endpoint = ""               # optional: Ollama/MLX URL for richer classification

[otel]
endpoint = ""                   # OTLP gRPC endpoint; empty = disabled

See docs/spec.md for the full configuration reference.
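
One way to read `decay_half_life_days` is as the half-life of an exponential decay applied to a memory's relevance score. The sketch below shows that interpretation; the exact formula RAM uses is not stated in this README, so treat the functions and their names as assumptions.

```python
def decay_factor(age_days, half_life_days):
    """Exponential half-life decay: 1.0 at age 0, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def decayed_score(score, age_days, half_life_days):
    """Scale a relevance score by how old the memory is."""
    return score * decay_factor(age_days, half_life_days)

# With a 7-day half-life, a memory from two weeks ago keeps a quarter of its score.
print(decayed_score(1.0, age_days=14, half_life_days=7.0))
```

A shorter half-life favors recent sessions more aggressively; a longer one keeps older decisions competitive in the ranking.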

Using a Local Embedding Server (MLX / Ollama)

For better embedding quality on Apple Silicon, point RAM at a local MLX or Ollama server:

[embedding]
backend = "local_api"
local_api_url = "http://127.0.0.1:8095/v1"
local_api_model = "qwen3-embedding"

Any OpenAI-compatible /v1/embeddings endpoint works.
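
To check that a local server speaks the expected protocol before pointing RAM at it, a request in the OpenAI embeddings format can be sent by hand. This is a minimal standard-library sketch; the helper names are hypothetical, and the URL and model simply reuse the example values above.

```python
import json
import urllib.request

def build_embeddings_request(base_url, model, texts):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_embeddings(response_json):
    """Return the embedding vectors in input order (each item carries an index)."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

def embed(base_url, model, texts, timeout=30):
    """Send the request and return one vector per input text."""
    request = build_embeddings_request(base_url, model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))

# Example (requires a running server):
# vectors = embed("http://127.0.0.1:8095/v1", "qwen3-embedding", ["hello"])
```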

Transcript Locations

RAM automatically discovers transcripts in:

| Platform | Path |
| --- | --- |
| macOS / Linux | `~/.claude/projects/**/*.jsonl` |
| macOS (legacy) | `~/Library/Application Support/claude-code/transcripts/**/*.jsonl` |
| Linux (legacy) | `~/.local/share/claude-code/**/*.jsonl` |

Additional paths can be added via ram config set index.transcript_paths '["/extra/path"]'.
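
To preview which files a given root would contribute before adding it, a recursive search for `.jsonl` files is a reasonable approximation. The sketch below is not RAM's discovery code; the function name is hypothetical, and it assumes transcripts are `.jsonl` files nested somewhere under each root.

```python
from pathlib import Path

def find_transcripts(roots):
    """Return all .jsonl files found recursively under the given root directories."""
    found = []
    for root in roots:
        base = Path(root).expanduser()
        if base.is_dir():                       # silently skip roots that do not exist
            found.extend(sorted(base.rglob("*.jsonl")))
    return found

if __name__ == "__main__":
    for path in find_transcripts(["~/.claude/projects", "/extra/path"]):
        print(path)
```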

Documentation

FAQ — Frequently asked questions about privacy, performance, and usage
Technical Specification — Authoritative requirements, schema, and API reference
Architecture — Threading model, dataflow, concurrency, and failure domains
Troubleshooting — Common issues and solutions
Comparisons — How RAM compares to other memory systems

Community

We welcome contributions from the community! Please review our community guidelines:

Contributing Guide — Development setup, code standards, and PR workflow
Code of Conduct — Community standards and enforcement
Security Policy — How to report vulnerabilities

For technical implementation details, see AGENTS.md.

Related Projects

RAM sits within a broader ecosystem of memory and transcript-indexing tools. See docs/comparisons.md for a full table covering Anthropic's official offerings, community projects (rmcp-memex, opencode-memsearch, code-session-memory, etc.), and state-of-the-art agent memory systems (Zep, Mem0, Letta, Cognee).

License

Apache-2.0
