
planetf1/ramem


ramem (RAM) — AI Agent Memory


Your AI coding agent finally remembers things. RAM indexes your Claude Code (and other assistant) session transcripts into a local hybrid search database and gives the agent access to that history via MCP — so architectural decisions, bug patterns, and project conventions persist across sessions automatically.

No cloud. No external services. No daemon to install or manage separately: a single ram binary, ~640 KB, runs entirely on your machine.

Features

  • Zero infrastructure — embedded LanceDB, local ONNX embeddings (80–274 MB depending on config), no external services
  • Hybrid search — weighted BM25 + vector similarity fused via RRF, cross-encoder reranking enabled by default
  • Temporal decay — recent memories rank higher; configurable half-life (default 30 days)
  • Single binary — one ram executable handles MCP server, indexer, CLI, and hooks
  • Automatic indexing — file watcher indexes new transcript content with a 5-minute cooldown on active sessions
  • Explicit memorization — agents can write permanent memories via the memorize MCP tool
  • Async enrichment — optional background classifier enriches tags and metadata without blocking
  • OpenTelemetry — traces and metrics exported to any OTLP endpoint
  • CLI parity — every MCP capability is also available as a CLI command with --output json
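As a rough illustration of the hybrid search bullet above, the following Python sketch shows weighted Reciprocal Rank Fusion (RRF) over two ranked lists. It is not RAM's implementation: the function name `rrf_fuse`, the document ids, and the weights are hypothetical, and only the constant k = 60 is taken from the configuration section of this README.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked lists of ids with weighted Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    weights:  one weight per ranking (defaults to 1.0 each).
    k:        smoothing constant; larger values flatten rank differences.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    if len(weights) != len(rankings):
        raise ValueError("need exactly one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes weight / (k + rank) to the fused score.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result ids from a keyword (BM25) pass and a vector pass.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]

fused = rrf_fuse([bm25_hits, vector_hits])
keyword_heavy = rrf_fuse([bm25_hits, vector_hits], weights=[2.0, 1.0])
```

With equal weights, the id that ranks well in both lists ("b") comes first. Doubling the BM25 weight moves the top keyword hit ("a") to the front.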

Use Cases

Architectural Decisions

Without RAM:

You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[implements LanceDB integration]

You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "I don't have context from previous sessions. Can you remind me?"

With RAM:

You (Day 1): "Let's use LanceDB for the vector store"
Claude: "Good choice! I'll implement it."
[Claude calls memorize: "Chose LanceDB over Qdrant because..."]

You (Day 5, new session): "Why did we choose LanceDB?"
Claude: "Based on our decision from 4 days ago, we chose LanceDB because
it embeds in-process with no daemon, has native BM25 support, and..."

Bug Pattern Recognition

Without RAM:

You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]

You (Week 3): "Different async task is panicking"
Claude: [starts from scratch, doesn't remember similar bug]

With RAM:

You (Week 1): "The async task is panicking"
Claude: [debugs, finds root cause, fixes]

You (Week 3): "Different async task is panicking"
Claude: "This looks similar to the issue we fixed 2 weeks ago where
the Arc clone was inside the closure. Let me check if..."
[applies same fix pattern]

Project Conventions

Without RAM:

You: "Use thiserror for library errors"
Claude: [implements with thiserror]

[New session]
You: "Add error handling to this function"
Claude: [uses anyhow instead of thiserror]
You: "No, we use thiserror in libraries"
Claude: "Sorry, I'll fix that"

With RAM:

You: "Use thiserror for library errors"
Claude: [implements with thiserror]
[Claude calls memorize: "Convention: thiserror for libraries, anyhow for binaries"]

[New session]
You: "Add error handling to this function"
Claude: "Based on our project convention, I'll use thiserror since
this is library code..."

Cross-Project Learning

Without RAM:

[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]

[Project B, new session] You: "How do we handle rate limiting?"
Claude: [implements different solution, doesn't remember Project A]

With RAM (global scope):

[Project A] You: "How do we handle rate limiting?"
Claude: [implements solution]
[Claude calls memorize with scope="global"]

[Project B] You: "How do we handle rate limiting?"
Claude: "I remember we implemented a token bucket rate limiter in
Project A. Would you like me to use the same pattern here?"

When to Use RAM

Good fit:

  • Multi-session projects where context accumulates over days/weeks
  • Projects with explicit architectural decisions worth remembering
  • Teams with coding conventions that agents should follow
  • Cross-project pattern reuse (with global scope)

Not needed:

  • Single-session throwaway scripts
  • Projects where all context fits in a single prompt
  • When you prefer agents to start fresh each session

Supported AI Coding Assistants

Assistant MCP Support Transcript Indexing Notes
Claude Code ✅ Full ✅ Automatic All platforms; subagent sessions indexed
OpenCode ⚠️ TBD ✅ Automatic SQLite DB polled every 5 min
IBM Bob ❌ CLI only ✅ Automatic Native format + VS Code extension
Cline / Roo Cline ❌ CLI only ✅ Automatic VS Code globalStorage layout
OpenAI Codex ❌ CLI only ✅ Automatic
Zed ❌ CLI only ✅ Automatic Markdown conversation format
Aider ❌ CLI only ✅ Automatic Chat history indexing
Cursor ❌ Not yet Parser not implemented
GitHub Copilot ❌ Not yet Parser not implemented

Legend:

  • ✅ Full: Complete MCP integration with all tools available
  • ⚠️ TBD: MCP support status unclear, check integration guide
  • ❌ CLI only: No MCP support, use ram search / ram memorize from the terminal

Installation

Prebuilt Binaries (Recommended)

Download the latest release for your platform:

# macOS / Linux (via install script)
curl -sSf https://github.com/planetf1/ramem/releases/latest/download/ramem-installer.sh | sh

# Or download manually from:
# https://github.com/planetf1/ramem/releases

The ram binary is approximately 640 KB (static build). Models (80–274 MB on disk, depending on configuration) are downloaded on first run by fastembed.

Resource usage at runtime:

RAM uses a smart daemon model: the first ram serve call starts one real daemon process; subsequent calls from other sessions attach as lightweight stdio proxies (~11 MB each). Only the daemon loads the models and the LanceDB index.

Daemon RSS breakdown with default config (BGESmallENV15 + JinaRerankerV1TurboEn reranker enabled):

Component                                                      Approx RSS
Embedding model in ORT (BGESmallENV15, 128 MB on disk)         ~250 MB
Reranker model in ORT (JinaRerankerV1TurboEn, 146 MB on disk)  ~280 MB
LanceDB Arrow buffers (scales with index size)                 ~2–4× on-disk size
Tokio runtime + allocator overhead                             ~50 MB

A typical installation with 100 K+ indexed chunks and both models loaded lands around 1–2 GB RSS — this is expected, not a leak.
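The arithmetic behind that range can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate built from the approximate figures in the table. The function name `estimate_daemon_rss_mb` and the 300 MB on-disk index size are assumptions chosen for illustration, not measurements.

```python
def estimate_daemon_rss_mb(index_on_disk_mb, reranker_enabled=True):
    """Return a (low, high) RSS estimate in MB from the table's rough figures."""
    embedding_mb = 250                      # BGESmallENV15 loaded in ORT
    reranker_mb = 280 if reranker_enabled else 0
    runtime_mb = 50                         # Tokio runtime + allocator overhead
    fixed = embedding_mb + reranker_mb + runtime_mb
    # LanceDB Arrow buffers are quoted as roughly 2-4x the on-disk index size.
    return fixed + 2 * index_on_disk_mb, fixed + 4 * index_on_disk_mb


# Hypothetical index of 300 MB on disk.
low, high = estimate_daemon_rss_mb(300)
print(f"default config: ~{low}-{high} MB")

low_nr, high_nr = estimate_daemon_rss_mb(300, reranker_enabled=False)
print(f"reranker off:   ~{low_nr}-{high_nr} MB")
```

Under these assumptions the default configuration lands at about 1.2–1.8 GB, and disabling the reranker drops both bounds by the reranker's ~280 MB.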

To reduce memory footprint:

  • Disable the cross-encoder reranker (saves ~280 MB, biggest single change):
    [search]
    reranker_enabled = false
  • Reduce the reranker candidate pool (less Arrow allocation per query, saves ~100–200 MB at peak):
    reranker_candidate_pool = 20   # default is 50
  • Use a smaller embedding model (AllMiniLML6V2, 80 MB on disk vs 128 MB):
    [embedding]
    model = "AllMiniLML6V2"
    After changing the model, run ram wipe && ram init to re-index.
  • Use an external embedding server (no ONNX runtime in-process, ~50 MB for the daemon):
    [embedding]
    backend = "local_api"
    local_api_url = "http://127.0.0.1:8095/v1"
    local_api_model = "your-model"
  • Prune the index — a large number of indexed chunks (check with ram status) drives LanceDB memory use. Run ram wipe && ram init after narrowing index.transcript_paths to recent content.

Homebrew (macOS / Linux)

brew install planetf1/tap/ramem

crates.io

cargo install ramem

From Source

# Clone the repository and build a release binary
git clone https://github.com/planetf1/ramem
cd ramem
cargo build --release
# binary at target/release/ram

Quick Start

Choose your AI coding assistant for detailed setup instructions:

  • Claude Code — Full MCP integration with automatic hooks
  • OpenCode — Transcript indexing with potential MCP support
  • Zed — CLI-based workflow with markdown conversation parsing
  • Aider — CLI-based workflow with chat history indexing

General Setup (All Assistants)

1. Initialize (index existing transcripts + start watcher)

ram init

This scans well-known transcript locations, indexes existing content, and starts a background file watcher.

2. Search from the terminal

ram search "how did we fix the async error"
ram search "lancedb schema decisions" --output json

3. Configure MCP (if supported)

See your assistant's integration guide for MCP server configuration.

MCP Tools

Once connected as an MCP server, the following tools are available to the agent:

search_memory

Search indexed memories using hybrid BM25 + semantic similarity with temporal decay.

Parameter  Type      Description
query      string    Natural language query
limit      integer?  Max results (default 5, max 50)
scope      string?   Filter by scope (global, project:<name>)
source     string?   Filter by indexed or explicit

memorize

Store an explicit memory entry directly (bypasses passive indexing).

Parameter   Type      Description
concept     string    Content to store
tags        string[]  Classification tags
importance  float?    0.0–1.0, default 0.8
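For readers wiring up their own MCP client, the two tools above would be invoked through the standard MCP `tools/call` JSON-RPC method. The Python sketch below only builds the request payloads and checks the documented limits. The helper names are hypothetical, the lower bound of 1 for `limit` is an assumption, and the shape of the server's response is not shown because this README does not document it.

```python
import json


def tool_call(request_id, name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def search_memory_request(request_id, query, limit=5, scope=None, source=None):
    # Documented bounds: default 5, max 50. The minimum of 1 is an assumption.
    if not 1 <= limit <= 50:
        raise ValueError("limit must be between 1 and 50")
    if source is not None and source not in ("indexed", "explicit"):
        raise ValueError("source must be 'indexed' or 'explicit'")
    arguments = {"query": query, "limit": limit}
    if scope is not None:
        arguments["scope"] = scope      # e.g. "global" or "project:<name>"
    if source is not None:
        arguments["source"] = source
    return tool_call(request_id, "search_memory", arguments)


def memorize_request(request_id, concept, tags, importance=0.8):
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be within 0.0-1.0")
    arguments = {"concept": concept, "tags": list(tags), "importance": importance}
    return tool_call(request_id, "memorize", arguments)


search_req = search_memory_request(1, "why did we choose LanceDB", limit=3, scope="global")
store_req = memorize_request(2, "Convention: thiserror for libraries", ["convention"])
print(json.dumps(search_req, indent=2))
```

In practice an MCP-capable assistant sends these requests itself. The sketch is mainly useful for testing a server by hand or for understanding what the agent sends.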

CLI Reference

ram init              Scan well-known locations, bulk-index, set up watcher
ram serve             Start MCP stdio server
ram index             Index transcripts (--watch for continuous, --path for custom dir)
ram search <query>    Search memories (--limit N, --scope S, --output json|text)
ram memorize <text>   Store explicit memory (--tags t1,t2 --importance 0.9)
ram status            Index statistics and config summary (--output json|text)
ram backup            Create dated .tar.gz snapshot (--dest <dir>)
ram restore           Restore from snapshot (--file <path>)
ram config show       Print current configuration
ram config set        Update a configuration key
ram hook              Claude Code hook handlers (session-start, post-tool-use, session-end)
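For assistants without MCP support, the CLI can be driven from a script. The following Python sketch wraps ram search using only the flags listed above (--limit, --scope, --output json). The helper names are hypothetical, and the structure of the JSON output is deliberately not assumed: the result is returned exactly as parsed.

```python
import json
import subprocess


def build_search_argv(query, limit=None, scope=None):
    """Assemble the argument list for `ram search` with JSON output."""
    argv = ["ram", "search", query, "--output", "json"]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if scope is not None:
        argv += ["--scope", scope]
    return argv


def search(query, limit=None, scope=None):
    """Run `ram search` and return the parsed JSON (no schema is assumed)."""
    result = subprocess.run(
        build_search_argv(query, limit=limit, scope=scope),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)


argv = build_search_argv("how did we fix the async error", limit=3, scope="global")
print(" ".join(argv))
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the query contains spaces or quotes.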

Configuration

Configuration file: ~/.config/ram/config.toml

[embedding]
backend = "fastembed"           # fastembed | local_api | auto
model = "AllMiniLML6V2"         # ~80 MB, downloads on first run
init_timeout_secs = 300         # Timeout for model download (default: 5 minutes)
embed_timeout_secs = 30         # Timeout for individual embed operations (default: 30 seconds)

[search]
rrf_k = 60                      # Reciprocal Rank Fusion constant
decay_half_life_days = 7.0      # Temporal-decay half-life in days (example value; the default is 30)

[augmentation]
enabled = true
llm_endpoint = ""               # optional: Ollama/MLX URL for richer classification

[otel]
endpoint = ""                   # OTLP gRPC endpoint; empty = disabled

See docs/spec.md for the full configuration reference.
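To give a feel for what decay_half_life_days controls, the sketch below applies a plain exponential half-life to a memory's age. It assumes the conventional half-life formula. How RAM actually combines this factor with the fused search score is not specified here, and the function name `decay_weight` is hypothetical.

```python
def decay_weight(age_days, half_life_days):
    """Return a multiplier in (0, 1] that halves every `half_life_days`."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return 0.5 ** (age_days / half_life_days)


# The same two-week-old memory under a 7-day and a 30-day half-life.
short = decay_weight(14, half_life_days=7.0)
long = decay_weight(14, half_life_days=30.0)
print(f"7-day half-life:  {short:.3f}")
print(f"30-day half-life: {long:.3f}")
```

A shorter half-life favors very recent sessions. A longer one keeps older decisions competitive in the ranking.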

Using a Local Embedding Server (MLX / Ollama)

For better embedding quality on Apple Silicon, point RAM at a local MLX or Ollama server:

[embedding]
backend = "local_api"
local_api_url = "http://127.0.0.1:8095/v1"
local_api_model = "qwen3-embedding"

Any OpenAI-compatible /v1/embeddings endpoint works.
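Before pointing RAM at a server, it can help to confirm that the endpoint answers an OpenAI-style embeddings request. The Python sketch below builds such a request with the standard library. The URL and model name are the example values from the config above, the helper names are hypothetical, and the response handling assumes the usual OpenAI response layout (a data list of objects with index and embedding fields).

```python
import json
import urllib.request


def build_embeddings_request(base_url, model, texts):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(base_url, model, texts, timeout=30):
    """Send the request and return one vector per input, in input order."""
    request = build_embeddings_request(base_url, model, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


request = build_embeddings_request(
    "http://127.0.0.1:8095/v1", "qwen3-embedding", ["hello world"]
)
print(request.get_method(), request.full_url)
```

Calling embed(...) against a running server and checking the length of the returned vector is a quick way to verify the model's embedding dimension.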

Transcript Locations

RAM automatically discovers transcripts in:

Platform        Path
macOS / Linux   ~/.claude/projects/**/*.jsonl
macOS (legacy)  ~/Library/Application Support/claude-code/transcripts/**/*.jsonl
Linux (legacy)  ~/.local/share/claude-code/**/*.jsonl

Additional paths can be added via ram config set index.transcript_paths '["/extra/path"]'.
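To preview which files these patterns would match on a given machine, the glob patterns can be approximated with a short Python script. This is only a sketch of the discovery rules listed in the table, not RAM's scanner. The function name `discover_transcripts` and its parameters are hypothetical.

```python
from pathlib import Path


def discover_transcripts(home=None, extra_dirs=()):
    """Return sorted *.jsonl files under the documented transcript roots."""
    home = Path(home) if home is not None else Path.home()
    roots = [
        home / ".claude" / "projects",
        home / "Library" / "Application Support" / "claude-code" / "transcripts",
        home / ".local" / "share" / "claude-code",
    ]
    roots += [Path(p) for p in extra_dirs]
    found = set()
    for root in roots:
        if root.is_dir():
            # rglob mirrors the recursive **/*.jsonl pattern.
            found.update(p for p in root.rglob("*.jsonl") if p.is_file())
    return sorted(found)


if __name__ == "__main__":
    for path in discover_transcripts():
        print(path)
```

Roots that do not exist are skipped, so the same script works on both macOS and Linux.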

Documentation

  • FAQ — Frequently asked questions about privacy, performance, and usage
  • Technical Specification — Authoritative requirements, schema, and API reference
  • Architecture — Threading model, dataflow, concurrency, and failure domains
  • Troubleshooting — Common issues and solutions
  • Comparisons — How RAM compares to other memory systems

Community

We welcome contributions from the community! Please review our community guidelines: the code of conduct, the contributing guide, and the security policy.

For technical implementation details, see AGENTS.md.

Related Projects

RAM is positioned against a real ecosystem of memory / transcript-indexing tools. See docs/comparisons.md for a full table covering Anthropic's official offerings, community projects (rmcp-memex, opencode-memsearch, code-session-memory, etc.), and state-of-the-art agent memory systems (Zep, Mem0, Letta, Cognee).

License

Apache-2.0

