Memory MCP

Persistent, searchable memory for AI assistants. Works with Claude Desktop, Cursor, Continue, and any MCP-compatible client.

Store conversations, decisions, preferences, and facts in plain Markdown files with hybrid vector + keyword search.

Based on the excellent work by Manthan Gupta

Features

  • Hybrid Search - Vector embeddings + BM25 keyword matching (70/30 blend)
  • Plain Markdown - Human-readable files you can edit directly
  • Maintenance LLM (OpenAI) - Promotion and deep compaction run via maintenance scripts
  • Smart Compaction - TF-IDF clustering catches semantic duplicates
  • Concurrent Access - WAL mode supports multiple connections
  • Team Ready - Distributed sync architecture (Phase 2)

Getting Started

1) Clone

git clone <repo-url>
cd memory-mcp-node
npm install

2) Build

npm run build

3) Set up MCP

Run the guided setup (recommended) to generate a client config:

npm run init

Or add the MCP server manually (example):

{
  "mcpServers": {
    "memory": {
      "command": "node",
      "args": ["/absolute/path/to/memory-mcp-node/dist/index.js"]
    }
  }
}

4) (Optional) Add Memory Protocol

If your client supports it, add the instructions from MEMORY_PROTOCOL.md to your agent rules.

5) Set up maintenance (better search + token economy)

Maintenance runs outside MCP and uses OpenAI for promotion/deep compaction.
Create a repo-local .env with OPENAI_API_KEY, then run:

npm run maintenance -- --action check

Tech Stack

| Technology | What It Does | Why We Chose It |
| --- | --- | --- |
| Node.js 18+ | Runtime environment | Native ES modules, excellent async I/O, ubiquitous in the AI tooling ecosystem |
| TypeScript | Type-safe JavaScript | Catches bugs at compile time, better IDE support, self-documenting code |
| SQLite + sqlite-vec | Vector database with FTS5 | Zero-config embedded database, native vector search, and full-text search in one package; no external services needed |
| WAL Mode | Write-ahead logging | Enables concurrent reads during writes, critical for multiple MCP connections from different clients |
| Transformers.js | Embeddings (Xenova/all-MiniLM-L6-v2) | Local semantic search, 384 dimensions, no external embedding API |
| BM25 (FTS5) | Keyword search | Industry-standard relevance ranking, handles exact matches that vector search misses |
| TF-IDF Clustering | Topic grouping | Groups semantically similar entries before deduplication, catches "Chose Stripe" + "Using Stripe API" as related |
| Markdown Files | Storage format | Human-readable, git-friendly, editable with any text editor, survives tool changes |
| MCP Protocol | AI tool interface | Anthropic's standard for tool use, works with Claude, Cursor, Continue, and a growing ecosystem |
| dotenv | Configuration | Simple secrets management, 12-factor app compliance, easy local development |
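
This storage layer needs no external services: a single SQLite file holds both the FTS5 keyword index and the vector index, and WAL mode lets several MCP clients read while one connection writes. A minimal sketch of that wiring, assuming the better-sqlite3 driver and illustrative table names (the repo's actual setup code may differ):

import Database from "better-sqlite3";
import * as sqliteVec from "sqlite-vec";

// Open the search index and switch to WAL so concurrent readers don't block the writer.
const db = new Database(".memory/index.sqlite");
db.pragma("journal_mode = WAL");

// Load the sqlite-vec extension for native vector search.
sqliteVec.load(db);

// One FTS5 table for BM25 keyword search, one vec0 table for 384-dim embeddings.
db.exec(`
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5(content, path);
  CREATE VIRTUAL TABLE IF NOT EXISTS chunks_vec USING vec0(embedding float[384]);
`);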

Architecture

System Overview

flowchart TB
    subgraph clients [MCP Clients]
        Claude[Claude Desktop]
        Cursor[Cursor IDE]
        Continue[Continue.dev]
    end

    subgraph server [Memory MCP Server]
        Tools[MCP Tools Layer]
        
        subgraph core [Core Services]
            Indexer[Indexer]
            Search[Hybrid Search]
            Compact[Compaction]
            Promote[Promotion]
        end
        
        subgraph storage [Storage Layer]
            SQLite[(SQLite + WAL)]
            Files[Markdown Files]
        end
    end

    subgraph external [External APIs]
        OpenAI["OpenAI API (maintenance)"]
    end

    clients -->|MCP Protocol| Tools
    Tools --> core
    Promote -->|Scoring| OpenAI
    Compact -.->|Deep Mode| OpenAI
    core --> storage

Search Flow

sequenceDiagram
    participant Client as MCP Client
    participant Server as Memory Server
    participant Embed as Local Embeddings
    participant Vec as Vector Search
    participant BM25 as BM25 Search
    participant DB as SQLite

    Client->>Server: memory_search(query)
    Server->>Embed: Generate query embedding
    Embed-->>Server: 384-dim vector
    
    par Parallel Search
        Server->>Vec: Vector similarity search
        Vec->>DB: SELECT with cosine distance
        DB-->>Vec: Top K results
    and
        Server->>BM25: Keyword search
        BM25->>DB: FTS5 MATCH query
        DB-->>BM25: Top K results
    end
    
    Server->>Server: Merge results (70% vec + 30% BM25)
    Server->>Server: Deduplicate and rank
    Server-->>Client: Ranked memory chunks
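
The merge step at the end of this flow is simple arithmetic: normalize both score lists, weight vector hits at 70% and BM25 hits at 30%, then deduplicate by chunk id and re-rank. A hedged sketch of that logic (the identifiers are illustrative, not the repo's actual ones):

interface Hit { id: string; score: number; }

// Blend vector and keyword scores 70/30, deduplicating by chunk id.
// Assumes both score lists are already normalized to the 0..1 range.
function mergeResults(vectorHits: Hit[], bm25Hits: Hit[], limit = 10): Hit[] {
  const merged = new Map<string, number>();
  for (const h of vectorHits) merged.set(h.id, (merged.get(h.id) ?? 0) + 0.7 * h.score);
  for (const h of bm25Hits) merged.set(h.id, (merged.get(h.id) ?? 0) + 0.3 * h.score);
  return [...merged.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}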

Memory Lifecycle

flowchart LR
    subgraph daily [Daily Memory]
        Store[memory_store] --> Daily[memory/YYYY-MM-DD.md]
        Daily --> Compact[Compaction]
    end

    subgraph longterm [Long-Term Memory]
        Compact --> Promote{Promotion Score}
        Promote -->|Score >= 0.8| MEMORY[MEMORY.md]
        Promote -->|Score < 0.8| Archive[Retained in daily]
    end

    subgraph cleanup [Maintenance]
        Retention[Retention Policy] --> Delete[Delete old files]
        Delete --> Vacuum[Clean DB entries]
    end

    Daily --> Retention

Storage Structure

flowchart TB
    subgraph root [.memory/]
        MEMORY[MEMORY.md<br/>Long-term memories]
        DB[(index.sqlite<br/>Search index)]
        WAL[index.sqlite-wal<br/>WAL journal]
        
        subgraph daily [memory/]
            D1[2025-01-26.md]
            D2[2025-01-27.md]
            D3[2025-01-28.md]
        end
        
        subgraph team [team/]
            T1[Synced team knowledge<br/>Phase 2]
        end
    end

    style MEMORY fill:#e1f5fe
    style DB fill:#fff3e0
    style daily fill:#f3e5f5
    style team fill:#e8f5e9

File Contents

MEMORY.md - Promoted long-term memories:

# Memory

## User Preferences
- Prefers TypeScript over JavaScript
- Uses Vim keybindings

## Important Decisions
### 2025-01-28
Chose PostgreSQL for the new project because...

## Key Contacts
- Alice (Tech Lead) - alice@company.com

memory/2025-01-28.md - Daily conversation memory:

# 2025-01-28

## 10:30

Working on the authentication module. Decided to use JWT tokens
with refresh token rotation for better security.

## 14:15

User prefers detailed explanations over brief answers when
discussing architecture decisions.
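
Because daily files are plain Markdown in this layout, storing a memory is essentially an append of a timestamped section to today's file. A rough sketch of that append, assuming the .memory/ layout shown above (the function name and date handling are illustrative):

import { appendFile, mkdir } from "node:fs/promises";
import path from "node:path";

// Append an entry under a "## HH:MM" heading in <root>/memory/YYYY-MM-DD.md.
async function appendDailyEntry(root: string, text: string): Promise<void> {
  const now = new Date();
  const day = now.toISOString().slice(0, 10);   // YYYY-MM-DD
  const time = now.toTimeString().slice(0, 5);  // HH:MM
  const dir = path.join(root, "memory");
  await mkdir(dir, { recursive: true });
  await appendFile(path.join(dir, `${day}.md`), `\n## ${time}\n\n${text}\n`, "utf8");
}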

Configuration

Secrets (.env)

API keys and paths. Create in repo root:

# Optional - for maintenance LLM (promotion, deep compaction)
OPENAI_API_KEY=sk-...

Settings (config.json)

All other configuration. Edit config.json in repo root:

{
  "embeddingModel": "Xenova/all-MiniLM-L6-v2",
  "maxDailyChats": 180,
  "maintenance": {
    "compactionThresholdKB": 50,
    "compactionThresholdEntries": 30,
    "compactionMode": "quick",
    "promotionScoreThreshold": 0.8,
    "promotionLookbackDays": 30,
    "autoMaintenanceIntervalHours": 24
  },
  "distributed": {
    "enabled": false,
    "autoSync": true
  }
}

| Setting | Description |
| --- | --- |
| embeddingModel | Transformers.js embedding model |
| compactionMode | quick (TF-IDF + dedup) or deep (LLM summarization) |
| distributed | Team sync settings (Phase 2) |
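
At startup the two sources are combined: dotenv loads the secret from .env into process.env, and config.json supplies everything else. A minimal sketch of how they would be read together, using the defaults shown above (the variable names are illustrative):

import "dotenv/config";                  // loads OPENAI_API_KEY from .env if present
import { readFileSync } from "node:fs";

// config.json holds all non-secret settings.
const config = JSON.parse(readFileSync("config.json", "utf8"));

const compactionMode: "quick" | "deep" = config.maintenance?.compactionMode ?? "quick";
const promotionThreshold: number = config.maintenance?.promotionScoreThreshold ?? 0.8;
const openaiKey = process.env.OPENAI_API_KEY;    // only needed for maintenance runs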

MCP Client Setup

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "memory": {
      "command": "node",
      "args": ["/absolute/path/to/memory-mcp-node/dist/index.js"]
    }
  }
}

Cursor

Settings → Features → MCP Servers:

{
  "memory": {
    "command": "node",
    "args": ["/absolute/path/to/memory-mcp-node/dist/index.js"]
  }
}

Tip: Run npm run init to get the exact paths for your system.

Codex

Codex reads MCP settings from ~/.codex/config.toml (TOML format, not JSON). Add the MCP server there using the Codex-specific format.
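
The entry is the TOML equivalent of the JSON shown above; with recent Codex CLI versions it looks roughly like this (verify the exact keys against your Codex documentation):

[mcp_servers.memory]
command = "node"
args = ["/absolute/path/to/memory-mcp-node/dist/index.js"]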


MCP Tools

| Tool | Purpose |
| --- | --- |
| memory_search | Find relevant memories using hybrid search |
| memory_store | Save new information |
| memory_get | Read specific content by path |
| memory_list_recent | Load recent context |
| memory_forget | Remove memories |
| memory_status | Check system health |

The AI uses these tools automatically based on conversation context.
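
Under the hood each of these is a standard MCP tools/call request from the client; a search, for example, travels as a payload roughly like the one below (the argument name comes from the search flow above; the rest is illustrative):

{
  "method": "tools/call",
  "params": {
    "name": "memory_search",
    "arguments": { "query": "authentication decisions" }
  }
}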


Maintenance

The system maintains itself through retention, compaction, and promotion:

flowchart LR
    subgraph triggers [Triggers]
        Start[Conversation Start]
        Writes[Many Writes]
        Manual[User Request]
    end

    subgraph actions [Maintenance Actions]
        Check{Overdue?}
        Retention[Retention<br/>Delete old files]
        Compaction[Compaction<br/>Deduplicate]
        Promotion[Promotion<br/>Extract facts]
    end

    Start --> Check
    Writes --> Check
    Manual --> Check
    Check -->|Yes| Retention
    Retention --> Compaction
    Compaction --> Promotion
    Check -->|No| Skip[Skip]

| Action | What it does |
| --- | --- |
| Retention | Keeps the latest N daily files, cleans old database entries |
| Compaction | Deduplicates (quick) or summarizes (deep) large daily files |
| Promotion | Extracts long-term facts to MEMORY.md |

Manual Commands

# Check status
npm run maintenance -- --action check

# Full maintenance (dry run)
npm run maintenance -- --action full

# Full maintenance (execute)
npm run maintenance -- --action full --dry-run false

# Specific actions
npm run maintenance -- --action retention
npm run maintenance -- --action compact
npm run maintenance -- --action promote

Maintenance LLM Usage

Embeddings run locally via transformers.js. OpenAI is used only for maintenance (promotion scoring and deep compaction) via the maintenance script.

| Component | Provider | Model | When |
| --- | --- | --- | --- |
| Embeddings | Local (transformers.js) | Xenova/all-MiniLM-L6-v2 | Every search/store |
| Promotion Scoring | OpenAI | gpt-4o-mini | Maintenance runs |
| Deep Compaction | OpenAI | gpt-4o-mini | When deep mode is enabled |
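
Promotion is the main LLM step: each candidate entry gets a relevance score from gpt-4o-mini and is promoted to MEMORY.md when it clears the configured threshold (0.8 by default). A hedged sketch of that call using the OpenAI Node SDK (the prompt and parsing are illustrative, not the repo's exact implementation):

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Ask the maintenance model how valuable an entry is long-term, as a 0..1 score.
async function promotionScore(entry: string): Promise<number> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Rate how valuable this memory is long-term. Reply with a number between 0 and 1." },
      { role: "user", content: entry },
    ],
  });
  return Number(res.choices[0]?.message?.content ?? 0);
}

// Promote when the score clears promotionScoreThreshold (0.8 by default).
const promote = (await promotionScore("Prefers TypeScript over JavaScript")) >= 0.8;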

Troubleshooting

"OpenAI API key is required"

  • Ensure .env exists in repo root with OPENAI_API_KEY
  • Check for typos in the key

Build fails

npm install
npm run build

MCP client not connecting

  • Verify absolute paths in client config
  • Ensure dist/index.js exists (run npm run build)
  • Check client logs for error messages
  • Restart the MCP client after config changes

401 Incorrect API key

  • Rerun the maintenance script after updating .env
  • Verify key is valid at platform.openai.com

WAL files appearing

  • index.sqlite-wal and index.sqlite-shm are normal
  • Part of WAL mode for better concurrency
  • Don't delete while server is running

Requirements

  • Node.js 18+
  • OpenAI API key (required for maintenance only)

License

MIT
