Performance

Chris & Mike edited this page Mar 5, 2026 · 25 revisions

Optimization strategies and best practices for Memory Journal MCP Server.

Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.


Performance Overview

Memory Journal achieves:

  • 2-3 second startup (down from 14 seconds in early versions)
  • <10ms entry creation (typical)
  • <50ms full-text search (for 1000 entries)
  • <1s semantic search (reliable on first load after v1.2.1 fix)
  • No performance degradation from v2.x refactoring (all async operations maintained)

Startup Performance

Lazy Loading Optimization

Problem (early versions):

  • Startup time: 14 seconds
  • ML dependencies loaded eagerly

Solution:

  • Lazy initialization of ML model (@xenova/transformers)
  • Model loads only on first semantic search
  • Startup: 2-3 seconds

Modular Architecture

v3.0.0 TypeScript architecture:

  • Modular TypeScript codebase with clear separation of concerns
  • Handler layer, manager layer, and database layer
  • Each module focused on a single responsibility

Performance Impact:

  • No degradation - All async operations preserved
  • Fast startup - 2-3 seconds
  • No overhead from modularization
  • Better maintainability - Easy to optimize specific components
  • Full TypeScript strict mode - Zero type errors

The modular architecture makes it easier to identify and optimize performance bottlenecks!

Implementation (v3.0.0 TypeScript):

// Lazy initialization - model loads on first semantic search
private async ensureInitialized(): Promise<void> {
  if (this.initialized) return;

  // Load embeddings model (lazy, ~5s first time)
  this.embedder = await pipeline(
    'feature-extraction',
    'Xenova/all-MiniLM-L6-v2'
  );

  this.initialized = true;
}

Timeline:

  • Server startup: 2-3s
  • First semantic search: +5s (one-time model load)
  • Subsequent searches: <1s
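
When two semantic searches arrive before the model has finished loading, a plain `initialized` flag lets both start a load. One way to guard against that is to cache the in-flight promise. This is a minimal sketch, not the project's implementation: `Lazy` and the stand-in loader are hypothetical, and the real loader would be the `pipeline()` call shown above.

```typescript
// Hypothetical helper: wraps any async loader so it runs at most once.
class Lazy<T> {
  private promise: Promise<T> | null = null;
  private readonly loader: () => Promise<T>;

  constructor(loader: () => Promise<T>) {
    this.loader = loader;
  }

  // Returns the shared promise; the loader is invoked on the first call only.
  get(): Promise<T> {
    if (this.promise !== null) return this.promise;
    const pending = this.loader().catch((err: unknown) => {
      // Reset on failure so a later call can retry the load.
      this.promise = null;
      throw err;
    });
    this.promise = pending;
    return pending;
  }
}

// Stand-in for the real model load (assumption: any async factory works here).
let loadCount = 0;
const embedder = new Lazy(async () => {
  loadCount += 1;
  return (text: string): number[] => [text.length];
});

// Two overlapping callers share a single load.
const first = embedder.get();
const second = embedder.get();
```

Both callers receive the same promise, so the one-time load cost is paid once even under concurrent requests.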

Database Performance

SQLite Optimization

Memory Journal runs SQLite through sql.js with these settings:

-- sql.js uses an in-memory WASM database
-- Disk-based PRAGMAs (journal_mode, synchronous, mmap_size) do not apply
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 30000;         -- 30s wait for locks

Benefits:

  • In-memory WASM: All data in memory for fast access
  • Database loaded from disk on startup, saved on changes
  • No I/O overhead for queries

Indexing Strategy

Indexes created:

-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);

-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);

-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);

Query optimization:

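  • Lookups by timestamp, type, tag, and relationship use the indexes above
  • No full table scans for indexed lookups; LIKE patterns that start with a wildcard are the exception, since SQLite cannot use an index for them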
  • Covering indexes where possible

Full-Text Search Performance

Full-text search speed:

| Entries | Search Time |
| --- | --- |
| 100 | <5ms |
| 1,000 | <10ms |
| 10,000 | <50ms |
| 100,000 | <200ms |

Optimization:

  • LIKE-based queries with parameterized inputs
  • Query term escaping for safe search
  • Result limits (default 10)
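
The three points above can be sketched as follows. This is an illustration under assumptions: `escapeLike` and `buildSearchQuery` are hypothetical names, and the `id` and `content` columns are guesses (only `timestamp` and `deleted_at` appear in the index definitions on this page).

```typescript
// Escape LIKE wildcards so user input is matched literally.
// The backslash is declared as the escape character in the SQL below.
function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => "\\" + ch);
}

interface SearchQuery {
  sql: string;
  params: (string | number)[];
}

// Hypothetical query builder: the search term and the limit are both bound
// as parameters, never concatenated into the SQL text.
function buildSearchQuery(term: string, limit = 10): SearchQuery {
  return {
    sql:
      "SELECT id, content FROM memory_journal " +
      "WHERE deleted_at IS NULL AND content LIKE ? ESCAPE '\\' " +
      "ORDER BY timestamp DESC LIMIT ?",
    params: ["%" + escapeLike(term) + "%", limit],
  };
}

const q = buildSearchQuery("100%_done");
```

Without the escaping step, a term such as `100%_done` would treat `%` and `_` as wildcards and match far more rows than intended.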

Transaction Management

Single Connection Pattern

Anti-pattern (causes locking):

async function updateEntry(...) {
  const db1 = getDb();
  db1.run("UPDATE memory_journal ...");
  // Nested connection - causes lock!
  await autoCreateTags(tags); // Opens separate db handle
}

Correct pattern:

async function updateEntry(...) {
  const db = getDb();
  db.run("UPDATE memory_journal ...");
  // Use same connection
  db.run("INSERT OR IGNORE INTO tags ...");
  // No nested connections = no locks
}

Benefits:

  • No database locking
  • Atomic transactions
  • Better concurrency
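
A minimal sketch of the correct pattern, with the handle passed explicitly so helpers cannot open their own. The `Db` interface is a stand-in that models only `run()`, the `content` column is an assumption, and the recording fake exists purely to show that every statement goes through one handle.

```typescript
// Minimal stand-in for a database handle; only run() is modelled here.
interface Db {
  run(sql: string, params?: unknown[]): void;
}

// The helper takes the caller's handle instead of opening its own.
function autoCreateTags(db: Db, tags: string[]): void {
  for (const name of tags) {
    db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [name]);
  }
}

function updateEntry(db: Db, id: number, content: string, tags: string[]): void {
  db.run("BEGIN");
  try {
    db.run("UPDATE memory_journal SET content = ? WHERE id = ?", [content, id]);
    autoCreateTags(db, tags); // same handle, same transaction
    db.run("COMMIT");
  } catch (err) {
    db.run("ROLLBACK"); // undo the partial update before re-throwing
    throw err;
  }
}

// Recording fake: collects the SQL text of each statement it receives.
const statements: string[] = [];
const fakeDb: Db = {
  run: (sql) => {
    statements.push(sql);
  },
};
updateEntry(fakeDb, 42, "updated text", ["performance", "sqlite"]);
```

Because the update and the tag inserts share one transaction, a failure in either rolls back both.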

Thread-Safe Tag Creation

Race-safe pattern:

// INSERT OR IGNORE prevents race conditions
db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [
  tagName,
]);

// Then lookup
const result = db.exec("SELECT id FROM tags WHERE name = ?", [tagName]);
const tagId = result[0].values[0][0];

Handles concurrent tag creation:

  • Multiple entries creating same tag
  • No duplicates
  • No conflicts

Search Performance

Full-Text Search (LIKE-based)

Fastest option:

  • sql.js in-memory WASM
  • LIKE-based queries
  • No external dependencies

Best practices:

// Use specific terms
search_entries({ query: "lazy loading pattern" }); // Fast

// Avoid overly broad
search_entries({ query: "a" }); // Slow (too many matches)

// Use limits
search_entries({ query: "optimization", limit: 10 }); // Fast

Date Range Search

Very fast:

  • Indexed by timestamp
  • Simple range query
  • ~5ms typical

Best practices:

// Reasonable ranges
search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31", // 1 month
});

// Add filters for large ranges
search_by_date_range({
  start_date: "2025-01-01",
  end_date: "2025-12-31", // 1 year
  tags: ["performance"], // Filter!
});
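
A range filter of this kind can be expressed as a half-open interval on the indexed column. The sketch below assumes timestamps are stored as ISO-8601 text (which sorts lexicographically); `nextDay` and `buildDateRangeQuery` are hypothetical helpers, not part of the server's API.

```typescript
// Returns the day after an ISO date (YYYY-MM-DD), computed in UTC.
function nextDay(isoDate: string): string {
  const d = new Date(isoDate + "T00:00:00Z");
  d.setUTCDate(d.getUTCDate() + 1);
  return d.toISOString().slice(0, 10);
}

// Half-open range [start, end + 1 day) so that entries stamped later
// on the end date are still included.
function buildDateRangeQuery(startDate: string, endDate: string) {
  return {
    sql:
      "SELECT id FROM memory_journal " +
      "WHERE timestamp >= ? AND timestamp < ? ORDER BY timestamp",
    params: [startDate, nextDay(endDate)],
  };
}

const range = buildDateRangeQuery("2025-10-01", "2025-10-31");
```

Comparing against the start of the following day avoids dropping entries such as `2025-10-31T18:30:00`, which sort after the bare string `2025-10-31`.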

Semantic Search

Performance characteristics:

| Operation | Time |
| --- | --- |
| First search (model load) | ~5s |
| Generate embedding | 50-100ms |
| vectra search | 10-50ms |
| Fetch entries | 10-50ms |
| Total (subsequent) | 70-200ms |
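
The total for subsequent searches is the sum of the three per-stage ranges in the table. A short sketch of that arithmetic, using the figures above (the stage names are illustrative):

```typescript
interface Range {
  min: number;
  max: number;
}

// Per-stage timings in milliseconds, copied from the table above.
const stages: Record<string, Range> = {
  generateEmbedding: { min: 50, max: 100 },
  vectraSearch: { min: 10, max: 50 },
  fetchEntries: { min: 10, max: 50 },
};

// Sum the best and worst case across all stages.
function totalLatency(parts: Record<string, Range>): Range {
  return Object.values(parts).reduce(
    (acc, r) => ({ min: acc.min + r.min, max: acc.max + r.max }),
    { min: 0, max: 0 },
  );
}

const total = totalLatency(stages); // { min: 70, max: 200 }
```

Embedding generation is the largest single contributor, so it is the first place to look if subsequent searches feel slow.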

Optimization:

  • Model cached after first load
  • vectra index in memory
  • Batch entry fetching

Best practices:

// Use reasonable limits
semantic_search({ query: "...", limit: 10 }); // Fast

// Higher thresholds = faster
semantic_search({
  query: "...",
  similarity_threshold: 0.5, // Fewer results
});
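
To see why a higher threshold reduces work, consider the ranking step in isolation. This sketch is not how vectra is implemented: `cosine` and `rank` are hypothetical helpers, and the two-dimensional vectors are toy data.

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface Scored {
  id: number;
  score: number;
}

// Score every item, drop those below the threshold, sort, cap at the limit.
function rank(
  query: number[],
  items: { id: number; vector: number[] }[],
  threshold: number,
  limit: number,
): Scored[] {
  return items
    .map((it) => ({ id: it.id, score: cosine(query, it.vector) }))
    .filter((s) => s.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const hits = rank(
  [1, 0],
  [
    { id: 1, vector: [1, 0] }, // score 1
    { id: 2, vector: [1, 1] }, // score ~0.707
    { id: 3, vector: [0, 1] }, // score 0
  ],
  0.5,
  10,
);
```

With a threshold of 0.5, only entries 1 and 2 survive, so fewer entries need to be fetched from the database afterwards.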

Visualization Performance

Mermaid Generation

Performance:

  • Entry-centric (depth 2): <100ms
  • Tag-based (20 entries): <50ms
  • Recursive CTE: efficient graph traversal
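
The cost of a graph grows with depth because each hop can pull in a new set of edges. The server does this traversal in SQL with a recursive CTE; the sketch below shows the same depth-limited idea in TypeScript over an in-memory edge list. `collectEdges` and `toMermaid` are hypothetical names, only outgoing edges are followed, and the node labels are illustrative.

```typescript
type Edge = { from: number; to: number };

// Breadth-first walk that stops expanding once maxDepth hops are reached.
function collectEdges(edges: Edge[], start: number, maxDepth: number): Edge[] {
  const seen = new Set<number>([start]);
  const picked: Edge[] = [];
  let frontier = [start];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const e of edges) {
      if (!frontier.includes(e.from)) continue;
      picked.push(e);
      if (!seen.has(e.to)) {
        seen.add(e.to); // each node is expanded at most once
        next.push(e.to);
      }
    }
    frontier = next;
  }
  return picked;
}

// Render the collected edges as a Mermaid flowchart.
function toMermaid(edges: Edge[]): string {
  return ["graph TD", ...edges.map((e) => `  E${e.from} --> E${e.to}`)].join("\n");
}

const graph: Edge[] = [
  { from: 1, to: 2 },
  { from: 2, to: 3 },
  { from: 3, to: 4 },
];
const diagram = toMermaid(collectEdges(graph, 1, 2));
```

At depth 2 the edge from entry 3 to entry 4 is never visited, which is why shallow depths keep both query time and rendered graph size small.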

Best practices:

// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }); // Good

// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }); // Good

// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }); // Slow to render

Memory Usage

Typical Memory Footprint

Minimal (no semantic search):

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Total: ~100 MB

With semantic search:

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Model: 80-100 MB
  • vectra index: 1-10 MB (depends on entries)
  • Total: ~200 MB

Scaling Considerations

Small Journal (<1000 entries)

Performance:

  • Database operations and full-text search: <50ms
  • Startup: 2-3s
  • No optimization needed

Medium Journal (1000-10000 entries)

Performance:

  • Full-text search: <50ms
  • Semantic search: <200ms
  • Startup: 2-3s

Recommendations:

  • Use search filters (tags, dates)
  • Limit visualization size
  • Regular ANALYZE (monthly)

Large Journal (>10000 entries)

Performance:

  • Full-text search: <200ms
  • Semantic search: <500ms
  • Startup: 2-3s

Recommendations:

  • Archive old entries
  • Use specific search queries
  • Increase similarity_threshold
  • Consider database partitioning

Token Efficiency

Tool Filtering

Reduce context window consumption by disabling unused tools:

MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-github,-backup"

Token savings by configuration:

| Configuration | Filter | Tools | Est. Token Savings |
| --- | --- | --- | --- |
| Full (default) | (none) | 42 | Baseline (~6,500 tokens) |
| Read-only | -admin | 34 | ~15% (~900 tokens/request) |
| No GitHub | -github | 24 | ~36% (~2,100 tokens/request) |
| Focused | -admin,-github | 19 | ~51% (~3,000 tokens/request) |
| Minimal Search | -analytics,-relationships,-export,-admin,-github,-backup | 10 | ~73% (~4,300 tokens/request) |
| Core Only | -search,-analytics,-relationships,-export,-admin,-github,-backup | 6 | ~84% (~5,000 tokens/request) |
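
Conceptually, each `-group` token in the filter removes one group of tools from the default set. The sketch below illustrates that idea only: the group-to-tool mapping is invented for the example, `enabledTools` is a hypothetical function, and the real filter may accept syntax beyond the `-group` form shown on this page.

```typescript
// Hypothetical group-to-tool mapping; the real groups contain different tools.
const TOOL_GROUPS: Record<string, string[]> = {
  core: ["create_entry", "get_recent_entries"],
  search: ["search_entries", "semantic_search"],
  admin: ["update_entry", "delete_entry"],
};

// Start with every tool enabled, then remove each "-group" named in the filter.
function enabledTools(filter: string | undefined): string[] {
  const disabled = new Set(
    (filter ?? "")
      .split(",")
      .map((t) => t.trim())
      .filter((t) => t.startsWith("-"))
      .map((t) => t.slice(1)),
  );
  const result: string[] = [];
  for (const [group, names] of Object.entries(TOOL_GROUPS)) {
    if (!disabled.has(group)) result.push(...names);
  }
  return result;
}

const tools = enabledTools("-search,-admin");
```

In a real deployment the filter string would come from the `MEMORY_JOURNAL_MCP_TOOL_FILTER` environment variable; an unset or empty value leaves every tool enabled.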

Benefits:

  • Faster AI responses - Smaller context = faster processing
  • Reduced API costs - Fewer tokens = lower bills
  • Stay under client limits - Essential for Windsurf (100-tool limit)
  • Better tool selection - AI makes better choices with fewer options

Complete Tool Filtering Guide →


Optimization Tips

For Faster Searches

1. Use specific queries:

// Good
search_entries({ query: "lazy loading pattern" });

// Poor
search_entries({ query: "loading" });

2. Filter early:

// Good
search_entries({
  query: "optimization",
  is_personal: false, // Reduces search space
});

3. Use appropriate limits:

// Default 10 is good
search_entries({ query: "...", limit: 10 });

// Only increase if needed
search_entries({ query: "...", limit: 50 }); // Slower

For Faster Entry Creation

1. Disable auto_context if not needed:

create_entry({
  content: "...",
  auto_context: false, // Skips Git subprocess
});

2. Batch tag creation:

// Create entries with same tags together
// Tags only created once

For Faster Visualization

1. Use appropriate depth:

// Good balance
visualize_relationships({ entry_id: 42, depth: 2 });

// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }); // Slower

2. Limit graph size:

visualize_relationships({
  tags: ["feature"],
  limit: 20, // Sweet spot
});

Benchmarks

Memory Journal is designed for extremely low overhead during AI task execution. We include a vitest bench suite to maintain these baseline guarantees:

  • Database Reads: Operations execute in fractions of a millisecond. calculateImportance is ~55x faster than retrieving 50 recent entries.
  • Vector Search Engine: Semantic searches via vectra perform significantly faster than parallel entry indexing (>190x faster locally).
  • Core MCP Routines: Complex operations exhibit negligible latency when executed through standard MCP tools. Calling tools natively adds ~1.4x overhead compared to direct function execution.

To run the benchmarking suite locally:

npm run bench

Maintenance for Performance

Monthly

-- Update statistics
ANALYZE;

-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;

Quarterly

-- Reclaim space
VACUUM;

-- Integrity check
PRAGMA integrity_check;

As Needed

  • Archive old entries
  • Review slow queries
  • Check database size

Troubleshooting Performance Issues

Slow Startup

Check:

  • Version (should be latest)
  • ML dependencies installed
  • System resources (CPU, disk)

Fix:

  • Update to the latest version (includes all performance optimizations)
  • Use Docker image (optimized)
  • Remove ML dependencies if not needed

Slow Searches

Check:

  • Database size
  • Query complexity
  • Filters applied

Fix:

  • Use more specific queries
  • Add filters (tags, dates, is_personal)
  • Run ANALYZE
  • Consider archiving

Database Locks

Check:

  • Multiple connections
  • Long transactions
  • Nested database calls

Fix:

  • Update to the latest version (includes all locking fixes)
  • Use single connection per transaction
  • Avoid nested database calls

Next: Check Architecture or Security.
