
Performance

Chris & Mike edited this page Apr 11, 2026 · 25 revisions


Optimization strategies and best practices for Memory Journal MCP Server.

Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.


Performance Overview

Memory Journal achieves:

  • 2-3 second startup (down from 14 seconds in early versions)
  • <10ms entry creation (typical)
  • <50ms full-text search (up to 10,000 entries)
  • <1s semantic search once the model is loaded (the first search adds a one-time ~5s model load; first-load reliability fixed in v1.2.1)
  • No performance degradation from v2.x refactoring (all async operations maintained)

Startup Performance

Lazy Loading Optimization

Problem (early versions):

  • Startup time: 14 seconds
  • ML dependencies loaded eagerly

Solution:

  • Lazy initialization of ML model (@huggingface/transformers)
  • Model loads only on first semantic search
  • Startup: 2-3 seconds

Modular Architecture

v3.0.0 TypeScript architecture:

  • Modular TypeScript codebase with clear separation of concerns
  • Handler layer, manager layer, and database layer
  • Each module focused on a single responsibility

Performance Impact:

  • No degradation - All async operations preserved
  • Fast startup - 2-3 seconds
  • No overhead from modularization
  • Better maintainability - Easy to optimize specific components
  • Full TypeScript strict mode - Zero type errors

The modular architecture makes it easier to identify and optimize performance bottlenecks!

Implementation (v3.0.0 TypeScript):

import { pipeline } from '@huggingface/transformers';

// Excerpt: a method of the class that owns the embedder (class body omitted).
// Lazy initialization - model loads on first semantic search
private async ensureInitialized(): Promise<void> {
  if (this.initialized) return;

  // Load embeddings model (lazy, ~5s first time)
  this.embedder = await pipeline(
    'feature-extraction',
    'Xenova/all-MiniLM-L6-v2'
  );

  this.initialized = true;
}

Timeline:

  • Server startup: 2-3s
  • First semantic search: +5s (one-time model load)
  • Subsequent searches: <1s
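The ensureInitialized() pattern above checks a boolean flag, so two semantic searches that arrive before the first load finishes could both start loading the model. A common refinement, sketched here under our own assumptions, is to cache the in-flight promise so every caller awaits the same load. LazyValue is a hypothetical helper name, not part of Memory Journal.

```typescript
// Hypothetical helper: runs an async loader at most once and shares the
// in-flight promise with every caller that arrives while it is loading.
class LazyValue<T> {
  private readonly load: () => Promise<T>;
  private pending: Promise<T> | null = null;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  get(): Promise<T> {
    if (this.pending === null) {
      // First caller starts the load; later callers reuse the same promise.
      this.pending = this.load().catch((err: unknown) => {
        // Forget a failed load so the next call can retry.
        this.pending = null;
        throw err;
      });
    }
    return this.pending;
  }
}

// Example wiring (assumes @huggingface/transformers is installed):
// const embedder = new LazyValue(() =>
//   pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2'));
// const extract = await embedder.get(); // ~5s the first time, no reload afterwards
```

Caching the promise rather than the resolved value is what closes the gap: the promise exists from the moment the first call starts, so there is no window in which a second caller sees "not initialized".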

Database Performance

SQLite Optimization

Memory Journal configures SQLite for fast, concurrent access:

-- better-sqlite3 uses native file-based database with WAL mode
PRAGMA journal_mode = WAL;           -- Write-Ahead Logging for concurrent reads
PRAGMA synchronous = NORMAL;         -- Balanced durability/performance
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 30000;         -- 30s wait for locks

Benefits:

  • WAL mode: Concurrent readers with single writer
  • Native file-based: Direct disk access for reliable persistence
  • Efficient I/O for queries

Indexing Strategy

Indexes created:

-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);

-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);

-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);

Query optimization:

  • All queries use appropriate indexes
  • No full table scans (except ANALYZE)
  • Covering indexes where possible

Full-Text Search Performance

Full-text search speed:

| Entries | Search time |
|---|---|
| 100 | <5ms |
| 1,000 | <10ms |
| 10,000 | <50ms |
| 100,000 | <200ms |

Optimization:

  • FTS5 queries with BM25 ranking
  • Query term escaping for safe search
  • Result limits (default 10)
  • Fallback to LIKE for queries with special characters
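The escaping and LIKE fallback can be illustrated with two small string helpers. This sketch is ours, not the server's code: the function names are hypothetical, and it assumes the results are passed as bound parameters to an FTS5 `MATCH ?` clause or a `LIKE ? ESCAPE '\'` clause. Wrapping each term in double quotes (doubling any embedded quote) makes FTS5 treat it as a literal string instead of query syntax, and whitespace-separated strings are implicitly ANDed.

```typescript
// Hypothetical helpers illustrating safe FTS5 escaping and a LIKE fallback.

// Quote every whitespace-separated term so FTS5 reads it as a literal string.
// Inside an FTS5 string, an embedded double quote is escaped by doubling it.
function toFts5Query(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" "); // whitespace between strings = implicit AND in FTS5
}

// Build a substring pattern for `content LIKE ? ESCAPE '\'`.
// The backslash itself is escaped, as are the % and _ wildcards.
function toLikePattern(input: string): string {
  const escaped = input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `%${escaped}%`;
}

// toFts5Query('lazy "loading"')  yields  "lazy" """loading"""
// toLikePattern("50%_off")       yields  %50\%\_off%
```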

Transaction Management

Single Connection Pattern

Anti-pattern (causes locking):

async function updateEntry(...) {
  const db1 = getDb();
  db1.run("UPDATE memory_journal ...");
  // Nested connection - causes lock!
  await autoCreateTags(tags); // Opens separate db handle
}

Correct pattern:

async function updateEntry(...) {
  const db = getDb();
  db.run("UPDATE memory_journal ...");
  // Use same connection
  db.run("INSERT OR IGNORE INTO tags ...");
  // No nested connections = no locks
}

Benefits:

  • No database locking
  • Related statements can share one atomic transaction
  • Better concurrency
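Sharing one connection is what makes it possible to group related writes into a single transaction. The sketch below assumes a handle with a synchronous run(sql, params) method (the SqlRunner interface is hypothetical). It issues BEGIN IMMEDIATE, which makes SQLite take the write lock up front, and rolls everything back if any statement throws. If the handle is better-sqlite3, its built-in db.transaction() wrapper serves the same purpose.

```typescript
// Hypothetical minimal interface for a synchronous SQLite connection.
interface SqlRunner {
  run(sql: string, params?: unknown[]): void;
}

// Run `work` on ONE connection inside ONE transaction: either every
// statement commits together, or an error rolls all of them back.
function withTransaction<T>(db: SqlRunner, work: (db: SqlRunner) => T): T {
  db.run("BEGIN IMMEDIATE"); // take the write lock before the first write
  try {
    const result = work(db);
    db.run("COMMIT");
    return result;
  } catch (err) {
    db.run("ROLLBACK");
    throw err;
  }
}

// Usage (column names are illustrative): both writes share the same handle.
// withTransaction(db, (tx) => {
//   tx.run("UPDATE memory_journal SET content = ? WHERE id = ?", [content, id]);
//   tx.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 0)", [tag]);
// });
```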

Thread-Safe Tag Creation

Batched race-safe pattern (v4.5+):

// Batch insert all tags at once — INSERT OR IGNORE prevents race conditions
const placeholders = tags.map(() => "(?, 0)").join(", ");
db.run(
  `INSERT OR IGNORE INTO tags (name, usage_count) VALUES ${placeholders}`,
  tags,
);

// Batch lookup all tag IDs in one query
const inClause = tags.map(() => "?").join(", ");
const result = db.exec(
  `SELECT id, name FROM tags WHERE name IN (${inClause})`,
  tags,
);

Benefits:

  • Batch insert + batch lookup (2 queries instead of 2N)
  • No duplicates, no conflicts
  • Multiple entries creating the same tag are handled atomically
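Two details are easy to miss when building the batched statements shown above: an empty tag list would produce `VALUES ` with no rows (a syntax error), and a repeated tag name would be bound twice. The helper below is a hypothetical sketch that guards both cases and returns the SQL and parameters for the insert and the lookup.

```typescript
// Hypothetical helper: SQL + parameters for the batched tag insert and lookup.
interface TagBatch {
  insertSql: string;
  lookupSql: string;
  params: string[]; // the same parameter list is bound to both statements
}

function buildTagBatch(tags: string[]): TagBatch | null {
  // De-duplicate while keeping first-seen order.
  const unique = Array.from(new Set(tags));
  // An empty VALUES list is invalid SQL, so skip the batch entirely.
  if (unique.length === 0) return null;

  const values = unique.map(() => "(?, 0)").join(", ");
  const inClause = unique.map(() => "?").join(", ");
  return {
    insertSql: `INSERT OR IGNORE INTO tags (name, usage_count) VALUES ${values}`,
    lookupSql: `SELECT id, name FROM tags WHERE name IN (${inClause})`,
    params: unique,
  };
}
```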

Search Performance

Full-Text Search (FTS5)

Fastest option:

  • better-sqlite3 native file-based database
  • FTS5 with BM25 ranking
  • No external dependencies

Best practices:

// Use specific terms
search_entries({ query: "lazy loading pattern" }); // Fast

// Avoid overly broad
search_entries({ query: "a" }); // Slow (too many matches)

// Use limits
search_entries({ query: "optimization", limit: 10 }); // Fast

Date Range Search

Very fast:

  • Indexed by timestamp
  • Simple range query
  • ~5ms typical

Best practices:

// Reasonable ranges
search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31", // 1 month
});

// Add filters for large ranges
search_by_date_range({
  start_date: "2025-01-01",
  end_date: "2025-12-31", // 1 year
  tags: ["performance"], // Filter!
});

Semantic Search

Performance characteristics:

| Operation | Time |
|---|---|
| First search (model load) | ~5s |
| Generate embedding | 50-100ms |
| sqlite-vec search | 10-50ms |
| Fetch entries | 10-50ms |
| Total (subsequent searches) | 70-200ms |

Optimization:

  • Model cached after first load
  • sqlite-vec index in SQLite database
  • Batch entry fetching

Best practices:

// Use reasonable limits
semantic_search({ query: "...", limit: 10 }); // Fast

// Higher thresholds = faster
semantic_search({
  query: "...",
  similarity_threshold: 0.5, // Fewer results
});
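To see why a higher similarity_threshold returns fewer results, here is a brute-force sketch of threshold-plus-limit filtering. It is only an illustration: Memory Journal delegates nearest-neighbour search to sqlite-vec, and this sketch assumes cosine similarity, which may not match the metric or score scale the server actually uses. topMatches and cosineSimilarity are hypothetical names.

```typescript
interface Scored {
  id: number;
  score: number;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep entries at or above the threshold, best first, capped at `limit`.
function topMatches(
  query: number[],
  entries: Map<number, number[]>,
  threshold: number,
  limit: number,
): Scored[] {
  const hits: Scored[] = [];
  entries.forEach((vec, id) => {
    const score = cosineSimilarity(query, vec);
    if (score >= threshold) hits.push({ id, score });
  });
  hits.sort((x, y) => y.score - x.score);
  return hits.slice(0, limit);
}
```

Raising the threshold shrinks `hits` before sorting and before any entries are fetched, which is where the time saving comes from.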

Visualization Performance

Mermaid Generation

Performance:

  • Entry-centric (depth 2): <100ms
  • Tag-based (20 entries): <50ms
  • Recursive CTE: efficient graph traversal

Best practices:

// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }); // Good

// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }); // Good

// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }); // Slow to render
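Relationship graphs are gathered with a recursive CTE and rendered as Mermaid. As a rough in-memory analogue (not the server's implementation), the sketch below walks relationships breadth-first up to `depth` hops in either direction and emits Mermaid flowchart lines. It shows why node and edge counts, and therefore render time, grow quickly with depth. Edge, edgesWithinDepth, toMermaid, and the `E<id>` node naming are assumptions.

```typescript
interface Edge {
  from: number;
  to: number;
}

// Collect relationships reachable from `start` within `maxDepth` hops,
// following links in either direction (breadth-first, one hop per round).
function edgesWithinDepth(start: number, edges: Edge[], maxDepth: number): Edge[] {
  const visited = new Set<number>([start]);
  const kept = new Set<Edge>();
  let frontier: number[] = [start];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const edge of edges) {
      for (const node of frontier) {
        if (edge.from !== node && edge.to !== node) continue;
        kept.add(edge);
        const other = edge.from === node ? edge.to : edge.from;
        if (!visited.has(other)) {
          visited.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  return Array.from(kept);
}

// Render the collected edges as a Mermaid flowchart.
function toMermaid(edges: Edge[]): string {
  const lines = ["graph TD"];
  for (const edge of edges) lines.push(`  E${edge.from} --> E${edge.to}`);
  return lines.join("\n");
}
```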

Memory Usage

Typical Memory Footprint

Minimal (no semantic search):

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Total: ~100 MB

With semantic search:

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Model: 80-100 MB
  • sqlite-vec index: 1-10 MB (depends on entries)
  • Total: ~200 MB
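The sqlite-vec figure scales linearly with entry count. As a back-of-the-envelope check, assuming each entry stores one 384-dimensional float32 embedding (the output size of all-MiniLM-L6-v2) and ignoring sqlite-vec's own bookkeeping overhead, the raw vector payload is entries × 384 × 4 bytes:

```typescript
// Raw embedding bytes only (assumes float32 values; excludes index overhead).
function rawVectorBytes(entries: number, dims = 384, bytesPerValue = 4): number {
  return entries * dims * bytesPerValue;
}

const MIB = 1024 * 1024;
const perEntry = rawVectorBytes(1);              // 1536 bytes
const thousand = rawVectorBytes(1000) / MIB;     // about 1.46 MiB
const tenThousand = rawVectorBytes(10000) / MIB; // about 14.6 MiB
console.log(perEntry, thousand.toFixed(2), tenThousand.toFixed(1)); // 1536 1.46 14.6
```

By this estimate, the 1-10 MB range above corresponds to somewhere between several hundred and roughly 7,000 entries.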

Scaling Considerations

Small Journal (<1000 entries)

Performance:

  • All operations <50ms
  • Startup: 2-3s
  • No optimization needed

Medium Journal (1000-10000 entries)

Performance:

  • Full-text search: <50ms
  • Semantic search: <200ms
  • Startup: 2-3s

Recommendations:

  • Use search filters (tags, dates)
  • Limit visualization size
  • Regular ANALYZE (monthly)

Large Journal (>10000 entries)

Performance:

  • Full-text search: <200ms
  • Semantic search: <500ms
  • Startup: 2-3s

Recommendations:

  • Archive old entries
  • Use specific search queries
  • Increase similarity_threshold
  • Consider database partitioning

Token Efficiency

Tool Filtering

Reduce context window consumption by disabling unused tools:

MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-github,-backup"

Token savings: From the full 67-tool set (~6,500 tokens baseline), filtering can save up to ~86% by exposing only the 6 core tools. Common configurations like -github (~36%) or -admin,-github (~48%) provide significant savings with minimal functionality loss.
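Because every percentage is measured against the same ~6,500-token baseline, the remaining tool-definition cost for a configuration is simple arithmetic. Treat the results as rough estimates; actual counts depend on the client's tokenizer.

```typescript
const BASELINE_TOKENS = 6500; // approximate cost of all 67 tool definitions

// Tokens still spent on tool definitions after a given fractional saving.
function remainingTokens(savings: number, baseline: number = BASELINE_TOKENS): number {
  return Math.round(baseline * (1 - savings));
}

console.log(remainingTokens(0.36)); // 4160 (-github)
console.log(remainingTokens(0.48)); // 3380 (-admin,-github)
console.log(remainingTokens(0.86)); // 910 (6 core tools only)
```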

Benefits:

  • Faster AI responses - Smaller context = faster processing
  • Reduced API costs - Fewer tokens = lower bills
  • Stay under client limits - Essential for Windsurf (100-tool limit)
  • Better tool selection - AI makes better choices with fewer options

Complete Tool Filtering Guide →


Optimization Tips

For Faster Searches

1. Use specific queries:

// Good
search_entries({ query: "lazy loading pattern" });

// Poor
search_entries({ query: "loading" });

2. Filter early:

// Good
search_entries({
  query: "optimization",
  is_personal: false, // Reduces search space
});

3. Use appropriate limits:

// Default 10 is good
search_entries({ query: "...", limit: 10 });

// Only increase if needed
search_entries({ query: "...", limit: 50 }); // Slower

For Faster Entry Creation

1. Disable auto_context if not needed:

create_entry({
  content: "...",
  auto_context: false, // Skips Git subprocess
});

2. Tag batching is automatic:

As of v4.5, linkTagsToEntry batches all tag inserts and lookups internally. No user action is needed; passing multiple tags is already optimized.


For Faster Visualization

1. Use appropriate depth:

// Good balance
visualize_relationships({ entry_id: 42, depth: 2 });

// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }); // Slower

2. Limit graph size:

visualize_relationships({
  tags: ["feature"],
  limit: 20, // Sweet spot
});

Benchmarks

Memory Journal is designed for extremely low overhead during AI task execution. A vitest bench suite tracks these baselines:

  • Database Reads: Operations complete in fractions of a millisecond. calculateImportance is ~7x faster than retrieving 50 recent entries; the composite index narrows this gap by speeding up getRecentEntries ~4x.
  • Vector Search Engine: Both search (780 ops/sec) and indexing (640 ops/sec) are high-throughput via sqlite-vec with SQL-native KNN queries.
  • Core MCP Routines: getTools uses cached O(1) dispatch (~4800x faster than tool execution). create_entry and search_entries execute through the full MCP layer with sub-millisecond overhead.

To run the benchmarking suite locally:

npm run bench

Internal Optimizations

Performance improvements implemented in the server internals (v4.5+). These are transparent to users — no API changes.

Batch Tag Fetching (N+1 Elimination)

Multi-row methods (getRecentEntries, getEntriesPage, searchEntries, searchByDateRange) previously called getTagsForEntry() per row, causing N+1 queries. Now uses batchGetTagsForEntries() + rowsToEntries() to fetch all tags in a single IN (...) query.

| Operation | Before | After |
|---|---|---|
| getRecentEntries(50) | 51 queries | 2 queries |
| searchEntries (20 results) | 21 queries | 2 queries |
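The batched approach replaces one tag query per entry with a single IN (...) query, then groups the flat rows in memory. A minimal sketch of that grouping step follows. TagRow, inPlaceholders, and groupTagsByEntry are hypothetical names rather than the server's batchGetTagsForEntries(), and the SQL in the comment assumes the entry_tags(entry_id, tag_id) and tags(id, name) columns used by the indexes earlier on this page.

```typescript
interface TagRow {
  entry_id: number;
  name: string;
}

// Rows come from ONE query for all entries, for example:
//   SELECT et.entry_id, t.name
//   FROM entry_tags et JOIN tags t ON t.id = et.tag_id
//   WHERE et.entry_id IN (?, ?, ...)
function inPlaceholders(count: number): string {
  return Array.from({ length: count }, () => "?").join(", ");
}

// Group the flat rows by entry; entries without tags still get an empty list.
function groupTagsByEntry(entryIds: number[], rows: TagRow[]): Map<number, string[]> {
  const byEntry = new Map<number, string[]>();
  for (const id of entryIds) byEntry.set(id, []);
  for (const row of rows) {
    byEntry.get(row.entry_id)?.push(row.name);
  }
  return byEntry;
}
```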

Batch Tag Linking

linkTagsToEntry() batches tag inserts and lookups:

| Step | Before (per tag) | After (batched) |
|---|---|---|
| Insert tags | N × INSERT OR IGNORE | 1 × multi-value INSERT |
| Lookup IDs | N × SELECT id | 1 × SELECT ... WHERE IN |
| Total statements | 4N | 2 + 2N |

Tool Dispatch Cache

callTool() caches tool definitions in a Map for O(1) lookup instead of rebuilding all 67 ToolDefinition objects on every call.

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| create_entry | 105 ops/sec | 800 ops/sec | 7.6× |
| search_entries | 95 ops/sec | 733 ops/sec | 7.7× |
| get_recent_entries | 82 ops/sec | 106 ops/sec | 1.3× |

Cache invalidates when context parameters change (db, github, vectorManager, config, teamDb).
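A minimal sketch of the dispatch-cache idea, under our own assumptions: tool definitions are rebuilt only when one of the listed context fields changes identity, and lookups between rebuilds are a single Map.get. ToolContext, ToolDefinition, and ToolDispatchCache here are simplified, hypothetical stand-ins for the server's types.

```typescript
// Simplified, hypothetical stand-ins for the server's types.
interface ToolDefinition {
  name: string;
  handler: (args: unknown) => unknown;
}

interface ToolContext {
  db: unknown;
  github: unknown;
  vectorManager: unknown;
  config: unknown;
  teamDb: unknown;
}

const CONTEXT_KEYS: (keyof ToolContext)[] = ["db", "github", "vectorManager", "config", "teamDb"];

class ToolDispatchCache {
  private readonly build: (ctx: ToolContext) => ToolDefinition[];
  private byName = new Map<string, ToolDefinition>();
  private cachedFor: ToolContext | null = null;
  rebuilds = 0; // exposed only to make the caching observable

  constructor(build: (ctx: ToolContext) => ToolDefinition[]) {
    this.build = build;
  }

  get(name: string, ctx: ToolContext): ToolDefinition | undefined {
    if (!this.matches(ctx)) {
      // Context changed (or first call): rebuild every definition once.
      this.byName = new Map();
      for (const tool of this.build(ctx)) this.byName.set(tool.name, tool);
      this.cachedFor = { ...ctx }; // snapshot, so later field swaps are noticed
      this.rebuilds += 1;
    }
    return this.byName.get(name); // O(1) lookup between rebuilds
  }

  private matches(ctx: ToolContext): boolean {
    const prev = this.cachedFor;
    return prev !== null && CONTEXT_KEYS.every((key) => prev[key] === ctx[key]);
  }
}
```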

Conditional JOIN in Date Range Search

searchByDateRange only JOINs tag tables (entry_tags, tags) when a tag filter is provided. Without tags, queries skip the JOIN and DISTINCT, avoiding unnecessary row multiplication.
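The conditional JOIN can be sketched as a small query builder. Table and column names follow the indexes listed earlier on this page, but the exact SQL, the memory_journal.id join key, and the buildDateRangeQuery name are assumptions. The sketch also assumes the date bounds are already in the same text format as the stored timestamps, so BETWEEN compares them correctly.

```typescript
interface DateRangeFilter {
  startDate: string;
  endDate: string;
  tags?: string[];
}

// Hypothetical builder: JOIN the tag tables (and DISTINCT away the row
// multiplication they cause) only when a tag filter is actually present.
function buildDateRangeQuery(filter: DateRangeFilter): { sql: string; params: string[] } {
  const tags = filter.tags ?? [];
  const params: string[] = [filter.startDate, filter.endDate];

  let sql =
    tags.length > 0
      ? "SELECT DISTINCT m.* FROM memory_journal m " +
        "JOIN entry_tags et ON et.entry_id = m.id " +
        "JOIN tags t ON t.id = et.tag_id"
      : "SELECT m.* FROM memory_journal m";

  sql += " WHERE m.timestamp BETWEEN ? AND ?";

  if (tags.length > 0) {
    sql += ` AND t.name IN (${tags.map(() => "?").join(", ")})`;
    params.push(...tags);
  }
  return { sql, params };
}
```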

Consolidated Statistics Queries

getStatistics() reduced from 5 sequential db.exec() calls to 3 using multi-statement exec() and SUM(CASE ...) aggregation.

Simplified Vector Index Rebuild

rebuildIndex() removed a redundant orphan detection pass that preceded a delete-all pass. Now performs a single delete-all before re-indexing.


Maintenance for Performance

Monthly

-- Update statistics
ANALYZE;

-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;

Quarterly

-- Reclaim space
VACUUM;

-- Integrity check
PRAGMA integrity_check;

As Needed

  • Archive old entries
  • Review slow queries
  • Check database size

Troubleshooting Performance Issues

Slow Startup

Check:

  • Version (should be latest)
  • ML dependencies installed
  • System resources (CPU, disk)

Fix:

  • Update to the latest version (includes all performance optimizations)
  • Use Docker image (optimized)
  • Remove ML dependencies if not needed

Slow Searches

Check:

  • Database size
  • Query complexity
  • Filters applied

Fix:

  • Use more specific queries
  • Add filters (tags, dates, is_personal)
  • Run ANALYZE
  • Consider archiving

Database Locks

Check:

  • Multiple connections
  • Long transactions
  • Nested database calls

Fix:

  • Update to the latest version (includes all locking fixes)
  • Use single connection per transaction
  • Avoid nested database calls

Automated Maintenance (HTTP Only)

For long-running HTTP/SSE deployments, the server includes a built-in scheduler that can automate database optimization and index rebuilds on configurable intervals. See Configuration → Automated Scheduler for CLI flags and setup.


Next: Check Architecture or Security.
