Performance

Optimization strategies and best practices for Memory Journal MCP Server.

Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.

Performance Overview

Memory Journal v3.0.0 achieves:

2-3 second startup (down from 14 seconds in early versions)
<10ms entry creation (typical)
<50ms full-text search (for 1000 entries)
<1s semantic search once the model is loaded (the first search adds a one-time ~5s model load; first-load reliability was fixed in v1.2.1)
No performance degradation from v2.x refactoring (all async operations maintained)

Startup Performance

Lazy Loading Optimization

Problem (early versions):

Startup time: 14 seconds
ML dependencies loaded eagerly

Solution:

Lazy initialization of ML model (@xenova/transformers)
Model loads only on first semantic search
Startup: 2-3 seconds

Modular Architecture

v3.0.0 TypeScript architecture:

Modular TypeScript codebase with clear separation of concerns
Handler layer, manager layer, and database layer
Each module focused on a single responsibility

Performance Impact:

✅ No degradation - All async operations preserved
✅ Fast startup - 2-3 seconds
✅ No overhead from modularization
✅ Better maintainability - Easy to optimize specific components
✅ Full TypeScript strict mode - Zero type errors

The modular architecture makes it easier to identify and optimize performance bottlenecks!

Implementation (v3.0.0 TypeScript):

// Lazy initialization - model loads on first semantic search
private async ensureInitialized(): Promise<void> {
  if (this.initialized) return;

  // Load embeddings model (lazy, ~5s first time)
  this.embedder = await pipeline(
    'feature-extraction',
    'Xenova/all-MiniLM-L6-v2'
  );

  this.initialized = true;
}

Timeline:

Server startup: 2-3s
First semantic search: +5s (one-time model load)
Subsequent searches: <1s
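The flag check in the implementation above does not by itself cover two searches arriving while the model is still loading. A minimal sketch of one way to handle that, assuming a hypothetical `Lazy` helper (not part of Memory Journal) that caches the in-flight promise so concurrent callers share one load:

```typescript
// Hypothetical helper: memoizes the in-flight promise so the loader runs once.
class Lazy<T> {
  private promise: Promise<T> | null = null;
  private readonly loader: () => Promise<T>;

  constructor(loader: () => Promise<T>) {
    this.loader = loader;
  }

  get(): Promise<T> {
    if (this.promise === null) {
      // Drop the cached promise on failure so a later call can retry.
      this.promise = this.loader().catch((err: unknown) => {
        this.promise = null;
        throw err;
      });
    }
    return this.promise;
  }
}

// Stand-in for the real model load; counts how often it runs.
let loadCount = 0;
const embedder = new Lazy(async () => {
  loadCount += 1;
  return (text: string) => [text.length]; // placeholder "embedding"
});

// Two callers before the load finishes share the same promise.
const first = embedder.get();
const second = embedder.get();
```

In a real server the loader would wrap the `pipeline(...)` call shown earlier; the placeholder function here only keeps the sketch runnable without the model.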

Database Performance

SQLite Optimization

Memory Journal configures SQLite with settings tuned for a single-process, read-heavy workload:

PRAGMA journal_mode = WAL;          -- Write-Ahead Logging
PRAGMA synchronous = NORMAL;         -- Balance speed/safety
PRAGMA cache_size = -64000;          -- 64MB cache
PRAGMA mmap_size = 268435456;        -- 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY;          -- Temp tables in memory
PRAGMA busy_timeout = 30000;         -- 30s lock timeout

Benefits:

WAL mode: Concurrent reads + writes
Large cache: Hot data stays in memory
Memory-mapped I/O: Direct memory access
Memory temp store: Fast temporary operations
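The units of these values are easy to misread: a negative `cache_size` is interpreted by SQLite as a size in KiB rather than a page count, and `mmap_size` is in bytes. A small sketch that converts both, assuming SQLite's default 4096-byte page size for the positive case (the helper names are illustrative only):

```typescript
// Convert a PRAGMA cache_size value to bytes.
// Negative N means "about |N| KiB"; positive N means N pages.
function cacheSizeBytes(cacheSize: number, pageSizeBytes: number = 4096): number {
  return cacheSize < 0 ? Math.abs(cacheSize) * 1024 : cacheSize * pageSizeBytes;
}

function toMiB(bytes: number): number {
  return bytes / (1024 * 1024);
}

const cacheMiB = toMiB(cacheSizeBytes(-64000)); // 62.5
const mmapMiB = toMiB(268435456);               // 256
```

So `-64000` is roughly the 64MB noted in the comment (62.5 MiB exactly), and `268435456` is exactly 256 MiB.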

Indexing Strategy

Indexes created:

-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);

-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);

-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);

Query optimization:

All queries use appropriate indexes
No full table scans in normal queries (maintenance commands such as ANALYZE are the exception)
Covering indexes where possible

FTS5 Performance

Full-text search speed:

Entries	Search Time
100	<5ms
1,000	<10ms
10,000	<50ms
100,000	<200ms

Optimization:

Porter stemming (matches variations)
BM25 ranking (relevance scoring)
Result limits (default 10)

Transaction Management

Single Connection Pattern

Anti-pattern (causes locking):

async function updateEntry(...) {
  const db1 = getDb();
  db1.run("UPDATE memory_journal ...");
  // Nested connection - causes lock!
  await autoCreateTags(tags); // Opens separate db handle
}

Correct pattern:

async function updateEntry(...) {
  const db = getDb();
  db.run("UPDATE memory_journal ...");
  // Use same connection
  db.run("INSERT OR IGNORE INTO tags ...");
  // No nested connections = no locks
}

Benefits:

No database locking
Atomic transactions
Better concurrency
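One way to make the correct pattern hard to get wrong is to pass the open handle into every helper and wrap the work in a single transaction. The sketch below is an illustration under assumptions: the `Db` interface, `inTransaction`, `ensureTags`, and the `content`/`id` columns are hypothetical, and a recording fake stands in for SQLite so the example runs on its own.

```typescript
// Minimal shape of a database handle; an assumption for this sketch.
interface Db {
  run(sql: string, params?: unknown[]): void;
}

// Run `work` inside one transaction on one handle.
function inTransaction<T>(db: Db, work: (db: Db) => T): T {
  db.run("BEGIN");
  try {
    const result = work(db);
    db.run("COMMIT");
    return result;
  } catch (err) {
    db.run("ROLLBACK");
    throw err;
  }
}

// Helpers receive the handle instead of opening their own.
function ensureTags(db: Db, tags: string[]): void {
  for (const tag of tags) {
    db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [tag]);
  }
}

// Recording fake: stores the first keyword of each statement.
const statements: string[] = [];
const fakeDb: Db = {
  run: (sql) => {
    statements.push(sql.split(" ")[0]);
  },
};

inTransaction(fakeDb, (db) => {
  db.run("UPDATE memory_journal SET content = ? WHERE id = ?", ["text", 1]);
  ensureTags(db, ["performance"]);
});
// statements is now ["BEGIN", "UPDATE", "INSERT", "COMMIT"]
```

Because the update and the tag insert share one handle and one transaction, either both are committed or both are rolled back.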

Thread-Safe Tag Creation

Race-safe pattern:

// INSERT OR IGNORE prevents race conditions
db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [
  tagName,
]);

// Then lookup
const result = db.exec("SELECT id FROM tags WHERE name = ?", [tagName]);
const tagId = result[0].values[0][0];

Handles concurrent tag creation:

Multiple entries creating same tag
No duplicates
No conflicts
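The insert-then-lookup steps can be wrapped in one helper that also guards against an empty result, which the direct `result[0].values[0][0]` access does not. This is a sketch only: `getOrCreateTagId` and the `TagDb` interface are hypothetical, the result shape mirrors the `db.exec` usage above, and an in-memory fake replaces the database.

```typescript
type QueryResult = { columns: string[]; values: unknown[][] };

// Assumed handle shape, modeled on the run/exec calls shown above.
interface TagDb {
  run(sql: string, params?: unknown[]): void;
  exec(sql: string, params?: unknown[]): QueryResult[];
}

function getOrCreateTagId(db: TagDb, tagName: string): number {
  // Safe to repeat: a second insert of the same name is ignored.
  db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [tagName]);
  const rows = db.exec("SELECT id FROM tags WHERE name = ?", [tagName]);
  const id = rows[0]?.values[0]?.[0];
  if (typeof id !== "number") {
    throw new Error(`tag lookup failed for "${tagName}"`);
  }
  return id;
}

// In-memory fake that imitates a UNIQUE(name) tags table.
function makeFakeTagDb(): TagDb {
  const tags = new Map<string, number>();
  return {
    run: (_sql, params = []) => {
      const key = String(params[0]);
      if (!tags.has(key)) tags.set(key, tags.size + 1);
    },
    exec: (_sql, params = []) => {
      const id = tags.get(String(params[0]));
      return id === undefined ? [] : [{ columns: ["id"], values: [[id]] }];
    },
  };
}

const tagDb = makeFakeTagDb();
const firstId = getOrCreateTagId(tagDb, "performance");
const secondId = getOrCreateTagId(tagDb, "performance"); // same id, no duplicate
```

Note that `INSERT OR IGNORE` only deduplicates when the `name` column carries a uniqueness constraint.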

Search Performance

Full-Text Search (FTS5)

Fastest option:

SQLite native
Optimized C implementation
No external dependencies

Best practices:

// Use specific terms
search_entries({ query: "lazy loading pattern" }); // Fast

// Avoid overly broad
search_entries({ query: "a" }); // Slow (too many matches)

// Use limits
search_entries({ query: "optimization", limit: 10 }); // Fast
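If search text comes from user input, characters such as quotes or hyphens can be read as FTS5 query syntax. A minimal sketch of one defensive approach, assuming a hypothetical `toFtsQuery` helper (not part of the Memory Journal API) that quotes each term as an FTS5 string:

```typescript
// Quote each whitespace-separated term as an FTS5 string literal.
// Inside an FTS5 string, a double quote is escaped by doubling it.
function toFtsQuery(input: string): string {
  return input
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" "); // adjacent phrases are combined with an implicit AND
}

const ftsQuery = toFtsQuery("lazy loading pattern");
// ftsQuery is: "lazy" "loading" "pattern"
```

The resulting string would be bound as the parameter of a `MATCH` clause rather than concatenated into SQL.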

Date Range Search

Very fast:

Indexed by timestamp
Simple range query
~5ms typical

Best practices:

// Reasonable ranges
search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31", // 1 month
});

// Add filters for large ranges
search_by_date_range({
  start_date: "2025-01-01",
  end_date: "2025-12-31", // 1 year
  tags: ["performance"], // Filter!
});
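To illustrate why a tag filter keeps large ranges cheap, here is a sketch of how such a request could map to a parameterized query over the indexed tables listed earlier. The function name, the `id` column on `memory_journal`, and the `deleted_at IS NULL` soft-delete check are assumptions; the server's actual SQL may differ, including how it treats the end date.

```typescript
interface DateRangeFilter {
  startDate: string;
  endDate: string;
  tags?: string[];
}

// Build SQL plus bound parameters; user values are never concatenated into SQL.
function buildDateRangeQuery(f: DateRangeFilter): { sql: string; params: string[] } {
  const params: string[] = [f.startDate, f.endDate];
  let sql =
    "SELECT m.* FROM memory_journal m " +
    "WHERE m.timestamp >= ? AND m.timestamp <= ? AND m.deleted_at IS NULL";

  if (f.tags && f.tags.length > 0) {
    const placeholders = f.tags.map(() => "?").join(", ");
    sql +=
      " AND m.id IN (SELECT et.entry_id FROM entry_tags et " +
      "JOIN tags t ON t.id = et.tag_id WHERE t.name IN (" + placeholders + "))";
    params.push(...f.tags);
  }
  return { sql: sql + " ORDER BY m.timestamp DESC", params };
}

const yearQuery = buildDateRangeQuery({
  startDate: "2025-01-01",
  endDate: "2025-12-31",
  tags: ["performance"],
});
```

The range condition can use `idx_memory_journal_timestamp`, and the tag subquery can use `idx_tags_name` and the `entry_tags` indexes.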

Semantic Search

Performance characteristics:

Operation	Time
First search (model load)	~5s
Generate embedding	50-100ms
vectra search	10-50ms
Fetch entries	10-50ms
Total (subsequent)	70-200ms

Optimization:

Model cached after first load
vectra index in memory
Batch entry fetching

Best practices:

// Use reasonable limits
semantic_search({ query: "...", limit: 10 }); // Fast

// Higher thresholds return fewer results, so fewer entries are fetched
semantic_search({
  query: "...",
  similarity_threshold: 0.5, // Fewer results
});
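The effect of `similarity_threshold` and `limit` can be shown with a brute-force sketch. This is not how vectra is implemented; it assumes cosine similarity as the metric and uses hypothetical names, only to show that a higher threshold leaves fewer matches whose entries must then be fetched.

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface Scored {
  id: number;
  score: number;
}

// Score every item, drop those below the threshold, keep the best `limit`.
function topMatches(
  query: number[],
  items: { id: number; vector: number[] }[],
  threshold: number,
  limit: number
): Scored[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .filter((m) => m.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const items = [
  { id: 1, vector: [1, 0] },
  { id: 2, vector: [0, 1] },
  { id: 3, vector: [1, 1] },
];
const loose = topMatches([1, 0], items, 0.5, 10);  // ids 1 and 3
const strict = topMatches([1, 0], items, 0.9, 10); // id 1 only
```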

Visualization Performance

Mermaid Generation

Performance:

Entry-centric (depth 2): <100ms
Tag-based (20 entries): <50ms
Recursive CTE: efficient graph traversal
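The depth limit bounds how far the traversal spreads from the starting entry. The sketch below does the equivalent walk in memory rather than with a recursive CTE, assuming relationships are followed in both directions and using a hypothetical `toMermaid` function and `E<id>` node naming:

```typescript
interface Edge {
  from: number;
  to: number;
}

// Breadth-first walk up to maxDepth hops, then emit edges between reached nodes.
function toMermaid(edges: Edge[], start: number, maxDepth: number): string {
  const visited = new Set<number>([start]);
  let frontier: number[] = [start];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const node of frontier) {
      for (const edge of edges) {
        if (edge.from !== node && edge.to !== node) continue;
        const other = edge.from === node ? edge.to : edge.from;
        if (visited.has(other)) continue;
        visited.add(other);
        next.push(other);
      }
    }
    frontier = next;
  }

  const body = edges
    .filter((e) => visited.has(e.from) && visited.has(e.to))
    .map((e) => `  E${e.from} --> E${e.to}`);
  return ["graph TD", ...body].join("\n");
}

const edges: Edge[] = [
  { from: 1, to: 2 },
  { from: 2, to: 3 },
  { from: 3, to: 4 },
];
const diagram = toMermaid(edges, 1, 2);
// graph TD
//   E1 --> E2
//   E2 --> E3
```

Each extra level of depth can multiply the number of nodes reached, which is why depth 2 is a reasonable default.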

Best practices:

// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }); // Good

// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }); // Good

// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }); // Slow to render

Memory Usage

Typical Memory Footprint

Minimal (no semantic search):

Server: 20-30 MB
SQLite cache: 64 MB
Total: ~100 MB

With semantic search:

Server: 20-30 MB
SQLite cache: 64 MB
Model: 80-100 MB
vectra index: 1-10 MB (depends on entries)
Total: ~200 MB

Scaling Considerations

Small Journal (<1000 entries)

Performance:

Database operations <50ms (semantic search adds embedding time on top)
Startup: 2-3s
No optimization needed

Medium Journal (1000-10000 entries)

Performance:

FTS5 search: <50ms
Semantic search: <200ms
Startup: 2-3s

Recommendations:

Use search filters (tags, dates)
Limit visualization size
Regular ANALYZE (monthly)

Large Journal (>10000 entries)

Performance:

FTS5 search: <200ms
Semantic search: <500ms
Startup: 2-3s

Recommendations:

Archive old entries
Use specific search queries
Increase similarity_threshold
Consider database partitioning

Token Efficiency

Tool Filtering

Reduce context window consumption by disabling unused tools:

MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-github,-backup"

Token savings by configuration:

Configuration	Filter	Tools	Est. Token Savings
Full (default)	(none)	39	Baseline (~6,050 tokens)
Read-only	`-admin`	34	~15% (~900 tokens/request)
No GitHub	`-github`	24	~36% (~2,100 tokens/request)
Focused	`-admin,-github`	19	~51% (~3,000 tokens/request)
Minimal Search	`-analytics,-relationships,-export,-admin,-github,-backup`	10	~73% (~4,300 tokens/request)
Core Only	`-search,-analytics,-relationships,-export,-admin,-github,-backup`	6	~84% (~5,000 tokens/request)

Benefits:

Faster AI responses - Smaller context = faster processing
Reduced API costs - Fewer tokens = lower bills
Stay under client limits - Essential for Windsurf (100-tool limit)
Better tool selection - AI makes better choices with fewer options
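The filter format and the tool counts in the table can be checked with a short sketch. The helper names are hypothetical and the server's real parser may accept more syntax than shown; the group sizes for `admin` (5) and `github` (15) are derived from the table rows (39 - 34 and 39 - 24), and other group sizes are not stated here.

```typescript
// Extract the disabled group names from a filter such as "-admin,-github".
function parseToolFilter(filter: string): string[] {
  return filter
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 1 && part.charAt(0) === "-")
    .map((part) => part.slice(1));
}

// Subtract the size of each disabled group from the total tool count.
function remainingTools(
  total: number,
  groupSizes: Record<string, number>,
  disabledGroups: string[]
): number {
  let remaining = total;
  for (const group of disabledGroups) {
    remaining -= groupSizes[group] ?? 0; // unknown groups are ignored in this sketch
  }
  return remaining;
}

const knownGroupSizes: Record<string, number> = { admin: 5, github: 15 };
const disabledGroups = parseToolFilter("-admin,-github");
const remainingCount = remainingTools(39, knownGroupSizes, disabledGroups); // 19
```

The result matches the "Focused" row of the table.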

Complete Tool Filtering Guide →

Optimization Tips

For Faster Searches

1. Use specific queries:

// Good
search_entries({ query: "lazy loading pattern" });

// Poor
search_entries({ query: "loading" });

2. Filter early:

// Good
search_entries({
  query: "optimization",
  is_personal: false, // Reduces search space
});

3. Use appropriate limits:

// Default 10 is good
search_entries({ query: "...", limit: 10 });

// Only increase if needed
search_entries({ query: "...", limit: 50 }); // Slower

For Faster Entry Creation

1. Disable auto_context if not needed:

create_entry({
  content: "...",
  auto_context: false, // Skips Git subprocess
});

2. Batch tag creation:

// Create entries with same tags together
// Tags only created once

For Faster Visualization

1. Use appropriate depth:

// Good balance
visualize_relationships({ entry_id: 42, depth: 2 });

// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }); // Slower

2. Limit graph size:

visualize_relationships({
  tags: ["feature"],
  limit: 20, // Sweet spot
});

Benchmarks

Memory Journal is designed for extremely low overhead during AI task execution. We include a vitest bench suite to maintain these baseline guarantees:

Database Reads: Operations execute in fractions of a millisecond. calculateImportance is ~55x faster than retrieving 50 recent entries.
Vector Search Engine: Semantic searches via vectra perform significantly faster than parallel entry indexing (>190x faster locally).
Core MCP Routines: Complex operations exhibit negligible latency when executed through standard MCP tools. Calling tools natively adds ~1.4x overhead compared to direct function execution.

To run the benchmarking suite locally:

npm run bench

Maintenance for Performance

Monthly

-- Update statistics
ANALYZE;

-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;

Quarterly

-- Reclaim space
VACUUM;

-- Integrity check
PRAGMA integrity_check;

As Needed

Archive old entries
Review slow queries
Check database size

Troubleshooting Performance Issues

Slow Startup

Check:

Version (should be v2.1.0+)
ML dependencies installed (v1.2.1 fixed first-load delay)
System resources (CPU, disk)

Fix:

Update to v2.1.0 or later (includes all performance optimizations)
Use Docker image (optimized)
Remove ML dependencies if not needed

Slow Searches

Check:

Database size
Query complexity
Filters applied

Fix:

Use more specific queries
Add filters (tags, dates, is_personal)
Run ANALYZE
Consider archiving

Database Locks

Check:

Multiple connections
Long transactions
Nested database calls

Fix:

Update to v2.1.0 or later (includes all locking fixes)
Use single connection per transaction
Avoid nested database calls

Next: Check Architecture or Security.
