Skip to content

Performance

Temp edited this page Dec 8, 2025 · 25 revisions

Performance

Optimization strategies and best practices for Memory Journal MCP Server.

Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.


Performance Overview

Memory Journal v2.2.0 achieves:

  • 2-3 second startup (10x improvement from v1.0)
  • <10ms entry creation (typical)
  • <50ms full-text search (for 1000 entries)
  • <1s semantic search (reliable on first load after v1.2.1 fix)
  • No performance degradation from v2.x refactoring (all async operations maintained)

Startup Performance

v1.1.3 Optimization: Lazy Loading

Problem (v1.0):

  • Startup time: 14 seconds
  • ML dependencies loaded eagerly
  • import sentence_transformers took ~14s

Solution (v1.1.3):

  • Lazy import of ML libraries
  • Model loads only on first semantic search
  • Startup: 2-3 seconds

v2.0.0 Optimization: Modular Architecture

Changes:

  • Refactored from single 4093-line file to 30 modules
  • Each module ~150-300 lines (highly maintainable)
  • Clear separation of concerns

Performance Impact:

  • No degradation - All async operations preserved
  • Same startup time - 2-3 seconds maintained
  • Same operation speed - No overhead from modularization
  • Better maintainability - 10x easier to optimize specific components
  • True Pyright strict - 700+ type issues fixed, zero exclusions (better IDE support)

The modular architecture makes it easier to identify and optimize performance bottlenecks in the future!

Implementation:

# Top level - check availability only
VECTOR_SEARCH_AVAILABLE = False
if importlib.util.find_spec("sentence_transformers"):
    VECTOR_SEARCH_AVAILABLE = True

# In VectorSearchManager
def _ensure_initialized(self):
    if not self.initialized:
        # Load on first use
        from sentence_transformers import SentenceTransformer
        import faiss
        self.model = SentenceTransformer(self.model_name)
        self.initialized = True

Timeline:

  • Server startup: 2-3s
  • First semantic search: +15s (one-time)
  • Subsequent searches: <1s

Database Performance

SQLite Optimization

Memory Journal uses optimal SQLite settings:

PRAGMA journal_mode = WAL;          # Write-Ahead Logging
PRAGMA synchronous = NORMAL;         # Balance speed/safety
PRAGMA cache_size = -64000;          # 64MB cache
PRAGMA mmap_size = 268435456;        # 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY;          # Temp tables in memory
PRAGMA busy_timeout = 30000;         # 30s lock timeout

Benefits:

  • WAL mode: Concurrent reads + writes
  • Large cache: Hot data stays in memory
  • Memory-mapped I/O: Direct memory access
  • Memory temp store: Fast temporary operations

Indexing Strategy

Indexes created:

-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);

-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);

-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);

Query optimization:

  • All queries use appropriate indexes
  • No full table scans (except ANALYZE)
  • Covering indexes where possible

FTS5 Performance

Full-text search speed:

Entries Search Time
100 <5ms
1,000 <10ms
10,000 <50ms
100,000 <200ms

Optimization:

  • Porter stemming (matches variations)
  • BM25 ranking (relevance scoring)
  • Result limits (default 10)

Transaction Management

Single Connection Pattern

Anti-pattern (v1.0 - caused locking):

def update_entry(...):
    with db.get_connection() as conn1:
        conn1.execute("UPDATE memory_journal ...")
        # Nested connection - causes lock!
        db.auto_create_tags(tags)  # Opens conn2

Correct pattern (v1.2.1):

def update_entry(...):
    with db.get_connection() as conn:
        conn.execute("UPDATE memory_journal ...")
        # Use same connection
        conn.execute("INSERT OR IGNORE INTO tags ...")
        # No nested connections = no locks

Benefits:

  • No database locking
  • Atomic transactions
  • Better concurrency

Thread-Safe Tag Creation

Race-safe pattern:

# INSERT OR IGNORE prevents race conditions
conn.execute(
    "INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)",
    (tag_name,)
)

# Then lookup
cursor = conn.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
tag_id = cursor.fetchone()[0]

Handles concurrent tag creation:

  • Multiple entries creating same tag
  • No duplicates
  • No conflicts

Search Performance

Full-Text Search (FTS5)

Fastest option:

  • SQLite native
  • Optimized C implementation
  • No external dependencies

Best practices:

// Use specific terms
search_entries({ query: "lazy loading pattern" })  // Fast

// Avoid overly broad
search_entries({ query: "a" })  // Slow (too many matches)

// Use limits
search_entries({ query: "optimization", limit: 10 })  // Fast

Date Range Search

Very fast:

  • Indexed by timestamp
  • Simple range query
  • ~5ms typical

Best practices:

// Reasonable ranges
search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31"  // 1 month
})

// Add filters for large ranges
search_by_date_range({
  start_date: "2025-01-01",
  end_date: "2025-12-31",  // 1 year
  tags: ["performance"]     // Filter!
})

Semantic Search

Performance characteristics:

Operation Time
First search (model load) 15s
Generate embedding 50-100ms
FAISS search 10-50ms
Fetch entries 10-50ms
Total (subsequent) 70-200ms

Optimization:

  • Model cached after first load
  • FAISS index in memory
  • Batch entry fetching

Best practices:

// Use reasonable limits
semantic_search({ query: "...", limit: 10 })  // Fast

// Higher thresholds = faster
semantic_search({
  query: "...",
  similarity_threshold: 0.5  // Fewer results
})

Visualization Performance

Mermaid Generation

Performance:

  • Entry-centric (depth 2): <100ms
  • Tag-based (20 entries): <50ms
  • Recursive CTE: efficient graph traversal

Best practices:

// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 })  // Good

// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 })  // Good

// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 })  // Slow to render

Memory Usage

Typical Memory Footprint

Minimal (no semantic search):

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Total: ~100 MB

With semantic search:

  • Server: 20-30 MB
  • SQLite cache: 64 MB
  • Model: 80-100 MB
  • FAISS index: 1-10 MB (depends on entries)
  • Total: ~200 MB

Scaling Considerations

Small Journal (<1000 entries)

Performance:

  • All operations <50ms
  • Startup: 2-3s
  • No optimization needed

Medium Journal (1000-10000 entries)

Performance:

  • FTS5 search: <50ms
  • Semantic search: <200ms
  • Startup: 2-3s

Recommendations:

  • Use search filters (tags, dates)
  • Limit visualization size
  • Regular ANALYZE (monthly)

Large Journal (>10000 entries)

Performance:

  • FTS5 search: <200ms
  • Semantic search: <500ms
  • Startup: 2-3s

Recommendations:

  • Archive old entries
  • Use specific search queries
  • Increase similarity_threshold
  • Consider database partitioning

Token Efficiency (v2.2.0)

Tool Filtering

Reduce context window consumption by disabling unused tools:

MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-test"

Token savings by configuration:

Configuration Filter Tools Token Reduction
Full (default) (none) 16 Baseline (~2,450 tokens)
Production -test 14 ~12%
Read-only -admin 14 ~15%
Focused -test,-admin 12 ~25%
Lightweight (core only) 5 ~69%

Benefits:

  • Faster AI responses - Smaller context = faster processing
  • Reduced API costs - Fewer tokens = lower bills
  • Stay under client limits - Essential for Windsurf (100-tool limit)
  • Better tool selection - AI makes better choices with fewer options

Complete Tool Filtering Guide →


Optimization Tips

For Faster Searches

1. Use specific queries:

// Good
search_entries({ query: "lazy loading pattern" })

// Poor
search_entries({ query: "loading" })

2. Filter early:

// Good
search_entries({
  query: "optimization",
  is_personal: false  // Reduces search space
})

3. Use appropriate limits:

// Default 10 is good
search_entries({ query: "...", limit: 10 })

// Only increase if needed
search_entries({ query: "...", limit: 50 })  // Slower

For Faster Entry Creation

1. Disable auto_context if not needed:

create_entry({
  content: "...",
  auto_context: false  // Skips Git subprocess
})

2. Batch tag creation:

// Create entries with same tags together
// Tags only created once

For Faster Visualization

1. Use appropriate depth:

// Good balance
visualize_relationships({ entry_id: 42, depth: 2 })

// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 })  // Slower

2. Limit graph size:

visualize_relationships({
  tags: ["feature"],
  limit: 20  // Sweet spot
})

Benchmarks

Entry Operations

Operation Time
Create entry (no Git) 5-10ms
Create entry (with Git) 50-100ms
Update entry 5-15ms
Delete entry (soft) 5-10ms
Delete entry (permanent) 10-20ms
Get entry by ID 1-5ms

Search Operations

Operation Entries Time
FTS5 search 1,000 <10ms
FTS5 search 10,000 <50ms
Date range 1,000 <5ms
Date range 10,000 <20ms
Semantic search 1,000 100-200ms
Semantic search 10,000 200-500ms

Visualization

Operation Entries Relationships Time
Entry-centric (depth 2) 10 15 <50ms
Tag-based 20 30 <100ms
Recent 20 25 <100ms

Maintenance for Performance

Monthly

-- Update statistics
ANALYZE;

-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;

Quarterly

-- Reclaim space
VACUUM;

-- Integrity check
PRAGMA integrity_check;

As Needed

  • Archive old entries
  • Review slow queries
  • Check database size

Troubleshooting Performance Issues

Slow Startup

Check:

  • Version (should be v2.1.0+)
  • ML dependencies installed (v1.2.1 fixed first-load delay)
  • System resources (CPU, disk)

Fix:

  • Update to v2.1.0 (includes all performance optimizations)
  • Use Docker image (optimized)
  • Remove ML dependencies if not needed

Slow Searches

Check:

  • Database size
  • Query complexity
  • Filters applied

Fix:

  • Use more specific queries
  • Add filters (tags, dates, is_personal)
  • Run ANALYZE
  • Consider archiving

Database Locks

Check:

  • Multiple connections
  • Long transactions
  • Nested database calls

Fix:

  • Update to v2.1.0 (includes all locking fixes)
  • Use single connection per transaction
  • Avoid nested database calls

Next: Check Architecture or Security.

Clone this wiki locally