-
Notifications
You must be signed in to change notification settings - Fork 7
Performance
Optimization strategies and best practices for Memory Journal MCP Server.
Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.
Memory Journal v2.2.0 achieves:
- 2-3 second startup (10x improvement from v1.0)
- <10ms entry creation (typical)
- <50ms full-text search (for 1000 entries)
- <1s semantic search (reliable on first load after v1.2.1 fix)
- No performance degradation from v2.x refactoring (all async operations maintained)
Problem (v1.0):
- Startup time: 14 seconds
- ML dependencies loaded eagerly
-
import sentence_transformerstook ~14s
Solution (v1.1.3):
- Lazy import of ML libraries
- Model loads only on first semantic search
- Startup: 2-3 seconds
Changes:
- Refactored from single 4093-line file to 30 modules
- Each module ~150-300 lines (highly maintainable)
- Clear separation of concerns
Performance Impact:
- ✅ No degradation - All async operations preserved
- ✅ Same startup time - 2-3 seconds maintained
- ✅ Same operation speed - No overhead from modularization
- ✅ Better maintainability - 10x easier to optimize specific components
- ✅ True Pyright strict - 700+ type issues fixed, zero exclusions (better IDE support)
The modular architecture makes it easier to identify and optimize performance bottlenecks in the future!
Implementation:
# Top level - check availability only
VECTOR_SEARCH_AVAILABLE = False
if importlib.util.find_spec("sentence_transformers"):
VECTOR_SEARCH_AVAILABLE = True
# In VectorSearchManager
def _ensure_initialized(self):
if not self.initialized:
# Load on first use
from sentence_transformers import SentenceTransformer
import faiss
self.model = SentenceTransformer(self.model_name)
self.initialized = TrueTimeline:
- Server startup: 2-3s
- First semantic search: +15s (one-time)
- Subsequent searches: <1s
Memory Journal uses optimal SQLite settings:
PRAGMA journal_mode = WAL; # Write-Ahead Logging
PRAGMA synchronous = NORMAL; # Balance speed/safety
PRAGMA cache_size = -64000; # 64MB cache
PRAGMA mmap_size = 268435456; # 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY; # Temp tables in memory
PRAGMA busy_timeout = 30000; # 30s lock timeoutBenefits:
- WAL mode: Concurrent reads + writes
- Large cache: Hot data stays in memory
- Memory-mapped I/O: Direct memory access
- Memory temp store: Fast temporary operations
Indexes created:
-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);
-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);
-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);Query optimization:
- All queries use appropriate indexes
- No full table scans (except ANALYZE)
- Covering indexes where possible
Full-text search speed:
| Entries | Search Time |
|---|---|
| 100 | <5ms |
| 1,000 | <10ms |
| 10,000 | <50ms |
| 100,000 | <200ms |
Optimization:
- Porter stemming (matches variations)
- BM25 ranking (relevance scoring)
- Result limits (default 10)
Anti-pattern (v1.0 - caused locking):
def update_entry(...):
with db.get_connection() as conn1:
conn1.execute("UPDATE memory_journal ...")
# Nested connection - causes lock!
db.auto_create_tags(tags) # Opens conn2Correct pattern (v1.2.1):
def update_entry(...):
with db.get_connection() as conn:
conn.execute("UPDATE memory_journal ...")
# Use same connection
conn.execute("INSERT OR IGNORE INTO tags ...")
# No nested connections = no locksBenefits:
- No database locking
- Atomic transactions
- Better concurrency
Race-safe pattern:
# INSERT OR IGNORE prevents race conditions
conn.execute(
"INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)",
(tag_name,)
)
# Then lookup
cursor = conn.execute("SELECT id FROM tags WHERE name = ?", (tag_name,))
tag_id = cursor.fetchone()[0]Handles concurrent tag creation:
- Multiple entries creating same tag
- No duplicates
- No conflicts
Fastest option:
- SQLite native
- Optimized C implementation
- No external dependencies
Best practices:
// Use specific terms
search_entries({ query: "lazy loading pattern" }) // Fast
// Avoid overly broad
search_entries({ query: "a" }) // Slow (too many matches)
// Use limits
search_entries({ query: "optimization", limit: 10 }) // FastVery fast:
- Indexed by timestamp
- Simple range query
- ~5ms typical
Best practices:
// Reasonable ranges
search_by_date_range({
start_date: "2025-10-01",
end_date: "2025-10-31" // 1 month
})
// Add filters for large ranges
search_by_date_range({
start_date: "2025-01-01",
end_date: "2025-12-31", // 1 year
tags: ["performance"] // Filter!
})Performance characteristics:
| Operation | Time |
|---|---|
| First search (model load) | 15s |
| Generate embedding | 50-100ms |
| FAISS search | 10-50ms |
| Fetch entries | 10-50ms |
| Total (subsequent) | 70-200ms |
Optimization:
- Model cached after first load
- FAISS index in memory
- Batch entry fetching
Best practices:
// Use reasonable limits
semantic_search({ query: "...", limit: 10 }) // Fast
// Higher thresholds = faster
semantic_search({
query: "...",
similarity_threshold: 0.5 // Fewer results
})Performance:
- Entry-centric (depth 2): <100ms
- Tag-based (20 entries): <50ms
- Recursive CTE: efficient graph traversal
Best practices:
// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }) // Good
// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }) // Good
// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }) // Slow to renderMinimal (no semantic search):
- Server: 20-30 MB
- SQLite cache: 64 MB
- Total: ~100 MB
With semantic search:
- Server: 20-30 MB
- SQLite cache: 64 MB
- Model: 80-100 MB
- FAISS index: 1-10 MB (depends on entries)
- Total: ~200 MB
Performance:
- All operations <50ms
- Startup: 2-3s
- No optimization needed
Performance:
- FTS5 search: <50ms
- Semantic search: <200ms
- Startup: 2-3s
Recommendations:
- Use search filters (tags, dates)
- Limit visualization size
- Regular ANALYZE (monthly)
Performance:
- FTS5 search: <200ms
- Semantic search: <500ms
- Startup: 2-3s
Recommendations:
- Archive old entries
- Use specific search queries
- Increase similarity_threshold
- Consider database partitioning
Reduce context window consumption by disabling unused tools:
MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-test"Token savings by configuration:
| Configuration | Filter | Tools | Token Reduction |
|---|---|---|---|
| Full (default) | (none) | 16 | Baseline (~2,450 tokens) |
| Production | -test |
14 | ~12% |
| Read-only | -admin |
14 | ~15% |
| Focused | -test,-admin |
12 | ~25% |
| Lightweight | (core only) | 5 | ~69% |
Benefits:
- Faster AI responses - Smaller context = faster processing
- Reduced API costs - Fewer tokens = lower bills
- Stay under client limits - Essential for Windsurf (100-tool limit)
- Better tool selection - AI makes better choices with fewer options
Complete Tool Filtering Guide →
1. Use specific queries:
// Good
search_entries({ query: "lazy loading pattern" })
// Poor
search_entries({ query: "loading" })2. Filter early:
// Good
search_entries({
query: "optimization",
is_personal: false // Reduces search space
})3. Use appropriate limits:
// Default 10 is good
search_entries({ query: "...", limit: 10 })
// Only increase if needed
search_entries({ query: "...", limit: 50 }) // Slower1. Disable auto_context if not needed:
create_entry({
content: "...",
auto_context: false // Skips Git subprocess
})2. Batch tag creation:
// Create entries with same tags together
// Tags only created once1. Use appropriate depth:
// Good balance
visualize_relationships({ entry_id: 42, depth: 2 })
// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }) // Slower2. Limit graph size:
visualize_relationships({
tags: ["feature"],
limit: 20 // Sweet spot
})| Operation | Time |
|---|---|
| Create entry (no Git) | 5-10ms |
| Create entry (with Git) | 50-100ms |
| Update entry | 5-15ms |
| Delete entry (soft) | 5-10ms |
| Delete entry (permanent) | 10-20ms |
| Get entry by ID | 1-5ms |
| Operation | Entries | Time |
|---|---|---|
| FTS5 search | 1,000 | <10ms |
| FTS5 search | 10,000 | <50ms |
| Date range | 1,000 | <5ms |
| Date range | 10,000 | <20ms |
| Semantic search | 1,000 | 100-200ms |
| Semantic search | 10,000 | 200-500ms |
| Operation | Entries | Relationships | Time |
|---|---|---|---|
| Entry-centric (depth 2) | 10 | 15 | <50ms |
| Tag-based | 20 | 30 | <100ms |
| Recent | 20 | 25 | <100ms |
-- Update statistics
ANALYZE;
-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;-- Reclaim space
VACUUM;
-- Integrity check
PRAGMA integrity_check;- Archive old entries
- Review slow queries
- Check database size
Check:
- Version (should be v2.1.0+)
- ML dependencies installed (v1.2.1 fixed first-load delay)
- System resources (CPU, disk)
Fix:
- Update to v2.1.0 (includes all performance optimizations)
- Use Docker image (optimized)
- Remove ML dependencies if not needed
Check:
- Database size
- Query complexity
- Filters applied
Fix:
- Use more specific queries
- Add filters (tags, dates, is_personal)
- Run ANALYZE
- Consider archiving
Check:
- Multiple connections
- Long transactions
- Nested database calls
Fix:
- Update to v2.1.0 (includes all locking fixes)
- Use single connection per transaction
- Avoid nested database calls
Next: Check Architecture or Security.