Performance
Optimization strategies and best practices for Memory Journal MCP Server.
Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.
Memory Journal v3.0.0 achieves:
- 2-3 second startup (10x improvement from v1.0)
- <10ms entry creation (typical)
- <50ms full-text search (for 1000 entries)
- <1s semantic search (reliable on first load after v1.2.1 fix)
- No performance degradation from v2.x refactoring (all async operations maintained)
Problem (early versions):
- Startup time: 14 seconds
- ML dependencies loaded eagerly
Solution:
- Lazy initialization of the ML model (@xenova/transformers)
- Model loads only on first semantic search
- Startup: 2-3 seconds
v3.0.0 TypeScript architecture:
- Modular TypeScript codebase with clear separation of concerns
- Handler layer, manager layer, and database layer
- Each module focused on a single responsibility
Performance Impact:
- ✅ No degradation - All async operations preserved
- ✅ Fast startup - 2-3 seconds
- ✅ No overhead from modularization
- ✅ Better maintainability - Easy to optimize specific components
- ✅ Full TypeScript strict mode - Zero type errors
The modular architecture makes it easier to identify and optimize performance bottlenecks!
Implementation (v3.0.0 TypeScript):
// Lazy initialization - model loads on first semantic search
private async ensureInitialized(): Promise<void> {
if (this.initialized) return;
// Load embeddings model (lazy, ~5s first time)
this.embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
this.initialized = true;
}
Timeline:
- Server startup: 2-3s
- First semantic search: +5s (one-time model load)
- Subsequent searches: <1s
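A note on concurrency: with a plain boolean flag, two semantic searches arriving during the ~5s load could each start their own model load. The sketch below caches the in-flight promise instead, so every caller shares a single load. `LazyResource` and `loadEmbedder` are hypothetical names used for illustration; they are not part of Memory Journal, and the fake loader stands in for the real `pipeline(...)` call.

```typescript
// Minimal sketch: share one in-flight load between concurrent callers.
class LazyResource<T> {
  private pending: Promise<T> | null = null;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  get(): Promise<T> {
    if (this.pending === null) {
      // Cache the promise, not the result, so callers arriving mid-load reuse it.
      this.pending = this.load().catch((err: unknown) => {
        this.pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return this.pending;
  }
}

// Hypothetical loader standing in for the real model pipeline.
let loadCount = 0;
async function loadEmbedder(): Promise<string> {
  loadCount += 1;
  return "embedder-ready";
}

const lazyEmbedder = new LazyResource(loadEmbedder);
const firstCall = lazyEmbedder.get();
const secondCall = lazyEmbedder.get(); // reuses the same promise
```

Caching the promise keeps the fast path identical to the flag-based version while removing the duplicate-load window.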
Memory Journal uses optimal SQLite settings:
PRAGMA journal_mode = WAL; -- Write-Ahead Logging
PRAGMA synchronous = NORMAL; -- Balance speed/safety
PRAGMA cache_size = -64000; -- 64MB cache
PRAGMA mmap_size = 268435456; -- 256MB memory-mapped I/O
PRAGMA temp_store = MEMORY; -- Temp tables in memory
PRAGMA busy_timeout = 30000; -- 30s lock timeout
Benefits:
- WAL mode: Concurrent reads + writes
- Large cache: Hot data stays in memory
- Memory-mapped I/O: Direct memory access
- Memory temp store: Fast temporary operations
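To make the units concrete: SQLite reads a negative `cache_size` as a size in KiB and a positive one as a page count, while `mmap_size` is given in bytes. The helper below is an illustrative sketch (`cacheSizeBytes` is a hypothetical name, and the 4096-byte page size is SQLite's default, assumed here) that converts the values above into bytes.

```typescript
// Sketch: convert PRAGMA values into bytes.
const MIB = 1024 * 1024;

function cacheSizeBytes(cacheSize: number, pageSizeBytes: number = 4096): number {
  // Negative N: about |N| * 1024 bytes. Positive N: N pages of pageSizeBytes each.
  return cacheSize < 0 ? Math.abs(cacheSize) * 1024 : cacheSize * pageSizeBytes;
}

const cacheBytes = cacheSizeBytes(-64000); // 65,536,000 bytes
const mmapBytes = 268435456;

console.log((cacheBytes / MIB).toFixed(1) + " MiB cache"); // 62.5 MiB cache
console.log(mmapBytes / MIB + " MiB mmap"); // 256 MiB mmap
```

So the "64MB" cache is 64,000 KiB (about 62.5 MiB), and the mmap window is exactly 256 MiB.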
Indexes created:
-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);
-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);
-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);
Query optimization:
- All queries use appropriate indexes
- No full table scans (except ANALYZE)
- Covering indexes where possible
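One way to check these claims for a given query is to read the `detail` column of `EXPLAIN QUERY PLAN`: SQLite reports `SEARCH ... USING INDEX ...` for index lookups and `SCAN ...` for scans. The sketch below flags plan rows that scan a table without any index. `findFullTableScans` is a hypothetical helper and the sample rows are hand-written examples, not output captured from Memory Journal.

```typescript
// Sketch: flag full table scans in EXPLAIN QUERY PLAN output.
interface PlanRow {
  detail: string;
}

function findFullTableScans(plan: PlanRow[]): string[] {
  return plan
    .map((row) => row.detail)
    // "SCAN x" (newer SQLite) or "SCAN TABLE x" (older) without an index clause.
    .filter((detail) => /^SCAN\b/.test(detail) && !/USING (COVERING )?INDEX/.test(detail));
}

// Hand-written sample rows for illustration.
const samplePlan: PlanRow[] = [
  { detail: "SEARCH memory_journal USING INDEX idx_memory_journal_timestamp (timestamp>? AND timestamp<?)" },
  { detail: "SCAN tags" },
];

console.log(findFullTableScans(samplePlan)); // [ 'SCAN tags' ]
```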
Full-text search speed:
| Entries | Search Time |
|---|---|
| 100 | <5ms |
| 1,000 | <10ms |
| 10,000 | <50ms |
| 100,000 | <200ms |
Optimization:
- Porter stemming (matches variations)
- BM25 ranking (relevance scoring)
- Result limits (default 10)
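As a rough illustration of how BM25 ranking and result limits combine in an FTS5 query, the sketch below builds a parameterized statement. The table name `memory_journal_fts`, the helper `buildFtsQuery`, and the cap of 100 are assumptions for this example; the actual schema may differ. Porter stemming itself is configured when the FTS5 table is created (through its `tokenize` option), so it does not appear in the query.

```typescript
// Sketch: FTS5 query with BM25 ordering and a clamped LIMIT.
interface FtsQuery {
  sql: string;
  params: [string, number];
}

const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 100; // assumed cap, not taken from the project

function buildFtsQuery(query: string, limit: number = DEFAULT_LIMIT): FtsQuery {
  const requested = Number.isFinite(limit) ? Math.trunc(limit) : DEFAULT_LIMIT;
  const clamped = Math.min(Math.max(requested, 1), MAX_LIMIT);
  // bm25() returns lower values for better matches, so ascending order ranks best first.
  const sql =
    "SELECT rowid FROM memory_journal_fts " +
    "WHERE memory_journal_fts MATCH ? " +
    "ORDER BY bm25(memory_journal_fts) " +
    "LIMIT ?";
  return { sql, params: [query, clamped] };
}

const ftsExample = buildFtsQuery("lazy loading pattern");
console.log(ftsExample.params); // [ 'lazy loading pattern', 10 ]
```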
Anti-pattern (causes locking):
async function updateEntry(...) {
const db1 = getDb();
db1.run("UPDATE memory_journal ...");
// Nested connection - causes lock!
await autoCreateTags(tags); // Opens separate db handle
}
Correct pattern:
async function updateEntry(...) {
const db = getDb();
db.run("UPDATE memory_journal ...");
// Use same connection
db.run("INSERT OR IGNORE INTO tags ...");
// No nested connections = no locks
}
Benefits:
- No database locking
- Atomic transactions
- Better concurrency
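The atomicity benefit depends on both statements running inside one transaction on the same handle. A minimal sketch of such a wrapper follows. `withTransaction`, the `Db` interface, and `RecordingDb` are hypothetical; the stub only records statements so the example runs without a database, and the work callback is assumed to be synchronous, matching the `db.run` calls above.

```typescript
// Sketch: run several statements atomically on one connection.
interface Db {
  run(sql: string, params?: unknown[]): void;
}

function withTransaction<T>(db: Db, work: (db: Db) => T): T {
  db.run("BEGIN");
  try {
    const result = work(db);
    db.run("COMMIT");
    return result;
  } catch (err) {
    db.run("ROLLBACK"); // undo partial writes, then surface the error
    throw err;
  }
}

// Stub that records statements instead of executing them.
class RecordingDb implements Db {
  statements: string[] = [];
  run(sql: string): void {
    this.statements.push(sql);
  }
}

const demoDb = new RecordingDb();
withTransaction(demoDb, (db) => {
  db.run("UPDATE memory_journal ...");
  db.run("INSERT OR IGNORE INTO tags ...");
});
console.log(demoDb.statements);
```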
Race-safe pattern:
// INSERT OR IGNORE prevents race conditions
db.run("INSERT OR IGNORE INTO tags (name, usage_count) VALUES (?, 1)", [
tagName,
]);
// Then lookup
const result = db.exec("SELECT id FROM tags WHERE name = ?", [tagName]);
const tagId = result[0].values[0][0];
Handles concurrent tag creation:
- Multiple entries creating same tag
- No duplicates
- No conflicts
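Two details are worth spelling out. `INSERT OR IGNORE` only skips a row when a constraint would be violated, so the pattern relies on `tags.name` being unique (assumed here). And the chained `result[0].values[0][0]` lookup throws a generic TypeError if the result set is empty. The sketch below wraps the lookup defensively; `requireTagId` is a hypothetical helper, and `QueryResult` mirrors the result shape used in the snippet above.

```typescript
// Sketch: extract the tag id defensively from an exec-style result.
interface QueryResult {
  columns: string[];
  values: unknown[][];
}

function requireTagId(results: QueryResult[], tagName: string): number {
  // Optional chaining avoids a TypeError when no row came back.
  const value = results[0]?.values[0]?.[0];
  if (typeof value !== "number") {
    throw new Error(`Tag "${tagName}" not found after INSERT OR IGNORE`);
  }
  return value;
}

// Hand-written result for illustration.
const sampleResult: QueryResult[] = [{ columns: ["id"], values: [[7]] }];
console.log(requireTagId(sampleResult, "performance")); // 7
```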
Fastest option:
- SQLite native
- Optimized C implementation
- No external dependencies
Best practices:
// Use specific terms
search_entries({ query: "lazy loading pattern" }); // Fast
// Avoid overly broad
search_entries({ query: "a" }); // Slow (too many matches)
// Use limits
search_entries({ query: "optimization", limit: 10 }); // Fast
Very fast:
- Indexed by timestamp
- Simple range query
- ~5ms typical
Best practices:
// Reasonable ranges
search_by_date_range({
start_date: "2025-10-01",
end_date: "2025-10-31", // 1 month
});
// Add filters for large ranges
search_by_date_range({
start_date: "2025-01-01",
end_date: "2025-12-31", // 1 year
tags: ["performance"], // Filter!
});
Performance characteristics:
| Operation | Time |
|---|---|
| First search (model load) | ~5s |
| Generate embedding | 50-100ms |
| vectra search | 10-50ms |
| Fetch entries | 10-50ms |
| Total (subsequent) | 70-200ms |
Optimization:
- Model cached after first load
- vectra index in memory
- Batch entry fetching
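Batch entry fetching means resolving all matched IDs in one query rather than one query per result. A minimal sketch, assuming the entries table has an `id` primary key column (`buildBatchFetch` is a hypothetical helper, not the project's API):

```typescript
// Sketch: fetch many entries with a single IN (...) query.
interface BatchQuery {
  sql: string;
  params: number[];
}

function buildBatchFetch(ids: number[]): BatchQuery | null {
  const unique = Array.from(new Set(ids)); // drop duplicate ids
  if (unique.length === 0) return null; // nothing to fetch, skip the query
  const placeholders = unique.map(() => "?").join(", ");
  return {
    sql: `SELECT * FROM memory_journal WHERE id IN (${placeholders})`,
    params: unique,
  };
}

const batchExample = buildBatchFetch([12, 7, 12, 42]);
console.log(batchExample?.sql);
// SELECT * FROM memory_journal WHERE id IN (?, ?, ?)
```

Rows come back in table order, not similarity order, so the caller would re-sort them by the scores returned from the vector search.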
Best practices:
// Use reasonable limits
semantic_search({ query: "...", limit: 10 }); // Fast
// Higher thresholds = faster
semantic_search({
query: "...",
similarity_threshold: 0.5, // Fewer results
});
Performance:
- Entry-centric (depth 2): <100ms
- Tag-based (20 entries): <50ms
- Recursive CTE: efficient graph traversal
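The depth limit is what keeps traversal cheap: each extra level can multiply the number of visited entries. The sketch below shows the same depth-bounded expansion in TypeScript as a breadth-first search. It is an in-memory analogue for illustration, not the server's SQL, and it assumes relationships are followed in both directions. `entriesWithinDepth` and `Relationship` are hypothetical names.

```typescript
// Sketch: depth-limited traversal of the relationship graph.
interface Relationship {
  fromEntryId: number;
  toEntryId: number;
}

function entriesWithinDepth(
  edges: Relationship[],
  startId: number,
  maxDepth: number,
): Map<number, number> {
  const neighbors = new Map<number, number[]>();
  const link = (a: number, b: number): void => {
    const list = neighbors.get(a) ?? [];
    list.push(b);
    neighbors.set(a, list);
  };
  for (const edge of edges) {
    link(edge.fromEntryId, edge.toEntryId);
    link(edge.toEntryId, edge.fromEntryId);
  }

  // Maps entry id to the depth at which it was first reached.
  const depthById = new Map<number, number>([[startId, 0]]);
  let frontier = [startId];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: number[] = [];
    for (const id of frontier) {
      for (const neighbor of neighbors.get(id) ?? []) {
        if (!depthById.has(neighbor)) {
          depthById.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthById;
}

// Chain 1 - 2 - 3 - 4: depth 2 from entry 1 reaches 1, 2 and 3 only.
const chain: Relationship[] = [
  { fromEntryId: 1, toEntryId: 2 },
  { fromEntryId: 2, toEntryId: 3 },
  { fromEntryId: 3, toEntryId: 4 },
];
console.log(Array.from(entriesWithinDepth(chain, 1, 2).keys())); // [ 1, 2, 3 ]
```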
Best practices:
// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }); // Good
// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }); // Good
// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }); // Slow to render
Minimal (no semantic search):
- Server: 20-30 MB
- SQLite cache: 64 MB
- Total: ~100 MB
With semantic search:
- Server: 20-30 MB
- SQLite cache: 64 MB
- Model: 80-100 MB
- vectra index: 1-10 MB (depends on entries)
- Total: ~200 MB
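The totals are rounded sums of the component ranges. A quick check, using only the figures listed above (`sumRanges` is a hypothetical helper):

```typescript
// Sketch: add up the component ranges (all values in MB).
interface MbRange {
  min: number;
  max: number;
}

function sumRanges(parts: MbRange[]): MbRange {
  return parts.reduce(
    (total, part) => ({ min: total.min + part.min, max: total.max + part.max }),
    { min: 0, max: 0 },
  );
}

const serverMb: MbRange = { min: 20, max: 30 };
const sqliteCacheMb: MbRange = { min: 64, max: 64 };
const modelMb: MbRange = { min: 80, max: 100 };
const vectorIndexMb: MbRange = { min: 1, max: 10 };

const minimalFootprint = sumRanges([serverMb, sqliteCacheMb]);
const semanticFootprint = sumRanges([serverMb, sqliteCacheMb, modelMb, vectorIndexMb]);

console.log(minimalFootprint); // { min: 84, max: 94 }
console.log(semanticFootprint); // { min: 165, max: 204 }
```

The component sums come to 84-94 MB and 165-204 MB, which the figures above round to ~100 MB and ~200 MB.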
Performance:
- All operations <50ms
- Startup: 2-3s
- No optimization needed
Performance:
- FTS5 search: <50ms
- Semantic search: <200ms
- Startup: 2-3s
Recommendations:
- Use search filters (tags, dates)
- Limit visualization size
- Regular ANALYZE (monthly)
Performance:
- FTS5 search: <200ms
- Semantic search: <500ms
- Startup: 2-3s
Recommendations:
- Archive old entries
- Use specific search queries
- Increase similarity_threshold
- Consider database partitioning
Reduce context window consumption by disabling unused tools:
MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-github,-backup"
Token savings by configuration:
| Configuration | Filter | Tools | Est. Token Savings |
|---|---|---|---|
| Full (default) | (none) | 39 | Baseline (~6,050 tokens) |
| Read-only | -admin | 34 | ~15% (~900 tokens/request) |
| No GitHub | -github | 24 | ~36% (~2,100 tokens/request) |
| Focused | -admin,-github | 19 | ~51% (~3,000 tokens/request) |
| Minimal Search | -analytics,-relationships,-export,-admin,-github,-backup | 10 | ~73% (~4,300 tokens/request) |
| Core Only | -search,-analytics,-relationships,-export,-admin,-github,-backup | 6 | ~84% (~5,000 tokens/request) |
Benefits:
- Faster AI responses - Smaller context = faster processing
- Reduced API costs - Fewer tokens = lower bills
- Stay under client limits - Essential for Windsurf (100-tool limit)
- Better tool selection - AI makes better choices with fewer options
Complete Tool Filtering Guide →
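For intuition about the filter string, the sketch below parses the `-group` form used in the table into a set of disabled groups. It is an illustration only: `parseDisabledGroups` is a hypothetical helper, and the server's real parser may accept syntax this sketch ignores.

```typescript
// Sketch: parse "-groupA,-groupB" into a set of disabled tool groups.
function parseDisabledGroups(filter: string): Set<string> {
  const disabled = new Set<string>();
  for (const token of filter.split(",")) {
    const trimmed = token.trim();
    // Only the "-group" form is handled; anything else is skipped.
    if (trimmed.startsWith("-") && trimmed.length > 1) {
      disabled.add(trimmed.slice(1));
    }
  }
  return disabled;
}

const coreOnly = parseDisabledGroups(
  "-search,-analytics,-relationships,-export,-admin,-github,-backup",
);
console.log(coreOnly.size); // 7
console.log(coreOnly.has("github")); // true
```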
1. Use specific queries:
// Good
search_entries({ query: "lazy loading pattern" });
// Poor
search_entries({ query: "loading" });
2. Filter early:
// Good
search_entries({
query: "optimization",
is_personal: false, // Reduces search space
});
3. Use appropriate limits:
// Default 10 is good
search_entries({ query: "...", limit: 10 });
// Only increase if needed
search_entries({ query: "...", limit: 50 }); // Slower
1. Disable auto_context if not needed:
create_entry({
content: "...",
auto_context: false, // Skips Git subprocess
});
2. Batch tag creation:
// Create entries with same tags together
// Tags only created once
1. Use appropriate depth:
// Good balance
visualize_relationships({ entry_id: 42, depth: 2 });
// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }); // Slower
2. Limit graph size:
visualize_relationships({
tags: ["feature"],
limit: 20, // Sweet spot
});
Memory Journal is designed for extremely low overhead during AI task execution. We include a vitest bench suite to maintain these baseline guarantees:
- Database Reads: Operations execute in fractions of a millisecond. calculateImportance is ~55x faster than retrieving 50 recent entries.
- Vector Search Engine: Semantic searches via vectra perform significantly faster than parallel entry indexing (>190x faster locally).
- Core MCP Routines: Complex operations exhibit negligible latency when executed through standard MCP tools. Calling tools natively adds ~1.4x overhead compared to direct function execution.
To run the benchmarking suite locally:
npm run bench
-- Update statistics
ANALYZE;
-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;
-- Reclaim space
VACUUM;
-- Integrity check
PRAGMA integrity_check;
- Archive old entries
- Review slow queries
- Check database size
Check:
- Version (should be v2.1.0+)
- ML dependencies installed (v1.2.1 fixed first-load delay)
- System resources (CPU, disk)
Fix:
- Update to v2.1.0 (includes all performance optimizations)
- Use Docker image (optimized)
- Remove ML dependencies if not needed
Check:
- Database size
- Query complexity
- Filters applied
Fix:
- Use more specific queries
- Add filters (tags, dates, is_personal)
- Run ANALYZE
- Consider archiving
Check:
- Multiple connections
- Long transactions
- Nested database calls
Fix:
- Update to v2.1.0 (includes all locking fixes)
- Use single connection per transaction
- Avoid nested database calls
Next: Check Architecture or Security.