-
Notifications
You must be signed in to change notification settings - Fork 7
Performance
Optimization strategies and best practices for Memory Journal MCP Server.
Design Philosophy: Fast context retrieval is critical for AI workflows. Memory Journal is optimized for sub-second query response times, enabling AI to access project history without noticeable latency.
Memory Journal achieves:
- 2-3 second startup (10x improvement from v1.0)
- <10ms entry creation (typical)
- <50ms full-text search (for 1000 entries)
- <1s semantic search (reliable on first load after v1.2.1 fix)
- No performance degradation from v2.x refactoring (all async operations maintained)
Problem (early versions):
- Startup time: 14 seconds
- ML dependencies loaded eagerly
Solution:
- Lazy initialization of ML model (
@huggingface/transformers) - Model loads only on first semantic search
- Startup: 2-3 seconds
v3.0.0 TypeScript architecture:
- Modular TypeScript codebase with clear separation of concerns
- Handler layer, manager layer, and database layer
- Each module focused on a single responsibility
Performance Impact:
- ✅ No degradation - All async operations preserved
- ✅ Fast startup - 2-3 seconds
- ✅ No overhead from modularization
- ✅ Better maintainability - Easy to optimize specific components
- ✅ Full TypeScript strict mode - Zero type errors
The modular architecture makes it easier to identify and optimize performance bottlenecks!
Implementation (v3.0.0 TypeScript):
// Lazy initialization - model loads on first semantic search
private async ensureInitialized(): Promise<void> {
if (this.initialized) return;
// Load embeddings model (lazy, ~5s first time)
this.embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
this.initialized = true;
}Timeline:
- Server startup: 2-3s
- First semantic search: +5s (one-time model load)
- Subsequent searches: <1s
Memory Journal uses optimal SQLite settings:
-- better-sqlite3 uses native file-based database with WAL mode
PRAGMA journal_mode = WAL; -- Write-Ahead Logging for concurrent reads
PRAGMA synchronous = NORMAL; -- Balanced durability/performance
PRAGMA foreign_keys = ON;
PRAGMA busy_timeout = 30000; -- 30s wait for locksBenefits:
- WAL mode: Concurrent readers with single writer
- Native file-based: Direct disk access for reliable persistence
- Efficient I/O for queries
Indexes created:
-- Entry lookups
CREATE INDEX idx_memory_journal_timestamp ON memory_journal(timestamp);
CREATE INDEX idx_memory_journal_type ON memory_journal(entry_type);
CREATE INDEX idx_memory_journal_personal ON memory_journal(is_personal);
CREATE INDEX idx_memory_journal_deleted ON memory_journal(deleted_at);
-- Tag lookups
CREATE INDEX idx_tags_name ON tags(name);
CREATE INDEX idx_entry_tags_entry ON entry_tags(entry_id);
CREATE INDEX idx_entry_tags_tag ON entry_tags(tag_id);
-- Relationship lookups
CREATE INDEX idx_relationships_from ON relationships(from_entry_id);
CREATE INDEX idx_relationships_to ON relationships(to_entry_id);Query optimization:
- All queries use appropriate indexes
- No full table scans (except ANALYZE)
- Covering indexes where possible
Full-text search speed:
| Entries | Search Time |
|---|---|
| 100 | <5ms |
| 1,000 | <10ms |
| 10,000 | <50ms |
| 100,000 | <200ms |
Optimization:
- FTS5 queries with BM25 ranking
- Query term escaping for safe search
- Result limits (default 10)
- Fallback to LIKE for queries with special characters
Anti-pattern (causes locking):
async function updateEntry(...) {
const db1 = getDb();
db1.run("UPDATE memory_journal ...");
// Nested connection - causes lock!
await autoCreateTags(tags); // Opens separate db handle
}Correct pattern:
async function updateEntry(...) {
const db = getDb();
db.run("UPDATE memory_journal ...");
// Use same connection
db.run("INSERT OR IGNORE INTO tags ...");
// No nested connections = no locks
}Benefits:
- No database locking
- Atomic transactions
- Better concurrency
Batched race-safe pattern (v4.5+):
// Batch insert all tags at once — INSERT OR IGNORE prevents race conditions
const placeholders = tags.map(() => "(?, 0)").join(", ");
db.run(
`INSERT OR IGNORE INTO tags (name, usage_count) VALUES ${placeholders}`,
tags,
);
// Batch lookup all tag IDs in one query
const inClause = tags.map(() => "?").join(", ");
const result = db.exec(
`SELECT id, name FROM tags WHERE name IN (${inClause})`,
tags,
);Benefits:
- Batch insert + batch lookup (2 queries instead of 2N)
- No duplicates, no conflicts
- Multiple entries creating same tag handled atomically
Fastest option:
- better-sqlite3 native file-based database
- FTS5 with BM25 ranking
- No external dependencies
Best practices:
// Use specific terms
search_entries({ query: "lazy loading pattern" }); // Fast
// Avoid overly broad
search_entries({ query: "a" }); // Slow (too many matches)
// Use limits
search_entries({ query: "optimization", limit: 10 }); // FastVery fast:
- Indexed by timestamp
- Simple range query
- ~5ms typical
Best practices:
// Reasonable ranges
search_by_date_range({
start_date: "2025-10-01",
end_date: "2025-10-31", // 1 month
});
// Add filters for large ranges
search_by_date_range({
start_date: "2025-01-01",
end_date: "2025-12-31", // 1 year
tags: ["performance"], // Filter!
});Performance characteristics:
| Operation | Time |
|---|---|
| First search (model load) | ~5s |
| Generate embedding | 50-100ms |
| sqlite-vec search | 10-50ms |
| Fetch entries | 10-50ms |
| Total (subsequent) | 70-200ms |
Optimization:
- Model cached after first load
- sqlite-vec index in SQLite database
- Batch entry fetching
Best practices:
// Use reasonable limits
semantic_search({ query: "...", limit: 10 }); // Fast
// Higher thresholds = faster
semantic_search({
query: "...",
similarity_threshold: 0.5, // Fewer results
});Performance:
- Entry-centric (depth 2): <100ms
- Tag-based (20 entries): <50ms
- Recursive CTE: efficient graph traversal
Best practices:
// Reasonable depth
visualize_relationships({ entry_id: 42, depth: 2 }); // Good
// Reasonable limits
visualize_relationships({ tags: ["feature"], limit: 20 }); // Good
// Avoid huge graphs
visualize_relationships({ depth: 5, limit: 100 }); // Slow to renderMinimal (no semantic search):
- Server: 20-30 MB
- SQLite cache: 64 MB
- Total: ~100 MB
With semantic search:
- Server: 20-30 MB
- SQLite cache: 64 MB
- Model: 80-100 MB
- sqlite-vec index: 1-10 MB (depends on entries)
- Total: ~200 MB
Performance:
- All operations <50ms
- Startup: 2-3s
- No optimization needed
Performance:
- Full-text search: <50ms
- Semantic search: <200ms
- Startup: 2-3s
Recommendations:
- Use search filters (tags, dates)
- Limit visualization size
- Regular ANALYZE (monthly)
Performance:
- Full-text search: <200ms
- Semantic search: <500ms
- Startup: 2-3s
Recommendations:
- Archive old entries
- Use specific search queries
- Increase similarity_threshold
- Consider database partitioning
Reduce context window consumption by disabling unused tools:
MEMORY_JOURNAL_MCP_TOOL_FILTER="-search,-analytics,-relationships,-export,-admin,-github,-backup"Token savings: From the full 67-tool set (~6,500 tokens baseline), filtering can save up to ~86% by exposing only the 6 core tools. Common configurations like -github (~36%) or -admin,-github (~48%) provide significant savings with minimal functionality loss.
Benefits:
- Faster AI responses - Smaller context = faster processing
- Reduced API costs - Fewer tokens = lower bills
- Stay under client limits - Essential for Windsurf (100-tool limit)
- Better tool selection - AI makes better choices with fewer options
Complete Tool Filtering Guide →
1. Use specific queries:
// Good
search_entries({ query: "lazy loading pattern" });
// Poor
search_entries({ query: "loading" });2. Filter early:
// Good
search_entries({
query: "optimization",
is_personal: false, // Reduces search space
});3. Use appropriate limits:
// Default 10 is good
search_entries({ query: "...", limit: 10 });
// Only increase if needed
search_entries({ query: "...", limit: 50 }); // Slower1. Disable auto_context if not needed:
create_entry({
content: "...",
auto_context: false, // Skips Git subprocess
});2. Tag batching is automatic:
As of v4.5+, linkTagsToEntry batches all tag inserts and lookups internally. No user action needed — passing multiple tags is already optimized.
1. Use appropriate depth:
// Good balance
visualize_relationships({ entry_id: 42, depth: 2 });
// Only use depth 3 if necessary
visualize_relationships({ entry_id: 42, depth: 3 }); // Slower2. Limit graph size:
visualize_relationships({
tags: ["feature"],
limit: 20, // Sweet spot
});Memory Journal is designed for extremely low overhead during AI task execution. We include a vitest bench suite to maintain these baseline guarantees:
-
Database Reads: Operations execute in fractions of a millisecond.
calculateImportanceis ~7x faster than retrieving 50 recent entries (composite index optimization narrows this gap by acceleratinggetRecentEntries~4x). -
Vector Search Engine: Both search (780 ops/sec) and indexing (640 ops/sec) are high-throughput via
sqlite-vecwith SQL-native KNN queries. -
Core MCP Routines:
getToolsuses cached O(1) dispatch (~4800x faster than tool execution).create_entryandsearch_entriesexecute through the full MCP layer with sub-millisecond overhead.
To run the benchmarking suite locally:
npm run benchPerformance improvements implemented in the server internals (v4.5+). These are transparent to users — no API changes.
Multi-row methods (getRecentEntries, getEntriesPage, searchEntries, searchByDateRange) previously called getTagsForEntry() per row, causing N+1 queries. Now uses batchGetTagsForEntries() + rowsToEntries() to fetch all tags in a single IN (...) query.
| Operation | Before | After |
|---|---|---|
getRecentEntries(50) |
51 queries | 2 queries |
searchEntries (20 results) |
21 queries | 2 queries |
linkTagsToEntry() batches tag inserts and lookups:
| Step | Before (per tag) | After (batched) |
|---|---|---|
| Insert tags | N × INSERT OR IGNORE
|
1 × multi-value INSERT
|
| Lookup IDs | N × SELECT id
|
1 × SELECT ... WHERE IN
|
| Total statements | 4N | 2 + 2N |
callTool() caches tool definitions in a Map for O(1) lookup instead of rebuilding all 67 ToolDefinition objects on every call.
| Benchmark | Before | After | Speedup |
|---|---|---|---|
create_entry |
105 ops/sec | 800 ops/sec | 7.6× |
search_entries |
95 ops/sec | 733 ops/sec | 7.7× |
get_recent_entries |
82 ops/sec | 106 ops/sec | 1.3× |
Cache invalidates when context parameters change (db, github, vectorManager, config, teamDb).
searchByDateRange only JOINs tag tables (entry_tags, tags) when a tag filter is provided. Without tags, queries skip the JOIN and DISTINCT, avoiding unnecessary row multiplication.
getStatistics() reduced from 5 sequential db.exec() calls to 3 using multi-statement exec() and SUM(CASE ...) aggregation.
rebuildIndex() removed a redundant orphan detection pass that preceded a delete-all pass. Now performs a single delete-all before re-indexing.
-- Update statistics
ANALYZE;
-- Check index usage
EXPLAIN QUERY PLAN SELECT ...;-- Reclaim space
VACUUM;
-- Integrity check
PRAGMA integrity_check;- Archive old entries
- Review slow queries
- Check database size
Check:
- Version (should be latest)
- ML dependencies installed
- System resources (CPU, disk)
Fix:
- Update to the latest version (includes all performance optimizations)
- Use Docker image (optimized)
- Remove ML dependencies if not needed
Check:
- Database size
- Query complexity
- Filters applied
Fix:
- Use more specific queries
- Add filters (tags, dates, is_personal)
- Run ANALYZE
- Consider archiving
Check:
- Multiple connections
- Long transactions
- Nested database calls
Fix:
- Update to the latest version (includes all locking fixes)
- Use single connection per transaction
- Avoid nested database calls
For long-running HTTP/SSE deployments, the server includes a built-in scheduler that can automate database optimization and index rebuilds on configurable intervals. See Configuration → Automated Scheduler for CLI flags and setup.
Next: Check Architecture or Security.