Semantic Search
ML-powered similarity search enabling AI to find conceptually related work even when exact keywords don't match.
Primary Use Case: Find past decisions and implementations by concept rather than exact wording - critical when you can't remember the exact terms you used.
Semantic search finds entries based on meaning, not just keywords. This is essential for AI context management because:
- You may not remember exact wording from past decisions
- AI needs conceptual connections beyond keyword matches
- Related work may use different terminology across time periods
The system uses machine learning embeddings to understand concepts in your entries and find semantically similar ones, even if they use completely different words.
Example:
- Query: "improving application startup time"
- Finds: Entries about "lazy loading", "initialization optimization", "boot performance"
No installation needed! Semantic search is included by default in v3.0.0 using @xenova/transformers (pure JavaScript).
npm install -g memory-journal-mcp
Also included by default:
docker pull writenotenow/memory-journal-mcp:latest
semantic_search({
query: "strategies for improving application performance",
limit: 5,
});
Output:
🔍 Semantic Search Results for: 'strategies for improving application performance'
Found 3 semantically similar entries:
**Entry #42** (similarity: 0.687)
Type: technical_achievement | Personal: False | 2025-10-04 16:45:30
Content: Implemented lazy loading for ML dependencies - startup time improved from 14s to 2-3s!
**Entry #38** (similarity: 0.521)
Type: development_note | Personal: False | 2025-10-03 14:20:15
Content: Researching lazy initialization patterns for performance optimization...
semantic_search({
query: "database optimization techniques",
limit: 10,
similarity_threshold: 0.4,
is_personal: false,
});
Parameters:
- query (required): Natural language query
- limit (optional): Max results, default 10
- similarity_threshold (optional): Min similarity 0.0-1.0, default 0.3
- is_personal (optional): Filter by personal vs project
- hint_on_empty (optional): Include hint when no results found, default true
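The parameter defaults above can be made explicit with a small resolver. This is a minimal sketch, not the server's actual validation code: `SemanticSearchParams`, `ResolvedParams`, and `resolveParams` are hypothetical names, and only the defaults (10, 0.3, true) and the 0.0-1.0 range come from the list above.

```typescript
// Hypothetical parameter shape mirroring the documented semantic_search options.
interface SemanticSearchParams {
  query: string;
  limit?: number;
  similarity_threshold?: number;
  is_personal?: boolean;
  hint_on_empty?: boolean;
}

interface ResolvedParams {
  query: string;
  limit: number;
  similarity_threshold: number;
  is_personal: boolean | undefined;
  hint_on_empty: boolean;
}

// Apply the documented defaults and reject out-of-range values (assumed behavior).
function resolveParams(params: SemanticSearchParams): ResolvedParams {
  const limit = params.limit ?? 10;
  const threshold = params.similarity_threshold ?? 0.3;
  if (params.query.trim() === "") {
    throw new Error("query is required");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("similarity_threshold must be between 0.0 and 1.0");
  }
  return {
    query: params.query,
    limit,
    similarity_threshold: threshold,
    is_personal: params.is_personal,
    hint_on_empty: params.hint_on_empty ?? true,
  };
}
```

Calling `resolveParams({ query: "database optimization techniques" })` fills in the three documented defaults and leaves `is_personal` unset, so no personal/project filter is applied.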
Model: all-MiniLM-L6-v2 (@xenova/transformers)
- Dimensions: 384
- Speed: Fast (50-100ms per embedding)
- Size: ~23MB (pure JS, no native deps)
- Quality: Excellent for semantic similarity
Process:
- Entry content → Embedding (384D vector)
- Store in SQLite (BLOB) + vectra index
- Query → Query embedding
- vectra finds nearest neighbors
- Fetch and rank results
Semantic search uses cosine similarity:
| Score | Meaning |
|---|---|
| 1.0 | Identical |
| 0.8-1.0 | Extremely similar |
| 0.6-0.8 | Very similar |
| 0.4-0.6 | Moderately similar |
| 0.3-0.4 | Somewhat similar |
| <0.3 | Not similar (filtered out) |
Default threshold: 0.3
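For reference, cosine similarity and the score bands in the table can be sketched in a few lines. This is illustrative only: `cosineSimilarity` and `describeScore` are hypothetical helpers, not functions exported by the server.

```typescript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "not similar".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map a score to the bands from the table (hypothetical helper).
function describeScore(score: number): string {
  if (score >= 0.8) return "Extremely similar";
  if (score >= 0.6) return "Very similar";
  if (score >= 0.4) return "Moderately similar";
  if (score >= 0.3) return "Somewhat similar";
  return "Not similar (filtered out)";
}
```

With these bands, the two entries in the sample output (0.687 and 0.521) fall under "Very similar" and "Moderately similar" respectively.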
semantic_search({ query: "..." })
Timeline:
- Load ML model: ~5 seconds
- Generate query embedding: ~100ms
- Search vectra index: ~50ms
- Fetch results: ~50ms
- Total: ~5 seconds (first time only)
semantic_search({ query: "..." })
Timeline:
- Model already loaded: 0ms
- Generate query embedding: ~100ms
- Search vectra index: ~50ms
- Fetch results: ~50ms
- Total: ~200ms
Optimization:
- ML model NOT loaded at startup
- Loads only on first semantic search
- Server startup: 2-3 seconds
- First search: ~5 seconds (loads model)
- Subsequent: <1 second
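The lazy-loading behavior described above can be sketched with a cached promise. This is a simplified illustration under stated assumptions: `LazyEmbedder`, `Embedder`, and `loadModel` are hypothetical names, and the real server's initialization logic may differ.

```typescript
// Hypothetical stand-in for the real embedding pipeline.
type Embedder = (text: string) => Promise<number[]>;

class LazyEmbedder {
  private loading: Promise<Embedder> | null = null;
  private readonly loadModel: () => Promise<Embedder>;
  // Exposed only so the example can show that the model loads once.
  public loadCount = 0;

  constructor(loadModel: () => Promise<Embedder>) {
    // Nothing is loaded here, so server startup stays fast.
    this.loadModel = loadModel;
  }

  private ensureInitialized(): Promise<Embedder> {
    if (this.loading === null) {
      this.loadCount += 1;
      // Cache the promise so concurrent first searches share one load.
      this.loading = this.loadModel().catch((error: unknown) => {
        this.loading = null; // allow a retry after a failed load
        throw error;
      });
    }
    return this.loading;
  }

  async embed(text: string): Promise<number[]> {
    const embedder = await this.ensureInitialized();
    return embedder(text);
  }
}
```

Caching the promise rather than the loaded model means two searches that arrive during the first load both wait on the same load instead of starting it twice.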
Find entries about a concept:
semantic_search({
query: "techniques for reducing memory usage",
});
Finds entries mentioning:
- Memory optimization
- Heap management
- Garbage collection
- Resource cleanup
- Leak prevention
Ask questions:
semantic_search({
query: "How did I handle database connection pooling?",
});
Finds:
- Entries about connection pools
- Database performance
- Connection management
- Thread safety
Based on description:
semantic_search({
query: "implementing lazy loading for heavy dependencies",
});
Finds:
- Deferred initialization
- Lazy imports
- On-demand loading
- Performance optimization
Vague recollection:
semantic_search({
query: "that time I fixed the slow startup problem",
});
Finds relevant entries even if you don't remember the exact words used.
| Feature | Semantic Search | Full-Text Search |
|---|---|---|
| Matches | Concepts | Keywords |
| Query | Natural language | Keywords |
| Speed | Slower (~200ms) | Faster (<50ms) |
| Setup | Included by default (model loads on first search) | Built-in |
| Best for | Discovery | Specific terms |
Descriptive queries:
- ✅ "strategies for improving application startup latency"
- ✅ "debugging concurrent database access issues in Python"
- ✅ "patterns for implementing retry logic with exponential backoff"
Poor queries:
- ❌ "fast"
- ❌ "database"
- ❌ "help"
High threshold (0.5-0.7):
- Fewer results
- Higher quality
- More specific
semantic_search({
query: "...",
similarity_threshold: 0.6, // Strict
});
Low threshold (0.2-0.4):
- More results
- Lower quality
- Broader discovery
semantic_search({
query: "...",
similarity_threshold: 0.2, // Loose
});
Strategy: Start semantic, refine with full-text
// 1. Semantic search for concepts
const semantic_results = semantic_search({
query: "performance optimization strategies",
});
// 2. Full-text for specific entries
const specific_results = search_entries({
query: "lazy loading",
});
// 3. Date range for time-based
const recent_results = search_by_date_range({
start_date: "2025-10-01",
end_date: "2025-10-31",
});
Embeddings stored in SQLite:
CREATE TABLE embeddings (
entry_id INTEGER PRIMARY KEY,
embedding BLOB,
model_name TEXT,
FOREIGN KEY (entry_id) REFERENCES memory_journal(id) ON DELETE CASCADE
);
Size per embedding: ~1.5KB (384 floats × 4 bytes)
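The size figure follows from storing each of the 384 dimensions as a 32-bit float. The sketch below shows one way to pack an embedding into bytes suitable for a BLOB column and read it back. It assumes a raw float32 layout, which the text implies but does not spell out; `embeddingToBlob` and `blobToEmbedding` are hypothetical helpers.

```typescript
const EMBEDDING_DIMENSIONS = 384;

// Pack an embedding into raw bytes (4 bytes per float32 value).
function embeddingToBlob(embedding: number[]): Uint8Array {
  const floats = Float32Array.from(embedding);
  return new Uint8Array(floats.buffer);
}

// Read the bytes back into a plain number array.
function blobToEmbedding(blob: Uint8Array): number[] {
  if (blob.byteLength % 4 !== 0) {
    throw new Error("BLOB length must be a multiple of 4 bytes");
  }
  // Copy first so the float view starts at an aligned offset of 0.
  const copy = blob.slice();
  return Array.from(new Float32Array(copy.buffer));
}

// 384 dimensions × 4 bytes = 1536 bytes, i.e. 1.5 KB per entry.
const exampleBlob = embeddingToBlob(
  Array.from({ length: EMBEDDING_DIMENSIONS }, () => 0.5),
);
```

Values lose some precision when narrowed from JavaScript's 64-bit numbers to float32, which is generally acceptable for similarity search.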
vectra (Pure JavaScript Vector Search):
- In-memory index
- Fast nearest neighbor search
- Automatically updated when entries added
- No native dependencies
Index characteristics:
- Uses flat index for fast exact search
- Efficient for typical journal sizes (<50,000 entries)
- Persistent storage in JSON format
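A flat index performs an exact search by comparing the query against every stored vector. The following sketch illustrates that idea only; it is not vectra's implementation, and `IndexedEntry`, `Match`, and `flatSearch` are hypothetical names.

```typescript
interface IndexedEntry {
  entryId: number;
  vector: number[];
}

interface Match {
  entryId: number;
  score: number;
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length; a zero vector is returned unchanged.
function normalize(v: number[]): number[] {
  const length = Math.sqrt(dot(v, v));
  return length === 0 ? v : v.map((x) => x / length);
}

// Exact nearest-neighbor search: score every entry, filter, sort, truncate.
function flatSearch(
  index: IndexedEntry[],
  query: number[],
  limit: number,
  threshold: number,
): Match[] {
  const q = normalize(query);
  return index
    .map((e) => ({ entryId: e.entryId, score: dot(q, normalize(e.vector)) }))
    .filter((m) => m.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```

The cost grows linearly with the number of entries, which is why a flat index suits journal-sized collections and returns exact rather than approximate neighbors.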
Embeddings generated automatically when:
- Creating entries (if semantic search enabled)
- Updating entry content
- First semantic search (backfills missing embeddings)
In v3.0.0+: This error should not occur as semantic search is included by default.
If you see this error:
- Restart the server
- Check for corrupted installation
- Try reinstalling:
npm install -g memory-journal-mcp@latest
Expected behavior:
- First search: ~5 seconds (loads model)
- Subsequent: <1 second
If slower:
- Check system resources (CPU, RAM)
- Try Docker image (optimized)
- Ensure fast storage
Solutions:
1. Adjust threshold:
semantic_search({
query: "...",
similarity_threshold: 0.5, // Increase for better quality
});
2. More specific queries:
// Good
"implementing lazy loading with error handling for ML dependencies";
// Poor
"loading";
3. Use full-text search: For specific keywords, full-text is better.
Check:
- Do entries exist?
- Is threshold too high?
- Is query too specific?
Fix:
// Lower threshold
semantic_search({
query: "...",
similarity_threshold: 0.2,
});
// Broader query
semantic_search({
query: "database performance", // Instead of "PostgreSQL query optimization"
});
When semantic search returns no results, a hint field is included by default:
{
"query": "...",
"entries": [],
"count": 0,
"hint": "No entries matched your query above the similarity threshold."
}
Hint messages:
- Empty index: "No entries in vector index. Use rebuild_vector_index to index existing entries."
- No matches: "No entries matched your query above the similarity threshold."
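A caller could use the hint to decide what to do next. The sketch below is an assumption-laden illustration: the response interface is inferred from the JSON example above, `nextStepFor` and the step names are hypothetical, and matching on the hint text is brittle if the wording changes.

```typescript
// Response shape inferred from the JSON example; the entry type is left open.
interface SemanticSearchResponse {
  query: string;
  entries: unknown[];
  count: number;
  hint?: string;
}

type NextStep = "use_results" | "rebuild_index" | "relax_query";

// Decide on a follow-up action for an empty result set (hypothetical helper).
function nextStepFor(response: SemanticSearchResponse): NextStep {
  if (response.count > 0) return "use_results";
  // The empty-index hint names the rebuild_vector_index tool.
  if (response.hint !== undefined && response.hint.includes("rebuild_vector_index")) {
    return "rebuild_index";
  }
  // Otherwise nothing cleared the threshold: lower it or broaden the query.
  return "relax_query";
}
```

When `hint_on_empty` is false the hint is absent, so this helper falls back to `"relax_query"` for any empty result.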
Suppress hints (programmatic use):
semantic_search({
query: "...",
hint_on_empty: false, // Returns only query, entries, count
});
all-MiniLM-L6-v2:
- Source: SentenceTransformers library
- Training: Fine-tuned on over 1 billion sentence pairs from multiple datasets, including Microsoft MS MARCO
- Context window: 256 tokens (~200 words)
- Output: 384-dimensional dense vector
Performance:
- Inference speed: 50-100ms per entry
- Memory: ~100MB loaded
- Disk: ~80MB model file
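Because the context window is about 200 words, long entries may be only partly reflected in their embedding; text beyond the window is typically truncated by the tokenizer. A rough word-count check like the one below can flag such entries. It is a heuristic sketch only: the 200-word budget is the approximation given above, real token counts vary by text, and both helpers are hypothetical.

```typescript
// Rule of thumb from the model details: 256 tokens is roughly 200 words.
const APPROX_WORD_BUDGET = 200;

// Count whitespace-separated words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// True when an entry is probably longer than the model's context window.
function likelyExceedsContextWindow(
  text: string,
  wordBudget: number = APPROX_WORD_BUDGET,
): boolean {
  return countWords(text) > wordBudget;
}
```

For entries that exceed the budget, putting the key decision or outcome in the opening sentences makes it more likely to be captured in the embedding.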
async function generateEmbedding(text: string): Promise<number[]> {
await this.ensureInitialized(); // Lazy load model
const result = await this.embedder(text, {
pooling: "mean",
normalize: true,
});
return Array.from(result.data); // Float32Array → number[]
}
async function semanticSearch(
query: string,
limit: number,
threshold: number,
): Promise<SearchResult[]> {
// Generate query embedding
const queryEmbedding = await generateEmbedding(query);
// vectra nearest neighbor search
const results = await this.index.queryItems(queryEmbedding, limit);
// Filter by threshold (vectra returns score 0-1)
return results.filter((r) => r.score >= threshold);
}
Pros:
- Fast inference
- Good quality
- Small size (80MB)
- Low RAM usage
Cons:
- 256 token limit
- English-focused
Larger models (better quality, slower):
- all-mpnet-base-v2 (420MB, 512 tokens)
- all-roberta-large-v1 (1.3GB, 512 tokens)
Multilingual:
- paraphrase-multilingual-MiniLM-L12-v2
Specialized:
- Code-specific models
- Domain-specific models
1. Use for discovery:
- Finding related work
- Rediscovering forgotten entries
- Concept-based exploration
2. Use full-text for specifics:
- Exact terms
- Known keywords
- Fast lookups
3. Combine with other searches:
- Semantic → discover concepts
- Date range → narrow time period
- Full-text → specific entries
4. Adjust threshold as needed:
- Start with default (0.3)
- Increase for quality
- Decrease for breadth
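As an illustration of combining searches, semantic results can be narrowed to a time period on the client side. This sketch assumes each result carries a timestamp in the `YYYY-MM-DD HH:MM:SS` form shown in the sample output; `ScoredEntry` and `narrowByDateRange` are hypothetical names, not part of the tool API.

```typescript
// Hypothetical result shape based on the sample output format.
interface ScoredEntry {
  id: number;
  timestamp: string; // e.g. "2025-10-04 16:45:30"
  similarity: number;
}

// Keep only results whose date falls within [startDate, endDate], inclusive.
// Dates are "YYYY-MM-DD" strings, which sort correctly as plain text.
function narrowByDateRange(
  results: ScoredEntry[],
  startDate: string,
  endDate: string,
): ScoredEntry[] {
  return results.filter((r) => {
    const day = r.timestamp.slice(0, 10);
    return day >= startDate && day <= endDate;
  });
}
```

Filtering preserves the original order, so results that were ranked by similarity stay ranked after narrowing.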
Next: Explore Export or Search Guide.