Semantic Search

Temp edited this page Oct 4, 2025 · 19 revisions

ML-powered similarity search to find conceptually related entries.


Overview

Semantic search finds entries based on meaning, not just keywords. It uses machine learning to understand the concepts in your entries and find similar ones, even if they use different words.

Example:

  • Query: "improving application startup time"
  • Finds: Entries about "lazy loading", "initialization optimization", "boot performance"

Installation

PyPI

pip install sentence-transformers faiss-cpu

Size: ~500MB (model + dependencies)

Docker

Semantic search is included by default in the Docker image!

docker pull writenotenow/memory-journal-mcp:latest

Using Semantic Search

Basic Usage

semantic_search({
  query: "strategies for improving application performance",
  limit: 5
})

Output:

🔍 Semantic Search Results for: 'strategies for improving application performance'
Found 2 semantically similar entries:

**Entry #42** (similarity: 0.687)
Type: technical_achievement | Personal: False | 2025-10-04 16:45:30
Content: Implemented lazy loading for ML dependencies - startup time improved from 14s to 2-3s!

**Entry #38** (similarity: 0.521)
Type: development_note | Personal: False | 2025-10-03 14:20:15
Content: Researching lazy initialization patterns for performance optimization...

With Filters

semantic_search({
  query: "database optimization techniques",
  limit: 10,
  similarity_threshold: 0.4,
  is_personal: false
})

Parameters:

  • query (required): Natural language query
  • limit (optional): Max results, default 10
  • similarity_threshold (optional): Min similarity 0.0-1.0, default 0.3
  • is_personal (optional): Filter by personal vs project

How It Works

Vector Embeddings

Model: all-MiniLM-L6-v2 (SentenceTransformers)

  • Dimensions: 384
  • Speed: Fast (50-100ms per embedding)
  • Size: ~80MB
  • Quality: Excellent for semantic similarity

Process:

  1. Entry content → Embedding (384D vector)
  2. Store in SQLite (BLOB) + FAISS index
  3. Query → Query embedding
  4. FAISS finds nearest neighbors
  5. Fetch and rank results
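
Steps 3-5 can be sketched without any ML dependencies. The example below is a brute-force stand-in for the FAISS lookup, not the project's implementation: the 3D vectors are toy values in place of real 384D embeddings, and `cosine` and `nearest_entries` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_entries(query_vec, stored, limit):
    """Rank stored {entry_id: vector} pairs by similarity to the query."""
    scored = [(entry_id, cosine(query_vec, vec)) for entry_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 3D vectors standing in for real 384D embeddings.
stored = {
    42: [0.9, 0.1, 0.0],
    38: [0.7, 0.3, 0.1],
    7: [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.2, 0.0]

top = nearest_entries(query_vec, stored, limit=2)
for entry_id, score in top:
    print(f"Entry #{entry_id} (similarity: {score:.3f})")
```

A brute-force scan like this compares the query against every stored vector. FAISS exists to make that lookup fast as the number of entries grows.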

Similarity Scores

Semantic search uses cosine similarity:

Score      Meaning
1.0        Identical
0.8-1.0    Extremely similar
0.6-0.8    Very similar
0.4-0.6    Moderately similar
0.3-0.4    Somewhat similar
<0.3       Not similar (filtered out)

Default threshold: 0.3
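
As a rough illustration, the score bands can be turned into a lookup function. `describe_similarity` is a hypothetical helper, and the handling of boundary values (a score of exactly 0.6 falls in the higher band) is an assumption, since the ranges listed above share their endpoints.

```python
def describe_similarity(score: float) -> str:
    """Map a cosine similarity score to the label used on this page."""
    if score >= 1.0:
        return "Identical"
    if score >= 0.8:
        return "Extremely similar"
    if score >= 0.6:
        return "Very similar"
    if score >= 0.4:
        return "Moderately similar"
    if score >= 0.3:
        return "Somewhat similar"
    return "Not similar"


# Scores taken from the sample output earlier on this page.
for score in (0.687, 0.521):
    print(score, describe_similarity(score))
```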


Performance

First Use (One-Time)

semantic_search({ query: "..." })

Timeline:

  • Load ML model: ~15 seconds
  • Generate query embedding: ~100ms
  • Search FAISS index: ~50ms
  • Fetch results: ~50ms
  • Total: ~15 seconds (first time only)

Subsequent Uses

semantic_search({ query: "..." })

Timeline:

  • Model already loaded: 0ms
  • Generate query embedding: ~100ms
  • Search FAISS index: ~50ms
  • Fetch results: ~50ms
  • Total: ~200ms

Lazy Loading (v1.1.0)

Optimization:

  • ML model NOT loaded at startup
  • Loads only on first semantic search
  • Server startup: 2-3 seconds (regardless of ML installation)
  • First search: +15 seconds
  • Subsequent: <1 second
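
The lazy-loading pattern itself is simple to sketch. The class below is a minimal illustration under stated assumptions, not the server's code: `LazyModel` and `slow_loader` are hypothetical, and the loader sleeps briefly instead of spending ~15 seconds loading a real model. The `_ensure_initialized` name follows the method shown under Technical Details.

```python
import time


class LazyModel:
    """Defers an expensive load until the model is first needed."""

    def __init__(self, loader):
        self._loader = loader   # callable that performs the expensive load
        self._model = None      # nothing is loaded at construction time
        self.load_count = 0

    def _ensure_initialized(self):
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1

    def encode(self, text):
        self._ensure_initialized()
        return self._model(text)


def slow_loader():
    """Stand-in for loading the ML model; sleeps briefly instead of ~15 s."""
    time.sleep(0.05)
    return lambda text: [float(len(text))]


model = LazyModel(slow_loader)    # "server startup": no load cost yet
first = model.encode("hello")     # first call pays the load cost
second = model.encode("world!")   # later calls reuse the loaded model
print(model.load_count)
```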

Use Cases

Concept-Based Discovery

Find entries about a concept:

semantic_search({
  query: "techniques for reducing memory usage"
})

Finds entries mentioning:

  • Memory optimization
  • Heap management
  • Garbage collection
  • Resource cleanup
  • Leak prevention

Natural Language Queries

Ask questions:

semantic_search({
  query: "How did I handle database connection pooling?"
})

Finds:

  • Entries about connection pools
  • Database performance
  • Connection management
  • Thread safety

Find Related Work

Based on description:

semantic_search({
  query: "implementing lazy loading for heavy dependencies"
})

Finds:

  • Deferred initialization
  • Lazy imports
  • On-demand loading
  • Performance optimization

Rediscover Forgotten Entries

Vague recollection:

semantic_search({
  query: "that time I fixed the slow startup problem"
})

Finds relevant entries even if you don't remember exact words used.


Comparison with Full-Text Search

Feature    Semantic Search     Full-Text Search
Matches    Concepts            Keywords
Query      Natural language    Keywords
Speed      Slower (~200ms)     Faster (<50ms)
Setup      Requires ML deps    Built-in
Best for   Discovery           Specific terms

Best Practices

Writing Good Queries

Descriptive queries:

  • ✅ "strategies for improving application startup latency"
  • ✅ "debugging concurrent database access issues in Python"
  • ✅ "patterns for implementing retry logic with exponential backoff"

Poor queries:

  • ❌ "fast"
  • ❌ "database"
  • ❌ "help"


Adjusting Threshold

High threshold (0.5-0.7):

  • Fewer results
  • Higher quality
  • More specific

semantic_search({
  query: "...",
  similarity_threshold: 0.6  // Strict
})

Low threshold (0.2-0.4):

  • More results
  • Lower quality
  • Broader discovery

semantic_search({
  query: "...",
  similarity_threshold: 0.2  // Loose
})
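
To make the trade-off concrete, the sketch below applies a strict and a loose threshold to the same scored list. The first two scores come from the sample output earlier on this page. Entries 15 and 9 and their scores are made up for illustration, and `apply_threshold` is a hypothetical helper.

```python
def apply_threshold(scored, threshold):
    """Keep (entry_id, similarity) pairs at or above the threshold."""
    return [(entry_id, sim) for entry_id, sim in scored if sim >= threshold]


scored = [(42, 0.687), (38, 0.521), (15, 0.34), (9, 0.22)]

strict = apply_threshold(scored, 0.6)   # fewer, more specific results
loose = apply_threshold(scored, 0.2)    # more results, broader discovery

print(len(strict), len(loose))
```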

Combining Search Methods

Strategy: Start semantic, refine with full-text

// 1. Semantic search for concepts
const semantic_results = semantic_search({
  query: "performance optimization strategies"
})

// 2. Full-text for specific entries
const specific_results = search_entries({
  query: "lazy loading"
})

// 3. Date range for time-based
const recent_results = search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31"
})
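
One way to act on the three result sets is to keep only the semantic hits that also appear in the other searches. The sketch below assumes each result is a dict with an `id` key, which this page does not confirm. `refine_by_id` is a hypothetical client-side helper, not a server tool.

```python
def refine_by_id(primary, *filters):
    """Keep entries from `primary` whose id appears in every filter result set."""
    allowed = None
    for result_set in filters:
        ids = {entry["id"] for entry in result_set}
        allowed = ids if allowed is None else allowed & ids
    if allowed is None:          # no filters given: nothing to refine against
        return list(primary)
    return [entry for entry in primary if entry["id"] in allowed]


# Made-up result sets standing in for the three searches above.
semantic_results = [{"id": 42}, {"id": 38}, {"id": 15}]
specific_results = [{"id": 42}, {"id": 38}, {"id": 3}]
recent_results = [{"id": 42}, {"id": 15}, {"id": 3}]

refined = refine_by_id(semantic_results, specific_results, recent_results)
print(refined)
```

Filtering the semantic results, rather than merging all three lists, keeps the similarity ranking of the first search intact.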

Advanced Features

Embedding Storage

Embeddings stored in SQLite:

CREATE TABLE embeddings (
    entry_id INTEGER PRIMARY KEY,
    embedding BLOB,
    model_name TEXT,
    FOREIGN KEY (entry_id) REFERENCES memory_journal(id) ON DELETE CASCADE
);

Size per embedding: ~1.5KB (384 floats × 4 bytes)
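
The size figure can be checked with a short round trip through SQLite. This sketch assumes embeddings are serialized as raw 32-bit floats in native byte order, which this page does not confirm. `to_blob` and `from_blob` are hypothetical helpers, and the foreign key is omitted because the `memory_journal` table is not created here.

```python
import sqlite3
from array import array

DIMENSIONS = 384


def to_blob(vector):
    """Pack floats as 32-bit values (4 bytes each) for BLOB storage."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack a BLOB produced by to_blob back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return values.tolist()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings ("
    "entry_id INTEGER PRIMARY KEY, embedding BLOB, model_name TEXT)"
)

vector = [0.5] * DIMENSIONS   # placeholder values, not a real embedding
conn.execute(
    "INSERT INTO embeddings VALUES (?, ?, ?)",
    (42, to_blob(vector), "all-MiniLM-L6-v2"),
)

(blob,) = conn.execute(
    "SELECT embedding FROM embeddings WHERE entry_id = ?", (42,)
).fetchone()
print(len(blob))   # 384 floats x 4 bytes each
```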


FAISS Index

FAISS (Facebook AI Similarity Search):

  • In-memory index
  • Fast nearest neighbor search
  • Automatically updated when entries added

Index types:

  • Small (<1000 entries): Flat index
  • Medium (1000-10000): IVF index
  • Large (>10000): IVFPQ index (if needed)
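
The size tiers above amount to a simple selection rule. The function below is only a sketch of that rule: `choose_index_type` is a hypothetical name, the returned strings are labels rather than FAISS constructor arguments, and how the server actually switches index types is not described on this page.

```python
def choose_index_type(entry_count: int) -> str:
    """Pick an index family from the entry count, following the tiers above."""
    if entry_count < 1000:
        return "Flat"      # exact search; fine for small collections
    if entry_count <= 10000:
        return "IVF"       # clusters vectors and searches only some clusters
    return "IVFPQ"         # IVF plus compressed vectors to save memory


for count in (250, 5000, 50000):
    print(count, choose_index_type(count))
```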

Automatic Embedding Generation

Embeddings generated automatically when:

  1. Creating entries (if semantic search enabled)
  2. Updating entry content
  3. First semantic search (backfills missing embeddings)
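
Backfilling (case 3) needs a way to find entries that have no embedding yet. The query below is one possible approach using the `embeddings` schema shown above. The `content` column on `memory_journal` and the helper name `entries_missing_embeddings` are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memory_journal (id INTEGER PRIMARY KEY, content TEXT);
CREATE TABLE embeddings (
    entry_id INTEGER PRIMARY KEY,
    embedding BLOB,
    model_name TEXT,
    FOREIGN KEY (entry_id) REFERENCES memory_journal(id) ON DELETE CASCADE
);
INSERT INTO memory_journal (id, content)
    VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO embeddings (entry_id, embedding, model_name)
    VALUES (2, x'00', 'all-MiniLM-L6-v2');
""")


def entries_missing_embeddings(conn):
    """Return ids of journal entries with no row in the embeddings table."""
    rows = conn.execute("""
        SELECT j.id
        FROM memory_journal AS j
        LEFT JOIN embeddings AS e ON e.entry_id = j.id
        WHERE e.entry_id IS NULL
        ORDER BY j.id
    """)
    return [row[0] for row in rows]


missing = entries_missing_embeddings(conn)
print(missing)   # entries 1 and 3 still need embeddings
```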

Troubleshooting

"Semantic search unavailable"

Cause: ML dependencies not installed

Fix:

pip install sentence-transformers faiss-cpu

Restart server after installation.
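
To confirm the dependencies are visible to the Python environment that runs the server, a quick check like the following may help. `missing_modules` is a hypothetical helper. `sentence_transformers` and `faiss` are the import names of the two packages installed above.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["sentence_transformers", "faiss"])
if missing:
    print("Missing ML dependencies:", ", ".join(missing))
else:
    print("ML dependencies found")
```

Run it with the same interpreter the server uses. A package installed into a different virtual environment will still show up as missing.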


Slow First Search

Expected behavior:

  • First search: 15 seconds (loads model)
  • Subsequent: <1 second

If slower:

  • Check system resources (CPU, RAM)
  • Try Docker image (optimized)
  • Ensure SSD (not HDD)

Low-Quality Results

Solutions:

1. Adjust threshold:

semantic_search({
  query: "...",
  similarity_threshold: 0.5  // Increase for better quality
})

2. More specific queries:

// Good
"implementing lazy loading with error handling for ML dependencies"

// Poor
"loading"

3. Use full-text search: For specific keywords, full-text is better.


No Results

Check:

  • Do entries exist?
  • Is threshold too high?
  • Is query too specific?

Fix:

// Lower threshold
semantic_search({
  query: "...",
  similarity_threshold: 0.2
})

// Broader query
semantic_search({
  query: "database performance"  // Instead of "PostgreSQL query optimization"
})

Technical Details

Model Information

all-MiniLM-L6-v2:

  • Source: SentenceTransformers library
  • Training: Fine-tuned on over 1 billion sentence pairs from multiple datasets, including Microsoft MS MARCO
  • Context window: 256 tokens (~200 words)
  • Output: 384-dimensional dense vector

Performance:

  • Inference speed: 50-100ms per entry
  • Memory: ~100MB loaded
  • Disk: ~80MB model file

Embedding Generation

import numpy as np

def generate_embedding(self, text: str) -> np.ndarray:
    """Generate a 384D embedding vector (a method of the search class, hence `self`)."""
    self._ensure_initialized()  # Lazy-load the model on first use
    embedding = self.model.encode(text)
    return embedding  # numpy array, shape (384,)

Similarity Calculation

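import numpy as np

def semantic_search(self, query: str, limit: int, threshold: float):
    # Generate query embedding; FAISS expects float32 with shape (1, 384)
    query_embedding = self.generate_embedding(query)
    query_embedding = np.asarray(query_embedding, dtype=np.float32).reshape(1, -1)

    # FAISS nearest neighbor search (index -1 marks an empty slot)
    distances, indices = faiss_index.search(query_embedding, limit)

    # Convert distance to cosine similarity. Assumes unit-length embeddings in an
    # L2 index, where FAISS returns squared distances: d^2 = 2 - 2*cos.
    # (With an inner-product index the returned scores are already similarities.)
    similarities = 1 - distances[0] / 2

    # Pair each entry index with its score and filter by threshold
    results = [
        (int(idx), float(sim))
        for idx, sim in zip(indices[0], similarities)
        if idx != -1 and sim >= threshold
    ]

    return results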
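
How a FAISS distance converts to a similarity depends on the index metric, which this page does not state. For unit-length vectors in an L2 index, FAISS reports squared distances, and the relation is cos = 1 - d²/2. The pure-Python check below verifies that identity on a pair of toy vectors. The helper names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([3.0, 4.0, 0.0])
b = normalize([4.0, 3.0, 0.0])

cos_sim = dot(a, b)             # cosine similarity of unit vectors
d2 = squared_l2(a, b)           # squared L2 distance between them
from_distance = 1 - d2 / 2      # cosine recovered from the distance

print(round(cos_sim, 6), round(from_distance, 6))
```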

Model Alternatives

Current: all-MiniLM-L6-v2

Pros:

  • Fast inference
  • Good quality
  • Small size (80MB)
  • Low RAM usage

Cons:

  • 256 token limit
  • English-focused

Future Options

Larger models (better quality, slower):

  • all-mpnet-base-v2 (420MB, 384 tokens)
  • all-roberta-large-v1 (1.3GB, 512 tokens)

Multilingual:

  • paraphrase-multilingual-MiniLM-L12-v2

Specialized:

  • Code-specific models
  • Domain-specific models

Summary of Best Practices

1. Use for discovery:

  • Finding related work
  • Rediscovering forgotten entries
  • Concept-based exploration

2. Use full-text for specifics:

  • Exact terms
  • Known keywords
  • Fast lookups

3. Combine with other searches:

  • Semantic → discover concepts
  • Date range → narrow time period
  • Full-text → specific entries

4. Adjust threshold as needed:

  • Start with default (0.3)
  • Increase for quality
  • Decrease for breadth

Next: Explore Export or Search Guide.
