Semantic Search

Semantic Search - Conceptual Context Discovery

ML-powered similarity search enabling AI to find conceptually related work even when exact keywords don't match.

Primary Use Case: Find past decisions and implementations by concept rather than exact wording - critical when you can't remember the exact terms you used.

Overview

Semantic search finds entries based on meaning, not just keywords. This is essential for AI context management because:

You may not remember exact wording from past decisions
AI needs conceptual connections beyond keyword matches
Related work may use different terminology across time periods

The system uses machine learning embeddings to understand concepts in your entries and find semantically similar ones, even if they use completely different words.

Example:

Query: "improving application startup time"
Finds: Entries about "lazy loading", "initialization optimization", "boot performance"

Installation

v3.0.0 - Included by Default!

No separate installation step is needed: semantic search ships with v3.0.0 and runs on @xenova/transformers (pure JavaScript). Installing the server installs it as well:

npm install -g memory-journal-mcp

Docker

Also included by default:

docker pull writenotenow/memory-journal-mcp:latest

Using Semantic Search

Basic Usage

semantic_search({
  query: "strategies for improving application performance",
  limit: 5,
});

Output:

🔍 Semantic Search Results for: 'strategies for improving application performance'
Found 2 semantically similar entries:

**Entry #42** (similarity: 0.687)
Type: technical_achievement | Personal: False | 2025-10-04 16:45:30
Content: Implemented lazy loading for ML dependencies - startup time improved from 14s to 2-3s!

**Entry #38** (similarity: 0.521)
Type: development_note | Personal: False | 2025-10-03 14:20:15
Content: Researching lazy initialization patterns for performance optimization...

With Filters

semantic_search({
  query: "database optimization techniques",
  limit: 10,
  similarity_threshold: 0.4,
  is_personal: false,
});

Parameters:

query (required): Natural language query
limit (optional): Max results, default 10
similarity_threshold (optional): Min similarity 0.0-1.0, default 0.3
is_personal (optional): Filter by personal vs project
hint_on_empty (optional): Include hint when no results found, default true
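
The defaults in the parameter list can be sketched in TypeScript as follows. The `SemanticSearchParams` interface and the `applyDefaults` helper are illustrative names, not part of the server's API; only the parameter names and default values come from the list above.

```typescript
// Hypothetical types: only the field names and defaults come from the parameter list.
interface SemanticSearchParams {
  query: string;
  limit?: number;
  similarity_threshold?: number;
  is_personal?: boolean;
  hint_on_empty?: boolean;
}

type ResolvedParams = Required<Omit<SemanticSearchParams, "is_personal">> &
  Pick<SemanticSearchParams, "is_personal">;

function applyDefaults(params: SemanticSearchParams): ResolvedParams {
  if (params.query.trim() === "") {
    throw new Error("query is required");
  }
  const threshold = params.similarity_threshold ?? 0.3;
  if (threshold < 0 || threshold > 1) {
    throw new RangeError("similarity_threshold must be between 0.0 and 1.0");
  }
  return {
    query: params.query,
    limit: params.limit ?? 10,
    similarity_threshold: threshold,
    is_personal: params.is_personal, // undefined means "no personal/project filter"
    hint_on_empty: params.hint_on_empty ?? true,
  };
}
```

Calling `applyDefaults({ query: "database optimization techniques" })` yields a limit of 10, a threshold of 0.3, and hints enabled.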

How It Works

Vector Embeddings

Model: all-MiniLM-L6-v2 (@xenova/transformers)

Dimensions: 384
Speed: Fast (50-100ms per embedding)
Size: ~23MB (pure JS, no native deps)
Quality: Excellent for semantic similarity

Process:

Entry content → Embedding (384D vector)
Store in SQLite (BLOB) + vectra index
Query → Query embedding
vectra finds nearest neighbors
Fetch and rank results
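
The process above can be illustrated with a toy flat index. This is a minimal sketch, not vectra or the server's code: the `FlatIndex` class is a hypothetical stand-in, and the three-dimensional vectors are hand-written instead of coming from the 384-dimensional model.

```typescript
// Toy stand-in for the real pipeline: vectors are supplied directly instead of
// being produced by all-MiniLM-L6-v2, and the index is a plain array.
type IndexedItem = { entryId: number; vector: number[] };
type ScoredMatch = { entryId: number; score: number };

function toUnitVector(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

class FlatIndex {
  private items: IndexedItem[] = [];

  // Store the normalized embedding for an entry.
  add(entryId: number, vector: number[]): void {
    this.items.push({ entryId, vector: toUnitVector(vector) });
  }

  // Score every stored vector against the query, rank, and keep the top K.
  query(queryVector: number[], topK: number): ScoredMatch[] {
    const q = toUnitVector(queryVector);
    return this.items
      .map(({ entryId, vector }) => {
        if (vector.length !== q.length) throw new Error("dimension mismatch");
        // The dot product of two unit vectors equals their cosine similarity.
        const score = vector.reduce((sum, x, i) => sum + x * q[i], 0);
        return { entryId, score };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}

const flatIndex = new FlatIndex();
flatIndex.add(42, [0.9, 0.1, 0.0]);
flatIndex.add(38, [0.6, 0.4, 0.1]);
flatIndex.add(7, [0.0, 0.1, 0.9]);
const topMatches = flatIndex.query([1, 0, 0], 2); // entries 42 and 38, in that order
```

A flat index compares the query against every stored vector, so results are exact; the cost grows linearly with the number of entries.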

Similarity Scores

Semantic search uses cosine similarity:

Score	Meaning
1.0	Identical
0.8-1.0	Extremely similar
0.6-0.8	Very similar
0.4-0.6	Moderately similar
0.3-0.4	Somewhat similar
<0.3	Not similar (filtered out)

Default threshold: 0.3
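
For reference, cosine similarity and the score bands in the table can be sketched as below. The function names are illustrative, and because adjacent bands in the table share a boundary value, the sketch assumes each band includes its lower bound.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector has no direction");
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Maps a score to the labels from the table (lower bound inclusive: an assumption).
function describeScore(score: number): string {
  if (score >= 1.0) return "Identical";
  if (score >= 0.8) return "Extremely similar";
  if (score >= 0.6) return "Very similar";
  if (score >= 0.4) return "Moderately similar";
  if (score >= 0.3) return "Somewhat similar";
  return "Not similar";
}
```

With these bands, the 0.687 score from the sample output reads as "Very similar" and 0.521 as "Moderately similar".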

Performance

First Use (One-Time)

semantic_search({ query: "..." })

Timeline:

Load ML model: ~5 seconds
Generate query embedding: ~100ms
Search vectra index: ~50ms
Fetch results: ~50ms
Total: ~5 seconds (first time only)

Subsequent Uses

semantic_search({ query: "..." })

Timeline:

Model already loaded: 0ms
Generate query embedding: ~100ms
Search vectra index: ~50ms
Fetch results: ~50ms
Total: ~200ms

Lazy Loading (v3.0.0)

Optimization:

ML model NOT loaded at startup
Loads only on first semantic search
Server startup: 2-3 seconds
First search: ~5 seconds (loads model)
Subsequent: <1 second
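
The lazy-loading pattern described above can be sketched with a memoized promise. `LazyResource` is a hypothetical helper written for illustration, not the server's implementation, and the placeholder loader stands in for the real model load.

```typescript
// Generic lazy loader: the expensive load runs at most once, on first use.
class LazyResource<T> {
  private pending: Promise<T> | null = null;
  private readonly load: () => Promise<T>;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  get(): Promise<T> {
    if (this.pending === null) {
      this.pending = this.load().catch((err: unknown): never => {
        this.pending = null; // allow a retry after a failed load
        throw err;
      });
    }
    return this.pending;
  }
}

let loadCount = 0;
const lazyModel = new LazyResource(async () => {
  loadCount += 1; // stands in for the multi-second model load
  return (text: string) => text.length; // placeholder "embedder"
});
```

Nothing is loaded when `lazyModel` is constructed. The first `lazyModel.get()` starts the load, and concurrent or later calls share the same promise, which is why only the first search pays the model-loading cost.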

Use Cases

Concept-Based Discovery

Find entries about a concept:

semantic_search({
  query: "techniques for reducing memory usage",
});

Finds entries mentioning:

Memory optimization
Heap management
Garbage collection
Resource cleanup
Leak prevention

Natural Language Queries

Ask questions:

semantic_search({
  query: "How did I handle database connection pooling?",
});

Finds:

Entries about connection pools
Database performance
Connection management
Thread safety

Find Related Work

Based on description:

semantic_search({
  query: "implementing lazy loading for heavy dependencies",
});

Finds:

Deferred initialization
Lazy imports
On-demand loading
Performance optimization

Rediscover Forgotten Entries

Vague recollection:

semantic_search({
  query: "that time I fixed the slow startup problem",
});

Finds relevant entries even if you don't remember exact words used.

Comparison with Full-Text Search

Feature	Semantic Search	Full-Text Search
Matches	Concepts	Keywords
Query	Natural language	Keywords
Speed	Slower (~200ms)	Faster (<50ms)
Setup	Included by default; model loads on first search	Built-in
Best for	Discovery	Specific terms

Best Practices

Writing Good Queries

Descriptive queries:

✅ "strategies for improving application startup latency"
✅ "debugging concurrent database access issues in Python"
✅ "patterns for implementing retry logic with exponential backoff"

Poor queries:

❌ "fast"
❌ "database"
❌ "help"

Adjusting Threshold

High threshold (0.5-0.7):

Fewer results
Higher quality
More specific

semantic_search({
  query: "...",
  similarity_threshold: 0.6, // Strict
});

Low threshold (0.2-0.4):

More results
Lower quality
Broader discovery

semantic_search({
  query: "...",
  similarity_threshold: 0.2, // Loose
});

Combining Search Methods

Strategy: Start semantic, refine with full-text

// 1. Semantic search for concepts
const semantic_results = semantic_search({
  query: "performance optimization strategies",
});

// 2. Full-text for specific entries
const specific_results = search_entries({
  query: "lazy loading",
});

// 3. Date range for time-based
const recent_results = search_by_date_range({
  start_date: "2025-10-01",
  end_date: "2025-10-31",
});
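
One way to apply "start semantic, refine with full-text" is to keep only the semantic hits that the other searches also returned. The sketch below assumes each tool's results can be reduced to objects with an `id` field; the `EntryHit` shape and the `intersectById` helper are hypothetical.

```typescript
// Hypothetical result shape: the real tools may return additional fields.
interface EntryHit {
  id: number;
  content: string;
}

// Keeps hits from `primary` that also appear in every refining result set,
// preserving the ranking order of `primary`.
function intersectById(primary: EntryHit[], ...refiners: EntryHit[][]): EntryHit[] {
  return primary.filter((hit) =>
    refiners.every((results) => results.some((other) => other.id === hit.id)),
  );
}

const semanticHits: EntryHit[] = [
  { id: 42, content: "Implemented lazy loading for ML dependencies" },
  { id: 38, content: "Researching lazy initialization patterns" },
  { id: 12, content: "Tuned database indexes" },
];
const fullTextHits: EntryHit[] = [
  { id: 38, content: "Researching lazy initialization patterns" },
  { id: 42, content: "Implemented lazy loading for ML dependencies" },
];
const refinedHits = intersectById(semanticHits, fullTextHits); // ids 42, 38
```

Intersection narrows results. If the goal is broader discovery instead, a union with de-duplication by `id` would be the complementary choice.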

Advanced Features

Embedding Storage

Embeddings stored in SQLite:

CREATE TABLE embeddings (
    entry_id INTEGER PRIMARY KEY,
    embedding BLOB,
    model_name TEXT,
    FOREIGN KEY (entry_id) REFERENCES memory_journal(id) ON DELETE CASCADE
);

Size per embedding: ~1.5KB (384 floats × 4 bytes)
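
The size arithmetic can be checked with a small round-trip sketch. It assumes the BLOB holds raw 32-bit floats in platform byte order, which matches the "384 floats × 4 bytes" figure but is not confirmed by this page; both function names are illustrative.

```typescript
const EMBEDDING_DIMENSIONS = 384;

// Packs an embedding into raw float32 bytes: 384 × 4 = 1536 bytes (~1.5KB).
function embeddingToBlob(embedding: number[]): Uint8Array {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`expected ${EMBEDDING_DIMENSIONS} dimensions`);
  }
  const floats = new Float32Array(embedding);
  return new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Unpacks raw float32 bytes back into a number[].
function blobToEmbedding(blob: Uint8Array): number[] {
  if (blob.byteLength !== EMBEDDING_DIMENSIONS * 4) {
    throw new Error("unexpected blob size");
  }
  // Copy into a fresh, aligned buffer before reinterpreting the bytes as float32.
  const copy = blob.slice();
  return Array.from(new Float32Array(copy.buffer));
}
```

Values are stored at float32 precision, so a round trip can differ slightly from the original float64 numbers unless they are exactly representable in 32 bits.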

vectra Index

vectra (Pure JavaScript Vector Search):

In-memory index
Fast nearest neighbor search
Automatically updated when entries added
No native dependencies

Index characteristics:

Uses flat index for fast exact search
Efficient for typical journal sizes (<50,000 entries)
Persistent storage in JSON format

Automatic Embedding Generation

Embeddings generated automatically when:

Creating entries (if semantic search enabled)
Updating entry content
First semantic search (backfills missing embeddings)
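
The backfill step can be pictured as "embed every entry that has no stored embedding yet". The sketch below uses an in-memory `Map` in place of the embeddings table and takes the embedder as a parameter; `JournalEntry` and `backfillEmbeddings` are hypothetical names, and the server's actual logic may differ.

```typescript
// Hypothetical stand-in for a journal row.
interface JournalEntry {
  id: number;
  content: string;
}

// Generates embeddings only for entries that are missing one.
// Returns how many embeddings were generated.
async function backfillEmbeddings(
  entries: JournalEntry[],
  stored: Map<number, number[]>,
  embed: (text: string) => Promise<number[]>,
): Promise<number> {
  let generated = 0;
  for (const entry of entries) {
    if (stored.has(entry.id)) continue; // already embedded, skip
    stored.set(entry.id, await embed(entry.content));
    generated += 1;
  }
  return generated;
}
```

Processing entries sequentially keeps memory use predictable. A large backlog makes the first search slower, since each missing entry costs one embedding call.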

Troubleshooting

"Semantic search unavailable"

In v3.0.0+: This error should not occur as semantic search is included by default.

If you see this error:

Restart the server
Check for corrupted installation
Try reinstalling: npm install -g memory-journal-mcp@latest

Slow First Search

Expected behavior:

First search: ~5 seconds (loads model)
Subsequent: <1 second

If slower:

Check system resources (CPU, RAM)
Try Docker image (optimized)
Ensure fast storage

Low-Quality Results

Solutions:

1. Adjust threshold:

semantic_search({
  query: "...",
  similarity_threshold: 0.5, // Increase for better quality
});

2. More specific queries:

// Good
"implementing lazy loading with error handling for ML dependencies";

// Poor
"loading";

3. Use full-text search: For specific keywords, full-text is better.

No Results

Check:

Do entries exist?
Is threshold too high?
Is query too specific?

Fix:

// Lower threshold
semantic_search({
  query: "...",
  similarity_threshold: 0.2,
});

// Broader query
semantic_search({
  query: "database performance", // Instead of "PostgreSQL query optimization"
});

Understanding Hints

When semantic search returns no results, a hint field is included by default:

{
  "query": "...",
  "entries": [],
  "count": 0,
  "hint": "No entries matched your query above the similarity threshold."
}

Hint messages:

Empty index: "No entries in vector index. Use rebuild_vector_index to index existing entries."
No matches: "No entries matched your query above the similarity threshold."

Suppress hints (programmatic use):

semantic_search({
  query: "...",
  hint_on_empty: false, // Returns only query, entries, count
});
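
The hint behaviour can be summarized in a small sketch. The response fields and the two messages are taken from this section; the `buildResponse` function and its `indexSize` parameter are assumptions made for illustration.

```typescript
interface SearchResponse {
  query: string;
  entries: unknown[];
  count: number;
  hint?: string;
}

// Builds a response and attaches a hint only when there are no results
// and hints are enabled.
function buildResponse(
  query: string,
  entries: unknown[],
  indexSize: number,
  hintOnEmpty: boolean = true,
): SearchResponse {
  const response: SearchResponse = { query, entries, count: entries.length };
  if (entries.length === 0 && hintOnEmpty) {
    response.hint =
      indexSize === 0
        ? "No entries in vector index. Use rebuild_vector_index to index existing entries."
        : "No entries matched your query above the similarity threshold.";
  }
  return response;
}
```

With `hintOnEmpty` set to `false`, the returned object contains only `query`, `entries`, and `count`, which keeps programmatic consumers from having to handle an optional field.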

Technical Details

Model Information

all-MiniLM-L6-v2:

Source: SentenceTransformers library
Training: Fine-tuned on over 1 billion sentence pairs from multiple datasets, including Microsoft MS MARCO
Context window: 256 tokens (~200 words)
Output: 384-dimensional dense vector

Performance:

Inference speed: 50-100ms per entry
Memory: ~100MB loaded
Disk: ~80MB model file at full precision (the ~23MB figure under Vector Embeddings likely refers to a quantized build)

Embedding Generation

async function generateEmbedding(text: string): Promise<number[]> {
  await this.ensureInitialized(); // Lazy load model
  const result = await this.embedder(text, {
    pooling: "mean",
    normalize: true,
  });
  return Array.from(result.data); // Float32Array → number[]
}

Similarity Calculation

async function semanticSearch(
  query: string,
  limit: number,
  threshold: number,
): Promise<SearchResult[]> {
  // Generate query embedding
  const queryEmbedding = await generateEmbedding(query);

  // vectra nearest neighbor search
  const results = await this.index.queryItems(queryEmbedding, limit);

  // Filter by threshold (vectra returns score 0-1)
  return results.filter((r) => r.score >= threshold);
}

Model Alternatives

Current: all-MiniLM-L6-v2

Pros:

Fast inference
Good quality
Small size (80MB)
Low RAM usage

Cons:

256 token limit
English-focused

Future Options

Larger models (better quality, slower):

all-mpnet-base-v2 (420MB, 512 tokens)
all-roberta-large-v1 (1.3GB, 512 tokens)

Multilingual:

paraphrase-multilingual-MiniLM-L12-v2

Specialized:

Code-specific models
Domain-specific models

Best Practices

1. Use for discovery:

Finding related work
Rediscovering forgotten entries
Concept-based exploration

2. Use full-text for specifics:

Exact terms
Known keywords
Fast lookups

3. Combine with other searches:

Semantic → discover concepts
Date range → narrow time period
Full-text → specific entries

4. Adjust threshold as needed:

Start with default (0.3)
Increase for quality
Decrease for breadth

Next: Explore Export or Search Guide.
