Skip to content

Full Text Search

Temp edited this page Sep 23, 2025 · 1 revision

Full-Text Search (FTS5)

Last Updated: September 23, 2025 1:48 PM EST

The SQLite MCP Server provides comprehensive full-text search capabilities through its integrated FTS5 extension with dedicated management tools.


πŸ”§ Available FTS5 Tools

Tool Description
create_fts_table Create FTS5 virtual tables for full-text search
rebuild_fts_index Rebuild FTS5 indexes for optimal performance
fts_search Perform enhanced full-text search with ranking and snippets

πŸš€ FTS5 Management Tools

Create FTS5 Tables

create_fts_table({
  "table_name": "documents_fts",
  "columns": ["title", "content", "category"],
  "content_table": "documents",  // Optional: populate from existing table
  "tokenizer": "unicode61"       // Optional: unicode61, porter, ascii
})

Configuration Options:

  • Tokenizers: unicode61 (default), porter, ascii, trigram
  • Content Tables: Automatically populate from existing tables
  • Column Weights: Assign different importance to columns
  • Custom Stopwords: Define words to ignore during indexing

Enhanced Search with Ranking

fts_search({
  "table_name": "documents_fts",
  "query": "database optimization",
  "limit": 10,
  "snippet_length": 50,
  "highlight_tags": ["<mark>", "</mark>"]
})

Returns:

  • BM25 ranking scores
  • Highlighted snippets
  • Structured results with metadata
  • Relevance-ordered results

Rebuild Indexes for Performance

rebuild_fts_index({
  "table_name": "documents_fts",
  "optimize": true  // Optional: run OPTIMIZE after rebuild
})

When to Rebuild:

  • After bulk data imports
  • When search performance degrades
  • After schema changes
  • Periodic maintenance (monthly)

πŸ” Search Query Syntax

Basic Search Patterns

// Single term search
fts_search({
  "table_name": "documents_fts",
  "query": "database"
})

// Multiple terms (implicit AND)
fts_search({
  "table_name": "documents_fts",
  "query": "database optimization"
})

// Explicit boolean operators
fts_search({
  "table_name": "documents_fts",
  "query": "database AND optimization"
})

Advanced Search Techniques

Phrase Matching

// Exact phrase search
fts_search({
  "table_name": "documents_fts",
  "query": "\"machine learning algorithm\""
})

Prefix Matching

// Prefix search (matches "integrate", "integration", "integrating", etc.)
fts_search({
  "table_name": "documents_fts",
  "query": "integrat*"
})

Boolean Operations

// AND operation
fts_search({
  "table_name": "documents_fts",
  "query": "python AND machine AND learning"
})

// OR operation
fts_search({
  "table_name": "documents_fts",
  "query": "python OR java OR javascript"
})

// NOT operation (exclusion)
fts_search({
  "table_name": "documents_fts",
  "query": "database NOT mysql"
})

Complex Combinations

// Complex query with grouping
fts_search({
  "table_name": "documents_fts",
  "query": "(python OR java) AND \"web development\" NOT deprecated"
})

πŸ“Š Column-Specific Search

Weighted Column Search

// Search specific columns with weights
create_fts_table({
  "table_name": "articles_fts",
  "columns": ["title", "content", "tags"],
  "column_weights": {"title": 3.0, "content": 1.0, "tags": 2.0}
})

// Search prioritizes title matches
fts_search({
  "table_name": "articles_fts",
  "query": "machine learning"
})

Column-Targeted Search

// Search only in specific columns
fts_search({
  "table_name": "articles_fts",
  "query": "title:machine content:learning"
})

// Multiple column constraints
fts_search({
  "table_name": "articles_fts",
  "query": "title:(python OR java) content:tutorial"
})

🎯 Search Optimization

Performance Tuning

// Create optimized FTS table for large datasets
create_fts_table({
  "table_name": "large_documents_fts",
  "columns": ["title", "content"],
  "tokenizer": "porter",  // Better for English text
  "prefix_lengths": [2, 3, 4],  // Enable prefix search
  "content_table": "documents",
  "optimize_for": "query_speed"
})

Index Maintenance

// Regular maintenance routine
rebuild_fts_index({
  "table_name": "documents_fts",
  "vacuum": true,      // Reclaim space
  "optimize": true,    // Optimize index structure
  "analyze": true      // Update statistics
})

Memory Usage Optimization

// Configure for memory-constrained environments
create_fts_table({
  "table_name": "documents_fts",
  "columns": ["title", "content"],
  "tokenizer": "unicode61",
  "max_token_length": 40,    // Limit token size
  "enable_parentheses": false, // Disable complex queries
  "memory_limit": "100MB"    // Set memory limit
})

πŸ’‘ Real-World Use Cases

Document Management System

// 1. Create comprehensive document index
create_fts_table({
  "table_name": "documents_fts",
  "columns": ["title", "content", "author", "tags", "category"],
  "column_weights": {
    "title": 5.0,
    "tags": 3.0,
    "content": 1.0,
    "author": 2.0,
    "category": 2.0
  },
  "tokenizer": "porter"
})

// 2. Advanced document search
fts_search({
  "table_name": "documents_fts",
  "query": "title:(project AND management) OR tags:agile",
  "limit": 20,
  "snippet_length": 100,
  "highlight_tags": ["<strong>", "</strong>"]
})

// 3. Category-specific search
fts_search({
  "table_name": "documents_fts",
  "query": "category:technical AND (API OR documentation)",
  "sort_by": "rank DESC"
})

E-commerce Product Search

// 1. Product catalog FTS
create_fts_table({
  "table_name": "products_fts",
  "columns": ["name", "description", "brand", "category", "features"],
  "column_weights": {
    "name": 4.0,
    "brand": 3.0,
    "category": 2.5,
    "features": 2.0,
    "description": 1.0
  }
})

// 2. Smart product search
fts_search({
  "table_name": "products_fts",
  "query": "wireless AND (headphones OR earbuds) AND NOT gaming",
  "limit": 50,
  "include_suggestions": true
})

// 3. Brand and category filtering
fts_search({
  "table_name": "products_fts",
  "query": "brand:(apple OR samsung) AND category:smartphone",
  "snippet_length": 80
})

Knowledge Base Search

// 1. Multi-language knowledge base
create_fts_table({
  "table_name": "kb_articles_fts",
  "columns": ["title", "content", "keywords", "category"],
  "tokenizer": "unicode61",  // Better for international content
  "remove_diacritics": true
})

// 2. Contextual help search
fts_search({
  "table_name": "kb_articles_fts",
  "query": "\"how to\" AND (setup OR configuration OR install*)",
  "category_filter": "tutorials",
  "limit": 10
})

// 3. Related article suggestions
fts_search({
  "table_name": "kb_articles_fts",
  "query": "keywords:(troubleshooting AND database)",
  "exclude_ids": [123, 456],  // Exclude currently viewed articles
  "similarity_threshold": 0.7
})

πŸ”§ Manual SQL Integration

Advanced FTS5 Queries

-- Search with result highlighting and ranking
SELECT 
  id, 
  title,
  snippet(documents_fts, 1, '<em>', '</em>', '...', 15) AS content_snippet,
  bm25(documents_fts) AS relevance_score
FROM 
  documents_fts
WHERE 
  documents_fts MATCH 'database optimization'
ORDER BY 
  bm25(documents_fts)
LIMIT 10;

Custom Ranking Functions

-- Custom ranking with date boost
SELECT 
  d.id,
  d.title,
  d.created_date,
  (bm25(fts) + 
   CASE 
     WHEN d.created_date > date('now', '-30 days') THEN 2.0
     WHEN d.created_date > date('now', '-90 days') THEN 1.0
     ELSE 0.0
   END) AS boosted_score
FROM 
  documents d
JOIN 
  documents_fts fts ON d.id = fts.rowid
WHERE 
  fts MATCH 'machine learning'
ORDER BY 
  boosted_score DESC;

Faceted Search

-- Search with category facets
SELECT 
  category,
  COUNT(*) as count,
  GROUP_CONCAT(title, '; ') as sample_titles
FROM 
  documents d
JOIN 
  documents_fts fts ON d.id = fts.rowid
WHERE 
  fts MATCH 'artificial intelligence'
GROUP BY 
  category
ORDER BY 
  count DESC;

🎯 Best Practices

1. Choose the Right Tokenizer

// For English text with stemming
create_fts_table({
  "table_name": "english_docs_fts",
  "columns": ["content"],
  "tokenizer": "porter"  // Handles word variations
})

// For international content
create_fts_table({
  "table_name": "international_docs_fts",
  "columns": ["content"],
  "tokenizer": "unicode61",  // Better Unicode support
  "remove_diacritics": true
})

// For code and technical content
create_fts_table({
  "table_name": "code_docs_fts",
  "columns": ["content"],
  "tokenizer": "ascii"  // Preserves exact formatting
})

2. Optimize Query Performance

// Use specific terms instead of broad searches
fts_search({
  "table_name": "documents_fts",
  "query": "\"React hooks\" AND tutorial",  // Specific
  "limit": 20
})

// Avoid overly complex queries
fts_search({
  "table_name": "documents_fts",
  "query": "react*",  // Simple prefix search
  "limit": 50
})

3. Regular Maintenance

// Monthly maintenance routine
rebuild_fts_index({
  "table_name": "documents_fts",
  "full_rebuild": true,
  "optimize": true
})

// Check index statistics
read_query({
  "query": "SELECT * FROM documents_fts_stat WHERE key = 'size'"
})

4. Handle Special Characters

// Escape special FTS5 characters in user queries
function escapeFTSQuery(query) {
  return query.replace(/['"*()]/g, '\\$&');
}

// Use parameterized queries for user input
fts_search({
  "table_name": "documents_fts",
  "query": escapeFTSQuery(userInput),
  "safe_mode": true
})

πŸ“š Related Pages


πŸ” Search Tip: FTS5 is most effective when combined with proper text preprocessing and regular index maintenance. Consider using it alongside semantic search for comprehensive search capabilities.

Clone this wiki locally