# Fulltext Search

This notebook demonstrates how to use Neo4j fulltext indexes for keyword-based search across entities like Company, Product, and RiskFactor.

Fulltext search complements vector search by enabling:
- **Exact keyword matching** - Find entities by specific terms
- **Fuzzy matching** - Handle typos and variations
- **Boolean operators** - Combine search terms with AND, OR, NOT
- **Wildcard search** - Match partial terms

---

Import the required Python modules and set up the Neo4j connection.

In [None]:
import sys
sys.path.insert(0, '../new-workshops/solutions')

from neo4j import GraphDatabase
from config import Neo4jConfig

Create and verify the connection to your Neo4j graph database.

In [None]:
neo4j_config = Neo4jConfig()
driver = GraphDatabase.driver(
    neo4j_config.uri,
    auth=(neo4j_config.username, neo4j_config.password),
)
driver.verify_connectivity()
print("Connected to Neo4j")

## Verify Fulltext Index

Before running fulltext queries, verify that the `search_entities` index exists.

> **Note:** If the index doesn't exist, run:
> ```bash
> uv run python scripts/restore_neo4j.py --full-text
> ```
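
If you would rather create the index from the notebook, you can issue a `CREATE FULLTEXT INDEX` statement directly. The sketch below assumes a definition inferred from this notebook (the `name` property of `Company`, `Product`, and `RiskFactor` nodes). The restore script may configure the index differently, for example with another analyzer. `ensure_search_index` is a hypothetical helper. It calls `db.awaitIndex` because a newly created index is populated in the background and is not queryable until it is `ONLINE`.

```python
# Hypothetical helper. The index definition below is an assumption and may
# differ from what scripts/restore_neo4j.py creates.
CREATE_SEARCH_ENTITIES_INDEX = """
CREATE FULLTEXT INDEX search_entities IF NOT EXISTS
FOR (n:Company|Product|RiskFactor) ON EACH [n.name]
"""


def ensure_search_index(driver, timeout_seconds: int = 300) -> None:
    """Create `search_entities` if it is missing, then wait until it is ONLINE.

    `driver` is a neo4j Driver, such as the one created earlier in this notebook.
    """
    with driver.session() as session:
        # IF NOT EXISTS makes this a no-op when an index with this name exists.
        session.run(CREATE_SEARCH_ENTITIES_INDEX).consume()
        # Block until the index finishes populating or the timeout expires.
        session.run(
            "CALL db.awaitIndex('search_entities', $timeout)",
            timeout=timeout_seconds,
        ).consume()
```

Calling `ensure_search_index(driver)` after the connection cell leaves an existing index untouched. Because of `IF NOT EXISTS`, it will not replace an index that has the same name but a different definition.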

In [None]:
# List fulltext indexes and confirm that 'search_entities' is among them
with driver.session() as session:
    result = session.run("""
        SHOW FULLTEXT INDEXES
        YIELD name, labelsOrTypes, properties, state
        RETURN name, labelsOrTypes, properties, state
    """)
    indexes = list(result)

    if indexes:
        print("Fulltext indexes found:")
        for idx in indexes:
            print(f"  - {idx['name']}: {idx['labelsOrTypes']} on {idx['properties']} ({idx['state']})")

    if not any(idx['name'] == 'search_entities' for idx in indexes):
        print("Index 'search_entities' not found. Run: uv run python scripts/restore_neo4j.py --full-text")

## Basic Fulltext Search

Search for entities by keyword. The fulltext index searches across Company, Product, and RiskFactor names.

The `db.index.fulltext.queryNodes` procedure returns:
- `node` - The matched node
- `score` - Lucene relevance score (higher = better match)
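
The `$term` parameter is parsed with Lucene query syntax, so it is not searched literally. A colon marks a field name, parentheses group clauses, a leading `-` excludes a term, and an unbalanced `(` or `"` can make the query fail to parse. When passing user-supplied names, you may want to escape these characters first. The sketch below is modeled on the character set that Lucene's own `QueryParser.escape` handles. `escape_lucene` is a hypothetical helper, not a Neo4j API.

```python
import re

# Characters that Lucene's classic query parser treats as syntax (the set
# Lucene's QueryParser.escape handles). Whitespace is left alone, so a
# multi-word input still becomes several terms.
_LUCENE_SPECIAL = re.compile(r'([+\-!():^\[\]"{}~*?|&/\\])')


def escape_lucene(text: str) -> str:
    """Backslash-escape Lucene syntax characters so they are matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)
```

Pass the escaped value as the parameter, for example `session.run(query, term=escape_lucene(user_input))`. Escaping also disables the fuzzy, wildcard, and `+`/`-` operators covered in the following sections, so apply it only to input that should be matched literally. Uppercase `AND`, `OR`, and `NOT` are words rather than symbols, so this helper leaves them as operators.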

In [None]:
# Basic keyword search
search_term = "Apple"

with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term)
        YIELD node, score
        RETURN labels(node) AS labels, node.name AS name, score
        LIMIT 10
    """, term=search_term)
    
    print(f"Search results for '{search_term}':")
    for record in result:
        print(f"  [{record['labels'][0]}] {record['name']} (score: {record['score']:.4f})")

## Fuzzy Search

Use the `~` operator for fuzzy matching, which handles typos and spelling variations.

You can optionally specify the maximum edit distance (0, 1, or 2) after the tilde, as in `Aplle~1`. If you omit it, Lucene uses 2.
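
To build intuition for what `~` matches, the sketch below computes an edit distance in plain Python and builds a fuzzy query string. It uses the optimal string alignment variant of Damerau-Levenshtein distance, where swapping two adjacent characters counts as one edit. Lucene's fuzzy queries also count a transposition as a single edit by default, but this is a local approximation, not the index's own matching code. Both function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (case-insensitive).

    Insertions, deletions, substitutions, and swaps of adjacent characters
    each count as one edit.
    """
    a, b = a.lower(), b.lower()
    # d[i][j] = distance between the first i chars of a and the first j of b
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def fuzzy_query(text: str, max_edits: int = 2) -> str:
    """Append `~max_edits` to every whitespace-separated word of `text`."""
    if max_edits not in (0, 1, 2):
        raise ValueError("Lucene supports fuzzy edit distances of 0, 1, or 2")
    return " ".join(f"{word}~{max_edits}" for word in text.split())
```

`osa_distance("Aplle", "Apple")` is 1, so, assuming the index lowercases terms as Neo4j's default analyzer does, `fuzzy_query("Aplle", 1)` (which yields `Aplle~1`) should already reach the indexed term `apple`.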

In [None]:
# Fuzzy search - handles typos
fuzzy_term = "Aplle~"  # Intentional typo

with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term)
        YIELD node, score
        RETURN labels(node) AS labels, node.name AS name, score
        LIMIT 5
    """, term=fuzzy_term)
    
    print(f"Fuzzy search results for '{fuzzy_term}':")
    for record in result:
        print(f"  [{record['labels'][0]}] {record['name']} (score: {record['score']:.4f})")

## Wildcard Search

Use `*` to match zero or more characters and `?` to match exactly one character.

This is useful for prefix matching or finding variations of a term.
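
Wildcards apply to individual indexed terms rather than to whole names. The sketch below approximates that behavior locally with Python's `fnmatch` module. It assumes the index lowercases names and splits them into words, as a standard analyzer does. It is an illustration only: `fnmatch` also treats `[...]` as a character class, which Lucene wildcards do not, and `approx_wildcard_match` is a hypothetical helper.

```python
import fnmatch
import re


def approx_wildcard_match(pattern: str, name: str) -> bool:
    """Rough local approximation of a single-term Lucene wildcard query.

    The name is lowercased and split into word tokens, and the pattern is
    tested against each token. Note: fnmatch also interprets [...] as a
    character class, which Lucene wildcards do not.
    """
    tokens = re.findall(r"\w+", name.lower())
    return any(fnmatch.fnmatchcase(token, pattern.lower()) for token in tokens)
```

Because `*` can match zero characters, `Micro*` also matches the standalone word `Micro` in a name such as `Advanced Micro Devices`, while `Micro?` does not.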

In [None]:
# Wildcard search - prefix matching
wildcard_term = "Micro*"  # Matches Microsoft, Microservices, etc.

with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term)
        YIELD node, score
        RETURN labels(node) AS labels, node.name AS name, score
        LIMIT 10
    """, term=wildcard_term)
    
    print(f"Wildcard search results for '{wildcard_term}':")
    for record in result:
        print(f"  [{record['labels'][0]}] {record['name']} (score: {record['score']:.4f})")

## Boolean Operators

Combine search terms using:
- `AND` - Both terms must match
- `OR` - Either term may match (space-separated terms are combined this way by default)
- `NOT` - Exclude the following term
- `+` - The prefixed term must be present
- `-` - The prefixed term must be absent
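
When the terms come from application code, building the query string programmatically keeps the operators consistent. The sketch below is a hypothetical `boolean_query` helper. It uses the `+` and `-` prefixes from the list above and quotes multi-word entries so they are matched as phrases. It assumes the terms contain no other Lucene special characters.

```python
def boolean_query(must=(), should=(), must_not=()) -> str:
    """Build a Lucene query string from required, optional, and excluded terms.

    Multi-word entries are wrapped in double quotes so they match as phrases.
    Terms are assumed to be free of other Lucene special characters.
    """
    def quote(term: str) -> str:
        return f'"{term}"' if " " in term else term

    parts = [f"+{quote(t)}" for t in must]      # required
    parts += [quote(t) for t in should]         # optional
    parts += [f"-{quote(t)}" for t in must_not]  # excluded
    return " ".join(parts)
```

`boolean_query(must=["supply"], must_not=["chain"])` yields `+supply -chain`, which matches the same entities as `supply NOT chain`. With the parser's default OR behavior, `boolean_query(should=["Apple", "Microsoft"])` acts like `Apple OR Microsoft`. A Lucene query made only of excluded terms (for example `-chain` alone) matches nothing, so include at least one required or optional term.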

In [None]:
# Boolean search - entity names (any label) containing 'supply' but not 'chain'
boolean_term = "supply NOT chain"

with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term)
        YIELD node, score
        RETURN labels(node) AS labels, node.name AS name, score
        LIMIT 10
    """, term=boolean_term)
    
    print(f"Boolean search results for '{boolean_term}':")
    for record in result:
        print(f"  [{record['labels'][0]}] {record['name']} (score: {record['score']:.4f})")

## Combining Fulltext Search with Graph Traversal

The real power of fulltext search in Neo4j comes from combining keyword matching with graph traversal.

This example finds the best-matching company by name. It then traverses to the documents that company filed and to up to five of the risk factors it faces.
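
The traversal cell in this section hard-codes the `Company` label in its `WHERE` clause. Passing the label as a query parameter lets one query serve Company, Product, or RiskFactor lookups. The sketch below only builds the query and its parameters, so it runs without a database. `label_filtered_search` is a hypothetical helper, and the traversal part is left out.

```python
# Hypothetical, parameterized variant of the label filter used in this section.
LABEL_FILTERED_SEARCH = """
CALL db.index.fulltext.queryNodes('search_entities', $term)
YIELD node, score
WHERE $label IN labels(node)
RETURN node.name AS name, labels(node) AS labels, score
LIMIT $limit
"""

# Labels this notebook says the index covers; guards against typos.
ALLOWED_LABELS = {"Company", "Product", "RiskFactor"}


def label_filtered_search(term: str, label: str, limit: int = 1):
    """Return (cypher, params) for the best `limit` matches carrying `label`."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label must be one of {sorted(ALLOWED_LABELS)}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return LABEL_FILTERED_SEARCH, {"term": term, "label": label, "limit": limit}
```

Run the result with `session.run(query, **params)` inside a session, as in the other cells. Because the label is sent as a parameter rather than spliced into the Cypher text, the query plan can be reused across labels.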

In [None]:
# Find company and traverse to related data
company_search = "Nvidia"

with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term)
        YIELD node, score
        WHERE 'Company' IN labels(node)
        WITH node AS company, score
        LIMIT 1
        
        // Get documents filed by company
        OPTIONAL MATCH (company)-[:FILED]->(doc:Document)
        WITH company, score, COLLECT(DISTINCT doc.path) AS documents
        
        // Get risk factors the company faces
        OPTIONAL MATCH (company)-[:FACES_RISK]->(risk:RiskFactor)
        WITH company, score, documents, COLLECT(DISTINCT risk.name)[0..5] AS risks
        
        RETURN company.name AS company, score, documents, risks
    """, term=company_search)
    
    record = result.single()
    if record:
        print(f"Company: {record['company']} (score: {record['score']:.4f})")
        print("\nRelated Documents:")
        for doc in record['documents']:
            print(f"  - {doc}")
        print("\nRisk Factors:")
        for risk in record['risks']:
            print(f"  - {risk}")
    else:
        print(f"No company found for '{company_search}'")

## Search Options

The fulltext query procedure accepts an options map for:
- `skip` - Skip first N results (pagination)
- `limit` - Limit number of results
- `analyzer` - Use a different analyzer for this query
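
For pages beyond the second, computing `skip` from a page number avoids off-by-one mistakes. The sketch below is a hypothetical `page_options` helper that uses 1-based page numbers. It returns an options map that can be passed as a query parameter instead of a literal map.

```python
def page_options(page: int, page_size: int = 5) -> dict:
    """Return the fulltext options map for a 1-based page of results."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"skip": (page - 1) * page_size, "limit": page_size}
```

Calling `db.index.fulltext.queryNodes('search_entities', $term, $options)` with `options=page_options(2)` requests the same page as the cell in this section. Results are ordered by score, so pages stay consistent only while the indexed data does not change between requests.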

In [None]:
# Paginated search using options
search_term = "risk"

with driver.session() as session:
    # Get page 2 (skip first 5, get next 5)
    result = session.run("""
        CALL db.index.fulltext.queryNodes('search_entities', $term, {skip: 5, limit: 5})
        YIELD node, score
        RETURN labels(node) AS labels, node.name AS name, score
    """, term=search_term)
    
    print(f"Search results for '{search_term}' (page 2):")
    for record in result:
        print(f"  [{record['labels'][0]}] {record['name']} (score: {record['score']:.4f})")

## Hybrid Search Pattern

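Combine fulltext search with vector search for better retrieval:
1. Use fulltext search to find relevant entities by keyword.
2. Use those entities to restrict which chunks the vector search ranks.

This pattern pairs the precision of keyword matching with the semantic understanding of vector search. The cell in this section covers the keyword step: it finds a company by name and returns the chunks that company was extracted from, which form the candidate set for the vector step.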
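
Step 2 can be sketched in plain Python as a cosine-similarity ranking over the keyword-filtered candidates. This assumes each chunk has an embedding available (for example a hypothetical `chunk.embedding` property returned alongside `chunk.text`) and that the question was embedded with the same model. `cosine` and `rank_candidates` are hypothetical helpers.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, candidates, k=3):
    """Rank keyword-filtered chunks by similarity to the query embedding.

    `candidates` is a list of (chunk_text, embedding) pairs, e.g. the chunks
    found through the fulltext + FROM_CHUNK query. Returns the top `k` as
    (score, chunk_text) pairs, highest score first.
    """
    scored = [(cosine(query_vec, emb), text) for text, emb in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Ranking client-side keeps the sketch independent of any particular vector index. With many candidates, it is usually cheaper to compute the similarity inside the Cypher query, if your Neo4j version provides a vector similarity function.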

In [None]:
# Keyword step of the hybrid pattern: find chunks from which a matching company was extracted
keyword = "Amazon"

with driver.session() as session:
    result = session.run("""
        // First: Find companies matching keyword
        CALL db.index.fulltext.queryNodes('search_entities', $keyword)
        YIELD node AS entity, score AS keyword_score
        WHERE 'Company' IN labels(entity)
        WITH entity, keyword_score
        LIMIT 1
        
        // Second: Get chunks that this company was extracted from
        MATCH (entity)-[:FROM_CHUNK]->(chunk:Chunk)
        
        // Return chunks with company context
        RETURN entity.name AS company, 
               keyword_score,
               chunk.text AS text
        LIMIT 5
    """, keyword=keyword)
    
    print(f"Chunks from which the '{keyword}' company was extracted:")
    for record in result:
        print(f"\n[{record['company']}] (keyword score: {record['keyword_score']:.4f})")
        print(f"  {record['text'][:200]}...")

## Summary

Fulltext search in Neo4j provides:

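| Feature | Syntax | Example |
|---------|--------|---------|
| Basic search | `term` | `Apple` |
| Fuzzy search | `term~` | `Aplle~` |
| Fuzzy search with max edits | `term~N` (N = 0, 1, or 2) | `Aplle~1` |
| Wildcard (any characters) | `term*` | `Micro*` |
| Wildcard (one character) | `te?m` | `Ama?on` |
| Boolean AND | `term1 AND term2` | `supply AND chain` |
| Boolean OR | `term1 OR term2` | `Apple OR Microsoft` |
| Boolean NOT | `term1 NOT term2` | `risk NOT financial` |
| Required / excluded | `+term` / `-term` | `+supply -chain` |
| Phrase | `"term1 term2"` | `"supply chain"` |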

**When to use fulltext vs vector search:**
- **Fulltext**: Known entity names, exact terms, filtering
- **Vector**: Semantic similarity, concept matching, questions
- **Hybrid**: Combine both for best results

---

[View the complete code](../solutions/05_01_fulltext_search.py)

[Return to Vector Retriever](02_01_vector_retriever.ipynb)

In [None]:
# Cleanup
driver.close()