# Full-Text Search

Vector search finds content by **meaning** — it understands that "thermal management" and "cooling system" are related concepts. But what about searches where you need **exact term matching**? For example:

- Searching for a specific component ID like "HVB_3900"
- Finding requirements that mention a specific standard like "ISO 26262"
- Looking for fuzzy matches when you're not sure of the exact spelling

Full-text search complements vector search by providing keyword-based retrieval with support for fuzzy matching, wildcards, and boolean operators.

**Prerequisites:** Complete [02 Embeddings](02_embeddings.ipynb) first to populate the graph with requirement chunks.

**Learning Objectives:**
- Create full-text indexes in Neo4j
- Perform basic, fuzzy, wildcard, and boolean full-text searches
- Combine full-text search with graph traversal
- Understand when to use full-text vs. vector search

## Install Dependencies

First, install the required packages. This only needs to be run once per session.

In [None]:
# Install neo4j-graphrag with Bedrock support
%pip install "neo4j-graphrag[bedrock] @ git+https://github.com/neo4j-partners/neo4j-graphrag-python.git@bedrock-embeddings" python-dotenv pydantic-settings nest-asyncio -q

In [None]:
from data_utils import Neo4jConnection

## Connect to Neo4j

Create and verify the connection to your Neo4j graph database.

In [None]:
neo4j = Neo4jConnection().verify()
driver = neo4j.driver

## What is Full-Text Search?

Neo4j supports **full-text indexes** powered by Apache Lucene. Unlike vector indexes that compare embedding similarity, full-text indexes work with the actual text:

| Feature | Vector Search | Full-Text Search |
|---------|--------------|------------------|
| Matches by | Semantic meaning | Keywords/terms |
| Good for | "Find similar concepts" | "Find exact terms" |
| Fuzzy matching | Implicit (embeddings) | Explicit (~ operator) |
| Wildcards | Not supported | Supported (*, ?) |
| Boolean logic | Not supported | AND, OR, NOT |
| Query-time cost | Query must be embedded first | Direct index lookup on terms |

Full-text search is ideal when you know specific terms, IDs, or standards you're looking for.

## Create Full-Text Indexes

We'll create two full-text indexes:
1. **`requirement_text`** — On `Chunk.text` for searching requirement description content
2. **`search_entities`** — On `Component.name`, `Component.description`, `Requirement.name`, and `Requirement.description` for searching structured entity data

Full-text indexes automatically tokenize and index text, enabling fast keyword searches.
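
To make "tokenize and index" concrete, here is a minimal pure-Python sketch of an inverted index. The `tokenize` helper is only a rough stand-in for Lucene's analyzer (it lowercases and splits on non-word characters), and the sample chunk texts are invented for illustration. Neo4j's real analysis pipeline is more involved.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Rough analyzer approximation: lowercase, then split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


# Hypothetical chunk texts, not taken from the notebook's dataset.
docs = {
    "chunk-1": "The HVB_3900 battery pack requires thermal monitoring.",
    "chunk-2": "Cooling system shall comply with ISO 26262.",
}
index = build_inverted_index(docs)
print(sorted(index["thermal"]))   # ['chunk-1']
print(sorted(index["hvb_3900"]))  # ['chunk-1']
```

A keyword lookup then becomes a dictionary access on the token rather than a scan over every chunk, which is the idea behind the fast searches described above.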

In [None]:
with driver.session() as session:
    # Drop existing indexes so the notebook can be re-run from a clean state.
    # IF EXISTS makes this a no-op when the index is absent, so no try/except is needed.
    for idx_name in ['requirement_text', 'search_entities']:
        session.run(f"DROP INDEX {idx_name} IF EXISTS")
        print(f"Dropped index if present: {idx_name}")

    # Create full-text index on Chunk text
    session.run("""
        CREATE FULLTEXT INDEX requirement_text IF NOT EXISTS
        FOR (c:Chunk) ON EACH [c.text]
    """)
    print("Created full-text index: requirement_text")

    # Create full-text index on entity names and descriptions
    session.run("""
        CREATE FULLTEXT INDEX search_entities IF NOT EXISTS
        FOR (n:Component|Requirement) ON EACH [n.name, n.description]
    """)
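    print("Created full-text index: search_entities")

    # Index population runs in the background; wait until the indexes are online
    # so the searches below do not run against a partially built index.
    session.run("CALL db.awaitIndexes(300)").consume()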

## Basic Full-Text Search

The simplest form of full-text search finds nodes containing specific terms. Use `db.index.fulltext.queryNodes()` with the index name and search query.

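Each result carries a Lucene relevance **score** (higher means a better match). Scores are meaningful for ranking results within one query; they are not normalized, so avoid comparing them across different queries or indexes.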
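
Because the query string is parsed as Lucene query syntax, characters such as `-`, `:`, or `*` in raw user input can change the meaning of a search or cause a parse error. The sketch below shows one way to neutralize them before calling `db.index.fulltext.queryNodes()`. The helper name `escape_lucene` is hypothetical, and the character set is an assumption based on Lucene's classic query parser.

```python
import re

# Characters assumed to be special in Lucene's classic query syntax.
_LUCENE_SPECIAL = re.compile(r'([\\+\-!():^\[\]"{}~*?|&/])')


def escape_lucene(text: str) -> str:
    """Prefix every special character with a backslash so it is matched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene("ISO 26262-6:2018"))  # ISO 26262\-6\:2018
print(escape_lucene("HVB_3900"))          # HVB_3900 (underscore is not special)
```

Use a helper like this only for literal lookups. When you want fuzzy, wildcard, or boolean behavior, pass the operators through unescaped.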

In [None]:
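def fulltext_search(driver, index_name, query, top_k=5):
    """Run a full-text query against the named index and return the top_k records."""
    with driver.session() as session:
        # Pass parameters as a dict: session.run()'s first positional argument is
        # itself named `query`, so a `query=` keyword argument raises a TypeError.
        result = session.run("""
            CALL db.index.fulltext.queryNodes($index, $query)
            YIELD node, score
            RETURN node, score, labels(node) AS labels
            LIMIT $limit
        """, {"index": index_name, "query": query, "limit": top_k})
        return list(result)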


# Basic search: find chunks mentioning "thermal"
query = "thermal"
print(f'Search: "{query}"')
print("=" * 60)

results = fulltext_search(driver, 'requirement_text', query)
for i, record in enumerate(results):
    node = record['node']
    text = node.get('text', node.get('name', 'N/A'))
    print(f"\n[{i+1}] Score: {record['score']:.4f}")
    print(f"    {str(text)[:150]}...")

## Fuzzy Matching

Fuzzy search finds terms that are **similar** to your query, tolerating typos and spelling variations. Add `~` after a term to enable fuzzy matching. You can optionally append a maximum edit distance of 0, 1, or 2 (the default is 2).

```
battrey~     → matches "battery" (1 edit: transposition)
coling~1     → matches "cooling" (1 edit distance)
```

This is useful for catching misspellings and spelling variants (for example "modelling" vs. "modeling"). It does not bridge genuinely different terminology; that is the job of vector search.
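
To see why `battrey` is one edit away from `battery`, the following sketch computes an edit distance that counts insertions, deletions, substitutions, and swaps of adjacent characters (the "optimal string alignment" variant). It is an illustration of the distance measure only; Lucene evaluates fuzzy queries with a different, automaton-based implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance where swapping two adjacent characters costs one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "re" <-> "er"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


for typo, word in [("battrey", "battery"), ("coling", "cooling"), ("safty", "safety")]:
    print(f"{typo} -> {word}: {edit_distance(typo, word)} edit(s)")
```

All three pairs are one edit apart, so each typo would still match under the tighter `~1` setting.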

In [None]:
# Fuzzy search: tolerates typos
queries = [
    ("battrey~", "Fuzzy match for 'battery' (typo)"),
    ("coling~1", "Fuzzy match for 'cooling' (1 edit)"),
    ("safty~", "Fuzzy match for 'safety' (typo)"),
]

for query, description in queries:
    print(f'\nSearch: "{query}" — {description}')
    print("-" * 50)
    results = fulltext_search(driver, 'requirement_text', query, top_k=2)
    if results:
        for record in results:
            text = record['node'].get('text', 'N/A')
            print(f"  Score: {record['score']:.4f} | {str(text)[:120]}...")
    else:
        print("  No results")

## Wildcard Search

Wildcard search uses `*` (multiple characters) and `?` (single character) to match patterns:

```
therm*       → matches "thermal", "thermistor", "thermally"
cool?ng      → matches "cooling"
```

This is particularly useful for finding terms with common prefixes or suffixes.
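
The pattern semantics can be tried out locally with Python's standard `fnmatch` module, which uses the same `*` and `?` meanings. This is only an analogy for experimenting with patterns: `fnmatch` also treats `[...]` as a character set, which Lucene wildcards do not, and the token list here is invented and assumed to be lowercased already.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def wildcard_matches(pattern: str, tokens: list[str]) -> list[str]:
    """Return the tokens that match a * / ? wildcard pattern (case-insensitive)."""
    return [t for t in tokens if fnmatchcase(t, pattern.lower())]


# Hypothetical indexed tokens.
tokens = ["thermal", "thermistor", "thermally", "cooling", "voltage"]
print(wildcard_matches("therm*", tokens))   # ['thermal', 'thermistor', 'thermally']
print(wildcard_matches("cool?ng", tokens))  # ['cooling']
```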

In [None]:
# Wildcard search
queries = [
    ("therm*", "Matches thermal, thermistor, etc."),
    ("volt*", "Matches voltage, high-voltage, etc."),
    ("monitor*", "Matches monitoring, monitors, etc."),
]

for query, description in queries:
    print(f'\nSearch: "{query}" — {description}')
    print("-" * 50)
    results = fulltext_search(driver, 'requirement_text', query, top_k=2)
    if results:
        for record in results:
            text = record['node'].get('text', 'N/A')
            print(f"  Score: {record['score']:.4f} | {str(text)[:120]}...")
    else:
        print("  No results")

## Boolean Search

Boolean operators let you combine terms for precise searches:

| Operator | Meaning | Example |
|----------|---------|--------|
| `AND` | Both terms must appear | `thermal AND management` |
| `OR` | Either term can appear | `cooling OR thermal` |
| `NOT` | Exclude a term | `battery NOT module` |
| `"..."` | Exact phrase | `"energy density"` |

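Operators must be written in uppercase; a lowercase `and` or `not` is treated as an ordinary search term. Without any operator, multiple terms are combined with OR, so `thermal management` matches chunks containing either word.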
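
When query strings are assembled in code rather than typed by hand, a small builder keeps the operator placement and phrase quoting consistent. The function below is a hypothetical helper, not part of Neo4j or neo4j-graphrag, and it assumes the individual terms contain no special characters.

```python
def _quote(term: str) -> str:
    """Wrap multi-word terms in quotes so they are searched as an exact phrase."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(all_of=(), any_of=(), none_of=()) -> str:
    """Combine terms into a query string using AND, OR, and NOT."""
    parts = []
    if all_of:
        parts.append("(" + " AND ".join(_quote(t) for t in all_of) + ")")
    if any_of:
        parts.append("(" + " OR ".join(_quote(t) for t in any_of) + ")")
    if not parts:
        # A query made only of exclusions has nothing to match against.
        raise ValueError("at least one term is required in all_of or any_of")
    query = " AND ".join(parts)
    for term in none_of:
        query += f" NOT {_quote(term)}"
    return query


print(build_boolean_query(all_of=["thermal", "management"]))
# (thermal AND management)
print(build_boolean_query(any_of=["cooling", "thermal"], none_of=["module"]))
# (cooling OR thermal) NOT module
```

The resulting string can be passed as the `query` argument of `fulltext_search()`.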

In [None]:
# Boolean search examples
queries = [
    ('thermal AND management', "Both terms must appear"),
    ('cooling OR thermal', "Either term matches"),
    ('battery NOT module', "Battery but not module"),
    ('"energy density"', "Exact phrase match"),
]

for query, description in queries:
    print(f'\nSearch: "{query}" — {description}')
    print("-" * 50)
    results = fulltext_search(driver, 'requirement_text', query, top_k=2)
    if results:
        for record in results:
            text = record['node'].get('text', 'N/A')
            print(f"  Score: {record['score']:.4f} | {str(text)[:120]}...")
    else:
        print("  No results")

## Searching Entity Names

The `search_entities` index searches across Component and Requirement names and descriptions. This is useful for finding specific entities by name or partial match.

In [None]:
# Search entity names and descriptions
queries = [
    "HVB*",
    "battery",
    "powertrain OR chassis",
]

for query in queries:
    print(f'\nEntity search: "{query}"')
    print("-" * 50)
    results = fulltext_search(driver, 'search_entities', query, top_k=3)
    for record in results:
        node = record['node']
        labels = record['labels']
        name = node.get('name', 'N/A')
        desc = node.get('description', '')
        print(f"  [{labels[0]}] {name}: {desc[:80]}  (score: {record['score']:.4f})")

## Combining Full-Text Search with Graph Traversal

The real power comes from combining full-text search with graph traversal. After finding chunks via keyword search, you can traverse to the parent Requirement and Component — just like with vector search.

This enables queries like: "Find all chunks mentioning 'coolant', then show which components and requirements they belong to."
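
Once the traversal returns flat rows, it is often handy to regroup them by component on the Python side. The sketch below works on plain dictionaries whose keys mirror the aliases used in this notebook's traversal query (`component`, `requirement`, `score`); the sample rows and the helper name are made up for illustration.

```python
from collections import defaultdict


def group_by_component(rows):
    """Group (requirement, score) pairs under their component name."""
    grouped = defaultdict(list)
    for row in rows:
        # An OPTIONAL MATCH can leave the component as None.
        component = row["component"] or "(no component)"
        grouped[component].append((row["requirement"], row["score"]))
    return dict(grouped)


# Hypothetical rows shaped like the traversal query's output.
rows = [
    {"component": "Cooling Loop", "requirement": "REQ-A", "score": 2.1},
    {"component": "Cooling Loop", "requirement": "REQ-B", "score": 1.7},
    {"component": None, "requirement": "REQ-C", "score": 1.2},
]
for component, hits in group_by_component(rows).items():
    print(component, hits)
```

With the Neo4j driver, each record can be converted to such a dictionary with `record.data()` before grouping.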

In [None]:
# Full-text search with graph traversal
with driver.session() as session:
    result = session.run("""
        CALL db.index.fulltext.queryNodes('requirement_text', $query)
        YIELD node, score
        MATCH (node)<-[:HAS_CHUNK]-(req:Requirement)
        OPTIONAL MATCH (comp:Component)-[:COMPONENT_HAS_REQ]->(req)
        RETURN 
            node.text AS chunk_text,
            score,
            req.name AS requirement,
            comp.name AS component,
            comp.description AS component_description
        ORDER BY score DESC
        LIMIT 5
    """, {"query": "coolant OR cooling"})  # dict form: a `query=` keyword would clash with session.run()'s own `query` argument

    print('Search: "coolant OR cooling" with graph traversal')
    print("=" * 60)
    for record in result:
        print(f"\nComponent: {record['component']} ({record['component_description']})")
        print(f"Requirement: {record['requirement']}")
        print(f"Score: {record['score']:.4f}")
        print(f"Text: {record['chunk_text'][:150]}...")

## When to Use Full-Text vs. Vector Search

| Use Case | Best Approach |
|----------|---------------|
| "Find requirements about cooling" | **Vector search** — semantic understanding |
| "Find chunks containing 'HVB_3900'" | **Full-text search** — exact ID match |
| "Requirements mentioning ISO 26262" | **Full-text search** — specific standard |
| "What are the thermal management specs?" | **Vector search** — conceptual query |
| "Find 'battery' but not 'module'" | **Full-text search** — boolean filtering |
| "Requirements similar to energy density" | **Vector search** — similarity |
| "Find misspelled 'battrey'" | **Full-text search** — fuzzy matching |
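
The table above can be approximated in code with a few heuristics. The function below is a hypothetical sketch, not something the notebook or neo4j-graphrag provides: it suggests full-text search when the input looks like an identifier, uses boolean operators, or contains quote, fuzzy, or wildcard syntax, and falls back to vector search otherwise. The regular expressions are assumptions that would need tuning for a real dataset.

```python
import re

_ID_LIKE = re.compile(r"\b[A-Z]{2,}[ _-]?\d+\b")  # e.g. HVB_3900, ISO 26262
_OPERATORS = re.compile(r"\b(AND|OR|NOT)\b")      # uppercase boolean operators
_SYNTAX = re.compile(r'"|\w[~*]|\w\?\w')          # phrase, fuzzy, or wildcard syntax


def suggest_search_mode(query: str) -> str:
    """Return 'full-text' for keyword-style input, otherwise 'vector'."""
    if _ID_LIKE.search(query) or _OPERATORS.search(query) or _SYNTAX.search(query):
        return "full-text"
    return "vector"


for q in ["HVB_3900", "battery NOT module", "battrey~",
          "What are the thermal management specs?"]:
    print(f"{q!r}: {suggest_search_mode(q)}")
```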

In practice, the best results often come from **combining both approaches** — which is exactly what the HybridRetriever does in the next notebook.

## Summary

In this notebook, you learned how full-text search complements vector search:

1. **Full-text indexes**: Created `requirement_text` (on chunk content) and `search_entities` (on component and requirement names and descriptions) for keyword-based retrieval.

2. **Search patterns**: Basic term matching, fuzzy matching (`~`) for typos, wildcards (`*`, `?`) for partial terms, and boolean operators (AND, OR, NOT) for precise filtering.

3. **Graph traversal**: Combined full-text search results with graph traversal to find parent requirements and components, just like with vector search.

4. **Complementary strengths**: Vector search excels at semantic similarity; full-text search excels at exact terms, IDs, standards, and boolean filtering.

**Key takeaway:** Full-text search is your go-to when you know the specific terms you're looking for. Vector search is better when you're searching by concept or meaning. The next notebook combines both for the best of both worlds.

---

**Next:** [Hybrid Search](06_hybrid_search.ipynb)

In [None]:
# Cleanup
neo4j.close()