# Hybrid Search with HybridRetriever

In the previous notebooks, you used **vector search** (semantic similarity) and **full-text search** (keyword matching) separately. Each has strengths:

- **Vector search** finds content by meaning — "thermal management" matches "cooling system"
- **Full-text search** finds exact terms — "HVB_3900" matches only that specific ID

The `HybridRetriever` from neo4j-graphrag **combines both approaches** in a single query, so one retriever can serve both conceptual questions and exact-term lookups. It merges results from a vector index and a full-text index; with the linear ranker, an **alpha** parameter controls the balance between them.

**Prerequisites:** Complete [02 Embeddings](02_embeddings.ipynb) and [05 Full-Text Search](05_fulltext_search.ipynb) first.

**Learning Objectives:**
- Use `HybridRetriever` to combine vector and full-text search
- Tune the `alpha` parameter to balance semantic vs. keyword matching
- Use `HybridCypherRetriever` for graph-enhanced hybrid search
- Build a complete GraphRAG pipeline with hybrid retrieval

## Install Dependencies

First, install the required packages. This only needs to be run once per session.

In [None]:
# Install neo4j-graphrag with Bedrock support
%pip install "neo4j-graphrag[bedrock] @ git+https://github.com/neo4j-partners/neo4j-graphrag-python.git@bedrock-embeddings" python-dotenv pydantic-settings nest-asyncio -q

In [None]:
from neo4j_graphrag.retrievers import HybridRetriever, HybridCypherRetriever
from neo4j_graphrag.generation import GraphRAG

from data_utils import Neo4jConnection, get_llm, get_embedder

## Connect to Neo4j

Create and verify the connection to your Neo4j graph database.

In [None]:
neo4j = Neo4jConnection().verify()
driver = neo4j.driver

## Initialize LLM and Embedder

Set up the LLM and embedding model from AWS Bedrock.

In [None]:
llm = get_llm()
embedder = get_embedder()

print(f"LLM: {llm.model_id}")
print(f"Embedder: {embedder.model_id}")

## How Hybrid Search Works

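The `HybridRetriever` searches **two indexes** within a single Cypher query:

1. **Vector search** on the `requirement_embeddings` index, which finds semantically similar chunks
2. **Full-text search** on the `requirement_text` index, which finds keyword-matching chunks

It then **merges and ranks** the results. Vector similarity scores and full-text relevance scores live on different scales, so each list is first normalized by dividing by its own top score. With the default ranker, the two lists are unioned and a chunk found by both keeps its higher normalized score. In recent neo4j-graphrag releases you can instead pass `ranker="linear"` to `search()`, in which case the `alpha` parameter weights the two normalized scores as `alpha * vector_score + (1 - alpha) * fulltext_score`: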

```
alpha = 1.0  →  100% vector search (pure semantic)
alpha = 0.5  →  50/50 blend (balanced)
alpha = 0.0  →  100% full-text search (pure keyword)
```

This lets you tune retrieval for your use case. For manufacturing requirements, a balanced setting is a sensible starting point: semantic matching handles conceptual queries, while keyword matching catches specific IDs and standard numbers.
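To make the default merge concrete, here is a minimal pure-Python sketch of it. It assumes each index hands back `(node_id, score)` pairs and mirrors, as we understand it, the normalize-then-union approach of neo4j-graphrag's default (naive) ranker: every score is divided by the best score from its own index, and a chunk returned by both indexes keeps the higher of its two normalized scores. The function names, node IDs, and scores are hypothetical; in the library the merge happens inside Cypher, not in Python.

```python
# Sketch of a normalize-then-union hybrid merge (hypothetical names and data).

def normalize(hits):
    """Divide every score by the best score from the same index (max becomes 1.0)."""
    if not hits:
        return {}
    best = max(score for _, score in hits)
    return {node_id: score / best for node_id, score in hits}


def naive_merge(vector_hits, fulltext_hits, top_k):
    """Union both normalized lists; a node found by both keeps its higher score."""
    merged = {}
    for scores in (normalize(vector_hits), normalize(fulltext_hits)):
        for node_id, score in scores.items():
            merged[node_id] = max(score, merged.get(node_id, 0.0))
    # sorted() is stable, so ties keep their first-seen order
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Made-up raw scores: vector similarities vs. full-text relevance on another scale
vector_hits = [("chunk_a", 0.92), ("chunk_b", 0.88), ("chunk_c", 0.80)]
fulltext_hits = [("chunk_c", 7.5), ("chunk_d", 3.0)]

for node_id, score in naive_merge(vector_hits, fulltext_hits, top_k=3):
    print(f"{node_id}: {score:.3f}")
```

Here `chunk_c` jumps to the top tier because it is the best full-text hit, even though it is the weakest vector hit; the loop prints `chunk_a: 1.000`, `chunk_c: 1.000`, and `chunk_b: 0.957`.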

## Initialize HybridRetriever

Create a `HybridRetriever` that combines both search approaches.

In [None]:
hybrid_retriever = HybridRetriever(
    driver=driver,
    vector_index_name='requirement_embeddings',
    fulltext_index_name='requirement_text',
    embedder=embedder,
    return_properties=['text']
)

print("HybridRetriever initialized!")

## Basic Hybrid Search

Let's start with a basic hybrid search to see the combined results.

In [None]:
# Basic hybrid search
query = "What are the thermal management requirements for the battery?"
result = hybrid_retriever.search(query_text=query, top_k=5)

print(f"Query: \"{query}\"")
print(f"Results: {len(result.items)}\n")

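for i, item in enumerate(result.items):
    score = item.metadata.get('score')
    # Format only numeric scores; f"{'N/A':.4f}" would raise a ValueError
    score_str = f"{score:.4f}" if score is not None else "N/A"
    content_preview = str(item.content)[:120]
    print(f"[{i+1}] Score: {score_str} | {content_preview}...")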

## Alpha Tuning

The `alpha` parameter is the key to tuning hybrid search. It is only used by the linear ranker (`ranker="linear"`), so the comparison below selects that ranker explicitly. Let's compare different alpha values on the same query to see how the balance affects results.

- **High alpha (toward 1.0)**: Emphasizes vector/semantic search, which suits conceptual queries
- **Balanced alpha (0.5)**: Equal weight, a good default for mixed queries
- **Low alpha (toward 0.0)**: Emphasizes full-text/keyword search, which suits specific terms
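The linear weighting itself fits in a few lines of Python. This sketch assumes both score dictionaries are already normalized per index (best hit = 1.0) and that a chunk missing from one index contributes 0 for that index; both are simplifying assumptions for illustration, and the chunk names and scores are made up.

```python
# Sketch of linear hybrid ranking: alpha * vector + (1 - alpha) * full-text.
# Assumes per-index normalized scores; a missing score counts as 0 (assumption).

def linear_rank(vector_scores, fulltext_scores, alpha):
    """Return node IDs ordered by the alpha-weighted combined score, best first."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    node_ids = set(vector_scores) | set(fulltext_scores)
    combined = {
        node_id: alpha * vector_scores.get(node_id, 0.0)
        + (1 - alpha) * fulltext_scores.get(node_id, 0.0)
        for node_id in node_ids
    }
    # Break ties by node ID so the order is deterministic
    return sorted(combined, key=lambda node_id: (-combined[node_id], node_id))


# "semantic" matches the meaning well; "keyword" contains the exact term
vector_scores = {"semantic": 1.0, "keyword": 0.6}
fulltext_scores = {"semantic": 0.2, "keyword": 1.0}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, linear_rank(vector_scores, fulltext_scores, alpha))
```

With these numbers the keyword-heavy chunk leads at alpha 0.0 and 0.5 (combined scores 1.0 vs. 0.2, then 0.8 vs. 0.6), and the semantic chunk only takes over at alpha 1.0, which is why it pays to try several values on your own queries.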

In [None]:
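# Compare different alpha values.
# alpha is only applied by the linear ranker, which scores each chunk as
# alpha * vector_score + (1 - alpha) * fulltext_score (both normalized).
query = "battery coolant specifications"
alpha_labels = {0.0: "Full-text only", 0.5: "Balanced", 1.0: "Vector only"}

print(f"Query: \"{query}\"\n")

for alpha, label in alpha_labels.items():
    print(f"\n--- Alpha: {alpha} ({label}) ---")

    result = hybrid_retriever.search(
        query_text=query,
        top_k=3,
        ranker="linear",  # select the ranker that uses alpha
        alpha=alpha,
    )

    for i, item in enumerate(result.items):
        score = item.metadata.get('score')
        score_str = f"{score:.4f}" if score is not None else "N/A"
        content_preview = str(item.content)[:100]
        print(f"  [{i+1}] Score: {score_str} | {content_preview}...")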
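Eyeballing three result lists works for a demo, but a small overlap metric makes the comparison explicit. The sketch below computes the Jaccard similarity of two top-k lists; in the notebook you could feed it `str(item.content)` for each item from two runs. The helper name and the placeholder chunk labels are hypothetical.

```python
def overlap_at_k(results_a, results_b):
    """Jaccard similarity of two top-k result lists: shared / total distinct."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty lists are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


# Placeholder labels standing in for results from two alpha settings
fulltext_run = ["chunk_1", "chunk_7", "chunk_9"]
vector_run = ["chunk_7", "chunk_2", "chunk_3"]

print(f"Overlap: {overlap_at_k(fulltext_run, vector_run):.2f}")
```

Here one chunk is shared out of five distinct ones, so the overlap is 0.20. A value near 1.0 means alpha barely changed what was retrieved for that query; a value near 0.0 means the two settings surface largely different chunks.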

## HybridCypherRetriever: Adding Graph Context

Just like `VectorCypherRetriever` adds graph traversal to vector search, `HybridCypherRetriever` adds graph traversal to hybrid search. This gives you the **triple benefit**:

1. **Semantic matching** from vector search
2. **Keyword matching** from full-text search
3. **Graph context** from Cypher traversal

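The retrieval query works the same way as with `VectorCypherRetriever`: each matched chunk is bound to the variable `node` (and its hybrid score to `score`), and your Cypher traverses outward from there.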

In [None]:
# Custom Cypher query for graph-enhanced hybrid retrieval
context_query = """
MATCH (node)<-[:HAS_CHUNK]-(req:Requirement)
OPTIONAL MATCH (comp:Component)-[:COMPONENT_HAS_REQ]->(req)
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)
WITH node, req, comp, prev, next
RETURN 
    'Component: ' + COALESCE(comp.name, 'N/A') + ' (' + COALESCE(comp.description, '') + ')' +
    '\nRequirement: ' + COALESCE(req.name, 'N/A') +
    '\nContent: ' + COALESCE(prev.text + ' ', '') + node.text + COALESCE(' ' + next.text, '') 
    AS context,
    req.name AS requirement_name
"""

hybrid_cypher_retriever = HybridCypherRetriever(
    driver=driver,
    vector_index_name='requirement_embeddings',
    fulltext_index_name='requirement_text',
    embedder=embedder,
    retrieval_query=context_query
)

print("HybridCypherRetriever initialized!")
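The string-building in `context_query` relies on Cypher's null handling: concatenating a string with `null` yields `null`, so `COALESCE(prev.text + ' ', '')` collapses to an empty string when a chunk has no predecessor (and likewise for `next`). The Python sketch below mirrors that `RETURN` expression, with `None` standing in for Cypher `null`, so you can see what the assembled `context` looks like. The helper names and sample values are hypothetical.

```python
# Python mirror of the RETURN clause in context_query (illustration only).
# None plays the role of Cypher null.

def cypher_concat(*parts):
    """Mimic Cypher string +: any null operand makes the whole result null."""
    return None if any(p is None for p in parts) else "".join(parts)


def coalesce(*values):
    """Return the first non-null value, like Cypher COALESCE."""
    return next((v for v in values if v is not None), None)


def build_context(comp_name, comp_desc, req_name, prev_text, text, next_text):
    """Assemble the same 'Component / Requirement / Content' string as the query."""
    return (
        "Component: " + coalesce(comp_name, "N/A")
        + " (" + coalesce(comp_desc, "") + ")"
        + "\nRequirement: " + coalesce(req_name, "N/A")
        + "\nContent: " + coalesce(cypher_concat(prev_text, " "), "")
        + text + coalesce(cypher_concat(" ", next_text), "")
    )


# Hypothetical first chunk of a requirement: no previous chunk, no component
print(build_context(None, None, "Thermal Management", None,
                    "Coolant flow shall be monitored.", "Alarms trigger above limit."))
```

For a first chunk with no linked component, the output starts with `Component: N/A ()`, and the content line begins directly with the chunk text rather than a stray space.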

In [None]:
# Test hybrid cypher retrieval
query = "What safety standards must the battery system comply with?"

result = hybrid_cypher_retriever.search(query_text=query, top_k=3)

print(f"Query: \"{query}\"")
print(f"Results: {len(result.items)}\n")

for i, item in enumerate(result.items):
    print(f"[Result {i+1}]")
    print(item.content)
    print()

## Complete GraphRAG Pipeline with Hybrid Search

Now let's build a complete question-answering pipeline using hybrid search. This combines:
- Hybrid retrieval (vector + full-text + graph traversal)
- LLM generation for natural language answers

This is the most complete retrieval configuration in the workshop: it handles both conceptual questions and specific term lookups, and graph traversal adds manufacturing context to every retrieved chunk.
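`GraphRAG` wires the two steps together: it runs the retriever (forwarding options such as `top_k` from `retriever_config`), folds the retrieved context into a prompt, and sends that prompt to the LLM. The sketch below approximates only the prompt-assembly step in plain Python. The template wording, `fake_retrieve`, and `build_prompt` are hypothetical stand-ins; the library ships its own prompt template.

```python
# Hypothetical sketch of the prompt-assembly step in a retrieve-then-generate flow.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)


def fake_retrieve(question, top_k):
    """Stand-in for a retriever search; returns placeholder context strings."""
    chunks = [f"Requirement: R-{i}\nContent: placeholder text {i}" for i in range(1, 6)]
    return chunks[:top_k]


def build_prompt(question, top_k=3):
    """Join the top_k retrieved contexts and slot them into the template."""
    context = "\n\n".join(fake_retrieve(question, top_k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_prompt("What are the cooling system specifications?", top_k=2)
print(prompt)  # this string is what would be sent to the LLM
```

Raising `top_k` gives the LLM more context to draw on, at the cost of a longer prompt and more chance of including weakly related chunks.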

In [None]:
# Build complete GraphRAG pipeline with hybrid retrieval
rag = GraphRAG(llm=llm, retriever=hybrid_cypher_retriever)

query = "What are the cooling system specifications for the high-voltage battery?"
response = rag.search(query, retriever_config={"top_k": 3}, return_context=True)

print(f"Query: \"{query}\"\n")
print("Answer:")
print(response.answer)

In [None]:
# View the retrieved context used by the LLM
print("Retrieved Context:")
print("=" * 60)
for i, item in enumerate(response.retriever_result.items):
    print(f"\n[Result {i+1}]")
    print(item.content)

## Try Different Queries

Experiment with queries that benefit from hybrid search. Notice how some queries are more conceptual (benefit from vector) while others contain specific terms (benefit from full-text).
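If you want to act on that observation, one option is a simple heuristic that lowers `alpha` when a query contains identifier-like tokens (digits inside a word, underscores, or all-caps acronyms such as `BMS`). This is a hypothetical sketch, not part of neo4j-graphrag; the regular expression and the 0.3/0.7 values are arbitrary starting points to tune against your own data.

```python
import re

# Hypothetical heuristic: identifier-like tokens suggest a keyword-style query.
ID_LIKE = re.compile(
    r"\b(?=\w*\d)\w+\b"   # a word containing a digit, e.g. IP67 or HVB_3900
    r"|\b\w+_\w+\b"       # a snake_case identifier
    r"|\b[A-Z]{2,}\b"     # an all-caps acronym, e.g. BMS
)


def choose_alpha(query, keyword_alpha=0.3, semantic_alpha=0.7):
    """Return a lower alpha (favor full-text) when the query names specific IDs."""
    return keyword_alpha if ID_LIKE.search(query) else semantic_alpha


sample_queries = [
    "What are the energy density specifications for battery cells?",
    "How is the battery pack protected against water ingress?",
    "What BMS safety monitoring is required?",
]
for q in sample_queries:
    print(f"alpha={choose_alpha(q)} | {q}")
```

Assuming your neo4j-graphrag version supports the `ranker` and `alpha` search arguments, you could pass the result as `hybrid_retriever.search(query_text=q, ranker="linear", alpha=choose_alpha(q))`. Whether this beats a fixed alpha is something to measure on your queries, not assume.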

In [None]:
queries = [
    "What are the energy density specifications for battery cells?",
    "How is the battery pack protected against water ingress?",
    "What BMS safety monitoring is required?",
]

for query in queries:
    print(f"\nQuery: \"{query}\"")
    print("-" * 60)
    response = rag.search(query, retriever_config={"top_k": 3})
    print(f"Answer: {response.answer}")

## Summary

In this notebook, you built the most comprehensive retrieval system in the workshop:

1. **HybridRetriever**: Combines vector search (semantic) with full-text search (keyword) in a single query. With the linear ranker, the `alpha` parameter lets you tune the balance.

2. **HybridCypherRetriever**: Adds graph traversal on top of hybrid search, enriching results with component names, requirement metadata, and adjacent chunks from the manufacturing traceability graph.

3. **Alpha tuning**: Higher alpha emphasizes semantic matching (good for conceptual queries); lower alpha emphasizes keyword matching (good for specific terms and IDs).

4. **Complete GraphRAG pipeline**: Hybrid retrieval + LLM generation grounds each answer in semantic similarity, keyword relevance, and graph structure at the same time.

### Retriever Selection Guide

| Retriever | When to Use |
|-----------|-------------|
| `VectorRetriever` | Simple semantic search |
| `VectorCypherRetriever` | Semantic search + graph context |
| `HybridRetriever` | Semantic + keyword search |
| `HybridCypherRetriever` | Semantic + keyword + graph context (richest context) |
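
The table reduces to two yes/no questions, which the hypothetical helper below encodes. It returns a class name as a string only; constructing the retriever still follows the cells earlier in this notebook.

```python
def pick_retriever(needs_keyword_matching: bool, needs_graph_context: bool) -> str:
    """Map the two questions from the selection guide to a retriever class name."""
    if needs_keyword_matching:
        return "HybridCypherRetriever" if needs_graph_context else "HybridRetriever"
    return "VectorCypherRetriever" if needs_graph_context else "VectorRetriever"


# Queries mention exact IDs, and answers need component context
print(pick_retriever(needs_keyword_matching=True, needs_graph_context=True))
```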

---

**Congratulations!** You've completed the GraphRAG labs. You now know how to:
- Load structured manufacturing data into Neo4j as a traceability graph
- Create vector embeddings and full-text indexes on requirement descriptions
- Build GraphRAG pipelines with Vector, VectorCypher, and Hybrid retrievers
- Enhance retrieval with custom Cypher queries that traverse the manufacturing traceability chain

Continue to [Lab 6 - Neo4j MCP Agent](../Lab_6_Neo4j_MCP_Agent/README.md) to learn how to build agents that query the knowledge graph using the Model Context Protocol.

In [None]:
# Cleanup
neo4j.close()