# Graph-Enhanced Retrieval with VectorCypherRetriever

This is where the "Graph" in GraphRAG really shines. The `VectorCypherRetriever` combines vector search with custom Cypher queries, allowing you to traverse graph relationships and gather additional context beyond just the matched chunks.

**Why this matters:** A single chunk might contain the answer to your question, but the manufacturing traceability graph connects that chunk to its parent requirement, the component it belongs to, and the broader product context. By using graph traversal, you can automatically include this rich context in every retrieval.

**Prerequisites:** Complete [02 Embeddings](02_embeddings.ipynb) first to populate the graph with embeddings and create the vector index.

**Learning Objectives:**
- Write custom Cypher retrieval queries for manufacturing traceability
- Use VectorCypherRetriever for graph-enhanced retrieval
- Implement the expanded context window pattern
- Compare standard vs. graph-enhanced retrieval

## Install Dependencies

First, install the required packages. This only needs to be run once per session.

In [None]:
# Install neo4j-graphrag with Bedrock support
%pip install "neo4j-graphrag[bedrock] @ git+https://github.com/neo4j-partners/neo4j-graphrag-python.git@bedrock-embeddings" python-dotenv pydantic-settings nest-asyncio -q

In [None]:
from neo4j_graphrag.retrievers import VectorRetriever, VectorCypherRetriever
from neo4j_graphrag.generation import GraphRAG

from data_utils import Neo4jConnection, get_llm, get_embedder

## Connect to Neo4j

Create and verify the connection to your Neo4j graph database.

In [None]:
neo4j = Neo4jConnection().verify()
driver = neo4j.driver

## Initialize LLM and Embedder

Set up the Large Language Model (LLM) and the embedding model from AWS Bedrock. Both are configured in `CONFIG.txt` via `MODEL_ID` and `EMBEDDING_MODEL_ID`.

In [None]:
# Initialize LLM and Embedder from AWS Bedrock
llm = get_llm()
embedder = get_embedder()

print(f"LLM: {llm.model_id}")
print(f"Embedder: {embedder.model_id}")

## VectorCypherRetriever with Custom Query

The `VectorCypherRetriever` adds a key capability: after finding chunks via vector search, it runs a **custom Cypher query** starting from those matched chunks. This lets you:

- Traverse to the parent Requirement for context
- Follow the traceability chain to the Component and TechnologyDomain
- Follow `NEXT_CHUNK` relationships to get surrounding context
- Gather any other graph data connected to the matched chunks

The `node` variable in your Cypher query refers to each chunk returned by the vector search.

In [None]:
# Custom Cypher query that returns chunk context with requirement and component info
context_query = """
MATCH (node)<-[:HAS_CHUNK]-(req:Requirement)
OPTIONAL MATCH (comp:Component)-[:COMPONENT_HAS_REQ]->(req)
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)
RETURN 
    node.text AS context,
    req.name AS requirement,
    comp.name AS component,
    comp.description AS component_description,
    node.index AS chunk_index,
    prev.text AS previous_chunk,
    next.text AS next_chunk
"""

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='requirement_embeddings',
    embedder=embedder,
    retrieval_query=context_query
)

print("VectorCypherRetriever initialized!")

**Understanding the Retrieval Query:**

```cypher
MATCH (node)<-[:HAS_CHUNK]-(req:Requirement)                // Find the parent requirement
OPTIONAL MATCH (comp:Component)-[:COMPONENT_HAS_REQ]->(req) // Find the component
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)           // Find the previous chunk
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)           // Find the next chunk
RETURN ...
```

The query returns:
1. `node.text` - The matched chunk's text (what vector search found)
2. `req.name` - Which requirement this chunk belongs to
3. `comp.name` - Which component the requirement is for (e.g., HVB_3900)
4. `comp.description` - The component description (e.g., "High-Voltage Battery")
5. `node.index` - The position of the matched chunk within its requirement
6. `prev.text` / `next.text` - Adjacent chunks for context expansion

> **Why OPTIONAL MATCH?** The first and last chunks don't have previous/next neighbors. A plain MATCH would drop those chunks from the results entirely; OPTIONAL MATCH keeps the row and returns NULL for the missing neighbor.
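
A plain-Python analogy can make the difference concrete: with `MATCH`, a chunk that has no previous neighbor yields no row at all, while `OPTIONAL MATCH` keeps the row and fills the neighbor with null. The sketch below is only an illustration of that row-level behavior, not how Neo4j executes the query; the toy `chunks` list and the helper names `match_prev` and `optional_match_prev` are invented for this example.

```python
# Toy chunk sequence standing in for (c0)-[:NEXT_CHUNK]->(c1)-[:NEXT_CHUNK]->(c2).
# These names and texts are hypothetical, not taken from the lab dataset.
chunks = ["c0 text", "c1 text", "c2 text"]


def match_prev(index: int) -> list[dict]:
    """Analogy for MATCH (prev)-[:NEXT_CHUNK]->(node): no neighbor, no row."""
    if index == 0:
        return []  # the row for the first chunk disappears entirely
    return [{"context": chunks[index], "previous_chunk": chunks[index - 1]}]


def optional_match_prev(index: int) -> list[dict]:
    """Analogy for OPTIONAL MATCH: the row survives, the neighbor is None."""
    prev = chunks[index - 1] if index > 0 else None
    return [{"context": chunks[index], "previous_chunk": prev}]


print(match_prev(0))           # first chunk under MATCH
print(optional_match_prev(0))  # first chunk under OPTIONAL MATCH
```

For any chunk that does have a previous neighbor, both helpers return the same row, which mirrors why `OPTIONAL MATCH` only changes the outcome at the boundaries.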

Now let's use this retriever in a GraphRAG pipeline:

In [None]:
# Initialize GraphRAG and Perform Search
query = "What are the cooling requirements for the high-voltage battery?"

rag = GraphRAG(llm=llm, retriever=vector_cypher_retriever)
response = rag.search(query, retriever_config={"top_k": 3}, return_context=True)

print(f"Query: \"{query}\"")
print(f"Number of results returned: {len(response.retriever_result.items)}\n")
print("Answer:")
print(response.answer)

## Inspecting Retrieved Context

One of the best ways to debug and improve your RAG system is to inspect what context is actually being passed to the LLM. Let's look at what the VectorCypherRetriever returned:

In [None]:
# View the context used in this query
print("Retrieved Context:")
print("=" * 60)
for i, item in enumerate(response.retriever_result.items):
    print(f"\n[Result {i+1}]")
    print(item.content)
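
When a retrieval query returns several columns, the printed content is typically the string form of the whole result record, which can be hard to scan. One option is a small formatter that turns each row into labeled text. The sketch below assumes the column aliases from `context_query`; `format_record` is a hypothetical helper, and it relies only on `.get()`, which both a plain `dict` and a `neo4j.Record` provide.

```python
from typing import Any, Mapping


def format_record(record: Mapping[str, Any]) -> str:
    """Render one retrieval row as labeled text, skipping missing values."""
    fields = [
        ("Component", record.get("component")),
        ("Requirement", record.get("requirement")),
        ("Previous", record.get("previous_chunk")),
        ("Match", record.get("context")),
        ("Next", record.get("next_chunk")),
    ]
    return "\n".join(
        f"{label}: {value}" for label, value in fields if value is not None
    )


# Illustrative row; the text values are made up, not taken from the lab dataset.
sample = {
    "context": "matched chunk text",
    "requirement": "example requirement",
    "component": "HVB_3900",
    "previous_chunk": None,
    "next_chunk": "following chunk text",
}
print(format_record(sample))
```

To apply this during retrieval, a function like this could be wrapped in the retriever's `result_formatter` argument, for example `lambda record: RetrieverResultItem(content=format_record(record))` with `RetrieverResultItem` imported from `neo4j_graphrag.types`. Check the signature in your installed version before relying on it.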

## Expanded Context Window Pattern

The previous query returned separate fields for prev/current/next chunks plus component metadata. A more powerful pattern is to **concatenate them into a single expanded context string** enriched with manufacturing metadata. This gives the LLM a larger window of continuous text plus structural context to work with.

Think of it like this: if your chunks are 400 characters each, this pattern gives the LLM ~1200 characters of context (prev + current + next) for each vector match, **plus** the requirement name and component information. The LLM can then answer not just "what does the text say?" but "which component and requirement is this about?"

In [None]:
# Query that combines current chunk with adjacent chunks and manufacturing context
expanded_context_query = """
MATCH (node)<-[:HAS_CHUNK]-(req:Requirement)
OPTIONAL MATCH (comp:Component)-[:COMPONENT_HAS_REQ]->(req)
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)
WITH node, req, comp, prev, next
RETURN 
    'Component: ' + COALESCE(comp.name, 'N/A') + ' (' + COALESCE(comp.description, '') + ')' +
    '\nRequirement: ' + COALESCE(req.name, 'N/A') +
    '\nContent: ' + COALESCE(prev.text + ' ', '') + node.text + COALESCE(' ' + next.text, '') 
    AS expanded_context,
    req.name AS requirement_name,
    node.index AS center_chunk_index
"""

expanded_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='requirement_embeddings',
    embedder=embedder,
    retrieval_query=expanded_context_query
)

# Test with expanded context
query = "What safety standards must the battery system comply with?"
rag_expanded = GraphRAG(llm=llm, retriever=expanded_retriever)
response = rag_expanded.search(query, retriever_config={"top_k": 2}, return_context=True)

print(f"Query: \"{query}\"\n")
print("Answer:")
print(response.answer)

In [None]:
# View the expanded context
print("\nExpanded Context (includes adjacent chunks):")
print("=" * 60)
for i, item in enumerate(response.retriever_result.items):
    print(f"\n[Result {i+1}]")
    print(item.content)
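
To see exactly which string the `RETURN` expression in `expanded_context_query` builds, the same null handling can be mirrored in plain Python. This is a sketch for reasoning about the output format only; the function name `expanded_context` and its parameter names are assumptions introduced here, and each `COALESCE(x, default)` is modeled as a `None` check.

```python
from typing import Optional


def expanded_context(
    node_text: str,
    requirement: Optional[str] = None,
    component: Optional[str] = None,
    description: Optional[str] = None,
    prev_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> str:
    """Mirror the Cypher RETURN expression: COALESCE becomes a None check."""
    header = (
        f"Component: {component if component is not None else 'N/A'}"
        f" ({description if description is not None else ''})"
        f"\nRequirement: {requirement if requirement is not None else 'N/A'}"
    )
    # In Cypher, null + ' ' is null, so COALESCE(prev.text + ' ', '') is ''.
    before = f"{prev_text} " if prev_text is not None else ""
    after = f" {next_text}" if next_text is not None else ""
    return f"{header}\nContent: {before}{node_text}{after}"


# A middle chunk with both neighbors (placeholder texts A, B, C).
print(expanded_context("B", "R1", "HVB_3900", "High-Voltage Battery", "A", "C"))
# A chunk with no neighbors and no linked component.
print(expanded_context("B", requirement="R1"))
```

The second call shows the fallback path: a chunk without a linked component still produces a well-formed header, with `N/A` and an empty description rather than a null result.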

## Comparing Standard vs Expanded Context

Let's see the difference in practice. We'll ask the same question using:
1. **Standard VectorRetriever** - Returns only the matched chunks
2. **VectorCypherRetriever with expanded context** - Returns matched chunks plus their neighboring chunks and component/requirement metadata

Notice how the expanded context often produces more complete, nuanced answers because the LLM has more information to work with.

In [None]:
# Standard retriever (no graph traversal)
standard_retriever = VectorRetriever(
    driver=driver,
    index_name='requirement_embeddings',
    embedder=embedder,
    return_properties=['text']
)

query = "What are the cooling system specifications?"

# Standard retriever
print("=== Standard VectorRetriever ===")
rag_standard = GraphRAG(llm=llm, retriever=standard_retriever)
response_standard = rag_standard.search(query, retriever_config={"top_k": 2})
print(response_standard.answer)

# Expanded context retriever
print("\n=== VectorCypherRetriever (Expanded Context) ===")
response_expanded = rag_expanded.search(query, retriever_config={"top_k": 2})
print(response_expanded.answer)
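
Comparing answers by eye is subjective, so it can help to also measure how much context each retriever hands to the LLM. The sketch below is a rough proxy, assuming each retriever exposes `search(query_text=..., top_k=...)` returning an object with an `items` list whose elements have a `content` attribute; `context_sizes` is a hypothetical helper, and character counts include any record formatting, so treat the numbers as approximate.

```python
from typing import Any, Dict


def context_sizes(
    retrievers: Dict[str, Any], query: str, top_k: int = 2
) -> Dict[str, int]:
    """Total characters of retrieved content per retriever for one query."""
    sizes: Dict[str, int] = {}
    for name, retriever in retrievers.items():
        result = retriever.search(query_text=query, top_k=top_k)
        sizes[name] = sum(len(str(item.content)) for item in result.items)
    return sizes


# Usage with the retrievers defined earlier (requires the Neo4j connection):
# context_sizes(
#     {"standard": standard_retriever, "expanded": expanded_retriever},
#     "What are the cooling system specifications?",
# )
```

A larger number for the expanded retriever confirms that the neighbors and metadata are reaching the prompt; it says nothing by itself about answer quality.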

## Summary

In this notebook, you learned how graph traversal extends vector retrieval in GraphRAG:

1. **VectorCypherRetriever** - Combines vector search with custom Cypher queries. The `node` variable in your query represents each chunk found by vector search.

2. **Manufacturing traceability traversal** - From a matched chunk, you can traverse to the parent Requirement, the Component it belongs to, and the broader product context. This adds structured metadata to unstructured text search.

3. **Expanded context windows** - By concatenating adjacent chunks and adding component/requirement metadata, you give the LLM more context while maintaining precise vector search. This often improves answer completeness, though the gain depends on chunk size and the question being asked.

4. **The power of graphs** - This is what separates GraphRAG from simple vector stores. The relationships in your graph (HAS_CHUNK, COMPONENT_HAS_REQ, NEXT_CHUNK) enable retrieval patterns that aren't possible with vectors alone.

**Key takeaway:** Vector search finds the needle in the haystack. Graph traversal provides the manufacturing context around the needle — which component, which requirement, what adjacent specifications — so the LLM understands what it found.

---

**Next:** [Full-Text Search](05_fulltext_search.ipynb) to learn keyword-based search patterns, or skip ahead to [Hybrid Search](06_hybrid_search.ipynb) to combine vector and full-text search in a single pipeline.

Continue to [Lab 6 - Neo4j MCP Agent](../Lab_6_Neo4j_MCP_Agent/README.md) to learn how to build agents that query the knowledge graph using the Model Context Protocol.

In [None]:
# Cleanup
neo4j.close()