# GraphRAG Retrievers for Aircraft Maintenance

This notebook demonstrates retrieval strategies for GraphRAG applications, progressing from simple vector search to graph-enhanced retrieval that leverages your aircraft topology.

**Prerequisites:** Complete [03 Data and Embeddings](03_data_and_embeddings.ipynb) first.

**Learning Objectives:**
- Set up a VectorRetriever using Neo4j's vector index
- Perform semantic similarity searches over maintenance procedures
- Use GraphRAG to combine vector search with LLM-generated answers
- Create custom Cypher queries with VectorCypherRetriever for richer context
- Connect maintenance knowledge to your aircraft topology (Aircraft -> System -> Component)

---

## Retrieval Strategies Overview

We'll explore two retrieval approaches:

1. **VectorRetriever** - Simple semantic search using embeddings
   - Finds maintenance procedures by meaning similarity
   - Returns raw text for LLM context

2. **VectorCypherRetriever** - Graph-enhanced semantic search
   - Uses vector search as entry point
   - Traverses graph relationships to aircraft topology
   - Returns structured data (systems, components) alongside text

## Section 1: Configuration

Enter your Neo4j Aura connection details below (same credentials as notebook 03).

In [None]:
# ============================================
# CONFIGURATION - Enter your Neo4j credentials
# ============================================

NEO4J_URI = ""  # e.g., "neo4j+s://xxxxxxxx.databases.neo4j.io"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = ""  # Your password from Lab 1

# Validate configuration
if not NEO4J_URI or not NEO4J_PASSWORD:
    print("WARNING: Please enter your Neo4j credentials above before running the notebook!")
else:
    print("Configuration ready!")
    print(f"Neo4j URI: {NEO4J_URI}")
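
If you would rather not paste credentials into the notebook, the same values can be read from environment variables. The sketch below is one way to do that; the helper name `load_neo4j_config` and the variable names `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` are assumptions for illustration, not part of the lab setup.

```python
import os
from typing import Mapping


def load_neo4j_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read connection settings from an environment-style mapping.

    The variable names are illustrative assumptions.
    """
    config = {
        "uri": env.get("NEO4J_URI", ""),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", ""),
    }
    # Fail early instead of printing a warning and continuing
    missing = [key for key in ("uri", "password") if not config[key]]
    if missing:
        raise ValueError(f"Missing Neo4j settings: {', '.join(missing)}")
    return config


# Example with an explicit mapping instead of the real environment
example = load_neo4j_config({
    "NEO4J_URI": "neo4j+s://example.databases.neo4j.io",
    "NEO4J_PASSWORD": "not-a-real-password",
})
print(example["uri"], example["username"])
```

Raising an error here is a design choice: it stops the notebook before a later cell fails with a less obvious connection error.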

## Setup

Import required modules and initialize connections.

In [None]:
from neo4j_graphrag.retrievers import VectorRetriever, VectorCypherRetriever
from neo4j_graphrag.generation import GraphRAG

from data_utils import Neo4jConnection, get_llm, get_embedder, EMBEDDING_DIMENSIONS

## Connect to Neo4j

Create and verify the connection to your Neo4j graph database using the credentials from Section 1.

In [None]:
neo4j = Neo4jConnection(uri=NEO4J_URI, username=NEO4J_USERNAME, password=NEO4J_PASSWORD).verify()
driver = neo4j.driver

# Show graph statistics
neo4j.get_graph_stats()

## Initialize LLM and Embedder

Set up the Large Language Model (LLM) and embedding model for GraphRAG workflows.

- **LLM**: Uses Databricks Foundation Model APIs (Llama 3.3 70B)
- **Embedder**: Uses Databricks Foundation Model APIs (BGE-large)

In [None]:
llm = get_llm()
embedder = get_embedder()

print(f"LLM initialized: {llm.model_id}")
print(f"Embedder initialized: {embedder.model_id}")
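
Before searching, it can help to confirm that the embedder produces vectors of the size the index expects. The following is a minimal sketch, assuming the embedder exposes an `embed_query(text)` method (the interface `neo4j_graphrag` embedders follow); `check_embedding_dimensions` and `FakeEmbedder` are hypothetical names, and the value 1024 is only an example.

```python
from typing import Protocol, Sequence


class QueryEmbedder(Protocol):
    def embed_query(self, text: str) -> Sequence[float]: ...


def check_embedding_dimensions(embedder: QueryEmbedder, expected: int) -> int:
    """Embed a short probe string and compare its length to the expected size."""
    vector = embedder.embed_query("dimension check")
    if len(vector) != expected:
        raise ValueError(
            f"Embedder returned {len(vector)} dimensions, expected {expected}"
        )
    return len(vector)


class FakeEmbedder:
    """Stand-in embedder so the sketch runs without a model endpoint."""

    def __init__(self, dimensions: int) -> None:
        self.dimensions = dimensions

    def embed_query(self, text: str) -> list:
        return [0.0] * self.dimensions


print(check_embedding_dimensions(FakeEmbedder(1024), expected=1024))
```

In this notebook the call would presumably be `check_embedding_dimensions(embedder, EMBEDDING_DIMENSIONS)`, assuming the object returned by `get_embedder()` follows that interface.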

---

# Part 1: Vector Retriever

The VectorRetriever performs semantic search over your Neo4j knowledge graph. Instead of keyword matching, it returns the maintenance chunks whose embeddings are closest to the embedding of your query.

## Initialize Vector Retriever

Set up the vector-based retriever for semantic search over maintenance chunks.

In [None]:
INDEX_NAME = "maintenanceChunkEmbeddings"

vector_retriever = VectorRetriever(
    driver=driver,
    index_name=INDEX_NAME,
    embedder=embedder,
    return_properties=['text']
)

print("VectorRetriever initialized")

The **VectorRetriever**:
- Connects to Neo4j through the provided `driver`
- Searches the `maintenanceChunkEmbeddings` vector index
- Uses the `embedder` to turn the query text into an embedding
- Returns the `text` property of each matching chunk

> **Tip:** You can modify the `return_properties` list to include additional properties like `index` for chunk ordering.

## Simple Vector Search

Test the vector search by retrieving the top 5 most relevant maintenance procedures for a given query.

In [None]:
query = "What are the steps to troubleshoot engine vibration?"
result = vector_retriever.search(query_text=query, top_k=5)

print(f"Query: \"{query}\"\n")
print(f"Number of results returned: {len(result.items)}\n")
print("=" * 70)

for item in result.items:
    print(f"\nScore: {item.metadata['score']:.4f}")
    print(f"Content: {item.content[0:200]}...")
    print(f"ID: {item.metadata['id']}")

**How it works:**
1. The query is converted to an embedding vector
2. `vector_retriever.search()` finds the top 5 matches based on vector similarity
3. Results show the similarity score, content snippet, and chunk ID
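
The ranking idea behind these steps can be sketched in plain Python. This is a brute-force illustration with made-up three-dimensional vectors, not how the Neo4j vector index is implemented; the index uses its own search structure, and the score it reports may be scaled differently from raw cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, chunk_vectors, k):
    """Score every chunk against the query and keep the k best."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real embeddings have hundreds of dimensions
chunks = {
    "vibration-troubleshooting": [0.9, 0.1, 0.0],
    "hydraulic-sampling": [0.1, 0.9, 0.1],
    "egt-limits": [0.6, 0.2, 0.6],
}
query_vector = [1.0, 0.0, 0.1]

for chunk_id, score in top_k(query_vector, chunks, k=2):
    print(f"{chunk_id}: {score:.4f}")
```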

> **Tip:** Inspecting returned results helps verify relevance and adjust your chunking or embedding strategy.

## GraphRAG Pipeline

The `GraphRAG` class combines a Large Language Model (LLM) with a vector-based retriever to answer maintenance questions using both semantic search and generative reasoning.

In [None]:
query = "What are the normal EGT operating limits for the V2500 engine at different power settings?"

rag = GraphRAG(
    llm=llm,
    retriever=vector_retriever
)

response = rag.search(
    query,
    retriever_config={"top_k": 5},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)

print(f"Query: \"{query}\"\n")
print(f"Number of chunks used: {len(response.retriever_result.items)}\n")
print("=" * 70)
print("\nAnswer:")
print(response.answer)

**How it works:**
1. The retriever (`vector_retriever`) finds the most relevant maintenance chunks
2. The LLM uses the retrieved context to generate a natural language answer
3. The `return_context=True` option lets you see what context was used

Because the answer is generated from retrieved chunks, it stays grounded in your maintenance documentation; its accuracy still depends on the retriever returning the right chunks.
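
The retrieve-then-generate flow can be sketched without Neo4j or a model endpoint. The code below is a simplified illustration, not the library's implementation: the prompt wording, `answer_with_context`, and the two stand-in functions are all assumptions.

```python
from typing import Callable, List


def answer_with_context(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    top_k: int = 5,
    fallback: str = "No relevant maintenance procedures found.",
) -> str:
    """Retrieve chunks, build one prompt from them, and ask the model."""
    chunks = retrieve(question, top_k)
    if not chunks:
        # Nothing retrieved: skip the model call and return the fallback text
        return fallback
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)


# Stand-ins so the sketch runs on its own
def fake_retrieve(question: str, top_k: int) -> List[str]:
    placeholder_chunks = [
        "placeholder chunk 1",
        "placeholder chunk 2",
        "placeholder chunk 3",
    ]
    return placeholder_chunks[:top_k]


def fake_generate(prompt: str) -> str:
    return f"stub answer built from a {len(prompt)}-character prompt"


print(answer_with_context("What are the EGT limits?", fake_retrieve, fake_generate, top_k=2))
print(answer_with_context("What are the EGT limits?", lambda q, k: [], fake_generate))
```

The second call shows the role a fallback plays: when retrieval returns nothing, the model is never asked to answer without context.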

---

**Try different queries:**
- What should I check if there's a fuel starvation warning?
- How often should the hydraulic fluid be sampled?
- What fault codes indicate bearing wear?

---

# Part 2: Vector Cypher Retriever

The VectorCypherRetriever enhances vector search with custom Cypher queries, enabling you to traverse graph relationships and return richer, more contextual answers.

This approach is ideal when:
- Questions involve relationships between maintenance procedures and aircraft components
- You want structured data alongside text context
- Graph traversal can connect documentation to your aircraft topology

## Example 1: Document Context Enrichment

Create a VectorCypherRetriever that returns document metadata alongside the matching chunks, providing context about the source.

In [None]:
# Custom Cypher query to enrich results with document metadata
document_context_query = """
MATCH (node)-[:FROM_DOCUMENT]->(doc:Document)
RETURN 
    doc.documentId AS document_id,
    doc.aircraftType AS aircraft_type,
    doc.title AS document_title,
    node.index AS chunk_index,
    node.text AS context
"""

document_retriever = VectorCypherRetriever(
    driver=driver,
    index_name=INDEX_NAME,
    embedder=embedder,
    retrieval_query=document_context_query
)

print("VectorCypherRetriever initialized with document context query")

**How this query works:**

- Starts from `node`, the chunk matched by the vector index search, and follows `FROM_DOCUMENT` to its source document
- Returns: document ID, aircraft type, title, chunk index, and context text

This provides traceability back to the source document for each retrieved chunk.
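
To make the traceability concrete, here is a minimal sketch of turning one row of that query into a labelled context string plus metadata. The function name and the placeholder row are hypothetical. `VectorCypherRetriever` accepts a `result_formatter` callable for this kind of shaping, but wiring this function into it (including the return type it expects) is an assumption to check against your installed version.

```python
from typing import Any, Mapping, Tuple


def format_document_record(record: Mapping[str, Any]) -> Tuple[str, dict]:
    """Turn one row of the retrieval query into (context text, metadata)."""
    header = (
        f"[{record['document_id']} | {record['aircraft_type']} | "
        f"{record['document_title']} | chunk {record['chunk_index']}]"
    )
    content = f"{header}\n{record['context']}"
    metadata = {
        "document_id": record["document_id"],
        "chunk_index": record["chunk_index"],
    }
    return content, metadata


# Hypothetical row; every value is a placeholder, not data from the graph
row = {
    "document_id": "DOC-PLACEHOLDER",
    "aircraft_type": "TYPE-PLACEHOLDER",
    "document_title": "Placeholder Manual",
    "chunk_index": 7,
    "context": "placeholder chunk text",
}
content, metadata = format_document_record(row)
print(content)
```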

In [None]:
query = "What are the hydraulic system pressure limits?"

rag = GraphRAG(llm=llm, retriever=document_retriever)
response = rag.search(
    query,
    retriever_config={"top_k": 3},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)

print(f"Query: \"{query}\"\n")
print(f"Number of results: {len(response.retriever_result.items)}\n")
print("=" * 70)
print("\nAnswer:")
print(response.answer)

In [None]:
# View the enriched context used by the LLM
print("Context used:")
print("=" * 70)
for item in response.retriever_result.items:
    print(f"\n{item.content}")

Notice how the context includes document metadata (document ID, aircraft type, title) alongside the text. This enables more specific answers with source attribution.

> **Tip:** Modify `top_k` to see how changing the result count affects answer quality.

## Example 2: Adjacent Chunk Retrieval

Leverage the `NEXT_CHUNK` relationships to retrieve surrounding context, providing the LLM with more complete procedure information.

In [None]:
# Custom Cypher to include previous and next chunks for better context
adjacent_chunks_query = """
WITH node
OPTIONAL MATCH (prev:Chunk)-[:NEXT_CHUNK]->(node)
OPTIONAL MATCH (node)-[:NEXT_CHUNK]->(next:Chunk)
MATCH (node)-[:FROM_DOCUMENT]->(doc:Document)
RETURN 
    doc.documentId AS document_id,
    node.index AS chunk_index,
    COALESCE(prev.text, '') AS previous_context,
    node.text AS main_context,
    COALESCE(next.text, '') AS next_context
"""

adjacent_retriever = VectorCypherRetriever(
    driver=driver,
    index_name=INDEX_NAME,
    embedder=embedder,
    retrieval_query=adjacent_chunks_query
)

print("VectorCypherRetriever initialized with adjacent chunks query")

In [None]:
query = "How do I perform the engine vibration diagnostic flow?"

rag = GraphRAG(llm=llm, retriever=adjacent_retriever)
response = rag.search(
    query,
    retriever_config={"top_k": 3},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)

print(f"Query: \"{query}\"\n")
print(f"Number of results: {len(response.retriever_result.items)}\n")
print("=" * 70)
print("\nAnswer:")
print(response.answer)

**How this works:**

1. **Semantic Search:** Finds top-k text chunks relevant to the query
2. **Graph Traversal:** For each matched chunk:
   - Follows `NEXT_CHUNK` backward to get previous context
   - Follows `NEXT_CHUNK` forward to get next context
3. **Returns:** Previous, main, and next chunks as combined context

**Why this is powerful:**
- Procedures often span multiple chunks
- Adjacent context provides complete step sequences
- Decision trees and troubleshooting flows are better understood with surrounding content
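
The same windowing logic can be sketched over an ordinary Python list, which may make the Cypher easier to read. This assumes chunks are stored in document order; `with_adjacent` is a hypothetical helper, and the empty-string defaults mirror the `COALESCE(..., '')` calls in the query.

```python
def with_adjacent(chunks, hit_index):
    """Return the matched chunk together with its neighbours, if any."""
    previous_text = chunks[hit_index - 1] if hit_index > 0 else ""
    next_text = chunks[hit_index + 1] if hit_index + 1 < len(chunks) else ""
    return {
        "previous_context": previous_text,
        "main_context": chunks[hit_index],
        "next_context": next_text,
    }


# Placeholder procedure split into three chunks
procedure = ["step 1 (placeholder)", "step 2 (placeholder)", "step 3 (placeholder)"]

print(with_adjacent(procedure, 1))  # middle chunk: both neighbours present
print(with_adjacent(procedure, 0))  # first chunk: no previous context
```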

## Example 3: Connecting to Aircraft Topology

This example connects maintenance documentation chunks to your aircraft knowledge graph from Lab 5. The query checks each chunk's text for system keywords and matches them to System nodes in the graph.

> **Note:** This example uses pattern matching on text content to find systems. In production, you would create explicit relationships between chunks and systems during ingestion.

In [None]:
# Query that connects chunks to aircraft systems via keyword matching
# In production, create explicit (:Chunk)-[:REFERENCES]->(:System) relationships instead
system_context_query = """
WITH node
MATCH (node)-[:FROM_DOCUMENT]->(doc:Document)

// Find systems whose name matches keywords in the chunk text
CALL (node) {
    MATCH (a:Aircraft)-[:HAS_SYSTEM]->(s:System)
    WHERE 
        (node.text CONTAINS 'Engine' AND s.name CONTAINS 'Engine') OR
        (node.text CONTAINS 'Avionics' AND s.name CONTAINS 'Avionics') OR
        (node.text CONTAINS 'Hydraulic' AND s.name CONTAINS 'Hydraulic')
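    WITH a.tail_number AS tail_number, s.name AS system_name
    ORDER BY tail_number, system_name
    LIMIT 3
    // Aggregate inside the subquery so a chunk with no matching system
    // still yields one row (with an empty list) instead of being dropped
    RETURN COLLECT({aircraft: tail_number, system: system_name}) AS related_systems
}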

RETURN 
    doc.documentId AS document_id,
    doc.aircraftType AS aircraft_type,
    related_systems,
    node.text AS context
"""

system_retriever = VectorCypherRetriever(
    driver=driver,
    index_name=INDEX_NAME,
    embedder=embedder,
    retrieval_query=system_context_query
)

print("VectorCypherRetriever initialized with system context query")

In [None]:
query = "What maintenance is required for the engine fuel pump?"

rag = GraphRAG(llm=llm, retriever=system_retriever)
response = rag.search(
    query,
    retriever_config={"top_k": 3},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)

print(f"Query: \"{query}\"\n")
print(f"Number of results: {len(response.retriever_result.items)}\n")
print("=" * 70)
print("\nAnswer:")
print(response.answer)

In [None]:
# View the graph-connected context
print("Context with system connections:")
print("=" * 70)
for item in response.retriever_result.items:
    print(f"\n{item.content}")

**How this works:**

1. **Semantic Search:** Finds maintenance chunks about the query topic
2. **Pattern Matching:** Looks for system keywords in the chunk text
3. **Graph Traversal:** Matches to actual System nodes in your aircraft topology
4. **Returns:** Document metadata, related systems, and context text
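
The keyword-matching step can be sketched in Python to show what it does and where it falls short. The function name, tail numbers, and system names below are hypothetical. Like Cypher's `CONTAINS`, the `in` check is case-sensitive, so a chunk that only says "engine" in lowercase matches nothing.

```python
KEYWORDS = ("Engine", "Avionics", "Hydraulic")


def related_systems(chunk_text, systems, limit=3):
    """Match a chunk to (tail_number, system_name) pairs via shared keywords."""
    matches = [
        {"aircraft": tail_number, "system": system_name}
        for tail_number, system_name in systems
        if any(kw in chunk_text and kw in system_name for kw in KEYWORDS)
    ]
    # Same ordering and cap as the Cypher subquery
    matches.sort(key=lambda match: (match["aircraft"], match["system"]))
    return matches[:limit]


# Hypothetical topology rows
systems = [
    ("N200", "Engine #1"),
    ("N100", "Hydraulic System"),
    ("N100", "Engine #1"),
]

print(related_systems("Engine vibration check (placeholder)", systems))
print(related_systems("engine vibration check (placeholder)", systems))  # lowercase: no match
```

This fragility is one reason explicit relationships created at ingestion time are the more robust option.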

**Production Enhancement:** Instead of text pattern matching, create explicit relationships during document ingestion:
```cypher
// During ingestion, link chunks to the systems they mention.
// MERGE keeps the step idempotent if ingestion is re-run.
MATCH (c:Chunk), (s:System)
WHERE c.text CONTAINS s.name
MERGE (c)-[:REFERENCES]->(s)
```

---

# Part 3: Comparing Retrieval Strategies

Let's compare the same query using different retrievers to see the difference in context and answers.

In [None]:
comparison_query = "What should I do if the EGT exceeds normal limits?"

print(f"Query: \"{comparison_query}\"")
print("\n" + "=" * 70)

# Basic Vector Retriever
print("\n[1] VECTOR RETRIEVER (text only)")
print("-" * 40)
rag_basic = GraphRAG(llm=llm, retriever=vector_retriever)
response_basic = rag_basic.search(
    comparison_query,
    retriever_config={"top_k": 3},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)
print(response_basic.answer)

# Graph-Enhanced Retriever with adjacent chunks
print("\n" + "=" * 70)
print("\n[2] VECTOR CYPHER RETRIEVER (with adjacent chunks)")
print("-" * 40)
rag_enhanced = GraphRAG(llm=llm, retriever=adjacent_retriever)
response_enhanced = rag_enhanced.search(
    comparison_query,
    retriever_config={"top_k": 3},
    return_context=True,
    response_fallback="No relevant maintenance procedures found.",
)
print(response_enhanced.answer)

**Key Differences:**

| Aspect | VectorRetriever | VectorCypherRetriever |
|--------|-----------------|----------------------|
| Context | Raw text chunks | Text + structured graph data |
| Relationships | Implicit in text | Explicit via Cypher traversal |
| Answer completeness | May miss surrounding steps | Can include adjacent chunks when the query traverses `NEXT_CHUNK` |
| Best for | Quick lookups | Complete procedure retrieval |

**When to use each:**
- **VectorRetriever**: Simple semantic search, quick answers, fact lookups
- **VectorCypherRetriever**: Procedural questions, troubleshooting flows, when you need context from related chunks or graph entities
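
One rough way to see the trade-off is to measure how much context each retriever hands to the LLM. The sketch below assumes only that each result item has a `content` attribute, as the items printed earlier in this notebook do; `summarize_context` is a hypothetical helper, and the sample items are placeholders.

```python
from types import SimpleNamespace


def summarize_context(items):
    """Count result items and the total characters of context they carry."""
    contents = [str(item.content) for item in items]
    return {
        "items": len(contents),
        "total_chars": sum(len(text) for text in contents),
    }


# Placeholder items standing in for real retriever results
basic_items = [
    SimpleNamespace(content="main"),
    SimpleNamespace(content="main two"),
]
enhanced_items = [
    SimpleNamespace(content="previous main next"),
    SimpleNamespace(content="previous main two next"),
]

print("basic:   ", summarize_context(basic_items))
print("enhanced:", summarize_context(enhanced_items))
```

With real results, you would pass `response_basic.retriever_result.items` and `response_enhanced.retriever_result.items`. More context is not automatically better: it costs more tokens and can dilute the relevant passage.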

## Try Your Own Queries

Experiment with different maintenance questions. Here are some to try:

In [None]:
# Try different maintenance queries
sample_queries = [
    "What are the vibration limits that require engine shutdown?",
    "How do I check for hydraulic fluid contamination?",
    "What oil analysis levels indicate bearing wear?",
    "When should I perform a borescope inspection?"
]

# Use the adjacent retriever for better context
rag = GraphRAG(llm=llm, retriever=adjacent_retriever)

for query in sample_queries:
    print(f"\nQ: {query}")
    print("-" * 70)
    response = rag.search(
        query,
        retriever_config={"top_k": 2},
        return_context=True,
        response_fallback="No relevant maintenance procedures found.",
    )
    # Truncate long answers to the first 500 characters
    answer = response.answer
    if len(answer) > 500:
        answer = answer[:500] + "..."
    print(f"A: {answer}\n")

## Summary

In this notebook, you learned retrieval strategies for aircraft maintenance GraphRAG:

**Part 1 - Vector Retriever:**
1. Simple semantic search using vector embeddings
2. GraphRAG pipeline combining retrieval with LLM generation
3. Diagnostic inspection of search results

**Part 2 - Vector Cypher Retriever:**
4. Custom Cypher queries for graph traversal
5. Document metadata enrichment
6. Adjacent chunk retrieval for complete procedures
7. Connecting chunks to aircraft topology (aircraft and their systems)

**Part 3 - Comparison:**
8. Understanding when to use each approach
9. Trade-offs between simplicity and context richness

The graph-enhanced approach leverages Neo4j's relationship traversal to provide more complete, contextual answers - particularly valuable for maintenance procedures that span multiple chunks or relate to specific aircraft systems.

---

**Your knowledge graph now combines:**
- **Structured topology**: Aircraft -> System -> Component hierarchy
- **Semantic search**: Maintenance manual chunks with embeddings
- **GraphRAG retrieval**: Context-aware answers grounded in documentation

In [None]:
# Cleanup
neo4j.close()