# Neo4j GraphRAG Retriever Notebook (with Explanations)

This notebook demonstrates how to use Neo4j and GraphRAG for retrieval-augmented generation (RAG) with asset manager and cybersecurity risk data.

**Sections include:**
- Environment and connection setup
- LLM and embedder initialization
- Vector and Cypher retrievers
- Diagnostics for vector search
- Example retrieval patterns
- Common troubleshooting tips

Each code cell is accompanied by a markdown explanation to help you understand what it does and how to adapt it for your own use cases.


## 1. Environment and Connection Setup

Load environment variables and connect to Neo4j and OpenAI. This ensures your credentials are not hardcoded in the notebook.


In [None]:
import os
from dotenv import load_dotenv
from neo4j import GraphDatabase

load_dotenv()
NEO4J_URI = os.getenv('NEO4J_URI')
NEO4J_USER = os.getenv('NEO4J_USERNAME')
NEO4J_PASSWORD = os.getenv('NEO4J_PASSWORD')
openai_api_key = os.getenv('OPENAI_API_KEY')


driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USER, NEO4J_PASSWORD))


## 2. LLM and Embedder Initialization

Set up the language model and embedding model for use in retrieval and generation.


In [18]:
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.embeddings import OpenAIEmbeddings

llm = OpenAILLM(model_name='gpt-3.5-turbo', api_key=openai_api_key)
embedder = OpenAIEmbeddings(api_key=openai_api_key)


## 3. Vector Retriever

The vector retriever lets you search for the most relevant chunks of text using semantic similarity.


In [19]:
from neo4j_graphrag.retrievers import VectorRetriever

vector_retriever = VectorRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    return_properties=['text']
)


## 4. Diagnostic: What Chunks Does the Vector Search Return?

This cell helps you debug if your vector search is returning any chunks for a query. If no chunks are returned, check your index, embeddings, or try a broader query.


In [17]:
query = 'apple risk factors'
vector_result = vector_retriever.search(query_text=query, top_k=10)
docs = getattr(vector_result, 'documents', [])

if docs:
    for i, doc in enumerate(docs):
        chunk_id = getattr(doc, 'id', None) or getattr(doc, 'element_id', None) or (doc.get('element_id') if isinstance(doc, dict) else None)
        preview = getattr(doc, 'text', str(doc))[:80]
        print(f'Chunk {i+1}: {chunk_id} | {preview}')
else:
    print('No chunks returned by vector search.')


No chunks returned by vector search.


## 5. VectorCypherRetriever: Cypher-Augmented Retrieval

Use this retriever to run Cypher queries starting from vector-retrieved chunks. Adjust the Cypher pattern to fit your data model.


In [14]:
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.generation.graphrag import GraphRAG

detail_context_query = """
    MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)
    OPTIONAL MATCH (company)-[:FACES_RISK]-(r:RiskFactor)
    RETURN node.text AS context, company.name AS company_name, collect(DISTINCT r) AS risks
    LIMIT 10
"""
detail_context_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    retrieval_query=detail_context_query
)
detail_context_cypher_rag = GraphRAG(llm=llm, retriever=detail_context_cypher_retriever)
response = detail_context_cypher_rag.search(query, retriever_config={'top_k': 10})
print(response)

answer='The provided context details the cybersecurity risks faced by companies such as PayPal, Microsoft Corp, NVIDIA Corporation, Apple Inc, and Amazon. The risks mentioned include cyberattacks, security breaches, data protection issues, system interruptions, and threats related to intellectual property and personal data. The companies are at risk of financial losses, reputational harm, and disruptions to their operations due to increasingly sophisticated cyber threats and vulnerabilities in their systems.\n\nOverall, the companies are focused on implementing security measures, enhancing their technology infrastructure, and complying with evolving regulations to mitigate these risks. However, the dynamic nature of cyber threats poses ongoing challenges that require continuous monitoring, improvement of security controls, and investment in cybersecurity practices to protect sensitive information and ensure business continuity.' retriever_result=None


## 6. Text2CypherRetriever: Natural Language to Cypher

This retriever converts a natural language question into a Cypher query using the LLM and your schema.

**Note:** If you get a syntax error about triple backticks, the LLM may have returned the Cypher inside a markdown code block. Remove the backticks before running, or adjust the prompt to instruct the LLM not to use code blocks.


In [None]:
from neo4j_graphrag.retrievers import Text2CypherRetriever

text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    schema="""
        (:Chunk)-[:FROM_DOCUMENT]-(:Document)-[:FILED]-(:Cusip6)-[:HAS_CUSIP]-(:Company)
        (:Company)-[:FACES_RISK]-(:RiskFactor)
        (:Company)-[:MANAGED_BY]-(:AssetManager)
    """
)
query = 'Show me all companies managed by BlackRock and the risk factors they face.'
response = text2cypher_retriever.search(query)
if hasattr(response, 'results') and response.results:
    print('--- Text2CypherRetriever Results ---')
    for i, row in enumerate(response.results):
        print(f'Result {i+1}: {row}')
else:
    print('No results found from Text2CypherRetriever. The LLM may not have generated a matching Cypher query or there may be no matching data.')


## 7. Troubleshooting Tips

- If no chunks are returned by vector search, try a broader query or check your index/embedding setup.
- If you get Cypher syntax errors, check for markdown code blocks in the generated Cypher.
- Use the notebook's diagnostic cells to debug data flow and retrieval patterns.
- Check your Neo4j schema and data with Cypher queries in the Neo4j Browser.
