# Advanced RAG: Contextual Cypher Retrieval

You can improve the vector retriever by using a custom Cypher query to provide richer, more contextual answers.

You will need to:

- Create a Cypher `retrieval_query` that will be used with the retriever
- Use the `VectorCypherRetriever` class to create the retriever
- Create a `GraphRAG` pipeline that uses the retriever

---

Import the required Python modules, load the environment variables, and create the connection to the graph, the LLM, and the embedding model.

In [None]:
import sys
sys.path.insert(0, '../new-workshops/solutions')

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import VectorCypherRetriever
from neo4j_graphrag.generation import GraphRAG

from config import Neo4jConfig, get_llm, get_embedder

neo4j_config = Neo4jConfig()
driver = GraphDatabase.driver(neo4j_config.uri, auth=(neo4j_config.username, neo4j_config.password))

# --- Initialize LLM and Embedder from Microsoft Foundry ---
llm = get_llm()
embedder = get_embedder()

Create a `VectorCypherRetriever` that uses a Cypher query to return additional data.

In [2]:
asset_manager_query = """
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(company:Company)-[:OWNS]-(manager:AssetManager)
RETURN company.name AS company, manager.managerName AS AssetManagerWithSharesInCompany, node.text AS context
"""

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name='chunkEmbeddings',
    embedder=embedder,
    retrieval_query=asset_manager_query
)

**How it works:**  

- **Custom Cypher Query:**  
  The `asset_manager_query` matches text chunks (`node`) to their source documents, associated companies, and the asset managers.

  It returns:
  
  1. The company name
  2. The name of an asset manager that owns shares in the company (one row per manager)
  3. The context text from the chunk

- **VectorCypherRetriever:**  
  - Performs semantic search using the `chunkEmbeddings` vector index.
  - Applies the Cypher `retrieval_query` to retrieve relevant context and associated asset managers.
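
The two stages described above can be illustrated with a small in-memory sketch. This is not how `VectorCypherRetriever` is implemented; it only mimics the idea with plain Python. The chunk data, company names, manager names, and the helper functions `vector_search`, `expand`, and `retrieve` are all hypothetical and exist only for this example.

```python
import math

# Hypothetical stand-ins for the graph. None of this is workshop data.
CHUNKS = {
    "chunk-1": {"text": "Banking regulation may affect results.",
                "embedding": [1.0, 0.0], "company": "ACME"},
    "chunk-2": {"text": "A new product launched this year.",
                "embedding": [0.0, 1.0], "company": "GLOBEX"},
}
OWNERS = {"ACME": ["Manager A", "Manager B"], "GLOBEX": ["Manager C"]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_embedding, top_k):
    """Stage 1: rank chunks by similarity and keep the best top_k ids."""
    scored = [(cosine(query_embedding, chunk["embedding"]), chunk_id)
              for chunk_id, chunk in CHUNKS.items()]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:top_k]]


def expand(chunk_id):
    """Stage 2: play the role of the retrieval query for one chunk (`node`)."""
    chunk = CHUNKS[chunk_id]
    return [
        {"company": chunk["company"],
         "AssetManagerWithSharesInCompany": manager,
         "context": chunk["text"]}
        for manager in OWNERS[chunk["company"]]
    ]


def retrieve(query_embedding, top_k=1):
    """Run both stages and return one row per (chunk, manager) pair."""
    rows = []
    for chunk_id in vector_search(query_embedding, top_k):
        rows.extend(expand(chunk_id))
    return rows


for row in retrieve([0.9, 0.1], top_k=1):
    print(row)
```

Note that a single matching chunk expands into one row per asset manager, so the chunk text is repeated in every row for that company.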

---

Use the `GraphRAG` class to run a pipeline that uses the `vector_cypher_retriever`.

In [3]:
# --- Initialize RAG and Perform Search ---
query = "Who are the asset managers most affected by banking regulations?"

rag = GraphRAG(llm=llm, retriever=vector_cypher_retriever)
response = rag.search(query)
print(response.answer)

The asset managers most affected by banking regulations mentioned in relation to Microsoft Corp include:

1. ALLIANCEBERNSTEIN L.P.
2. AMERIPRISE FINANCIAL INC
3. AMUNDI
4. BANK OF AMERICA CORP /DE/
5. Bank of New York Mellon Corp
6. BlackRock Inc.
7. Capital World Investors
8. FMR LLC
9. GEODE CAPITAL MANAGEMENT, LLC
10. MORGAN STANLEY
11. NORTHERN TRUST CORP
12. STATE STREET CORP
13. WELLINGTON MANAGEMENT GROUP LLP
14. WELLS FARGO & COMPANY/MN

These asset managers have shares in Microsoft Corp and could be impacted by Microsoft's relationship with banking regulations as outlined in the provided context.


The `GraphRAG` pipeline will use the `vector_cypher_retriever` to gain the additional context from the graph. 

The `vector_cypher_retriever` enables specific, context-rich answers that combine graph relationships with semantic search.

This pattern is ideal when your question is about relationships or context that can be surfaced from relevant passages, and when you want to return both the context and the structured entities connected to it.  

If you ask the same question but don't return any additional context from the graph, the answer is less relevant, more generic, and doesn't include any specific asset managers.

---

You can pass additional parameters to the `search` method to control the number of results the retriever returns and to include the retrieved context in the response.

In [4]:
# --- Initialize RAG, search with options and return context ---
rag = GraphRAG(llm=llm, retriever=vector_cypher_retriever)
response = rag.search(
    query,
    retriever_config={"top_k": 5},
    return_context=True
    )
print(response.answer)
print("Context:", *response.retriever_result.items, sep="\n\n")

The asset managers most affected by banking regulations are likely to include prominent firms such as AllianceBernstein L.P., Ameriprise Financial Inc., Amundi, Bank of America Corp /DE/, Bank of New York Mellon Corp, BlackRock Inc., Capital World Investors, FMR LLC, Geode Capital Management, LLC, Morgan Stanley, Northern Trust Corp, State Street Corp, Wellington Management Group LLP, and Wells Fargo & Company/MN. These asset managers are listed as having shares in Microsoft Corp, which faces banking regulations that could impact its financial statements.
Context:

content='<Record company=\'MICROSOFT CORP\' AssetManagerWithSharesInCompany=\'ALLIANCEBERNSTEIN L.P.\' context=\'regulation. Adverse outcomes in some or all of these claims may result in significant monetary\\ndamages or injunctive relief that could adversely affect our ability to conduct our business. The\\nlitigation and other claims are subject to inherent uncertainties and management\\\'s view of these\\nmatters may chan

The context is returned and printed to the screen. You can use this information to judge the relevance of the data being sent to the LLM.

Modify the `"top_k"` value from `5` to `10` and then to `20` and review the context returned.

As you increase the number of results returned, the additional results tend to become less relevant.
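
To compare several `top_k` values side by side, a small helper along these lines may be convenient. It is a sketch, not part of the workshop code: `sweep_top_k` and `fake_search` are hypothetical names, and the stand-in search function is there only so the block runs without a database. The helper assumes each returned item has a `content` attribute, as the printed context above suggests.

```python
from types import SimpleNamespace


def sweep_top_k(search, query, values=(5, 10, 20), width=80):
    """Call search(query, k) for each k and keep a short preview of every item."""
    previews = {}
    for k in values:
        items = search(query, k)
        previews[k] = [
            str(item.content).replace("\n", " ")[:width] for item in items
        ]
    return previews


def fake_search(query, k):
    """Stand-in for a real retriever call; returns k dummy items."""
    return [SimpleNamespace(content=f"record {i} for {query!r}") for i in range(k)]


result = sweep_top_k(fake_search, "risks", values=(2, 3))
for k, lines in result.items():
    print(f"top_k={k}: {len(lines)} items")
    for line in lines:
        print("   ", line)
```

In the notebook, the `search` argument could be a function that wraps the call shown earlier, for example `lambda q, k: rag.search(q, retriever_config={"top_k": k}, return_context=True).retriever_result.items`. Because the retrieval query can return several rows per chunk, the number of items may exceed `top_k`.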

## Finding Shared Risks Among Companies

You can combine semantic search with graph traversal to uncover relationships, specifically the risks that connect major tech companies.

In [5]:
## VectorCypherRetriever Example: Finding Shared Risks Among Companies
vector_company_risk_query = """
WITH node
MATCH (node)-[:FROM_DOCUMENT]-(doc:Document)-[:FILED]-(c1:Company)
MATCH (c1)-[:FACES_RISK]->(risk:RiskFactor)<-[:FACES_RISK]-(c2:Company)
WHERE c1 <> c2
RETURN
  c1.name AS source_company,
  collect(DISTINCT c2.name) AS related_companies,
  collect(DISTINCT risk.name) AS shared_risks
LIMIT 10
"""

vector_cypher_retriever = VectorCypherRetriever(
    driver=driver,
    index_name="chunkEmbeddings",
    embedder=embedder,
    retrieval_query=vector_company_risk_query
)

query = "What risks connect major tech companies?"
rag = GraphRAG(llm=llm, retriever=vector_cypher_retriever)
response = rag.search(
    query,
    retriever_config={"top_k": 5},
    return_context=True
    )
print(response.answer)

The risks that connect major tech companies such as Microsoft, Apple, PayPal, and Amazon include foreign exchange rates, interest rates, supply chain disruptions, and macroeconomic conditions. Additionally, climate change and regulatory environment are shared risks among some of these companies.


In [6]:
# View the context used in this query
print("Context:", *response.retriever_result.items, sep="\n\n")

Context:

content="<Record source_company='MICROSOFT CORP' related_companies=['AMAZON', 'PG&E CORP', 'NVIDIA CORPORATION', 'PAYPAL'] shared_risks=['supply chain disruptions', 'climate change', 'foreign exchange rates', 'interest rates']>" metadata=None

content="<Record source_company='APPLE INC' related_companies=['NVIDIA CORPORATION', 'PAYPAL'] shared_risks=['natural disasters', 'adverse economic conditions', 'macroeconomic conditions', 'Interest Rate Risk', 'Foreign Exchange Rate Risk']>" metadata=None

content="<Record source_company='PAYPAL' related_companies=['APPLE INC', 'NVIDIA CORPORATION', 'MICROSOFT CORP', 'PG&E CORP'] shared_risks=['macroeconomic conditions', 'Interest Rate Risk', 'climate change', 'Regulatory Environment', 'catastrophic event', 'licensing requirements']>" metadata=None

content="<Record source_company='AMAZON' related_companies=['MICROSOFT CORP', 'PG&E CORP'] shared_risks=['supply chain disruptions', 'foreign exchange rates', 'interest rates']>" metadata=N

**How this works:**

- **Semantic Search:**  
  The vector retriever finds the top-k text chunks most relevant to your query ("What risks connect major tech companies?").

- **Graph Traversal:**  
  For each retrieved chunk (`node`):
  - Follows the `:FROM_DOCUMENT` and `:FILED` relationships to a company (`c1`).
  - Finds all risk factors (`risk`) that `c1` faces.
  - Finds other companies (`c2`) that also face the same risk factor.
  - Ensures that `c1` and `c2` are different companies.

- **Returns:**  
  - `source_company`: The company from the retrieved chunk.
  - `related_companies`: Companies sharing at least one risk with the source company.
  - `shared_risks`: The names of the risk factors connecting these companies.

- **Why this is powerful:**  
  - Leverages the chunk as the semantic anchor, but then uses graph logic to discover structured, multi-entity relationships.
  - Surfaces the broader network of shared risks around the companies found through semantic search, something that pure semantic or pure graph search alone would struggle to do as effectively.

This approach is ideal for exploratory questions about relationships in your graph, where you want to start from relevant context but end up with structured, comparative insights.
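
The traversal logic of the shared-risk query can be mirrored in plain Python to make the pattern easier to follow. This is a sketch under stated assumptions: the `FACES_RISK` edge list and the company names are made up, and the `shared_risks` function is a hypothetical helper, not part of `neo4j_graphrag`. Results are sorted here for readability; Cypher's `collect` makes no such ordering guarantee.

```python
from collections import defaultdict

# Hypothetical (company, risk) pairs standing in for FACES_RISK relationships.
FACES_RISK = [
    ("ALPHA", "interest rates"),
    ("ALPHA", "supply chain disruptions"),
    ("BETA", "interest rates"),
    ("GAMMA", "supply chain disruptions"),
    ("GAMMA", "climate change"),
]


def shared_risks(source_company, edges):
    """Find other companies that face at least one risk the source company faces."""
    # Index: risk -> set of companies facing it.
    companies_by_risk = defaultdict(set)
    for company, risk in edges:
        companies_by_risk[risk].add(company)

    related, risks = set(), set()
    for company, risk in edges:
        if company != source_company:
            continue
        # Equivalent of `WHERE c1 <> c2`: drop the source company itself.
        others = companies_by_risk[risk] - {source_company}
        if others:
            related |= others   # like collect(DISTINCT c2.name)
            risks.add(risk)     # like collect(DISTINCT risk.name)

    return {
        "source_company": source_company,
        "related_companies": sorted(related),
        "shared_risks": sorted(risks),
    }


print(shared_risks("ALPHA", FACES_RISK))
print(shared_risks("GAMMA", FACES_RISK))
```

A risk that only the source company faces, such as `climate change` for `GAMMA` in this toy data, does not appear in the result because no second company matches the pattern.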

---

Experiment with these examples. Modify the `query` and review how the context returned by the retriever changes the response.

[View the complete code](../new-workshops/solutions/02_02_vector_cypher_retriever.py)

[Move on to the Text2Cypher Retriever Notebook](02_03_text2cypher_retriever.ipynb)

In [7]:
# Cleanup
driver.close()