# Text2Cypher Retriever

Vector retrievers are great for finding relevant data based on semantic similarity or keyword matching.

To answer more specific questions, you may need to perform more complex queries to find data relating to specific nodes, relationships, or properties.

For example, you may want to find:
- Which asset managers own a specific organization.
- How many stock types there are.
- Which organizations are exposed to a certain risk factor.

Text2Cypher retrievers convert natural language questions into Cypher queries that can be executed against the graph.

---

You will use the `Text2CypherRetriever` class to create a new retriever and use it in a `GraphRAG` pipeline.

Import the required Python modules, load the environment variables, and create the connection to the graph, the LLM, and the embedding model.

In [None]:
import sys
sys.path.insert(0, '../new-workshops/solutions')

from neo4j import GraphDatabase
from neo4j_graphrag.retrievers import Text2CypherRetriever
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.schema import get_schema

from config import Neo4jConfig, get_llm, get_embedder

neo4j_config = Neo4jConfig()
driver = GraphDatabase.driver(neo4j_config.uri, auth=(neo4j_config.username, neo4j_config.password))

# --- Initialize LLM and Embedder from Microsoft Foundry ---
llm = get_llm()
embedder = get_embedder()

The `Text2CypherRetriever` automatically generates Cypher queries from natural language questions.

**How it works:**
- The retriever uses a Large Language Model (LLM) to translate your plain-English query into a Cypher query, based on your Neo4j schema.
- The schema is provided as a string describing the main node types and relationships in your graph (e.g., companies, risk factors, asset managers).
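
To make the translation step concrete, the sketch below shows one way a Text2Cypher prompt could combine the schema and the question before it is sent to an LLM. The helper `build_text2cypher_prompt` and its wording are hypothetical. `neo4j_graphrag` uses its own prompt template internally, so treat this as an illustration of the idea only.

```python
def build_text2cypher_prompt(schema: str, question: str) -> str:
    """Assemble a prompt asking an LLM to translate a question into Cypher.

    Hypothetical helper for illustration; the library uses its own template.
    """
    return (
        "Task: Generate a Cypher statement to query a graph database.\n"
        "Use only the node labels, relationship types, and properties in the schema.\n"
        f"Schema:\n{schema}\n\n"
        f"Question:\n{question}\n\n"
        "Cypher query:"
    )


# A trimmed-down schema string in the same shape as the get_schema output.
example_schema = (
    "Node properties:\n"
    "AssetManager {managerName: STRING}\n"
    "Company {name: STRING, ticker: STRING}\n"
    "The relationships:\n"
    "(:AssetManager)-[:OWNS]->(:Company)"
)

prompt = build_text2cypher_prompt(example_schema, "What companies are owned by BlackRock Inc.")
print(prompt)
```

Because the schema is embedded in the prompt, the LLM can only "see" the labels and relationships that the schema string describes.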

You can view the schema using the `get_schema` function.

In [2]:
schema = get_schema(driver)
print(schema)

Node properties:
AssetManager {managerName: STRING, :ID: STRING}
Company {name: STRING, ticker: STRING, :ID: STRING}
Document {path: STRING, :ID: STRING}
Chunk {embedding: LIST, text: STRING, :ID: STRING}
Executive {name: STRING, :ID: STRING}
FinancialMetric {name: STRING, :ID: STRING}
Product {name: STRING, :ID: STRING}
RiskFactor {name: STRING, :ID: STRING}
StockType {name: STRING, :ID: STRING}
TimePeriod {name: STRING, :ID: STRING}
Transaction {name: STRING, :ID: STRING}
Relationship properties:
OWNS {position_status: STRING}
The relationships:
(:AssetManager)-[:OWNS]->(:Company)
(:Company)-[:FILED]->(:Document)
(:Company)-[:FROM_CHUNK]->(:Chunk)
(:Company)-[:MENTIONS]->(:Executive)
(:Company)-[:MENTIONS]->(:Product)
(:Company)-[:MENTIONS]->(:TimePeriod)
(:Company)-[:MENTIONS]->(:Company)
(:Company)-[:MENTIONS]->(:RiskFactor)
(:Company)-[:MENTIONS]->(:Transaction)
(:Company)-[:HAS_METRIC]->(:FinancialMetric)
(:Company)-[:HAS_METRIC]->(:StockType)
(:Company)-[:HAS_METRIC]->(:Product)

Create the `Text2CypherRetriever` using the Neo4j `driver`, `llm`, and the `schema`.

In [3]:
# --- Text2CypherRetriever Example ---
text2cypher_retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,
    neo4j_schema=schema
)

Run the retriever by passing it a natural language query, such as "What companies are owned by BlackRock Inc.".

The retriever then:

1. Generates a corresponding Cypher query using the `schema` and the `llm`.
2. Executes the Cypher query using the `driver`.
3. Returns the generated Cypher and the results.
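
The three steps can be outlined as a small function. This is a hypothetical sketch, not the retriever's implementation: `text2cypher_search`, `fake_llm`, and `fake_db` are illustrative names, and the LLM and database are replaced by plain callables so the outline runs without a Neo4j connection.

```python
from typing import Any, Callable


def text2cypher_search(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_cypher: Callable[[str], list[Any]],
) -> dict[str, Any]:
    """Hypothetical outline of the three Text2Cypher retrieval steps."""
    # 1. Ask the LLM (here, any callable) to write Cypher from schema + question.
    cypher = generate_cypher(schema, question)
    # 2. Execute the generated Cypher against the database.
    records = run_cypher(cypher)
    # 3. Return the generated Cypher alongside the results.
    return {"cypher": cypher, "records": records}


# Stand-ins for the LLM and the database so the outline runs offline.
def fake_llm(schema: str, question: str) -> str:
    return "MATCH (am:AssetManager)-[:OWNS]->(c:Company) RETURN c.name"


def fake_db(cypher: str) -> list[dict[str, str]]:
    return [{"c.name": "APPLE INC"}, {"c.name": "MICROSOFT CORP"}]


result = text2cypher_search(
    "What companies are owned by BlackRock Inc.", "<schema>", fake_llm, fake_db
)
print("Generated Cypher:", result["cypher"])
print("Records:", result["records"])
```

Against a real database, `run_cypher` could wrap the Neo4j driver, for example `lambda q: driver.execute_query(q).records`, which is roughly the role the `driver` plays inside `Text2CypherRetriever`.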

In [4]:
query = "What companies are owned by BlackRock Inc."
search_result = text2cypher_retriever.get_search_results(query)

print("Original Query:", query)
print("Generated Cypher:", search_result.metadata["cypher"])

print("Cypher Query Results:")
for record in search_result.records:
    print(record)

Original Query: What companies are owned by BlackRock Inc.
Generated Cypher: MATCH (am:AssetManager {managerName: "BlackRock Inc."})-[:OWNS]->(c:Company)
RETURN c.name, c.ticker
Cypher Query Results:
<Record c.name='APPLE INC' c.ticker='AAPL'>
<Record c.name='MICROSOFT CORP' c.ticker='MSFT'>
<Record c.name='AMAZON' c.ticker='AMZN'>
<Record c.name='INTEL CORP' c.ticker='INTC'>
<Record c.name='PG&E CORP' c.ticker='PCG'>
<Record c.name='NVIDIA CORPORATION' c.ticker='NVDA'>
<Record c.name='MCDONALDS CORP' c.ticker='MCD'>
<Record c.name='PAYPAL HLDGS INC' c.ticker='PYPL'>


> **Tip:**
> You can configure the retriever to only look at part of the graph by removing node labels and relationship types from the `schema`.
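
One way to apply the tip above is to filter the schema string before passing it to the retriever. The sketch assumes the line-based text format printed by `get_schema` earlier; `filter_schema` is a hypothetical helper, not part of `neo4j_graphrag`, and its simple string matching may need adjusting for other schema layouts.

```python
def filter_schema(schema: str, excluded: set[str]) -> str:
    """Drop schema lines that mention any excluded node label or relationship type.

    Hypothetical helper: a line-based filter for the get_schema text format.
    """
    kept = []
    for line in schema.splitlines():
        # Match property lines ("Chunk {...}") and patterns ("(:Chunk)", "[:OWNS]").
        mentions_excluded = any(
            line.startswith(f"{name} {{") or f":{name})" in line or f"[:{name}]" in line
            for name in excluded
        )
        if not mentions_excluded:
            kept.append(line)
    return "\n".join(kept)


example_schema = """Node properties:
AssetManager {managerName: STRING}
Company {name: STRING, ticker: STRING}
Document {path: STRING}
Chunk {embedding: LIST, text: STRING}
Relationship properties:
OWNS {position_status: STRING}
The relationships:
(:AssetManager)-[:OWNS]->(:Company)
(:Company)-[:FILED]->(:Document)
(:Company)-[:FROM_CHUNK]->(:Chunk)"""

# Hide the document and chunk structure from the LLM.
restricted_schema = filter_schema(example_schema, {"Document", "Chunk"})
print(restricted_schema)
```

In the notebook, you could then create a restricted retriever by passing `filter_schema(schema, {"Document", "Chunk"})` as the `neo4j_schema` argument of `Text2CypherRetriever`.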

You can use the `Text2CypherRetriever` as part of a `GraphRAG` pipeline. The `GraphRAG` pipeline generates a response based on the original `query` and the results returned by the generated Cypher query.
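
Conceptually, the pipeline places the retrieved records into the prompt as context before asking the LLM to answer. The sketch below illustrates that idea only; `build_rag_prompt` is a hypothetical helper and `GraphRAG` uses its own prompt template, whose wording differs.

```python
def build_rag_prompt(question: str, context_items: list[str]) -> str:
    """Hypothetical sketch of combining retrieved context with the question."""
    # Each retrieved record becomes one line of context.
    context = "\n".join(context_items)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        "Answer:"
    )


# Records shaped like the Text2Cypher results shown in this notebook.
items = [
    "<Record ManagerName='BlackRock Inc.'>",
    "<Record ManagerName='FMR LLC'>",
]
prompt = build_rag_prompt("Who are the asset managers?", items)
print(prompt)
```

Grounding the answer in the query results is what lets the LLM list facts from the graph rather than rely on its training data.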

In [5]:
# --- Initialize RAG and Perform Search ---
query = "Who are the asset managers?"
rag = GraphRAG(llm=llm, retriever=text2cypher_retriever)
response = rag.search(query, return_context=True)
print(response.answer)

The asset managers are:

- ALLIANCEBERNSTEIN L.P.
- AMERIPRISE FINANCIAL INC
- AMUNDI
- BANK OF AMERICA CORP /DE/
- Bank of New York Mellon Corp
- Berkshire Hathaway Inc
- BlackRock Inc.
- Capital World Investors
- FMR LLC
- GEODE CAPITAL MANAGEMENT, LLC
- MORGAN STANLEY
- NORTHERN TRUST CORP
- STATE STREET CORP
- WELLINGTON MANAGEMENT GROUP LLP
- WELLS FARGO & COMPANY/MN


In [6]:
# View the generated Cypher and results used in this query
print("Generated Cypher:", response.retriever_result.metadata["cypher"])
print("Context:", *response.retriever_result.items, sep="\n")

Generated Cypher: MATCH (a:AssetManager) RETURN a.managerName AS ManagerName
Context:
content="<Record ManagerName='ALLIANCEBERNSTEIN L.P.'>" metadata=None
content="<Record ManagerName='AMERIPRISE FINANCIAL INC'>" metadata=None
content="<Record ManagerName='AMUNDI'>" metadata=None
content="<Record ManagerName='BANK OF AMERICA CORP /DE/'>" metadata=None
content="<Record ManagerName='Bank of New York Mellon Corp'>" metadata=None
content="<Record ManagerName='Berkshire Hathaway Inc'>" metadata=None
content="<Record ManagerName='BlackRock Inc.'>" metadata=None
content="<Record ManagerName='Capital World Investors'>" metadata=None
content="<Record ManagerName='FMR LLC'>" metadata=None
content="<Record ManagerName='GEODE CAPITAL MANAGEMENT, LLC'>" metadata=None
content="<Record ManagerName='MORGAN STANLEY'>" metadata=None
content="<Record ManagerName='NORTHERN TRUST CORP'>" metadata=None
content="<Record ManagerName='STATE STREET CORP'>" metadata=None
content="<Record ManagerName='WELLINGTON 

Text2Cypher retrievers let you answer more specific questions and retrieve fact-based context from the graph. They also:

- Remove the need to manually write Cypher for each question.
- Make graph querying accessible to non-technical users.
- Support rapid prototyping, exploration, and building natural language interfaces to your knowledge graph.

> **Tip:**
> You can tailor the generated Cypher by passing `examples` of user questions and their corresponding Cypher queries when creating the `Text2CypherRetriever`.
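
The sketch below prepares such examples as a list of strings. `format_examples` is a hypothetical helper, and the `"USER INPUT: ... QUERY: ..."` layout mirrors the examples in the `neo4j_graphrag` documentation; treat it as a convention rather than a required format. The question/Cypher pairs are illustrative and written against the schema printed above.

```python
def format_examples(pairs: list[tuple[str, str]]) -> list[str]:
    """Turn (question, Cypher) pairs into example strings for the retriever.

    Hypothetical helper; the layout follows the library's documented examples.
    """
    return [f"USER INPUT: '{question}' QUERY: {cypher}" for question, cypher in pairs]


# Illustrative question/Cypher pairs for this graph's schema.
examples = format_examples([
    (
        "Which asset managers own APPLE INC?",
        "MATCH (am:AssetManager)-[:OWNS]->(c:Company {name: 'APPLE INC'}) RETURN am.managerName",
    ),
    (
        "How many stock types are there?",
        "MATCH (s:StockType) RETURN count(s) AS stockTypes",
    ),
    (
        "What risk factors does APPLE INC mention?",
        "MATCH (c:Company {name: 'APPLE INC'})-[:MENTIONS]->(r:RiskFactor) RETURN r.name",
    ),
])

for example in examples:
    print(example)
```

To use them, pass the list when creating the retriever, for example `Text2CypherRetriever(driver=driver, llm=llm, neo4j_schema=schema, examples=examples)`.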

---

Experiment with the Text2Cypher retriever and the GraphRAG pipeline, and review both the answers and the generated Cypher.

[View the complete code](../new-workshops/solutions/02_03_text2cypher_retriever.py)

In [7]:
# Cleanup
driver.close()