# Text to Cypher

## Installation

This notebook requires the following dependencies:

In [None]:
%pip install neo4j-graphrag langchain-core langchain-openai langchain-neo4j langgraph python-dotenv

## Connecting to Neo4j

The following cell creates an instance of the Neo4j Python Driver, which the retrievers require to connect to the database. The driver is configured from environment variables set in your `.env` file, with local defaults used for any variable that is missing.


In [1]:
%load_ext dotenv
%dotenv

from os import getenv

NEO4J_URL = getenv("NEO4J_URI") or "neo4j://localhost:7687"
NEO4J_USERNAME = getenv("NEO4J_USERNAME") or "neo4j"
NEO4J_PASSWORD = getenv("NEO4J_PASSWORD") or "neoneoneo"
NEO4J_DATABASE = getenv("NEO4J_DATABASE") or "neo4j"

from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    NEO4J_URL,
    auth=(NEO4J_USERNAME, NEO4J_PASSWORD)
)

driver.verify_connectivity()  # Raises an exception if the connection fails
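
The `getenv(...) or default` pattern in the cell above behaves slightly differently from passing a default to `getenv` directly: it also falls back when the variable is set but empty, which can happen with a blank line in a `.env` file. The sketch below illustrates the difference; the variable name `EXAMPLE_EMPTY` and the helper `env_or_default` are hypothetical and exist only for this illustration.

```python
import os


def env_or_default(name: str, default: str) -> str:
    """Return the variable's value, or the default if it is unset or empty."""
    return os.environ.get(name) or default


# Simulate a variable that is defined but left blank (hypothetical name).
os.environ["EXAMPLE_EMPTY"] = ""

with_or = env_or_default("EXAMPLE_EMPTY", "neo4j")   # falls back to the default
with_arg = os.getenv("EXAMPLE_EMPTY", "neo4j")       # keeps the empty string

print(repr(with_or), repr(with_arg))
```

For connection settings the `or` form is usually the safer choice, since an empty URI or username is never a usable value.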


## Text to Cypher

A vector index called `chunkEmbeddings` already exists in the database. You can [create your own using the `create_vector_index` function](https://github.com/neo4j/neo4j-graphrag-python?tab=readme-ov-file#creating-a-vector-index) or [populate an existing index using the `upsert_vectors` function](https://github.com/neo4j/neo4j-graphrag-python?tab=readme-ov-file#populating-a-vector-index).

### Obtaining the database schema

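The Neo4j GraphRAG library provides two functions for retrieving the schema of an existing graph database: `get_schema`, which returns the schema as a formatted string, and `get_structured_schema`, which returns it as a dictionary. The cell below uses `get_schema`.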

In [4]:
from neo4j_graphrag.schema import get_schema

schema = get_schema(driver, database=NEO4J_DATABASE)

print(schema)



Node properties:
Chunk {:ID: STRING, embedding: LIST, text: STRING, __tmp_internal_id: STRING, index: INTEGER}
AssetManager {:ID: STRING, managerName: STRING}
Company {:ID: STRING, name: STRING, ticker: STRING, __tmp_internal_id: STRING}
Document {:ID: STRING, path: STRING}
Executive {:ID: STRING, name: STRING}
FinancialMetric {:ID: STRING, name: STRING}
Product {:ID: STRING, name: STRING}
RiskFactor {:ID: STRING, name: STRING}
StockType {:ID: STRING, name: STRING}
TimePeriod {:ID: STRING, name: STRING}
Transaction {:ID: STRING, name: STRING}
Location {name: STRING, __tmp_internal_id: STRING}
Country {name: STRING, __tmp_internal_id: STRING}
FundingRound {__tmp_internal_id: STRING, amount: FLOAT, id: STRING, series: STRING}
Relationship properties:
OWNS {position_status: STRING}
FUNDED_BY {amount: FLOAT, date: STRING}
The relationships:
(:Chunk)-[:NEXT_CHUNK]->(:Chunk)
(:Chunk)-[:FROM_DOCUMENT]->(:Document)
(:AssetManager)-[:OWNS]->(:Company)
(:Company)-[:FACES_RISK]->(:RiskFactor)
(:C
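
Because the schema is plain text, it can be inspected with ordinary string tools before it is handed to an LLM, for example to confirm that a relationship type you plan to ask about is present. The following is a minimal sketch that assumes the relationship lines follow the `(:A)-[:REL]->(:B)` layout shown in the output above; the helper name `relationship_patterns` is made up for this example, and labels containing characters other than letters, digits, and underscores are not handled. When a dictionary is needed, `get_structured_schema` is likely the more direct route.

```python
import re
from typing import List, Tuple

# Matches lines such as "(:Chunk)-[:NEXT_CHUNK]->(:Chunk)".
PATTERN = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def relationship_patterns(schema_text: str) -> List[Tuple[str, str, str]]:
    """Return (start label, relationship type, end label) triples found in the text."""
    return PATTERN.findall(schema_text)


# A few lines copied from the schema output above.
sample = """The relationships:
(:Chunk)-[:NEXT_CHUNK]->(:Chunk)
(:Chunk)-[:FROM_DOCUMENT]->(:Document)
(:AssetManager)-[:OWNS]->(:Company)
(:Company)-[:FACES_RISK]->(:RiskFactor)
"""

triples = relationship_patterns(sample)

# Keep only the patterns that use the FACES_RISK relationship type.
faces_risk = [t for t in triples if t[1] == "FACES_RISK"]
print(faces_risk)
```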

### Creating a retriever

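The `Text2CypherRetriever` passes this schema to the LLM along with the user's question so that the LLM can generate a Cypher statement, which the retriever then executes against the database.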

In [14]:
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import Text2CypherRetriever

llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

text2cypher_retriever = Text2CypherRetriever(
    llm=llm,
    driver=driver,
    neo4j_database=NEO4J_DATABASE,
    neo4j_schema=schema
)

result = text2cypher_retriever.search("What are the top risk factors that APPLE INC faces?")

In [15]:
for item in result.items:
    print(item.content)


<Record riskFactor='Geography'>
<Record riskFactor='Aggressive price competition'>
<Record riskFactor='Frequent introduction of new products'>
<Record riskFactor='Short product life cycles'>
<Record riskFactor='Evolving industry standards'>
<Record riskFactor='Commodity pricing fluctuations'>
<Record riskFactor='Industry-wide shortage and significant commodity pricing fluctuations'>
<Record riskFactor='Initial capacity constraints when new technologies are used'>
<Record riskFactor='Availability of components at acceptable prices'>
<Record riskFactor='Ability to extend or renew component supply agreements'>
<Record riskFactor='Rapid technological advances in industry'>
<Record riskFactor='Need to seek or renew licenses for third-party intellectual property'>
<Record riskFactor='Potential workplace risks'>
<Record riskFactor='General safety, security, and crisis management hazards'>
<Record riskFactor='Risks in potentially high-hazard environments'>
<Record riskFactor='Reputation, finan
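
Each item above is the string form of a database record, which is noisy when passed on to an LLM or shown to a user. One option is to turn each record into a compact `key: value` line. The sketch below shows only that conversion, using a plain dictionary in place of a real record; the function name `record_to_text` is hypothetical. It is an assumption here that the retriever's `result_formatter` argument receives a `neo4j.Record` and expects a `RetrieverResultItem` in return, so check the library documentation before wiring a function like this into the retriever.

```python
from typing import Any, Iterable, Tuple


def record_to_text(pairs: Iterable[Tuple[str, Any]]) -> str:
    """Join (key, value) pairs into a single 'key: value, key: value' line."""
    return ", ".join(f"{key}: {value}" for key, value in pairs)


# A plain dict stands in for a database record in this illustration.
sample_record = {"riskFactor": "Geography"}

line = record_to_text(sample_record.items())
print(line)
```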

The `GraphRAG` class creates a retrieval pipeline that accepts a user input, uses a retriever to fetch the context, and uses an LLM to generate an answer.

In [40]:
from neo4j_graphrag.generation import GraphRAG

# Instantiate the RAG pipeline
rag = GraphRAG(
    retriever=text2cypher_retriever,
    llm=llm
)

# Query the graph
query = "What are the top risk factors that APPLE INC faces?"

text2cypher_response = rag.search(query_text=query, return_context=True)

print(text2cypher_response.answer)


Apple Inc. faces several top risk factors, including:

1. **Geography**: Risks related to different geographic markets.
2. **Aggressive Price Competition**: Intense competition affecting pricing strategies.
3. **Frequent Introduction of New Products**: The need to constantly innovate and release new products.
4. **Short Product Life Cycles**: Rapid obsolescence of products.
5. **Evolving Industry Standards**: Keeping up with changing industry norms.
6. **Commodity Pricing Fluctuations**: Variability in the cost of raw materials.
7. **Industry-wide Shortage and Significant Commodity Pricing Fluctuations**: Supply chain disruptions and cost issues.
8. **Initial Capacity Constraints with New Technologies**: Challenges in scaling new technologies.
9. **Availability of Components at Acceptable Prices**: Ensuring component supply at reasonable costs.
10. **Rapid Technological Advances in Industry**: Keeping pace with technological changes.
11. **Macroeconomic and Industry Risks**: Economic d
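
Conceptually, the pipeline described above has three steps: retrieve context for the question, combine the context and the question into a prompt, and ask the LLM to answer from that prompt. The sketch below reproduces those steps with stand-in functions so the flow can be seen without a database or an API key. It is not the `GraphRAG` class's implementation; `answer_with_context`, `stub_retrieve`, `stub_generate`, and the prompt wording are all invented for this illustration.

```python
from typing import Callable, List


def answer_with_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Fetch context for the question, build a prompt, and generate an answer."""
    context_lines = retrieve(question)
    prompt = (
        "Context:\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)


def stub_retrieve(question: str) -> List[str]:
    # Stand-in for a retriever: returns fixed context lines.
    return ["Geography", "Short product life cycles"]


def stub_generate(prompt: str) -> str:
    # Stand-in for an LLM: echoes the prompt so it can be inspected.
    return prompt


built_prompt = answer_with_context(
    "What risks does the company face?", stub_retrieve, stub_generate
)
print(built_prompt)
```

Echoing the prompt makes it easy to see exactly what a real model would receive, which helps when an answer contains details that the retrieved context did not supply.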

In [39]:
print(text2cypher_response.retriever_result.metadata['cypher'])


for item in text2cypher_response.retriever_result.items:
    print(item.content)


cypher
MATCH (c:Company {name: "Apple"})-[:FACES_RISK]->(r:RiskFactor)
RETURN r.name AS riskFactor
ORDER BY r.name
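
The printed statement above begins with a bare `cypher` line, which may be the remnant of a Markdown code fence that the LLM wrapped around its query. If you plan to log or re-run generated statements yourself, it can help to strip such a fence first. The following is a minimal sketch under that assumption; `strip_code_fence` is a hypothetical helper, not part of the library, and it only handles a single fence surrounding the whole text.

```python
import re

# Three backticks, built programmatically so the example stays easy to read.
FENCE = "`" * 3

# An opening fence with an optional language tag, the body, then a closing fence.
_FENCED = re.compile(
    r"^\s*" + FENCE + r"[A-Za-z]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    flags=re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the text inside a surrounding code fence, or the text itself if unfenced."""
    match = _FENCED.match(text)
    return match.group(1).strip() if match else text.strip()


fenced = FENCE + "cypher\nMATCH (n) RETURN n\n" + FENCE
cleaned = strip_code_fence(fenced)
unchanged = strip_code_fence("MATCH (n) RETURN n")
print(cleaned)
```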



## Demo datasets

Try running text-to-Cypher on a [demo dataset hosted on demo.neo4jlabs.com](https://demo.neo4jlabs.com/). Use `neo4j+s://demo.neo4jlabs.com` as the connection URL, and use the dataset name as the username, password, and database name:

* `recommendations`
* `movies`
* `northwind`
* `fincen`
* `twitter`
* `stackoverflow`
* `gameofthrones`
* `neoflix`
* `wordnet`
* `slack`



In [20]:
DEMO = 'northwind'

got_driver = GraphDatabase.driver(
    "neo4j+s://demo.neo4jlabs.com",
    auth=(DEMO, DEMO)
)

got_driver.verify_connectivity()


In [23]:
got_text2cypher_retriever = Text2CypherRetriever(
    llm=llm,
    driver=got_driver,
    neo4j_database=DEMO,
    neo4j_schema=get_schema(got_driver, database=DEMO)
)

got_result = got_text2cypher_retriever.search("What are the top grossing products?")

In [28]:
print(got_result.metadata['cypher'])

got_result.items

MATCH (:Order)-[o:ORDERS]->(p:Product)
RETURN p.productName, SUM(toFloat(o.unitPrice) * o.quantity * (1 - toFloat(o.discount))) AS totalRevenue
ORDER BY totalRevenue DESC
LIMIT 10


[RetrieverResultItem(content="<Record p.productName='Côte de Blaye' totalRevenue=141396.735>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Thüringer Rostbratwurst' totalRevenue=80368.672>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Raclette Courdavault' totalRevenue=71155.70000000001>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Tarte au sucre' totalRevenue=47234.97>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Camembert Pierrot' totalRevenue=46825.479999999996>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Gnocchi di nonna Alice' totalRevenue=42593.06>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Manjimup Dried Apples' totalRevenue=41819.65>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Alice Mutton' totalRevenue=32698.379999999997>", metadata=None),
 RetrieverResultItem(content="<Record p.productName='Carna