# GraphRAG and Neo4j

This tutorial looks at GraphRAG, i.e., how to provide information from a Knowledge Graph (KG) to an LLM so that its answers are better informed. Like RAG, GraphRAG aims to retrieve information from a credible source. Unlike RAG, the information here is not provided in the form of embeddings, but is instead captured from the relationships between nodes within a graph.

Therefore, GraphRAG is not applicable everywhere, since it requires a credible and meaningful source of information in the form of a graph. These graphs live in specialized databases that are indexed for quick retrieval of relevant information. Information is retrieved from graphs using special queries that allow fast communication with those databases and provide specific answers that can be extracted "algebraically" from the graph.
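To make the idea of an "algebraic" answer concrete, here is a minimal sketch in plain Python, with no database involved. The triples and the helper `sources_of` are illustrative assumptions, not part of Neo4j or LangChain; the point is only that a counting question reduces to filtering and counting edges in the graph.

```python
# Toy knowledge graph stored as (source, relationship, target) triples.
# This is a hypothetical stand-in for a graph database, for illustration only.
TRIPLES = [
    ("Mars", "ORBITS", "Sun"),
    ("Earth", "ORBITS", "Sun"),
    ("Phobos", "ORBITS", "Mars"),
    ("Deimos", "ORBITS", "Mars"),
    ("Moon", "ORBITS", "Earth"),
]


def sources_of(triples, relationship, target):
    """Return all nodes that point to `target` via `relationship`."""
    return [s for (s, r, t) in triples if r == relationship and t == target]


moons_of_mars = sources_of(TRIPLES, "ORBITS", "Mars")
print(moons_of_mars)       # ['Phobos', 'Deimos']
print(len(moons_of_mars))  # 2
```

A graph database performs the same kind of lookup, but with indexes and a dedicated query language instead of a list comprehension.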

We therefore need such a database and a query language in place, so that the LLM can ask proper questions to the graph database and get helpful answers. *Neo4j* is such a graph database, and it is accompanied by the *Cypher* query language, which allows retrieval from the database. Working with Neo4j requires starting a server (a local server is OK) that hosts the graph database, connecting to the server from Python, and sending queries to the server to retrieve information back in Python. For more information about Neo4j, have a look here: [supplementary/Neo4j.md](./supplementary/Neo4j.md).

In [1]:
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document
from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

In [2]:
text = '''
The solar system consists of the Sun and the objects that orbit it, including planets, moons, asteroids, comets, and meteoroids.
The Sun is a star at the center of the Solar System.
Mercury is a planet in the Solar System. Mercury orbits the Sun. Mercury has no atmosphere and no magnetic field.
Venus is a planet in the Solar System. Venus orbits the Sun. Venus has a thick atmosphere. The atmosphere of Venus is composed mainly of carbon dioxide. Venus has no magnetic field.
Earth is a planet in the Solar System. Earth orbits the Sun. Earth has one moon called the Moon. Earth has a thick atmosphere composed mainly of nitrogen and oxygen. Earth has a strong magnetic field.
Mars is a planet in the Solar System. Mars orbits the Sun. Mars has two moons called Phobos and Deimos. Mars has a thin atmosphere composed mainly of carbon dioxide. Mars has a weak magnetic field.
Jupiter is a planet in the Solar System. Jupiter orbits the Sun. Jupiter has moons called Io, Europa, Ganymede, and Callisto. Jupiter has a thick atmosphere composed mainly of hydrogen and helium. Jupiter has a strong magnetic field.
'''
print(text)



In [3]:
documents = [Document(page_content=text)]

In [4]:
# Initialize the ChatOllama model with the specified model name
# model_name = 'qwen3-vl:4b'
# model_name = 'llama3.2:3b'  # Or another text-focused model
model_name = 'tomasonjo/llama3-text2cypher-demo:8b_4bit'
# and initialize the ChatOllama instance
chat_model = ChatOllama(
    model=model_name,
    validate_model_on_init=True,
    temperature=0
)

### Query

We started by initializing a model, as in previous tutorials. Let's also introduce a query that the LLM could either answer on its own or answer after consulting the graph for more information.

In [5]:
query_text = "How many moons does Mars have?"

### Connecting to a Neo4j graph server

Let's first connect to the Neo4j database. Then we can create a graph, pass it to the database, and query it. Good practices for not accidentally sharing usernames and passwords in repositories are described here: [supplementary/Secrets.md](./supplementary/Secrets.md).

In [6]:
import os
from dotenv import load_dotenv
from langchain_neo4j import Neo4jGraph

# Load environment variables from .env file
load_dotenv()

# Get credentials from environment variables
neo4j_url = os.getenv("NEO4J_URL", "bolt://localhost:7687")
neo4j_user = os.getenv("NEO4J_USER", "neo4j")
neo4j_password = os.getenv("NEO4J_PASSWORD")

if not neo4j_password:
    raise ValueError("NEO4J_PASSWORD environment variable is not set. Please create a .env file with your credentials.")

graph = Neo4jGraph(
    url=neo4j_url,
    username=neo4j_user,
    password=neo4j_password
)

### Creating the KG

As in the previous tutorial, we construct the KG using an LLM, specifically with a schema prompt.

In [7]:
# Create a ChatPromptTemplate for graph extraction
graph_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are an expert Neo4j Cypher query generator.

TASK:
- Translate the user's natural language question into a Cypher query.

CONSTRAINTS:
- Use ONLY the schema provided below.
- Do NOT invent labels, relationship types, or properties.
- Do NOT explain the query.
- Output ONLY valid Cypher.
- If the question cannot be answered unambiguously using the schema, output:
  // CANNOT_ANSWER

GRAPH SCHEMA:
Node labels:
- Star {{id}}
- Planet {{id}}
- Moon {{id}}
- Atmosphere {{id}}
- Substance {{id}}
- MagneticFieldStrength {{id}}

Relationships:
- (Planet)-[:ORBITS]->(Star)
- (Moon)-[:ORBITS]->(Planet)
- (Planet)-[:HAS_ATMOSPHERE]->(Atmosphere)
- (Atmosphere)-[:COMPOSED_OF]->(Substance)
- (Planet)-[:HAS_MAGNETIC_FIELD]->(MagneticFieldStrength)
     
ALLOWED VALUES:
- MagneticFieldStrength.id \\in {{"none", "weak", "strong"}}

QUERY RULES:
1. Always specify node labels.
2. Always specify relationship directions.
3. MagneticFieldStrength nodes MUST be matched or merged by description
4. Use meaningful variable names.
5. Return only properties, not full nodes.
6. Use DISTINCT unless duplicates are required.
7. Use OPTIONAL MATCH if information may be missing.
8. Do not use APOC or procedures.

FAILURE CONDITIONS:
- If required entities, labels, or relationships are missing from the schema,
  output:
  // CANNOT_ANSWER

EXAMPLES:
Question:
Which planet orbits the Sun?

Cypher:
MATCH (planet:Planet)-[:ORBITS]->(star:Star {{id: "Sun"}})
RETURN DISTINCT planet.id

Question:
Which moon orbits planet Mars?

Cypher:
MATCH (moon:Moon)-[:ORBITS]->(planet:Planet {{id: "Mars"}})
RETURN DISTINCT moon.id

Question:
What substances compose the atmosphere of Mars?

Cypher:
MATCH (planet:Planet {{id: "Mars"}})
      -[:HAS_ATMOSPHERE]->(atm:Atmosphere)
      -[:COMPOSED_OF]->(substance:Substance)
RETURN DISTINCT substance.id

Question:
Does Jupiter have a magnetic field?

Cypher:
MATCH (planet:Planet {{id: "Jupiter"}})
      -[:HAS_MAGNETIC_FIELD]->(prop:MagneticFieldStrength)
RETURN DISTINCT prop.id
"""),
    ("human", "{input}")
])
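A note on the doubled braces in the prompt above: with LangChain's default f-string template format, `{input}` is a placeholder, while `{{` and `}}` produce literal braces, which Cypher needs for property maps. The sketch below uses plain `str.format`, which follows the same escaping convention, so it runs without LangChain. The `template` string is a shortened, hypothetical excerpt and not the full prompt.

```python
# Shortened, hypothetical excerpt of a prompt: doubled braces are literal,
# single braces mark a placeholder that gets filled in.
template = (
    'Example Cypher: MATCH (planet:Planet {{id: "Mars"}}) RETURN planet.id\n'
    "Question: {input}"
)

rendered = template.format(input="How many moons does Mars have?")
print(rendered)
```

Forgetting to double the braces around a Cypher property map would make the template engine treat it as a missing variable.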


In [8]:
prompt_schema = LLMGraphTransformer(
    llm=chat_model,
    prompt=graph_prompt,
)

In [9]:
graph_prompt_schema = prompt_schema.convert_to_graph_documents(documents)
print(graph_prompt_schema)

[GraphDocument(nodes=[Node(id='Sun', type='Star', properties={}), Node(id='Mercury', type='Planet', properties={}), Node(id='Venus', type='Planet', properties={}), Node(id='Earth', type='Planet', properties={}), Node(id='Moon', type='Moon', properties={}), Node(id='Mars', type='Planet', properties={}), Node(id='Phobos', type='Moon', properties={}), Node(id='Deimos', type='Moon', properties={}), Node(id='Jupiter', type='Planet', properties={}), Node(id='Io', type='Moon', properties={}), Node(id='Europa', type='Moon', properties={}), Node(id='Ganymede', type='Moon', properties={}), Node(id='Callisto', type='Moon', properties={})], relationships=[Relationship(source=Node(id='Mercury', type='Planet', properties={}), target=Node(id='Sun', type='Star', properties={}), type='ORBITS', properties={}), Relationship(source=Node(id='Venus', type='Planet', properties={}), target=Node(id='Sun', type='Star', properties={}), type='ORBITS', properties={}), Relationship(source=Node(id='Earth', type='Pla
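To get a quick overview of what was extracted without reading the long printed list, one can count nodes and relationships by type. The helper `summarize` below is a hypothetical convenience function, not part of LangChain; it relies only on the `nodes`, `relationships`, and `type` attributes visible in the printed output. The `SimpleNamespace` objects are made-up stand-ins so that the sketch runs without an LLM.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(graph_documents):
    """Count nodes per type and relationships per type across all documents."""
    node_types = Counter()
    rel_types = Counter()
    for doc in graph_documents:
        node_types.update(node.type for node in doc.nodes)
        rel_types.update(rel.type for rel in doc.relationships)
    return dict(node_types), dict(rel_types)


# Stand-in objects that mimic the attributes seen in the printed output
# (hypothetical data; with the real objects, call summarize(graph_prompt_schema)).
sun = SimpleNamespace(id="Sun", type="Star")
mars = SimpleNamespace(id="Mars", type="Planet")
phobos = SimpleNamespace(id="Phobos", type="Moon")
demo_doc = SimpleNamespace(
    nodes=[sun, mars, phobos],
    relationships=[
        SimpleNamespace(source=mars, target=sun, type="ORBITS"),
        SimpleNamespace(source=phobos, target=mars, type="ORBITS"),
    ],
)

print(summarize([demo_doc]))
```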

### Pass graph to Neo4j

We can now pass this graph to the Neo4j database from Python. If we open the database in a browser, we will see the graph shown in the image below.

In [10]:
graph.add_graph_documents(graph_prompt_schema)

<img src="figs/graph_prompt.png" width=400px height=400px />

### Sending Cypher queries

We can send Cypher queries to the database directly from python like so:

```python
graph.query("MATCH (n) DETACH DELETE n;") # delete all nodes and their relationships
graph.query("MATCH (n) RETURN n;") # return all nodes
```

In [None]:
# graph.query("MATCH (n) DETACH DELETE n;")
# graph.query("MATCH (n) RETURN n;")
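Queries can also take parameters instead of having values pasted into the query string. The sketch below assumes that `Neo4jGraph.query` accepts a `params` dictionary and returns a list of dictionaries; `count_moons` is a hypothetical helper, and `FakeGraph` is a made-up stand-in so that the example runs without a server. With a live connection, the `graph` object created earlier would be passed instead.

```python
class FakeGraph:
    """Hypothetical stand-in for Neo4jGraph so the sketch runs without a server."""

    def query(self, query, params=None):
        # A real server would execute the Cypher; here we return a canned row.
        moons = {"Mars": 2, "Earth": 1}
        return [{"moon_count": moons.get((params or {}).get("planet"), 0)}]


def count_moons(graph, planet):
    """Count the moons of a planet, passing the name as a query parameter."""
    rows = graph.query(
        "MATCH (m:Moon)-[:ORBITS]->(p:Planet {id: $planet}) "
        "RETURN count(m) AS moon_count",
        params={"planet": planet},
    )
    return rows[0]["moon_count"]


print(count_moons(FakeGraph(), "Mars"))
```

Passing the planet name as `$planet` keeps user input out of the query text itself.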

### LLMs can make Cypher

Larger commercial models in particular are able to generate Cypher queries from natural language text on demand. Smaller models are less capable, but some have been fine-tuned specifically for this task, like the `tomasonjo/llama3-text2cypher-demo:8b_4bit` model we are using in this tutorial.

The idea is to have the user ask a question in natural language and have the LLM translate it into a proper Cypher query, which is then sent to the graph to obtain more robust information. LangChain provides the infrastructure for persuading a model to output Cypher code from the user's natural language question, through the `GraphCypherQAChain`. Later in this tutorial we go into a bit more detail about what `GraphCypherQAChain` does (there is also a link to an additional document with more details).

For now, however, let's focus on another aspect: for the `GraphCypherQAChain` to work properly and accurately translate the user's question into a meaningful Cypher query, the generated query must reflect the schema of the graph. That is, we need to provide the schema to the process triggered by `GraphCypherQAChain`, so that the query does not include node labels and relationship types that do not exist in the graph. To this end we provide the process with a `cypher_prompt`, which is almost the same as the `graph_prompt` that we used to construct the graph in the first place.

In [12]:
cypher_prompt = ChatPromptTemplate.from_messages([
    ("system", """
You are an expert Neo4j Cypher query generator.

TASK:
- Translate the user's natural language question into a Cypher query.

CONSTRAINTS:
- Use ONLY the schema provided below.
- Do NOT invent labels, relationship types, or properties.
- Do NOT explain the query.
- Output ONLY valid Cypher.
- If the question cannot be answered unambiguously using the schema, output:
  // CANNOT_ANSWER

GRAPH SCHEMA:
Node labels:
- Star {{id}}
- Planet {{id}}
- Moon {{id}}
- Atmosphere {{id}}
- Substance {{id}}
- MagneticFieldStrength {{id}}

Relationships:
- (Planet)-[:ORBITS]->(Star)
- (Moon)-[:ORBITS]->(Planet)
- (Planet)-[:HAS_ATMOSPHERE]->(Atmosphere)
- (Atmosphere)-[:COMPOSED_OF]->(Substance)
- (Planet)-[:HAS_MAGNETIC_FIELD]->(MagneticFieldStrength)
     
ALLOWED VALUES:
- MagneticFieldStrength.id \\in {{"none", "weak", "strong"}}

QUERY RULES:
1. Always specify node labels.
2. Always specify relationship directions.
3. MagneticFieldStrength nodes MUST be matched or merged by description
4. Use meaningful variable names.
5. Return only properties, not full nodes.
6. Use DISTINCT unless duplicates are required.
7. Use OPTIONAL MATCH if information may be missing.
8. Do not use APOC or procedures.

FAILURE CONDITIONS:
- If required entities, labels, or relationships are missing from the schema,
  output:
  // CANNOT_ANSWER

"""),
    ("human", "{question}")
])
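A prompt can only ask the model to respect the schema; it cannot enforce it. As a complementary safeguard, one could check a generated query against the allowed labels and relationship types before running it. The sketch below is a rough regex-based check and not a Cypher parser; `check_cypher` and the two constant sets are hypothetical names, with the values copied from the schema in the prompt above.

```python
import re

ALLOWED_LABELS = {
    "Star", "Planet", "Moon", "Atmosphere", "Substance", "MagneticFieldStrength",
}
ALLOWED_RELATIONSHIPS = {
    "ORBITS", "HAS_ATMOSPHERE", "COMPOSED_OF", "HAS_MAGNETIC_FIELD",
}


def check_cypher(cypher):
    """Return a list of problems found in a generated Cypher string."""
    if cypher.strip().startswith("// CANNOT_ANSWER"):
        return ["model declined to answer"]
    problems = []
    # Relationship types appear as [:TYPE] or [var:TYPE].
    for rel in re.findall(r"\[\s*\w*\s*:\s*(\w+)", cypher):
        if rel not in ALLOWED_RELATIONSHIPS:
            problems.append(f"unknown relationship type: {rel}")
    # Node labels appear as (var:Label) or (:Label).
    for label in re.findall(r"\(\s*\w*\s*:\s*(\w+)", cypher):
        if label not in ALLOWED_LABELS:
            problems.append(f"unknown node label: {label}")
    return problems


good = 'MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)\nRETURN count(m) AS moon_count'
bad = "MATCH (c:Comet)-[:HITS]->(p:Planet) RETURN c.id"
print(check_cypher(good))
print(check_cypher(bad))
```

An empty list means that no unknown labels or relationship types were detected; it does not guarantee that the query is valid Cypher.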

### Make sure results are considered

Additionally, we need to make sure in the `GraphCypherQAChain` process that the results retrieved by the Cypher query are taken into account by the LLM before it answers the user's question in free text. That is, we need to make sure that the "raw" information that the LLM obtained from the graph through the Cypher query is indeed embedded properly in the context, so that the LLM takes it into consideration when generating the final natural language response. To this end, we introduce the following `qa_prompt`.

In [13]:
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", """
You are an expert assistant answering questions using results retrieved from a Neo4j graph.

RULES:
- The provided context comes from a trusted database query.
- If the context contains a numeric value that answers the question, use it directly.
- Do NOT say "I don't know" if the answer is present in the context.
- Answer concisely and directly in natural language.
- If the context is empty or missing required information, say:
  "I don't know the answer."
"""),
    ("human", """
Question:
{question}

Context:
{context}
""")
])

### What does `GraphCypherQAChain.from_llm` do (short version)

Now we are ready to initiate the `GraphCypherQAChain` process. To trigger it we use the `from_llm` function, which combines the `cypher_prompt` and the `qa_prompt` and applies them to the Neo4j `graph` that is already available in our database (in this example, constructed by an LLM).

This process involves a chain of events: the user's question is translated into a Cypher query, then both the question and the results retrieved by that query are included in a QA prompt, based on which the LLM generates the final natural language answer. More information on how `GraphCypherQAChain.from_llm` works is given in this file: [supplementary/GraphCypherQAChain.md](./supplementary/GraphCypherQAChain.md).

In [14]:
from langchain_neo4j import GraphCypherQAChain

# The process occurs in two steps:
# 1) The LLM generates a Cypher query based on the user's question and the graph schema.
# 2) The results returned by the Cypher query are turned into a text answer.
# cypher_prompt=cypher_prompt concerns the first step
# qa_prompt=qa_prompt concerns the second step

graphchain = GraphCypherQAChain.from_llm(
    chat_model,
    graph=graph,
    cypher_prompt=cypher_prompt,
    qa_prompt=qa_prompt,
    verbose=True,
    return_intermediate_steps=True,
    allow_dangerous_requests=True
)
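The two steps can be pictured with the conceptual sketch below. It is not the actual implementation of `GraphCypherQAChain`; the function `answer_with_graph` and the three `fake_*` stand-ins are hypothetical and exist only so that the flow can be followed, and run, without an LLM or a Neo4j server.

```python
def answer_with_graph(question, generate_cypher, run_query, write_answer):
    """Mimic the flow: question -> Cypher -> rows -> natural-language answer."""
    cypher = generate_cypher(question)        # step 1: LLM call with cypher_prompt
    context = run_query(cypher)               # run the query on the graph database
    result = write_answer(question, context)  # step 2: LLM call with qa_prompt
    return {"query": question, "cypher": cypher, "context": context, "result": result}


# Hypothetical stand-ins so the sketch runs without an LLM or a Neo4j server.
def fake_generate_cypher(question):
    return 'MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)\nRETURN count(m) AS moon_count'


def fake_run_query(cypher):
    return [{"moon_count": 2}]


def fake_write_answer(question, context):
    if not context:
        return "I don't know the answer."
    return f"Mars has {context[0]['moon_count']} moons."


out = answer_with_graph(
    "How many moons does Mars have?",
    fake_generate_cypher,
    fake_run_query,
    fake_write_answer,
)
print(out["result"])
```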

### Examine the Cypher query

When defining the `GraphCypherQAChain` process, we set `verbose=True`, which allows us to see the generated query.

In [15]:
results = graphchain.invoke({"query": query_text})
print(results)



> Entering new GraphCypherQAChain chain...


Generated Cypher:
MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)
RETURN count(m) AS moon_count
Full Context:
[{'moon_count': 2}]

> Finished chain.
{'query': 'How many moons does Mars have?', 'result': 'Mars has 2 moons.', 'intermediate_steps': [{'query': 'MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)\nRETURN count(m) AS moon_count'}, {'context': [{'moon_count': 2}]}]}


### Results from GraphRAG

The LLM has taken into account the response obtained from the Cypher query, according to which Mars has two moons. The generated natural language answer is correct. In the next tutorial we will see how such a small model fails at simple counting and how RAG and GraphRAG come to the rescue.

In [16]:
print('QUERY \n', results['query'])
print('INTERMEDIATE STEPS: \n', results['intermediate_steps'])
print('RESULT: \n', results['result'])

QUERY 
 How many moons does Mars have?
INTERMEDIATE STEPS: 
 [{'query': 'MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)\nRETURN count(m) AS moon_count'}, {'context': [{'moon_count': 2}]}]
RESULT: 
 Mars has 2 moons.
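When `return_intermediate_steps=True` is set, the generated Cypher and the retrieved rows sit inside a list of small dictionaries. The helper below is a hypothetical convenience function, assuming the structure printed above; `example_results` is a copy of that printed dictionary so that the sketch runs on its own.

```python
def unpack_steps(results):
    """Pull the generated Cypher and the retrieved rows out of the chain output."""
    cypher, context = None, None
    for step in results.get("intermediate_steps", []):
        if "query" in step:
            cypher = step["query"]
        if "context" in step:
            context = step["context"]
    return cypher, context


# Copy of the dictionary printed above, so the sketch runs on its own.
example_results = {
    "query": "How many moons does Mars have?",
    "result": "Mars has 2 moons.",
    "intermediate_steps": [
        {"query": 'MATCH (p:Planet {id: "Mars"})<-[:ORBITS]-(m:Moon)\nRETURN count(m) AS moon_count'},
        {"context": [{"moon_count": 2}]},
    ],
}

cypher, context = unpack_steps(example_results)
print(cypher)
print(context)
```

Inspecting these two values separately makes it easier to tell whether a wrong answer comes from the generated query or from the final answering step.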
