# Build a Question Answering application over a Graph Database
Adapted from a tutorial on [langchain.com](https://python.langchain.com/v0.2/docs/tutorials/graph/)

In this guide we'll go over the basic ways to create a Q&A chain over a graph database. These systems will allow us to ask a question about the data in a graph database and get back a natural language answer.

## ⚠️ Security note ⚠️
Building Q&A systems over graph databases requires executing model-generated graph queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate, though not eliminate, the risks of building a model-driven system. For more on general security best practices, see [here](https://python.langchain.com/v0.2/docs/security/).
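
As one illustration of narrow scoping, the sketch below screens generated Cypher for write or admin clauses before it is executed. `looks_read_only` is a hypothetical helper, not part of LangChain or Neo4j, and a keyword check like this is only a coarse heuristic: it can wrongly reject a query whose string literal contains a keyword, and it does not replace a database user with read-only permissions.

```python
import re

# Hypothetical helper for illustration; not a LangChain or Neo4j API.
# Conservative list of clauses that can modify data or run procedures.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)


def looks_read_only(cypher: str) -> bool:
    """Return True if none of the write-style keywords appear in the statement."""
    return _WRITE_CLAUSES.search(cypher) is None


read_query = "MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a:Person) RETURN a.name"
write_query = "MATCH (n) DETACH DELETE n"
print(looks_read_only(read_query), looks_read_only(write_query))
```

A check like this would sit between query generation and execution; the database-side permissions remain the actual safeguard.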

## Architecture
At a high level, the steps of most graph chains are:

1. Convert question to a graph database query: Model converts user input to a graph database query (e.g. Cypher).
2. Execute graph database query: Execute the graph database query.
3. Answer the question: Model responds to user input using the query results.
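
The three steps above can be sketched as a plain function that wires together three callables. This is only an outline of the control flow under simplifying assumptions: `answer_question` and its parameters are hypothetical names, the lambdas stand in for a real model and database, and the `cypher` and `context` keys in the returned dictionary are illustrative extras.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_question(
    question: str,
    generate_query: Callable[[str], str],
    run_query: Callable[[str], Rows],
    summarize: Callable[[str, Rows], str],
) -> Dict[str, Any]:
    """Run the three steps: question -> graph query -> rows -> answer."""
    cypher = generate_query(question)    # step 1: model writes a graph query
    rows = run_query(cypher)             # step 2: database executes it
    answer = summarize(question, rows)   # step 3: model phrases the answer
    return {"query": question, "cypher": cypher, "context": rows, "result": answer}


# Stand-ins so the sketch runs without a model or a database.
demo = answer_question(
    "Which movies exist?",
    generate_query=lambda q: "MATCH (m:Movie) RETURN m.title AS title LIMIT 1",
    run_query=lambda cypher: [{"title": "Casino"}],
    summarize=lambda q, rows: "Found: " + ", ".join(r["title"] for r in rows),
)
print(demo["result"])
```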

## Setup
First, install the required packages and set environment variables. This example uses the Neo4j graph database.

In [1]:
%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j python-dotenv

Note: you may need to restart the kernel to use updated packages.


A non-exhaustive list of the Python packages used is reproduced here:

```bash
langchain==0.2.6
langchain-chroma==0.1.2
langchain-community==0.2.6
langchain-core==0.2.10
langchain-openai==0.1.13
langchain-text-splitters==0.2.0
langchainhub==0.1.20
langgraph==0.0.60
langserve==0.2.1
langsmith==0.1.81
neo4j==5.22.0
```

### Setting credentials with python-dotenv
Load credentials from a `.env` file using the [python-dotenv package](https://pypi.org/project/python-dotenv/).

In [3]:
import os
from dotenv import load_dotenv

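# Read KEY=value pairs from a local .env file into the process environment.
load_dotenv()
# Fail early with a clear message if the key is missing or empty.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"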

### Set up Neo4j
We will use Docker to run Neo4j in a container:
```bash
docker run \
    --name neo4jdb \
    --detach \
    --rm \
    --env NEO4J_apoc_export_file_enabled=true \
    --env NEO4J_apoc_import_file_enabled=true \
    --env NEO4J_apoc_import_file_use__neo4j__config=true \
    --env NEO4J_PLUGINS='["apoc"]' \
    --env NEO4J_dbms_security_procedures_unrestricted=apoc.* \
    --env NEO4J_dbms_security_procedures_allowlist=apoc.* \
    --publish=7474:7474 --publish=7687:7687 \
    --env NEO4J_AUTH=neo4j/password \
    neo4j:5.21.0
```

With the `NEO4J_AUTH` setting above, the username is `neo4j` and the password is `password`.

### Test the connection
Set the connection credentials as environment variables:

In [18]:
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"
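
Before connecting, it can help to confirm that every variable this guide relies on is actually set. The sketch below is a small check using only the standard library; `missing_env_vars` is a hypothetical helper name, and the list of variables simply mirrors the ones used in this guide.

```python
import os

# Variable names used in this guide.
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given mapping."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to try out without touching the real environment.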

The example below connects to the Neo4j database and populates it with example data about movies and their actors.

In [22]:
from langchain_community.graphs import Neo4jGraph

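# With no arguments, the connection settings are read from the NEO4J_* environment variables set above.
graph = Neo4jGraph()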

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movies_query)

[]

__API Reference__: [Neo4jGraph](https://api.python.langchain.com/en/latest/graphs/langchain_community.graphs.neo4j_graph.Neo4jGraph.html)
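
The `FOREACH` clauses in the import query split the pipe-delimited `director`, `actors` and `genres` columns and trim each value before merging a node for it. The Python sketch below mirrors only that split-and-trim step so the transformation is easier to see. `expand_row` is a hypothetical helper, and the sample row is made up for illustration rather than taken from `movies.csv`.

```python
from typing import Dict, List


def split_names(value: str) -> List[str]:
    """Split on '|' and strip surrounding whitespace, like split() + trim() in the query."""
    return [part.strip() for part in value.split("|")]


def expand_row(row: Dict[str, str]) -> Dict[str, List[str]]:
    """Return the individual names that would each be merged as a node."""
    return {
        "directors": split_names(row["director"]),
        "actors": split_names(row["actors"]),
        "genres": split_names(row["genres"]),
    }


# Made-up row for illustration only.
sample = {
    "director": "Director One",
    "actors": "Actor One| Actor Two |Actor Three",
    "genres": "Drama|Crime",
}
print(expand_row(sample))
```

Because the query uses `MERGE` rather than `CREATE`, a name that appears in several rows maps to a single node, and re-running the import should not duplicate nodes or relationships.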

## Graph schema
In order for an LLM to be able to generate a Cypher statement, it needs information about the graph schema. When you instantiate a graph object, it retrieves the information about the graph schema. If you later make any changes to the graph, you can run the `refresh_schema` method to refresh the schema information.

In [24]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}
Person {name: STRING}
Genre {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
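
The relationship lines in this schema text are what tell the model which way each relationship points. As a rough illustration, the sketch below parses those lines into `(start label, type, end label)` triples. `parse_relationships` is a hypothetical helper, and it assumes the text format printed above, which may differ between library versions.

```python
import re
from typing import List, Tuple

# Matches lines such as "(:Person)-[:ACTED_IN]->(:Movie)".
_REL_LINE = re.compile(r"\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)")


def parse_relationships(schema_text: str) -> List[Tuple[str, str, str]]:
    """Extract (start_label, relationship_type, end_label) triples from schema text."""
    return [match.groups() for match in _REL_LINE.finditer(schema_text)]


# Schema text copied from the output above.
schema_text = """
The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)
"""
for triple in parse_relationships(schema_text):
    print(triple)
```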


Great! We've got a graph database that we can query. Now let's try hooking it up to an LLM.

## Chain
Let's use a simple chain that takes a question, turns it into a Cypher query, executes the query, and uses the result to answer the original question.

1. The user asks a question.
2. The LangChain Cypher module translates the question into a Cypher statement.
3. The generated Cypher is used to query the Neo4j database.
4. The results from the database are converted to natural language.
5. The answer is returned to the user.

LangChain comes with a built-in chain for this workflow that is designed to work with Neo4j: [GraphCypherQAChain](https://python.langchain.com/v0.2/docs/integrations/graphs/neo4j_cypher/)



In [12]:
from langchain.chains import GraphCypherQAChain
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo",
                 temperature=0)
chain = GraphCypherQAChain.from_llm(graph=graph,
                                    llm=llm,
                                    verbose=True)
response = chain.invoke({"query": "What was the cast of the Casino?"})
response

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {title: 'Casino'})<-[:ACTED_IN]-(actors:Person)
RETURN actors.name
Full Context:
[{'actors.name': 'James Woods'}, {'actors.name': 'Joe Pesci'}, {'actors.name': 'Robert De Niro'}, {'actors.name': 'Sharon Stone'}]

> Finished chain.


{'query': 'What was the cast of the Casino?',
 'result': 'The cast of Casino included James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.'}

__API Reference__: [GraphCypherQAChain](https://api.python.langchain.com/en/latest/chains/langchain_community.chains.graph_qa.cypher.GraphCypherQAChain.html) | [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html)

## Validating relationship direction
LLMs can struggle with relationship directions in generated Cypher statements. Since the graph schema is predefined, we can validate and optionally correct relationship directions in the generated Cypher statements by using the `validate_cypher` parameter.

In [14]:
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    validate_cypher=True # validate and optionally correct relation directions in the generated Cypher statements
)
response = chain.invoke({"query": "What was the cast of the Casino?"})
response

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {title: "Casino"})<-[:ACTED_IN]-(actor:Person)
RETURN actor.name
Full Context:
[{'actor.name': 'James Woods'}, {'actor.name': 'Joe Pesci'}, {'actor.name': 'Robert De Niro'}, {'actor.name': 'Sharon Stone'}]

> Finished chain.


{'query': 'What was the cast of the Casino?',
 'result': 'The cast of Casino included James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.'}
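
To make the idea of direction validation concrete, the sketch below checks a single labelled hop against the schema triples and flips the arrow when only the opposite direction exists. It is a deliberately simplified illustration and not LangChain's implementation: `fix_direction` is a hypothetical helper that handles only one hop with explicit labels and no property maps, and it returns anything else unchanged.

```python
import re

# (start label, relationship type, end label), taken from the schema printed earlier.
SCHEMA = {
    ("Movie", "IN_GENRE", "Genre"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
}

# Matches one hop such as "(m:Movie)-[:ACTED_IN]->(p:Person)".
_HOP = re.compile(r"\((\w*):(\w+)\)(<?)-\[(\w*):(\w+)\]-(>?)\((\w*):(\w+)\)")


def fix_direction(pattern: str) -> str:
    """Flip the arrow of a single hop if only the reverse direction is in SCHEMA."""
    match = _HOP.fullmatch(pattern.strip())
    if match is None:
        return pattern  # unsupported shape: leave untouched
    lvar, llabel, larrow, rvar, rtype, rarrow, nvar, nlabel = match.groups()
    left_to_right = (llabel, rtype, nlabel) in SCHEMA
    right_to_left = (nlabel, rtype, llabel) in SCHEMA
    if rarrow and not left_to_right and right_to_left:
        return f"({lvar}:{llabel})<-[{rvar}:{rtype}]-({nvar}:{nlabel})"
    if larrow and not right_to_left and left_to_right:
        return f"({lvar}:{llabel})-[{rvar}:{rtype}]->({nvar}:{nlabel})"
    return pattern


print(fix_direction("(m:Movie)-[:ACTED_IN]->(p:Person)"))
```

In the runs shown in this guide the model already generated the correct direction, so the validator had nothing to change.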

In [25]:
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    validate_cypher=True # validate and optionally correct relation directions in the generated Cypher statements
)
response = chain.invoke({"query": "What was the cast of the Tombstone?"})
response

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Tombstone"})<-[:ACTED_IN]-(p:Person)
RETURN p.name
Full Context:
[{'p.name': 'Val Kilmer'}, {'p.name': 'Kurt Russell'}, {'p.name': 'Sam Elliott'}, {'p.name': 'Bill Paxton'}]

> Finished chain.


{'query': 'What was the cast of the Tombstone?',
 'result': 'The cast of Tombstone included Val Kilmer, Kurt Russell, Sam Elliott, and Bill Paxton.'}

In [30]:
chain = GraphCypherQAChain.from_llm(
    graph=graph,
    llm=llm,
    verbose=True,
    validate_cypher=True # validate and optionally correct relation directions in the generated Cypher statements
)
response = chain.invoke({"query": "What was the cast of the Godfather?"})
response

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (:Movie {title: 'The Godfather'})<-[:ACTED_IN]-(actors:Person)
RETURN actors.name
Full Context:
[]

> Finished chain.


{'query': 'What was the cast of the Godfather?',
 'result': "I don't know the answer."}
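
In this last run the `Full Context` is empty: the exact-title match on `'The Godfather'` returned no rows, so the model had nothing to base an answer on. The run does not show why. One way to investigate is a case-insensitive substring search on titles, sketched below. `title_search` is a hypothetical helper that only builds a parameterized Cypher string; it assumes the `Movie.title` property from the schema above.

```python
from typing import Any, Dict, Tuple


def title_search(fragment: str, limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized query listing movie titles that contain `fragment`."""
    cypher = (
        "MATCH (m:Movie) "
        "WHERE toLower(m.title) CONTAINS toLower($fragment) "
        "RETURN m.title AS title "
        "LIMIT $limit"
    )
    return cypher, {"fragment": fragment, "limit": limit}


cypher, params = title_search("godfather")
print(cypher)
# With the `graph` object created earlier, the query could be run as:
# graph.query(cypher, params)
```

Using query parameters rather than string formatting keeps user-supplied text out of the Cypher statement itself.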

### Stop Neo4j container
Tear down the Neo4j container:
```bash
docker container stop neo4jdb
```

Because the container was started with the `--rm` flag, Docker removes it once it is stopped.