https://python.langchain.com/docs/use_cases/graph/quickstart/

**Start the Movie graph vector DB in Neo4j Desktop**

In [1]:
# Neo4jGraph wraps the connection to the Neo4j database
from langchain_community.graphs import Neo4jGraph

**Load the LLM (e.g., GPT-3.5)**

In [2]:
from langchain.chat_models import ChatOpenAI

model_name = "gpt-3.5-turbo"

llm = ChatOpenAI(
    model=model_name,
    temperature=0.0)



**Add Neo4j credentials (this information needs to be kept secret)**

In [3]:
NEO4J_URL = "neo4j://localhost:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = "fireinthehole"
NEO4J_DATABASE = 'neo4j'
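
Since these credentials should stay secret, one option is to read them from environment variables instead of hardcoding them in the notebook. The following is a minimal sketch under assumptions: the environment variable names mirror the constants above, and the helper `load_neo4j_config` is hypothetical, not part of LangChain or the Neo4j driver.

```python
import os
from typing import Dict, Mapping, Optional


def load_neo4j_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read connection settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early instead of falling back to a hardcoded secret
        raise RuntimeError("Set NEO4J_PASSWORD before connecting.")
    return {
        "url": env.get("NEO4J_URL", "neo4j://localhost:7687"),
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": password,
        "database": env.get("NEO4J_DATABASE", "neo4j"),
    }
```

The returned keys match the keyword arguments passed to `Neo4jGraph` in the next cell, so the dict could be unpacked as `Neo4jGraph(**load_neo4j_config())`.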

In [4]:
graph = Neo4jGraph(url=NEO4J_URL, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=NEO4J_DATABASE)

**Print the graph database schema**

In [5]:
graph.refresh_schema()
print(graph.schema)

Node properties:
Movie {imdbRating: FLOAT, taglineEmbedding: LIST, tagline: STRING, id: STRING, released: DATE, title: STRING}
Person {name: STRING}
Genre {name: STRING}
Location {name: STRING}
SimilarMovie {name: STRING}
Relationship properties:

The relationships:
(:Movie)-[:IS_SIMILAR_TO]->(:SimilarMovie)
(:Movie)-[:IN_GENRE]->(:Genre)
(:Movie)-[:WAS_TAKEN_IN]->(:Location)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:ACTED_IN]->(:Movie)


**Questions**

In [6]:
q_one = "What was the cast of the Casino?"
q_two = "What are the most common genres for movies released in 1995?"
q_three = "What are the similar movies to the ones that Tom Hanks acted in?"

### **Chain**

**`Simple Agent (a)`:**

In [7]:
from langchain.chains import GraphCypherQAChain
chain = GraphCypherQAChain.from_llm(graph=graph, llm=llm, verbose=True)

In [8]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino"})<-[:ACTED_IN]-(p:Person)
RETURN p.name
Full Context:
[{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}]

> Finished chain.
{'query': 'What was the cast of the Casino?', 'result': 'The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'}

LLM response: The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.


In [9]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)
WHERE m.released.year = 1995
MATCH (m)-[:IN_GENRE]->(g:Genre)
RETURN g.name, COUNT(*) AS numMovies
ORDER BY numMovies DESC
LIMIT 5
Full Context:
[{'g.name': 'Comedy', 'numMovies': 10}, {'g.name': 'Adventure', 'numMovies': 6}, {'g.name': 'Action', 'numMovies': 5}, {'g.name': 'Romance', 'numMovies': 5}, {'g.name': 'Children', 'numMovies': 4}]

> Finished chain.
{'query': 'What are the most common genres for movies released in 1995?', 'result': 'Comedy, Adventure, Action, Romance, and Children are the most common genres for movies released in 1995.'}

LLM response: Comedy, Adventure, Action, Romance, and Children are the most common genres for movies released in 1995.


In [10]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)-[:IS_SIMILAR_TO]->(s:SimilarMovie)
RETURN s;
Full Context:
[{'s': {'name': 'Finding Nemo'}}]

> Finished chain.
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer."}

LLM response: I don't know the answer.


**`Simple Agent (b)`:**

**Validating relationship direction**

LLMs can struggle with relationship directions in generated Cypher statements. Since the graph schema is predefined, we can validate and optionally correct relationship directions in the generated Cypher statements by using the `validate_cypher` parameter.
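
To make the idea of direction validation concrete, here is a toy sketch that checks a single `(start, relationship, end)` pattern against the schema printed earlier and flips it when it is reversed. This is only an illustration under assumptions: it is not LangChain's implementation, which operates on the Cypher text itself, and the function name `fix_direction` is hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (start label, relationship type, end label)

# Relationship patterns copied from the schema printed earlier in this notebook
SCHEMA: List[Triple] = [
    ("Movie", "IS_SIMILAR_TO", "SimilarMovie"),
    ("Movie", "IN_GENRE", "Genre"),
    ("Movie", "WAS_TAKEN_IN", "Location"),
    ("Person", "DIRECTED", "Movie"),
    ("Person", "ACTED_IN", "Movie"),
]


def fix_direction(start: str, rel: str, end: str) -> Optional[Triple]:
    """Return the pattern in schema direction, or None if it matches nothing."""
    if (start, rel, end) in SCHEMA:
        return (start, rel, end)  # already points the right way
    if (end, rel, start) in SCHEMA:
        return (end, rel, start)  # reversed: flip it
    return None  # this pattern does not exist in the schema


print(fix_direction("Movie", "ACTED_IN", "Person"))  # ('Person', 'ACTED_IN', 'Movie')
```

A pattern such as `("Genre", "WAS_TAKEN_IN", "Location")` returns `None`, since no direction of it exists in the schema.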

In [11]:
chain = GraphCypherQAChain.from_llm(
    graph=graph, llm=llm, verbose=True, validate_cypher=True
)

In [12]:
response = chain.invoke({"query": q_one})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie {title: "Casino"})<-[:ACTED_IN]-(p:Person)
RETURN p.name
Full Context:
[{'p.name': 'James Woods'}, {'p.name': 'Robert De Niro'}, {'p.name': 'Sharon Stone'}, {'p.name': 'Joe Pesci'}]

> Finished chain.
{'query': 'What was the cast of the Casino?', 'result': 'The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'}

LLM response: The cast of Casino included James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.


In [13]:
response = chain.invoke({"query": q_two})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)
WHERE m.released.year = 1995
MATCH (m)-[:IN_GENRE]->(g:Genre)
RETURN g.name, COUNT(*) AS numMovies
ORDER BY numMovies DESC
LIMIT 5;
Full Context:
[{'g.name': 'Comedy', 'numMovies': 10}, {'g.name': 'Adventure', 'numMovies': 6}, {'g.name': 'Action', 'numMovies': 5}, {'g.name': 'Romance', 'numMovies': 5}, {'g.name': 'Children', 'numMovies': 4}]

> Finished chain.
{'query': 'What are the most common genres for movies released in 1995?', 'result': 'Comedy, Adventure, Action, Romance, and Children are the most common genres for movies released in 1995.'}

LLM response: Comedy, Adventure, Action, Romance, and Children are the most common genres for movies released in 1995.


In [14]:
response = chain.invoke({"query": q_three})
print(response)
print("\nLLM response:", response["result"])

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie)-[:IS_SIMILAR_TO]->(s:SimilarMovie)
RETURN s;
Full Context:
[{'s': {'name': 'Finding Nemo'}}]

> Finished chain.
{'query': 'What are the similar movies to the ones that Tom Hanks acted in?', 'result': "I don't know the answer."}

LLM response: I don't know the answer.


----------------------------------------

**`Improved Agents`: Contains 4 steps**
1. Detecting entities in the user input.
2. Matching entities to the database.
3. Defining a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement.
4. Generating answers based on database results.

### **Strategies to improve graph database query generation by mapping values from user inputs to the database**

When using the built-in graph chains, the LLM is aware of the graph schema but has no information about the property values stored in the database. Therefore, we can introduce a new step in the graph database QA system that maps values from the question to the values actually stored in the graph.

**Detecting entities in the user input**

We have to extract the types of entities/values we want to map to a graph database. In this example, we are dealing with a movie graph, so we can map movies and people to the database.

In [15]:
from typing import List

from langchain.chains.openai_functions import create_structured_output_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Entities(BaseModel):
    """Identifying information about entities."""

    names: List[str] = Field(
        ...,
        description="All the person or movies appearing in the text",
    )


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting person, movies, and years from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)


entity_chain = create_structured_output_chain(Entities, llm, prompt)



In [16]:
entities_q_two = entity_chain.invoke({"question": q_two})
print(entities_q_two)
entities_q_three = entity_chain.invoke({"question": q_three})
print(entities_q_three)

{'question': 'What are the most common genres for movies released in 1995?', 'function': Entities(names=['1995'])}
{'question': 'What are the similar movies to the ones that Tom Hanks acted in?', 'function': Entities(names=['Tom Hanks'])}


**Utilizing a simple CONTAINS clause to match entities to the database. In practice, you might want to use a fuzzy search or a full-text index to allow for minor misspellings.**

In [17]:
match_query = """MATCH (p:Person|Movie)
WHERE p.name CONTAINS $value OR p.title CONTAINS $value
RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type
LIMIT 1
"""

def map_to_database(values) -> str:
    """
    Maps the values to entities in the database and returns the mapping information.

    Args:
        values (Entities): Extracted entities; `values.names` holds the strings to look up.

    Returns:
        str: A string containing the mapping information of each value to entities in the database.
    """
    result = ""
    for entity in values.names:
        # Query the database to find the mapping for the entity
        response = graph.query(match_query, {"value": entity})
        try:
            result += f"{entity} maps to {response[0]['result']} {response[0]['type']} in database\n"
        except IndexError:
            # No node matched this entity, so it is skipped
            pass
    return result

In [18]:
print("2:", map_to_database(entities_q_two["function"]))
print("3:", map_to_database(entities_q_three["function"]))

2: 
3: Tom Hanks maps to Tom Hanks Person in database
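
The `CONTAINS` lookup is case-sensitive and fails on misspellings. As a rough illustration of the fuzzy matching mentioned above, the sketch below uses Python's standard `difflib` on an in-memory list of names. It is a stand-in under assumptions, not a Neo4j full-text index: the `KNOWN_NAMES` sample and the `fuzzy_match` helper are hypothetical, and in practice the candidates would come from the database.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical sample of names; in practice these would be loaded from the graph
KNOWN_NAMES: List[str] = ["Tom Hanks", "Robert De Niro", "Sharon Stone", "Casino"]


def fuzzy_match(value: str, candidates: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest candidate (case-insensitive), or None below the cutoff."""
    # Compare in lowercase but return the original spelling
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(value.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


print(fuzzy_match("tom hankz", KNOWN_NAMES))  # Tom Hanks
print(fuzzy_match("1995", KNOWN_NAMES))       # None
```

The `cutoff` value trades recall for precision: lower values tolerate more typos but raise the risk of mapping an entity to the wrong node.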



**Custom Cypher generating chain**

We need to define a custom Cypher prompt that takes the entity mapping information along with the schema and the user question to construct a Cypher statement. We will be using the LangChain Expression Language (LCEL) to accomplish that.

In [19]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Generate Cypher statement based on natural language input
cypher_template = """Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Entities in the question map to the following database values:
{entities_list}
Question: {question}
Cypher query:"""  # noqa: E501

cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question, convert it to a Cypher query. No pre-amble.",
        ),
        ("human", cypher_template),
    ]
)

cypher_response = (
    RunnablePassthrough.assign(names=entity_chain)
    | RunnablePassthrough.assign(
        entities_list=lambda x: map_to_database(x["names"]["function"]),
        schema=lambda _: graph.get_schema,
    )
    | cypher_prompt
    | llm.bind(stop=["\nCypherResult:"])
    | StrOutputParser()
)
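
To show how data flows through the chain above, here is a plain-Python emulation of the "assign" pattern: each stage copies the incoming dict and adds new keys computed from it. This is a simplified sketch, not LangChain code. The helpers `assign` and `pipe` and the stand-in steps are assumptions for illustration, and no LLM or database is involved.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Step = Callable[[State], Any]


def assign(**steps: Step) -> Callable[[State], State]:
    """Return a stage that copies the input dict and adds one key per step."""
    def stage(state: State) -> State:
        # Each step sees the incoming state; existing keys are preserved
        return {**state, **{key: fn(state) for key, fn in steps.items()}}
    return stage


def pipe(*stages: Callable[[State], State]) -> Callable[[State], State]:
    """Run stages left to right, similar in spirit to chaining with `|`."""
    def run(state: State) -> State:
        for stage in stages:
            state = stage(state)
        return state
    return run


def fake_entities(state: State) -> list:
    # Stand-in for the entity extraction step
    return ["Tom Hanks"] if "Tom Hanks" in state["question"] else []


def fake_mapping(state: State) -> str:
    # Stand-in for the database mapping step
    return "".join(f"{n} maps to {n} Person in database\n" for n in state["names"])


flow = pipe(
    assign(names=fake_entities),
    assign(entities_list=fake_mapping, schema=lambda _: "<schema>"),
)
state = flow({"question": "What movies did Tom Hanks act in?"})
print(sorted(state))  # ['entities_list', 'names', 'question', 'schema']
```

The final dict holds every key the Cypher prompt needs (`schema`, `entities_list`, `question`), which is why the prompt can follow directly after the two assign stages.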

In [20]:
# Pass the question string itself; the chain runs entity extraction internally
cypher_q_three = cypher_response.invoke({"question": q_three})
print(cypher_q_three)

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
MATCH (m)-[:IS_SIMILAR_TO]->(sm:SimilarMovie)
RETURN sm.name


**Generating answers based on database results**

Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer. Again, we will be using LCEL.

In [29]:
from langchain.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema

# Cypher validation tool for relationship directions
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in graph.structured_schema.get("relationships")
]
cypher_validation = CypherQueryCorrector(corrector_schema)

# Generate natural language response based on database results
response_template = """Based on the question, Cypher query, and Cypher response, write a natural language response:
Question: {question}
Cypher query: {query}
Cypher Response: {response}"""

response_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and Cypher response, convert it to a natural"
            " language answer. No pre-amble.",
        ),
        ("human", response_template),
    ]
)

chain = (
    RunnablePassthrough.assign(query=cypher_response)
    | RunnablePassthrough.assign(
        response=lambda x: graph.query(cypher_validation(x["query"])),
    )
    | response_prompt
    | llm
    | StrOutputParser()
)

In [30]:
chain.invoke({"question": q_one})

'The cast of the movie "Casino" includes James Woods, Robert De Niro, Sharon Stone, and Joe Pesci.'

In [31]:
chain.invoke({"question": q_two})

'The most common genres for movies released in 1995 are Comedy with 10 movies, Adventure with 6 movies, Action and Romance with 5 movies each, and Children with 4 movies.'

In [32]:
chain.invoke({"question": q_three})

'Similar movies to the ones that Tom Hanks acted in include "Finding Nemo".'

In [33]:
chain.invoke({"question": "How many of the movies have the Action genre?"})

'There are 5 movies in the database that belong to the Action genre.'

Exercise:

In [34]:
chain.invoke({"question": "From the movies that were taken in United States, how many had the comedy genre?"})

ValueError: Generated Cypher Statement is not valid
{code: Neo.ClientError.Statement.SyntaxError} {message: Unexpected end of input: expected CYPHER, EXPLAIN, PROFILE or Query (line 0, column 0 (offset: 1))
""
 ^}

Let's try updating the prompt by passing a custom Cypher prompt to `GraphCypherQAChain`:

In [57]:
from langchain.prompts.prompt import PromptTemplate

CYPHER_RECOMMENDATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Cypher examples:
# How many streamers are from Norway?
MATCH (s:Stream)-[:HAS_LANGUAGE]->(:Language {{name: 'no'}})
RETURN count(s) AS streamers
# Which streamers do you recommend if I like kimdoe?
MATCH (s:Stream)
WHERE s.name = "kimdoe"
WITH collect(s) AS sourceNodes
CALL gds.pageRank.stream("shared-audience", 
  {{sourceNodes:sourceNodes, relationshipTypes:['SHARED_AUDIENCE'], 
    nodeLabels:['Stream']}})
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE NOT node in sourceNodes
RETURN node.name AS streamer, score
ORDER BY score DESC LIMIT 3

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}
"""

CYPHER_RECOMMENDATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=CYPHER_RECOMMENDATION_TEMPLATE
)

In [58]:
chain_recommendation_example = GraphCypherQAChain.from_llm(
    llm=llm, graph=graph, verbose=True,
    cypher_prompt=CYPHER_RECOMMENDATION_PROMPT, 
)

q = "From the movies that were taken in United States, how many had the comedy genre?"
chain_recommendation_example.invoke({"query": q})

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (m:Movie)-[:IN_GENRE]->(:Genre {name: 'Comedy'})-[:WAS_TAKEN_IN]->(:Location {name: 'United States'})
RETURN count(m) AS comedyMoviesInUS
Full Context:
[{'comedyMoviesInUS': 0}]

> Finished chain.


{'query': 'From the movies that were taken in United States, how many had the comedy genre?',
 'result': 'There are 0 comedy movies in the United States.'}

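The chain now runs without raising an error, but the result is not reliable. The generated Cypher attaches `WAS_TAKEN_IN` to the `Genre` node, whereas the schema defines it as `(:Movie)-[:WAS_TAKEN_IN]->(:Location)`, so the pattern matches nothing and the count of 0 does not answer the question.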
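
The example queries in `CYPHER_RECOMMENDATION_TEMPLATE` reference `Stream` and `Language` nodes, which do not appear in the movie schema. A sketch of a template whose few-shot examples follow the movie schema printed earlier is shown below. The example questions and queries are assumptions written for illustration and have not been run against the database. The doubled braces are needed because the template is filled with `str.format`-style substitution.

```python
# Few-shot examples written against the movie schema printed earlier
# (assumed for illustration; not taken from the original notebook)
MOVIE_EXAMPLES = """# How many movies are in the Comedy genre?
MATCH (m:Movie)-[:IN_GENRE]->(:Genre {{name: 'Comedy'}})
RETURN count(m) AS movies
# How many Action movies were taken in the United States?
MATCH (m:Movie)-[:WAS_TAKEN_IN]->(:Location {{name: 'United States'}})
MATCH (m)-[:IN_GENRE]->(:Genre {{name: 'Action'}})
RETURN count(m) AS movies"""

MOVIE_TEMPLATE = (
    "Task: Generate a Cypher statement to query a graph database.\n"
    "Use only the relationship types and properties in the schema.\n"
    "Schema:\n{schema}\n"
    "Cypher examples:\n" + MOVIE_EXAMPLES + "\n"
    "Return only the Cypher statement.\n"
    "The question is:\n{question}\n"
)

# Formatting turns the doubled braces back into single Cypher braces
preview = MOVIE_TEMPLATE.format(schema="<schema here>", question="<question here>")
print(preview)
```

The template string could then be used in place of `CYPHER_RECOMMENDATION_TEMPLATE` when building the `PromptTemplate`, keeping `schema` and `question` as the input variables.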