# How to map values to a graph database

In this guide we'll go over strategies to improve graph database query generation by mapping values from user inputs to the database. When using the built-in graph chains, the LLM is aware of the graph schema but has no information about the property values stored in the database. For example, it cannot know whether a movie is stored as "Casino" or "casino". We can therefore add a step to the graph database QA system that maps the values mentioned in a question to the values that actually exist in the database.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()
# LANGCHAIN_API_KEY is already loaded from .env by load_dotenv(); it is only needed for LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Q&A_over_Graph_data"

In [2]:
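# Neo4jGraph() reads NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD from the environment.
# load_dotenv() has already loaded them from .env, so just fail early if one is missing.
for var in ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing required environment variable: {var}")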

In [3]:
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph()


In [4]:

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
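    MERGE (m)-[:IN_GENRE]->(g))
"""

# Run the import (MERGE makes it safe to re-run), then refresh the cached schema
graph.query(movies_query)
graph.refresh_schema()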


# Detecting entities in the user input

First, we extract from the user input the entities (values) that we want to map to the graph database. In this example we are dealing with a movie graph, so we map movies and people.

In [5]:
from typing import List, Optional

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

In [12]:
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

In [13]:
class Entities(BaseModel):
    """Identifying information about entities."""

    names: List[str] = Field(
        ...,
        description="All the person or movies appearing in the text",
    )

In [14]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting person and movies from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)

In [15]:
entity_chain = prompt | llm.with_structured_output(Entities)

In [16]:
entities = entity_chain.invoke({"question": "Who played in Casino movie?"})
entities

Entities(names=['Casino'])

We will use a simple `CONTAINS` clause to match entities to the database. In practice, you might want to use a fuzzy search or a full-text index to allow for minor misspellings.

In [17]:
match_query = """MATCH (p:Person|Movie)
WHERE p.name CONTAINS $value OR p.title CONTAINS $value
RETURN coalesce(p.name, p.title) AS result, labels(p)[0] AS type
LIMIT 1
"""

In [18]:
def map_to_database(entities: Entities) -> str:
    """Look up each extracted entity and describe its database match, one per line."""
    result = ""
    for entity in entities.names:
        response = graph.query(match_query, {"value": entity})
        # Entities without a match are skipped rather than raising an error
        if response:
            result += f"{entity} maps to {response[0]['result']} {response[0]['type']} in database\n"
    return result

In [19]:
map_to_database(entities)

'Casino maps to Casino Movie in database\n'
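
The `CONTAINS` match is case-sensitive and fails on misspellings such as "Cassino". As a minimal sketch of the fuzzy alternative mentioned above, the snippet below matches against an in-memory list of titles using Python's standard `difflib`. The `fuzzy_match` helper, the candidate list, and the 0.8 cutoff are illustrative assumptions and not part of this notebook; against a real database you would more likely rely on a Neo4j full-text index.

```python
from difflib import get_close_matches
from typing import List, Optional


def fuzzy_match(value: str, candidates: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the candidate closest to `value`, ignoring case, or None.

    Hypothetical helper: `candidates` stands in for names and titles
    that would normally be looked up in the graph database.
    """
    # Compare lower-cased strings, but return the original spelling
    lowered = {candidate.lower(): candidate for candidate in candidates}
    hits = get_close_matches(value.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


titles = ["Casino", "Toy Story", "Heat"]  # assumed sample values
print(fuzzy_match("Cassino", titles))  # Casino
print(fuzzy_match("Zzzz", titles))     # None
```

A lower cutoff tolerates more typos at the cost of more false matches, so the threshold is worth tuning on real user questions.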

# Custom Cypher generating chain

We need to define a custom Cypher prompt that takes the entity mapping information, the schema, and the user question, and uses them to construct a Cypher statement. We will use the LangChain Expression Language (LCEL) to accomplish that.

In [20]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


# Generate Cypher statement based on natural language input


In [21]:
cypher_template = """Based on the Neo4j graph schema below, write a Cypher query that would answer the user's question:
{schema}
Entities in the question map to the following database values:
{entities_list}
Question: {question}
Cypher query:"""

In [22]:
cypher_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question, convert it to a Cypher query. No pre-amble.",
        ),
        ("human", cypher_template),
    ]
)

In [23]:

cypher_response = (
    RunnablePassthrough.assign(names=entity_chain)
    | RunnablePassthrough.assign(
        entities_list=lambda x: map_to_database(x["names"]),
        schema=lambda _: graph.get_schema,
    )
    | cypher_prompt
    | llm.bind(stop=["\nCypherResult:"])
    | StrOutputParser()
)
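
Each `RunnablePassthrough.assign` step in the chain above keeps the incoming dictionary and adds new keys to it, so the prompt ends up receiving `question`, `names`, `entities_list`, and `schema`. The following plain-Python sketch imitates that data flow without LangChain, Neo4j, or an LLM. The `assign` helper and the two `fake_*` functions are hypothetical stand-ins, and `names` is a plain list here rather than an `Entities` object.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def assign(**steps: Callable[[State], Any]) -> Callable[[State], State]:
    """Build a stage that copies the state and adds one key per step."""
    def stage(state: State) -> State:
        # Every step sees the incoming state, not the other new keys
        return {**state, **{key: fn(state) for key, fn in steps.items()}}
    return stage


def fake_entity_chain(state: State) -> List[str]:
    # Stand-in for the LLM entity extraction step
    return ["Casino"]


def fake_map_to_database(names: List[str]) -> str:
    # Stand-in for the database lookup step
    return "".join(f"{name} maps to {name} Movie in database\n" for name in names)


pipeline = [
    assign(names=fake_entity_chain),
    assign(
        entities_list=lambda s: fake_map_to_database(s["names"]),
        schema=lambda _: "<schema text>",
    ),
]

state: State = {"question": "Who played in Casino movie?"}
for stage in pipeline:
    state = stage(state)

print(sorted(state))  # ['entities_list', 'names', 'question', 'schema']
```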

In [24]:
cypher = cypher_response.invoke({"question": "Who played in Casino movie?"})
cypher

"MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Casino'})\nRETURN p.name AS Actor;"

# Generating answers based on database results

Now that we have a chain that generates the Cypher statement, we need to execute the Cypher statement against the database and send the database results back to an LLM to generate the final answer. Again, we will be using LCEL.



In [28]:
! pip show langchain-community

Name: langchain-community
Version: 0.0.38
Summary: Community contributed LangChain integrations.
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: c:\langchain\langchain-basics\venv\lib\site-packages
Requires: aiohttp, dataclasses-json, langchain-core, langsmith, numpy, PyYAML, requests, SQLAlchemy, tenacity
Required-by: langchain


In [29]:
! pip install -U langchain-community

Collecting langchain-community
  Downloading langchain_community-0.2.15-py3-none-any.whl.metadata (2.7 kB)
Collecting langchain<0.3.0,>=0.2.15 (from langchain-community)
  Downloading langchain-0.2.15-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-text-splitters<0.3.0,>=0.2.0 (from langchain<0.3.0,>=0.2.15->langchain-community)
  Using cached langchain_text_splitters-0.2.2-py3-none-any.whl.metadata (2.1 kB)
Downloading langchain_community-0.2.15-py3-none-any.whl (2.3 MB)

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ionic-langchain 0.2.3 requires langchain<0.2.0,>=0.1.0, but you have langchain 0.2.15 which is incompatible.
langchain-experimental 0.0.58 requires langchain<0.2.0,>=0.1.17, but you have langchain 0.2.15 which is incompatible.
langchain-experimental 0.0.58 requires langchain-core<0.2.0,>=0.1.52, but you have langchain-core 0.2.37 which is incompatible.


In [30]:
from langchain_community.chains.graph_qa.cypher_utils import CypherQueryCorrector, Schema


In [31]:

# Cypher validation tool for relationship directions
corrector_schema = [
    Schema(el["start"], el["type"], el["end"])
    for el in graph.structured_schema.get("relationships")
]
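
To make the idea of validating relationship directions concrete, here is a deliberately simplified sketch. It is not how `CypherQueryCorrector` is implemented: it works on a single, already-parsed `(start, type, end)` pattern instead of a Cypher string. The `fix_direction` helper is hypothetical, and the `SCHEMA` triples are written out by hand to mirror the relationships created by the import query.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (start label, relationship type, end label)

# Assumed schema triples, mirroring the movie graph in this guide
SCHEMA: List[Triple] = [
    ("Person", "ACTED_IN", "Movie"),
    ("Person", "DIRECTED", "Movie"),
    ("Movie", "IN_GENRE", "Genre"),
]


def fix_direction(
    start: str, rel: str, end: str, schema: List[Triple] = SCHEMA
) -> Optional[Triple]:
    """Return the pattern in a direction the schema allows, or None."""
    if (start, rel, end) in schema:
        return (start, rel, end)  # already valid
    if (end, rel, start) in schema:
        return (end, rel, start)  # valid once reversed
    return None  # relationship does not fit the schema in either direction


# An LLM might generate (:Movie)-[:ACTED_IN]->(:Person), which is backwards
print(fix_direction("Movie", "ACTED_IN", "Person"))  # ('Person', 'ACTED_IN', 'Movie')
```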


In [32]:
cypher_validation = CypherQueryCorrector(corrector_schema)

# Generate natural language response based on database results
response_template = """Based on the question, Cypher query, and Cypher response, write a natural language response:
Question: {question}
Cypher query: {query}
Cypher Response: {response}"""


In [33]:
response_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "Given an input question and Cypher response, convert it to a natural"
            " language answer. No pre-amble.",
        ),
        ("human", response_template),
    ]
)

In [34]:
chain = (
    RunnablePassthrough.assign(query=cypher_response)
    | RunnablePassthrough.assign(
        response=lambda x: graph.query(cypher_validation(x["query"])),
    )
    | response_prompt
    | llm
    | StrOutputParser()
)

In [35]:
chain.invoke({"question": "Who played in Casino movie?"})

'The actors who played in the movie "Casino" are Robert De Niro, Joe Pesci, Sharon Stone, and James Woods.'
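
To see roughly what the final LLM call receives, the sketch below fills a copy of the response template with plain `str.format`. The `render_response_prompt` helper is hypothetical, and the `rows` list is an assumed example of what the Cypher query might return, kept consistent with the answer above; in the notebook, `ChatPromptTemplate` performs the formatting and `graph.query` supplies the rows.

```python
from typing import Any, Dict, List

# Copy of the human message template used for the final answer
RESPONSE_TEMPLATE = (
    "Based on the question, Cypher query, and Cypher response, "
    "write a natural language response:\n"
    "Question: {question}\n"
    "Cypher query: {query}\n"
    "Cypher Response: {response}"
)


def render_response_prompt(question: str, query: str, rows: List[Dict[str, Any]]) -> str:
    """Fill the template; the rows are inserted as their string form."""
    return RESPONSE_TEMPLATE.format(question=question, query=query, response=rows)


# Assumed database rows, shaped like the result of `RETURN p.name AS Actor`
rows = [{"Actor": "Robert De Niro"}, {"Actor": "Joe Pesci"}]
rendered = render_response_prompt(
    "Who played in Casino movie?",
    "MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: 'Casino'}) RETURN p.name AS Actor",
    rows,
)
print(rendered)
```

Printing the rendered prompt like this is a cheap way to check that the question, the generated query, and the raw rows all reach the model before debugging the answer itself.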