# Hands-On: Grounding LLMs with a Knowledge Graph

In this session we look at two architectural patterns for combining a knowledge graph with an LLM so that the answers given to users are grounded in the graph.

We will use the Neo4j movie knowledge graph we saw two days ago.

To create an instance of this graph, go to the [Neo4j Sandbox](https://sandbox.neo4j.com/), log in, and click "New Project". From there, select the Movies graph and click "Create".

In [None]:
from neo4j import GraphDatabase

# Configure connection details
uri = "bolt://localhost:7687"  # Default URI for local Neo4j instances
username = "neo4j"             # Replace with your username
password = "neo4j"          # Replace with your password

# Initialize the driver
driver = GraphDatabase.driver(uri, auth=(username, password))

# Function to execute a query
def run_query(query):
    with driver.session() as session:
        result = session.run(query)
        return [record for record in result]

# Example: Run a simple Cypher query
query = "MATCH (n) RETURN n LIMIT 5"
results = run_query(query)

# Print the results
for record in results:
    print(record)

# Close the driver connection
driver.close()
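
The `run_query` helper above accepts only a fixed query string. When part of a query comes from user input, passing the value as a Cypher parameter is generally safer than building the string by hand. The following is a minimal sketch under that assumption; `run_query_with_params` is a hypothetical name, and the driver is passed in as an argument so the function does not depend on a global connection.

```python
def run_query_with_params(driver, query, parameters=None):
    """Run a Cypher query with optional parameters and return plain dicts.

    Hypothetical helper for illustration; `driver` is expected to be a
    neo4j driver object such as the one created above.
    """
    with driver.session() as session:
        result = session.run(query, parameters or {})
        # record.data() converts each neo4j Record into a regular dictionary
        return [record.data() for record in result]
```

With an open driver (that is, before `driver.close()` is called), it could be used as `run_query_with_params(driver, "MATCH (m:Movie {title: $title}) RETURN m.tagline AS tagline", {"title": "The Matrix"})`.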



## Setup


In [None]:
# install the necessary libraries
!pip install langchain
!pip install -U langchain-openai langchain-community
!pip install openai
!pip install neo4j

In [None]:
# The OpenAI key will be valid for the duration of this hands-on. We are using GPT-3.5.

OPENAI_API_KEY = "..."
OPENAI_ENDPOINT = 'https://api.openai.com/v1/embeddings'


In [None]:
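from langchain_openai import ChatOpenAI
# The Neo4j integrations live in langchain-community (installed above);
# the older `langchain.graphs` / `langchain.chains` import paths are deprecated.
from langchain_community.graphs import Neo4jGraph
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
from langchain_core.prompts import PromptTemplate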

In [None]:
# Connect to GPT. Temperature is set to 0 because we want consistent responses
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY, temperature=0
)

In [None]:
# Connect to the movie graph by specifying the Bolt URL, the username and the password. These are available under the "Connection Details" tab in the instance we created.

movie_graph = Neo4jGraph(
    url="...",
    username="...",
    password="...",
)
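
The connection details above are typed directly into the notebook. One alternative is to read them from environment variables so that credentials stay out of the file. The sketch below assumes the variable names `NEO4J_URI`, `NEO4J_USERNAME` and `NEO4J_PASSWORD`, which are chosen for this example, and `load_neo4j_settings` is a hypothetical helper.

```python
import os


def load_neo4j_settings(env=os.environ):
    """Collect Neo4j connection settings from environment variables.

    The variable names are assumptions made for this sketch.
    """
    required = ("NEO4J_URI", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "url": env["NEO4J_URI"],
        # Fall back to the default Neo4j user name if none is set
        "username": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env["NEO4J_PASSWORD"],
    }
```

The returned dictionary uses the same keyword names as the `Neo4jGraph` call above, so it could be unpacked as `Neo4jGraph(**load_neo4j_settings())`.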

## Pattern 1: The LLM answers questions by transforming them into Cypher queries and executing them against a knowledge graph

In [None]:
# We define a prompt template that generates a Cypher query from an input question, given a knowledge graph schema.

CYPHER_GENERATION_TEMPLATE = """
Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}

Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.

The question is:
{question}"""


cypher_generation_prompt = PromptTemplate(
    template=CYPHER_GENERATION_TEMPLATE,
    input_variables=["schema", "question"],
)
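
To get a feeling for what the LLM actually receives, it can help to fill in the two placeholders by hand. The sketch below uses plain Python string formatting on a shortened stand-in for the template, together with a hand-written schema summary. The schema text is an illustrative assumption; in the chain, the schema is derived from the connected graph.

```python
# Shortened stand-in for CYPHER_GENERATION_TEMPLATE (same two placeholders)
template = (
    "Task:Generate Cypher statement to query a graph database.\n"
    "Schema:\n{schema}\n\n"
    "The question is:\n{question}"
)

# Hand-written schema summary, for illustration only
example_schema = (
    "Node properties: Person {name: STRING}, Movie {title: STRING}\n"
    "Relationships: (:Person)-[:DIRECTED]->(:Movie)"
)

rendered_prompt = template.format(
    schema=example_schema,
    question="Who directed Top Gun?",
)
print(rendered_prompt)
```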

In [None]:
# We create a Cypher QA chain and pass as parameters the llm, the graph, and the prompt template
cypher_chain = GraphCypherQAChain.from_llm(
    llm,
    graph=movie_graph,
    cypher_prompt=cypher_generation_prompt,
    verbose=True,
    allow_dangerous_requests=True
)
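
Setting `allow_dangerous_requests=True` acknowledges that the chain will execute whatever Cypher the model generates. As a rough illustration of one possible precaution, the sketch below flags queries that contain write clauses. `looks_read_only` is a hypothetical helper and only a heuristic: it can reject harmless queries whose string literals happen to contain one of the keywords, and it does not detect procedures that write. Connecting with a database user that has read-only permissions is likely a more robust safeguard.

```python
import re

# Cypher clauses that modify data or schema (not an exhaustive list)
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b",
    re.IGNORECASE,
)


def looks_read_only(cypher: str) -> bool:
    """Return True if no write clause keyword appears in the query text."""
    return WRITE_CLAUSES.search(cypher) is None


print(looks_read_only("MATCH (p:Person)-[:DIRECTED]->(m:Movie) RETURN p.name"))
print(looks_read_only("MATCH (n) DETACH DELETE n"))
```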

In [None]:
# Now, we can use the chain to transform natural language questions into Cypher queries and get an answer by executing the queries
def answer_question_using_cypher(question):
  try:
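    # invoke() replaces the deprecated run(); the chain returns its answer under the "result" key
    answer = cypher_chain.invoke({"query": question})["result"]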
    print("Answer: ", answer)
  except Exception as e:
    print("Problem answering the question: ",e)


In [None]:
answer_question_using_cypher("Who directed Top Gun?")

In [None]:
answer_question_using_cypher("Which actors have acted in more than two movies?")

In [None]:
answer_question_using_cypher("Which movies have more than four actors?")

## Pattern 2: The LLM answers questions/requests by using indexed information about entities that come from a knowledge graph

In [None]:
# For this example, we create an index of movie entity vector embeddings, using their taglines.

# First, we create an empty vector index that will hold vector embeddings for each movie entity in the graph.
# The embeddings will be based on the 'tagline' attribute of each movie

movie_graph.query("""
  CREATE VECTOR INDEX movie_tagline_embeddings IF NOT EXISTS
  FOR (m:Movie) ON (m.taglineEmbedding)
  OPTIONS { indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
  }}"""
)

movie_graph.query("""
  SHOW VECTOR INDEXES
  """
)

In [None]:
# Then, we populate the vector index by calculating a vector representation for each movie tagline using the OpenAI endpoint
# We add the vector to each `Movie` node as a `taglineEmbedding` property

movie_graph.query("""
    MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
    WITH movie, genai.vector.encode(
        movie.tagline,
        "OpenAI",
        {
          token: $openAiApiKey,
          endpoint: $openAiEndpoint
        }) AS vector
    CALL db.create.setNodeVectorProperty(movie, "taglineEmbedding", vector)
    """,
    params={"openAiApiKey":OPENAI_API_KEY, "openAiEndpoint": OPENAI_ENDPOINT} )

[]

In [None]:
# We can inspect the vector of a movie node (though it's not very informative!)
result = movie_graph.query("""
    MATCH (m:Movie)
    WHERE m.tagline IS NOT NULL
    RETURN m.title, m.tagline, m.taglineEmbedding
    LIMIT 1
    """
)

first_movie_title = result[0]['m.title']
first_tagline = result[0]['m.tagline']
first_vector = result[0]['m.taglineEmbedding']

print(first_movie_title, first_tagline, first_vector)

The Matrix Welcome to the Real World [0.017436133697628975, -0.005479085724800825, -0.0020337789319455624, -0.025571251288056374, -0.014344528317451477, 0.016715632751584053, -0.017056234180927277, 0.0004707821935880929, -0.02521754987537861, -0.02955365926027298, 0.0005567510961554945, 0.020056139677762985, -0.00607841182500124, -0.004634134005755186, 0.008069615811109543, -0.002898380858823657, 0.026986053213477135, -0.03065406158566475, 0.005721436347812414, -0.007833815179765224, -0.017527835443615913, 0.01688593439757824, -0.006324037443846464, -0.03607747331261635, -0.011901373974978924, -0.010316270403563976, 0.02502105012536049, -0.02322634682059288, 0.014239728450775146, -0.02262374572455883, -0.0033798066433519125, -0.008515017107129097, 0.010185270570218563, -0.024261247366666794, -0.004791334271430969, -0.010375220328569412, -0.01668943278491497, -0.018300736322999, 0.011266021989285946, 0.007100214250385761, 0.02834845706820488, -0.0042771585285663605, 0.006884063594043255

In [None]:
# Now that we have the entity vector index, we can query it with the help of the LLM. As an example, let's define a method that retrieves the 5 movies that are most related to a given topic

def retrieve_movies_from_kg(topic):

  question = f"What movies are about {topic}"

  answer = movie_graph.query("""
      WITH genai.vector.encode(
          $question,
          "OpenAI",
          {
            token: $openAiApiKey,
            endpoint: $openAiEndpoint
          }) AS question_embedding
      CALL db.index.vector.queryNodes(
          'movie_tagline_embeddings',
          $top_k,
          question_embedding
          ) YIELD node AS movie, score
      RETURN movie.title, movie.tagline, score
      """,
      params={"openAiApiKey":OPENAI_API_KEY,
              "openAiEndpoint": OPENAI_ENDPOINT,
              "question": question,
              "top_k": 5
              })

  return answer

In [None]:
# We can run the method directly
retrieve_movies_from_kg("horror")
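
Conceptually, the vector index query ranks movies by the cosine similarity between the question embedding and each tagline embedding, then keeps the top `k`. The sketch below reproduces that idea by brute force on toy 2-dimensional vectors. The names and numbers are made up for illustration, the real embeddings have 1536 dimensions, and the database index uses an approximate search rather than comparing against every node.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vector, items, k):
    """Return the k (name, score) pairs most similar to query_vector."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in items.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (hypothetical values, not taken from the graph)
toy_embeddings = {
    "movie_a": [1.0, 0.0],
    "movie_b": [0.6, 0.8],
    "movie_c": [0.0, 1.0],
}

nearest = top_k_by_cosine([1.0, 0.0], toy_embeddings, k=2)
print(nearest)
```

Here `movie_a` points in the same direction as the query and ranks first, `movie_b` comes second, and `movie_c` is orthogonal to the query and is cut off by `k=2`.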

In [None]:
# Or we can run the method to provide context in a prompt
def recommend_movies_based_on_kg(topic):

  template = """Lately I am interested in movies about {movie_topic}.
  Suggest up to five movies I should watch next and a brief description of why, based only on the provided context.

  {context}
  """

  prompt = PromptTemplate.from_template(template).format_prompt(movie_topic=topic, context=retrieve_movies_from_kg(topic))

  # invoke() replaces the deprecated direct call llm(...)
  llm_reply = llm.invoke(prompt.to_messages()).content
  print(llm_reply)

In [None]:
recommend_movies_based_on_kg("love")
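
In `recommend_movies_based_on_kg`, the retrieved rows are placed into the prompt as a raw Python list of dictionaries. A possible refinement is to turn them into short text lines first, which may be easier for the model to read. The sketch below assumes rows shaped like the output of `retrieve_movies_from_kg` (keys `movie.title`, `movie.tagline`, `score`); `format_movie_context` is a hypothetical helper and the score value is made up.

```python
def format_movie_context(rows):
    """Turn retrieved rows into one '- title: tagline' line per movie."""
    lines = []
    for row in rows:
        title = row.get("movie.title", "Unknown title")
        tagline = row.get("movie.tagline", "")
        lines.append(f"- {title}: {tagline}")
    return "\n".join(lines)


# Sample row in the assumed shape; the score is a made-up value
sample_rows = [
    {
        "movie.title": "The Matrix",
        "movie.tagline": "Welcome to the Real World",
        "score": 0.9,
    },
]

context_text = format_movie_context(sample_rows)
print(context_text)
```

The resulting string could then be passed as `context` in place of the raw query result.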