This notebook accompanies the article [Building an Educational Chatbot for GraphAcademy with Neo4j Using LLMs and Vector Search](https://medium.com/neo4j/building-an-educational-chatbot-for-graphacademy-with-neo4j-f707c4ce311b).


## Setup


### Load Environment Variables 

Load the environment variables from `.env` to connect to the Neo4j instance and configure the `openai` library. The `.env` file should contain the following settings:

```env
OPENAI_API_KEY=sk-...
NEO4J_URI=neo4j+s://[dbhash].databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=...
```
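Before connecting, it can help to fail fast when a setting is missing. The sketch below is not part of the original notebook: `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names, the values in `example_env` are placeholders, and the check only confirms that the four variables listed above are present and non-empty.

```python
import os

# Variable names taken from the .env example above
REQUIRED_SETTINGS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_settings(env=None):
    """Return the required settings that are absent or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Example with an explicit mapping (placeholder values) instead of the real environment
example_env = {"OPENAI_API_KEY": "sk-...", "NEO4J_URI": "neo4j+s://example"}
print(missing_settings(example_env))  # ['NEO4J_USERNAME', 'NEO4J_PASSWORD']
```

Once the `.env` file has been loaded, calling `missing_settings()` with no arguments checks the real environment.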

In [2]:
%load_ext dotenv
%dotenv

In [24]:
import os 
import pandas as pd
import json

os.getenv('NEO4J_USERNAME')

'neo4j'

### Set Up OpenAI

Import the `openai` library and set the API Key.

In [60]:
import openai 

openai_model = os.getenv('OPENAI_MODEL', 'gpt-3.5-turbo')

# Set the OpenAI API Key
openai.api_key = os.getenv('OPENAI_API_KEY')

openai.api_key[0:3]

'sk-'

### Connect to Neo4j

Create a Neo4j Driver instance and verify that the credentials supplied are correct.

In [11]:
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    os.getenv('NEO4J_URI'),
    auth=(os.getenv('NEO4J_USERNAME'), os.getenv('NEO4J_PASSWORD'))
)

driver.verify_connectivity()

## Building a basic Chatbot

Define a function to take a question and use the OpenAI Embeddings API to convert it into an embedding.

In [26]:
def get_embedding(question):
    chunks = openai.Embedding.create(
        input=question, 
        model='text-embedding-ada-002'
    )
    
    return chunks.data[0]["embedding"]
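
`openai.Embedding.create` belongs to the pre-1.0 `openai` Python package. From version 1.0 onward, the same request goes through a client object. A rough equivalent is sketched below, assuming an `OpenAI()` client is created elsewhere; `get_embedding_v1` is a hypothetical name, and the stub classes exist only so the sketch runs without network access.

```python
from types import SimpleNamespace


def get_embedding_v1(client, question, model="text-embedding-ada-002"):
    """Embed `question` using an openai>=1.0 style client.

    `client` is expected to expose `embeddings.create(...)`,
    as the `openai.OpenAI` client does.
    """
    response = client.embeddings.create(input=question, model=model)
    # In openai>=1.0 each item in `data` is an object with an `embedding` attribute
    return response.data[0].embedding


class StubEmbeddings:
    """Stand-in for `client.embeddings`; returns a fixed vector (illustration only)."""

    def create(self, input, model):
        return SimpleNamespace(data=[SimpleNamespace(embedding=[0.1, 0.2, 0.3])])


stub_client = SimpleNamespace(embeddings=StubEmbeddings())
print(get_embedding_v1(stub_client, "What is Cypher?"))  # [0.1, 0.2, 0.3]
```

With the real library, this would be called as `get_embedding_v1(OpenAI(), "What is Cypher?")` after `from openai import OpenAI`.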

Create an embedding for the question _What is Cypher?_

In [41]:
embedding = get_embedding("What is Cypher?")

embedding[0:10], len(embedding)

([0.006597098428755999,
  0.003896713722497225,
  -0.023041438311338425,
  -0.018975302577018738,
  -0.01728799380362034,
  0.0009491108357906342,
  -0.005922866519540548,
  -0.0019276113016530871,
  -0.0018567305523902178,
  -0.02269567735493183],
 1536)

Define a function that queries the vector index in Neo4j to find _chunks_ whose embeddings are similar to the embedding of the user input.

In [49]:
def get_similar_chunks(embedding, limit=3):
    with driver.session() as session:
        # Create a Unit of work to run a read statement in a read transaction
        def query_index(tx, embedding, limit):
            res = tx.run("""
                CALL db.index.vector.queryNodes('chatbot-embeddings', $limit, $embedding) 
                YIELD node, score
                MATCH (node)<-[:HAS_SECTION]-(p)
                RETURN 
                    p.title AS pageTitle, 
                    p.url AS pageUrl, 
                    node.title AS sectionTitle, 
                    node.url AS sectionUrl, 
                    node.text AS sectionText, 
                    score
                ORDER BY score DESC 
            """, embedding=embedding, limit=limit)

            # Convert each record to a dict and return them as a list
            return [dict(record) for record in res]

        return session.execute_read(query_index, embedding=embedding, limit=limit)
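
Conceptually, the index lookup ranks stored embeddings by their similarity to the query embedding and returns the closest `limit` matches. The brute-force sketch below illustrates that idea in plain Python, assuming cosine similarity (the notebook does not show how the index was configured). A real vector index uses an approximate nearest-neighbour search instead of scanning every vector, and the names `cosine_similarity` and `top_k` are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=3):
    """Return the k (title, score) pairs most similar to `query`, best first."""
    scored = [(title, cosine_similarity(query, vector)) for title, vector in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors; the embeddings in this notebook have 1536 dimensions
chunks = {
    "What is Cypher?": [0.9, 0.1, 0.0],
    "Optimized graph engine": [0.5, 0.5, 0.0],
    "Lesson Summary": [0.0, 0.2, 0.9],
}

for title, score in top_k([1.0, 0.0, 0.0], chunks, k=2):
    print(f"{title}: {score:.3f}")
```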

Which sections in the database are most similar to the embedding?

In [50]:
similar = get_similar_chunks(embedding, limit=10)

pd.DataFrame(similar)

| | pageTitle | pageUrl | sectionTitle | sectionUrl | sectionText | score |
|---|---|---|---|---|---|---|
| 0 | Introduction to Cypher | https://graphacademy.neo4j.com/courses/cypher-... | What is Cypher? | https://graphacademy.neo4j.com/courses/cypher-... | What is Cypher? - Cypher is a query language d... | 0.941263 |
| 1 | | https://neo4j.com/docs/apoc/current/cypher-exe... | | https://neo4j.com/docs/apoc/current/cypher-exe... | Cypher can be used as a safe, graph-aware, par... | 0.937309 |
| 2 | Benefits of Neo4j | https://graphacademy.neo4j.com/courses/adminis... | Optimized graph engine | https://graphacademy.neo4j.com/courses/adminis... | Optimized graph engine - \n\n\n\n\nThe Neo4j g... | 0.92191 |
| 3 | Functions to Transform Element Types | https://graphacademy.neo4j.com/courses/cypher-... | Summary | https://graphacademy.neo4j.com/courses/cypher-... | Summary - In this lesson, you about the Cypher... | 0.914596 |
| 4 | The Neo4j Type System | https://graphacademy.neo4j.com/courses/app-jav... | Lesson Summary | https://graphacademy.neo4j.com/courses/app-jav... | Lesson Summary - In this lesson you have learn... | 0.912985 |
| 5 | The Neo4j Type System | https://graphacademy.neo4j.com/courses/app-dot... | Lesson Summary | https://graphacademy.neo4j.com/courses/app-dot... | Lesson Summary - In this lesson you have learn... | 0.912952 |
| 6 | Processing Results | https://graphacademy.neo4j.com/courses/app-jav... | Lesson Summary | https://graphacademy.neo4j.com/courses/app-jav... | Lesson Summary - You now have all the informat... | 0.912521 |
| 7 | Processing Results | https://graphacademy.neo4j.com/courses/app-dot... | Lesson Summary | https://graphacademy.neo4j.com/courses/app-dot... | Lesson Summary - You now have all the informat... | 0.912521 |
| 8 | Creating a Driver Instance | https://graphacademy.neo4j.com/courses/app-go/... | Lesson Summary | https://graphacademy.neo4j.com/courses/app-go/... | Lesson Summary - In this challenge, you used y... | 0.912365 |
| 9 | The @cypher Directive | https://graphacademy.neo4j.com/courses/graphql... | Summary | https://graphacademy.neo4j.com/courses/graphql... | Summary - In this lesson, you learned how to u... | 0.912257 |


Define a function that embeds a question, retrieves the most similar sections, and passes them to the OpenAI chat completion API as context.

In [61]:
def ask_question(question):
    # Create an embedding for the question
    embedding = get_embedding(question)

    # Get content similar to the embedding
    # (get_similar_chunks opens its own session, so none is needed here)
    context = get_similar_chunks(embedding)

    # Define the LLM's role with a system message and supply the context
    messages = [
      {
        "role": "system",
        "content": """
          You are a chatbot teaching users how to use Neo4j GraphAcademy.
          Attempt to answer the user's question with the context provided.
          Respond in a short, but friendly way.
          Use your knowledge to fill in any gaps.
          If you cannot answer the question, ask for more clarification.

          Provide a code sample if possible.
          Also include a link to the sectionUrl.
        """
      },
      {
        "role": "assistant",
        "content": """
          Your Context:
          {}
        """.format(json.dumps(context))
      },
    ]

    # Append a message detailing what the user has asked
    messages.append(
      {
        "role": "user",
        "content": """
          Answer the user's question, wrapped in three backticks:

          ```
          {}
          ```
        """.format(question)
      }
    )

    # Send the messages to OpenAI (ChatCompletion is the pre-1.0 openai interface)
    chat_completion = openai.ChatCompletion.create(model=openai_model, messages=messages)

    return context, chat_completion.choices[0].message.content
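
Because the prompt is assembled inside `ask_question`, it can only be inspected by calling the OpenAI API. One option is to move the message construction into a pure function that can be checked offline. The sketch below uses the hypothetical name `build_messages`, a shortened system prompt, and a placeholder URL. For brevity it also omits the backtick wrapping around the question.

```python
import json


def build_messages(question, context, system_prompt):
    """Assemble the chat messages in the same order as ask_question."""
    return [
        {"role": "system", "content": system_prompt},
        # Retrieved sections are serialised to JSON, as in the notebook
        {"role": "assistant", "content": "Your Context:\n" + json.dumps(context)},
        {"role": "user", "content": "Answer the user's question:\n" + question},
    ]


# Placeholder context row; real rows come from get_similar_chunks
example_context = [
    {"sectionTitle": "What is Cypher?", "sectionUrl": "https://example.com/cypher", "score": 0.94}
]

messages = build_messages("What is Cypher?", example_context, "You are a helpful chatbot.")
print([message["role"] for message in messages])  # ['system', 'assistant', 'user']
```

The returned list can then be passed as the `messages` argument of the chat completion call.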

## Ask Questions

Can the LLM answer some questions?

In [62]:
context, answer = ask_question("What is the Cypher MATCH clause?")

answer

'The MATCH clause in Cypher is used to define the pattern in the graph that you would like to search for. It is used to retrieve data from the graph. You can think of it as similar to the FROM clause in an SQL statement. Here is an example of the MATCH clause in Cypher:\n\n```\nMATCH (p:Person)\nRETURN p\n```\n\nThis query matches all nodes in the graph with the Person label and returns them. You can learn more about the MATCH clause in the Introduction to Cypher section of the GraphAcademy: [Introduction to Cypher](https://graphacademy.neo4j.com/courses/cypher-fundamentals/1-reading/1-intro-cypher/)'

What pages were passed as context to the LLM in order to answer this question?

In [63]:
pd.DataFrame(context)

| | pageTitle | pageUrl | sectionTitle | sectionUrl | sectionText | score |
|---|---|---|---|---|---|---|
| 0 | Introduction to Cypher | https://graphacademy.neo4j.com/courses/cypher-... | 1. Read data | https://graphacademy.neo4j.com/courses/cypher-... | 1. Read data - Which Cypher clause do you use ... | 0.921759 |
| 1 | Introduction to Cypher | https://graphacademy.neo4j.com/courses/cypher-... | What is Cypher? | https://graphacademy.neo4j.com/courses/cypher-... | What is Cypher? - Cypher is a query language d... | 0.920465 |
| 2 | Introduction to Cypher | https://graphacademy.neo4j.com/courses/cypher-... | 2. Filtering | https://graphacademy.neo4j.com/courses/cypher-... | 2. Filtering - What Cypher keyword can you use... | 0.914499 |


In [65]:
_, answer = ask_question("How can I get a license for Neo4j Bloom?")

answer

Failed to read from defunct connection ResolvedIPv4Address(('3.224.243.10', 7687)) (ResolvedIPv4Address(('3.224.243.10', 7687)))
Failed to read from defunct connection ResolvedIPv4Address(('54.205.140.194', 7687)) (ResolvedIPv4Address(('54.205.140.194', 7687)))
Failed to read from defunct connection ResolvedIPv4Address(('34.237.189.213', 7687)) (ResolvedIPv4Address(('34.237.189.213', 7687)))
Failed to read from defunct connection IPv4Address(('p-41411cfe-6cb0af18-4.production-orch-0359.neo4j.io', 7687)) (ResolvedIPv4Address(('3.224.243.10', 7687)))
Transaction failed and will be retried in 0.9492277910762981s (Failed to read from defunct connection IPv4Address(('p-41411cfe-6cb0af18-4.production-orch-0359.neo4j.io', 7687)) (ResolvedIPv4Address(('3.224.243.10', 7687))))
Failed to read from defunct connection IPv4Address(('p-41411cfe-6cb0af18-5.production-orch-0359.neo4j.io', 7687)) (ResolvedIPv4Address(('52.21.223.250', 7687)))
Transaction failed and will be retried in 2.394296786323612s

'To obtain a license for Neo4j Bloom, you can reach out to the Neo4j sales team through their website: [Neo4j Sales](https://neo4j.com/lp/sales-inquiry/) They will be able to provide you with all the necessary information and assistance in obtaining a license for Neo4j Bloom.'

In [66]:
_, answer = ask_question("Write a query to find the person who directed Toy Story")

answer

"To find the person who directed Toy Story, you can use the following Cypher query:\n\n```\nMATCH (p:Person)-[:DIRECTED]->(:Movie {title: 'Toy Story'})\nRETURN p.name AS Director\n```\n\nThis query will return the name of the director of Toy Story."

## Comments, Questions, Feedback

For any comments, questions, or feedback, contact [graphacademy@neo4j.com](mailto:graphacademy@neo4j.com).