<a href="https://colab.research.google.com/github/hungryjins/Graph_DB/blob/main/GraphRAG_chatbot(Gradio).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 0. Setup

In [1]:
!pip install neo4j-graphrag neo4j openai

Collecting neo4j-graphrag
  Downloading neo4j_graphrag-1.8.0-py3-none-any.whl.metadata (18 kB)
Collecting neo4j
  Downloading neo4j-5.28.1-py3-none-any.whl.metadata (5.9 kB)
Collecting fsspec<2025.0.0,>=2024.9.0 (from neo4j-graphrag)
  Downloading fsspec-2024.12.0-py3-none-any.whl.metadata (11 kB)
Collecting json-repair<0.40.0,>=0.39.1 (from neo4j-graphrag)
  Downloading json_repair-0.39.1-py3-none-any.whl.metadata (11 kB)
Collecting pypdf<6.0.0,>=5.1.0 (from neo4j-graphrag)
  Downloading pypdf-5.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting types-pyyaml<7.0.0.0,>=6.0.12.20240917 (from neo4j-graphrag)
  Downloading types_pyyaml-6.0.12.20250516-py3-none-any.whl.metadata (1.8 kB)
Downloading neo4j_graphrag-1.8.0-py3-none-any.whl (189 kB)
Downloading neo4j-5.28.1-py3-none-any.whl (312 kB)

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"  # placeholder: replace with your own API key
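
The cell above stores a placeholder string rather than a real key. A minimal sketch of a guard that fails early when the key is missing or still the placeholder; `require_api_key` is a hypothetical helper, not part of any library:

```python
import os

PLACEHOLDER = "OPENAI_API_KEY"  # the literal placeholder value used in the cell above

def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key from `env`, or raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(f"Set {name} to a real key before calling the OpenAI API")
    return key
```

Calling `require_api_key()` before creating the LLM client surfaces a configuration mistake before any request is sent.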

## 1. GraphDB

Neo4j Sandbox : https://sandbox.neo4j.com/

### 1-1. Neo4j driver setting

In [8]:
from neo4j import GraphDatabase, basic_auth
import openai

driver = GraphDatabase.driver(
  "neo4j://18.212.176.170:7687",
  auth=basic_auth("neo4j", "menus-classifications-smell"))

In [9]:
cypher_query = '''
MATCH (m:Movie {title:$movie})<-[:RATED]-(u:User)-[:RATED]->(rec:Movie)
RETURN distinct rec.title AS recommendation LIMIT 20
'''

with driver.session(database="neo4j") as session:
  # execute_read replaces read_transaction, which is deprecated in the 5.x driver
  results = session.execute_read(
    lambda tx: tx.run(cypher_query,
                      movie="Crimson Tide").data())
  for record in results:
    print(record['recommendation'])

#driver.close()

Mr. Holland's Opus
Apollo 13
Dead Man Walking
Seven (a.k.a. Se7en)
Heat
Get Shorty
Fugitive, The
Dave
Addams Family Values
True Lies
Speed
Lion King, The
Four Weddings and a Funeral
Forrest Gump
Star Trek: Generations
Shawshank Redemption, The
Stargate
Pulp Fiction
Outbreak
Miracle on 34th Street
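
The Cypher pattern above finds movies rated by any user who also rated the seed movie. A minimal in-memory sketch of the same idea, using hypothetical toy data instead of the graph; the result is sorted here only to make the output deterministic, whereas the Cypher query has no `ORDER BY`:

```python
# Toy ratings: user -> set of movie titles they rated (hypothetical data)
ratings = {
    "u1": {"Crimson Tide", "Apollo 13", "Heat"},
    "u2": {"Crimson Tide", "Speed"},
    "u3": {"Toy Story", "Heat"},
}

def co_rated(ratings, movie, limit=20):
    """Movies rated by any user who also rated `movie`, excluding `movie` itself."""
    recs = set()
    for rated in ratings.values():
        if movie in rated:
            recs |= rated - {movie}
    return sorted(recs)[:limit]

print(co_rated(ratings, "Crimson Tide"))  # ['Apollo 13', 'Heat', 'Speed']
```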



## 2. GRAPH RAG

### RAG over graph query results generated by the Text2Cypher Retriever

In [3]:
from neo4j_graphrag.retrievers import Text2CypherRetriever
from neo4j_graphrag.llm import OpenAILLM

# This LLM generates the Cypher query from the query text and, after retrieval, the final answer
llm = OpenAILLM(model_name="gpt-4o", model_params={"temperature": 0})

### 1) Text2Cypher Retriever

The retriever needs two pieces of information to generate Cypher automatically:
- The Neo4j DB schema
- Example pairs of user input and output (Cypher query)
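
To make the role of these two inputs concrete, here is a rough sketch of how a schema, a list of examples, and the user's question could be combined into one few-shot prompt. The function name and the prompt wording are assumptions for illustration, not the library's actual template:

```python
def build_text2cypher_prompt(schema: str, examples: list[str], query_text: str) -> str:
    """Assemble a few-shot prompt; the wording is illustrative, not the library's template."""
    example_block = "\n".join(examples) if examples else "(none)"
    return (
        "Task: Generate a Cypher statement for querying a Neo4j graph database.\n\n"
        f"Schema:\n{schema}\n\n"
        f"Examples:\n{example_block}\n\n"
        f"Input:\n{query_text}\n\n"
        "Cypher query:"
    )

prompt = build_text2cypher_prompt(
    schema="Movie {title: STRING}",
    examples=["USER INPUT: 'List all movies' QUERY: MATCH (m:Movie) RETURN m.title"],
    query_text="Which movies are in the database?",
)
print(prompt)
```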

#### Neo4j DB Schema

```
Node properties:
Person {name: STRING, born: INTEGER}
Movie {tagline: STRING, title: STRING, released: INTEGER}
Relationship properties:
ACTED_IN {roles: LIST}
REVIEWED {summary: STRING, rating: INTEGER}
The relationships:
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:PRODUCED]->(:Movie)
(:Person)-[:WROTE]->(:Movie)
(:Person)-[:FOLLOWS]->(:Person)
(:Person)-[:REVIEWED]->(:Movie)
```

In [13]:
from neo4j import GraphDatabase, basic_auth
from neo4j.time import Date

def get_node_datatype(value):
    """
    Returns the data type of the given node value.
    """
    # bool must be checked before int because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "BOOLEAN"
    elif isinstance(value, str):
        return "STRING"
    elif isinstance(value, int):
        return "INTEGER"
    elif isinstance(value, float):
        return "FLOAT"
    elif isinstance(value, list):
        # If the list is non-empty, infer the type from its first element
        return f"LIST[{get_node_datatype(value[0])}]" if value else "LIST"
    elif isinstance(value, Date):
        return "DATE"
    else:
        return "UNKNOWN"

def get_schema(uri, user, password):
    """
    Connects to the graph database, extracts node and relationship properties,
    and returns a schema dictionary.
    """
    driver = GraphDatabase.driver(
        uri,
        auth=basic_auth(user, password)
    )

    with driver, driver.session() as session:  # the driver is closed when the block exits
        # Extract node labels, property keys, and sample values
        node_query = """
        MATCH (n)
        WITH DISTINCT labels(n) AS node_labels, keys(n) AS property_keys, n
        UNWIND node_labels AS label
        UNWIND property_keys AS key
        RETURN label, key, n[key] AS sample_value
        """
        nodes = session.run(node_query)

        # Extract relationship types, property keys, and sample values
        rel_query = """
        MATCH ()-[r]->()
        WITH DISTINCT type(r) AS rel_type, keys(r) AS property_keys, r
        UNWIND property_keys AS key
        RETURN rel_type, key, r[key] AS sample_value
        """
        relationships = session.run(rel_query)

        # Extract relationship patterns (start label, type, end label)
        rel_direction_query = """
        MATCH (a)-[r]->(b)
        RETURN DISTINCT labels(a) AS start_label, type(r) AS rel_type, labels(b) AS end_label
        ORDER BY start_label, rel_type, end_label
        """
        rel_directions = session.run(rel_direction_query)

        # Initialize schema structure
        schema = {"nodes": {}, "relationships": {}, "relations": []}

        # Populate node property types
        for record in nodes:
            label = record["label"]
            key = record["key"]
            sample_value = record["sample_value"]  # sample for type inference
            inferred_type = get_node_datatype(sample_value)
            schema["nodes"].setdefault(label, {})[key] = inferred_type

        # Populate relationship property types
        for record in relationships:
            rel_type = record["rel_type"]
            key = record["key"]
            sample_value = record["sample_value"]  # sample for type inference
            inferred_type = get_node_datatype(sample_value)
            schema["relationships"].setdefault(rel_type, {})[key] = inferred_type

        # Record relationship patterns
        for record in rel_directions:
            start_label = record["start_label"][0]
            rel_type = record["rel_type"]
            end_label = record["end_label"][0]
            schema["relations"].append(f"(:{start_label})-[:{rel_type}]->(:{end_label})")

        return schema

def format_schema(schema):
    """
    Formats the schema dictionary into a string representation suitable
    for providing to an LLM.
    """
    lines = []

    # Node properties section
    lines.append("Node properties:")
    for label, properties in schema["nodes"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        lines.append(f"{label} {{{props}}}")

    # Relationship properties section
    lines.append("Relationship properties:")
    for rel_type, properties in schema["relationships"].items():
        props = ", ".join(f"{k}: {v}" for k, v in properties.items())
        lines.append(f"{rel_type} {{{props}}}")

    # Relationship patterns section
    lines.append("The relationships:")
    for relation in schema["relations"]:
        lines.append(relation)

    return "\n".join(lines)
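
One subtlety for type-inference helpers such as `get_node_datatype`: in Python, `bool` is a subclass of `int`, so an `int` check placed before a `bool` check will classify `True` as an integer. A small standalone sketch (`infer_type` is a hypothetical, reduced version of the helper):

```python
def infer_type(value):
    """Map a Python value to a schema type name, checking bool before int."""
    if isinstance(value, bool):  # must come first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    return "UNKNOWN"

print(isinstance(True, int))                             # True
print(infer_type(True), infer_type(7), infer_type(7.5))  # BOOLEAN INTEGER FLOAT
```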


In [14]:
# Neo4j DB Schema
schema = get_schema("neo4j://18.212.176.170:7687", "neo4j", "menus-classifications-smell")
neo4j_schema = format_schema(schema)
print(neo4j_schema)

Node properties:
Movie {url: STRING, runtime: INTEGER, revenue: INTEGER, budget: INTEGER, plotEmbedding: LIST[FLOAT], imdbRating: FLOAT, released: STRING, countries: LIST[STRING], languages: LIST[STRING], plot: STRING, imdbVotes: INTEGER, imdbId: STRING, year: INTEGER, poster: STRING, movieId: STRING, tmdbId: STRING, title: STRING}
Genre {name: STRING}
User {userId: STRING, name: STRING}
Actor {bornIn: STRING, born: DATE, died: DATE, tmdbId: STRING, imdbId: STRING, name: STRING, url: STRING, bio: STRING, poster: STRING}
Person {bornIn: STRING, born: DATE, died: DATE, tmdbId: STRING, imdbId: STRING, name: STRING, url: STRING, bio: STRING, poster: STRING}
Director {url: STRING, bornIn: STRING, bio: STRING, died: DATE, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING}
Relationship properties:
RATED {rating: FLOAT, timestamp: INTEGER}
ACTED_IN {role: STRING}
DIRECTED {role: STRING}
The relationships:
(:Actor)-[:ACTED_IN]->(:Movie)
(:Actor)-[:DIRECTED]->(:Movie)
(:Ac

#### Retriever example

- User input: Which actors starred in Toy Story?

- Auto-generated Cypher example: `MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Toy Story' RETURN a.name`

Note: the examples below also include queries for the recommendation system.



In [4]:
# LLM INPUT / QUERY examples
examples = [
    "USER INPUT: 'Which actors appear in Toy Story?' QUERY: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) WHERE m.title = 'Toy Story' RETURN a.name",
    "USER INPUT: 'What is the average rating for Toy Story?' QUERY: MATCH (u:User)-[r:RATED]->(m:Movie) WHERE m.title = 'Toy Story' RETURN AVG(r.rating)",

    """USER INPUT: 'I love the Toy Story movies. People who enjoyed Toy Story, what other movies did they enjoy?'
    QUERY: MATCH (m:Movie)<-[r:RATED]-(u:User)-[recr:RATED]->(userBasedRec:Movie)
    WHERE m.title = 'Toy Story' AND r.rating >= 4 AND recr.rating >= 4
    WITH userBasedRec, COUNT(recr) AS recCount, AVG(recr.rating) AS avgRating
    ORDER BY avgRating DESC, recCount DESC
    RETURN DISTINCT userBasedRec.title, avgRating, recCount
    LIMIT 10
    """,

    """USER INPUT: 'I like movies like 'The Wizard of Oz'. Can you recommend some similar movies?',
    QUERY: MATCH (m:Movie) WHERE m.title = 'Wizard of Oz, The'
    MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
    WITH m, rec, count(*) AS gs

    OPTIONAL MATCH (m)<-[:ACTED_IN]-(a)-[:ACTED_IN]->(rec)
    WITH m, rec, gs, count(a) AS as

    OPTIONAL MATCH (m)<-[:DIRECTED]-(d)-[:DIRECTED]->(rec)
    WITH m, rec, gs, as, count(d) AS ds

    RETURN rec.title AS recommendation,
            rec.poster AS rec_poster,
            gs AS genre_similarity,
            as AS actor_similarity,
            ds AS director_similarity,
           (5*gs)+(3*as)+(4*ds) AS score
    ORDER BY score DESC LIMIT 10
    """,

    """USER INPUT: 'Please recommend a movie with a similar genre or atmosphere to the movie 'Inception'.'
    QUERY: MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
    WHERE m.title = 'Inception' WITH rec, collect(g.name) AS genres, count(*) AS commonGenres
    RETURN rec.title, genres, commonGenres ORDER BY commonGenres DESC LIMIT 10;"""
]

In [15]:
# Text2CypherRetriever
retriever = Text2CypherRetriever(
    driver=driver,
    llm=llm,  # type: ignore
    neo4j_schema=neo4j_schema,
    examples=examples,
)

# The LLM generates a Cypher query, the retriever runs it against the Neo4j DB, and the returned records are used as context for RAG.
query_text = "What movies has Tom Hanks been in?"
search_result = retriever.search(query_text=query_text)

In [None]:
search_result.items

[RetrieverResultItem(content="<Record m.title='Punchline'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Catch Me If You Can'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Dragnet'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Saving Mr. Banks'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Bachelor Party'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Volunteers'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Man with One Red Shoe, The'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Splash'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Big'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Nothing in Common'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Money Pit, The'>", metadata=None),
 RetrieverResultItem(content="<Record m.title='Toy Story of Terror'>", metadata=None),
 RetrieverResultItem(con
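
In the output above, each item's `content` is the string form of a record rather than a structured value. If plain titles are needed, one option is to parse that string. A minimal sketch that assumes single-field records shaped exactly like the ones shown; `record_value` is a hypothetical helper:

```python
import re

# Sample item contents copied from the output above
contents = [
    "<Record m.title='Punchline'>",
    "<Record m.title='Catch Me If You Can'>",
    "<Record m.title='Man with One Red Shoe, The'>",
]

# Capture the quoted value of a single key=value record (single or double quotes)
RECORD_VALUE = re.compile(r"^<Record [\w.]+=(['\"])(.*)\1>$")

def record_value(content: str):
    """Return the quoted value from a one-field record string, or None if it does not match."""
    match = RECORD_VALUE.match(content)
    return match.group(2) if match else None

titles = [record_value(c) for c in contents]
print(titles)
```

Parsing a string representation is brittle; it is shown only to illustrate what the items contain.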

In [16]:
query_text = "I love Titanic. Can you recommend some similar movies?"
search_result = retriever.search(query_text=query_text)

In [17]:
search_result.metadata['cypher']

"MATCH (m:Movie) WHERE m.title = 'Titanic'\nMATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)\nWITH m, rec, count(*) AS gs\n\nOPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)\nWITH m, rec, gs, count(a) AS as\n\nOPTIONAL MATCH (m)<-[:DIRECTED]-(d:Director)-[:DIRECTED]->(rec)\nWITH m, rec, gs, as, count(d) AS ds\n\nRETURN rec.title AS recommendation,\n       rec.poster AS rec_poster,\n       gs AS genre_similarity,\n       as AS actor_similarity,\n       ds AS director_similarity,\n       (5*gs)+(3*as)+(4*ds) AS score\nORDER BY score DESC LIMIT 10"

In [18]:
print(search_result.metadata['cypher'])

MATCH (m:Movie) WHERE m.title = 'Titanic'
MATCH (m)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
WITH m, rec, count(*) AS gs

OPTIONAL MATCH (m)<-[:ACTED_IN]-(a:Actor)-[:ACTED_IN]->(rec)
WITH m, rec, gs, count(a) AS as

OPTIONAL MATCH (m)<-[:DIRECTED]-(d:Director)-[:DIRECTED]->(rec)
WITH m, rec, gs, as, count(d) AS ds

RETURN rec.title AS recommendation,
       rec.poster AS rec_poster,
       gs AS genre_similarity,
       as AS actor_similarity,
       ds AS director_similarity,
       (5*gs)+(3*as)+(4*ds) AS score
ORDER BY score DESC LIMIT 10
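
The `RETURN` clause ranks candidates by a weighted sum of shared genres, actors, and directors: `(5*gs)+(3*as)+(4*ds)`. A small worked sketch of that arithmetic, with hypothetical overlap counts:

```python
def similarity_score(genres: int, actors: int, directors: int) -> int:
    """Weighted overlap score: (5*gs) + (3*as) + (4*ds), as in the RETURN clause."""
    return 5 * genres + 3 * actors + 4 * directors

# Hypothetical (genres, actors, directors) overlap counts for three candidates
candidates = {"A": (3, 0, 0), "B": (2, 1, 1), "C": (1, 2, 0)}
ranked = sorted(candidates, key=lambda t: similarity_score(*candidates[t]), reverse=True)
print([(t, similarity_score(*candidates[t])) for t in ranked])
# [('B', 17), ('A', 15), ('C', 11)]
```

With these weights, one shared director plus one shared actor (7 points) outweighs one additional shared genre (5 points), which is why candidate B ranks above A.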


### 2) Retriever-based RAG generation

In [19]:
from neo4j_graphrag.generation import GraphRAG
# Initialize RAG pipeline
rag = GraphRAG(retriever=retriever, llm=llm)

To see how the retrieved context is used to build the answer, refer to the GraphRAG source:
https://github.com/neo4j/neo4j-graphrag-python/blob/89411ca2c9ae7fdce63ee9678fe658b2e2ec30dd/src/neo4j_graphrag/generation/graphrag.py#L101
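
Conceptually, the generation step joins the contents of the retrieved items into a context string and places it in a prompt next to the question. The sketch below is a simplified illustration under that assumption: `Item` is a hypothetical stand-in for a retriever result item, and the template wording is not the library's own.

```python
from dataclasses import dataclass

@dataclass
class Item:
    """Stand-in for a retriever result item; only `content` is used here."""
    content: str

def build_rag_prompt(query_text: str, items: list[Item]) -> str:
    """Join item contents into a context block and wrap it in an answer prompt."""
    context = "\n".join(item.content for item in items)
    return (
        "Answer the user question using the following context.\n\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{query_text}\n\n"
        "Answer:"
    )

rag_prompt = build_rag_prompt(
    "What movies has Tom Hanks been in?",
    [Item("<Record m.title='Big'>"), Item("<Record m.title='Splash'>")],
)
print(rag_prompt)
```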

In [20]:
# Question
query_text = "Please recommend some movies of a similar genre to Titanic."

response = rag.search(query_text=query_text, return_context = True)
print("==== [Cypher automatically generated via Text2Cypher] ====")
print(response.retriever_result.metadata['cypher'])
print("\n==== [Generate final answer based on generated Cypher] ====")
print(response.answer)

==== [Cypher automatically generated via Text2Cypher] ====
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)<-[:IN_GENRE]-(rec:Movie)
WHERE m.title = 'Titanic'
WITH rec, collect(g.name) AS genres, count(*) AS commonGenres
RETURN rec.title, genres, commonGenres
ORDER BY commonGenres DESC
LIMIT 10;

==== [Generate final answer based on generated Cypher] ====
Here are some movies with similar genres to Titanic, which include Drama and Romance:

1. Dirty Mary Crazy Larry
2. Shiri (Swiri)
3. Absolute Giganten
4. Eight Below
5. Kingdom of Heaven
6. Helen of Troy
7. Robin Hood
8. Legend of the Red Dragon (a.k.a. New Legend of Shaolin, The)
9. Casanova
10. House of Flying Daggers (Shi mian mai fu)


In [22]:
# Question
query_text = "People who like Toy Story and The Godfather movies, what other movies do they like?"

response = rag.search(query_text=query_text, return_context = True)
print("==== [Cypher automatically generated via Text2Cypher] ====")
print(response.retriever_result.metadata['cypher'])
print("\n==== [Generate final answer based on generated Cypher] ====")
print(response.answer)

==== [Cypher automatically generated via Text2Cypher] ====
MATCH (m1:Movie)<-[r1:RATED]-(u:User)-[r2:RATED]->(rec:Movie), 
      (m2:Movie)<-[r3:RATED]-(u)
WHERE m1.title = 'Toy Story' AND m2.title = 'The Godfather' 
  AND r1.rating >= 4 AND r3.rating >= 4 AND r2.rating >= 4
WITH rec, COUNT(r2) AS recCount, AVG(r2.rating) AS avgRating
ORDER BY avgRating DESC, recCount DESC
RETURN DISTINCT rec.title, avgRating, recCount
LIMIT 10

==== [Generate final answer based on generated Cypher] ====
People who like both Toy Story and The Godfather movies often appreciate a diverse range of films that combine strong storytelling, memorable characters, and impactful themes. They might also enjoy movies such as:

1. **The Shawshank Redemption** - Known for its powerful narrative and character development.
2. **Pulp Fiction** - Offers a mix of humor, drama, and unique storytelling.
3. **The Lion King** - Another animated classic with emotional depth and universal themes.
4. **Forrest Gump** - Combines
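
Titles in the earlier query output carry a trailing article (for example `Fugitive, The` and `Lion King, The`), while the generated Cypher above matches `'The Godfather'`. If the dataset stores that title as `Godfather, The`, which is an assumption this notebook does not verify, the query would return no rows and the answer would not be grounded in the graph. A hypothetical helper that rewrites a title into the trailing-article form:

```python
ARTICLES = ("The", "A", "An")

def to_catalog_title(title: str) -> str:
    """Move a leading English article to the end: 'The Godfather' -> 'Godfather, The'."""
    for article in ARTICLES:
        prefix = article + " "
        if title.startswith(prefix):
            return f"{title[len(prefix):]}, {article}"
    return title

print(to_catalog_title("The Godfather"))  # Godfather, The
print(to_catalog_title("Toy Story"))      # Toy Story
```

Inspecting `response.retriever_result.items` after `rag.search(..., return_context=True)` shows whether any rows were actually retrieved.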

## 3. Deploying Chatbots with Gradio

In [23]:
!pip install gradio



In [24]:
import gradio as gr
from gradio.themes.base import Base

class Seafoam(Base):
    pass
seafoam = Seafoam()

with gr.Blocks(theme=seafoam) as demo: #'JohnSmith9982/small_and_pretty'
    def default_llm(message):
        prompt_text = f"""
        You are a chatbot with a movie recommendation system. Respond to user_input, but encourage the user to tell you what movies or genres they like.
        user_input : {message}
        """
        return llm.invoke(prompt_text).content

    def intent_detection(message):
        prompt_text = f"""
        If the given query_text appears to be a legitimate request for movie recommendations, return True; otherwise return False. Answer with the single word True or False.
        example : [("query_text": "Hi Nice to meet you", "answer": "False"), ("query_text": "I heard that you're so good at recommending movies?", "answer": "False"), ("query_text": "Can you recommend some movies in a similar genre to Titanic?", "answer": "True")]
        query_text : {message}
        """
        # Normalize the reply so that whitespace or casing differences do not break the check
        return llm.invoke(prompt_text).content.strip().lower() == 'true'

    def response(message, chat_history):
        #### INTENT DETECTION ####
        if intent_detection(message):
            #### ANSWER ####
            rag_result = rag.search(
                query_text=message
                + " (Please also provide evidence for how you used context in your answer.)",
                return_context=True,
            )
            chat_history.append((message, rag_result.answer))
            # Textbox outputs expect strings, so join the retrieved items into one string
            items_text = "\n".join(str(item.content) for item in rag_result.retriever_result.items)
            return chat_history, rag_result.retriever_result.metadata['cypher'], items_text
        else:
            llm_result = default_llm(message)
            chat_history.append((message, llm_result))
            # Return one value per output component (chatbot, generated_query, query_result)
            return chat_history, "It wasn't a movie-related question.", ""

    with gr.Row():
        with gr.Column(scale=4):
            gr.HTML("""<div style="text-align: center; max-width: 1000px; margin: 10px auto;">
                <div>
                    <h1>Graph RAG Chatbot !</h1>
                </div>
                <p style="margin-bottom: 10px; font-size: 95%">
                    💭 Answers are based on the movie review dataset in the graph DB. Please also check the DB query results used in the answer.
                </p>
            </div>""")


    with gr.Row():
        with gr.Column(scale = 1):
            generated_query = gr.Textbox(label="Generated Cypher query")
            query_result = gr.Textbox(label="Query lookup results")
        with gr.Column(scale = 3):
            chatbot = gr.Chatbot()
            msg = gr.Textbox(placeholder="What kind of movie would you like to receive a recommendation for? (It would be helpful if you could also mention the genre you are interested in or a movie you enjoyed.)", label="Input")
            gr.Examples(  # not assigned, to avoid shadowing the retriever's `examples` list
                examples=[
                    "I like movies like 'Net, The'. Can you recommend some similar movies?",
                    "Please recommend a movie with a similar genre or atmosphere to the movie 'Inception'."
                ],
                inputs=[msg],
            )
            with gr.Row():
                gr.HTML("""<div style="text-align: center; max-width: 500px; margin: 0px auto;">
                    <div>
                        <h1>  </h1>
                    </div>
                </div>""")
                gr.HTML("""<div style="text-align: center; max-width: 500px; margin: 0px auto;">
                    <div>
                        <h1>  </h1>
                    </div>
                </div>""")
                btn = gr.Button("Submit", variant="primary")
                clear = gr.Button("Clear")

    btn.click(fn=response, inputs=[msg, chatbot], outputs=[chatbot, generated_query, query_result])
    msg.submit(response, [msg, chatbot], [chatbot, generated_query, query_result])

    clear.click(lambda: None, None, msg, queue=False)

demo.launch(debug=True, share=True)


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://a2b0e3cef5d9231484.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://a2b0e3cef5d9231484.gradio.live
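
The `intent_detection` step depends on interpreting a free-text LLM reply as a boolean, and an exact string comparison can fail on replies such as `"True."` or `" true\n"`. A minimal sketch of a more tolerant parser; `parse_intent_reply` is a hypothetical helper, and it deliberately treats anything other than a bare "true" as `False`:

```python
def parse_intent_reply(reply: str) -> bool:
    """Interpret an LLM reply as a boolean, tolerating case, whitespace, quotes, and a trailing period."""
    normalized = reply.strip().strip("'\"`.").lower()
    return normalized == "true"

for reply in ["True", " true.\n", "'True'", "False", "True, because it asks for movies"]:
    print(repr(reply), "->", parse_intent_reply(reply))
```

Falling back to `False` on ambiguous replies routes the message to the general chat path instead of running a generated Cypher query.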


