# Movie QA System with Graph Database

- Author: [Heesun Moon](https://github.com/MoonHeesun)
- Design: 
- Peer Review: 
- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/19-Cookbook/03-GraphDB/04-MovieQASystem.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/19-Cookbook/03-GraphDB/04-MovieQASystem.ipynb)

## Overview

This tutorial covers the implementation of **a movie QA system using a graph database** with `Neo4j`.

It explains how to store data in a graph database and how to implement a `text2Cypher` feature with an LLM. Natural language questions are converted into Cypher queries, and the answers retrieved from the database are returned in natural language.

![basic-workflow](./assets/04-movie-qa-system-basic-workflow.png)

>### ⚠️Security Note
>Building Q&A systems with graph databases involves executing model-generated queries, which carries inherent risks. To minimize these risks, restrict database permissions to the narrowest scope required for your chain or agent. While this reduces vulnerabilities, it does not eliminate them entirely. For more security best practices, see the [LangChain security documentation](https://python.langchain.com/docs/security/).
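
As an extra application-level safeguard, you might screen generated queries before they run. This does not replace restricted database permissions. The following is a minimal sketch that assumes a plain keyword check is acceptable for your use case. The helper `looks_read_only` and its keyword list are hypothetical and are not part of LangChain. The check is deliberately conservative, so it can reject some harmless queries. It also cannot detect writes performed through procedure calls.

```python
import re

# Hypothetical keyword list: Cypher clauses that write data or change the schema.
# It is not exhaustive; for example, procedures invoked with CALL can also write.
WRITE_KEYWORDS = ("CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP", "LOAD")

# Single- or double-quoted string literals (backslash escapes allowed).
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
# Any write keyword as a whole word, ignoring case.
WRITE_PATTERN = re.compile(r"\b(?:" + "|".join(WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def looks_read_only(cypher: str) -> bool:
    """Return True if no write keyword appears outside string literals."""
    # Blank out literals so that a title such as 'Set It Up' is not mistaken for SET.
    without_literals = STRING_LITERAL.sub("''", cypher)
    return WRITE_PATTERN.search(without_literals) is None


print(looks_read_only('MATCH (m:Movie {title: "Casino"}) RETURN m.title'))  # True
print(looks_read_only("MATCH (n) DETACH DELETE n"))  # False
```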

### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Connect to Neo4j Graph Database](#connect-to-neo4j-graph-database)
- [Graph Schema](#graph-schema)
- [GraphQACypherChain](#graphqacypherchain)

### References

- [Neo4j](https://neo4j.com/)
- [LangChain: Build a Question Answering application over a Graph Database](https://python.langchain.com/docs/tutorials/graph/#graphqacypherchain)
----

## Environment Setup

Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.

**[Note]**
- `langchain-opentutorial` is a package that provides easy-to-use environment setup, along with useful functions and utilities for tutorials.
- You can check out the [`langchain-opentutorial`](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.

In [None]:
%%capture --no-stderr
%pip install langchain-opentutorial

In [None]:
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langchain",
        "langchain_neo4j",
        "langchain_openai",
    ],
    verbose=False,
    upgrade=False,
)

In [None]:
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "NEO4J_URI": "",
        "NEO4J_USERNAME": "",
        "NEO4J_PASSWORD": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "MovieQASystem",
    }
)

You can alternatively set API keys such as `OPENAI_API_KEY` in a `.env` file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.

In [2]:
# Load API keys from .env file
from dotenv import load_dotenv

load_dotenv(override=True)

True

## Connect to Neo4j Graph Database

First, install the Neo4j graph database. This tutorial is based on `Neo4j Desktop`.

- [Installation guide](https://neo4j.com/docs/operations-manual/current/installation/)

[Note] You can also set up a free, cloud-based Neo4j instance using [Neo4j Sandbox](https://neo4j.com/sandbox/), an online platform for working with graph databases.

After installing, open Neo4j Desktop. A default **Example Project** should be available. We'll use this project for this tutorial.

### Activate Database

Click `Start` to activate the `Movie DBMS`.

![setup-01](./assets/04-movie-qa-system-setup-01.png)

### Setup APOC Plugin

- Open the `Movie DBMS` project.
- Go to the `Plugins` section.
- Select `APOC` and click `Install and Restart`.

![setup-02](./assets/04-movie-qa-system-setup-02.png)

To allow external network connections, you need to update the `neo4j.conf` file.

Open the terminal in `Neo4j Desktop`.

![setup-03](./assets/04-movie-qa-system-setup-03.png)

Navigate to the configuration directory and edit `neo4j.conf`.

In [None]:
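# Change to the configuration directory
cd conf
# Run only the editor command that matches your operating system
# Windows: open the file in Notepad
notepad neo4j.conf
# Linux/Mac: open the file in nano
nano neo4j.conf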

Add or modify the following line in `neo4j.conf`. This setting makes the server listen on all network interfaces, so use it only on a network you trust.

- `server.default_listen_address=0.0.0.0`
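
In many installations this line already exists but is commented out with a leading `#`. In that case it has no effect until the `#` is removed. The following minimal sketch checks which value is actually active. It assumes simple `key=value` lines and `#` comments. The helper `effective_setting` is hypothetical. You would call it with the text of your own file, for example `effective_setting(Path("neo4j.conf").read_text(), "server.default_listen_address")`.

```python
from typing import Optional


def effective_setting(conf_text: str, key: str) -> Optional[str]:
    """Return the value of the first uncommented `key=value` line for `key`, or None."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments (lines starting with '#'), and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip()
    return None


sample = "#server.default_listen_address=0.0.0.0\n"
print(effective_setting(sample, "server.default_listen_address"))  # None
```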

### Define Neo4j Credentials

Next, define your Neo4j credentials. If you haven't already set them in the previous steps, you can set them with Python's `os` module.

>The default user account information:
>
>- Default username: `neo4j`
>- Default password: `neo4j`
>
>You are required to change the password upon your first login.

In [3]:
import os

os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "movie1234"  # Replace with the password you set on first login
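
Before you create the graph connection, it can help to confirm that every required variable is set. The following is a minimal sketch. The list of required names is an assumption based on the variables this tutorial uses, and `missing_env_vars` is a hypothetical helper.

```python
import os
from typing import Iterable, List, Mapping

# Variables this tutorial relies on (assumed list; adjust for your setup).
REQUIRED_VARS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```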

The following example connects to the Neo4j database and populates it with sample data about movies, including their directors, actors, and genres.

In [4]:
from langchain_neo4j import Neo4jGraph

graph = Neo4jGraph()

# Import movie information

movies_query = """
LOAD CSV WITH HEADERS FROM 
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') | 
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') | 
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') | 
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""

graph.query(movies_query)

[]
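
For each CSV row, the import query splits the pipe-separated `director`, `actors`, and `genres` fields, trims each value, and connects the resulting node to the movie. The following plain-Python sketch reproduces only that relationship logic so you can see which edges one row produces. It ignores node properties such as `released`. The function `row_to_graph` and the sample row are hypothetical illustrations. In the tutorial, Neo4j does the actual work, and `MERGE` prevents duplicate nodes and relationships.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source node, relationship type, target node)


def row_to_graph(row: Dict[str, str]) -> List[Triple]:
    """Mirror the FOREACH/MERGE relationship logic of the import query for one row."""
    movie = f"Movie:{row['movieId']}"
    triples: List[Triple] = []
    for director in row["director"].split("|"):
        triples.append((f"Person:{director.strip()}", "DIRECTED", movie))
    for actor in row["actors"].split("|"):
        triples.append((f"Person:{actor.strip()}", "ACTED_IN", movie))
    for genre in row["genres"].split("|"):
        # Genre edges point from the movie to the genre: (m)-[:IN_GENRE]->(g).
        triples.append((movie, "IN_GENRE", f"Genre:{genre.strip()}"))
    # Like MERGE, keep each relationship only once (first occurrence wins).
    return list(dict.fromkeys(triples))


# Hypothetical sample row with the same column names as the CSV.
sample_row = {"movieId": "1", "director": "Director A", "actors": "Actor B | Actor C", "genres": "Drama|Crime"}
for triple in row_to_graph(sample_row):
    print(triple)
```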

## Graph Schema

LLMs use the graph schema to convert natural language questions into Cypher queries. The generated query is executed against the database, and the results are passed back to the LLM, which uses them to answer the question.
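
Conceptually, the schema goes into the prompt alongside the question. This lets the LLM restrict itself to node labels, relationship types, and properties that actually exist. The following minimal sketch builds such a prompt. The template wording and `build_text2cypher_prompt` are hypothetical examples, not the prompt that LangChain uses internally. In this notebook, you would pass `graph.schema` as the schema string.

```python
TEXT2CYPHER_TEMPLATE = """Task: Generate a Cypher statement to query a graph database.
Use only the node labels, relationship types, and properties in the schema below.
Return only the Cypher statement, without explanations.

Schema:
{schema}

Question:
{question}"""


def build_text2cypher_prompt(schema: str, question: str) -> str:
    """Fill the hypothetical template with the graph schema and the user question."""
    return TEXT2CYPHER_TEMPLATE.format(schema=schema.strip(), question=question.strip())


schema = "Node properties:\nMovie {title: STRING}\nThe relationships:\n(:Person)-[:ACTED_IN]->(:Movie)"
print(build_text2cypher_prompt(schema, "What was the cast of Casino?"))
```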

In [5]:
# Update schema
graph.refresh_schema()
# Print the current schema
print(graph.schema)

Node properties:
Movie {tagline: STRING, title: STRING, released: INTEGER, id: STRING, imdbRating: FLOAT}
Person {born: INTEGER, name: STRING}
Genre {name: STRING}
Relationship properties:
ACTED_IN {roles: LIST}
REVIEWED {summary: STRING, rating: INTEGER}
The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:PRODUCED]->(:Movie)
(:Person)-[:WROTE]->(:Movie)
(:Person)-[:FOLLOWS]->(:Person)
(:Person)-[:REVIEWED]->(:Movie)


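The schema above also lists properties and relationships that the CSV import does not create, such as `tagline`, `born`, `REVIEWED`, and `FOLLOWS`. These appear to come from the example movie data already stored in the `Movie DBMS`.

Next, let's check the enhanced schema. The enhanced schema provides a detailed overview of the graph structure, including node labels, relationship types, and property details such as data types and value ranges. This extra detail is particularly useful when the LLM has to generate more complex queries.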

To enable Enhanced Schema, set the parameter `enhanced_schema=True` when initializing the graph object.

Key Features: 

- `Property Types` : Shows the data type (e.g., STRING, INTEGER, FLOAT) of each property.
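- `Value Ranges` : Displays minimum and maximum values for numeric fields, and minimum and maximum sizes for list fields.
- `Examples` : Provides a sample value, or lists the available options for properties with only a few distinct values.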
- `Relationships` : Lists connection types between nodes and their property details.
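
The kind of per-property summary described above can be sketched in plain Python. This sketch is only an illustration. The helper `summarize_property` is hypothetical, and the cut-off of 10 distinct values for listing string options is an assumption, not LangChain's actual implementation. The sketch also assumes that all sampled values of a property share one type.

```python
from typing import Any, Dict, List


def summarize_property(values: List[Any], option_limit: int = 10) -> Dict[str, Any]:
    """Summarize sampled values of one property in the style of an enhanced schema."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    first = present[0]  # assumes all values share the type of the first one
    if isinstance(first, (int, float)) and not isinstance(first, bool):
        # Numeric fields: report the value range.
        return {"min": min(present), "max": max(present)}
    if isinstance(first, list):
        # List fields: report the range of list sizes.
        sizes = [len(v) for v in present]
        return {"min_size": min(sizes), "max_size": max(sizes)}
    distinct = sorted(set(map(str, present)))
    if len(distinct) <= option_limit:
        # Few distinct strings: list them all as available options.
        return {"available_options": distinct}
    # Many distinct strings: show a single example value.
    return {"example": str(first)}


print(summarize_property([2.4, 9.3, 8.1]))  # {'min': 2.4, 'max': 9.3}
```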

In [6]:
enhanced_graph = Neo4jGraph(enhanced_schema=True)
print(enhanced_graph.schema)



Node properties:
- **Movie**
  - `tagline`: STRING Example: "Welcome to the Real World"
  - `title`: STRING Example: "The Matrix"
  - `released`: INTEGER Min: 1964-12-16, Max: 2012
  - `id`: STRING Example: "1"
  - `imdbRating`: FLOAT Min: 2.4, Max: 9.3
- **Person**
  - `born`: INTEGER Min: 1929, Max: 1996
  - `name`: STRING Example: "Keanu Reeves"
- **Genre**
  - `name`: STRING Example: "Adventure"
Relationship properties:
- **ACTED_IN**
  - `roles`: LIST Min Size: 1, Max Size: 6
- **REVIEWED**
  - `summary`: STRING Available options: ['Pretty funny at times', 'Silly, but fun', 'Slapstick redeemed only by the Robin Williams and ', 'Dark, but compelling', 'An amazing journey', 'A solid romp', 'The coolest football movie ever', 'Fun, but a little far fetched']
  - `rating`: INTEGER Min: 45, Max: 100
The relationships:
(:Movie)-[:IN_GENRE]->(:Genre)
(:Person)-[:ACTED_IN]->(:Movie)
(:Person)-[:DIRECTED]->(:Movie)
(:Person)-[:PRODUCED]->(:Movie)
(:Person)-[:WROTE]->(:Movie)
(:Person)-[:FOL

## GraphQACypherChain

The `GraphCypherQAChain` combines an LLM with Neo4j so that you can query graph data in natural language. It converts a question into a Cypher query, executes that query against the Neo4j database, and uses the results to generate an answer.

Key Features:

- `Text-to-Cypher Conversion` : Automatically translates natural language questions into Cypher queries.
- `Query Execution` : Executes the generated queries against a Neo4j graph database.
- `Natural Language Answers` : Processes and formats query results into human-readable answers.
- `LLM Integration` : Leverages Large Language Models (LLMs), like OpenAI’s GPT, for query generation and result interpretation.
- `Enhanced Schema Support` : Can work with enhanced schema information for more accurate query generation.

![chain-flow](./assets/04-movie-qa-system-chain-flow.png)
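
The flow in the diagram breaks down into three steps: generate Cypher from the question and schema, execute the query, and turn the returned rows into an answer. The following sketch wires these steps together with plain callables. `answer_question` and the stub functions are hypothetical stand-ins for the LLM and the Neo4j driver, so you can run the control flow without either. The `top_k` cut-off loosely mirrors the idea of limiting how many rows reach the LLM.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], Rows],
    generate_answer: Callable[[str, Rows], str],
    top_k: int = 10,
) -> Dict[str, Any]:
    """Text-to-Cypher round trip: question -> Cypher -> rows -> natural-language answer."""
    cypher = generate_cypher(question, schema)  # step 1: the LLM writes the query
    rows = run_query(cypher)[:top_k]  # step 2: the database executes it (rows truncated)
    answer = generate_answer(question, rows)  # step 3: the LLM phrases the result
    return {"query": question, "cypher": cypher, "context": rows, "result": answer}


# Hypothetical stubs standing in for the LLM and the database.
def fake_generate_cypher(question: str, schema: str) -> str:
    return 'MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"}) RETURN p.name AS cast'


def fake_run_query(cypher: str) -> Rows:
    return [{"cast": "Robert De Niro"}, {"cast": "Joe Pesci"}]


def fake_generate_answer(question: str, rows: Rows) -> str:
    return ", ".join(row["cast"] for row in rows)


out = answer_question(
    "Who acted in Casino?", "<schema>", fake_generate_cypher, fake_run_query, fake_generate_answer
)
print(out["result"])  # Robert De Niro, Joe Pesci
```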

In [7]:
from langchain_neo4j import GraphCypherQAChain
from langchain_openai import ChatOpenAI

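llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = GraphCypherQAChain.from_llm(
    graph=enhanced_graph,
    llm=llm,
    verbose=True,  # print the generated Cypher and the retrieved context
    # Explicit opt-in required because the chain executes LLM-generated queries
    allow_dangerous_requests=True,
)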
response = chain.invoke({"query": "What was the cast of the Casino?"})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (p:Person)-[:ACTED_IN]->(m:Movie {title: "Casino"})
RETURN p.name AS cast
Full Context:
[{'cast': 'Robert De Niro'}, {'cast': 'Joe Pesci'}, {'cast': 'Sharon Stone'}, {'cast': 'James Woods'}]

> Finished chain.


{'query': 'What was the cast of the Casino?',
 'result': 'Robert De Niro, Joe Pesci, Sharon Stone, and James Woods were the cast of Casino.'}
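
In the log above, the generated query begins with a line reading `cypher`. This appears to be the language tag of a Markdown code fence in the LLM response that survived extraction. Neo4j still executed the query in this run. If you post-process generated queries yourself, you can use a sketch like the following to remove a multi-line fence and its tag. `strip_code_fence` is a hypothetical helper, not a LangChain function.

```python
import re

# A multi-line Markdown code fence with an optional language tag, e.g. ```cypher ... ```
FENCE = re.compile(r"```[ \t]*([A-Za-z0-9_-]*)[ \t]*\n(.*?)```", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Return the body of the first fenced block in `text`, or `text` itself if none."""
    match = FENCE.search(text)
    return match.group(2).strip() if match else text.strip()


llm_output = "```cypher\nMATCH (m:Movie) RETURN m.title LIMIT 1\n```"
print(strip_code_fence(llm_output))  # MATCH (m:Movie) RETURN m.title LIMIT 1
```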

In [8]:
response = chain.invoke({"query": "Recommend the popular romance movie."})
response



> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre {name: "Romance"})
RETURN m.title, m.imdbRating
ORDER BY m.imdbRating DESC
LIMIT 1
Full Context:
[{'m.title': 'Chungking Express (Chung Hing sam lam)', 'm.imdbRating': 8.1}]

> Finished chain.


{'query': 'Recommend the popular romance movie.',
 'result': 'Chungking Express (Chung Hing sam lam) with an IMDb rating of 8.1 is a popular romance movie.'}