# Semantic layer with a Graph Database

This notebook shows how to use LLMs in combination with [Neo4j](https://neo4j.com/), a graph database, to construct a semantic layer of various tools, which an LLM agent can use to interact with a graph database.

## Why use a semantic layer?

A semantic layer is a reliable way to let an LLM agent interact with a graph database.
Asking the model to generate Cypher or other database queries directly is often brittle.
Our method instead uses predefined Cypher templates and lets the LLM generate only the input parameters.
Because the query text itself is fixed, this approach is more robust and adapts well to a wide range of applications.
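
To make the idea concrete, here is a minimal sketch of a template plus LLM-supplied parameters. The template text, the `build_params` helper, and the `max_results` parameter are assumptions made for illustration and are not part of the tools built later in this notebook. The point is only that the LLM provides values while the Cypher text stays fixed.

```python
# Illustrative only: a fixed Cypher template whose text the LLM never edits.
MOVIE_LOOKUP_TEMPLATE = """
MATCH (m:Movie)
WHERE toLower(m.title) CONTAINS toLower($title)
RETURN m.title AS title
LIMIT $max_results
"""


def build_params(llm_arguments: dict) -> dict:
    """Validate LLM-produced arguments before they reach the database.

    `llm_arguments` stands in for the structured output of a function call.
    """
    title = str(llm_arguments.get("title", "")).strip()
    if not title:
        raise ValueError("The LLM did not provide a movie title")
    # Clamp the limit so a bad value cannot request an unbounded result set.
    max_results = min(max(int(llm_arguments.get("max_results", 5)), 1), 25)
    return {"title": title, "max_results": max_results}


params = build_params({"title": "  Matrix ", "max_results": 1000})
print(params)
```

A template and parameter pair like this would be sent to the database in the same way as the queries later in this notebook, with `graph.query(query, params=...)`.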

## Why Graph Databases?

Graph databases are ideal for data in which relationships are crucial, because they make connections and correlations easy to query.
Key strengths:
* Navigating complex hierarchies.
* Unveiling hidden connections between data points.
* Exploring relationships and correlations.
  
## Use Cases
Graph databases shine in scenarios like:

* Recommendation systems.
* Social networks.
* Fraud detection.
* Drug discovery.

In this notebook, we will build a **movie recommendation chatbot**, designed to both answer queries about movies and offer movie recommendations.

## Setup
We will start by installing and importing the relevant libraries.

Make sure you have your OpenAI account set up and you have your OpenAI API key handy.

In [None]:
# Optional: run to install the libraries locally if you haven't already
!pip3 install langchain
!pip3 install openai
!pip3 install neo4j

In [2]:
import os
import pandas as pd

In [None]:
# Optional: run to load environment variables from a .env file.
# This is not required if you have exported your env variables in another way or if you set them manually
!pip3 install python-dotenv
from dotenv import load_dotenv

load_dotenv()

# Set the OpenAI API key env variable manually
# os.environ["OPENAI_API_KEY"] = "<your_api_key>"
# print(os.environ["OPENAI_API_KEY"])

## Connecting to graph db

Make sure your Neo4j database has the [APOC plugin](https://neo4j.com/docs/apoc/current/) installed.

In [4]:
# DB credentials
url = "bolt://localhost:7687"
username = "neo4j"
password = "<your_password>"

In [5]:
from langchain.graphs import Neo4jGraph

graph = Neo4jGraph(url=url, username=username, password=password)

## Dataset
We will use a subset of the [MovieLens dataset](https://grouplens.org/datasets/movielens/), which contains 100k ratings of 9000 movies by 600 users.

We will then load this data into the graph database so that we can query it.

## Loading dataset

We will start by defining uniqueness constraints. Each constraint is backed by an index, which speeds up the `MERGE` lookups during import.

In [None]:
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (m:Movie) REQUIRE m.id IS UNIQUE;")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (u:User) REQUIRE u.id IS UNIQUE;")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE;")
graph.query("CREATE CONSTRAINT IF NOT EXISTS FOR (g:Genre) REQUIRE g.name IS UNIQUE;")

In [None]:
# Import movie information
movie_path = "data/movies/movies.csv"
movie_data = pd.read_csv(movie_path)
movies_query = """
UNWIND $data AS row
CALL {
    WITH row
    MERGE (m:Movie {id:row.movieId})
    SET m.released = row.released,
        m.title = row.title
    FOREACH (director in split(row.director, '|') | 
        MERGE (p:Person {name:trim(director)})
        MERGE (p)-[:DIRECTED]->(m))
    FOREACH (actor in split(row.actors, '|') | 
        MERGE (p:Person {name:trim(actor)})
        MERGE (p)-[:ACTED_IN]->(m))
    FOREACH (genre in split(row.genres, '|') | 
        MERGE (g:Genre {name:trim(genre)})
        MERGE (m)-[:HAS_GENRE]->(g))
} IN TRANSACTIONS
"""

graph.query(movies_query, params={"data": movie_data.to_dict("records")})
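
The import query above reads `movieId`, `released`, `title`, `director`, `actors`, and `genres` from each record and expects the last three to be pipe-delimited. The sketch below shows that shape with a single hypothetical row (the `movieId` value is made up) and mirrors the `split(...)` and `trim(...)` steps in plain Python. It uses the standard `csv` module instead of pandas only to stay dependency-free.

```python
import csv
import io

# Hypothetical one-row sample; the column names are the ones the import query reads.
sample_csv = io.StringIO(
    "movieId,released,title,director,actors,genres\n"
    '1,1999-03-31,"Matrix, The",Lilly Wachowski|Lana Wachowski,'
    "Keanu Reeves|Carrie-Anne Moss,Sci-Fi|Action|Thriller\n"
)

# One dict per row, similar in shape to movie_data.to_dict("records").
records = list(csv.DictReader(sample_csv))


def split_and_trim(value: str) -> list:
    """Python equivalent of Cypher's split(value, '|') followed by trim()."""
    return [part.strip() for part in value.split("|")]


row = records[0]
directors = split_and_trim(row["director"])
genres = split_and_trim(row["genres"])
print(directors)
print(genres)
```

Each element of `directors` becomes one `Person` node with a `DIRECTED` relationship, and each element of `genres` becomes one `Genre` node.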

In [None]:
# Import rating information
rating_path = "data/movies/ratings.csv"
rating_data = pd.read_csv(rating_path)
rating_query = """
UNWIND $data AS row
CALL {
    WITH row
    MATCH (m:Movie {id:row.movieId})
    MERGE (u:User {id:row.userId})
    MERGE (u)-[r:RATED]->(m)
    SET r.rating = row.rating,
        r.timestamp = row.timestamp
} IN TRANSACTIONS OF 10000 ROWS
"""

graph.query(rating_query, params={"data": rating_data.to_dict("records")})

## Defining fulltext index

The fulltext index will be used to map movie titles or person names mentioned in user queries to nodes in the database.
We will create a single fulltext index that covers both **Person** and **Movie** nodes.

In [None]:
graph.query(
    "CREATE FULLTEXT INDEX movieOrPerson IF NOT EXISTS FOR (m:Person|Movie) ON EACH [m.title, m.name]"
)

## Defining tools

A semantic layer consists of various tools an LLM agent can use.
These tools are typically exposed as function calls that follow OpenAI's function-calling syntax, so the LLM can request an action with structured arguments instead of only generating text.
Each tool wraps one capability, and together they let the agent perform a range of tasks over the same data.

In this notebook, we will implement two tools:
* Search: Used to find various information about movies and actors
* Recommender: Used to recommend movies based on existing preferences
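
As a rough illustration of the function-calling interface described above, the sketch below writes out by hand what a function definition for the Search tool could look like and how a function call returned by a model is decoded. The argument name `query` and the `parse_call` helper are assumptions for this sketch. Later in the notebook, LangChain generates the actual definitions.

```python
import json

# Hand-written illustration of an OpenAI function definition for the Search tool.
# The parameter name "query" is an assumption made for this sketch.
search_function = {
    "name": "Search",
    "description": "Answer questions about a particular movie or person.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "A movie title or a person's name"}
        },
        "required": ["query"],
    },
}

# Shape of a function call a chat model may return: a name plus JSON-encoded arguments.
model_function_call = {"name": "Search", "arguments": '{"query": "Matrix"}'}


def parse_call(call: dict, schema: dict) -> dict:
    """Decode the arguments and check that the required ones are present."""
    if call["name"] != schema["name"]:
        raise ValueError(f"Unexpected function: {call['name']}")
    arguments = json.loads(call["arguments"])
    missing = [k for k in schema["parameters"]["required"] if k not in arguments]
    if missing:
        raise ValueError(f"Missing arguments: {missing}")
    return arguments


print(parse_call(model_function_call, search_function))
```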

### Search tool

The search tool uses the fulltext index to map a movie or person from the user input to a node in the database and returns the available information.

In [10]:
description_query = """
CALL db.index.fulltext.queryNodes("movieOrPerson", $fulltextQuery) 
YIELD node AS m
WITH m LIMIT 1
MATCH (m)-[r:ACTED_IN|DIRECTED|HAS_GENRE]-(t)
WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names
WITH m, type+": "+reduce(s="", n IN names | s + n + ", ") as types
WITH m, collect(types) as contexts
WITH m, "type:" + labels(m)[0] + "\ntitle: "+ coalesce(m.title, m.name) + "\nyear: "+coalesce(m.released,"") +"\n" +
       reduce(s="", c in contexts | s + substring(c, 0, size(c)-2) +"\n") as context
RETURN context LIMIT 1
"""


def generate_full_text_query(input: str) -> str:
    full_text_query = ""
    for word in input.split():
        full_text_query += f" {word}~0.8 OR"
    full_text_query += f' "{input}"~7'
    return full_text_query


def get_information(entity: str) -> str:
    data = graph.query(
        description_query, params={"fulltextQuery": generate_full_text_query(entity)}
    )
    # An empty result list means the fulltext index found no usable match.
    if not data:
        return "No information was found about the movie or person in the database"
    return data[0]["context"]

In [11]:
print(get_information("Matrix"))

type:Movie
title: Matrix, The
year: 1999-03-31
DIRECTED: Lilly Wachowski, Lana Wachowski
ACTED_IN: Carrie-Anne Moss, Keanu Reeves, Hugo Weaving, Laurence Fishburne
HAS_GENRE: Sci-Fi, Action, Thriller
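
To see what the fulltext index actually receives, the snippet below repeats `generate_full_text_query` from the cell above and prints the query string it builds for a two-word input. In Lucene query syntax, a trailing `~` on a single term requests a fuzzy match, and on a quoted phrase it requests a proximity match. The input `"Keanu Reeves"` is just an example.

```python
def generate_full_text_query(input: str) -> str:
    # Same logic as the notebook cell above, repeated so this snippet runs on its own.
    full_text_query = ""
    for word in input.split():
        # Fuzzy match on every individual word, combined with OR.
        full_text_query += f" {word}~0.8 OR"
    # Proximity match on the whole phrase.
    full_text_query += f' "{input}"~7'
    return full_text_query


query = generate_full_text_query("Keanu Reeves")
print(repr(query))
# prints ' Keanu~0.8 OR Reeves~0.8 OR "Keanu Reeves"~7'
```

Note that the function does not escape characters that have a special meaning in Lucene, such as `:` or `"`, so inputs containing them may need extra handling.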



### Recommendation tool

In this example, we will use **user-based collaborative filtering** to recommend movies.
User-based collaborative filtering is a recommendation technique that leverages user behavior and preferences to suggest items.
In this method, the system identifies users with similar tastes and preferences to the active user by comparing their past behaviors, such as movie ratings or purchase history.
Based on this similarity, the system then recommends items that these similar users have liked or interacted with but the active user has not yet encountered.

Since we lack specific information about the active user's preferences, our approach will be to take the movie provided as input and recommend other movies that have been liked by users who also enjoyed this particular movie.
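
The counting logic behind this approach can be sketched in plain Python before expressing it in Cypher. The ratings below are hypothetical toy data, and the `recommend` helper is only an illustration. The 3.5 threshold and the top-3 cutoff match the Cypher query in the next cell.

```python
from collections import Counter

# Hypothetical toy ratings: user -> {movie title: rating}.
ratings = {
    "u1": {"Matrix, The": 5.0, "Pulp Fiction": 4.5, "Forrest Gump": 4.0},
    "u2": {"Matrix, The": 4.0, "Pulp Fiction": 5.0, "Toy Story": 2.0},
    "u3": {"Matrix, The": 2.0, "Toy Story": 5.0},
    "u4": {"Matrix, The": 4.5, "Forrest Gump": 3.0, "Pulp Fiction": 4.0},
}


def recommend(seed: str, threshold: float = 3.5, top_k: int = 3) -> list:
    """Count how often other movies were liked by users who liked `seed`."""
    counts = Counter()
    for user_ratings in ratings.values():
        # Skip users who did not rate the seed movie above the threshold.
        if user_ratings.get(seed, 0.0) <= threshold:
            continue
        for title, rating in user_ratings.items():
            if title != seed and rating > threshold:
                counts[title] += 1
    # Most frequently co-liked movies first.
    return [title for title, _ in counts.most_common(top_k)]


print(recommend("Matrix, The"))
# prints ['Pulp Fiction', 'Forrest Gump']
```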

In [12]:
recommendation_query = """
CALL db.index.fulltext.queryNodes("movieOrPerson", $fulltextQuery) 
YIELD node AS m
WHERE m:Movie
WITH m 
LIMIT 1
MATCH (m)<-[r1:RATED]-()-[r2:RATED]->(m1)
WHERE r1.rating > 3.5 AND r2.rating > 3.5
WITH m, m1, count(*) AS count
ORDER BY count DESC LIMIT 3
RETURN m1.title AS recommendation
"""


def recommend_movie(input_movie: str) -> str:
    data = graph.query(
        recommendation_query,
        params={"fulltextQuery": generate_full_text_query(input_movie)},
    )
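    # graph.query returns an empty list when nothing matches, and joining an
    # empty list yields "", so the empty case has to be checked explicitly.
    if not data:
        return "I am sorry, but no recommendations were found"
    return ", ".join(el["recommendation"] for el in data)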

In [13]:
print(recommend_movie("Matrix"))

Pulp Fiction, Shawshank Redemption, The, Forrest Gump


## Creating an agent with LangChain Expression Language


LangChain Expression Language (LCEL) is a declarative way to compose prompts, models, and output parsers into chains.
As a framework, LangChain is particularly useful for implementing agents, which decide at run time which tool to call and with what input.
With a LangChain agent, we can interact with the semantic layer on top of the graph database through natural language, bridging the gap between complex data structures and a user-friendly interface.
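
Conceptually, an agent of this kind runs a loop: ask the model, execute the tool it requests, append the result to a scratchpad, and repeat until the model answers directly. The sketch below illustrates that loop with a hypothetical `fake_model` and a stand-in `Search` tool, so it runs without an API key or a database. It is not how `AgentExecutor` is implemented, and the argument name `query` is an assumption.

```python
import json
from typing import Callable, Dict, List, Tuple

# Stand-in for the notebook's Search tool so the sketch runs without a database.
TOOLS: Dict[str, Callable[[str], str]] = {
    "Search": lambda entity: f"type:Movie\ntitle: {entity}",
}


def fake_model(question: str, scratchpad: List[Tuple[dict, str]]) -> dict:
    """Hypothetical model: request a tool first, then answer from its output."""
    if not scratchpad:
        arguments = json.dumps({"query": question})
        return {"function_call": {"name": "Search", "arguments": arguments}}
    return {"content": f"Here is what I found:\n{scratchpad[-1][1]}"}


def run_agent(question: str, max_steps: int = 5) -> str:
    # Each entry is a (function call, tool output) pair from an earlier step.
    scratchpad: List[Tuple[dict, str]] = []
    for _ in range(max_steps):
        message = fake_model(question, scratchpad)
        call = message.get("function_call")
        if call is None:
            # No tool requested: the model produced its final answer.
            return message["content"]
        argument = json.loads(call["arguments"])["query"]
        observation = TOOLS[call["name"]](argument)
        scratchpad.append((call, observation))
    return "Stopped: step limit reached"


print(run_agent("Pulp Fiction"))
```

In the cells that follow, the `agent_scratchpad` entry of the prompt plays the role of `scratchpad` here, and `intermediate_steps` holds the earlier tool calls and their outputs.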

In [15]:
from langchain.agents import Tool
from langchain.tools.render import format_tool_to_openai_function
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

tools = [
    Tool(
        name="Search",
        func=get_information,
        description=(
            "useful for when you need to answer questions about various actors or movies. "
            "The input should be particular movie or person"
        ),
    ),
    Tool(
        name="Recommender",
        func=recommend_movie,
        description=(
            "useful for when you need to recommend a movie based on existing preferences of other movies. "
            "The input should be a movie title"
        ),
    ),
]

llm_with_tools = llm.bind(functions=[format_tool_to_openai_function(t) for t in tools])

In [16]:
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.agents import AgentExecutor


prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that finds information about movies and recommends them",
        ),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

In [17]:
agent_executor.invoke({"input": "What do you know about pulp fiction?"})

> Entering new AgentExecutor chain...

Invoking: `Search` with `pulp fiction`

type:Movie
title: Pulp Fiction
year: 1994-10-14
DIRECTED: Quentin Tarantino
ACTED_IN: Laura Lovelace, John Travolta, Samuel L. Jackson, Tim Roth
HAS_GENRE: Crime, Comedy, Thriller, Drama

"Pulp Fiction" is a movie directed by Quentin Tarantino and released on October 14, 1994. It features a star-studded cast including John Travolta, Samuel L. Jackson, Tim Roth, and Laura Lovelace. The movie falls under the genres of crime, comedy, thriller, and drama.

> Finished chain.


{'input': 'What do you know about pulp fiction?',
 'output': '"Pulp Fiction" is a movie directed by Quentin Tarantino and released on October 14, 1994. It features a star-studded cast including John Travolta, Samuel L. Jackson, Tim Roth, and Laura Lovelace. The movie falls under the genres of crime, comedy, thriller, and drama.'}

In [18]:
agent_executor.invoke(
    {"input": "I liked pulp fiction. Can you recommend me a similarly good movie?"}
)

> Entering new AgentExecutor chain...

Invoking: `Recommender` with `Pulp Fiction`

Shawshank Redemption, The, Silence of the Lambs, The, Forrest Gump

I would recommend you to watch "Shawshank Redemption", "Silence of the Lambs", or "Forrest Gump". These movies are highly acclaimed and have a similar level of quality and storytelling as "Pulp Fiction". Enjoy!

> Finished chain.


{'input': 'I liked pulp fiction. Can you recommend me a similarly good movie?',
 'output': 'I would recommend you to watch "Shawshank Redemption", "Silence of the Lambs", or "Forrest Gump". These movies are highly acclaimed and have a similar level of quality and storytelling as "Pulp Fiction". Enjoy!'}

## Conclusion

### Semantic layer

In conclusion, a semantic layer over a graph database is a robust way to connect a language model to data.
Instead of relying on brittle, model-generated database queries, it uses query templates whose input parameters are generated by the LLM, which makes the solution more reliable and adaptable.
This methodology gives you complete control over which information is retrieved and how it is formatted, so you can implement complex retrieval strategies with many different options.

### Graph database

A semantic layer is not unique to a graph database and can be implemented over any other database system or even APIs. However, one advantage of a graph database is that it can handle complex structured and unstructured information in a single database system.
