## LLM Agents

* Local implementation following this tutorial: https://graphacademy.neo4j.com/courses/llm-fundamentals/3-intro-to-langchain/4-agents/

### Requirements

In [11]:
!pip install langchain openai langchain-openai neo4j python-dotenv langchainhub langchain-community --quiet

In [1]:
%load_ext watermark
%watermark -p langchain,langchainhub,langchain_community

langchain          : 0.1.5
langchainhub       : 0.1.14
langchain_community: 0.0.17



### Imports

In [2]:
import os
from langchain_community.graphs import Neo4jGraph  # used below to connect to Neo4j
from dotenv import load_dotenv, find_dotenv
from pathlib import Path
import neo4j

### Settings

In [3]:
project_path = Path(os.getcwd()).parent
data_path = project_path / "data"
model_path = project_path / "models"
output_path = project_path / "output"

# load env settings
_ = load_dotenv(find_dotenv())

llm_model = "gpt-4"
database = "recommendations-50"

openai_api_key = os.getenv('OPENAI_API_KEY')

### Connect to Neo4j

In [15]:
graph = Neo4jGraph(
    url=os.getenv('NEO4J_URL'),
    username=os.getenv('NEO4J_USER'),
    password=os.getenv('NEO4J_PASS'),
    database=database
)
result = graph.query("""
MATCH (m:Movie{title: 'Toy Story'}) 
RETURN m.title, m.plot, m.poster
""")
print(result)

[{'m.title': 'Toy Story', 'm.plot': "A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.", 'm.poster': 'https://image.tmdb.org/t/p/w440_and_h660_face/uXDfjJbdP4ijW5hWSBrPrlKpxab.jpg'}]


In [16]:
print(graph.schema)

Node properties are the following:
Movie {url: STRING, runtime: INTEGER, revenue: INTEGER, embedding: LIST, imdbRating: FLOAT, released: STRING, countries: LIST, languages: LIST, plot: STRING, imdbVotes: INTEGER, imdbId: STRING, year: INTEGER, poster: STRING, movieId: STRING, tmdbId: STRING, title: STRING, budget: INTEGER},Genre {name: STRING},User {userId: STRING, name: STRING},Actor {url: STRING, bornIn: STRING, bio: STRING, died: DATE, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING},Director {url: STRING, bornIn: STRING, born: DATE, died: DATE, tmdbId: STRING, imdbId: STRING, name: STRING, poster: STRING, bio: STRING},Person {url: STRING, died: DATE, bornIn: STRING, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING, bio: STRING}
Relationship properties are the following:
RATED {rating: FLOAT, timestamp: INTEGER},ACTED_IN {role: STRING},DIRECTED {role: STRING}
The relationships are the following:
(:Movie)-[:IN_GENRE]->(:Genre),(:User)-

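* NB: during initialization, Neo4jGraph lets the environment variables `NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD` and `NEO4J_DATABASE` take precedence over the arguments passed to the constructor whenever they are set, which does not seem like good practice to me.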

<pre>
url = get_from_env("url", "NEO4J_URI", url)
username = get_from_env("username", "NEO4J_USERNAME", username)
password = get_from_env("password", "NEO4J_PASSWORD", password)
database = get_from_env("database", "NEO4J_DATABASE", database)
</pre>
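
To make that precedence concrete, here is a minimal sketch that re-implements the behaviour described above in plain Python. The helper name `resolve_setting` and the `DEMO_NEO4J_DATABASE` key are hypothetical and only serve the illustration; this is not the library's own code.

```python
import os


def resolve_setting(env_key, passed_value=None):
    """Return the environment value if it is set and non-empty, else the passed value."""
    env_value = os.environ.get(env_key)
    if env_value:
        # The environment wins, even though a value was passed explicitly
        return env_value
    if passed_value is not None:
        return passed_value
    raise ValueError(f"Pass a value or set the {env_key} environment variable.")


# Made-up key so the demo cannot clash with real settings
os.environ["DEMO_NEO4J_DATABASE"] = "from-environment"
resolved = resolve_setting("DEMO_NEO4J_DATABASE", passed_value="recommendations-50")
print(resolved)

# Without the environment variable, the passed argument is used
del os.environ["DEMO_NEO4J_DATABASE"]
fallback = resolve_setting("DEMO_NEO4J_DATABASE", passed_value="recommendations-50")
print(fallback)
```

If the real helper behaves this way, a stray `NEO4J_DATABASE` in a `.env` file would silently redirect the connection, which may be worth checking when results look unexpected.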

### Retrievers

https://graphacademy.neo4j.com/courses/llm-fundamentals/3-intro-to-langchain/6-retrievers/

Neo4jVector is a LangChain vector store that uses a Neo4j database as the underlying data store.

You can use the Neo4jVector to generate embeddings, store them in the database and retrieve them.

In [35]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import Neo4jVector

embedding_provider = OpenAIEmbeddings(
    openai_api_key=openai_api_key
)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    url=os.getenv('NEO4J_URL'),
    username=os.getenv('NEO4J_USER'),
    password=os.getenv('NEO4J_PASS'),
    index_name="moviePlots",
    embedding_node_property="embedding",
    text_node_property="plot"
)

result = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.")
for doc in result:
    print(doc.metadata["title"], "-", doc.page_content)

Coneheads - Aliens with conical crania crash land on Earth.
Aliens - The planet from Alien (1979) has been colonized, but contact is lost. This time, the rescue team has impressive firepower, but will it be enough?
Independence Day (a.k.a. ID4) - The aliens are coming and their goal is to invade and destroy Earth. Fighting superior technology, mankind's best weapon is the will to survive.
Arrival, The - Zane, an astronomer, discovers intelligent alien life. But the aliens are keeping a deadly secret, and will do anything to stop Zane from learning it.


#### Let's compare the results with a call to queryNodes

##### Use movie_plot_vector.similarity_search

In [32]:
movie_title = "Toy Story"
result = movie_plot_vector.similarity_search(movie_title, k=4)
for doc in result:
    print("*", doc.metadata["title"], "-", doc.page_content)

* Toy Story - A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.
* NeverEnding Story III, The - A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia.
* E.T. the Extra-Terrestrial - A troubled child summons the courage to help a friendly alien escape Earth and return to his home-world.
* Last Action Hero - With the help of a magic ticket, a young film fan is transported into the fictional world of his favorite action film character.


##### Use similarity_search_with_relevance_scores

In [55]:
movie_title = "Toy Story"
result = movie_plot_vector.similarity_search_with_relevance_scores(movie_title, k=4)
for doc, similarity in result:
    print(doc.metadata['title'], similarity)
    print(doc.page_content)
    print()

Toy Story 0.9472463130950928
A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.

NeverEnding Story III, The 0.9195799827575684
A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia.

E.T. the Extra-Terrestrial 0.916156530380249
A troubled child summons the courage to help a friendly alien escape Earth and return to his home-world.

Last Action Hero 0.9161075353622437
With the help of a magic ticket, a young film fan is transported into the fictional world of his favorite action film character.



##### Use query: CALL db.index.vector.queryNodes

In [56]:
movie_title = "Toy Story"
nr_nearest_neighbours = 4

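# Pass the values as Cypher parameters instead of interpolating them into the query string
query = """
MATCH (m:Movie {title: $title})

CALL db.index.vector.queryNodes('moviePlots', $k, m.embedding)
YIELD node, score

RETURN node.title AS title, node.plot AS plot, score
"""

graph.query(query, params={"title": movie_title, "k": nr_nearest_neighbours})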

[{'title': 'Toy Story',
  'plot': "A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.",
  'score': 1.0},
 {'title': 'Little Rascals, The',
  'plot': 'Alfalfa is wooing Darla and his He-Man-Woman-Hating friends attempt to sabotage the relationship.',
  'score': 0.9214372634887695},
 {'title': 'NeverEnding Story III, The',
  'plot': 'A young boy must restore order when a group of bullies steal the magical book that acts as a portal between Earth and the imaginary world of Fantasia.',
  'score': 0.9206198453903198},
 {'title': 'Drop Dead Fred',
  'plot': 'A young woman finds her already unstable life rocked by the presence of a rambunctious imaginary friend from childhood.',
  'score': 0.9199690818786621}]
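
The two approaches do not search with the same vector: `similarity_search` embeds the query string "Toy Story" itself, whereas the Cypher query searches with the embedding already stored on the Toy Story node (hence its score of 1.0). The sketch below illustrates how that alone can change the neighbours. The three-dimensional vectors and movie names are made up for the example, and the plain cosine similarity used here is an assumption about the index's metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k):
    """Return the k (title, score) pairs most similar to query_vector."""
    scored = [(title, cosine_similarity(query_vector, vec)) for title, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy "plot embeddings" (invented numbers, not real model output)
index = {
    "Movie A": [1.0, 0.0, 0.0],
    "Movie B": [0.8, 0.6, 0.0],
    "Movie C": [0.0, 1.0, 0.0],
}

stored_plot_vector = index["Movie A"]  # like m.embedding in the Cypher query
title_text_vector = [0.3, 0.9, 0.1]    # hypothetical embedding of the title string

by_plot = top_k(stored_plot_vector, index, k=2)
by_title = top_k(title_text_vector, index, k=2)
print("plot vector  :", [title for title, _ in by_plot])
print("title vector :", [title for title, _ in by_title])
```

Searching with the stored vector always returns the node itself first with a perfect score, while the vector derived from the title text can land somewhere else in the embedding space.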

### Create embeddings

The Neo4jVector class can also generate embeddings and vector indexes - this is useful when creating vectors programmatically or at run time.

The following code would create embeddings and a new index called myVectorIndex in the database for Chunk nodes with a text property:

In [101]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain.schema import Document

# A list of Documents
documents = [
    Document(
        page_content="Text to be indexed",
        metadata={"source": "local"}
    ),
    Document(
        page_content="Toy Story",
        metadata={"source": "local"}
    )
]

# Service used to create the embeddings
embedding_provider = OpenAIEmbeddings(openai_api_key=openai_api_key)

new_vector = Neo4jVector.from_documents(
    documents,
    embedding_provider,
    url=os.getenv('NEO4J_URL'),
    username=os.getenv('NEO4J_USER'),
    password=os.getenv('NEO4J_PASS'),
    index_name="myVectorIndex",
    node_label="Chunk",
    text_node_property="text",
    embedding_node_property="embedding",
    create_id_index=True,
)

* `Neo4jVector.from_documents` created nodes with a new `Chunk` label

In [105]:
graph.refresh_schema()
print(graph.schema)

Node properties are the following:
Movie {url: STRING, runtime: INTEGER, revenue: INTEGER, embedding: LIST, imdbRating: FLOAT, released: STRING, countries: LIST, languages: LIST, plot: STRING, imdbVotes: INTEGER, imdbId: STRING, year: INTEGER, poster: STRING, movieId: STRING, tmdbId: STRING, title: STRING, budget: INTEGER},Genre {name: STRING},User {userId: STRING, name: STRING},Actor {url: STRING, bornIn: STRING, bio: STRING, died: DATE, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING},Director {url: STRING, bornIn: STRING, born: DATE, died: DATE, tmdbId: STRING, imdbId: STRING, name: STRING, poster: STRING, bio: STRING},Person {url: STRING, bornIn: STRING, bio: STRING, died: DATE, born: DATE, imdbId: STRING, name: STRING, poster: STRING, tmdbId: STRING},Chunk {embedding: LIST, text: STRING, source: STRING, id: STRING}
Relationship properties are the following:
RATED {rating: FLOAT, timestamp: INTEGER},ACTED_IN {role: STRING},DIRECTED {role: STRING}
The relati

In [106]:
results = graph.query("MATCH (n:Chunk) RETURN n.id, n.text, n.source LIMIT 25")
for r in results:
    print(r)

{'n.id': '63a6fc72-c1d9-11ee-8ffa-266f09938552', 'n.text': 'Text to be indexed', 'n.source': 'local'}
{'n.id': '43b125d6-c1da-11ee-8ffa-266f09938552', 'n.text': 'Text to be indexed', 'n.source': 'local'}
{'n.id': '43b1291e-c1da-11ee-8ffa-266f09938552', 'n.text': 'Toy Story', 'n.source': 'local'}
{'n.id': 'a568abe6-c1da-11ee-8ffa-266f09938552', 'n.text': 'Text to be indexed', 'n.source': 'local'}
{'n.id': 'a568af7e-c1da-11ee-8ffa-266f09938552', 'n.text': 'Toy Story', 'n.source': 'local'}
{'n.id': 'ae107116-c1da-11ee-8ffa-266f09938552', 'n.text': 'Text to be indexed', 'n.source': 'local'}
{'n.id': 'ae1073e6-c1da-11ee-8ffa-266f09938552', 'n.text': 'Toy Story', 'n.source': 'local'}


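* Each call to `from_documents` creates new `Chunk` nodes, so re-running the cell duplicates the documents.
* The ids look like time-based (version 1) UUIDs: documents inserted in the same call get nearly identical prefixes, but every id is distinct.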
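
The ids in the output above have the shape of version-1 UUIDs, which are derived from the current time, so two ids generated moments apart differ only slightly. A small sketch with Python's standard `uuid` module shows this, next to a content-derived id (`uuid5`) that stays stable across runs. The `content_id` helper is hypothetical, and whether `from_documents` accepts caller-supplied ids to avoid duplicates is an assumption not verified here.

```python
import uuid

# Time-based ids: a new value on every call, even for identical content
first = uuid.uuid1()
second = uuid.uuid1()
print(first, second, "version:", first.version)


def content_id(text):
    """Hypothetical helper: derive a stable id from the text itself."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, text))


# Content-based ids: the same text always maps to the same id
stable_a = content_id("Toy Story")
stable_b = content_id("Toy Story")
other = content_id("Text to be indexed")
print(stable_a, stable_b, other)
```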

### Retriever chain

To incorporate a retriever and a Neo4j vector store into a LangChain application, you can create a retrieval chain.

The Neo4jVector class has an as_retriever() method that returns a retriever.

The RetrievalQA class is a chain that uses a retriever as part of its pipeline. It will use the retriever to retrieve documents and pass them to a language model.

By incorporating Neo4jVector into a RetrievalQA chain, you can use data and vectors in Neo4j in a LangChain application.
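
Conceptually, a retrieval chain does three things: fetch the most relevant documents, place them in a prompt, and ask the model. The sketch below mimics that flow in plain Python. Everything in it is a stand-in: `retrieve` ranks by word overlap instead of vectors, `stub_llm` returns a canned string instead of calling a model, and the movie titles and plots are invented.

```python
def retrieve(query, documents, k=2):
    """Hypothetical retriever: rank documents by words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc["plot"].lower().split()))

    return sorted(documents, key=overlap, reverse=True)[:k]


def build_prompt(query, docs):
    """Put the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {doc['title']}: {doc['plot']}" for doc in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def stub_llm(prompt):
    """Stand-in for a chat model: reports how many context documents it received."""
    n_docs = sum(1 for line in prompt.splitlines() if line.startswith("- "))
    return f"(stub answer generated from {n_docs} context documents)"


def retrieval_qa(query, documents):
    """Retrieve, build the prompt, call the model, and return sources with the answer."""
    docs = retrieve(query, documents)
    answer = stub_llm(build_prompt(query, docs))
    return {"query": query, "result": answer, "source_documents": docs}


# Invented documents for the demo
documents = [
    {"title": "Toy Tale", "plot": "A cowboy doll feels threatened by a new toy"},
    {"title": "Moon Mission", "plot": "A crewed mission to the moon suffers an explosion"},
    {"title": "Alien Visit", "plot": "Aliens land on earth and attack the city"},
]

result = retrieval_qa("A mission to the moon goes wrong", documents)
print(result["result"])
print([doc["title"] for doc in result["source_documents"]])
```

The model only sees what the retriever returns, so the answer depends on both the ranking and how the model weighs the documents it is given.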

Review this program incorporating the moviePlots vector index into a retrieval chain.

In [115]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import Neo4jVector


chat_llm = ChatOpenAI(openai_api_key=openai_api_key, verbose=True)

embedding_provider = OpenAIEmbeddings(openai_api_key=openai_api_key)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    url=os.getenv('NEO4J_URL'),
    username=os.getenv('NEO4J_USER'),
    password=os.getenv('NEO4J_PASS'),
    index_name="moviePlots",
    embedding_node_property="embedding",
    text_node_property="plot",
)

plot_retriever = RetrievalQA.from_llm(
    llm=chat_llm,
    retriever=movie_plot_vector.as_retriever(),
    verbose=True,
    return_source_documents=True
)

result = plot_retriever.invoke(
    {"query": "A movie where a mission to the moon goes wrong"}
)

print(result)



> Entering new RetrievalQA chain...

> Finished chain.
{'query': 'A movie where a mission to the moon goes wrong', 'result': 'The movie you are referring to is "2001: A Space Odyssey" (1968). In this film, a mission to the moon encounters a mysterious and dangerous object buried beneath the lunar surface, leading to unexpected consequences for the crew and their intelligent computer, H.A.L. 9000.', 'source_documents': [Document(page_content='Humanity finds a mysterious, obviously artificial object buried beneath the Lunar surface and, with the intelligent computer H.A.L. 9000, sets off on a quest.', metadata={'budget': 12000000, 'movieId': '924', 'tmdbId': '62', 'imdbVotes': 407650, 'runtime': 149, 'countries': ['USA', ' UK'], 'imdbId': '0062622', 'url': 'https://themoviedb.org/movie/62', 'released': '1968-05-15', 'languages': ['English', ' Russian'], 'imdbRating': 8.3, 'title': '2001: A Space Odyssey', 'poster': 'https://image.tmdb.org/t/p/w440_and_h660_face/zmmYdPa8

In [117]:
result['result']

'The movie you are referring to is "2001: A Space Odyssey" (1968). In this film, a mission to the moon encounters a mysterious and dangerous object buried beneath the lunar surface, leading to unexpected consequences for the crew and their intelligent computer, H.A.L. 9000.'

In [120]:
for document in result['source_documents']:
    print(f"- {document.metadata['title']}")

- 2001: A Space Odyssey
- Apollo 13
- Aliens
- Coneheads


* Hmm, I would have said Apollo 13, not 2001: A Space Odyssey :)

In [121]:
query = "MATCH (m:Movie) WHERE m.title CONTAINS 'Space' OR m.title CONTAINS 'Apollo' RETURN m.title"
graph.query(query)

[{'m.title': 'Apollo 13'},
 {'m.title': 'Space Jam'},
 {'m.title': '2001: A Space Odyssey'},
 {'m.title': 'Lost in Space'},
 {'m.title': 'Plan 9 from Outer Space'},
 {'m.title': 'Cat from Outer Space, The'},
 {'m.title': 'Office Space'},
 {'m.title': 'It Came from Outer Space'},
 {'m.title': 'Muppets From Space'},
 {'m.title': 'Spaceballs'},
 {'m.title': 'Quatermass 2 (Enemy from Space)'},
 {'m.title': 'Space Cowboys'},
 {'m.title': 'Dogs in Space'},
 {'m.title': 'SpaceCamp'},
 {'m.title': 'Morons From Outer Space'},
 {'m.title': 'Spacehunter: Adventures in the Forbidden Zone'},
 {'m.title': 'Space Chimps'},
 {'m.title': 'Apollo 13: To the Edge and Back'},
 {'m.title': 'Monsters vs Aliens: Mutant Pumpkins from Outer Space'}]

In [122]:
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores.neo4j_vector import Neo4jVector


chat_llm = ChatOpenAI(openai_api_key=openai_api_key)

embedding_provider = OpenAIEmbeddings(openai_api_key=openai_api_key)

movie_plot_vector = Neo4jVector.from_existing_index(
    embedding_provider,
    url=os.getenv('NEO4J_URL'),
    username=os.getenv('NEO4J_USER'),
    password=os.getenv('NEO4J_PASS'),
    index_name="moviePlots",
    embedding_node_property="embedding",
    text_node_property="plot",
)

plot_retriever = RetrievalQA.from_llm(
    llm=chat_llm,
    retriever=movie_plot_vector.as_retriever()
)

result = plot_retriever.invoke(
    {"query": "A movie where a mission to the moon goes wrong and it's not 2001.. "}
)

print(result)

{'query': "A movie where a mission to the moon goes wrong and it's not 2001.. ", 'result': 'Apollo 13 (1995) is a movie where a mission to the moon goes wrong. It is based on the true story of the Apollo 13 mission, where an oxygen tank exploded onboard the spacecraft, jeopardizing the lives of the astronauts and forcing NASA to come up with a strategy to bring them back safely to Earth.'}


* Ah, with a little persuasion it gets to Apollo 13 :)