# Vector Search

In this lesson, you will learn how to use vector indexes with LangChain to perform vector search.

## Similarity Search

The [Neo4jVector](https://python.langchain.com/api_reference/neo4j/vectorstores/langchain_neo4j.vectorstores.neo4j_vector.Neo4jVector.html) class provides an interface to vector indexes in Neo4j. You can use `Neo4jVector` to create a vector store that can modify data and perform similarity searches.

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_neo4j import Neo4jGraph

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

## Embedding model

The movie plot embeddings were created using the OpenAI `text-embedding-ada-002` model. You need to use the same model to convert the query into vectors.

Use the `OpenAIEmbeddings` class to create the embedding model:

In [2]:
from langchain_openai import OpenAIEmbeddings

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

## Vector Store

Use the `Neo4jVector` class to create a vector store that connects to the Neo4j database and uses both the embedding model and the `moviePlots` index.

When specifying the vector index, you must also state the properties that contain the text (`text_node_property`) and the embedding (`embedding_node_property`).

In [3]:
from langchain_neo4j import Neo4jVector

# Create the vector store from the existing index
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)
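
`from_existing_index` expects the `moviePlots` index to exist already. For orientation, the sketch below builds the kind of Cypher statement that could create such an index. The `Movie` label, the 1536 dimensions (the output size of `text-embedding-ada-002`), and the cosine similarity function are assumptions rather than details stated in this lesson, and `build_vector_index_cypher` is a hypothetical helper name.

```python
def build_vector_index_cypher(index_name, label, embedding_property,
                              dimensions, similarity="cosine"):
    """Return a CREATE VECTOR INDEX statement as a string (hypothetical helper)."""
    return (
        f"CREATE VECTOR INDEX {index_name} IF NOT EXISTS\n"
        f"FOR (n:{label}) ON n.{embedding_property}\n"
        "OPTIONS {indexConfig: {\n"
        f"  `vector.dimensions`: {dimensions},\n"
        f"  `vector.similarity_function`: '{similarity}'\n"
        "}}"
    )

# Assumed values: the Movie label, 1536 dimensions, and cosine similarity
statement = build_vector_index_cypher("moviePlots", "Movie", "plotEmbedding", 1536)
print(statement)
```

The code only prints the statement. Because the index already exists in the lesson's database, you do not need to run it there.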

## Search

The `similarity_search` method of the `Neo4jVector` class allows you to perform a similarity search based on a query.

Running the code below returns the `k` movies (three in this example) whose plots are most similar to the query.

The method returns a list of LangChain [Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html) objects, each containing the plot as the content and the node properties as `metadata`.

You can parse the results to extract the movie titles and plots.

In [4]:
# Search for similar movie plots
plot = "Toys come alive"
result = plot_vector.similarity_search(plot, k=3)
print(result)

[Document(metadata={'budget': 30000000, 'movieId': 1, 'tmdbId': 862, 'genres': ['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy'], 'imdbVotes': 591836, 'runtime': 81, 'countries': ['USA'], 'imdbId': 114709, 'released': neo4j.time.Date(1995, 11, 22), 'languages': ['English'], 'imdbRating': 8.3, 'title': 'Toy Story', 'year': 1995, 'revenue': 373554033}, page_content="A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room."), Document(metadata={'budget': 65000000, 'movieId': 2, 'tmdbId': 8844, 'genres': ['Adventure', 'Children', 'Fantasy'], 'imdbVotes': 198355, 'runtime': 104, 'countries': ['USA'], 'imdbId': 113497, 'released': neo4j.time.Date(1995, 12, 15), 'languages': ['English', 'French'], 'imdbRating': 6.9, 'title': 'Jumanji', 'year': 1995, 'revenue': 262797249}, page_content='When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be sto

In [5]:
# Parse the documents
for doc in result:
    print(f"Title: {doc.metadata['title']}")
    print(f"Plot: {doc.page_content}\n")

Title: Toy Story
Plot: A cowboy doll is profoundly threatened and jealous when a new spaceman figure supplants him as top toy in a boy's room.

Title: Jumanji
Plot: When two kids find and play a magical board game, they release a man trapped for decades in it and a host of dangers that can only be stopped by finishing the game.

Title: Powder
Plot: A young bald albino boy with unique powers shakes up the rural community he lives in.
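
To build intuition for what a similarity search does, here is a minimal sketch in plain Python. It assumes cosine similarity as the ranking measure, since the lesson does not state which similarity function the `moviePlots` index uses. The three-dimensional vectors and the `movie_a`-style names are invented for illustration. Real `text-embedding-ada-002` embeddings have 1536 dimensions, and Neo4j performs the search inside the vector index rather than in Python.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, candidates, k=3):
    """Return the k (name, score) pairs with the highest similarity."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors, invented for illustration only
toy_embeddings = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.8, 0.6, 0.0],
    "movie_c": [0.0, 1.0, 0.0],
    "movie_d": [0.0, 0.0, 1.0],
}
query_vector = [1.0, 0.1, 0.0]

ranked = top_k(query_vector, toy_embeddings, k=3)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

The vector closest in direction to the query ranks first, which is the same idea `similarity_search` applies to the plot embeddings.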



# Vector Retriever

Vector search can be used in Retrieval Augmented Generation (RAG) applications to find relevant documents based on their content.

In this lesson, you will update the LangChain agent to use a vector retriever that will allow you to search for movies based on plot.

In [6]:
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_neo4j import Neo4jGraph
from langchain_neo4j import Neo4jVector
from langchain_openai import OpenAIEmbeddings

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Create the vector store from the existing index
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
)

# Define functions for each step in the application

# Retrieve context 
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"], 
        k=6
    )
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "What is the movie with the pig who wants to be a sheep dog?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])

Answer: The movie with the pig who wants to be a sheepdog is "Babe."
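
The `generate` step above passes the list of `Document` objects straight into the prompt, so the model sees their default string representation. One optional refinement is to format the context as readable text first. The sketch below is illustrative: `format_context` is a hypothetical helper, and the small `Doc` dataclass stands in for LangChain's `Document` so the example runs without any dependencies.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in for a LangChain Document (same two attributes)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_context(docs):
    """Render each document as a 'Title / Plot' entry, separated by blank lines."""
    entries = []
    for doc in docs:
        title = doc.metadata.get("title", "Unknown title")
        entries.append(f"Title: {title}\nPlot: {doc.page_content}")
    return "\n\n".join(entries)

# Sample documents built from the search results shown earlier in this lesson
docs = [
    Doc(
        "A cowboy doll is profoundly threatened and jealous when a new "
        "spaceman figure supplants him as top toy in a boy's room.",
        {"title": "Toy Story"},
    ),
    Doc(
        "A young bald albino boy with unique powers shakes up the rural "
        "community he lives in.",
        {"title": "Powder"},
    ),
]

print(format_context(docs))
```

In the application, you could pass `format_context(state["context"])` as the `context` value in `prompt.invoke`. Whether this improves the answers depends on the model and the question.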


# Graph Retrieval

You can pass an additional Cypher retrieval query to the `Neo4jVector` class. The **retrieval query** runs after the similarity search, with each matched node bound to `node` and its similarity bound to `score`. The data it returns is added to the `Document` metadata.

You can use this retrieval query to retrieve useful context from the graph.

In the movie plot example, you could retrieve additional information about the movies, such as the actors or user ratings.

The additional context can be used to improve and expand the agent's responses, for example:

    Who acts in movies about Love and Romance?

The vector retriever will return movies about Love and Romance, the Cypher retrieval query will return the actors in those movies, and the agent can use this information to answer the question.

This method of vector + graph retrieval is a common approach to GraphRAG (Graph Retrieval Augmented Generation).
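
A retrieval query is expected to return three columns named `text`, `score`, and `metadata`. The sketch below mimics, in plain Python, how rows of that shape could map onto documents. It is not the library's implementation: `rows_to_documents` is a hypothetical helper, plain dictionaries stand in for `Document` objects, and the score value is invented for illustration.

```python
REQUIRED_COLUMNS = {"text", "score", "metadata"}

def rows_to_documents(rows):
    """Convert retrieval-query rows into document-like dictionaries."""
    documents = []
    for row in rows:
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"Row is missing columns: {sorted(missing)}")
        documents.append(
            {
                "page_content": row["text"],   # becomes the document content
                "metadata": row["metadata"],   # becomes the document metadata
                "score": row["score"],         # similarity score for the node
            }
        )
    return documents

# Example row; the score is an invented, illustrative value
sample_rows = [
    {
        "text": "Title: Toy Story, Plot: A cowboy doll is profoundly threatened "
                "and jealous when a new spaceman figure supplants him as top "
                "toy in a boy's room.",
        "score": 0.9,
        "metadata": {
            "title": "Toy Story",
            "genres": ["Adventure", "Animation", "Children", "Comedy", "Fantasy"],
        },
    }
]

documents = rows_to_documents(sample_rows)
print(documents[0]["metadata"]["title"])
```

Anything you place in the `metadata` map, such as actors or an average user rating, travels with the document and is available to later steps.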

## Retrieval Query

In [7]:
import os
from dotenv import load_dotenv
load_dotenv()

from langchain_core.documents import Document
from langchain.chat_models import init_chat_model
from langgraph.graph import START, StateGraph
from langchain_core.prompts import PromptTemplate
from typing_extensions import List, TypedDict
from langchain_openai import OpenAIEmbeddings
from langchain_neo4j import Neo4jGraph, Neo4jVector

# Initialize the LLM
model = init_chat_model("gpt-4o", model_provider="openai")

# Create a prompt
template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}

Answer:"""

prompt = PromptTemplate.from_template(template)

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

# Connect to Neo4j
graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"), 
    password=os.getenv("NEO4J_PASSWORD"),
)

# Create the embedding model
embedding_model = OpenAIEmbeddings(model="text-embedding-ada-002")

# Define the retrieval query
retrieval_query = """
MATCH (node)<-[r:RATED]-()
WITH node, score, avg(r.rating) AS userRating
RETURN 
    "Title: " + node.title + ", Plot: " + node.plot AS text, 
    score, 
    {
        title: node.title,
        genres: [ (node)-[:IN_GENRE]->(g) | g.name ],
        actors: [ (person)-[r:ACTED_IN]->(node) | [person.name, r.role] ],
        userRating: userRating
    } AS metadata
ORDER BY userRating DESC
"""

# Create the vector store with the retrieval query
plot_vector = Neo4jVector.from_existing_index(
    embedding_model,
    graph=graph,
    index_name="moviePlots",
    embedding_node_property="plotEmbedding",
    text_node_property="plot",
    retrieval_query=retrieval_query,
)

# Define functions for each step in the application

# Retrieve context 
def retrieve(state: State):
    # Use the vector to find relevant documents
    context = plot_vector.similarity_search(
        state["question"], 
        k=6,
    )
    return {"context": context}

# Generate the answer based on the question and context
def generate(state: State):
    messages = prompt.invoke({"question": state["question"], "context": state["context"]})
    response = model.invoke(messages)
    return {"answer": response.content}

# Define application steps
workflow = StateGraph(State).add_sequence([retrieve, generate])
workflow.add_edge(START, "retrieve")
app = workflow.compile()

# Run the application
question = "Who acts in movies about Love and Romance?"
response = app.invoke({"question": question})
print("Answer:", response["answer"])
print("Context:", response["context"])

Answer: The actors who act in movies about love and romance from the provided context are:

1. Philippe Noiret in "Postman, The (Postino, Il)"
2. Renato Scarpa in "Postman, The (Postino, Il)"
3. Maria Grazia Cucinotta in "Postman, The (Postino, Il)"
4. Massimo Troisi in "Postman, The (Postino, Il)"
5. Annette Bening in "American President, The"
6. Martin Sheen in "American President, The"
7. Michael Douglas in "American President, The"
8. Michael J. Fox in "American President, The"
9. Noah Emmerich in "Beautiful Girls"
10. Matt Dillon in "Beautiful Girls"
11. Annabeth Gish in "Beautiful Girls"
12. Lauren Holly in "Beautiful Girls"
13. Stacey Dash in "Clueless"
14. Alicia Silverstone in "Clueless"
15. Paul Rudd in "Clueless"
16. Brittany Murphy in "Clueless"
17. Christian Slater in "Bed of Roses"
18. Mary Stuart Masterson in "Bed of Roses"
19. Josh Brolin in "Bed of Roses"
20. Pamela Adlon in "Bed of Roses"
Context: [Document(metadata={'title': 'Postman, The (Postino, Il)', 'genres': ['
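
Because the retrieval query stores actors as `[name, role]` pairs in the metadata, you can also work with the cast directly rather than relying on the model's answer. The following is a minimal sketch: `actors_by_movie` is a hypothetical helper, and the sample metadata reuses titles and names from the answer above, with roles set to `None` because the truncated output does not show them.

```python
def actors_by_movie(metadata_items):
    """Map each movie title to the actor names found in its metadata."""
    cast = {}
    for metadata in metadata_items:
        # Each entry in "actors" is a [name, role] pair
        names = [name for name, _role in metadata.get("actors", [])]
        cast[metadata["title"]] = names
    return cast

# Sample metadata in the shape produced by the retrieval query;
# roles are None because the output above does not show them
sample_metadata = [
    {
        "title": "Clueless",
        "actors": [["Stacey Dash", None], ["Alicia Silverstone", None]],
    },
    {
        "title": "Bed of Roses",
        "actors": [["Christian Slater", None], ["Mary Stuart Masterson", None]],
    },
]

cast = actors_by_movie(sample_metadata)
for title, names in cast.items():
    print(f"{title}: {', '.join(names)}")
```

With the real results, you could call `actors_by_movie(doc.metadata for doc in response["context"])`.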