#### <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color:white; font-size:180%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > TABLE OF CONTENTS<br><div>
* [Imports](#1)
* [Introduction](#2)
* [Neo4j Vector Store](#3)
* [Similarity Search](#4)
* [Hybrid Search](#5)
* [PLANNED WAY FORWARD](#6) 

In [2]:
import getpass
import os

# Load env

from dotenv import load_dotenv

_ = load_dotenv()

from langchain_community.vectorstores import Neo4jVector
from langchain_openai import OpenAIEmbeddings
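# The langchain.* import paths for these classes are deprecated;
# the community and text-splitter packages are the current locations.
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings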

<a id="2"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > Introduction<br><div>

In this notebook we show how to use LangChain and Neo4j to create a vector store. We cover both ways of querying it: by semantic similarity and by keywords. The notebook is based on the official LangChain [documentation](https://python.langchain.com/v0.2/docs/integrations/vectorstores/neo4jvector/), which we customize and extend to make it more comprehensive.

<a id="3"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > Semantic Similarity search<br><div>

We start by creating an index that allows us to perform semantic similarity search. For this we need an embedding model, which maps our text to vectors in a high-dimensional space.
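To make the idea of "similarity in a high-dimensional space" concrete, here is a minimal sketch of cosine similarity between vectors. It assumes cosine similarity as the comparison measure, and the three-dimensional vectors are made-up stand-ins for real embeddings, which have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (illustrative values only)
query_vec = [1.0, 0.0, 1.0]
close_vec = [0.9, 0.1, 0.8]   # points in almost the same direction as the query
far_vec = [0.0, 1.0, 0.0]     # orthogonal to the query

sim_close = cosine_similarity(query_vec, close_vec)
sim_far = cosine_similarity(query_vec, far_vec)
print(sim_close > sim_far)
```

Texts whose embeddings point in similar directions get a score close to 1, while unrelated texts get a score close to 0.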

In [3]:


# Read the Wikipedia article
raw_documents = WikipediaLoader(query="The Umbrella Academy TV Show", load_max_docs=3).load()
# Define chunking strategy
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=20
)
# Chunk the document
documents = text_splitter.split_documents(raw_documents)
# Drop the "summary" metadata field from each chunk (ignored if it is absent)
for d in documents:
    d.metadata.pop("summary", None)



In [4]:
## Uncomment the embedding model you want to use
# embeddings = OpenAIEmbeddings()
model_name = "sentence-transformers/all-MiniLM-L6-v2" # You can specify any sentence-transformer model from the hub
embeddings = HuggingFaceEmbeddings(model_name=model_name)



In [5]:
documents

[Document(metadata={'title': 'The Umbrella Academy (TV series)', 'source': 'https://en.wikipedia.org/wiki/The_Umbrella_Academy_(TV_series)'}, page_content='The Umbrella Academy is an American superhero television series based on the comic book series of the same name written by Gerard Way, illustrated by Gabriel Bá, and published by Dark Horse Comics. Created for Netflix by Steve Blackman and developed by Jeremy Slater, it revolves around a dysfunctional family of adopted sibling superheroes who reunite to solve the mystery of their father\'s death and the threat of an imminent apocalypse. The series is produced by Borderline Entertainment (season 1–2), Irish Cowboy (season 3), Dark Horse Entertainment, and Universal Content Productions. Netflix gave seasons 1 and 2 a TV-14 rating, while seasons 3 and 4 received a TV-MA rating.\nThe cast features Elliot Page, Tom Hopper, David Castañeda, Emmy Raver-Lampman, Robert Sheehan, Aidan Gallagher, Cameron Britton, Mary J. Blige, John Magaro, A

## Connect to Neo4j

You need credentials and other environment variables to connect to Neo4j. See the README file for more details.
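As a rough sketch, the connection settings can be validated before any call to Neo4j is made. The variable names `NEO4J_URL`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` are the ones used in this notebook; the helper `load_neo4j_config` and the placeholder values are hypothetical and only serve the illustration.

```python
import os

REQUIRED_VARS = ("NEO4J_URL", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_config(env=None):
    """Collect the connection settings, failing early if any is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "url": env["NEO4J_URL"],
        "username": env["NEO4J_USERNAME"],
        "password": env["NEO4J_PASSWORD"],
    }


# Placeholder values for illustration; real values belong in the .env file
example_env = {
    "NEO4J_URL": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "change-me",
}
config = load_neo4j_config(example_env)
print(config["url"])
```

Failing early with a clear message is usually easier to debug than a `KeyError` raised in the middle of index creation.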

In [6]:
# Neo4jVector requires the Neo4j database credentials
url = "bolt://localhost:7687"



In [7]:
# The Neo4jVector module connects to Neo4j and creates a vector index if needed.
# The database must already exist, so create it in Neo4j first.

db = Neo4jVector.from_documents(
    documents,
    OpenAIEmbeddings(),
    url=os.environ["NEO4J_URL"],
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    database="vectordb",
)

This call creates a new vector store index in our local Neo4j instance, which is now ready to query. Note that the index is defined with a fixed embedding dimension when it is created, so we cannot change the embedding model or its vector size without recreating the index.
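The dimension constraint can be illustrated with a small check. This is only a sketch: `check_dimension` is a hypothetical helper, not part of LangChain or Neo4j, and the two sizes are the output dimensions of `all-MiniLM-L6-v2` (384) and OpenAI's `text-embedding-ada-002` (1536), used here as examples.

```python
def check_dimension(index_dimension, embedding):
    """Raise if a vector does not match the dimension the index was built with."""
    if len(embedding) != index_dimension:
        raise ValueError(
            f"index expects {index_dimension} dimensions, got {len(embedding)}"
        )
    return True


# Suppose the index was created with 1536-dimensional vectors
INDEX_DIMENSION = 1536

same_model_vector = [0.0] * 1536   # vector from the model used to build the index
other_model_vector = [0.0] * 384   # vector from a smaller model

print(check_dimension(INDEX_DIMENSION, same_model_vector))

try:
    check_dimension(INDEX_DIMENSION, other_model_vector)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```

In practice this means the embedding model used for querying should be the same one that was used to populate the index.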

## Testing the retrieval

In [8]:
query = "Who are the main characters of The Umbrella Academy?"
docs_with_score = db.similarity_search_with_score(query, k=2)

In [9]:
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.9384150505065918
The Umbrella Academy is an American superhero television series based on the comic book series of the same name written by Gerard Way, illustrated by Gabriel Bá, and published by Dark Horse Comics. Created for Netflix by Steve Blackman and developed by Jeremy Slater, it revolves around a dysfunctional family of adopted sibling superheroes who reunite to solve the mystery of their father's death and the threat of an imminent apocalypse. The series is produced by Borderline Entertainment (season 1–2), Irish Cowboy (season 3), Dark Horse Entertainment, and Universal Content Productions. Netflix gave seasons 1 and 2 a TV-14 rating, while seasons 3 and 4 received a TV-MA rating.
The cast features Elliot Page, Tom Hopper, David Castañeda, Emmy Raver-Lampman, Robert Sheehan, Aidan Gallagher, Cameron Britton, Mary J. Blige, John Magaro, Adam Godley, Colm Feore, Justin H. Min, Ritu Arya, 

<a id="4"></a>
# <div style= "font-family: Cambria; font-weight:bold; letter-spacing: 0px; color: white; font-size:120%; text-align:left;padding:3.0px; background: maroon; border-bottom: 8px solid black" > Hybrid search<br><div>

### The `from_existing_graph` method initializes the embeddings from an already generated graph

In [10]:
# Now we initialize from existing graph
existing_graph = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=url,
    username=os.environ["NEO4J_USERNAME"],
    password=os.environ["NEO4J_PASSWORD"],
    database="vectordb",
    # index_name="person_index",
    node_label="Chunk",
    text_node_properties=["name", "location"],
    embedding_node_property="embedding",
    search_type="hybrid",
    keyword_index_name="keyword",
)
# result = existing_graph.similarity_search("Slovenia", k=1)

In [11]:
existing_graph.similarity_search("What is the Umbrella Academy?")

[Document(metadata={'text': 'The Umbrella Academy is an American comic book series created and written by Gerard Way and illustrated by Gabriel Bá. It follows a dysfunctional family of adopted superhero siblings with bizarre powers attempting both to save the world and find their place within it. Published by Dark Horse Comics, the comic is released as limited series, typically lasting six issues. Since 2007, three volumes have been published, as have two spin-offs. The fourth volume of the main series is currently in development.\nThe comic has garnered a close following and has been praised by critics, with the first limited series, Apocalypse Suite, winning of the 2007 Eisner Award for Best Finite Series/Limited Series. A popular television adaptation ran on Netflix for four seasons from 2019 to 2024.\n\n\n== Synopsis ==\n\n\n=== Plot summary ===\nThe titular team of The Umbrella Academy is described as a "dysfunctional family of superheroes". In the mid-20th century, at the instant

In [12]:
# # The Neo4jVector module connects to Neo4j and creates vector and keyword indices if needed.
# hybrid_db =  Neo4jVector.from_documents(
#     documents,
#     OpenAIEmbeddings(),
#     url=os.environ["NEO4J_URL"],
#     username=os.environ["NEO4J_USERNAME"],
#     password=os.environ["NEO4J_PASSWORD"],
#     database="vectordb",
#     search_type = "hybrid"
# )

In [15]:
retriever = existing_graph.as_retriever()
retriever.invoke("Who is the main character of the Umbrella Academy")[0]

Document(metadata={'text': 'The Umbrella Academy is an American comic book series created and written by Gerard Way and illustrated by Gabriel Bá. It follows a dysfunctional family of adopted superhero siblings with bizarre powers attempting both to save the world and find their place within it. Published by Dark Horse Comics, the comic is released as limited series, typically lasting six issues. Since 2007, three volumes have been published, as have two spin-offs. The fourth volume of the main series is currently in development.\nThe comic has garnered a close following and has been praised by critics, with the first limited series, Apocalypse Suite, winning of the 2007 Eisner Award for Best Finite Series/Limited Series. A popular television adaptation ran on Netflix for four seasons from 2019 to 2024. In 2019, Dark Horse Comics signed a collaboration with Studio71 to make a card game based on The Umbrella Academy.\n\n\n== Synopsis ==\n\n\n=== Plot summary ===\nThe titular team of The

In [16]:
existing_graph.retrieve_existing_fts_index()

'Chunk'

This tells us the label of the nodes covered by the index. Queries against the index always target a single node label, which was specified through the `node_label` argument when the index was defined.

In [19]:
existing_graph.similarity_search_with_relevance_scores(query="Madeup words indav 9owbsh",k=10)

[(Document(metadata={'text': 'The Umbrella Academy is an American comic book series created and written by Gerard Way and illustrated by Gabriel Bá. It follows a dysfunctional family of adopted superhero siblings with bizarre powers attempting both to save the world and find their place within it. Published by Dark Horse Comics, the comic is released as limited series, typically lasting six issues. Since 2007, three volumes have been published, as have two spin-offs. The fourth volume of the main series is currently in development.\nThe comic has garnered a close following and has been praised by critics, with the first limited series, Apocalypse Suite, winning of the 2007 Eisner Award for Best Finite Series/Limited Series. A popular television adaptation ran on Netflix for four seasons from 2019 to 2024.\n\n\n== Synopsis ==\n\n\n=== Plot summary ===\nThe titular team of The Umbrella Academy is described as a "dysfunctional family of superheroes". In the mid-20th century, at the instan

This shows another key aspect of hybrid search: the vector-similarity scores and the keyword scores are both normalized to values between 0 and 1, so the results of the two methods can be compared. The actual queries being run can be seen in the [docs](https://python.langchain.com/v0.2/api_reference/community/vectorstores/langchain_community.vectorstores.neo4j_vector.IndexType.html#langchain_community.vectorstores.neo4j_vector.IndexType) (see `_get_search_index_query`).
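The normalization described above can be sketched in plain Python. This is an approximation under stated assumptions, not the library's implementation: it assumes each result list is divided by its own top score and that, when a node appears in both lists, the higher normalized score is kept. The function names and the scores are hypothetical.

```python
def normalize_by_max(results):
    """Scale scores into [0, 1] by dividing each one by the list's top score."""
    if not results:
        return {}
    top = max(results.values())
    if top == 0:
        return {node: 0.0 for node in results}
    return {node: score / top for node, score in results.items()}


def hybrid_merge(vector_results, keyword_results, k):
    """Merge two normalized result sets, keeping the best score per node."""
    merged = {}
    for results in (normalize_by_max(vector_results), normalize_by_max(keyword_results)):
        for node, score in results.items():
            merged[node] = max(score, merged.get(node, 0.0))
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores: cosine similarities and unbounded full-text scores
vector_results = {"chunk_a": 0.92, "chunk_b": 0.46}
keyword_results = {"chunk_b": 4.0, "chunk_c": 2.0}

ranked = hybrid_merge(vector_results, keyword_results, k=3)
for node, score in ranked:
    print(node, score)
```

Under this scheme the best hit of each list always ends up with a score of 1.0, which would explain why even a nonsensical query can return highly scored results.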