# FalkorDB Vector Index
<a href="https://docs.falkordb.com/" target="_blank">FalkorDB</a> is an open-source graph database with integrated support for vector similarity search.

It supports:
- approximate nearest neighbor search
- Euclidean and cosine similarity
- hybrid search combining vector and keyword searches

This notebook shows how to use the FalkorDB vector index (`FalkorDBVector`).

See the <a href="https://docs.falkordb.com/" target="_blank">installation instructions</a>.



In [1]:
# Install the necessary packages
%pip install --quiet langchain-openai langchain-community
%pip install --quiet falkordb

Note: you may need to restart the kernel to use updated packages.

We want to use `OpenAIEmbeddings`, so we need an OpenAI API key.

In [None]:
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.falkordb_vector import FalkorDBVector
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

In [4]:
loader = TextLoader("../../how_to/state_of_the_union.txt", encoding='utf-8')

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

You can run FalkorDB locally with Docker and connect FalkorDBVector to it. See the <a href="https://docs.falkordb.com/" target="_blank">installation instructions</a>.

In [5]:
host = 'localhost'
port = 6379

Or you can use FalkorDBVector with <a href="https://app.falkordb.cloud">FalkorDB Cloud</a>.

In [6]:
# E.g
# host = "r-6jissuruar.instance-zwb082gpf.hc-v8noonp0c.europe-west1.gcp.f2e0a955bb84.cloud"
# port = 62471
# username = "falkordb" # SET ON FALKORDB CLOUD
# password = "password" # SET ON FALKORDB CLOUD
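To avoid hard-coding cloud credentials in the notebook, you can collect the connection settings from environment variables. This is a minimal sketch: the helper `falkordb_connection_kwargs` and the variable names `FALKORDB_HOST`, `FALKORDB_PORT`, `FALKORDB_USERNAME` and `FALKORDB_PASSWORD` are hypothetical choices, not part of FalkorDB or LangChain.

```python
import os


def falkordb_connection_kwargs():
    """Collect FalkorDB connection settings from (hypothetical) environment variables.

    Defaults match the local setup above. Username and password are only
    included when set, since a local instance usually does not need them.
    """
    kwargs = {
        "host": os.environ.get("FALKORDB_HOST", "localhost"),
        "port": int(os.environ.get("FALKORDB_PORT", "6379")),
    }
    for key, env_var in (("username", "FALKORDB_USERNAME"), ("password", "FALKORDB_PASSWORD")):
        value = os.environ.get(env_var)
        if value:
            kwargs[key] = value
    return kwargs
```

Assuming FalkorDBVector accepts `username` and `password` alongside `host` and `port` (as the commented cloud settings suggest), the result can be unpacked into calls such as `FalkorDBVector.from_documents(docs, embeddings, **falkordb_connection_kwargs())`.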

## Similarity Search with Cosine Distance (Default)
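A vector index answers a query by ranking stored embeddings by their distance to the query embedding. As a rough, library-independent sketch (toy 3-dimensional vectors instead of 1536-dimensional OpenAI embeddings; FalkorDB's internal scoring may differ in detail), here is how cosine and Euclidean distance treat vectors differently. This is worth knowing because the index listing further down reports `euclidean` as the similarity function.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line (L2) distance: 0.0 means the vectors are identical."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vec = [1.0, 0.0, 0.0]
close_vec = [2.0, 0.0, 0.0]  # same direction, different length
far_vec = [0.0, 1.0, 0.0]    # orthogonal to query_vec

print(cosine_distance(query_vec, close_vec))     # 0.0 (length is ignored)
print(euclidean_distance(query_vec, close_vec))  # 1.0 (length matters)
print(cosine_distance(query_vec, far_vec))       # 1.0
print(euclidean_distance(query_vec, far_vec))    # 1.4142135623730951
```

With either measure, a smaller value means a closer match, which is consistent with the near-zero score returned later when searching for the exact text "foo".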



In [7]:
# The FalkorDBVector Module will connect to FalkorDB and create a vector index if needed.
db = FalkorDBVector.from_documents(
    docs, OpenAIEmbeddings(), host=host, port=port,
)

Database name:  ZeCf
Metadatas:  [{'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'st

Let's check that all the indexes were created properly. `process_index_data` is a helper function that processes the result of the `CALL db.indexes()` query.

In [8]:
from langchain_community.vectorstores.falkordb_vector import process_index_data

result = db._query(
query = """
CALL db.indexes()
"""
)

process_index_data(result)


[{'entity_label': 'Chunk',
  'entity_property': 'embedding',
  'entity_type': 'NODE',
  'index_type': 'VECTOR',
  'index_status': 'OPERATIONAL',
  'index_dimension': 1536,
  'index_similarityFunction': 'euclidean'}]

You should see something similar to:

`
[{'entity_label': 'Chunk', ...'index_similarityFunction': 'euclidean'}]
`

If you do, yay! We successfully created a vector store from our `docs` variable. Let's still check that our documents are represented properly in the vector store, that is, as nodes.

In [9]:
result2 = db._query(
    query="MATCH (n) RETURN n"
)
for node in result2:
    print(node[0])

(:Chunk{embedding:[-0.00357032613828778, -0.0104853445664048, -0.0186745300889015, -0.0178781747817993, 0.00581007543951273, 0.0200283341109753, 0.0149316610768437, -0.00952971819788218, -0.00311076268553734, -0.00655665853992105, 0.0153298387303948, 0.00899881403893232, -0.0209175981581211, 0.000946501386351883, 0.00198259274475276, 0.00631443364545703, 0.0182365346699953, -0.0147325722500682, 0.029704051092267, -0.0223377645015717, 0.00977526046335697, -0.0132062248885632, 0.0153563842177391, 0.0124563239514828, -0.000565744063351303, 0.0110494289547205, 0.0369774289429188, -0.0340574607253075, 0.0191788896918297, -0.0244746524840593, -0.012409869581461, -0.0296775065362453, -0.0239702928811312, -0.00475490465760231, -0.0248462837189436, -0.0209441427141428, -0.0133854048326612, -0.0122041441500187, 0.00355373532511294, -0.0184887144714594, 0.00743928551673889, -0.00229781679809093, -0.0111157922074199, 0.00570057658478618, -0.0247401036322117, 0.00558775942772627, -0.019656702876091

In [10]:
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = db.similarity_search_with_score(query, k=2)

In [11]:
for doc, score in docs_with_score:
    print("-" * 80)
    print("Score: ", score)
    print(doc.page_content)
    print("-" * 80)

--------------------------------------------------------------------------------
Score:  0.607930839061737
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. 

Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. 

One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. 

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
--------------------------------------------------------------------------------
------------------------

## Working with vectorstore

Above, we created a vector store from scratch. However, we often want to work with an existing vector store. To do that, we can initialize it directly.

<b>NOTE:</b> FalkorDB does not support index names for any <a href="https://docs.falkordb.com/cypher/indexing#vector-indexing" target='_blank'>index type</a>. In FalkorDBVector, indexes are identified by their node labels or relationship types.

In [12]:
node_label = "Chunk" # default node label

store = FalkorDBVector.from_existing_index(
    OpenAIEmbeddings(),
    host=host,
    port=port,
    node_label=node_label
)

Database name:  ZeCf


We can also initialize a vector store from an existing graph using the `from_existing_graph` method. This method pulls the relevant text properties from the database, calculates their embeddings, and stores the embeddings back in the database.

In [13]:
# first we create sample data in graph
result = store._query(
    "MERGE (p:Person {name: 'Tomaz', location:'Slovenia', hobby:'Bicycle', age: 33}) RETURN p"
)

As usual, we check that our sample data was created in the graph 🧐

In [14]:
print(result[0][0])

(:Person{age:33,hobby:"Bicycle",location:"Slovenia",name:"Tomaz"})


`(:Person{age:33,...,name:"Tomaz"})` Perfect!

In [15]:
# Now we initialize from existing graph (using the graph/database's name)
# NOTE: All graphs are databases but not all databases are graphs
existing_graph = FalkorDBVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    database=store.database_name,
    host=host,
    port=port,
    node_label="Person",
    text_node_properties=["name", "location"],
    embedding_node_property="embedding",
)


Database name:  ZeCf
Making index for node label: Person


`Making index for node label: Person` If your output contains that line and no errors, you are on the right track. Now let's run a search.

In [16]:
result = existing_graph.similarity_search("Slovenia", k=1)
result[0]

Docs_and_scores:  [(Document(metadata={'hobby': 'Bicycle', 'age': 33}, page_content='\nname: Tomaz\nlocation: Slovenia'), 0.496724128723145)]


Document(metadata={'hobby': 'Bicycle', 'age': 33}, page_content='\nname: Tomaz\nlocation: Slovenia')

`Document...location: Slovenia')` Nice! You are getting the hang of this.
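Judging from the output above, the `page_content` seems to be built by writing each property listed in `text_node_properties` as `\nproperty: value`, while the other properties (except `embedding`) end up in `metadata`. The two helpers below are hypothetical and only reproduce that observed format; they are not FalkorDBVector's internal code, and they assume every listed property exists on the node.

```python
def format_node_text(node_properties, text_node_properties):
    """Hypothetical: one '\\n<property>: <value>' segment per text property."""
    return "".join(f"\n{prop}: {node_properties[prop]}" for prop in text_node_properties)


def remaining_metadata(node_properties, text_node_properties, embedding_property="embedding"):
    """Hypothetical: every property that is neither a text property nor the embedding."""
    excluded = set(text_node_properties) | {embedding_property}
    return {key: value for key, value in node_properties.items() if key not in excluded}


person = {"name": "Tomaz", "location": "Slovenia", "hobby": "Bicycle", "age": 33}
print(repr(format_node_text(person, ["name", "location"])))
# '\nname: Tomaz\nlocation: Slovenia'
print(remaining_metadata(person, ["name", "location"]))
# {'hobby': 'Bicycle', 'age': 33}
```

This also explains why searching for "Slovenia" finds Tomaz: `location` was one of the `text_node_properties`, so it became part of the embedded text.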

FalkorDB also supports relationship vector indexes, where an embedding is stored as a relationship property and indexed. A relationship vector index cannot be populated via LangChain, but you can connect to an existing relationship vector index.

In [17]:
# First we create sample data and index in graph
embedding = OpenAIEmbeddings().embed_query("example text")
embedding_dimension = len(embedding)
embedding_node_property = "embedding"
relation_type = "FRIEND"


store._query(
    "MERGE (p1:Person {name: 'Tomaz'}) "
    "MERGE (p2:Person {name:'Leann'}) "
    f"MERGE (p1)-[:`{relation_type}`"
    "{text:'example text', embedding:vecf32($embedding)}]->(p2)",
    params={f"{embedding_node_property}": embedding},
)

[]
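The query above is assembled from adjacent string literals, and only the piece that needs `relation_type` is an f-string. The standalone check below (no database needed) prints the resulting statement and shows why: if every piece were an f-string, the literal braces of the Cypher maps would have to be doubled.

```python
relation_type = "FRIEND"

# Same pieces as in the cell above: plain strings keep Cypher's {...} maps as-is.
cypher = (
    "MERGE (p1:Person {name: 'Tomaz'}) "
    "MERGE (p2:Person {name:'Leann'}) "
    f"MERGE (p1)-[:`{relation_type}`"
    "{text:'example text', embedding:vecf32($embedding)}]->(p2)"
)

# Equivalent all-f-string version: literal braces must be written as {{ and }}.
cypher_single = (
    f"MERGE (p1:Person {{name: 'Tomaz'}}) "
    f"MERGE (p2:Person {{name:'Leann'}}) "
    f"MERGE (p1)-[:`{relation_type}`{{text:'example text', embedding:vecf32($embedding)}}]->(p2)"
)

print(cypher)
print(cypher == cypher_single)  # True
```

Because every clause uses `MERGE`, running the cell again should not create duplicate nodes or relationships with the same properties.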

As usual, we check that our data was created.

In [18]:
result = store._query(
    "MATCH (p:Person) "
    "OPTIONAL MATCH (p)-[r:FRIEND]->(f:Person) "
    "RETURN p, r, f"
)
for person in result[0]:
    print(person)

(:Person{age:33,embedding:[0.00747425435110927, 0.000132322296849452, -0.0107339518144727, -0.0228112936019897, 0.0113068679347634, 0.0612691305577755, -0.0376939512789249, 0.00152613082900643, 0.010694439522922, -0.0291858110576868, 0.00997664779424667, 0.0075664478354156, 0.00821180175989866, -0.0169504247605801, 0.00377993145957589, -0.0147114405408502, 0.00925885606557131, -0.00408285250887275, 0.0157123971730471, -0.0242600478231907, -0.0172401741147041, 0.00277403509244323, 0.0178065057843924, -0.0109051680192351, -0.000319178652716801, -0.0101610347628593, 0.0217708237469196, -0.0154621582478285, 0.0200981721282005, -0.0233117714524269, 0.001980512868613, -0.0227717813104391, -0.0192947722971439, -0.0190181918442249, -0.0120905125513673, -0.0101478649303317, 0.00521222222596407, -0.00995689257979393, 0.0207303557544947, -0.032583799213171, 0.0191498957574368, 0.000761419127229601, -0.00137878593523055, -0.0334003679454327, -0.01348658464849, 0.0242337062954903, 0.004932349547743

`...(:Person{name:"Leann"})` Nice! Our data was created successfully. Next, we create a vector index on the `FRIEND` relationship.

In [19]:
# Create a vector index
store.create_new_index_on_relationship(
    relation_type=relation_type,
    embedding_node_property=embedding_node_property,
    embedding_dimension=embedding_dimension
)

In [20]:
relationship_vector = FalkorDBVector.from_existing_relationship_index(
    OpenAIEmbeddings(),
    host=host,
    port=port,
    relation_type=relation_type,
    text_node_property="text"
)

Database name:  ZeCf


In [21]:
relationship_vector.retrieve_existing_relationship_index()

(1536, 'RELATIONSHIP', 'FRIEND', 'embedding')

In [22]:
relationship_vector.similarity_search("Example")

Docs_and_scores:  [(Document(metadata={'text': 'example text'}, page_content='example text'), 0.508674323558807)]


[Document(metadata={'text': 'example text'}, page_content='example text')]

## Add documents
We can add documents to the existing vector store.

In [23]:
store.add_documents([Document(page_content="foo")])

Database name:  ZeCf
Metadatas:  [{}]


['acbd18db4cc2f85cedef654fccc4a4d8']

In [24]:
docs_with_score = store.similarity_search_with_score("foo")

In [25]:
docs_with_score[0]

(Document(metadata={'text': 'foo', 'id': 'acbd18db4cc2f85cedef654fccc4a4d8'}, page_content='foo'),
 0.00301193865016103)
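The returned ID equals the MD5 hex digest of the text `foo`, which suggests that, by default, document IDs are derived from the page content. This is an inference from the output above rather than documented behavior; a quick check with the standard library:

```python
import hashlib


def content_hash_id(text):
    """MD5 hex digest of the UTF-8 encoded text (assumed default ID scheme)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


doc_id = content_hash_id("foo")
print(doc_id)  # acbd18db4cc2f85cedef654fccc4a4d8
```

If that assumption holds, identical texts would share the same ID.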

## Customize response with retrieval query

You can also customize responses with a custom Cypher snippet that fetches other information from the graph. Under the hood, the final Cypher statement is constructed like so:

    read_query = (
        "CALL db.idx.vector.queryNodes($entity_label, $entity_property, $k, vecf32($embedding)) "
        "YIELD node, score "
    ) + retrieval_query


The retrieval query must return the following three columns:
- `text`: Union[str, Dict] = value used to populate the `page_content` of a document
- `score`: Float = similarity score
- `metadata`: Dict = additional metadata of a document


In [26]:
retrieval_query = """
RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata
"""
retrieval_example = FalkorDBVector.from_existing_index(
    OpenAIEmbeddings(),
    host=host,
    port=port,
    node_label='Person',
    retrieval_query=retrieval_query,
)

retrieval_example.similarity_search("Foo", k=1)

Database name:  ZeCf
Docs_and_scores:  [(Document(metadata={'foo': 'bar'}, page_content='Name:Tomaz'), 0.658456861972809)]


[Document(metadata={'foo': 'bar'}, page_content='Name:Tomaz')]
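Combining the prefix shown earlier with the `retrieval_query` from the `Person` example gives the full statement that reaches FalkorDB. This sketch only concatenates strings; the `$`-parameters are filled in by FalkorDBVector at query time, and the retrieval query is written on one line here (the original's surrounding newlines are just whitespace to Cypher).

```python
# Prefix copied from the construction snippet above.
read_query_prefix = (
    "CALL db.idx.vector.queryNodes($entity_label, $entity_property, $k, vecf32($embedding)) "
    "YIELD node, score "
)

# The retrieval query from the Person example, returning text, score and metadata.
example_retrieval_query = 'RETURN "Name:" + node.name AS text, score, {foo:"bar"} AS metadata'

read_query = read_query_prefix + example_retrieval_query
print(read_query)
```

Because the snippet is appended directly after `YIELD node, score`, it can only refer to `node` and `score` (plus anything it matches itself), and it must end by returning the three required columns.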

The `text` column can also be a dictionary, for example all node properties except for the embedding.
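A hedged example of such a query for the `Person` nodes created earlier. It uses a Cypher map literal (the same syntax as `{foo:"bar"}` above) that lists every property except `embedding`; the property names come from the sample data, so adjust them for your own graph.

```python
# Return the non-embedding properties of each matched node as a map in the `text` column.
dict_retrieval_query = """
RETURN {name: node.name, location: node.location, hobby: node.hobby, age: node.age} AS text,
       score,
       {foo: "bar"} AS metadata
"""
```

Pass it as `retrieval_query=dict_retrieval_query` to `FalkorDBVector.from_existing_index(...)` in the same way as in the previous cell. How the dictionary is rendered into `page_content` is up to FalkorDBVector, and nodes without a given property (such as Leann) would return `null` for it.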

## Hybrid search (vector + keyword)

FalkorDB integrates both vector and keyword indexes, which allows you to use a hybrid search approach.
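This notebook does not describe exactly how FalkorDBVector merges the two result lists. As an illustration of one common approach (an assumption, not the library's code), each list's scores can be divided by that list's top score so the vector and keyword scales become comparable, and a document found by both searches keeps its best score. The sketch assumes both lists use higher-is-better scores; a distance would first need converting, for example to `1 - distance`.

```python
def merge_hybrid_results(vector_hits, keyword_hits):
    """Merge two lists of (doc_id, score) pairs, higher score = better in each."""

    def normalize(hits):
        # Scale each list by its own maximum so both end up in [0, 1].
        if not hits:
            return []
        top = max(score for _, score in hits)
        if top <= 0:
            return [(doc_id, 0.0) for doc_id, _ in hits]
        return [(doc_id, score / top) for doc_id, score in hits]

    merged = {}
    for doc_id, score in normalize(vector_hits) + normalize(keyword_hits):
        merged[doc_id] = max(score, merged.get(doc_id, 0.0))
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


vector_hits = [("chunk-a", 1.0), ("chunk-b", 0.75)]
keyword_hits = [("chunk-a", 4.0), ("chunk-c", 2.0)]
print(merge_hybrid_results(vector_hits, keyword_hits))
# [('chunk-a', 1.0), ('chunk-b', 0.75), ('chunk-c', 0.5)]
```

Normalizing per list matters because keyword relevance scores and vector similarities live on different scales; without it, one search would dominate the ranking.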

In [27]:
# FalkorDBVector will connect to FalkorDB and create vector and keyword indexes if needed.
hybrid_db = FalkorDBVector.from_documents(
    docs,
    OpenAIEmbeddings(),
    host=host,
    port=port,
    search_type="hybrid",
)

Database name:  ZeCf
Metadatas:  [{'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'state_of_union.txt'}, {'source': 'st

To load hybrid search from existing indexes, you have to provide the label of the nodes being searched.

In [28]:
node_label = 'Chunk' #default node label

store = FalkorDBVector.from_existing_index(
    OpenAIEmbeddings(),
    node_label=node_label,
    host=host,
    port=port,
    search_type="hybrid",
)

Database name:  ZeCf


## Retriever options
This section shows how to use `FalkorDBVector` as a retriever.

In [29]:
retriever = store.as_retriever()
retriever.invoke(query)[0]

Docs_and_scores:  [(Document(metadata={'text': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'id': '782fb46f146c20294

Document(metadata={'text': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'id': '782fb46f146c2029453b8532e5502691', 's

## Question Answering with Sources
This section shows how to do question answering with sources over an index. It uses
`RetrievalQAWithSourcesChain`, which looks up documents from the index.

In [30]:
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_openai import ChatOpenAI

In [31]:
chain = RetrievalQAWithSourcesChain.from_chain_type(
    ChatOpenAI(temperature=0), chain_type="stuff", retriever=retriever
)

In [32]:
chain.invoke(
    {"question": "What did the president say about Justice Breyer"},
    return_only_outputs=True,
)

Docs_and_scores:  [(Document(metadata={'text': 'A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. \n\nAnd if we are to advance liberty and justice, we need to secure the Border and fix the immigration system. \n\nWe can do both. At our border, we’ve installed new technology like cutting-edge scanners to better detect drug smuggling.  \n\nWe’ve set up joint patrols with Mexico and Guatemala to catch more human traffickers.  \n\nWe’re putting in place dedicated immigration judges so families fleeing persecution and violence can have their cases heard faster. \n\nWe’re securing commitments and supporting partners in South and Central America to host more refugees and secure their own borders.', 'id': '782fb46f146c20294

{'answer': 'The president honored Justice Stephen Breyer for his service.\n',
 'sources': 'state_of_union.txt'}