# Aerospike

[Aerospike Vector Search](https://aerospike.com/docs/vector) (AVS) is a vector database built on
top of the performant and robust Aerospike database.

This notebook showcases the functionality of the LangChain Aerospike VectorStore
integration. Before we get started, we need to make sure we have a running AVS instance. Use one of the [available
installation methods](https://aerospike.com/docs/vector/install).

We will store the IP and port of the AVS instance to use later in this demo.

In [1]:
PROXIMUS_HOST="<avs-ip>"
PROXIMUS_PORT=5000

## Install Dependencies 
The `sentence-transformers` dependency is rather large, so this could take ~3-5 minutes.


In [2]:
!pip install --upgrade --quiet aerospike-vector-search sentence-transformers git+https://github.com/aerospike/langchain.git@VEC-131-add-aerospike#subdirectory=libs/community

## Download Quotes Dataset

We will download a dataset of ~500k quotes and use a subset of them for semantic search.

In [3]:
!wget https://github.com/aerospike/aerospike-vector-search-examples/raw/7dfab0fccca0852a511c6803aba46578729694b5/quote-semantic-search/container-volumes/quote-search/data/quotes.csv.tgz


--2024-05-09 17:23:42--  https://archive.org/download/quotes_20230625/quotes.csv
Resolving archive.org (archive.org)... 207.241.224.2
Connecting to archive.org (archive.org)|207.241.224.2|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://ia902709.us.archive.org/27/items/quotes_20230625/quotes.csv [following]
--2024-05-09 17:23:42--  https://ia902709.us.archive.org/27/items/quotes_20230625/quotes.csv
Resolving ia902709.us.archive.org (ia902709.us.archive.org)... 207.241.228.209
Connecting to ia902709.us.archive.org (ia902709.us.archive.org)|207.241.228.209|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 144428641 (138M) [text/csv]
Saving to: ‘quotes.csv’


2024-05-09 17:24:35 (2.62 MB/s) - ‘quotes.csv’ saved [144428641/144428641]



## Load the Quotes Into Documents
We will load our dataset of quotes using the document loader `CSVLoader`. Here `lazy_load` returns an iterator, so the quotes can be ingested more efficiently. In this example we load only 5000 quotes rather than all 500k.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
import itertools

NUM_QUOTES = 5000
documents = CSVLoader('./quotes.csv', metadata_columns=["author", "category"]).lazy_load()
documents = list(itertools.islice(documents, NUM_QUOTES)) # Allows us to slice an iterator

In [5]:
print(documents[0])

page_content="quote: I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best." metadata={'source': './quotes.csv', 'row': 0, 'author': 'Marilyn Monroe', 'category': 'attributed-no-source, best, life, love, mistakes, out-of-control, truth, worst'}
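
As a small illustration of why combining a lazy iterator with `itertools.islice` is efficient, the sketch below counts how many rows are actually parsed when only the first few are requested. It uses only the standard library; the in-memory CSV, the `lazy_rows` helper, and the `rows_read` counter are hypothetical stand-ins, not part of `CSVLoader`.

```python
import csv
import io
import itertools

# Hypothetical in-memory CSV standing in for quotes.csv.
CSV_TEXT = "quote,author\n" + "".join(
    f"quote {i},author {i}\n" for i in range(1000)
)

rows_read = 0  # counts how many rows the generator has actually produced


def lazy_rows(text):
    """Yield one dict per CSV row, parsing rows only as they are requested."""
    global rows_read
    for row in csv.DictReader(io.StringIO(text)):
        rows_read += 1
        yield row


# islice stops pulling from the generator once 5 items have been taken.
first_five = list(itertools.islice(lazy_rows(CSV_TEXT), 5))

print(len(first_five), rows_read)  # 5 5
```

Only 5 of the 1000 rows are parsed, which is the same reason slicing the `lazy_load` iterator avoids reading the whole quotes file.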


## Create your Embedder
Here we are using HuggingFaceEmbeddings and a sentence transformer model "all-MiniLM-L6-v2" to embed our documents so that we can perform a vector search.

In [6]:
from langchain_community.embeddings import HuggingFaceEmbeddings

MODEL_DIM = 384
embedder = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")



## Create an Aerospike Index and Embed Documents

Before we add documents, we need to create an index. In this example, we have some convenience code that checks whether the expected index already exists. Make sure you have replaced `<avs-ip>` in `PROXIMUS_HOST` above with the IP of your AVS instance.

In [7]:
from langchain_community.vectorstores import Aerospike
from aerospike_vector_search import AdminClient, Client, HostPort
from aerospike_vector_search.types import VectorDistanceMetric

# Here we are using the AVS host and port you configured earlier
seed = HostPort(host=PROXIMUS_HOST, port=PROXIMUS_PORT) 

# The namespace in which to place our vectors. This should match the namespace configured in your aerospike.conf file.
NAMESPACE = "test"

# The name of our new index.
INDEX_NAME = "quote-miniLM-L6-v2"

# AVS needs to know which metadata key contains our vector when creating the index and inserting documents.
VECTOR_KEY = "vector" 

client = Client(
    seeds=seed
)
admin_client = AdminClient(
    seeds=seed,
)
index_exists = False

# Check if the index already exists. If not, create it
for index in admin_client.index_list():
    if (
        index["id"]["namespace"] == NAMESPACE
        and index["id"]["name"] == INDEX_NAME
    ):
        index_exists = True
        print(f"{INDEX_NAME} already exists. Skipping creation")
        break

if not index_exists:
    print(f"{INDEX_NAME} does not exist. Creating index")
    admin_client.index_create(
        namespace=NAMESPACE,
        name=INDEX_NAME,
        vector_field=VECTOR_KEY,
        vector_distance_metric=VectorDistanceMetric.COSINE,
        dimensions=MODEL_DIM,
        index_meta_data={
            "model": "miniLM-L6-v2",
            "date": "05/04/2024",
            "dim": str(MODEL_DIM),
            "distance": "cosine",
        }
    )

aerospike = Aerospike.from_documents(
    documents,
    embedder,
    client,
    NAMESPACE,
    vector_key=VECTOR_KEY,
    index_name=INDEX_NAME,
)
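
The index above is created with `VectorDistanceMetric.COSINE`. As a rough illustration of what a cosine metric measures, here is a pure-Python sketch. It assumes the common definition of cosine distance as `1 - cosine similarity`; AVS may compute or scale distances differently internally, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed definition: 0 for same direction, 1 for orthogonal, 2 for opposite."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # 0.0 (same direction, different length)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal)
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0 (opposite direction)
```

Because only the angle matters, vector length does not affect the result, which is one reason cosine is a common choice for sentence embeddings.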

## Search the Documents
Now that we've inserted our vectors, we can use vector search on our quotes.

In [8]:
query = "A quote about the beauty of the cosmos"
docs = aerospike.similarity_search(query, k=5, index_name=INDEX_NAME, metadata_keys=["_id", "author"])

def print_documents(docs):
    for i, doc in enumerate(docs):
        print("~~~~ Document", i, "~~~~")
        print("auto-generated id:", doc.metadata["_id"])
        print("author: ",doc.metadata["author"])
        print(doc.page_content)
        print("~~~~~~~~~~~~~~~~~~~~\n")
    
print_documents(docs)

~~~~ Document 0 ~~~~
auto-generated id: 7afc2aee-d22b-4527-81a8-c168038ee9ae
author:  Carl Sagan, Cosmos
quote: The Cosmos is all that is or was or ever will be. Our feeblest contemplations of the Cosmos stir us -- there is a tingling in the spine, a catch in the voice, a faint sensation, as if a distant memory, of falling from a height. We know we are approaching the greatest of mysteries.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 1 ~~~~
auto-generated id: e740c36c-b411-4d7e-bc6a-6f1e18ad93e3
author:  Renee Ahdieh, The Rose & the Dagger
quote: From the stars, to the stars.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 2 ~~~~
auto-generated id: b09d0587-5703-4c3a-b0b6-05810b517773
author:  Elizabeth Gilbert
quote: The love that moves the sun and the other stars.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 3 ~~~~
auto-generated id: 931dec9a-30b1-4f2d-a0ce-ad8148cf7094
author:  Dante Alighieri, Paradiso
quote: Love, that moves the sun and the other stars
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 4 ~~~~
auto-generated

## Embedding Additional Quotes as Text
We can also add more quotes by using `add_texts`.

In [9]:
aerospike = Aerospike(
    client,
    embedder,
    NAMESPACE,
    index_name=INDEX_NAME,
    vector_key=VECTOR_KEY,
)

ids = aerospike.add_texts(
    [
        "quote: Rebellions are built on hope.", 
        "quote: Logic is the beginning of wisdom, not the end.",
        "quote: If wishes were fishes, we’d all cast nets."
    ],
    metadatas=[
        {"author": "Jyn Erso, Rogue One"}, 
        {"author": "Spock, Star Trek"},
        {"author": "Frank Herbert, Dune"},
    ],
)

print("New IDs")
print(ids)

New IDs
['11b37904-b2cc-4f07-8346-b49f256f6d0b', '2c590987-aeb7-44ad-a21a-2645eee86657', '893eeb4f-7580-45f0-8120-785221271395']


## Search Documents Using Max Marginal Relevance Search

We can use max marginal relevance (MMR) search to find vectors that are similar to our query but dissimilar to each other. In this example, we create a retriever object using `as_retriever`, but this could be done just as easily by calling `aerospike.max_marginal_relevance_search` directly. The `lambda_mult` entry in `search_kwargs` determines the diversity of our query response: 0 corresponds to maximum diversity and 1 to minimum diversity.
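
To make the effect of `lambda_mult` concrete, here is a minimal pure-Python sketch of the general MMR selection rule on toy 2-D vectors. It is an illustration under stated assumptions, not the integration's internal code: the `mmr` and `cosine_similarity` helpers and the toy vectors are hypothetical, and the scoring rule (`lambda_mult * relevance - (1 - lambda_mult) * redundancy`) is the commonly used formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query, candidates, k, lambda_mult):
    """Return the indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
# Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
candidates = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]

print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]: relevance only
print(mmr(query, candidates, k=2, lambda_mult=0.2))  # [0, 2]: favors diversity
```

With `lambda_mult=1.0` the near-duplicate is returned second; lowering it penalizes redundancy enough that the dissimilar candidate is picked instead.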

In [10]:
query = "A quote about our favorite four-legged pets"
retriever = aerospike.as_retriever(search_type="mmr", search_kwargs={"fetch_k": 20, "lambda_mult":0.7})
matched_docs = retriever.invoke(query)

print_documents(matched_docs)

~~~~ Document 0 ~~~~
auto-generated id: e3d8ac56-5505-45bc-922d-d3189f7fa3bb
author:  John Grogan, Marley and Me: Life and Love With the World's Worst Dog
quote: Such short little lives our pets have to spend with us, and they spend most of it waiting for us to come home each day. It is amazing how much love and laughter they bring into our lives and even how much closer we become with each other because of them.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 1 ~~~~
auto-generated id: fcc06179-7c6f-4780-8d9a-d0d9ab3cf346
author:  Colleen Houck, Tiger's Curse
quote: He then put both hands on the door on either side of my head and leaned in close, pinning me against it. I trembled like a downy rabbit caught in the clutches of a wolf. The wolf came closer. He bent his head and began nuzzling my cheek. The problem was…I wanted the wolf to devour me.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 2 ~~~~
auto-generated id: 15e9a694-de71-47fd-adfc-8179fcdb62a4
author:  Roger A. Caras
quote: Dogs have given us thei

## Search Documents with a Relevance Threshold

Another useful feature is a similarity search with a relevance threshold. Generally, we want only the results that are most similar to our query and that also fall within some range of proximity. A relevance of 1 is most similar and a relevance of 0 is most dissimilar.
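
Conceptually, a relevance threshold is just a filter over scored results. The sketch below shows that idea in plain Python; the `filter_by_relevance` helper and the example scores are hypothetical and only illustrate the behavior, not how the retriever is implemented.

```python
def filter_by_relevance(scored_docs, score_threshold):
    """Keep (text, relevance) pairs whose relevance is at least score_threshold."""
    return [(text, score) for text, score in scored_docs if score >= score_threshold]


# Hypothetical (text, relevance) pairs, with relevance in the range 0 to 1.
scored = [
    ("quote about storms", 0.62),
    ("quote about rain", 0.45),
    ("quote about cooking", 0.12),
]

print(filter_by_relevance(scored, score_threshold=0.4))
# [('quote about storms', 0.62), ('quote about rain', 0.45)]
```

Raising the threshold shrinks the result set, so a value that is too strict can return no documents at all.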

In [11]:
query = "A quote about stormy weather"
retriever = aerospike.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.4} # A greater value returns items with more relevance
)
matched_docs = retriever.invoke(query)

print_documents(matched_docs)

~~~~ Document 0 ~~~~
auto-generated id: 796b6eb1-fa79-49f7-ac4d-1adc9fd8f081
author:  Roy T. Bennett, The Light in the Heart
quote: Never lose hope. Storms make people stronger and never last forever.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 1 ~~~~
auto-generated id: f01592be-d8c4-40f7-96ee-0cad56f6d358
author:  Roy T. Bennett, The Light in the Heart
quote: Difficulties and adversities viciously force all their might on us and cause us to fall apart, but they are necessary elements of individual growth and reveal our true potential. We have got to endure and overcome them, and move forward. Never lose hope. Storms make people stronger and never last forever.
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 2 ~~~~
auto-generated id: c3ad5654-8962-4dcf-b885-75c42769e6be
author:  Vincent van Gogh, The Letters of Vincent van Gogh
quote: There is peace even in the storm
~~~~~~~~~~~~~~~~~~~~

~~~~ Document 3 ~~~~
auto-generated id: 2be8745c-e523-4b24-b355-a38ae469d88d
author:  Edwin Morgan, A Book of Lives
qu


## Clean up

We need to close our clients to release resources and clean up threads.

In [None]:
admin_client.close()
client.close()
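
In a script rather than a notebook, one way to make sure `close()` runs even when an earlier step raises is `contextlib.closing` from the standard library. The sketch below uses a hypothetical `FakeClient` stand-in so that it runs without an AVS instance; it only assumes the object exposes a `close()` method, as the clients above do.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in for any object that exposes close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


client_stub = FakeClient()
try:
    # closing() calls client_stub.close() when the block exits, even on error.
    with closing(client_stub):
        raise RuntimeError("simulated failure during a search")
except RuntimeError:
    pass

print(client_stub.closed)  # True
```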

## Ready. Set. Search!

Now that you are up to speed with Aerospike Vector Search's LangChain integration, you have the power of the Aerospike Database and the LangChain ecosystem at your fingertips. Happy building!