# Cloudflare Vectorize Walkthrough

This notebook demonstrates Cloudflare Vectorize's functionality via the LangChain community integration.

In [1]:
import json
import asyncio
import warnings
from uuid import uuid4
import os
from dotenv import load_dotenv

warnings.filterwarnings('ignore')

from langchain_community.embeddings.cloudflare_workersai import CloudflareWorkersAIEmbeddings
from langchain_community.vectorstores.cloudflare_vectorize import CloudflareVectorize

from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Setup/Params

In [2]:
# name your vectorize index
vectorize_index_name = f"test-langchain-{uuid4().hex}"

## Embeddings

For storage of embeddings, semantic search and retrieval, you must embed your raw values as embeddings.  Specify an embedding model, one available on WorkersAI

[https://developers.cloudflare.com/workers-ai/models/](https://developers.cloudflare.com/workers-ai/models/)

In [3]:
MODEL_WORKERSAI = "@cf/baai/bge-large-en-v1.5"

## Raw Values with D1

Vectorize only stores embeddings, metadata and namespaces. If you want to store and retrieve raw values, you must leverage Cloudflare's SQL Database D1.

You can create a database here and retrieve its id:

[https://dash.cloudflare.com/YOUR-ACCT-NUMBER/workers/d1

In [4]:
# provide the id of your D1 Database
d1_database_id = "8ce9ce08-8961-475c-98fb-1ef0e6e4ca40"

## API Tokens

This Python package is a wrapper around Cloudflare's REST API.  To interact with the API, you need to provid an API token with the appropriate privileges.

You can create and manage API tokens here:

https://dash.cloudflare.com/YOUR-ACCT-NUMBER/api-tokens

In [5]:
load_dotenv(".env");

**Note:**
CloudflareVectorize depends on WorkersAI, Vectorize (and D1 if you are using it to store and retrieve raw values).

While you can create a single `api_token` with Edit privileges to all needed resources (WorkersAI, Vectorize & D1), you may want to follow the principle of "least privilege access" and create separate API tokens for each service


In [6]:
cf_acct_id = os.getenv("cf_acct_id")

# single token with WorkersAI, Vectorize & D1
api_token = os.getenv("api_token")

# OR, separate tokens with access to each service
cf_vectorize_token = os.getenv("cf_vectorize_token")
cf_d1_token = os.getenv("d1_api_token")

# Documents

For this example, we will use LangChain's Wikipedia loader to pull an article about Cloudflare.  We will store this in Vectorize and query its contents later.

In [7]:
docs = WikipediaLoader(query="Cloudflare", load_max_docs=2).load()

We will then create some simple chunks with metadata based on the chunk sections.

In [8]:
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)
texts = text_splitter.create_documents([docs[0].page_content])

running_section = ""
for idx, text in enumerate(texts):
    if text.page_content.startswith("="):
        running_section = text.page_content
        running_section = running_section.replace("=", "").strip()
    else:
        if running_section == "":
            text.metadata = {"section": "Introduction"}
        else:
            text.metadata = {"section": running_section}

These chunks look like this:


In [9]:
print(texts[0], "\n\n", texts[-1])

page_content='Cloudflare, Inc., is an American company that provides content delivery network services,' metadata={'section': 'Introduction'} 

 page_content='attacks, Cloudflare ended up being attacked as well; Google and other companies eventually' metadata={'section': 'DDoS mitigation'}


# Embeddings

In this example, we will create some embeddings using an embeddings model from WorkersAI and the `CloudflareWorkersAIEmbeddings` class from LangChain.

This will instantiate that "embedder" for later use.


In [10]:
cf_ai_token = os.getenv("cf_ai_token")  # needed if you want to use workersAI for embeddings

embedder = CloudflareWorkersAIEmbeddings(
    account_id=cf_acct_id,
    api_token=cf_ai_token,
    model_name=MODEL_WORKERSAI
)

# CloudflareVectorize Class

Now we can create the CloudflareVectorize instance.  Here we passed:

* The `embedding` instance from earlier
* The account ID
* A global API token for all services (WorkersAI, Vectorize, D1)
* Individual API tokens for each service

In [11]:
cfVect = CloudflareVectorize(
    embedding=embedder,
    account_id=cf_acct_id,
    api_token=api_token,  #(Optional if using service-specific token)
    d1_api_token=cf_d1_token,  #(Optional if using global token)
    vectorize_api_token=cf_vectorize_token,  #(Optional if using global token)
    d1_database_id=d1_database_id,  #(Optional if not using D1)
)

**Note:** These service-specific tokens (if provided) will take preference over a global token.  You could provide these instead of a global token.


# Cleanup
Before we get started, let's delete any `test-langchain*` indexes we have for this walkthrough

In [12]:
# depending on your notebook environment you might need to include:
# import nest_asyncio
# nest_asyncio.apply()

arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name"))
    for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);

## Gotchyas

A few "gotchyas" are shown below for various missing token/parameter combinations

D1 Database ID provided but no "global" `api_token` and no `d1_api_token`

In [13]:
try:
    cfVect = CloudflareVectorize(
        embedding=embedder,
        account_id=cf_acct_id,
        # api_token=api_token, #(Optional if using service-specific token)
        ai_api_token=cf_ai_token,  #(Optional if using global token)
        # d1_api_token=cf_d1_token,  #(Optional if using global token)
        vectorize_api_token=cf_vectorize_token,  #(Optional if using global token)
        d1_database_id=d1_database_id,  #(Optional if not using D1)
    )
except Exception as e:
    print(str(e))

`d1_database_id` provided, but no global `api_token` provided and no `d1_api_token` provided.


No "global" `api_token` provided and either missing `ai_api_token` or `vectorize_api_token`

In [14]:
try:
    cfVect = CloudflareVectorize(
        embedding=embedder,
        account_id=cf_acct_id,
        # api_token=api_token, #(Optional if using service-specific token)
        # ai_api_token=cf_ai_token,  #(Optional if using global token)
        d1_api_token=cf_d1_token,  #(Optional if using global token)
        vectorize_api_token=cf_vectorize_token,  #(Optional if using global token)
        d1_database_id=d1_database_id,  #(Optional if not using D1)
    )
except Exception as e:
    print(str(e))

# Creating an Index

Let's start off this example by creating and index (and first deleting if it exists).  If the index doesn't exist we will get a an error from Cloudflare telling us so.

In [15]:
%%capture

try:
    cfVect.delete_index(index_name=vectorize_index_name)
except Exception as e:
    print(e)

In [16]:
r = cfVect.create_index(
    index_name=vectorize_index_name
)

In [17]:
print(r)

{'created_on': '2025-04-01T15:51:56.961142Z', 'modified_on': '2025-04-01T15:51:56.961142Z', 'name': 'test-langchain-4453e5b35b734508b15a9f5804564148', 'description': '', 'config': {'dimensions': 1024, 'metric': 'cosine'}}


# Listing Indexes

Now, we can list our indexes on our account

In [18]:
indexes = cfVect.list_indexes()
indexes = [x for x in indexes if "test-langchain" in x.get("name")]
print(indexes)

[{'created_on': '2025-04-01T15:51:56.961142Z', 'modified_on': '2025-04-01T15:51:56.961142Z', 'name': 'test-langchain-4453e5b35b734508b15a9f5804564148', 'description': '', 'config': {'dimensions': 1024, 'metric': 'cosine'}}]


# Get Index
We can also get certain indexes and retrieve more granular information about an index

In [19]:
r = cfVect.get_index(index_name=vectorize_index_name)
print(r)

{'created_on': '2025-04-01T15:51:56.961142Z', 'modified_on': '2025-04-01T15:51:56.961142Z', 'name': 'test-langchain-4453e5b35b734508b15a9f5804564148', 'description': '', 'config': {'dimensions': 1024, 'metric': 'cosine'}}


This call returns a `processedUpToMutation` which can be used to track the status of operations such as creating indexes, adding or deleting records.

In [20]:
r = cfVect.get_index_info(index_name=vectorize_index_name)
print(r)

{'dimensions': 1024, 'vectorCount': 0}


# Adding Metadata Indexes

It is common to assist retrieval by supplying metadata filters in quereies.  In Vectorize, this is accomplished by first creating a "metadata index" on your Vectorize Index.  We will do so for our example by creating one on the `section` field in our documents.

**Reference:** [https://developers.cloudflare.com/vectorize/reference/metadata-filtering/](https://developers.cloudflare.com/vectorize/reference/metadata-filtering/)


In [21]:
r = cfVect.create_metadata_index(
    property_name="section",
    index_type="string",
    index_name=vectorize_index_name,
    wait=True
)
print(r)

{'mutationId': 'd72b809d-68a3-4a51-9378-64cfab1b6121'}


# Listing Metadata Indexes

In [22]:
r = cfVect.list_metadata_indexes(
    index_name=vectorize_index_name
)
print(r)

[{'propertyName': 'section', 'indexType': 'String'}]


# Adding Documents

Now we will add documents to our Vectorize Index.

**Note:**
Adding embeddings to Vectorize happens `asyncronously`, meaning there will be a small delay between adding the embeddings and being able to query them.  By default `add_documents` has a `wait=True` parameter which waits for this operation to complete before returning a response.  If you do not want the program to wait for embeddings availability, you can set this to `wait=False`.


In [23]:
r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=texts,
    wait=True
)

In [24]:
print(json.dumps(r)[:300])

["0b55e4ed-63ca-45fd-bca6-86ab9840c0bd", "86beb57b-af02-4039-b172-eb5179781be8", "4dab0e0f-20ae-4ba8-a777-86b2024b2d26", "c66d977e-1357-4b4b-b45b-63cbd050c877", "f48dd942-c3fc-443b-ab22-82d116c5412b", "344bde04-beb0-4aef-9f1b-801506413d46", "6eef3e5b-ae8a-40cd-a64a-1ed68a4da975", "be2cfdf2-b06d-463f


# Query/Search

We will do some searches on our embeddings.  We can specify our search `query` and the top number of results we want with `k`.


In [25]:
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Workers AI",
    k=100
)

print(f"{len(query_documents)} results:\n{query_documents[0:2]}")

20 results:
[Document(id='c64e0a83-c54e-46f1-ac40-6c64a5c95659', metadata={'section': 'Artificial intelligence'}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), Document(id='ee8e3742-ef2f-4b4b-ac2a-dc0baf54bac4', metadata={'section': 'Artificial intelligence'}, page_content='based on queries by leveraging Workers AI.Cloudflare announced plans in September 2024 to launch a')]


## Output

If you want to return metadata you can pass `return_metadata='all' | 'indexed'`.  The default is `none` or no metadata returned.

If you want to return the embeddings values, you can pass `return_values=True`.  The default is `False`

**Note:**
If you pass non-default values for either of these, the results will be limited to 20.

[https://developers.cloudflare.com/vectorize/platform/limits/](https://developers.cloudflare.com/vectorize/platform/limits/)

In [26]:
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Workers AI",
    return_values=True,
    return_metadata='all',
    k=100
)

In [27]:
print(f"{len(query_documents)} results:\n{str(query_documents[0])}")

20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within' metadata={'section': 'Artificial intelligence'}


If you'd like the similarity `scores` to be returned, you can use `similarity_search_with_score`


In [28]:
query_documents, query_scores = \
    cfVect.similarity_search_with_score(
        index_name=vectorize_index_name,
        query="Workers AI",
        k=100,
        return_metadata="all",
    )

In [29]:
print(query_documents[:4])

[Document(id='c64e0a83-c54e-46f1-ac40-6c64a5c95659', metadata={'section': 'Artificial intelligence'}, page_content="In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within"), Document(id='ee8e3742-ef2f-4b4b-ac2a-dc0baf54bac4', metadata={'section': 'Artificial intelligence'}, page_content='based on queries by leveraging Workers AI.Cloudflare announced plans in September 2024 to launch a'), Document(id='e9e11d8e-f699-41dd-9d20-8ec684816d67', metadata={}, page_content='=== Artificial intelligence ==='), Document(id='c2fff29f-f7dc-49bf-a878-ff3318ecf8a3', metadata={'section': 'Artificial intelligence'}, page_content='To build automatic bot detector models, the company analyzed AI bots and crawler traffic.The')]


In [30]:
print(query_scores[:4])

[0.7912225, 0.7665613, 0.7359264, 0.6721707]


## Including D1 for "Raw Values"
All of the `add` and `search` methods on CloudflareVectorize support a `include_d1` parameter (default=True).

This is to configure whether you want to store/retrieve raw values.

If you do not want to use D1 for this, you can set this to `include=False`.  This will return documents with an empty `page_content` field.

In [31]:
query_documents, query_scores = \
    cfVect.similarity_search_with_score(
        index_name=vectorize_index_name,
        query="california",
        k=100,
        return_metadata="all",
        include_d1=False
    )

In [32]:
print(f"{len(query_documents)} results:\n{query_scores[0]} - {str(query_documents[0])[:900]}")

20 results:
0.6114662 - page_content='' metadata={'section': 'Introduction'}


## Searching with Metadata Filtering

As mentioned before, Vectorize supports filtered search via filtered on indexes metadata fields.  Here is an example where we search for `Introduction` values within the indexed `section` metadata field.

More info on searching on Metadata fields is here: [https://developers.cloudflare.com/vectorize/reference/metadata-filtering/](https://developers.cloudflare.com/vectorize/reference/metadata-filtering/)


In [33]:
query_documents, query_scores = \
    cfVect.similarity_search_with_score(
        index_name=vectorize_index_name,
        query="California",
        k=100,
        md_filter={"section": "Introduction"},
        return_metadata="all",
        return_values=True
    )

In [34]:
print(f"{len(query_documents)} results:\n{query_scores[0]} - {str(query_documents[0])[:900]}")

6 results:
0.6114662 - page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction'}


## Search by IDs
We can also retrieve specific records for specific IDs

In [35]:
sample_ids = [x.id for x in query_documents]

In [36]:
query_documents = cfVect.get_by_ids(
    index_name=vectorize_index_name,
    ids=sample_ids
)

In [37]:
print(f"{len(query_documents)} results:\n{str(query_documents[0])[:300]}")

6 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction', '_namespace': None}


## Search by Namespace
We can also search for vectors by `namespace`.  We just need to add it to the `namespaces` array when adding it to our vector database.

In [38]:
from langchain_core.documents import Document
import uuid

namespace_name = f"test-namespace-{uuid.uuid4().hex[:8]}"

new_documents = [
    Document(
        page_content="This is a new namespace specific document!",
        metadata={"section": "Namespace Test1"},
    ),
    Document(
        page_content="This is another namespace specific document!",
        metadata={"section": "Namespace Test2"},
    )
]

r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=new_documents,
    namespaces=[namespace_name * len(new_documents)],
    wait=True
)

When you return metadata with your queries, the namespace will be included in the `_namespace` field along with your other metadata.

In [39]:
cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="California",
    # return_values=True,
    # return_metadata='all',
    namespace=namespace_name,
)

[]

# Upserts

Vectorize supports Upserts which you can perform by setting `upsert=True`.



In [40]:
query_documents[0].page_content = "Updated: " + query_documents[0].page_content

In [41]:
query_documents[0]

Document(id='c66d977e-1357-4b4b-b45b-63cbd050c877', metadata={'section': 'Introduction', '_namespace': None}, page_content="Updated: and other services. Cloudflare's headquarters are in San Francisco, California. According to")

In [42]:
new_document_id = "12345678910"
new_document = Document(
    id=new_document_id,
    page_content="This is a new document!",
    metadata={"section": "Introduction"},
)

In [43]:
r = cfVect.add_documents(
    index_name=vectorize_index_name,
    documents=[query_documents[0], new_document],
    upsert=True,
    wait=True
)

In [44]:
query_documents_updated = cfVect.get_by_ids(
    index_name=vectorize_index_name,
    ids=[
        query_documents[0].id,
        new_document_id
    ]
)

In [45]:
query_documents_updated

[Document(id='c66d977e-1357-4b4b-b45b-63cbd050c877', metadata={'_namespace': None, 'section': 'Introduction'}, page_content="Updated: and other services. Cloudflare's headquarters are in San Francisco, California. According to"),
 Document(id='12345678910', metadata={'section': 'Introduction', '_namespace': None}, page_content='This is a new document!')]

# Deleting Records
We can delete records by their ids as well


In [46]:
r = cfVect.delete(
    index_name=vectorize_index_name,
    ids=sample_ids,
    wait=True
)

In [47]:
print(r)

{'result': {'mutationId': 'b85b8b7b-2b67-409a-ab85-492fe0b8db44'}, 'result_info': None, 'success': True, 'errors': [], 'messages': [], 'ids': ['c66d977e-1357-4b4b-b45b-63cbd050c877', '86beb57b-af02-4039-b172-eb5179781be8', '0b55e4ed-63ca-45fd-bca6-86ab9840c0bd', '4dab0e0f-20ae-4ba8-a777-86b2024b2d26', '344bde04-beb0-4aef-9f1b-801506413d46', 'f48dd942-c3fc-443b-ab22-82d116c5412b']}


And to confirm deletion

In [48]:
query_documents = cfVect.get_by_ids(
    index_name=vectorize_index_name,
    ids=sample_ids
)
assert len(query_documents) == 0

# Creating from Documents
LangChain stipulates that all vectorstores must have a `from_documents` method to instantiate a new Vectorstore from documents.  This is a more streamlined method than the individual `create, add` steps shown above.

You can do that as shown here:

In [49]:
vectorize_index_name = "test-langchain-from-docs"

In [50]:
cfVect = CloudflareVectorize.from_documents(
    account_id=cf_acct_id,
    index_name=vectorize_index_name,
    documents=texts,
    embedding=embedder,
    d1_database_id=d1_database_id,
    d1_api_token=cf_d1_token,
    vectorize_api_token=cf_vectorize_token,
    wait=True
)

In [51]:
#query for documents
query_documents = cfVect.similarity_search(
    index_name=vectorize_index_name,
    query="Edge Computing",
)

print(f"{len(query_documents)} results:\n{str(query_documents[0])[:300]}")

20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content' metadata={'section': 'Products'}


# Async Examples
This section will show some Async examples


## Creating Indexes

In [52]:
vectorize_index_name1 = "test-langchain1"
vectorize_index_name2 = "test-langchain2"
vectorize_index_name3 = "test-langchain3"

In [53]:
# depending on your notebook environment you might need to include these:
# import nest_asyncio
# nest_asyncio.apply()

async_requests = [
    cfVect.acreate_index(index_name=vectorize_index_name1),
    cfVect.acreate_index(index_name=vectorize_index_name2),
    cfVect.acreate_index(index_name=vectorize_index_name3)
]

await asyncio.gather(*async_requests);

## Creating Metadata Indexes

In [54]:
async_requests = [
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name1,
        wait=True
    ),
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name2,
        wait=True
    ),
    cfVect.acreate_metadata_index(
        property_name="section",
        index_type="string",
        index_name=vectorize_index_name3,
        wait=True
    )
]

await asyncio.gather(*async_requests);

## Adding Documents

In [55]:
async_requests = [
    cfVect.aadd_documents(
        index_name=vectorize_index_name1,
        documents=texts,
        wait=True
    ),
    cfVect.aadd_documents(
        index_name=vectorize_index_name2,
        documents=texts,
        wait=True
    ),
    cfVect.aadd_documents(
        index_name=vectorize_index_name3,
        documents=texts,
        wait=True
    )
]

await asyncio.gather(*async_requests);

In [56]:
texts[:3]

[Document(metadata={'section': 'Introduction'}, page_content='Cloudflare, Inc., is an American company that provides content delivery network services,'),
 Document(metadata={'section': 'Introduction'}, page_content='network services, cybersecurity, DDoS mitigation, wide area network services, reverse proxies,'),
 Document(metadata={'section': 'Introduction'}, page_content='reverse proxies, Domain Name Service, ICANN-accredited domain registration, and other services.')]

## Querying/Search

In [57]:
async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="Workers AI"
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="Edge Computing"
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name3,
        query="SASE"
    )
]

async_results = await asyncio.gather(*async_requests);

In [58]:
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[2][0])[:300]}")

20 results:
page_content='In 2023, Cloudflare launched Workers AI, a framework allowing for use of Nvidia GPU's within'
20 results:
page_content='utilizing edge computing, reverse proxies for web traffic, data center interconnects, and a content'
20 results:
page_content='== Products =='


### Returning Metadata/Values

In [59]:
async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="California",
        return_values=True,
        return_metadata='all'
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="California",
        return_values=True,
        return_metadata='all'
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name3,
        query="California",
        return_values=True,
        return_metadata='all'
    )
]

async_results = await asyncio.gather(*async_requests);

In [60]:
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[2][0])[:300]}")

20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction'}
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction'}
20 results:
page_content='and other services. Cloudflare's headquarters are in San Francisco, California. According to' metadata={'section': 'Introduction'}


## Searching with Metadata Filtering

In [61]:
async_requests = [
    cfVect.asimilarity_search(
        index_name=vectorize_index_name1,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata='all',
        # return_values=True
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name2,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata='all',
        # return_values=True
    ),
    cfVect.asimilarity_search(
        index_name=vectorize_index_name3,
        query="Cloudflare services",
        k=2,
        md_filter={"section": "Products"},
        return_metadata='all',
        # return_values=True
    )
]

async_results = await asyncio.gather(*async_requests);

In [62]:
[doc.metadata["section"] == "Products" for doc in async_results[0]]

[True, True]

In [63]:
print(f"{len(async_results[0])} results:\n{str(async_results[0][-1])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[2][0])[:300]}")

2 results:
page_content='Cloudflare also provides analysis and reports on large-scale outages, including Verizon’s October' metadata={'section': 'Products'}
2 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products'}
2 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products'}


### Search by IDs

In [64]:
sample_ids_1 = [x.id for x in async_results[0]][:3]
sample_ids_2 = [x.id for x in async_results[1]][:3]
sample_ids_3 = [x.id for x in async_results[2]][:3]

In [65]:
sample_ids_1

['6ed79c00-4330-4c19-815c-e6a03369291b',
 '8cb95f1e-a80d-4e60-a776-06a1c70b84e0']

In [66]:
async_requests = [
    cfVect.aget_by_ids(
        index_name=vectorize_index_name1,
        ids=sample_ids_1
    ),
    cfVect.aget_by_ids(
        index_name=vectorize_index_name2,
        ids=sample_ids_2
    ),
    cfVect.aget_by_ids(
        index_name=vectorize_index_name3,
        ids=sample_ids_3
    )
]

async_results = await asyncio.gather(*async_requests);

In [67]:
print(f"{len(async_results[0])} results:\n{str(async_results[0][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[1][0])[:300]}")
print(f"{len(async_results[1])} results:\n{str(async_results[2][0])[:300]}")

2 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products', '_namespace': None}
2 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products', '_namespace': None}
2 results:
page_content='Cloudflare provides network and security products for consumers and businesses, utilizing edge' metadata={'section': 'Products', '_namespace': None}


### Deleting Records

In [68]:
async_requests = [
    cfVect.adelete(
        index_name=vectorize_index_name1,
        ids=sample_ids_1,
        wait=True
    ),
    cfVect.adelete(
        index_name=vectorize_index_name2,
        ids=sample_ids_2,
        wait=True
    ),
    cfVect.adelete(
        index_name=vectorize_index_name3,
        ids=sample_ids_3,
        wait=True
    )
]

await asyncio.gather(*async_requests);

In [69]:
async_requests = [
    cfVect.aget_by_ids(
        index_name=vectorize_index_name1,
        ids=sample_ids_1
    ),
    cfVect.aget_by_ids(
        index_name=vectorize_index_name2,
        ids=sample_ids_2
    ),
    cfVect.aget_by_ids(
        index_name=vectorize_index_name3,
        ids=sample_ids_3
    )
]

async_results = await asyncio.gather(*async_requests);

In [70]:
assert len(async_results[0]) == 0
assert len(async_results[1]) == 0
assert len(async_results[2]) == 0

# Cleanup
Let's finish by deleting all of the indexes we created in this notebook.

In [71]:
arr_indexes = cfVect.list_indexes()
arr_indexes = [x for x in arr_indexes if "test-langchain" in x.get("name")]

In [72]:
arr_async_requests = [
    cfVect.adelete_index(index_name=x.get("name"))
    for x in arr_indexes
]
await asyncio.gather(*arr_async_requests);