[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/retrieval_cost_latency_optimization.ipynb)

[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/company/blog/technical/retrieval-optimization/?utm_campaign=devrel&utm_source=cross-post&utm_medium=organic_social&utm_content=https%3A%2F%2Fgithub.com%2Fmongodb-developer%2FGenAI-Showcase&utm_term=apoorva.joshi)

# Optimizing Retrieval Performance using Voyage AI and MongoDB

## Step 1: Install required packages

- **voyageai**: Voyage AI's Python SDK
- **pymongo**: MongoDB's Python driver
- **datasets**: Python library to interact with datasets on Hugging Face
- **scikit-learn**: Python library consisting of modules for machine learning and data mining

In [None]:
!pip install -qU voyageai==0.3.7 pymongo==4.15.5 datasets==4.5.0 scikit-learn==1.7.2

## Step 2: Set up prerequisites

**Voyage AI**
- [Obtain a Voyage AI API key](https://dashboard.voyageai.com/organization/api-keys)

**MongoDB**
- Register for a [free MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register)
- [Create a new database cluster](https://www.mongodb.com/docs/guides/atlas/cluster/)
- [Obtain the connection string](https://www.mongodb.com/docs/guides/atlas/connection-string/) for your database cluster

In [5]:
import getpass
import os

import voyageai
from pymongo import MongoClient

In [6]:
# Set Voyage API key as an environment variable
os.environ["VOYAGE_API_KEY"] = getpass.getpass("Enter your Voyage API key:")
# Initialize the Voyage AI client
vo = voyageai.Client()

Enter your Voyage API key: ········


In [7]:
# Set the MongoDB connection string
MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")
# Initialize the MongoDB client
mongodb_client = MongoClient(
    MONGODB_URI, appname="devrel.showcase.agentic_video_search"
)
# Check MongoDB connection
mongodb_client.admin.command("ping")

Enter your MongoDB connection string: ········


{'ok': 1.0,
 '$clusterTime': {'clusterTime': Timestamp(1769446214, 1),
  'signature': {'hash': b'\x82\xd0\xc3=\x9a\x17T\xf4\x81\xea\x80\x0b\xe1\x90c\x01\xc6\xe1b\xf0',
   'keyId': 7558184680432861186}},
 'operationTime': Timestamp(1769446214, 1)}

## Step 3: Download the dataset

In [17]:
import pandas as pd
from datasets import load_dataset

In [18]:
# Download a dataset from Hugging Face
data = load_dataset("mteb/nfcorpus", "corpus", split="corpus")
corpus = pd.DataFrame(data)
# Preview the dataset
corpus.head()

| | _id | title | text |
|---|---|---|---|
| 0 | MED-10 | Statin Use and Breast Cancer Survival: A Natio... | Recent studies have suggested that statins, an... |
| 1 | MED-14 | Statin use after diagnosis of breast cancer an... | BACKGROUND: Preclinical studies have shown tha... |
| 2 | MED-118 | Alkylphenols in human milk and their relations... | The aims of this study were to determine the c... |
| 3 | MED-301 | Methylmercury: A Potential Environmental Risk ... | Epilepsy or seizure disorder is one of the mos... |
| 4 | MED-306 | Sensitivity of Continuous Performance Test (CP... | Hit Reaction Time latencies (HRT) in the Conti... |


In [19]:
# Download the evaluation queries from Hugging Face
data = load_dataset("mteb/nfcorpus", "queries", split="queries")
queries = pd.DataFrame(data)
# Preview the evaluation queries
queries.head()

| | _id | text |
|---|---|---|
| 0 | PLAIN-3 | Breast Cancer Cells Feed on Cholesterol |
| 1 | PLAIN-4 | Using Diet to Treat Asthma and Eczema |
| 2 | PLAIN-5 | Treating Asthma With Plants vs. Pills |
| 3 | PLAIN-6 | How Fruits and Vegetables Can Treat Asthma |
| 4 | PLAIN-7 | How Fruits and Vegetables Can Prevent Asthma |


In [20]:
# Download the ground truth relevance scores from Hugging Face
data = load_dataset("mteb/nfcorpus", split="test")
qrels = pd.DataFrame(data)
# Preview the relevance score dataframe
qrels.head()

| | query-id | corpus-id | score |
|---|---|---|---|
| 0 | PLAIN-2 | MED-2427 | 2.0 |
| 1 | PLAIN-2 | MED-10 | 2.0 |
| 2 | PLAIN-2 | MED-2429 | 2.0 |
| 3 | PLAIN-2 | MED-2430 | 2.0 |
| 4 | PLAIN-2 | MED-2431 | 2.0 |


In [21]:
# Merge `qrels` with `queries` to get query text
# Only keep queries that exist in `qrels`
eval_df = qrels.merge(
    queries[["_id", "text"]], left_on="query-id", right_on="_id", how="inner"
).drop(columns=["_id"])

# Group by query-id and aggregate docs and scores
eval_df = (
    eval_df.groupby(["query-id", "text"])
    .agg(
        {
            "corpus-id": list,  # Collect all corpus IDs into a list
            "score": list,  # Collect all scores into a list
        }
    )
    .reset_index()
)

In [22]:
# Preview the formatted evaluation dataset
eval_df.head()

| | query-id | text | corpus-id | score |
|---|---|---|---|---|
| 0 | PLAIN-1008 | deafness | [MED-4532, MED-4533, MED-4534, MED-4535, MED-4... | [1.0, 1.0, 1.0, 1.0, 1.0] |
| 1 | PLAIN-1018 | DHA | [MED-2750, MED-2751, MED-2752, MED-2754, MED-2... | [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ... |
| 2 | PLAIN-102 | Stopping Heart Disease in Childhood | [MED-3254, MED-3253, MED-3255, MED-5322, MED-5... | [2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ... |
| 3 | PLAIN-1028 | dietary scoring | [MED-4686] | [1.0] |
| 4 | PLAIN-1039 | domoic acid | [MED-4380, MED-4381] | [1.0, 1.0] |
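
To make the reshaping above concrete, here is a small sketch on made-up rows. The IDs, texts, and scores are illustrative only and do not come from the dataset. It shows that the inner merge drops judgments whose query is missing from the queries table, and that the group-by leaves one row per query with parallel lists of document IDs and scores.

```python
import pandas as pd

# Toy relevance judgments (hypothetical IDs, for illustration only)
toy_qrels = pd.DataFrame(
    {
        "query-id": ["Q1", "Q1", "Q2", "Q3"],
        "corpus-id": ["D1", "D2", "D3", "D4"],
        "score": [2.0, 1.0, 1.0, 1.0],
    }
)
# Toy queries; Q3 is deliberately absent so the inner merge drops it
toy_queries = pd.DataFrame(
    {"_id": ["Q1", "Q2"], "text": ["first query", "second query"]}
)

toy_eval = (
    toy_qrels.merge(toy_queries, left_on="query-id", right_on="_id", how="inner")
    .drop(columns=["_id"])
    .groupby(["query-id", "text"])
    .agg({"corpus-id": list, "score": list})
    .reset_index()
)
print(toy_eval)
```
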


## Step 4: Embed the corpus

In [12]:
from tqdm import tqdm

In [13]:
def generate_embeddings(texts: list[str], model: str, input_type: str, dims: int):
    """Embed `texts` with Voyage AI and return one embedding per input text."""
    embeddings = vo.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        output_dimension=dims,
    ).embeddings
    return embeddings

In [203]:
def embed_batch(batch_size: int, **embed_params):
    """Embed the `text` column of `corpus` in batches of `batch_size`."""
    embeddings = []
    for i in tqdm(range(0, len(corpus), batch_size)):
        batch_texts = corpus["text"].iloc[i : i + batch_size].tolist()
        batch_embeddings = generate_embeddings(batch_texts, **embed_params)
        embeddings.extend(batch_embeddings)
    return embeddings
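
The batching pattern can be tried offline before spending API calls. The sketch below is a stand-alone variant with hypothetical names (`iter_batches`, `embed_in_batches`, `fake_embed`) and a stand-in embedder in place of the Voyage AI client. It only demonstrates that slicing 3,633 texts into batches of 100 yields 37 requests and preserves order and count.

```python
from typing import Callable, Iterator


def iter_batches(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def embed_in_batches(
    items: list[str],
    batch_size: int,
    embed_fn: Callable[[list[str]], list[list[float]]],
) -> list[list[float]]:
    """Call `embed_fn` once per batch and concatenate the results in order."""
    embeddings: list[list[float]] = []
    for batch in iter_batches(items, batch_size):
        embeddings.extend(embed_fn(batch))
    return embeddings


def fake_embed(texts: list[str]) -> list[list[float]]:
    """Stand-in embedder (hypothetical): one 2-dimensional vector per text."""
    return [[float(len(text)), 0.0] for text in texts]


texts = [f"doc {i}" for i in range(3633)]
batches = list(iter_batches(texts, 100))
vectors = embed_in_batches(texts, 100, fake_embed)
print(len(batches), len(batches[-1]), len(vectors))
```
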

In [204]:
# Full-precision, default dimensionality embeddings
corpus["1024_embedding"] = embed_batch(
    batch_size=100, model="voyage-4-large", input_type="document", dims=1024
)

100%|██████████| 37/37 [00:11<00:00,  3.21it/s]


In [205]:
# Full-precision, reduced dimensionality embeddings
corpus["512_embedding"] = embed_batch(
    batch_size=100, model="voyage-4-large", input_type="document", dims=512
)

100%|██████████| 37/37 [00:10<00:00,  3.52it/s]


In [206]:
# Preview the corpus dataframe with embeddings
corpus.head()

| | _id | title | text | 1024_embedding | 512_embedding |
|---|---|---|---|---|---|
| 0 | MED-10 | Statin Use and Breast Cancer Survival: A Natio... | Recent studies have suggested that statins, an... | [0.006306333001703024, 0.03950955346226692, 0.... | [0.00884974654763937, 0.05544419586658478, 0.0... |
| 1 | MED-14 | Statin use after diagnosis of breast cancer an... | BACKGROUND: Preclinical studies have shown tha... | [0.010320048779249191, 0.023050619289278984, 0... | [0.014489011839032173, 0.03236231580376625, 0.... |
| 2 | MED-118 | Alkylphenols in human milk and their relations... | The aims of this study were to determine the c... | [0.015265388414263725, -0.04633313789963722, -... | [0.021802613511681557, -0.06617476046085358, -... |
| 3 | MED-301 | Methylmercury: A Potential Environmental Risk ... | Epilepsy or seizure disorder is one of the mos... | [-0.05990750342607498, -0.020487235859036446, ... | [-0.08300717920064926, -0.02838689088821411, 0... |
| 4 | MED-306 | Sensitivity of Continuous Performance Test (CP... | Hit Reaction Time latencies (HRT) in the Conti... | [-0.04549531638622284, -0.02429511770606041, -... | [-0.060744885355234146, -0.03243859484791756, ... |
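
In the preview, the leading values of each 512-dimensional embedding are a constant multiple (about 1.40 in the first row) of the leading values of the matching 1024-dimensional embedding. That pattern is what one would expect if the shorter vector were the first 512 components of the longer one, rescaled to unit length. This is an inference from the visible values, not something the notebook or the API confirms. The sketch below shows that operation on a random unit vector; `truncate_and_renormalize` is a hypothetical helper name.

```python
import numpy as np


def truncate_and_renormalize(vector, dims: int) -> np.ndarray:
    """Keep the first `dims` components and rescale them to unit length."""
    head = np.asarray(vector, dtype=np.float64)[:dims]
    norm = np.linalg.norm(head)
    if norm == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return head / norm


# Random unit vector standing in for a 1024-dimensional embedding
rng = np.random.default_rng(0)
full = rng.normal(size=1024)
full /= np.linalg.norm(full)

short = truncate_and_renormalize(full, 512)
# Every retained component is scaled by the same factor
ratio = short[:5] / full[:5]
print(short.shape, np.linalg.norm(short), ratio)
```
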


In [207]:
# Convert dataframe into a list of dictionaries
corpus_dict = corpus.to_dict("records")

## Step 5: Ingest data into MongoDB

In [14]:
db = mongodb_client["mongodb_eval"]
collection = db["docs"]

In [209]:
# Delete existing documents from collection
collection.delete_many({})

DeleteResult({'n': 3633, 'electionId': ObjectId('7fffffff000000000000004b'), 'opTime': {'ts': Timestamp(1769039974, 364), 't': 75}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1769039974, 364), 'signature': {'hash': b'J#\xa5&q\x1bJ:\xe0\xfe\xe3WN\x04q\xbf\xe5\xcaKt', 'keyId': 7558184680432861186}}, 'operationTime': Timestamp(1769039974, 364)}, acknowledged=True)

In [None]:
collection.insert_many(corpus_dict)

## Step 6: Create vector search indexes

In [211]:
from pymongo.operations import SearchIndexModel

In [225]:
definitions = {
    "1024_full_precision": {
        "type": "vector",
        "path": "1024_embedding",
        "numDimensions": 1024,
        "similarity": "cosine",
    },
    "1024_quantized": {
        "type": "vector",
        "path": "1024_embedding",
        "numDimensions": 1024,
        "similarity": "cosine",
        "quantization": "binary",
    },
    "512_full_precision": {
        "type": "vector",
        "path": "512_embedding",
        "numDimensions": 512,
        "similarity": "cosine",
    },
    "512_quantized": {
        "type": "vector",
        "path": "512_embedding",
        "numDimensions": 512,
        "similarity": "cosine",
        "quantization": "binary",
    },
}
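
A back-of-the-envelope sketch of how the four definitions differ in raw vector size. It assumes 32 bits per dimension for full-precision vectors and 1 bit per dimension for binary quantization, and it counts only the vector payload. Index structures, and any full-fidelity copies kept alongside the quantized vectors, are ignored, so treat the numbers as relative rather than as actual memory usage.

```python
NUM_VECTORS = 3633  # number of documents in the corpus


def payload_bytes(num_dims: int, bits_per_dim: int) -> int:
    """Raw size of one vector in bytes under the assumed encoding."""
    return num_dims * bits_per_dim // 8


# (index name, dimensions, assumed bits per dimension)
layouts = [
    ("1024_full_precision", 1024, 32),
    ("1024_quantized", 1024, 1),
    ("512_full_precision", 512, 32),
    ("512_quantized", 512, 1),
]

sizes = {name: payload_bytes(dims, bits) for name, dims, bits in layouts}
for name, per_vector in sizes.items():
    total_kib = per_vector * NUM_VECTORS / 1024
    print(f"{name:>20}: {per_vector:>5} B/vector, {total_kib:>8.1f} KiB total")
```
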

In [226]:
collection.create_search_indexes(
    [
        SearchIndexModel(
            name=name, type="vectorSearch", definition={"fields": [definition]}
        )
        for name, definition in definitions.items()
    ]
)

['1024_full_precision',
 '1024_quantized',
 '512_full_precision',
 '512_quantized']
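
Search indexes are built asynchronously, so getting the names back does not by itself mean the indexes can serve queries yet. A minimal polling sketch follows. It assumes each document returned by `list_search_indexes()` carries `name` and `queryable` fields; the helper name, timeout, and poll interval are arbitrary choices for illustration.

```python
import time


def wait_until_queryable(coll, index_names, timeout_s=300.0, poll_s=5.0):
    """Block until every named search index reports `queryable`, or time out.

    `coll` is any object exposing `list_search_indexes()`, such as a
    pymongo Collection.
    """
    deadline = time.monotonic() + timeout_s
    pending = set(index_names)
    while True:
        for index in coll.list_search_indexes():
            if index.get("queryable"):
                pending.discard(index.get("name"))
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexes not queryable yet: {sorted(pending)}")
        time.sleep(poll_s)


# Example usage against the collection from this notebook:
# wait_until_queryable(collection, definitions.keys())
```
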

## Step 7: Evaluation

In [8]:
import numpy as np
from sklearn.metrics import ndcg_score

In [9]:
def vector_search(query: str, config: dict):
    query_embedding = generate_embeddings(
        texts=[query], model=config["model"], input_type="query", dims=config["dims"]
    )[0]
    pipeline = [
        {
            "$vectorSearch": {
                "index": config["index"],
                "queryVector": query_embedding,
                "path": config["path"],
                "numCandidates": 200,
                "limit": 10,
            }
        },
        {
            "$project": {
                "_id": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
    # Get execution stats
    explain_result = collection.database.command(
        "explain",
        {"aggregate": collection.name, "pipeline": pipeline, "cursor": {}},
        verbosity="executionStats",
    )
    # Extract the execution time
    vector_search_explain = explain_result["stages"][0]["$vectorSearch"]
    execution_time_ms = vector_search_explain["explain"]["query"]["stats"]["context"][
        "millisElapsed"
    ]
    # Execute the pipeline
    results = list(collection.aggregate(pipeline))
    return results, execution_time_ms
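
The latency returned by `vector_search` is the `millisElapsed` value read from the explain output, and the pipeline is then run a second time to fetch the results. That number therefore does not cover the embedding call or the client round trip. If end-to-end latency is also of interest, a client-side timer is one way to capture it. The sketch below uses a hypothetical `timed_call` helper and times a trivial function in place of a database call.

```python
import time
from typing import Any, Callable


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Run `fn` and return its result with the wall-clock time in milliseconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Stand-in workload; in the notebook this could wrap collection.aggregate
result, elapsed_ms = timed_call(sorted, [3, 1, 2])
print(result, f"{elapsed_ms:.3f}ms")
```
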

In [10]:
def evaluate(config):
    latencies = []
    ndcgs = []
    for _, row in tqdm(eval_df.iterrows(), desc=config["description"]):
        results, execution_time = vector_search(row["text"], config)
        latencies.append(execution_time)

        relevant_docs = row["corpus-id"]
        relevant_scores = row["score"]
        relevance_map = dict(zip(relevant_docs, relevant_scores))

        # Calculate NDCG (the ideal ranking is taken over the retrieved documents only)
        y_true = [relevance_map.get(doc["_id"], 0) for doc in results]
        y_score = [doc["score"] for doc in results]
        ndcg = ndcg_score([y_true], [y_score])
        ndcgs.append(ndcg)

    return {
        "ndcg": np.mean(ndcgs),
        "p50_latency": np.median(latencies),
        "p95_latency": np.percentile(latencies, 95),
    }
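
Because `ndcg_score` receives only the ten retrieved documents, it normalizes against the best ordering of those ten rather than against the best possible top 10 drawn from every judged document for the query. A stricter variant normalizes against all judged documents. The sketch below is one way to compute that variant with NumPy, using linear gain and a log2 discount. It is an alternative for comparison, not what the cell above computes, and `ndcg_at_k` is a hypothetical name.

```python
import numpy as np


def ndcg_at_k(retrieved_ids, relevance_map, k=10):
    """NDCG@k with the ideal ranking built from all judged documents."""
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    # Gains of the retrieved documents, in retrieved order (unjudged -> 0)
    gains = np.array(
        [relevance_map.get(doc_id, 0.0) for doc_id in retrieved_ids[:k]], dtype=float
    )
    dcg = float(np.sum(gains * discounts[: len(gains)]))
    # Ideal gains: every judged document, best first, cut at k
    ideal = np.sort(np.array(list(relevance_map.values()), dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: len(ideal)]))
    return dcg / idcg if idcg > 0 else 0.0


# Toy judgments (hypothetical IDs)
relevance_map = {"D1": 2.0, "D2": 1.0, "D3": 1.0}
perfect = ndcg_at_k(["D1", "D2", "D3", "X"], relevance_map)
partial = ndcg_at_k(["X", "D2", "Y"], relevance_map)
print(perfect, partial)
```

In the second toy ranking only one of three judged documents is retrieved, so this variant scores it well below 1, whereas normalizing over the retrieved documents alone would ignore the two that were missed.
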

In [23]:
configs = {
    "baseline": {
        "description": "Model: voyage-4-large, Dims: 1024, Quantization: None",
        "model": "voyage-4-large",
        "dims": 1024,
        "index": "1024_full_precision",
        "path": "1024_embedding",
    },
    "dim_reduction": {
        "description": "Model: voyage-4-large, Dims: 512, Quantization: None",
        "model": "voyage-4-large",
        "dims": 512,
        "index": "512_full_precision",
        "path": "512_embedding",
    },
    "quantization": {
        "description": "Model: voyage-4-large, Dims: 1024, Quantization: Binary",
        "model": "voyage-4-large",
        "dims": 1024,
        "index": "1024_quantized",
        "path": "1024_embedding",
    },
    "dim_reduction_and_quantization": {
        "description": "Model: voyage-4-large, Dims: 512, Quantization: Binary",
        "model": "voyage-4-large",
        "dims": 512,
        "index": "512_quantized",
        "path": "512_embedding",
    },
    "asymmetric_retrieval": {
        "description": "Model: voyage-4, Dims: 1024, Quantization: None",
        "model": "voyage-4",
        "dims": 1024,
        "index": "1024_full_precision",
        "path": "1024_embedding",
    },
}
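
Each config has to name an index whose `path` and `numDimensions` agree with the query embedding it produces. Note that `asymmetric_retrieval` reuses the 1024-dimensional full-precision index but embeds queries with `voyage-4`, while the documents were embedded with `voyage-4-large`. A small consistency check can catch typos before a long evaluation run. The sketch below uses a hypothetical `find_mismatches` helper and trimmed-down copies of the dictionaries.

```python
def find_mismatches(configs: dict, definitions: dict) -> list[str]:
    """Return a description of every config that disagrees with its index."""
    problems = []
    for name, cfg in configs.items():
        definition = definitions.get(cfg["index"])
        if definition is None:
            problems.append(f"{name}: unknown index {cfg['index']!r}")
            continue
        if definition["path"] != cfg["path"]:
            problems.append(f"{name}: path {cfg['path']!r} != {definition['path']!r}")
        if definition["numDimensions"] != cfg["dims"]:
            problems.append(
                f"{name}: dims {cfg['dims']} != {definition['numDimensions']}"
            )
    return problems


# Trimmed-down copies for illustration
demo_definitions = {
    "1024_full_precision": {"path": "1024_embedding", "numDimensions": 1024},
    "512_quantized": {"path": "512_embedding", "numDimensions": 512},
}
demo_configs = {
    "baseline": {"dims": 1024, "index": "1024_full_precision", "path": "1024_embedding"},
    "broken": {"dims": 1024, "index": "512_quantized", "path": "512_embedding"},
}
print(find_mismatches(demo_configs, demo_definitions))
```
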

In [24]:
for config_name, config in configs.items():
    print(f"Evaluating {config_name} config...")
    results = evaluate(config)
    print(f"NDCG@10: {results['ndcg']:.3f}")
    print(f"P50 Latency: {results['p50_latency']:.3f}ms")
    print(f"P95 Latency: {results['p95_latency']:.3f}ms")

Evaluating baseline config...


Model: voyage-4-large, Dims: 1024, Quantization: None: 323it [00:36,  8.88it/s]


NDCG@10: 0.650
P50 Latency: 1.888ms
P95 Latency: 3.387ms
Evaluating dim_reduction config...


Model: voyage-4-large, Dims: 512, Quantization: None: 323it [00:32,  9.89it/s]


NDCG@10: 0.643
P50 Latency: 1.124ms
P95 Latency: 1.351ms
Evaluating quantization config...


Model: voyage-4-large, Dims: 1024, Quantization: Binary: 323it [00:34,  9.40it/s]


NDCG@10: 0.645
P50 Latency: 0.758ms
P95 Latency: 0.976ms
Evaluating dim_reduction_and_quantization config...


Model: voyage-4-large, Dims: 512, Quantization: Binary: 323it [00:33,  9.74it/s]


NDCG@10: 0.634
P50 Latency: 0.671ms
P95 Latency: 0.768ms
Evaluating asymmetric_retrieval config...


Model: voyage-4, Dims: 1024, Quantization: None: 323it [00:28, 11.18it/s]

NDCG@10: 0.640
P50 Latency: 1.743ms
P95 Latency: 2.004ms
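
The printed numbers are easier to compare side by side. The sketch below copies them by hand from the output above into a DataFrame and adds each configuration's change relative to the baseline. Nothing is re-measured here, and the column names are chosen for illustration.

```python
import pandas as pd

# Values transcribed from the evaluation output above
results = pd.DataFrame(
    [
        ("baseline", 0.650, 1.888, 3.387),
        ("dim_reduction", 0.643, 1.124, 1.351),
        ("quantization", 0.645, 0.758, 0.976),
        ("dim_reduction_and_quantization", 0.634, 0.671, 0.768),
        ("asymmetric_retrieval", 0.640, 1.743, 2.004),
    ],
    columns=["config", "ndcg_at_10", "p50_ms", "p95_ms"],
).set_index("config")

# Percentage change of each config relative to the baseline row
baseline = results.loc["baseline"]
results["ndcg_change_pct"] = (results["ndcg_at_10"] / baseline["ndcg_at_10"] - 1) * 100
results["p50_change_pct"] = (results["p50_ms"] / baseline["p50_ms"] - 1) * 100
print(results.round(3))
```

On this run, binary quantization at 1024 dimensions lowers the median latency by roughly 60% while NDCG@10 drops by less than 1%. These are single-run figures from one cluster, so they may vary between runs.
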



