# Optimizing Vector Database Performance: Reducing Retrieval Latency with Quantization

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/automatic_quantization_of_nomic_emebddings_with_mongodb.ipynb)

---



**Summary**

This notebook explores techniques for optimizing vector database performance, focusing on reducing retrieval latency through quantization. We examine the practical application of three embedding types:
- float32_embedding
- int8_embedding
- binary_embedding

We analyze their impact on query precision and retrieval speed.

By leveraging quantization strategies like scalar and binary quantization, we highlight the trade-offs between precision and efficiency.

The notebook also includes a step-by-step demonstration of executing vector searches, measuring retrieval latencies, and visualizing results in a comparative framework.







**Use Case:**

The notebook demonstrates how to optimize vector database performance, specifically focusing on reducing retrieval latency using quantization methods.

**Scenario**:
You have a large dataset of text data (in this case, a book from Gutenberg) and you want to build a system that can efficiently find similar pieces of text based on a user's query.

**Approach**:
- Embeddings: The notebook uses SentenceTransformer to convert text into numerical vectors (embeddings) which capture the semantic meaning of the text.
- Vector Database: MongoDB is used as a vector database to store and search these embeddings efficiently.
- Quantization: To speed up retrieval, the notebook applies quantization techniques (scalar and binary) to the embeddings. This reduces the size of the embeddings, making searches faster but potentially impacting precision.

**Goal**:
By comparing the performance of different embedding types (float32, int8, binary), the notebook aims to show the trade-offs between retrieval speed and accuracy when using quantization. This helps in choosing the best approach for a given use case.
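To make the two quantization strategies concrete, here is a minimal pure-Python sketch. It is an illustration only: the function names, the fixed `lo`/`hi` range, and the zero threshold are assumptions made for this example, and MongoDB Atlas's internal quantization may differ in how it picks ranges and thresholds.

```python
def scalar_quantize(vector, lo, hi):
    """Map floats in [lo, hi] onto 256 equal-width bins, stored as int8 (-128..127).

    `lo` and `hi` are assumed to be known bounds for the values (hypothetical here).
    """
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for a degenerate range
    return [max(-128, min(127, round((x - lo) / scale) - 128)) for x in vector]


def binary_quantize(vector, threshold=0.0):
    """Keep one bit per dimension: 1 if the value is above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in vector]


vector = [0.67, 1.62, -3.93]
print(scalar_quantize(vector, lo=-4.0, hi=4.0))
print(binary_quantize(vector))
```

Scalar quantization keeps a coarse version of each value, while binary quantization keeps only which side of the threshold it falls on. This is why binary vectors are the smallest and also the most lossy of the three.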

## Step 1: Install Libraries

Here's a breakdown of the libraries and their roles:

- **unstructured**: This library is used to process and structure various data formats, including text, enabling efficient analysis and extraction of information.
- **pymongo**: This library provides the tools necessary to interact with MongoDB, allowing for storage and retrieval of data within the project.
- **nomic**: This library is used for vector embedding and other functions related to Nomic AI's models, specifically for generating and working with text embeddings.
- **pandas**: This popular library is used for data manipulation and analysis, providing data structures and functions for efficient data handling and exploration.
- **sentence_transformers**: This library is used for generating embeddings for text data using the SentenceTransformer model.

By installing these packages, the code sets up the tools necessary for data processing, embedding generation, and storage with MongoDB.


In [10]:
%pip install --quiet -U unstructured pymongo nomic pandas sentence_transformers einops

Note: you may need to restart the kernel to use updated packages.


In [3]:
import getpass
import os


# Function to securely get and set environment variables
def set_env_securely(var_name, prompt):
    value = getpass.getpass(prompt)
    os.environ[var_name] = value

## Step 2: Data Loading and Preparation

**Dataset Information**

The dataset used in this example is "Pushing to the Front," an ebook from Project Gutenberg. This book, focusing on self-improvement and success, is freely available for public use.

The code leverages the ```unstructured``` library to process this raw text data, transforming it into a structured format suitable for semantic analysis and search. By chunking the text based on titles, the code creates meaningful units that can be embedded and stored in a vector database for efficient retrieval. This preprocessing is essential for building a robust and performant semantic search system.



The code below uses the ```requests``` library to fetch the text content of the book "Pushing to the Front" from Project Gutenberg's website. The URL points to the raw text file of the book.

In [4]:
import requests

url = "https://www.gutenberg.org/cache/epub/21291/pg21291.txt"
response = requests.get(url)
response.raise_for_status()
book_text = response.text

Data Cleaning: The ```unstructured``` library is used to clean and structure the raw text. The ```group_broken_paragraphs``` function helps in combining fragmented paragraphs, ensuring better text flow.



In [5]:
# import nltk
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

from unstructured.cleaners.core import group_broken_paragraphs
from unstructured.partition.text import partition_text

cleaned_text = group_broken_paragraphs(book_text)

parsed_sections = partition_text(text=cleaned_text)

The ```partition_text``` function further processes the cleaned text, dividing it into logical sections. These sections could represent chapters, sub-sections, or other meaningful units within the book.

In [6]:
# Show the first 5 sections
for text in parsed_sections[:5]:
    print(text)
    print("\n")

The Project Gutenberg eBook of Pushing to the Front


This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.


Title: Pushing to the Front


Author: Orison Swett Marden


Release date: May 4, 2007 [eBook #21291]




Chunking by Title: The ```chunk_by_title``` function identifies titles or headings within the parsed sections and uses them to create distinct chunks of text. This step is crucial for organizing the data into manageable units for subsequent embedding generation and semantic search.
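The idea behind title-based chunking can be sketched without the library. The function below is a simplified stand-in, not the `unstructured` implementation. It assumes each element is a hypothetical `(category, text)` pair and starts a new chunk whenever a `"Title"` element appears. The real `chunk_by_title` applies further rules, such as chunk size limits, that are not modelled here.

```python
def chunk_by_heading(elements):
    """Group (category, text) pairs into chunks, starting a new chunk at each Title."""
    chunks, current = [], []
    for category, text in elements:
        # A title closes the chunk collected so far and opens a new one
        if category == "Title" and current:
            chunks.append("\n\n".join(current))
            current = []
        current.append(text)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


elements = [
    ("Title", "CHAPTER I"),
    ("NarrativeText", "First paragraph."),
    ("NarrativeText", "Second paragraph."),
    ("Title", "CHAPTER II"),
    ("NarrativeText", "Third paragraph."),
]
for chunk in chunk_by_heading(elements):
    print(repr(chunk))
```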



In [7]:
from unstructured.chunking.title import chunk_by_title

chunks = chunk_by_title(parsed_sections)

In [8]:
for chunk in chunks:
    print(chunk)
    break

The Project Gutenberg eBook of Pushing to the Front

This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you will have to check the laws of the country where you are located before using this eBook.


## Step 3: Embeddings Generation

In [11]:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

embedding_model = SentenceTransformer(
    "nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True
)

# Determine the maximum sequence length for the model
max_seq_length = embedding_model.max_seq_length


def chunk_text(text, tokenizer, max_length=8192, overlap=50):
    """
    Split the text into overlapping chunks based on token length.
    """
    tokens = tokenizer.tokenize(text)
    chunks = []
    for i in range(0, len(tokens), max_length - overlap):
        chunk_tokens = tokens[i : i + max_length]
        chunk = tokenizer.convert_tokens_to_string(chunk_tokens)
        chunks.append(chunk)
    return chunks


def get_embedding(text, task_prefix):
    """
    Generate embeddings for a text string with a task-specific prefix.
    """

    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    # Prepend the task instruction prefix to the text
    prefixed_text = f"{task_prefix}: {text}"

    # Get the tokenizer from the model
    tokenizer = embedding_model.tokenizer

    # Split text into chunks if it's too long
    chunks = chunk_text(prefixed_text, tokenizer, max_length=max_seq_length)

    # Embed each chunk
    chunk_embeddings = embedding_model.encode(chunks)

    # Return only the first chunk's embedding as a list; embeddings of any later chunks are discarded
    return chunk_embeddings[0].tolist()

<All keys matched successfully>
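The overlapping-window logic in `chunk_text` can be checked in isolation. The sketch below applies the same stepping (`max_length - overlap`) to a plain list of integers instead of tokenizer output. The helper name `sliding_windows` and the guard against `overlap >= max_length` are additions for this example.

```python
def sliding_windows(tokens, max_length, overlap):
    """Split a token list into windows of max_length that share `overlap` tokens."""
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")
    step = max_length - overlap
    return [tokens[i : i + max_length] for i in range(0, len(tokens), step)]


tokens = list(range(10))
for window in sliding_windows(tokens, max_length=4, overlap=1):
    print(window)
# [0, 1, 2, 3]
# [3, 4, 5, 6]
# [6, 7, 8, 9]
# [9]
```

Note that the final window can be a short fragment already covered by the previous one. In this notebook that is harmless, because only the first chunk's embedding is used.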


Embedding generation might take approximately 20 minutes, depending on your hardware.

In [12]:
from tqdm import tqdm

# Pass chunks into embedding function with a progress bar
embeddings = []
# If you don't want to embed the entire document, simply slice the chunks,
# e.g., for chunk in tqdm(chunks[:20], desc="Generating embeddings")
for chunk in tqdm(chunks, desc="Generating embeddings"):
    embedding = get_embedding(str(chunk), task_prefix="search_document")
    embeddings.append(embedding)

Generating embeddings: 100%|██████████| 4135/4135 [03:09<00:00, 21.77it/s]


In [16]:
# Store the embedding data alongside the chunk, so a data point is {chunk: "text", <type>_embedding: [floats]}
embedding_data = []
for chunk, embedding in zip(chunks, embeddings):
    embedding_data.append(
        {
            "chunk": chunk.text,
            "float32_embedding": embedding,
            "int8_embedding": embedding,
            "binary_embedding": embedding,
        }
    )

In [17]:
# Convert the embedding data to a Pandas dataframe
import pandas as pd

dataset_df = pd.DataFrame(embedding_data)

When visualizing the dataset values, you will observe that the embedding attributes float32_embedding, int8_embedding and binary_embedding all have the same values.

In downstream processes, the values of the int8_embedding and binary_embedding attributes for each data point will be converted to their respective data types by the MongoDB Atlas automatic quantization feature.

In [18]:
dataset_df.head()

Unnamed: 0,chunk,float32_embedding,int8_embedding,binary_embedding
0,The Project Gutenberg eBook of Pushing to the...,"[0.6708638668060303, 1.6244561672210693, -3.93...","[0.6708638668060303, 1.6244561672210693, -3.93...","[0.6708638668060303, 1.6244561672210693, -3.93..."
1,Title: Pushing to the Front\n\nAuthor: Orison ...,"[0.40157127380371094, 0.9250307679176331, -3.8...","[0.40157127380371094, 0.9250307679176331, -3.8...","[0.40157127380371094, 0.9250307679176331, -3.8..."
2,"SAN JOSE\n\nCOPYRIGHT, 1911,\n\nBy ORISON SWET...","[0.8498777747154236, 1.207460880279541, -4.070...","[0.8498777747154236, 1.207460880279541, -4.070...","[0.8498777747154236, 1.207460880279541, -4.070..."
3,"It has sent thousands of youths, with renewed ...","[0.21242937445640564, 0.9721437096595764, -3.2...","[0.21242937445640564, 0.9721437096595764, -3.2...","[0.21242937445640564, 0.9721437096595764, -3.2..."
4,The author has received thousands of letters f...,"[0.1889897882938385, 1.1587932109832764, -3.47...","[0.1889897882938385, 1.1587932109832764, -3.47...","[0.1889897882938385, 1.1587932109832764, -3.47..."


## Step 4: MongoDB (Operational and Vector Database)

MongoDB acts as both an operational and vector database for the RAG system.
MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.

Creating a database and collection within MongoDB is made simple with MongoDB Atlas.

1. First, register for a [MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register). For existing users, sign into MongoDB Atlas.
2. [Follow the instructions](https://www.mongodb.com/docs/atlas/tutorial/deploy-free-tier-cluster/). Select Atlas UI as the procedure to deploy your first cluster.

Follow MongoDB's [steps to get the connection string](https://www.mongodb.com/docs/manual/reference/connection-string/) from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.


In [19]:
# Set MongoDB URI
set_env_securely("MONGO_URI", "Enter your MONGO URI: ")

In [20]:
import pymongo


def get_mongo_client(mongo_uri):
    """Establish and validate connection to the MongoDB."""

    client = pymongo.MongoClient(
        mongo_uri, appname="devrel.showcase.quantized_embeddings_nomic.python"
    )

    # Validate the connection
    ping_result = client.admin.command("ping")
    if ping_result.get("ok") == 1.0:
        # Connection successful
        print("Connection to MongoDB successful")
        return client
    else:
        print("Connection to MongoDB failed")
    return None


# Use .get() so a missing variable reaches the check below instead of raising KeyError
MONGO_URI = os.environ.get("MONGO_URI")
if not MONGO_URI:
    print("MONGO_URI not set in environment variables")

In [21]:
from pymongo.errors import CollectionInvalid

mongo_client = get_mongo_client(MONGO_URI)

DB_NAME = "career_coach"
COLLECTION_NAME = "pushing_to_the_front_orison_quantized"

# Create or get the database
db = mongo_client[DB_NAME]

# Check if the collection exists
if COLLECTION_NAME not in db.list_collection_names():
    try:
        # Create the collection
        db.create_collection(COLLECTION_NAME)
        print(f"Collection '{COLLECTION_NAME}' created successfully.")
    except CollectionInvalid as e:
        print(f"Error creating collection: {e}")
else:
    print(f"Collection '{COLLECTION_NAME}' already exists.")

# Assign the collection
collection = db[COLLECTION_NAME]

Connection to MongoDB successful
Collection 'pushing_to_the_front_orison_quantized' created successfully.


## Step 5: Data Ingestion

In [22]:
collection.delete_many({})

DeleteResult({'n': 0, 'electionId': ObjectId('7fffffff0000000000000039'), 'opTime': {'ts': Timestamp(1734581901, 1), 't': 57}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1734581901, 1), 'signature': {'hash': b'\xd8+\x12\xed3V\x1bi\x9df-\xf8\x06\x97\xd3p\xa64\x05A', 'keyId': 7390008424139849730}}, 'operationTime': Timestamp(1734581901, 1)}, acknowledged=True)

In [23]:
documents = dataset_df.to_dict("records")
collection.insert_many(documents)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed


## Step 6: Vector Search Index Creation

In [24]:
import time

from pymongo.operations import SearchIndexModel


def setup_vector_search_index(collection, index_definition, index_name="vector_index"):
    """
    Set up a vector search index for a MongoDB collection and wait for 60 seconds.

    Args:
    collection: MongoDB collection object
    index_definition: Dictionary containing the index definition
    index_name: Name of the index (default: "vector_index")
    """
    new_vector_search_index_model = SearchIndexModel(
        definition=index_definition, name=index_name, type="vectorSearch"
    )

    # Create the new index
    try:
        result = collection.create_search_index(model=new_vector_search_index_model)
        print(f"Creating index '{index_name}'...")

        # Sleep for 60 seconds
        print(f"Waiting for 60 seconds to allow index '{index_name}' to be created...")
        time.sleep(60)

        print(f"60-second wait completed for index '{index_name}'.")
        return result

    except Exception as e:
        print(f"Error creating new vector search index '{index_name}': {e!s}")
        return None
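A fixed 60-second sleep may be longer than needed on a small collection and too short on a large one. One alternative is to poll the index status until it reports as queryable. The sketch below assumes PyMongo's `Collection.list_search_indexes(name)` and the `queryable` field returned for Atlas search indexes. The helper name `wait_until_queryable`, the timeout, and the poll interval are choices made for this example.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=300, poll_interval_s=5):
    """Poll the search index status until it is queryable or the timeout expires.

    Returns True if the index became queryable, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        time.sleep(poll_interval_s)
    return False
```

It could be called right after `collection.create_search_index(...)`, for example `wait_until_queryable(collection, "vector_index")`, in place of the fixed sleep.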

In [25]:
def create_vector_index_definition():
    """
    Create a vector index definition with predefined quantization methods.

    This function defines vector index fields with specific paths, dimensionalities,
    and similarity metrics. It includes support for quantization methods:
    - "scalar" quantization is applied to the "int8_embedding" field.
    - "binary" quantization is applied to the "binary_embedding" field.
    - No quantization is applied to the "float32_embedding" field.

    Returns:
      dict: A dictionary containing the vector index definition, including
      fields with their respective paths, quantization methods, dimensions,
      and similarity measures.
    """

    # Define the field types
    base_fields = [
        {
            "type": "vector",
            "path": "float32_embedding",
            "numDimensions": 768,
            "similarity": "cosine",
        },
        {
            "type": "vector",
            "path": "int8_embedding",
            "quantization": "scalar",
            "numDimensions": 768,
            "similarity": "cosine",
        },
        {
            "type": "vector",
            "path": "binary_embedding",
            "quantization": "binary",
            "numDimensions": 768,
            "similarity": "euclidean",
        },
    ]

    return {"fields": base_fields}

In [26]:
vector_index_definition = create_vector_index_definition()

In [27]:
print(vector_index_definition)

{'fields': [{'type': 'vector', 'path': 'float32_embedding', 'numDimensions': 768, 'similarity': 'cosine'}, {'type': 'vector', 'path': 'int8_embedding', 'quantization': 'scalar', 'numDimensions': 768, 'similarity': 'cosine'}, {'type': 'vector', 'path': 'binary_embedding', 'quantization': 'binary', 'numDimensions': 768, 'similarity': 'euclidean'}]}


In [28]:
setup_vector_search_index(collection, vector_index_definition, "vector_index")

Creating index 'vector_index'...
Waiting for 60 seconds to allow index 'vector_index' to be created...
60-second wait completed for index 'vector_index'.


'vector_index'

## Step 7: Vector Search Operation

In [75]:
def custom_vector_search(
    user_query, collection, embedding_path, vector_search_index_name="vector_index"
):
    """
    Perform a vector search in the MongoDB collection based on the user query.

    Args:
        user_query (str): The user's query string.
        collection (MongoCollection): The MongoDB collection to search.
        embedding_path (str): The path of the embedding field in the documents.
        vector_search_index_name (str): The name of the vector search index.

    Returns:
        list: A list of matching documents.
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(user_query, task_prefix="search_query")

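    # get_embedding returns an empty list for empty input, so test for falsiness
    # and raise instead of returning a string the caller cannot unpack
    if not query_embedding:
        raise ValueError("Invalid query or embedding generation failed.")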

    # Define the vector search stage
    vector_search_stage = {
        "$vectorSearch": {
            "index": vector_search_index_name,  # Specifies the index to use for the search
            "queryVector": query_embedding,  # The vector representing the query
            "path": embedding_path,  # Field in the documents containing the vectors to search against
            "numCandidates": 1000,  # Number of candidate matches to consider
            "limit": 10,  # Return top 10 matches
        }
    }

    project_stage = {
        "$project": {
            "_id": 0,  # Exclude the _id field
            "chunk": 1,
            "score": {
                "$meta": "vectorSearchScore"  # Include the search score
            },
        }
    }

    # Define the aggregate pipeline with the vector search stage and additional stages
    pipeline = [vector_search_stage, project_stage]

    # Execute the explain command
    explain_result = collection.database.command(
        "explain",
        {"aggregate": collection.name, "pipeline": pipeline, "cursor": {}},
        verbosity="executionStats",
    )

    # Extract the execution time
    vector_search_explain = explain_result["stages"][0]["$vectorSearch"]
    execution_time_ms = vector_search_explain["explain"]["query"]["stats"]["context"][
        "millisElapsed"
    ]

    # Execute the actual query
    results = list(collection.aggregate(pipeline))

    return results, execution_time_ms
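The function above reads the server-side execution time from the explain output. As a complementary, rough measurement, the end-to-end latency seen by the client (including network round trips) can be sampled with a small timing helper. This is a sketch: `time_call` and its `repeats` parameter are hypothetical names introduced here, not part of the notebook.

```python
import statistics
import time


def time_call(fn, repeats=5):
    """Call fn() several times and summarize the wall-clock latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Stand-in workload; with a live collection this could instead be
# time_call(lambda: list(collection.aggregate(pipeline)))
print(time_call(lambda: sum(range(100_000))))
```

Repeating the call and reporting the median reduces the influence of a single slow run, which matters when the differences between embedding types are only a few milliseconds.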

In [77]:
def run_vector_search_operations(
    user_query, collection, vector_search_index_name="vector_index"
):
    """
    Run vector search operations for different embedding paths and store results in a DataFrame.
    """
    embedding_paths = ["float32_embedding", "int8_embedding", "binary_embedding"]
    results_data = []

    for path in embedding_paths:
        try:
            results, execution_time_ms = custom_vector_search(
                user_query=user_query,
                collection=collection,
                embedding_path=path,
                vector_search_index_name=vector_search_index_name,
            )

            # Format all results for this precision type into a single string
            formatted_results = "\n".join(
                [f"[{result['score']:.4f}] {result['chunk']}" for result in results]
            )

            results_data.append(
                {
                    "Precision (Data Type)": path.split("_")[0],
                    "Retrieval Latency (ms)": f"{execution_time_ms:.6f}",
                    "Results": formatted_results,
                }
            )

        except Exception as e:
            results_data.append(
                {
                    "Precision (Data Type)": path.split("_")[0],
                    "Retrieval Latency (ms)": "Error",
                    "Results": str(e),
                }
            )

    results_df = pd.DataFrame(results_data)

    # Set display options for better visibility
    pd.set_option("display.max_colwidth", None)
    pd.set_option("display.max_rows", None)
    pd.set_option("display.width", None)

    return results_df

## Step 8: Retrieving Documents and Analysing Results

In [82]:
# Perform the vector search and visualize the results
user_query = "How do I increase my productivity for maximum output"
results_df = run_vector_search_operations(user_query, collection)

One key point to note: if you've followed this example with a small dataset, you likely won't observe significant retrieval latency improvements. In the run shown below, for example, the binary search (about 6.25 ms) was slower than the float32 search (about 4.51 ms). Quantization methods demonstrate their benefits on large-scale datasets, on the order of one million or more embeddings, where memory savings and speed gains become substantially more noticeable.
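The memory side of this trade-off is simple arithmetic. The sketch below estimates the storage needed for the raw vector values only, using the 768 dimensions from the index definition and a hypothetical collection of one million embeddings. It ignores index structure overhead and any full-precision copies a database may keep, so treat the numbers as a lower bound rather than a measurement.

```python
BITS_PER_DIMENSION = {"float32": 32, "int8": 8, "binary": 1}


def raw_vector_bytes(num_vectors, num_dimensions, bits_per_dimension):
    """Bytes needed to store the raw vector values only."""
    return num_vectors * num_dimensions * bits_per_dimension // 8


for name, bits in BITS_PER_DIMENSION.items():
    size_mb = raw_vector_bytes(1_000_000, 768, bits) / 1_000_000
    print(f"{name:>8}: {size_mb:,.0f} MB")
#  float32: 3,072 MB
#     int8: 768 MB
#   binary: 96 MB
```

Per vector, that is 3,072 bytes for float32, 768 bytes for int8 (4x smaller) and 96 bytes for binary (32x smaller).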


In [83]:
# To display the results:
results_df.head()

Unnamed: 0,Precision (Data Type),Retrieval Latency (ms),Results
0,float32,4.510477,"[0.8310] Horace Greeley said that the best product of labor is the high-minded workman with an enthusiasm for his work.\n\n""The best method is obtained by earnestness,"" said Salvini. ""If you can impress people with the conviction that you feel what you say, they will pardon many shortcomings. And above all, study, study, study! All the genius in the world will not help you along with any art, unless you become a hard student. It has taken me years to master a single part.""\n[0.8151] Adopt this motto as yours. Hang it up in your bedroom, in your office or place of business, put it into your pocket-book, weave it into the texture of everything you do, and your life-work will be what every one's should be--A MASTERPIECE.\n\nCHAPTER XXIII\n\nTHE REWARD OF PERSISTENCE\n\nEvery noble work is at first impossible.--CARLYLE.\n[0.8134] ""Many persons seeing me so much engaged in active life,"" said Edward Bulwer Lytton, ""and as much about the world as if I had never been a student, have said to me, 'When do you get time to write all your books? How on earth do you contrive to do so much work?' I shall surprise you by the answer I made. The answer is this--'I contrive to do so much by never doing too much at a time. A man to get through work well must not overwork himself; or, if he do too much to-day, the reaction of fatigue\n[0.8131] The way to get the best out of yourself is to put things right up to yourself, handle yourself without gloves, and talk to yourself as you would to a son of yours who has great ability but who is not using half of it.\n\nWhen you go into an undertaking just say to yourself, ""Now, this thing is right up to me. I've got to make good, to show the man in me or the coward. There is no backing out.""\n[0.8123] ""You are capable of something much better than what you are doing. You must start out to-day with a firm resolution to make the returns from your work greater to-night than ever before. 
You must make this a red-letter day. Bestir yourself; get the cobwebs out of your head; brush off the brain ash. Think, think, think to some purpose! Do not mull and mope like this. You are only half-alive, man; get a move on you!""\n[0.8123] Edwin Chadwick, in his report to the British Parliament, stated that children, working on half time (that is, studying three hours a day and working the rest of their time out of doors), really made the greatest intellectual progress during the year. Business men have often accomplished wonders during the busiest lives by simply devoting one, two, three, or four hours daily to study or other literary work.\n[0.8121] Refresh, renew, rejuvenate yourself by play and pleasant recreation. Play as hard as you work; have a jolly good time, and then you will get that refreshing, invigorating sleep which gives an overplus of energy, a buoyancy of spirit which will make you eager to plunge into the next day's work.\n[0.8110] A successful manufacturer says: ""If you make a good pin, you will earn more money than if you make a bad steam engine."" ""If a man can write a better book, preach a better sermon, or make a better mousetrap than his neighbor,"" says Emerson, ""though he build his house in the woods, the world will make a path to his door.""\n[0.8103] No matter where you go, study the situation. Think why the man does not do better if he is not doing well, why he remains in mediocrity all his life. If he is making a remarkable success, try to find out why. Keep your eyes open, your ears open. Make deductions from what you see and hear. Trace difficulties; look up evidences of success or failure everywhere. It will be one of the greatest factors in your own success.\n\nCHAPTER XXX\n\nSELF\n[0.8088] There is a sense of great power in a vocation after a man has reached the point of efficiency in it, the point of productiveness, the point where his skill begins to tell and brings in returns. 
Up to this point of efficiency, while he is learning his trade, the time seems to have been almost thrown away. But he has been storing up a vast reserve of knowledge of detail, laying foundations, forming his acquaintances, gaining his reputation for truthfulness, trustworthiness, and integrity, and in"
1,int8,3.791106,"[0.8307] Horace Greeley said that the best product of labor is the high-minded workman with an enthusiasm for his work.\n\n""The best method is obtained by earnestness,"" said Salvini. ""If you can impress people with the conviction that you feel what you say, they will pardon many shortcomings. And above all, study, study, study! All the genius in the world will not help you along with any art, unless you become a hard student. It has taken me years to master a single part.""\n[0.8152] Adopt this motto as yours. Hang it up in your bedroom, in your office or place of business, put it into your pocket-book, weave it into the texture of everything you do, and your life-work will be what every one's should be--A MASTERPIECE.\n\nCHAPTER XXIII\n\nTHE REWARD OF PERSISTENCE\n\nEvery noble work is at first impossible.--CARLYLE.\n[0.8135] ""Many persons seeing me so much engaged in active life,"" said Edward Bulwer Lytton, ""and as much about the world as if I had never been a student, have said to me, 'When do you get time to write all your books? How on earth do you contrive to do so much work?' I shall surprise you by the answer I made. The answer is this--'I contrive to do so much by never doing too much at a time. A man to get through work well must not overwork himself; or, if he do too much to-day, the reaction of fatigue\n[0.8127] ""You are capable of something much better than what you are doing. You must start out to-day with a firm resolution to make the returns from your work greater to-night than ever before. You must make this a red-letter day. Bestir yourself; get the cobwebs out of your head; brush off the brain ash. Think, think, think to some purpose! Do not mull and mope like this. 
You are only half-alive, man; get a move on you!""\n[0.8126] The way to get the best out of yourself is to put things right up to yourself, handle yourself without gloves, and talk to yourself as you would to a son of yours who has great ability but who is not using half of it.\n\nWhen you go into an undertaking just say to yourself, ""Now, this thing is right up to me. I've got to make good, to show the man in me or the coward. There is no backing out.""\n[0.8125] Edwin Chadwick, in his report to the British Parliament, stated that children, working on half time (that is, studying three hours a day and working the rest of their time out of doors), really made the greatest intellectual progress during the year. Business men have often accomplished wonders during the busiest lives by simply devoting one, two, three, or four hours daily to study or other literary work.\n[0.8122] Refresh, renew, rejuvenate yourself by play and pleasant recreation. Play as hard as you work; have a jolly good time, and then you will get that refreshing, invigorating sleep which gives an overplus of energy, a buoyancy of spirit which will make you eager to plunge into the next day's work.\n[0.8105] No matter where you go, study the situation. Think why the man does not do better if he is not doing well, why he remains in mediocrity all his life. If he is making a remarkable success, try to find out why. Keep your eyes open, your ears open. Make deductions from what you see and hear. Trace difficulties; look up evidences of success or failure everywhere. 
It will be one of the greatest factors in your own success.\n\nCHAPTER XXX\n\nSELF\n[0.8103] A successful manufacturer says: ""If you make a good pin, you will earn more money than if you make a bad steam engine."" ""If a man can write a better book, preach a better sermon, or make a better mousetrap than his neighbor,"" says Emerson, ""though he build his house in the woods, the world will make a path to his door.""\n[0.8082] There is a sense of great power in a vocation after a man has reached the point of efficiency in it, the point of productiveness, the point where his skill begins to tell and brings in returns. Up to this point of efficiency, while he is learning his trade, the time seems to have been almost thrown away. But he has been storing up a vast reserve of knowledge of detail, laying foundations, forming his acquaintances, gaining his reputation for truthfulness, trustworthiness, and integrity, and in"
2,binary,6.250622,"[0.0032] Horace Greeley said that the best product of labor is the high-minded workman with an enthusiasm for his work.\n\n""The best method is obtained by earnestness,"" said Salvini. ""If you can impress people with the conviction that you feel what you say, they will pardon many shortcomings. And above all, study, study, study! All the genius in the world will not help you along with any art, unless you become a hard student. It has taken me years to master a single part.""\n[0.0031] ""You are capable of something much better than what you are doing. You must start out to-day with a firm resolution to make the returns from your work greater to-night than ever before. You must make this a red-letter day. Bestir yourself; get the cobwebs out of your head; brush off the brain ash. Think, think, think to some purpose! Do not mull and mope like this. You are only half-alive, man; get a move on you!""\n[0.0030] Adopt this motto as yours. Hang it up in your bedroom, in your office or place of business, put it into your pocket-book, weave it into the texture of everything you do, and your life-work will be what every one's should be--A MASTERPIECE.\n\nCHAPTER XXIII\n\nTHE REWARD OF PERSISTENCE\n\nEvery noble work is at first impossible.--CARLYLE.\n[0.0030] ""Many persons seeing me so much engaged in active life,"" said Edward Bulwer Lytton, ""and as much about the world as if I had never been a student, have said to me, 'When do you get time to write all your books? How on earth do you contrive to do so much work?' I shall surprise you by the answer I made. The answer is this--'I contrive to do so much by never doing too much at a time. A man to get through work well must not overwork himself; or, if he do too much to-day, the reaction of fatigue\n[0.0030] No matter where you go, study the situation. Think why the man does not do better if he is not doing well, why he remains in mediocrity all his life. 
If he is making a remarkable success, try to find out why. Keep your eyes open, your ears open. Make deductions from what you see and hear. Trace difficulties; look up evidences of success or failure everywhere. It will be one of the greatest factors in your own success.\n\nCHAPTER XXX\n\nSELF\n[0.0030] Edwin Chadwick, in his report to the British Parliament, stated that children, working on half time (that is, studying three hours a day and working the rest of their time out of doors), really made the greatest intellectual progress during the year. Business men have often accomplished wonders during the busiest lives by simply devoting one, two, three, or four hours daily to study or other literary work.\n[0.0030] The way to get the best out of yourself is to put things right up to yourself, handle yourself without gloves, and talk to yourself as you would to a son of yours who has great ability but who is not using half of it.\n\nWhen you go into an undertaking just say to yourself, ""Now, this thing is right up to me. I've got to make good, to show the man in me or the coward. There is no backing out.""\n[0.0030] ""Let us not be content to mine the most coal, to make the largest locomotives, to weave the largest quantities of carpets; but, amid the sounds of the pick, the blows of the hammer, the rattle of the looms, and the roar of the machinery, take care that the immortal mechanism of God's own hand--the mind--is still full-trained for the highest and noblest service.""\n[0.0029] This striving for excellence will make you grow. It will call out your resources, call out the best thing in you. 
The constant stretching of the mind over problems which interest you, which are to mean everything to you in the future, will help you expand into a broader, larger, more effective man.\n[0.0029] A successful manufacturer says: ""If you make a good pin, you will earn more money than if you make a bad steam engine."" ""If a man can write a better book, preach a better sermon, or make a better mousetrap than his neighbor,"" says Emerson, ""though he build his house in the woods, the world will make a path to his door."""


Quantization is a powerful tool for optimizing vector database performance, especially in applications that handle high-dimensional embeddings like semantic search and recommendation systems. This tutorial demonstrated the implementation of scalar and binary quantization methods using Nomic embeddings with MongoDB as the vector database. 
Applied appropriately, optimization extends beyond latency improvements: it enables scalability, reduces operational costs, and improves the application's user experience.

**The Benefits of Database Optimization:**

- Latency Reduction for Improved User Experience: Minimizing delays in data retrieval enhances user satisfaction and engagement.
- Efficient Handling of Large-Scale Data: Optimized databases can more effectively manage vast amounts of data, improving performance and scalability.
- Cost Reduction and Resource Efficiency: Efficient data storage and retrieval reduce the need for excessive computational resources, leading to cost savings.

By examining the trade-offs between retrieval accuracy and performance across different embedding formats (float32, int8, and binary), we showed how MongoDB's capabilities, such as vector indexing and automatic quantization, can streamline data storage, retrieval, and analysis.
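
As an illustration of how automatic quantization is requested, the sketch below builds an Atlas Vector Search index definition as a plain Python dictionary. The field path `embedding`, the 768 dimensions, and the helper name `build_vector_index_definition` are assumptions for illustration, not values taken from this notebook; adjust them to your collection and embedding model.

```python
# Minimal sketch: build a vector index definition with automatic quantization.
# The helper name, field path, and dimension count are hypothetical.

ALLOWED_QUANTIZATION = {"none", "scalar", "binary"}


def build_vector_index_definition(
    path: str,
    num_dimensions: int,
    similarity: str = "cosine",
    quantization: str = "scalar",
) -> dict:
    """Return an index definition dict for a single vector field."""
    if quantization not in ALLOWED_QUANTIZATION:
        raise ValueError(
            f"quantization must be one of {sorted(ALLOWED_QUANTIZATION)}"
        )
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
                # "scalar" and "binary" ask the server to quantize
                # full-fidelity float vectors at index time.
                "quantization": quantization,
            }
        ]
    }


# Assumed values: a field named "embedding" holding 768-dimensional vectors.
scalar_definition = build_vector_index_definition("embedding", 768)
binary_definition = build_vector_index_definition(
    "embedding", 768, quantization="binary"
)
print(scalar_definition)
print(binary_definition)
```

With PyMongo, a definition like this can typically be wrapped in a `SearchIndexModel` with `type="vectorSearch"` and passed to `collection.create_search_index`; check the driver version you use for exact support.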

In this tutorial, we explored Atlas Vector Search's native support for scalar quantization and for binary quantization with rescoring. Our implementation showed that automatic quantization improves scalability and lowers cost by reducing the storage and computational resources needed to process vectors. In most cases, automatic quantization reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x, respectively, but the Hierarchical Navigable Small World (HNSW) graph itself does not shrink.
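
The 4x and 32x figures follow from the storage needed per dimension: 4 bytes for float32, 1 byte for int8, and 1 bit for binary. The sketch below works through that arithmetic. The 768 dimensions and the ten million vectors are assumed example values, and the function name `vector_bytes` is hypothetical. It counts only the raw vector values, not the HNSW graph, which is why the overall RAM savings are smaller than these ratios.

```python
# Minimal sketch: raw storage per vector for each embedding format.
# Only vector values are counted; index graph overhead is excluded.


def vector_bytes(num_dimensions: int) -> dict[str, int]:
    """Bytes needed to store one vector in each format."""
    return {
        "float32": num_dimensions * 4,  # 32 bits per dimension
        "int8": num_dimensions * 1,     # 8 bits per dimension
        "binary": num_dimensions // 8,  # 1 bit per dimension
    }


NUM_DIMENSIONS = 768       # assumed embedding size
NUM_VECTORS = 10_000_000   # assumed collection size

sizes = vector_bytes(NUM_DIMENSIONS)
for name, size in sizes.items():
    ratio = sizes["float32"] / size
    total_gb = size * NUM_VECTORS / 1e9
    print(f"{name:8s} {size:5d} bytes/vector  {ratio:4.0f}x smaller  {total_gb:6.2f} GB total")
```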

We recommend automatic quantization if you have a large number of full-fidelity vectors, typically over 10M vectors. After quantization, you index reduced-representation vectors with little loss of retrieval accuracy, although, as the comparison above shows, the precision trade-off should be checked for your own data.

To further explore quantization techniques and their applications, refer to resources like [Ingesting Quantized Vectors with Cohere](https://www.mongodb.com/developer/products/atlas/ingesting_quantized_vectors_with_cohere/). An [additional notebook](https://github.com/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/advanced_evaluation_of_quantized_vectors_using_cohere_mongodb_beir.ipynb) for comparing retrieval accuracy between quantized and non-quantized vectors is also available to deepen your understanding of these methods.