# Binary Quantization with Qdrant & OpenAI Embeddings

---
In large-scale data retrieval, efficiency is crucial. As data volumes grow, the ability to retrieve information quickly and accurately has a direct effect on system performance. This blog post explores a technique known as binary quantization applied to OpenAI embeddings, demonstrating how it can **cut retrieval latency by 20x** or more.

## What Are OpenAI Embeddings?
OpenAI embeddings are numerical representations of textual information. They transform text into a vector space where semantically similar texts are mapped close together. This mathematical representation enables computers to understand and process human language more effectively.
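To make "mapped close together" concrete, here is a minimal sketch that compares vectors with cosine similarity. The three 4-dimensional vectors and their labels are made up for illustration; real OpenAI embeddings in this notebook have 1536 dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy "embeddings", not produced by any real model.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [-0.1, 0.9, 0.7, -0.4]

sim_close = cosine_similarity(cat, kitten)
sim_far = cosine_similarity(cat, invoice)
print(sim_close > sim_far)  # True
```

Related texts end up with a higher similarity score than unrelated ones, which is the property a vector search engine relies on.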

## Binary Quantization
Binary quantization is a method which converts continuous numerical values into binary values (0 or 1). It simplifies the data structure, allowing faster computations. Here's a brief overview of the binary quantization process applied to OpenAI embeddings:

1. **Load Embeddings**: OpenAI embeddings are loaded from parquet files.
2. **Binary Transformation**: The continuous-valued vectors are converted into binary form. Values greater than 0 are set to 1, and all other values are set to 0.
3. **Comparison & Retrieval**: Binary vectors are compared with bitwise XOR followed by a bit count (the Hamming distance), which is much cheaper than floating-point arithmetic.
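The transformation and comparison steps above can be sketched in a few lines of NumPy. This is only an illustration of the idea on made-up 8-dimensional vectors; the helper names are hypothetical and this is not how Qdrant implements it internally.

```python
import numpy as np


def binarize(vectors):
    """Map each component to 1 if it is greater than 0, otherwise 0."""
    return (np.asarray(vectors) > 0).astype(np.uint8)


def hamming_distance(a_bits, b_bits):
    """Count the positions where two binary vectors differ, using XOR."""
    return int(np.count_nonzero(np.bitwise_xor(a_bits, b_bits)))


# Toy vectors, made up for illustration.
query = np.array([0.12, -0.40, 0.05, 0.33, -0.08, 0.91, -0.27, 0.02])
near = np.array([0.10, -0.35, 0.07, 0.30, -0.02, 0.88, -0.30, -0.01])
far = np.array([-0.50, 0.22, -0.13, -0.41, 0.37, -0.09, 0.64, -0.15])

q_bits, near_bits, far_bits = binarize(query), binarize(near), binarize(far)
print(q_bits)                               # [1 0 1 1 0 1 0 1]
print(hamming_distance(q_bits, near_bits))  # 1
print(hamming_distance(q_bits, far_bits))   # 8
```

A small Hamming distance means the two vectors mostly agree in sign, which serves as a cheap proxy for similarity between the original vectors.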

Binary quantization is a promising approach to improving retrieval speed and reducing the memory footprint of vector search engines. In this notebook we show how to use Qdrant to apply binary quantization to vectors and run fast similarity search on the resulting index.
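A quick back-of-the-envelope calculation shows where the memory savings come from. The sketch below assumes vectors are stored as `float32` and that each binary component takes one bit; it counts raw vector storage only and ignores index overhead. The random vector is a stand-in, not a real embedding.

```python
import numpy as np

dim = 1536           # dimensionality of the embeddings used in this notebook
n_vectors = 100_000  # size of the dataset used in this notebook

rng = np.random.default_rng(0)
vector = rng.standard_normal(dim).astype(np.float32)  # stand-in for an embedding

bits = (vector > 0).astype(np.uint8)
packed = np.packbits(bits)  # pack 8 binary components into each byte

float_bytes = vector.nbytes   # 1536 * 4 bytes
binary_bytes = packed.nbytes  # 1536 / 8 bytes
print(float_bytes, binary_bytes, float_bytes // binary_bytes)  # 6144 192 32

total_float_mb = n_vectors * float_bytes / 1e6
total_binary_mb = n_vectors * binary_bytes / 1e6
print(total_float_mb, "MB vs", total_binary_mb, "MB")
```

Under these assumptions the binary representation is 32 times smaller, so the quantized vectors for the whole collection fit comfortably in RAM.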

## Table of Contents
1. Imports
2. Download and Slice Dataset
3. Create Qdrant Collection
4. Indexing
5. Search

## 1. Imports

In [None]:
!pip install qdrant-client pandas datasets numpy --quiet --upgrade

In [None]:
import pandas as pd
from qdrant_client import QdrantClient, models

## 2. Download and Slice Dataset

We will use the [dbpedia-entities](https://huggingface.co/datasets/Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K) dataset, loaded with the [HuggingFace Datasets](https://huggingface.co/datasets) library. It contains 100K vectors of 1536 dimensions each.

In [None]:
import datasets

dataset = datasets.load_dataset(
    "Qdrant/dbpedia-entities-openai3-text-embedding-3-small-1536-100K", split="train"
)

In [None]:
len(dataset)
# dataset[0]

## 3. Create Qdrant Collection

In [None]:
client = QdrantClient(
    timeout=600,
    prefer_grpc=True,
)

collection_name = "binary-quantization"
client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=1536,  # must match the embedding dimensionality
        distance=models.Distance.DOT,
        on_disk=True,  # keep the original full-precision vectors on disk
    ),
    quantization_config=models.BinaryQuantization(
        # keep the compact binary vectors in RAM for fast search
        binary=models.BinaryQuantizationConfig(always_ram=True),
    ),
)

## 4. Indexing

In [None]:
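import os

# Upload in batches so that only one slice of the dataset is materialized at a time.
batch_size = 1000
for start in range(0, len(dataset), batch_size):
    batch = dataset[start : start + batch_size]  # dict of column name -> list
    client.upload_collection(
        collection_name=collection_name,
        # size the id range from the batch so a short final batch still lines up
        ids=range(start, start + len(batch["text"])),
        vectors=batch["openai"],
        payload=[{"text": text} for text in batch["text"]],
        parallel=max(1, (os.cpu_count() or 2) // 2),
    )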

In [None]:
collection_info = client.get_collection(collection_name=collection_name)
collection_info.dict()

## 5. Search: Oversampling vs Recall

### Preparing a query dataset

For this illustration, we take 100 vectors that are already in the index, add a small amount of Gaussian noise to each, and use them as queries. A query counts as correct when the entry it was derived from appears among the results returned by Qdrant.

In [None]:
import random

random.seed(37)

# randrange excludes the upper bound, so every sampled index is valid
query_indices = [random.randrange(len(dataset)) for _ in range(100)]
query_dataset = dataset[query_indices]
query_indices

In [None]:
# Add Gaussian noise to a vector
import numpy as np

np.random.seed(37)


def add_noise(vector, noise=0.05):
    return vector + noise * np.random.randn(*vector.shape)

In [None]:
import time


def correct(results, text):
    """Return True if the query's source text is among the returned payloads."""
    result_texts = [x.payload["text"] for x in results]
    return text in result_texts


def count_correct(query_dataset, limit=1, oversampling=1, rescore=False):
    """Count how many noisy queries retrieve the entry they were derived from."""
    correct_results = 0
    for qv, text in zip(query_dataset["openai"], query_dataset["text"]):
        results = client.search(
            collection_name=collection_name,
            query_vector=add_noise(np.array(qv)),
            limit=limit,
            search_params=models.SearchParams(
                quantization=models.QuantizationSearchParams(
                    ignore=False,  # search with the quantized vectors
                    rescore=rescore,  # re-rank candidates with the original vectors
                    oversampling=oversampling,  # fetch limit * oversampling candidates
                )
            ),
        )
        correct_results += correct(results, text)
    return correct_results


limit_grid = [1, 3, 5, 10, 20, 50]
# limit_grid = [1, 3, 5]
oversampling_grid = [1.0, 1.5, 2.0, 3.0, 5.0]
# oversampling_grid = [1.0, 1.5, 2.0]
rescore_grid = [False, True]
results = []
for limit in limit_grid:
    for oversampling in oversampling_grid:
        for rescore in rescore_grid:
            # print(f"limit={limit}, oversampling={oversampling}, rescore={rescore}")
            # perf_counter is a monotonic clock, better suited to timing than time.time
            start = time.perf_counter()
            correct_results = count_correct(
                query_dataset, limit=limit, oversampling=oversampling, rescore=rescore
            )
            end = time.perf_counter()
            results.append(
                {
                    "limit": limit,
                    "oversampling": oversampling,
                    "rescore": rescore,
                    "correct": correct_results,
                    "total queries": len(query_dataset["text"]),
                    "time": end - start,
                }
            )

results_df = pd.DataFrame(results)
results_df

In [None]:
df = results_df.copy()
df["candidates"] = df["oversampling"] * df["limit"]
df[["candidates", "rescore", "time"]]
# df.to_csv("candidates-rescore-time.csv", index=False)
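The `candidates` column reflects how oversampling works: the search first collects `limit * oversampling` candidates using the binary vectors, and with `rescore=True` it then re-ranks those candidates with the original vectors before returning the top `limit`. The following is a brute-force NumPy sketch of that two-stage idea on random data. The function name `two_stage_search` and the data are hypothetical, and Qdrant's actual implementation (which searches an index rather than scanning every vector) differs.

```python
import numpy as np


def binarize(vectors):
    """Map each component to 1 if it is greater than 0, otherwise 0."""
    return (np.asarray(vectors) > 0).astype(np.uint8)


def two_stage_search(query, vectors, limit=3, oversampling=2.0, rescore=True):
    """Brute-force sketch: coarse binary ranking, then optional exact re-ranking."""
    bits = binarize(vectors)
    q_bits = binarize(query)

    # Stage 1: rank by Hamming distance and keep limit * oversampling candidates.
    n_candidates = min(len(vectors), max(limit, int(round(limit * oversampling))))
    hamming = np.count_nonzero(bits != q_bits, axis=1)
    candidates = np.argsort(hamming, kind="stable")[:n_candidates]

    if rescore:
        # Stage 2: re-rank the candidates with the full-precision dot product.
        scores = vectors[candidates] @ query
        candidates = candidates[np.argsort(-scores, kind="stable")]

    return candidates[:limit].tolist()


# Random unit vectors standing in for embeddings (illustrative data only).
rng = np.random.default_rng(37)
vectors = rng.standard_normal((1000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# A noisy copy of vector 123 plays the role of the query.
query = vectors[123] + 0.05 * rng.standard_normal(64)

top = two_stage_search(query, vectors, limit=5, oversampling=3.0)
print(123 in top)
```

Raising `oversampling` widens the candidate pool, which gives the exact re-ranking step more chances to recover a true neighbor that the coarse binary ranking placed too low, at the cost of more work per query.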