# Scalable late interaction vectors in Elasticsearch: Bit Vectors #

In this notebook, we will look at how to convert late interaction vectors to bit vectors in order to:
1. Save significant disk space  
2. Lower query latency
   
We will also look at how we can use Hamming distance to speed up our queries even further.  
This notebook builds on part 1, where we downloaded the images, created ColPali vectors and saved them to disk. Please execute the part 1 notebook before trying the techniques in this one.  
 
Also check out our accompanying blog post on [Scaling Late Interaction Models](TODO) for more context on this notebook. 

This is the key part of this notebook. We use the `to_bit_vectors()` function to convert our vectors into bit vectors.  
The function is simple in essence. Values `> 0` are converted to `1`, and all other values (`<= 0`) are converted to `0`. We then pack this array of `0`s and `1`s into bytes and encode it as a hex string that represents our bit vector.  
So don't be surprised that the values we will be indexing look like strings and not arrays as before. This is intended!  

Learn more in our blog post on [bit vectors and Hamming distance](https://www.elastic.co/search-labs/blog/bit-vectors-in-elasticsearch). 
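To make the Hamming distance idea concrete, here is a minimal sketch in plain Python. The helper name `hamming_distance` is hypothetical and is not part of Elasticsearch or this notebook; it only illustrates that the distance is the number of bit positions in which two hex-encoded bit vectors differ.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two equal-length hex-encoded bit vectors differ."""
    if len(hex_a) != len(hex_b):
        raise ValueError("bit vectors must have the same length")
    # XOR leaves a 1 in every position where the two vectors disagree
    xor = int(hex_a, 16) ^ int(hex_b, 16)
    return bin(xor).count("1")


print(hamming_distance("a6", "a6"))  # 0 (identical vectors)
print(hamming_distance("a6", "a7"))  # 1 (only the last bit differs)
print(hamming_distance("ff", "00"))  # 8 (every bit differs)
```

Because the computation is just an XOR followed by a bit count, it is much cheaper than a floating point dot product, which is where the latency gain comes from.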

In [1]:
import numpy as np


def to_bit_vectors(embeddings: list) -> list:
    return [
        np.packbits(np.where(np.array(embedding) > 0, 1, 0))
        .astype(np.int8)
        .tobytes()
        .hex()
        for embedding in embeddings
    ]
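As a quick sanity check, the sketch below runs the same conversion on a toy 8-dimensional embedding (made-up values, not real ColPali output) and estimates the raw size reduction. The size figures assume 128-dimensional `float32` vectors and count only the vector payload, not any index overhead.

```python
import numpy as np


def to_bit_vectors(embeddings: list) -> list:
    # Same logic as the function above, repeated so this snippet runs on its own
    return [
        np.packbits(np.where(np.array(embedding) > 0, 1, 0))
        .astype(np.int8)
        .tobytes()
        .hex()
        for embedding in embeddings
    ]


# Toy embedding: positive values become 1, everything else becomes 0
toy_embedding = [0.5, -0.2, 0.1, 0.0, -0.7, 0.3, 0.9, -0.1]
# Bits: 1 0 1 0 0 1 1 0 -> 0b10100110 -> 0xa6
print(to_bit_vectors([toy_embedding]))  # ['a6']

# Rough payload size per vector, assuming 128 dimensions
dims = 128
float_bytes = dims * 4  # float32 uses 4 bytes per dimension
bit_bytes = dims // 8  # 1 bit per dimension
print(float_bytes, bit_bytes, float_bytes // bit_bytes)  # 512 16 32
```

Under these assumptions each vector shrinks from 512 bytes to 16 bytes, which matches the 32-character hex strings that show up in the query further down.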

Here we are defining our mapping for our Elasticsearch index. Note how we set the `element_type` parameter to `bit` to inform Elasticsearch that we will be indexing bit vectors in this field. 

In [2]:
import os
import time
from dotenv import load_dotenv
from elasticsearch import Elasticsearch

load_dotenv("elastic.env")

ELASTIC_API_KEY = os.getenv("ELASTIC_API_KEY")
ELASTIC_HOST = os.getenv("ELASTIC_HOST")
INDEX_NAME = "searchlabs-colpali-hamming"

es = Elasticsearch(ELASTIC_HOST, api_key=ELASTIC_API_KEY)

mappings = {
    "mappings": {
        "properties": {
            "col_pali_vectors": {"type": "rank_vectors", "element_type": "bit"}
        }
    }
}

if not es.indices.exists(index=INDEX_NAME):
    print(f"[INFO] Creating index: {INDEX_NAME}")
    es.indices.create(index=INDEX_NAME, body=mappings)
else:
    print(f"[INFO] Index '{INDEX_NAME}' already exists.")


def index_document(es_client, index, doc_id, document, retries=10, initial_backoff=1):
    for attempt in range(1, retries + 1):
        try:
            return es_client.index(index=index, id=doc_id, document=document)
        except Exception as e:
            if attempt < retries:
                wait_time = initial_backoff * (2 ** (attempt - 1))
                print(f"[WARN] Failed to index {doc_id} (attempt {attempt}): {e}")
                time.sleep(wait_time)
            else:
                print(f"Failed to index {doc_id} after {retries} attempts: {e}")
                raise

[INFO] Creating index: searchlabs-colpali-hamming


In [3]:
from concurrent.futures import ThreadPoolExecutor
from tqdm.notebook import tqdm
import pickle


def process_file(file_name, vectors):
    if es.exists(index=INDEX_NAME, id=file_name):
        return

    bit_vectors = to_bit_vectors(vectors)

    index_document(
        es_client=es,
        index=INDEX_NAME,
        doc_id=file_name,
        document={"col_pali_vectors": bit_vectors},
    )


with open("col_pali_vectors.pkl", "rb") as f:
    file_to_multi_vectors = pickle.load(f)

with ThreadPoolExecutor(max_workers=10) as executor:
    list(
        tqdm(
            executor.map(
                lambda item: process_file(*item), file_to_multi_vectors.items()
            ),
            total=len(file_to_multi_vectors),
            desc="Indexing documents",
        )
    )

print(f"Completed indexing {len(file_to_multi_vectors)} documents")

Indexing documents:   0%|          | 0/500 [00:00<?, ?it/s]

Completed indexing 500 documents


In [4]:
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.3"
model = ColPali.from_pretrained(
    model_name,
    torch_dtype=torch.float32,
    device_map="mps",  # "mps" for Apple Silicon, "cuda" if available, "cpu" otherwise
).eval()

col_pali_processor = ColPaliProcessor.from_pretrained(model_name)


def create_col_pali_query_vectors(query: str) -> list:
    queries = col_pali_processor.process_queries([query]).to(model.device)
    with torch.no_grad():
        return model(**queries).tolist()[0]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [5]:
from IPython.display import display, HTML
import os
import json

DOCUMENT_DIR = "searchlabs-colpali"

query = "What do companies use for recruiting?"
query_vector = to_bit_vectors(create_col_pali_query_vectors(query))
es_query = {
    "_source": False,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "maxSimInvHamming(params.query_vector, 'col_pali_vectors')",
                "params": {"query_vector": query_vector},
            },
        }
    },
    "size": 5,
}
print(json.dumps(es_query))

results = es.search(index=INDEX_NAME, body=es_query)
image_ids = [hit["_id"] for hit in results["hits"]["hits"]]

html = "<div style='display: flex; flex-wrap: wrap; align-items: flex-start;'>"
for image_id in image_ids:
    image_path = os.path.join(DOCUMENT_DIR, image_id)
    html += f'<img src="{image_path}" alt="{image_id}" style="max-width:300px; height:auto; margin:10px;">'
html += "</div>"

display(HTML(html))

{"_source": false, "query": {"script_score": {"query": {"match_all": {}}, "script": {"source": "maxSimInvHamming(params.query_vector, 'col_pali_vectors')", "params": {"query_vector": ["7747bcd9732859c3645aa81036f5c960", "729b3c418ba8594a67daa042eca1c961", "609e3d8a2ac379c2204aa0cfa8345bdc", "30bf378a2ac279da245aa8dfa83c3bdc", "64af77ea2acdf9c28c0aa5df863677f4", "686f3fce2ac871c26e6aaddf023455ec", "383f31a8e8c0f8ca2c4ab54f047c7dec", "203b33caaac279da0acaa54f8a3c6bcc", "319a63eba8d279ca30dbbccf8f757b8e", "203b73ca28d2798a325bb44f8c3c5bce", "203bb7caa8d2718a1a4bb14f8a3c5bdc", "203bb7caa8d2798a1a6aa14f8a3c5fdc", "303b33caa8d2798a0a4aa14f8a3c5bdc", "303b33caaad379ca0e4aa14f8a3c5bdc", "709b33caaac379ca0c4aa14f8a3c5fdc", "708e37eaaac779ca2c4aa1df863c1fdc", "648e77ea6acd79caac4ae1df86363ffc", "648e77ea6acdf9caac4ae5df06363ffc", "608f37ea2ac579ca2c4ea1df063c3ffc", "709f37c8aac379ca2c4ea1df863c1fdc", "70af31c82ac671ce2c6ab14fc43c1bfc"]}}}}, "size": 5}
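For intuition about what this query scores, here is a rough pure-Python sketch of a MaxSim computation over bit vectors. The function names are hypothetical, and the normalization `(n_bits - distance) / n_bits` is an assumption made for illustration; the exact formula Elasticsearch uses inside `maxSimInvHamming` may differ. The late interaction pattern itself is: for every query vector, take the best match among the document's vectors, then sum those best scores.

```python
def inv_hamming(hex_a: str, hex_b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical vectors, 0.0 when every bit differs.

    Assumed normalization, for illustration only.
    """
    n_bits = len(hex_a) * 4  # each hex character encodes 4 bits
    distance = bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")
    return (n_bits - distance) / n_bits


def max_sim_inv_hamming(query_vectors: list, doc_vectors: list) -> float:
    # For each query vector keep its best-matching document vector, then sum
    return sum(
        max(inv_hamming(q, d) for d in doc_vectors) for q in query_vectors
    )


# Toy 8-bit vectors, not real ColPali output
query_vectors = ["ff", "0f"]
doc_vectors = ["ff", "00", "0e"]
print(max_sim_inv_hamming(query_vectors, doc_vectors))  # 1.875
```

In the toy example, `"ff"` matches a document vector exactly (1.0) and `"0f"` is one bit away from `"0e"` (7/8), giving 1.875.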


Above, we queried our data using the `maxSimInvHamming(...)` function.  
We can also pass the full-fidelity ColPali query vectors and use the `maxSimDotProduct(...)` function for [asymmetric similarity](https://www.elastic.co/guide/en/elasticsearch/reference/8.18/rank-vectors.html#rank-vectors-scoring) between the float query vectors and the indexed bit vectors. 
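The following NumPy sketch illustrates the idea of asymmetric scoring. It assumes each stored bit is treated as `1.0` or `0.0`, so the dot product is simply the sum of the query values at positions where the document bit is set; this is an illustration under that assumption, not the Elasticsearch implementation. The helper names are hypothetical.

```python
import numpy as np


def unpack_bits(hex_vector: str) -> np.ndarray:
    """Turn a hex-encoded bit vector back into an array of 0s and 1s."""
    return np.unpackbits(np.frombuffer(bytes.fromhex(hex_vector), dtype=np.uint8))


def max_sim_dot_product(query_vectors: list, doc_bit_vectors: list) -> float:
    docs = np.stack([unpack_bits(d) for d in doc_bit_vectors]).astype(np.float32)
    queries = np.asarray(query_vectors, dtype=np.float32)
    # scores[i, j] = dot product of query vector i with document vector j
    scores = queries @ docs.T
    # Best document vector per query vector, summed over all query vectors
    return float(scores.max(axis=1).sum())


# Toy data: one float query vector against two 8-bit document vectors
query_vectors = [[0.5, -0.25, 0.25, 0.0, -0.5, 0.25, 1.0, -0.25]]
doc_bit_vectors = ["a6", "ff"]  # a6 -> 1 0 1 0 0 1 1 0
print(max_sim_dot_product(query_vectors, doc_bit_vectors))  # 2.0
```

Here `"a6"` scores 2.0 because its set bits line up with the positive query values, while `"ff"` scores only 1.0 because it also picks up the negative ones. Keeping the query in full precision retains the magnitude information that binarizing the query would discard.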

In [6]:
query = "What do companies use for recruiting?"
query_vector = create_col_pali_query_vectors(query)
es_query = {
    "_source": False,
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "maxSimDotProduct(params.query_vector, 'col_pali_vectors')",
                "params": {"query_vector": query_vector},
            },
        }
    },
    "size": 5,
}

results = es.search(index=INDEX_NAME, body=es_query)
image_ids = [hit["_id"] for hit in results["hits"]["hits"]]

html = "<div style='display: flex; flex-wrap: wrap; align-items: flex-start;'>"
for image_id in image_ids:
    image_path = os.path.join(DOCUMENT_DIR, image_id)
    html += f'<img src="{image_path}" alt="{image_id}" style="max-width:300px; height:auto; margin:10px;">'
html += "</div>"

display(HTML(html))

In [None]:
# We kill the kernel forcefully to free up the memory from the ColPali model.
print("Shutting down the kernel to free memory...")
import os

os._exit(0)