# Re-Ranking using Qdrant's `prefetch` and Matryoshka Representation Learning

## Install libraries

In [None]:
!pip install qdrant-client
!pip install -U sentence-transformers
!pip install datasets


## Dataset

Use Hugging Face's `datasets` library to download the dataset and load it into your session. The library is fast and efficient, and it also provides tools to process unstructured data in other ways.

> More information about the dataset can be found [here](https://github.com/nickprock/qdrant_examples/blob/master/qdrant_101_text_data/qdrant_and_text_data.ipynb)

In [None]:
from datasets import load_dataset

In [None]:
dataset = load_dataset("ag_news", split="train")
dataset


Dataset({
    features: ['text', 'label'],
    num_rows: 120000
})

## Retrieval and Reranking

<br>

![retr_rerank](https://miro.medium.com/v2/resize:fit:1100/format:webp/1*PWYRflN04fldhY19Qi_CLg.png)

<br>

Retrieval alone often returns documents that are not in the most relevant order.
A fast first-stage retriever, such as vector search over embeddings computed independently for the query and for each document, finds potentially relevant candidates but cannot model fine-grained interactions between the query terms and the document content.
Reranking rescores this short list with a more expensive model, for example a cross-encoder that reads the query and each document together, and reorders the results so that the most likely matches appear at the top.

**Combining the two keeps search fast over the whole collection while improving the quality of the top results.**
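The two-stage idea can be sketched independently of any particular model. The following is a minimal, model-free illustration: the function `retrieve_then_rerank` and the toy word-overlap scorers are hypothetical stand-ins for a bi-encoder retriever and a cross-encoder reranker, not part of any library used in this notebook.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    retrieve_score: Callable[[str, str], float],  # cheap first-stage scorer
    rerank_score: Callable[[str, str], float],    # expensive second-stage scorer
    k_retrieve: int = 100,
    k_final: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: score the whole corpus cheaply and keep the top k_retrieve candidates
    candidates = sorted(
        corpus, key=lambda doc: retrieve_score(query, doc), reverse=True
    )[:k_retrieve]
    # Stage 2: rescore only those candidates with the expensive scorer
    reranked = sorted(
        ((doc, rerank_score(query, doc)) for doc in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:k_final]


# Hypothetical toy scorers standing in for real models
def word_overlap(query: str, doc: str) -> float:
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_then_shorter(query: str, doc: str) -> float:
    # More overlapping words first, then slightly prefer shorter documents
    return word_overlap(query, doc) - 0.01 * len(doc.split())


corpus = [
    "alien spacecraft spotted over Siberia",
    "Russian scientists claim alien spacecraft caused blast",
    "stock markets rally on earnings",
    "Russian crew lands safely",
]
result = retrieve_then_rerank(
    "Russian scientists spot alien spacecraft",
    corpus,
    word_overlap,
    overlap_then_shorter,
    k_retrieve=3,
    k_final=2,
)
print(result)
```

The expensive scorer only ever sees `k_retrieve` documents, which is what makes the second stage affordable.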

## Matryoshka Representation Learning

Matryoshka Representation Learning (MRL) is a training technique that encodes information at multiple levels of granularity within a single embedding vector.
Like Matryoshka dolls, the levels are nested inside one embedding: the first dimensions form a coarse but usable embedding on their own, and the more dimensions you keep, the more detail the embedding captures.

As the embedder we use `mixedbread-ai/mxbai-embed-xsmall-v1`, truncated to the vector sizes `[64, 128, 384]`.
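Conceptually, using a smaller Matryoshka size means keeping only the first dimensions of the full embedding, which is what the `truncate_dim` argument in the next cell asks for. A minimal NumPy sketch, assuming random vectors as stand-ins for real `mxbai-embed-xsmall-v1` embeddings; the re-normalization step is an assumption added so that the dot product of truncated vectors equals their cosine similarity, and it does not change rankings under Qdrant's `Distance.COSINE`. The helper `truncate_embedding` is hypothetical.

```python
import numpy as np


def truncate_embedding(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of a Matryoshka embedding and L2-normalize them."""
    truncated = embedding[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms


rng = np.random.default_rng(0)
full = rng.normal(size=(2, 384)).astype(np.float32)  # stand-in for two 384-dim embeddings

for dim in [64, 128, 384]:
    small = truncate_embedding(full, dim)
    print(dim, small.shape, np.linalg.norm(small, axis=-1))
```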

In [None]:
from sentence_transformers import SentenceTransformer

matryoshka_dim = [64, 128, 384]

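# Three instances of the same model, each truncating its output
# embeddings to one of the Matryoshka sizes
model_64 = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    trust_remote_code=True,
    truncate_dim=matryoshka_dim[0],  # 64 dimensions
)

# Note: despite its name, model_256 truncates to matryoshka_dim[1] = 128 dimensions
model_256 = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    trust_remote_code=True,
    truncate_dim=matryoshka_dim[1],
)

model_full = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-xsmall-v1",
    trust_remote_code=True,
    truncate_dim=matryoshka_dim[2],  # full size: 384 dimensions
)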


### Embed dataset

In [None]:
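# Encode the whole corpus once per Matryoshka size
vec_64 = model_64.encode(dataset["text"])      # shape (120000, 64)
vec_256 = model_256.encode(dataset["text"])    # shape (120000, 128), despite the name
vec_full = model_full.encode(dataset["text"])  # shape (120000, 384)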

## Qdrant

### Create Qdrant collections

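For the experiment we create three collections: `multiple_vectors`, which stores the 64-, 128- and 384-dimension named vectors for each point and on which the nested prefetch will be applied, and two classic single-vector collections, `single_vector_64` and `single_vector_full`, holding the smallest and the largest vectors respectively.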

In [None]:
from qdrant_client import QdrantClient
from qdrant_client.http.models import VectorParams, Distance

# Local in-memory instance: no Qdrant server is needed for this experiment
client = QdrantClient(":memory:", timeout=None)

client.create_collection(
    collection_name="multiple_vectors",
    vectors_config={
        "vec_64": VectorParams(
            size=64,
            distance=Distance.COSINE,
        ),
        "vec_128": VectorParams(
            size=128,
            distance=Distance.COSINE,
        ),
        "vec_384": VectorParams(
            size=384,
            distance=Distance.COSINE,
        ),
    },
)

client.create_collection(
    collection_name="single_vector_64",
    vectors_config=VectorParams(
        size=64,
        distance=Distance.COSINE,
    ),
)

client.create_collection(
    collection_name="single_vector_full",
    vectors_config=VectorParams(
        size=384,
        distance=Distance.COSINE,
    ),
)

True

Next, we import the points into the collections.

In [None]:
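from qdrant_client.http.models import PointStruct

# Upload points in batches instead of one upsert call per point
BATCH_SIZE = 1000

for start in range(0, len(dataset), BATCH_SIZE):
    batch = dataset[start:start + BATCH_SIZE]  # dict of columns: {"text": [...], "label": [...]}
    ids = range(start, start + len(batch["text"]))
    payloads = [
        {"text": text, "label": label}
        for text, label in zip(batch["text"], batch["label"])
    ]

    # One point carries all three named vectors (64, 128 and 384 dims)
    client.upsert(
        collection_name="multiple_vectors",
        points=[
            PointStruct(
                id=i,
                vector={
                    "vec_64": vec_64[i].tolist(),
                    "vec_128": vec_256[i].tolist(),  # vec_256 holds the 128-dim embeddings
                    "vec_384": vec_full[i].tolist(),
                },
                payload=p,
            )
            for i, p in zip(ids, payloads)
        ],
    )

    client.upsert(
        collection_name="single_vector_64",
        points=[
            PointStruct(id=i, vector=vec_64[i].tolist(), payload=p)
            for i, p in zip(ids, payloads)
        ],
    )

    client.upsert(
        collection_name="single_vector_full",
        points=[
            PointStruct(id=i, vector=vec_full[i].tolist(), payload=p)
            for i, p in zip(ids, payloads)
        ],
    )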


## Query Experiments

### Classical Approach

The first experiment is a classic retrieve-and-rerank pipeline: we retrieve the 100 nearest documents from the `single_vector_64` collection, rerank them with `mixedbread-ai/mxbai-rerank-xsmall-v1` as a cross-encoder, and measure how long each step takes to return results.



In [None]:
from datetime import datetime

query = "Russian scientists spot alien spacecraft"

In [None]:
start_time = datetime.now()

search_result_64 = client.query_points(
    collection_name="single_vector_64",
    query=model_64.encode(query),
    with_payload=True,
    limit=100
).points

end_time = datetime.now()

retrieve_time = end_time - start_time

print('Duration: {}'.format(retrieve_time))
print("\n")
print(search_result_64)

Duration: 0:00:00.064587


[ScoredPoint(id=63224, version=0, score=0.6377151631747799, payload={'text': 'Science ; Russian-US crew goes to space, AIDS experiments planned Science News, Kazakhstan - A Russian-US crew blasted off bound for the International Space Station where they will spend six months conducting experiments including work on the search for an AIDS vaccine.', 'label': 3}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=71605, version=0, score=0.6195525861056244, payload={'text': 'US, Russian Astronauts Land Safely in Kazakhstan Russian cosmonaut Gennady Padalka and American astronaut Mike Fincke had been in space since April. While in space they carried out four space walks, including one crucial mission to repair ', 'label': 3}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=70139, version=0, score=0.6035118942769271, payload={'text': 'Russian - US Space Crew Lands in Kazakhstan A Russian Soyuz craft has landed in Kazakhstan bringing an A

In [None]:
from sentence_transformers import CrossEncoder

ranker = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")

A cross-encoder takes `[query, passage]` pairs and outputs a relevance score for each pair; we then sort the documents by score and keep the top N results.

In [None]:
N = 10

start_time = datetime.now()
retrieved_documents = [[query, res.payload["text"]] for res in search_result_64]
scores = ranker.predict(retrieved_documents)

# Sort the scores in decreasing order
results = [{"input": inp, "score": score} for inp, score in zip(retrieved_documents, scores)]
results = sorted(results, key=lambda x: x["score"], reverse=True)[:N]
end_time = datetime.now()
reranking_time = end_time - start_time

print('Duration: {}'.format(reranking_time))

Duration: 0:00:00.372171
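Keeping only the best N candidates does not require sorting the whole list. A small sketch using Python's `heapq.nlargest`, with made-up scores standing in for the output of `ranker.predict`; for 100 candidates the difference from `sorted(...)[:N]` is negligible, so this is mainly a readability choice. The helper name `top_n` is hypothetical.

```python
import heapq


def top_n(pairs, scores, n):
    """Return the n (pair, score) items with the highest scores, best first."""
    return heapq.nlargest(n, zip(pairs, scores), key=lambda item: item[1])


query = "Russian scientists spot alien spacecraft"
pairs = [[query, "doc a"], [query, "doc b"], [query, "doc c"]]
scores = [0.2, 0.9, 0.5]  # made-up scores standing in for ranker.predict(pairs)

best = top_n(pairs, scores, 2)
print(best)
```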


In [None]:
total_duration = retrieve_time + reranking_time
print('Total Duration: {}'.format(total_duration))

Total Duration: 0:00:00.436758
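The durations above are measured with `datetime.now()`, which reads the system wall clock and can jump if the clock is adjusted. For short intervals, `time.perf_counter()` is a monotonic, high-resolution alternative. A minimal sketch, where the `timer` context manager and the placeholder workloads are hypothetical and stand in for the retrieval and reranking calls:

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str, timings: dict):
    """Time the enclosed block with a monotonic clock and store the seconds in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with timer("retrieve", timings):
    sum(range(100_000))  # placeholder for client.query_points(...)
with timer("rerank", timings):
    sorted(range(100_000), reverse=True)  # placeholder for ranker.predict(...)

print(f"total: {sum(timings.values()):.4f} s", timings)
```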


In [None]:
results

## Qdrant prefetch approach

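The idea is to refine the retrieval in stages: each stage uses a larger embedding size but searches over fewer documents, so the more detailed (and more expensive) vectors are compared only against the candidates that survived the previous stage. Here the 64-dimension vectors retrieve 100 candidates, the 128-dimension vectors keep the best 50 of them, and the 384-dimension vectors return the final 10.
The first stage uses the same 64-dimension embedder and the same number of candidates (100) as the classic approach, and the final stage returns the same number of results (N = 10).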
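What the nested `Prefetch` computes can be approximated in a few lines of NumPy: each stage scores only the survivors of the previous stage, using more dimensions on fewer documents. This is a brute-force sketch under stated assumptions (random vectors instead of real embeddings, exact cosine search instead of Qdrant's HNSW index, and query vectors obtained by truncating one full embedding); `matryoshka_funnel` and `cosine_scores` are hypothetical helpers, not Qdrant APIs.

```python
import numpy as np


def cosine_scores(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of `docs`."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return d @ q


def matryoshka_funnel(query_full, docs_full, dims=(64, 128, 384), limits=(100, 50, 10)):
    """Multi-stage search: each stage uses more dimensions on fewer candidates."""
    candidates = np.arange(docs_full.shape[0])  # stage 1 starts from every document
    for dim, limit in zip(dims, limits):
        # Score the current candidates using only the first `dim` dimensions
        scores = cosine_scores(query_full[:dim], docs_full[candidates, :dim])
        top = np.argsort(-scores)[:limit]  # best `limit` candidates, highest score first
        candidates = candidates[top]
    return candidates


rng = np.random.default_rng(42)
docs = rng.normal(size=(1000, 384))              # stand-in for 1000 full-size embeddings
query = docs[7] + 0.01 * rng.normal(size=384)    # a query very close to document 7

result = matryoshka_funnel(query, docs)
print(result)
```

Only the first stage touches the whole collection, and only with 64-dimension vectors; the 384-dimension comparisons run on just 50 documents.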

In [None]:
from qdrant_client.models import Prefetch

In [None]:
start_time = datetime.now()

search_result_MV = client.query_points(
    collection_name="multiple_vectors",
    prefetch=Prefetch(
        prefetch=Prefetch(
            query=model_64.encode(query),  # stage 1: 64-dim vector, retrieves 100 candidates
            using="vec_64",
            limit=100,
        ),
        query=model_256.encode(query),  # stage 2: 128-dim vector, keeps the best 50
        using="vec_128",
        limit=50,
    ),
    query=model_full.encode(query),  # stage 3: full 384-dim vector, returns the final 10
    with_payload=True,
    using="vec_384",
    limit=10,
).points

end_time = datetime.now()

print('Duration: {}'.format(end_time - start_time))
print("\n")
print(search_result_MV)

Duration: 0:00:01.743438


[ScoredPoint(id=99, version=0, score=0.6027291863477137, payload={'text': 'Russian Alien Spaceship Claims Raise Eyebrows, Skepticism (SPACE.com) SPACE.com - An expedition of Russian researchers claims to have found evidence that an \\  alien spaceship had something to do with a huge explosion over Siberia in 1908. \\  Experts in asteroids and comets have long said the massive blast was caused \\  by a space rock.', 'label': 3}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=70139, version=0, score=0.5248586328396725, payload={'text': 'Russian - US Space Crew Lands in Kazakhstan A Russian Soyuz craft has landed in Kazakhstan bringing an American astronaut and two Russian cosmonauts back from the International Space Station or ISS.', 'label': 3}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=100415, version=0, score=0.5145111616990831, payload={'text': 'Russia sends scientist to jail for spying A Russian court has sentenced a ph

In [None]:
for p in search_result_MV:
    print("'", p.payload['text'], "', 'score: '", p.score)
    print("\n")

' Russian Alien Spaceship Claims Raise Eyebrows, Skepticism (SPACE.com) SPACE.com - An expedition of Russian researchers claims to have found evidence that an \  alien spaceship had something to do with a huge explosion over Siberia in 1908. \  Experts in asteroids and comets have long said the massive blast was caused \  by a space rock. ', 'score: ' 0.6027291863477137


' Russian - US Space Crew Lands in Kazakhstan A Russian Soyuz craft has landed in Kazakhstan bringing an American astronaut and two Russian cosmonauts back from the International Space Station or ISS. ', 'score: ' 0.5248586328396725


' Russia sends scientist to jail for spying A Russian court has sentenced a physicist Valentin Danilov to 14 years in a Siberian prison for passing space secrets to China. Danilov, 53, a professor at Krasnoyarsk  ', 'score: ' 0.5145111616990831


' Science ; Russian-US crew goes to space, AIDS experiments planned Science News, Kazakhstan - A Russian-US crew blasted off bound for the Inte

> Check the [Hybrid Queries](https://qdrant.tech/documentation/concepts/hybrid-queries/#hybrid-and-multi-stage-queries) page in Qdrant's documentation.