# Testing different representation methods

Dense retrieval is easy to start with, but it does not give the most accurate answers in every case. Sometimes we need exact keyword matching, and in those cases sparse vectors, such as BM25, might be more appropriate. They excel at matching proper names, which matters when we search our datasets with specific companies in mind. Let's add another representation method and build hybrid search that uses both of them.
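
To make the keyword-matching behaviour concrete, here is a minimal sketch of textbook Okapi BM25 scoring in plain Python. It is only an illustration: the whitespace tokenizer, the toy documents, and the `k1`/`b` defaults are assumptions, and it is not a description of how the `Qdrant/bm25` model computes its sparse vectors.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # only exact token matches contribute to the score
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy documents, made up for this illustration
docs = [
    "qdrant is a vector search engine",
    "postgres is a relational database",
    "a search engine for similar vectors",
]
scores = bm25_scores("qdrant", docs)
```

Only the first document gets a non-zero score, because it is the only one containing the exact token `qdrant`. The third document is about the same topic, but a purely lexical method has no way to know that.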

In [None]:
from dotenv import load_dotenv

load_dotenv()

## Setting up another Qdrant collection for multiple vectors per point

If we want to use different search methods, we need to store multiple vectors per point in one collection. It's also easier that way, as a multi-stage retrieval pipeline can then be launched in a single API call.

In [None]:
# See: https://qdrant.github.io/fastembed/examples/Supported_Models/#supported-text-embedding-models
COLLECTION_NAME = "hackernews-hybrid-rag"

# Dense retrieval
MODEL_NAME = "BAAI/bge-small-en-v1.5"
VECTOR_SIZE = 384
VECTOR_NAME = "bge-small-en-v1.5"

# Sparse model
BM25_MODEL_NAME = "Qdrant/bm25"
BM25_VECTOR_NAME = "bm25"

# Token-level representations
MUTLIVECTOR_MODEL_NAME = "colbert-ir/colbertv2.0"
MULTIVECTOR_SIZE = 128
MULTIVECTOR_NAME = "colbertv2.0"

In [None]:
from qdrant_client import QdrantClient, models

import os

client = QdrantClient(
    os.environ.get("QDRANT_URL"), 
    api_key=os.environ.get("QDRANT_API_KEY"),
)

In [None]:
client.create_collection(
    collection_name=COLLECTION_NAME,
    vectors_config={
        VECTOR_NAME: models.VectorParams(
            size=VECTOR_SIZE,
            distance=models.Distance.COSINE,
        ),
        MULTIVECTOR_NAME: models.VectorParams(
            size=MULTIVECTOR_SIZE,
            distance=models.Distance.DOT,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
            # Disable HNSW indexing, as these vectors are only used for reranking
            hnsw_config=models.HnswConfigDiff(m=0),
        ),
    },
    sparse_vectors_config={
        BM25_VECTOR_NAME: models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)

## Migrating to multiple vectors

There is no need to recompute the dense embeddings, as we can migrate them from the previous collection. We still have to create the sparse and multi-vector representations, though. Also, since we agreed that we need more context, there is no point in storing submissions that lack a more detailed text description, so let's filter them out.
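
The filter used in the migration loop excludes three cases: a missing `text` field, a `text` field set to null, and a `text` field holding an empty string. As a rough mental model, the same rule can be written as a plain Python predicate. The `has_text` helper and the sample payloads are hypothetical and only mirror the intent of the filter, not Qdrant's exact matching semantics.

```python
def has_text(payload: dict) -> bool:
    """Return True when the payload carries a non-empty `text` value."""
    if "text" not in payload:      # the field is missing
        return False
    if payload["text"] is None:    # the field is set to null
        return False
    if payload["text"] == "":      # the field is an empty string
        return False
    return True


# Hypothetical payloads, made up for this illustration
payloads = [
    {"title": "Show HN: a project", "text": "Some longer description"},
    {"title": "Link-only submission"},
    {"title": "Null text", "text": None},
    {"title": "Empty text", "text": ""},
]
kept = [payload["title"] for payload in payloads if has_text(payload)]
```

Only the first payload survives, which is the behaviour we want from the `must_not` conditions.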

In [None]:
OLD_COLLECTION_NAME = "hackernews-rag"

In [None]:
last_offset = None
while True:
    # Get a batch of records
    records, last_offset = client.scroll(
        collection_name=OLD_COLLECTION_NAME, 
        scroll_filter=models.Filter(
            must_not=[
                # Lack of field
                models.IsEmptyCondition(
                    is_empty=models.PayloadField(key="text"),
                ),
                # Field set to null value
                models.IsNullCondition(
                    is_null=models.PayloadField(key="text"),
                ),
                # Field set to an empty string
                models.FieldCondition(
                    key="text",
                    match=models.MatchValue(value=""),
                ),
            ],
        ),
        offset=last_offset,
        with_payload=True,
        with_vectors=True,
        limit=10,
    )

    # Migrate them to a new collection
    client.upsert(
        collection_name=COLLECTION_NAME,
        points=[
            models.PointStruct(
                id=record.id,
                vector={
                    # Copy the dense embedding directly
                    VECTOR_NAME: record.vector[VECTOR_NAME],
                    # Calculate BM25 embedding
                    BM25_VECTOR_NAME: models.Document(
                        text=f"{record.payload['title']} {record.payload['text']}",
                        model=BM25_MODEL_NAME,
                    ),
                    # Calculate ColBERT embeddings as well
                    MULTIVECTOR_NAME: models.Document(
                        text=f"{record.payload['title']} {record.payload['text']}",
                        model=MUTLIVECTOR_MODEL_NAME,
                    ),
                },
                payload=record.payload,
            )
            for record in records
        ]
    )

    # Stop once the last batch has been processed
    if last_offset is None:
        break

In [None]:
client.recover_snapshot(
    collection_name=COLLECTION_NAME,
    # Please do not modify the URL below
    location="https://storage.googleapis.com/tutorials-snapshots-bucket/workshop-improving-r-in-rag/hackernews-hybrid-rag.snapshot",
    wait=False, # Loading a snapshot may take some time, so let's avoid a timeout
)

## Experimenting with Hybrid Search

Our previous attempts to use dense retrieval to find some Qdrant-specific data weren't successful. Let's try to build a better retriever that uses keyword-based retrieval followed by dense reranking, so that it hopefully captures more nuances.

In [None]:
def retrieve_dual(q: str, n_docs: int) -> list[str]:
    """
    Retrieve documents based on the provided query
    with BM25 retrieval and dense reranking.
    """
    result = client.query_points(
        collection_name=COLLECTION_NAME,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=q,
                    model=BM25_MODEL_NAME,
                ),
                using=BM25_VECTOR_NAME,
                # Prefetch ten times more!
                limit=(n_docs * 10)
            ),
        ],
        query=models.Document(
            text=q,
            model=MODEL_NAME,
        ),
        using=VECTOR_NAME,
        limit=n_docs,
    )
    docs = [
        f"{point.payload['title']} {point.payload['text']}"
        for point in result.points
    ]
    return docs

In [None]:
retrieve_dual("What does Qdrant do?", n_docs=10)

In [None]:
from any_llm import acompletion
from typing import Callable

RetrieverFunc = Callable[[str, int], list[str]]

LLM_NAME = "claude-sonnet-4-20250514"

async def rag(q: str, retrieve_func: RetrieverFunc, *, n_docs: int = 10) -> str:
    """
    Run single-turn RAG on a given input query.
    Return just the model response.
    """
    docs = retrieve_func(q, n_docs)
    # Join the documents up front: a backslash inside an f-string
    # expression is a syntax error before Python 3.12
    context = "\n".join(docs)
    messages = [
        {
            "role": "user",
            "content": (
                "Please provide a response to my question based only " +
                "on the provided context and only it. If it doesn't " +
                "contain any helpful information, please let me know " +
                "and admit you cannot produce a relevant answer.\n" +
                f"<context>{context}</context>\n" +
                f"<question>{q}</question>"
            )
        }
    ]
    response = await acompletion(
        provider=os.environ.get("LLM_PROVIDER"),
        model=LLM_NAME,
        messages=messages,
    )
    return response.choices[0].message.content

In [None]:
response = await rag(
    "What does Qdrant do?", 
    retrieve_func=retrieve_dual
)
print(response)

In [None]:
retrieve_dual("How do I perform a KNN search on a large scale?", n_docs=10)

### More sophisticated reranking

Sparse retrieval with dense reranking might be a useful strategy, but it cannot support every possible search query. If sparse vectors fail to capture a particular semantic match, dense reranking won't even see it, so it will never get retrieved. That's why it's pretty common to use both methods for prefetching and something else for reranking, so we can have the best of both worlds.
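
A toy illustration of that limitation, in plain Python. The document ids, the candidate lists, and the scores are all made up, and the `rerank` helper is a hypothetical stand-in for the reranking stage rather than anything Qdrant exposes.

```python
def rerank(candidates: list[str], scores: dict[str, float], limit: int) -> list[str]:
    """Reorder the prefetched candidates by score; it cannot add new ones."""
    ordered = sorted(candidates, key=lambda doc_id: scores[doc_id], reverse=True)
    return ordered[:limit]


# Hypothetical document ids and scores, made up for this illustration
sparse_candidates = ["doc-a", "doc-b"]   # documents found by keyword matching
dense_candidates = ["doc-c", "doc-a"]    # documents found by semantic similarity
dense_scores = {"doc-a": 0.71, "doc-b": 0.40, "doc-c": 0.93}

# Sparse prefetch only: doc-c is the best dense match, but it is never seen
sparse_only = rerank(sparse_candidates, dense_scores, limit=2)

# Both prefetches: the candidate pool is the union of the two lists
pool = list(dict.fromkeys(sparse_candidates + dense_candidates))
both = rerank(pool, dense_scores, limit=2)
```

With the sparse prefetch alone, `doc-c` never reaches the reranker, no matter how high its dense score is. Once both prefetches feed the pool, it ends up at the top.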

In the simplest case, we can run both prefetches and combine the results with fusion based on the ranks as returned by the individual methods.
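
For intuition, rank-based fusion can be sketched in a few lines of plain Python. This is a minimal version of Reciprocal Rank Fusion under stated assumptions: ranks start at 1, `k = 60` is a commonly used constant, and the document ids are made up. The exact constant and tie-breaking that Qdrant applies server-side are not something this sketch claims to reproduce.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings returned by the two prefetches
bm25_ranking = ["doc-a", "doc-b", "doc-c"]
dense_ranking = ["doc-c", "doc-a", "doc-d"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Documents that appear in both lists, such as `doc-a` and `doc-c`, collect two contributions and move ahead of those found by a single method. Only the ranks matter, so the incomparable raw scores of BM25 and dense search never have to be normalized.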

In [None]:
def retrieve_fusion(q: str, n_docs: int) -> list[str]:
    """
    Retrieve documents based on the provided query
    with BM25 and dense retrieval + fusion to merge them.
    """
    result = client.query_points(
        collection_name=COLLECTION_NAME,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=q,
                    model=BM25_MODEL_NAME,
                ),
                using=BM25_VECTOR_NAME,
                limit=n_docs,
            ),
            models.Prefetch(
                query=models.Document(
                    text=q,
                    model=MODEL_NAME,
                ),
                using=VECTOR_NAME,
                limit=n_docs,
            ),
        ],
        # Reciprocal Rank Fusion works on the rankings
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=n_docs,
    )
    docs = [
        f"{point.payload['title']} {point.payload['text']}"
        for point in result.points
    ]
    return docs

In [None]:
retrieve_fusion("How do I perform a KNN search on a large scale?", n_docs=10)

In [None]:
response = await rag(
    "How do I perform a KNN search on a large scale?", 
    retrieve_func=retrieve_fusion
)
print(response)

More complex problems may require better rerankers that capture the nuances of the data. That's why we also created ColBERT embeddings, and it's finally time to test them.
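
The collection was configured with the `MAX_SIM` comparator for the ColBERT vectors. A minimal sketch of that late-interaction score, in plain Python: for every query token vector we take the best dot product over all document token vectors, then sum those maxima. The tiny 2-dimensional vectors are made up for readability, whereas the real model produces 128-dimensional ones.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vectors: list[list[float]], doc_vectors: list[list[float]]) -> float:
    """Sum, over query tokens, of the best match among document tokens."""
    return sum(
        max(dot(query_vec, doc_vec) for doc_vec in doc_vectors)
        for query_vec in query_vectors
    )


# Hypothetical token-level embeddings: one vector per token
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.9, 0.1], [0.8, 0.2]]

score_a = max_sim(query, doc_a)
score_b = max_sim(query, doc_b)
```

Here `doc_a` scores higher, because it has a good match for both query tokens, while `doc_b` only covers the first one. Comparing every query token with every document token is costly, which is why these vectors are used for reranking a small candidate set rather than for the initial retrieval.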

In [None]:
def retrieve_colbert_reranking(q: str, n_docs: int) -> list[str]:
    """
    Retrieve documents based on the provided query
    with BM25 and dense retrieval + ColBERT reranking of the candidates.
    """
    result = client.query_points(
        collection_name=COLLECTION_NAME,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=q,
                    model=BM25_MODEL_NAME,
                ),
                using=BM25_VECTOR_NAME,
                limit=n_docs,
            ),
            models.Prefetch(
                query=models.Document(
                    text=q,
                    model=MODEL_NAME,
                ),
                using=VECTOR_NAME,
                limit=n_docs,
            ),
        ],
        # Reranking with ColBERT embeddings
        query=models.Document(
            text=q,
            model=MUTLIVECTOR_MODEL_NAME,
        ),
        using=MULTIVECTOR_NAME,
        limit=n_docs,
    )
    docs = [
        f"{point.payload['title']} {point.payload['text']}"
        for point in result.points
    ]
    return docs

In [None]:
response = await rag(
    "How do I perform a KNN search on a large scale?", 
    retrieve_func=retrieve_colbert_reranking
)
print(response)

In [None]:
retrieve_colbert_reranking("How do I perform a KNN search on a large scale?", n_docs=10)