# Qdrant Essentials: Day 3 - Building Hybrid Search in Qdrant

Let's see how hybrid search can be implemented with Qdrant's Universal Query API by combining sparse (BM25) and dense retrieval.

## Step 1: Install the dependencies

In [None]:
!pip install -q "qdrant-client[fastembed]"

## Step 2: Connect to Qdrant

Let's connect to a running [Qdrant Cloud](https://cloud.qdrant.io/) cluster and create a collection containing both sparse and dense named vectors.

In [None]:
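from qdrant_client import QdrantClient
from google.colab import userdata  # Colab-only: reads secrets stored with the notebook

# Replace the URL with your own cluster endpoint; the API key is read
# from a Colab secret named "api-key"
client = QdrantClient(
    url="https://your-cluster-url.cloud.qdrant.io:6333",
    api_key=userdata.get("api-key"),
)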

In [None]:
from qdrant_client import models

# Define the collection name
collection_name = "hybrid_search_demo"

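# Create a collection with two named vectors per point:
# a dense vector called "dense" and a sparse (BM25) vector called "sparse"
client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(
            distance=models.Distance.COSINE,
            size=384,  # output dimension of all-MiniLM-L6-v2
        ),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(
            # BM25 needs IDF, which Qdrant computes from collection statistics
            modifier=models.Modifier.IDF
        )
    },
)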

True
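The `modifier=models.Modifier.IDF` setting asks Qdrant to compute the inverse document frequency (IDF) part of BM25 on the server, from statistics over the whole collection, so that rare terms weigh more than common ones. The pure-Python sketch below illustrates that idea on four of the documents used later in this notebook. It is a simplification under stated assumptions: sparse vectors are plain `{term: count}` dictionaries, IDF follows the standard BM25 formula, and the stemming, stop-word removal and term-frequency saturation that the real `Qdrant/bm25` model applies are skipped. The helpers `tokenize`, `idf` and `bm25_like_score` are hypothetical names, not part of `qdrant-client`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep word characters only (no stemming, no stop words)
    return re.findall(r"\w+", text.lower())


def idf(term: str, docs: list[Counter]) -> float:
    # Standard BM25 IDF: ln((N - n + 0.5) / (n + 0.5) + 1)
    n_docs = len(docs)
    n_with_term = sum(1 for d in docs if term in d)
    return math.log((n_docs - n_with_term + 0.5) / (n_with_term + 0.5) + 1)


def bm25_like_score(query: str, doc: Counter, docs: list[Counter]) -> float:
    # Dot product between the query terms and the document's sparse vector,
    # with every matching term weighted by its collection-wide IDF
    return sum(idf(t, docs) * doc[t] for t in set(tokenize(query)) if t in doc)


corpus = [
    "Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation.",
    "Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging.",
    "Brie cheese features a soft, creamy interior surrounded by an edible white rind.",
    "This French cheese has a flowing, buttery center encased in a bloomy white crust.",
]
# Hypothetical sparse vectors: {term: count} per document
sparse_docs = [Counter(tokenize(text)) for text in corpus]

query = "nutty aged cheese"
scores = [bm25_like_score(query, d, sparse_docs) for d in sparse_docs]
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.3f}  {corpus[i]}")
```

In this toy version "nutty" and "aged" each occur in a single document and outweigh "cheese", which occurs in three. The real model also stems words, which is what lets "aging" match "aged"; this sketch does not.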

## Step 3: Upload the data into the collection

Now that the collection can store two named vectors per point, one dense and one sparse, we can fill it with data.

In [None]:
documents = [
    "Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation.",
    "Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging.",
    "Brie cheese features a soft, creamy interior surrounded by an edible white rind.",
    "This French cheese has a flowing, buttery center encased in a bloomy white crust.",
    "Fresh mozzarella pairs beautifully with ripe tomatoes and basil leaves.",
    "Classic Margherita pizza topped with tomato sauce, mozzarella, and fresh basil.",
    "Parmesan requires at least 12 months of cave aging to develop its signature sharp taste.",
    "Parmigiano-Reggiano's distinctive piquant flavor comes from extended maturation in controlled environments.",
    "Grilled cheese sandwiches are the ultimate American comfort food for cold winter days.",
    "Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich.",
]

In [None]:
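import uuid

# Each point stores both vectors. With FastEmbed installed, passing
# models.Document makes the client compute the embeddings locally.
client.upsert(
    collection_name=collection_name,
    points=[
        models.PointStruct(
            id=uuid.uuid4().hex,  # random UUID as the point ID
            vector={
                "dense": models.Document(
                    text=doc,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                "sparse": models.Document(
                    text=doc,
                    model="Qdrant/bm25",
                ),
            },
            payload={"text": doc},  # keep the raw text for display
        )
        for doc in documents
    ],
)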

UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

## Step 4: Comparing the outputs of sparse and dense search

The two models may return quite different results for the same query. Let's check whether that is the case.

In [None]:
def dense_search(query: str) -> list[models.ScoredPoint]:
    """Return the top 3 points by cosine similarity of MiniLM embeddings."""
    response = client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=query,
            model="sentence-transformers/all-MiniLM-L6-v2",
        ),
        using="dense",
        limit=3,
    )
    return response.points

In [None]:
def sparse_search(query: str) -> list[models.ScoredPoint]:
    """Return the top 3 points by BM25 relevance."""
    response = client.query_points(
        collection_name=collection_name,
        query=models.Document(
            text=query,
            model="Qdrant/bm25",
        ),
        using="sparse",
        limit=3,
    )
    return response.points

Let's run both methods on a few sample queries to see whether their outputs really differ.

In [None]:
queries = [
    "nutty aged cheese",
    "soft French cheese",
    "pizza ingredients",
    "a good lunch",
]

In [None]:
for query in queries:
    print("Query:", query)

    dense_results = dense_search(query)
    print("Dense Results:")
    for result in dense_results:
        print("\t-", result.payload["text"], result.score)

    sparse_results = sparse_search(query)
    print("Sparse Results:")
    for result in sparse_results:
        print("\t-", result.payload["text"], result.score)
    print()

Query: nutty aged cheese
Dense Results:
	- Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging. 0.5829767
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 0.47647107
	- This French cheese has a flowing, buttery center encased in a bloomy white crust. 0.45055333
Sparse Results:
	- Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation. 5.1563325
	- Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging. 3.0210652
	- Parmesan requires at least 12 months of cave aging to develop its signature sharp taste. 1.8819332

Query: soft French cheese
Dense Results:
	- This French cheese has a flowing, buttery center encased in a bloomy white crust. 0.6242111
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 0.60305476
	- Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich. 0.468
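To put a number on how much the two methods disagree, one option is the Jaccard similarity of their result sets (shared results divided by all distinct results). The helper `overlap_report` below is hypothetical, and the labels are hand-abbreviated from the "nutty aged cheese" output above. In the notebook itself you would pass point IDs instead, for example `[p.id for p in dense_search(query)]`.

```python
def overlap_report(dense: list[str], sparse: list[str]) -> tuple[set[str], float]:
    """Return the shared results and the Jaccard similarity of two result lists."""
    dense_set, sparse_set = set(dense), set(sparse)
    shared = dense_set & sparse_set
    union = dense_set | sparse_set
    # Two empty lists are treated as identical
    jaccard = len(shared) / len(union) if union else 1.0
    return shared, jaccard


# Top-3 results for "nutty aged cheese", abbreviated from the output above
dense_top3 = ["Mature Gouda", "Brie", "French bloomy rind"]
sparse_top3 = ["Aged Gouda", "Mature Gouda", "Parmesan"]

shared, jaccard = overlap_report(dense_top3, sparse_top3)
print(shared, round(jaccard, 2))
```

For this query the two top-3 lists share only the "Mature Gouda" document, which gives a Jaccard similarity of 0.2.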

## Step 5: Hybrid Search with Reciprocal Rank Fusion

The scores produced by the two methods are not comparable: BM25 scores have no fixed upper bound, while cosine similarity lies between -1 and 1. RRF sidesteps this by ignoring the raw scores and using only each point's rank in every result list. Let's implement such a hybrid search pipeline.
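The fusion step itself is simple enough to sketch in plain Python. The hypothetical `rrf` function below uses the common textbook form, score = Σ 1/(k + rank) with ranks starting at 1. The literature usually sets k = 60; here k = 1 is used because it reproduces the scores Qdrant prints for "nutty aged cheese" in the output below (0.8333, 0.5, 0.3333). That constant is inferred from the printed numbers, not taken from Qdrant's documentation, and the labels are abbreviations of the top-3 lists from Step 4.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: sum 1 / (k + rank) over every list a point appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; the raw retrieval scores are never used
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Top-3 rankings for "nutty aged cheese" from Step 4 (abbreviated labels)
dense = ["Mature Gouda", "Brie", "French bloomy rind"]
sparse = ["Aged Gouda", "Mature Gouda", "Parmesan"]

fused = rrf([sparse, dense], k=1)
for doc_id, score in fused[:3]:
    print(f"{score:.4f}  {doc_id}")
```

"Mature Gouda" wins because it appears in both lists, even though it is first in only one of them.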

In [None]:
def rrf_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        # Each prefetch retrieves its own candidate list; only these
        # candidates (up to 3 per method) take part in the fusion
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="Qdrant/bm25",
                ),
                using="sparse",
                limit=3,
            ),
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                using="dense",
                limit=3,
            ),
        ],
        # Merge the candidate lists by rank with Reciprocal Rank Fusion
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=3,
    )
    return response.points

In [None]:
for query in queries:
    print("Query:", query)

    rrf_results = rrf_search(query)
    print("RRF Results:")
    for result in rrf_results:
        print("\t-", result.payload["text"], result.score)
    print()

Query: nutty aged cheese
RRF Results:
	- Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging. 0.8333334
	- Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation. 0.5
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 0.33333334

Query: soft French cheese
RRF Results:
	- This French cheese has a flowing, buttery center encased in a bloomy white crust. 1.0
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 0.6666667
	- Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich. 0.5

Query: pizza ingredients
RRF Results:
	- Classic Margherita pizza topped with tomato sauce, mozzarella, and fresh basil. 1.0
	- Fresh mozzarella pairs beautifully with ripe tomatoes and basil leaves. 0.33333334
	- Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich. 0.25

Query: a good lunch
RRF

## Step 6: Distribution-Based Score Fusion

RRF is not the only supported fusion method. DBSF (Distribution-Based Score Fusion) is another option: it normalizes the scores of the points returned by each prefetch query, using the mean ± three standard deviations as the limits, and then sums the normalized scores of the same point across queries. Choosing a different algorithm can change the final ranking, so let's see what the results look like.
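The DBSF step can be sketched in plain Python as well. This is a hypothetical re-implementation for illustration, not Qdrant's code: each result list is rescaled using the mean ± 3 standard deviations as its limits, and the rescaled scores are summed per point. Using the *sample* standard deviation is an assumption, chosen because with the dense and sparse scores printed for "nutty aged cheese" in Step 4 it reproduces the DBSF scores in the output below (1.1558, 0.6808, 0.4362) up to float precision. Any handling of outliers or edge cases in Qdrant itself is not covered.

```python
import statistics
from collections import defaultdict


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one result list using mean ± 3 standard deviations as the limits."""
    # Assumes at least two distinct scores, otherwise the range would be zero
    values = list(scores.values())
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    low, high = mean - 3 * std, mean + 3 * std
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}


def dbsf(results: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Sum the normalized scores of each point across all result lists."""
    fused: dict[str, float] = defaultdict(float)
    for scores in results:
        for doc, s in normalize(scores).items():
            fused[doc] += s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Scores printed in Step 4 for the query "nutty aged cheese" (abbreviated labels)
dense = {"Mature Gouda": 0.5829767, "Brie": 0.47647107, "French bloomy rind": 0.45055333}
sparse = {"Aged Gouda": 5.1563325, "Mature Gouda": 3.0210652, "Parmesan": 1.8819332}

fused = dbsf([sparse, dense])
for doc, score in fused[:3]:
    print(f"{score:.4f}  {doc}")
```

Unlike RRF, DBSF keeps information about how far apart the raw scores are, so a point that is far ahead in one list can gain a larger margin.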

In [None]:
def dbsf_search(query: str) -> list[models.ScoredPoint]:
    response = client.query_points(
        collection_name=collection_name,
        prefetch=[
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="Qdrant/bm25",
                ),
                using="sparse",
                limit=3,
            ),
            models.Prefetch(
                query=models.Document(
                    text=query,
                    model="sentence-transformers/all-MiniLM-L6-v2",
                ),
                using="dense",
                limit=3,
            )
        ],
        query=models.FusionQuery(fusion=models.Fusion.DBSF),  # the only change from rrf_search
        limit=3,
    )
    return response.points

In [None]:
for query in queries:
    print("Query:", query)

    dbsf_results = dbsf_search(query)
    print("DBSF Results:")
    for result in dbsf_results:
        print("\t-", result.payload["text"], result.score)
    print()

Query: nutty aged cheese
DBSF Results:
	- Mature Gouda cheese becomes grainy and develops a rich, buttery taste with extended aging. 1.1558483
	- Aged Gouda develops a crystalline texture and nutty flavor profile after 18 months of maturation. 0.6808001
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 0.43620512

Query: soft French cheese
DBSF Results:
	- This French cheese has a flowing, buttery center encased in a bloomy white crust. 1.2130749
	- Brie cheese features a soft, creamy interior surrounded by an edible white rind. 1.1703095
	- Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich. 0.6166155

Query: pizza ingredients
DBSF Results:
	- Classic Margherita pizza topped with tomato sauce, mozzarella, and fresh basil. 1.1904721
	- Fresh mozzarella pairs beautifully with ripe tomatoes and basil leaves. 0.4285979
	- Croque Monsieur combines ham and Gruyère in France's answer to the toasted cheese sandwich. 0.3