## Qdrant Essentials: Day 3 - Using Sparse Vectors for Keyword-Based Text Retrieval in Qdrant

To interact with Qdrant, we'll install the Qdrant Python client and Qdrant's lightweight embedding library called [FastEmbed](https://github.com/qdrant/fastembed).


## Step 1: Install the Qdrant Client & FastEmbed

In [None]:
!pip install -q "qdrant-client[fastembed]>=1.14.2"
!pip3 install -U -q fastembed

## Step 2: Import Required Libraries

In [None]:
from qdrant_client import QdrantClient, models

## Step 3: Connect to Qdrant Cloud

In [None]:
from google.colab import userdata

client = QdrantClient(
    url="your_url",
    api_key=userdata.get('api-key')
)
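Outside Google Colab, `userdata` is not available. Below is a minimal sketch of one alternative, assuming the cluster URL and API key are stored in environment variables. The variable names `QDRANT_URL` and `QDRANT_API_KEY` and the helper `load_qdrant_settings` are hypothetical and not part of the Qdrant client.

```python
import os


def load_qdrant_settings(env=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    url = env.get("QDRANT_URL")
    if not url:
        raise RuntimeError("QDRANT_URL is not set")
    # The API key may be absent, e.g. for a local unsecured instance.
    return {"url": url, "api_key": env.get("QDRANT_API_KEY")}


# Usage (assumes qdrant-client is installed and the variables are set):
# client = QdrantClient(**load_qdrant_settings())
```

Keeping the lookup in a small function makes it easy to swap the secret source without touching the rest of the notebook.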

## Lexical Retrieval with BM25 in Qdrant

The BM25 formula can be represented as follows:

$$
\text{BM25}(Q, D) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \mathrm{function}\left(\mathrm{TF}(q_i, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right)
$$

Qdrant provides tooling to compute IDF on the server side. To enable this, we need to activate the [IDF modifier](https://qdrant.tech/documentation/concepts/indexing/#idf-modifier) when configuring sparse vectors in a collection.

> Once enabled, **IDF is maintained at the collection level**.

When using any retrieval formula that includes IDF, such as BM25, we no longer need to include the IDF component in the sparse document representations. This leaves us with the following `values` of the documents' words:

$$
\text{BM25}(d_i) = \mathrm{function}\left(\mathrm{TF}(d_i, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right)
$$

The IDF component will be applied by Qdrant automatically when computing similarity scores.
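For reference, the generic $\mathrm{function}$ above is usually written in the textbook Okapi BM25 form shown below. This is a sketch under the assumption that the standard formulation applies; the exact details of a given implementation may differ. Here $N$ is the number of documents in the collection and $n(q_i)$ is the number of documents containing $q_i$ (both symbols are introduced only for this illustration).

$$
\mathrm{function}\left(\mathrm{TF}(d_i, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right) = \frac{\mathrm{TF}(d_i, D) \cdot (k_1 + 1)}{\mathrm{TF}(d_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\mathrm{avg}_{\text{corpus}}|D|}\right)}
$$

$$
\text{IDF}(q_i) = \ln\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
$$

The first expression depends only on a single document and the BM25 parameters, which is why it can be precomputed and stored in the sparse vector. The second depends on the whole collection, which is why it is left to the server.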

## Step 4: Create a Collection for BM25-based Retrieval

In [None]:
client.create_collection(
    collection_name="bm25_vectors_collection",
    sparse_vectors_config={
        "bm25_sparse_vector": models.SparseVectorParams(
            modifier=models.Modifier.IDF #Inverse Document Frequency
        ),
    },
)

## Step 5: Create BM25-based Sparse Vectors with FastEmbed & Insert Them into the Collection

Qdrant's FastEmbed library provides a way to [generate BM25 formula-based sparse representations](https://github.com/qdrant/fastembed/blob/main/fastembed/sparse/bm25.py) tailored to Qdrant.

The integration between Qdrant and FastEmbed allows you to simply pass your texts and BM25 formula parameters when indexing documents to Qdrant. The conversion to sparse vectors happens under the hood.

> **<font color='red'>Update:</font>** Since Qdrant's release [1.15.2](https://github.com/qdrant/qdrant/pull/6891), the conversion to BM25 sparse vectors happens directly in Qdrant, for all supported Qdrant clients.  
> Interface-wise, it looks the same as the local inference with FastEmbed, as shown in this notebook. 

> **Note:** Don’t forget to enable the `IDF` modifier when using BM25-based sparse representations generated by FastEmbed, as they intentionally exclude this component.

### BM25 in FastEmbed: Implementation Details

**Corpus Average Length**

Qdrant and FastEmbed do not compute $\mathrm{avg}_{\text{corpus}}|D|$ (the average document length in the corpus). You must **estimate and provide this value** as a BM25 parameter.

**Default BM25 Parameters in FastEmbed**

- `k = 1.2` (the $k_1$ parameter in the formulas above)
- `b = 0.75`

**Text Processing Pipeline**

FastEmbed uses the [Snowball stemmer](https://www.geeksforgeeks.org/snowball-stemmer-nlp/) to reduce words to their root or base form, and applies language-specific stop word lists (e.g., *and*, *or* in English) to reduce vocabulary size and improve retrieval quality.

Therefore, **FastEmbed’s BM25 works out of the box for the following languages**:  
*Arabic, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil,* and *Turkish*.


In [None]:
grocery_items_descriptions = [
    "Grated hard cheese",
    "White crusty bread roll",
    "Mac and cheese"
]

#Estimating the average length of documents in the corpus
avg_document_length = sum(len(description.split()) for description in grocery_items_descriptions) / len(grocery_items_descriptions)

print(f"Average document length: {avg_document_length}")

client.upsert(
    collection_name="bm25_vectors_collection",
    points=[
        models.PointStruct(
            id=i,
            payload={"text": description},  # metadata: the description text in human-readable form
            vector={
                "bm25_sparse_vector": models.Document(  # runs FastEmbed under the hood
                    text=description,
                    model="Qdrant/bm25",
                    options={"avg_len": avg_document_length},  # BM25 parameters; k and b keep their defaults
                )
            },
        ) for i, description in enumerate(grocery_items_descriptions)
    ],
)

Here, FastEmbed downloads the [Qdrant BM25 model from Hugging Face](https://huggingface.co/Qdrant/bm25) and converts each text to a Qdrant-compatible sparse representation (that is, arrays of `indices` and `values`) under the hood. These vectors are then upserted to Qdrant.

In this example, inference - computing sparse representations with FastEmbed - is performed locally, using Google Colab resources.

> Qdrant also offers **Cloud Inference** for both sparse and dense vectors.


## Step 6: Lexical Retrieval with BM25 & Qdrant

Now let's test our BM25-based lexical search in Qdrant.

Suppose we're searching for the word **"cheese"** — this is our query. Let's break down what happens with this query and the documents indexed to Qdrant in the previous step.

#### Step 1

For every keyword in the query that is not a stop word in the target language (English in our case, where **"cheese"** is not a stop word):
- FastEmbed extracts the **stem** (root/base form) of the word.  
  - `"cheese"` becomes `"chees"`
- The stem is then mapped to a corresponding **index** from the vocabulary.  
  - `"chees"` -> `1496964506`

#### Step 2

Qdrant looks up this keyword index (`1496964506`) in the **inverted index**, introduced in the previous video.

For every document (found via the inverted index) that contains the keyword `"cheese"`, we have the BM25-based score for `"cheese"` in that particular document, precomputed by FastEmbed and stored in **Step 5**:

$$
\mathrm{function}_{\text{FastEmbed}}\left(\mathrm{TF}(\text{"cheese"}, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right)
$$
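The lookup in Step 2 can be pictured with a toy inverted index. This is only an illustrative sketch in plain Python, not Qdrant's internal data structure: the stored values and all term indices except `1496964506` are made-up placeholders.

```python
from collections import defaultdict

CHEESE = 1496964506  # index of the stem "chees" quoted above

# Hypothetical stored sparse vectors: {point_id: {term_index: value}}
sparse_docs = {
    0: {101: 1.04, 102: 1.04, CHEESE: 1.04},      # "Grated hard cheese"
    1: {201: 0.9, 202: 0.9, 203: 0.9, 204: 0.9},  # "White crusty bread roll"
    2: {301: 1.2, CHEESE: 1.2},                   # "Mac and cheese"
}


def build_inverted_index(docs):
    """Map every term index to the list of (point_id, value) pairs containing it."""
    index = defaultdict(list)
    for point_id, vector in docs.items():
        for term_index, value in vector.items():
            index[term_index].append((point_id, value))
    return dict(index)


inverted_index = build_inverted_index(sparse_docs)

# Only documents in the posting list of the query term are scored at all.
postings = inverted_index.get(CHEESE, [])
print(postings)
```

The posting list already carries the precomputed per-document value, so no document text has to be re-read at query time.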

#### Step 3

Qdrant scales this document-specific score by the **IDF** of the keyword `"cheese"`, calculated across the entire corpus:

$$
\text{IDF}(\text{"cheese"}) \cdot \mathrm{function}_{\text{FastEmbed}}\left(\mathrm{TF}(\text{"cheese"}, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right)
$$

#### Step 4

The final similarity score between the query and a document is the **sum of the scores of all matching keywords**:

$$
\text{BM25}(\text{"cheese"}, D) = \sum_{i=1}^{1} \text{IDF}(\text{"cheese"}) \cdot \mathrm{function}\left(\mathrm{TF}(\text{"cheese"}, D),\, k_1,\, b,\, |D|,\, \mathrm{avg}_{\text{corpus}}|D|\right)
$$


In [None]:
client.query_points(
    collection_name="bm25_vectors_collection",
    using="bm25_sparse_vector",
    limit=3,
    query=models.Document(  #to run FastEmbed under the hood
        text="cheese",
        model="Qdrant/bm25"
    ),
    with_vectors=True,
)

> BM25 retrieved only the documents that contain the keyword "*cheese*", as BM25-based retrieval works strictly with **exact keyword matches**.


The description "*Mac and cheese*" was ranked higher because the BM25-estimated value of "*cheese*" is greater in this text than in "*Grated hard cheese*".

It's higher because "*and*" is a stop word and is excluded from the calculation.  
So in "*Mac and cheese*", "*cheese*" is one of two considered words, whereas in "*Grated hard cheese*", it's one of three - giving it lower relative importance.
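The arithmetic behind this ranking can be sketched in a few lines. The snippet assumes the textbook BM25 term-frequency form and the usual smoothed IDF, with `k = 1.2`, `b = 0.75`, and the average length computed earlier (10 / 3). The helper names are hypothetical, and scores returned by Qdrant may differ from these numbers.

```python
import math


def bm25_tf_component(tf, doc_len, avg_len, k=1.2, b=0.75):
    """Textbook BM25 term-frequency saturation with length normalization."""
    return tf * (k + 1) / (tf + k * (1 - b + b * doc_len / avg_len))


def idf(total_docs, docs_with_term):
    """Smoothed inverse document frequency (assumed formulation)."""
    return math.log((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)


avg_len = (3 + 4 + 3) / 3  # whitespace-based estimate used at indexing time

# Number of words left after removing the stop word "and".
doc_lengths = {"Grated hard cheese": 3, "Mac and cheese": 2}

idf_cheese = idf(total_docs=3, docs_with_term=2)
scores = {
    text: idf_cheese * bm25_tf_component(tf=1, doc_len=length, avg_len=avg_len)
    for text, length in doc_lengths.items()
}

for text, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{text}: {score:.4f}")
```

Both documents contain "*cheese*" once and share the same IDF, so the only difference comes from the length normalization term: the shorter document gets the larger value.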

---

🎉 Now you know how to use BM25 in Qdrant. This will come in handy when you want to **combine** the precision and explainability of **lexical search** with the flexibility and semantic understanding of **dense vectors** - in a **hybrid search** scenario.

But before we dive into hybrid search in Qdrant, let’s explore an approach that makes keyword-based retrieval semantically aware: **sparse neural retrieval**.




## Sparse Neural Retrieval with SPLADE++ in Qdrant

Sparse Lexical and Expansion Model (SPLADE) is a family of sparse neural retrievers built on top of [Bidirectional Encoder Representations from Transformers (BERT)](https://huggingface.co/docs/transformers/en/model_doc/bert).

> These models are intended for retrieval in **English**, unless fine-tuned or retrained for other languages.

In addition to assigning weights to terms in the input text, SPLADE also **expands inputs with contextually relevant terms**. This is done to **solve the vocabulary mismatch problem**, allowing the model to match queries and documents that use different but semantically close terms.

ℹ️ Learn more about SPLADE and its architecture in the ["Modern Sparse Neural Retrieval" article](https://qdrant.tech/articles/modern-sparse-neural-retrieval).



## Step 7: Create a Collection for Sparse Neural Retrieval with SPLADE++

> Note that we’re **not configuring the Inverse Document Frequency (IDF) modifier** here, unlike in BM25-based retrieval. SPLADE models don’t rely on corpus-level statistics like IDF to estimate word relevance. Instead, they generate term weights in sparse representations based on their interactions within the encoded text.


In [None]:
client.create_collection(
    collection_name="splade_vectors_collection",
    sparse_vectors_config={
        "splade_sparse_vector": models.SparseVectorParams(),
    },
)

## Step 8: Create SPLADE++ Sparse Vectors with FastEmbed & Insert Them into the Collection

The FastEmbed library provides **SPLADE++**, one of the latest models in the SPLADE family.

> **<font color='red'>Update:</font>** Since the release of [Qdrant Cloud Inference](https://qdrant.tech/blog/qdrant-cloud-inference-launch/), you can move SPLADE++ embedding inference from local execution (as shown in this notebook) to the cloud, reducing latency and centralizing resource usage.

As a result, this step looks mostly identical to `Step 5` of this tutorial. However, under the hood, the process of converting a document to a sparse representation is quite different.

### Documents to SPLADE++ Sparse Representations

SPLADE models generate sparse text representations made up of **tokens** produced by the SPLADE tokenizer.

> Tokenizers break text into smaller units called **tokens**, which form the model's **vocabulary**. Depending on the tokenizer, these tokens can be words, subwords, or even characters.

> SPLADE models operate on a fixed vocabulary of **30,522 tokens**.

#### Text to Tokens

Each document is first tokenized and the resulting tokens are mapped to their corresponding indices in the model’s vocabulary.  
These indices are then used in the final sparse representation.

> You can explore this process in the [Tokenizer Playground](https://huggingface.co/spaces/Xenova/the-tokenizer-playground) by selecting the `custom` tokenizer and entering `Qdrant/Splade_PP_en_v1`.  
> For example, "*cheese*" is mapped to token index `8808`, and "*mac*" to `6097`.

#### Weighting Tokens

The tokenized text, now represented as token indices, is passed through the SPLADE model.  
SPLADE **expands** the input by adding contextually relevant tokens and simultaneously assigns each token in the final sparse representation a **weight** that reflects its role in the text.

**SPLADE++ Document Expansion**

For example, "*mac and cheese*" will be expanded to: "*mac and cheese dairy apple dish & variety brand food made , foods difference eat restaurant or*", resulting in a SPLADE-generated sparse representation with **17 non-zero values**.

> If you’d like to experiment with SPLADE's expansion behavior, check out our documentation on [using SPLADE in FastEmbed](https://qdrant.tech/documentation/fastembed/fastembed-splade/). It includes a utility function to decode SPLADE++ sparse representations back into tokens with their corresponding weights.
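To give a feel for what such decoding involves, here is a simplified sketch in plain Python. It is not the utility from the documentation: `decode_sparse_vector` is a hypothetical helper, the weights are made up, and the toy vocabulary contains only the two token indices quoted above. The index `99999` is deliberately outside the vocabulary to show the fallback.

```python
def decode_sparse_vector(indices, values, id_to_token):
    """Pair each token index with its weight and sort by weight, highest first."""
    decoded = [
        (id_to_token.get(index, f"<unknown:{index}>"), weight)
        for index, weight in zip(indices, values)
    ]
    return sorted(decoded, key=lambda pair: pair[1], reverse=True)


# Toy vocabulary: a real tokenizer vocabulary would hold all 30,522 entries.
toy_vocabulary = {6097: "mac", 8808: "cheese"}

# Made-up sparse representation (indices and values are parallel arrays).
indices = [6097, 8808, 99999]
values = [1.8, 2.1, 0.4]

decoded = decode_sparse_vector(indices, values, toy_vocabulary)
for token, weight in decoded:
    print(f"{token}: {weight}")
```

With a real model, the mapping from index to token would come from the model's tokenizer rather than a hand-written dictionary.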

In [None]:
client.upsert(
    collection_name="splade_vectors_collection",
    points=[
        models.PointStruct(
            id=i,
            payload={"text": description},  # metadata: the description text in human-readable form
            vector={
                "splade_sparse_vector": models.Document(  # runs FastEmbed under the hood
                    text=description,
                    model="prithivida/Splade_PP_en_v1"
                )
            },
        ) for i, description in enumerate(grocery_items_descriptions)
    ],
)

## Step 9: Sparse Neural Retrieval with SPLADE++ & Qdrant

> Conversion of a query to a sparse representation by SPLADE++ works exactly the same way as for documents.

Let’s see what sparse neural retrieval brings to the table compared to BM25-based lexical retrieval.

We'll test a query where the meaning of the keyword depends heavily on context: "*a not soft cheese*". In our toy dataset of grocery item descriptions, the most fitting result should be "*grated hard cheese*".


In [None]:
client.query_points(
    collection_name="splade_vectors_collection",
    using="splade_sparse_vector",
    limit=3,
    query=models.Document(
        text="A not soft cheese",
        model="prithivida/Splade_PP_en_v1"
    ),
    with_vectors=True,
)

With BM25, however, "*mac and cheese*" would be ranked higher, since "*cheese*", the only matching keyword between the query and the documents, plays a more prominent role in that description than in "*grated hard cheese*", as we saw in `Step 6`.


In [None]:
client.query_points(
    collection_name="bm25_vectors_collection",
    using="bm25_sparse_vector",
    limit=3,
    query=models.Document(
        text="A not soft cheese",
        model="Qdrant/bm25"
    ),
    with_vectors=True,
)

### Role of SPLADE++ Document & Query Expansion in Vocabulary Mismatch

You may have noticed that SPLADE++ returned a non-zero similarity score between the query "*A not soft cheese*" and the document "*White crusty bread roll*", even though they have no overlapping keywords.

This happened due to SPLADE++’s internal **expansion mechanism**.

> SPLADE expands both documents and queries.

Let’s now see SPLADE++ in action solving the **vocabulary mismatch** problem.


In [None]:
client.query_points(
    collection_name="splade_vectors_collection",
    using="splade_sparse_vector",
    limit=3,
    query=models.Document(
        text="parmesan",
        model="prithivida/Splade_PP_en_v1"
    ),
    with_vectors=True,
)

SPLADE expands the query "*parmesan*" with 10+ additional tokens, making it possible to match and rank the (also expanded at indexing time) "*grated hard cheese*" as the top hit, even though "*parmesan*" doesn’t appear in any document in our dataset.
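The effect of expansion on matching can be illustrated with a toy sparse dot product. All tokens and weights below are made up for illustration and are not actual SPLADE++ output. Real representations use token indices rather than strings, and `sparse_dot` is a hypothetical helper.

```python
def sparse_dot(query, document):
    """Sum the products of weights for tokens present in both representations."""
    return sum(
        weight * document[token]
        for token, weight in query.items()
        if token in document
    )


# Made-up expansion of the query "parmesan".
expanded_query = {"parmesan": 2.3, "cheese": 1.4, "food": 0.5}

# Made-up expanded document representations.
expanded_docs = {
    "Grated hard cheese": {"grated": 1.9, "hard": 1.5, "cheese": 2.0},
    "White crusty bread roll": {"white": 1.2, "bread": 2.2, "roll": 1.6, "food": 0.3},
    "Mac and cheese": {"mac": 2.0, "cheese": 1.8, "dairy": 0.7},
}

scores = {text: sparse_dot(expanded_query, doc) for text, doc in expanded_docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

None of the toy documents contains the token "*parmesan*", yet every one of them receives a non-zero score because the expanded query shares other tokens with it.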


## Qdrant's Sparse Neural Retrievers

We’ve been exploring sparse neural retrieval as a promising approach for domains where keyword-based matching is useful, but traditional methods like BM25 fall short due to their lack of semantic understanding.

Our goal is to push this field forward by fostering adoption of lightweight, explainable, and practical sparse neural retrievers.

To support this, **we’ve developed and open-sourced two custom sparse neural retrievers**, both built on top of the BM25 formula.  
You can find all the details in the following articles: [BM42 Sparse Neural Retriever](https://qdrant.tech/articles/bm42/) and [miniCOIL Sparse Neural Retriever](https://qdrant.tech/articles/minicoil/).

Both models can be used with FastEmbed and Qdrant in the same way we demonstrated with BM25 and SPLADE++ in this tutorial.

- FastEmbed handle for **BM42**: `Qdrant/bm42-all-minilm-l6-v2-attentions`  
- FastEmbed handle for **miniCOIL**: `Qdrant/minicoil-v1`

> You can check all sparse retrievers supported in FastEmbed using:
```python
from fastembed import SparseTextEmbedding
SparseTextEmbedding.list_supported_models()
```

🚀 We encourage you to experiment and find the **sparse retriever that fits your data best**!

---

Congratulations! You’re now well-equipped with everything you need to know about **sparse text retrieval in Qdrant**.

This approach shines in domains where exact keyword matches are critical, like e-commerce, medical, legal, and many more.

However, sparse retrieval, even the neural kind, has its limits. It is less effective when semantically similar content is expressed in entirely different ways.  
Models like SPLADE++ try to close that semantic gap, but doing so makes their representations less sparse and harder to interpret. After all, why is "*apple*" related to "*mac and cheese*"? 🤔

That’s where **dense retrieval** comes in: it is great for discovery and bridges the vocabulary mismatch natively.

So now we have:
- **Sparse retrieval**: precise, lightweight, and explainable
- **Dense retrieval**: flexible and great for exploration

**Why not combine both?** In the next videos, we’ll show you how to do it with **Hybrid Search**.
