# Hybrid Search with miniCOIL

This is a simple example of using the **miniCOIL** sparse neural retrieval model in a hybrid search setup.

## Why use miniCOIL?

miniCOIL aims to replace BM25, the usual choice for the sparse part, without losing the quality of keyword-based retrieval (as sparse neural retrieval [usually does](https://arxiv.org/pdf/2307.10488)).  
In fact, miniCOIL can **enhance** hybrid search results by doing what BM25 can’t: understanding the **meaning of keywords** within the context of the text.

It acts as if BM25 understood what the keywords actually mean in the sentence.

## Learn more

- [The article](https://qdrant.tech/articles/minicoil/) on the miniCOIL approach, training, and architecture

- [MiniCOIL v1](https://huggingface.co/Qdrant/minicoil-v1) model on Hugging Face

## What this notebook shows

We’ll run a hybrid textual search on a list of book, article, and paper titles in which the keywords **"vector"** and **"search"** appear in different contexts.

This will demonstrate how miniCOIL captures the meaning of these keywords in a way that BM25 can't.

In [1]:
titles_vector_and_search = [
    "Vector Graphics in Modern Web Design",
    "The Art of Search and Self-Discovery",
    "Efficient Vector Search Algorithms for Large Datasets",
    "Searching the Soul: A Journey Through Mindfulness",
    "Vector-Based Animations for User Interface Design",
    "Search Engines: A Technical and Social Overview",
    "The Rise of Vector Databases in AI Systems",
    "Search Patterns in Human Behavior",
    "Vector Illustrations: A Guide for Creatives",
    "Search and Rescue: Technologies in Emergency Response",
    "Vectors in Physics: From Arrows to Equations",
    "Searching for Lost Time in the Digital Age",
    "Vector Spaces and Linear Transformations",
    "The Endless Search for Truth in Philosophy",
    "3D Modeling with Vectors in Blender",
    "Search Optimization Strategies for E-commerce",
    "Vector Drawing Techniques with Open-Source Tools",
    "In Search of Meaning: A Psychological Perspective",
    "Advanced Vector Calculus for Engineers",
    "Search Interfaces: UX Principles and Case Studies",
    "The Use of Vector Fields in Meteorology",
    "Search and Destroy: Cybersecurity in the 21st Century",
    "From Bitmap to Vector: A Designer’s Guide",
    "Search Engines and the Democratization of Knowledge",
    "Vector Geometry in Game Development",
    "The Human Search for Connection in a Digital World",
    "AI-Powered Vector Search in Recommendation Systems",
    "Searchable Archives: The History of Digital Retrieval",
    "Vector Control Strategies in Public Health",
    "The Search for Extraterrestrial Intelligence"
]

## Setup for Hybrid Search in Qdrant

We’re going to demonstrate hybrid search using [Qdrant](https://qdrant.tech/) and the [FastEmbed Python library](https://github.com/qdrant/fastembed) for convenience.

### Why Qdrant & FastEmbed?

Using the Qdrant & FastEmbed combo comes with a few benefits:

- **Specific sparse representation**:  
  miniCOIL was designed to work with Qdrant’s ability to calculate the [Inverse Document Frequency (IDF) part](https://qdrant.tech/documentation/concepts/indexing/#idf-modifier) of the BM25 (and, consequently, miniCOIL) formula on the server side.  
  > If you use miniCOIL embeddings outside of Qdrant, you’ll have to scale them by the IDF component yourself.

- **Integration with Qdrant**:  
  FastEmbed is integrated with Qdrant, providing the **local inference** feature.  
  This lets us skip all the steps of creating and formatting embeddings before sending them to the vector search solution.
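
The IDF scaling mentioned in the first bullet can be pictured with a minimal sketch. It assumes the common BM25 form of IDF, `ln(1 + (N - n + 0.5) / (n + 0.5))`, where `N` is the number of documents and `n` is the number of documents containing the term. The function names `bm25_idf` and `scale_by_idf` and the toy counts are hypothetical; the exact formula and the place where Qdrant applies it should be checked against the Qdrant documentation.

```python
import math


def bm25_idf(num_docs: int, docs_with_term: int) -> float:
    """BM25-style IDF: the rarer the term, the larger the weight."""
    return math.log(1.0 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


def scale_by_idf(weights: dict, doc_freq: dict, num_docs: int) -> dict:
    """Multiply each term weight by the IDF of that term."""
    return {
        term: weight * bm25_idf(num_docs, doc_freq.get(term, 0))
        for term, weight in weights.items()
    }


# Toy numbers (assumed): 30 documents, "vector" occurs in 15 of them, "blender" in 1.
doc_freq = {"vector": 15, "blender": 1}
scaled = scale_by_idf({"vector": 1.0, "blender": 1.0}, doc_freq, num_docs=30)
print(scaled)
```

With the IDF modifier enabled in the collection, as in this notebook, this step does not have to be done by hand.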

### Install dependencies

Let’s install the required libraries:

In [2]:
!pip3 install -q "qdrant-client[fastembed]"
!pip3 install -U -q fastembed  # upgrade FastEmbed so that miniCOIL v1 is available

### Running Qdrant

You can run this example using either:

- [Qdrant Cloud Free Tier](https://qdrant.tech/documentation/cloud-quickstart/)  
- [Local Qdrant instance via Docker](https://qdrant.tech/documentation/quickstart/)

This notebook is set up to work with the Cloud Free Tier. If you prefer to run Qdrant locally, just replace the client initialization with:

```python
client = QdrantClient(url="http://localhost:6333")
```

In [None]:
from qdrant_client import QdrantClient, models
from google.colab import userdata

# Credentials are read from Colab secrets; to run Qdrant locally, use the snippet above instead
client = QdrantClient(
    url=userdata.get("free-tier-url"),
    api_key=userdata.get("api-key"),
)

## Create Collection for Hybrid Search

We’ll now create a collection to store and index our book, paper, and article titles.

Each point in this collection will represent one title and will include **three vector representations**:

- **Dense vector**  
  Generated by the `jinaai/jina-embeddings-v2-small-en` embedding model and stored under the name `jina-small`; this will be used for the dense part of hybrid search.

- **Two sparse vectors**  
  These will be used for the keyword-based retrieval part of hybrid search, so that we can compare two approaches:
  - Classic BM25 formula-based sparse vectors
  - miniCOIL vectors

> We enable the IDF modifier for both sparse vectors; it computes the IDF part of the BM25 (and miniCOIL) formula mentioned earlier.


In [6]:
client.create_collection(
    collection_name="hybrid_search",
    sparse_vectors_config={
        "minicoil": models.SparseVectorParams(
            modifier=models.Modifier.IDF
        ),
        "bm25": models.SparseVectorParams(
            modifier=models.Modifier.IDF  # enables calculation of the IDF part of the BM25 formula
        )
    },
    vectors_config={
        "jina-small": models.VectorParams(
            size=512,
            distance=models.Distance.COSINE
        )
    }
)

True

> **Note:** To see available models for dense and sparse textual embeddings in FastEmbed, run:

```python
from fastembed import SparseTextEmbedding, TextEmbedding

SparseTextEmbedding.list_supported_models()
TextEmbedding.list_supported_models()
```

## Upload and Index Titles

Next, we simply upsert our titles into the configured collection.

Thanks to the **local inference** feature, embedding inference is handled automatically:

- FastEmbed downloads the selected models from Hugging Face
- It runs inference under the hood
- The resulting vector representations (dense and sparse) are uploaded to Qdrant

In [7]:
client.upsert(
    collection_name="hybrid_search",
    points=[
        models.PointStruct(
            id=i,
            payload={
                "title": titles_vector_and_search[i]  # Metadata: human-readable title text
            },
            vector={
                # Sparse vector from miniCOIL
                "minicoil": models.Document(
                    text=titles_vector_and_search[i],
                    model="Qdrant/minicoil-v1",
                    options={"avg_len": 7}  # Average text length, estimated by us (a part of the BM25 formula)
                ),
                # Dense vector from jina-small model
                "jina-small": models.Document(
                    text=titles_vector_and_search[i],
                    model="jinaai/jina-embeddings-v2-small-en"
                ),
                # Sparse vector from classic BM25 model
                "bm25": models.Document(
                    text=titles_vector_and_search[i],
                    model="Qdrant/bm25",
                    options={"avg_len": 7}  # Average text length, estimated by us (a part of the BM25 formula)
                )
            },
        )
        for i in range(len(titles_vector_and_search))
    ],
)


UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
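
The `avg_len` value of 7 used above is an estimate. One rough way to sanity-check such an estimate is to count whitespace-separated words per title. This is only a sketch: the helper name `average_word_count` is hypothetical, only three of the titles are repeated here to keep the block self-contained, and the tokenizers of the actual models (for example, their stopword handling) may measure length differently.

```python
def average_word_count(texts):
    """Mean number of whitespace-separated words per text."""
    if not texts:
        raise ValueError("texts must not be empty")
    return sum(len(text.split()) for text in texts) / len(texts)


# A small sample of the indexed titles, repeated here for self-containment.
sample_titles = [
    "Vector Graphics in Modern Web Design",
    "The Art of Search and Self-Discovery",
    "Efficient Vector Search Algorithms for Large Datasets",
]

estimated_avg_len = average_word_count(sample_titles)
print(round(estimated_avg_len, 2))
```

In the notebook, passing `titles_vector_and_search` instead of the sample gives the figure for the whole collection.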

## Search

Now let's search through our collection and observe in practice the differences between:

- **Dense retrieval**
- **miniCOIL** keyword-based retrieval with semantic understanding
- **BM25** keyword-based retrieval

We’ll use two queries.

### Dense

The first one, `query_text_dense` - "*Scaling Applications in Production Setup*" - contains no exact keyword matches with the indexed titles.

That makes it an **ideal candidate for dense retrieval**, where fuzzy semantic similarity matters more than exact keyword overlap.


In [8]:
query_text_dense = "Scaling Applications in Production Setup"

In [9]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_dense, model="jinaai/jina-embeddings-v2-small-en"),
    using="jina-small",
    limit=1,
)

QueryResponse(points=[ScoredPoint(id=6, version=0, score=0.71037966, payload={'title': 'The Rise of Vector Databases in AI Systems'}, vector=None, shard_key=None, order_value=None)])

Both miniCOIL and BM25 will fail to find a match in the database for this query; as mentioned, there are no overlapping keywords.

In [10]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_dense, model="Qdrant/minicoil-v1"),  # options such as avg_len are needed only for documents, not for queries
    using="minicoil",
    limit=1,
)

QueryResponse(points=[])

In [11]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_dense, model="Qdrant/bm25"),  # options such as avg_len are needed only for documents, not for queries
    using="bm25",
    limit=1,
)

QueryResponse(points=[])

So, in hybrid search, this query will be resolved thanks to the **dense** part of the retrieval.

### Sparse

The second query, `query_text_sparse` - "*Vectors in Medicine*", will show us the **difference between miniCOIL and BM25** keyword-based retrieval.

None of the indexed titles contain the word "*medicine*", so it won’t contribute to the similarity ranking in either method. At the same time, the word "*vector*" appears once in each title that contains it (half of the collection), so these titles look roughly equally relevant from the perspective of the BM25 formula.

miniCOIL, however, can **understand the meaning** of the word "*vector*" in the context of "*medicine*", and match a document where "*vector*" is also surrounded by medicine- or health-related concepts.
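
Here is a toy illustration of this idea, not miniCOIL's actual implementation. Suppose every matched keyword carries a small "meaning" vector, and a match contributes the dot product of the query-side and document-side vectors instead of a flat count. The two-dimensional vectors below are invented for the example (think of the axes as "math" and "health"), and all function names are hypothetical.

```python
def dot(a, b):
    """Dot product of two equally sized lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def keyword_score(query, doc):
    """Plain keyword overlap: every shared word counts the same."""
    return len(query.keys() & doc.keys())


def contextual_score(query, doc):
    """Shared words contribute the similarity of their meaning vectors."""
    return sum(dot(query[word], doc[word]) for word in query.keys() & doc.keys())


# Invented meaning vectors: [math-ness, health-ness] of each word in its context.
query = {"vector": [0.1, 0.9], "medicine": [0.0, 1.0]}
docs = {
    "Advanced Vector Calculus for Engineers": {"vector": [0.95, 0.05]},
    "Vector Control Strategies in Public Health": {"vector": [0.2, 0.8]},
}

for title, doc in docs.items():
    print(title, keyword_score(query, doc), round(contextual_score(query, doc), 2))

best_by_context = max(docs, key=lambda title: contextual_score(query, docs[title]))
```

Both titles tie on plain keyword overlap, while the contextual score prefers the health-related one.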


In [12]:
query_text_sparse = "Vectors in Medicine"

In [13]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_sparse, model="Qdrant/bm25"), #when embedding queries we don't need to pass any parameters
    using="bm25",
    limit=1,
)

QueryResponse(points=[ScoredPoint(id=18, version=0, score=0.8405092, payload={'title': 'Advanced Vector Calculus for Engineers'}, vector=None, shard_key=None, order_value=None)])

In [14]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_sparse, model="Qdrant/minicoil-v1"), #when embedding queries we don't need to pass any parameters
    using="minicoil",
    limit=1,
)

QueryResponse(points=[ScoredPoint(id=28, version=0, score=0.7005557, payload={'title': 'Vector Control Strategies in Public Health'}, vector=None, shard_key=None, order_value=None)])

Dense embedding matching, by contrast, is more approximate and harder to explain, so here it returns a different, less relevant result.


In [15]:
client.query_points(
    collection_name="hybrid_search",
    query=models.Document(text=query_text_sparse, model="jinaai/jina-embeddings-v2-small-en"),
    using="jina-small",
    limit=1,
)

QueryResponse(points=[ScoredPoint(id=10, version=0, score=0.83358926, payload={'title': 'Vectors in Physics: From Arrows to Equations'}, vector=None, shard_key=None, order_value=None)])

When **precise keyword matching** is needed, the **sparse part** of hybrid retrieval is responsible for it.

Using **miniCOIL** in this part makes keyword matching **semantically aware**.

## Hybrid Search

Now let’s see how to combine all of that in a **Hybrid Search**.

We’ll retrieve results based on both **dense** and **sparse** embedding representations, doing both **keyword-based** and **semantic similarity** search.

The results are then merged using [**Reciprocal Rank Fusion**](https://qdrant.tech/documentation/concepts/hybrid-queries/#hybrid-search), and we return the top documents based on the fused ranking.
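
To make the fusion step concrete, here is a minimal pure-Python sketch of Reciprocal Rank Fusion: each document receives `1 / (k + rank)` from every ranked list it appears in, and the contributions are summed. The function name, the two ranked lists of point IDs, and the constant `k=60` (a value commonly used in the RRF literature) are assumptions for illustration; Qdrant performs the fusion on the server and may use a different constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs; returns (id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results from a sparse and a dense prefetch.
sparse_ranking = [26, 6, 5]
dense_ranking = [6, 2, 26]

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

Documents that rank well in both lists rise to the top, while documents found by only one retriever still remain in the fused list.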


In [16]:
query_text = "Scaling AI Search Applications in Production Setup"

client.query_points(
    collection_name="hybrid_search",
    prefetch=[
        models.Prefetch(
            query=models.Document(text=query_text, model="Qdrant/minicoil-v1"),
            using="minicoil",
            limit=3
        ),
        models.Prefetch(
            query=models.Document(text=query_text, model="jinaai/jina-embeddings-v2-small-en"),
            using="jina-small",
            limit=3
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=3
)

QueryResponse(points=[ScoredPoint(id=6, version=0, score=0.8333334, payload={'title': 'The Rise of Vector Databases in AI Systems'}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=26, version=0, score=0.8333334, payload={'title': 'AI-Powered Vector Search in Recommendation Systems'}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=5, version=0, score=0.25, payload={'title': 'Search Engines: A Technical and Social Overview'}, vector=None, shard_key=None, order_value=None)])

🤝 Now you have a recipe for making retrieval more precise.

If you're wondering how to improve your **Retrieval Augmented Generation (RAG)**-based solutions with this hybrid search approach, check out this article from the *Advanced Retrieval and Evaluation Monitoring* series:

[Advanced Hybrid RAG with Qdrant miniCOIL, LangGraph, and SambaNova DeepSeek-R1](https://medium.com/dphi-tech/advanced-retrieval-and-evaluation-hybrid-search-with-minicoil-using-qdrant-and-langgraph-6fbe5e514078)
