# Multivector Representation Example

This example demonstrates how to use Qdrant's multi-vector search capabilities with both dense and late-interaction (ColBERT-style) embeddings for retrieval and reranking.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/qdrant/examples/blob/master/multivector-representation/multivector_representation_qdrant.ipynb)

## Overview

- Connects to a Qdrant vector database instance.
- Loads two embedding models:
  - Dense embedding model (e.g., `BAAI/bge-small-en`)
  - Late interaction embedding model (e.g., `colbert-ir/colbertv2.0`)
- Indexes documents with both dense and ColBERT embeddings.
- Performs a search: first retrieves candidates with the dense vector, then reranks them using the ColBERT multivector.
- Returns the top reranked results.

## Requirements

- Python 3.8+
- Qdrant client
- fastembed

Install dependencies:
```bash
pip install "qdrant-client[fastembed]>=1.14.2"
```

You also need a running Qdrant instance (default: `http://localhost:6333`).
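
Before running the example, it can help to confirm that the server is reachable. The snippet below is a minimal sketch using only the standard library. It assumes the server exposes a `GET /healthz` endpoint at the base URL above; `qdrant_is_up` is a hypothetical helper name, not part of `qdrant-client`.

```python
import http.client
import urllib.request


def qdrant_is_up(base_url: str = "http://localhost:6333", timeout: float = 2.0) -> bool:
    """Return True if the server answers its health endpoint with HTTP 200."""
    try:
        # Assumption: the server exposes GET /healthz at this base URL.
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or malformed URL.
        return False


if __name__ == "__main__":
    print("Qdrant reachable:", qdrant_is_up())
```

If the check returns `False`, start Qdrant (or point the client at the right URL) before continuing.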

## Usage

The steps below show how to use multivectors with FastEmbed, which wraps ColBERT in a simple API.

If you are working in a notebook, install FastEmbed and the Qdrant client from a cell:

```python
!pip install "qdrant-client[fastembed]>=1.14.2"
```

### 1. Prepare your Qdrant server

Ensure that Qdrant is running and create a client:

```python
from qdrant_client import QdrantClient, models

client = QdrantClient("http://localhost:6333")
```

### 2. Load embedding models

Next, define your embedding models:

```python
dense_model = "BAAI/bge-small-en"  # Any dense model
colbert_model = "colbert-ir/colbertv2.0"  # Late interaction
```

### 3. Example documents and query

Let's create some sample documents for demonstration.

```python
documents = [
    "Artificial intelligence is used in hospitals for cancer diagnosis and treatment.",
    "Self-driving cars use AI to detect obstacles and make driving decisions.",
    "AI is transforming customer service through chatbots and automation."
]
query_text = "How does AI help in medicine?"
```

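### 4. Generate embeddings

Next, wrap your documents and query in `models.Document` objects. The client computes the embeddings with FastEmbed when these objects are uploaded or used in a query:

```python
# One Document per text for the dense model
dense_documents = [
    models.Document(text=doc, model=dense_model)
    for doc in documents
]
dense_query = models.Document(text=query_text, model=dense_model)

# One Document per text for the late-interaction (ColBERT) model
colbert_documents = [
    models.Document(text=doc, model=colbert_model)
    for doc in documents
]
colbert_query = models.Document(text=query_text, model=colbert_model)
```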

### 5. Create collection with dense + multivector configuration

Then create a Qdrant collection with both vector types. Note that we leave indexing on for the dense vector but turn it off for the ColBERT vector, which is used only for reranking.

```python
collection_name = "dense_multivector_search_collection"
client.create_collection(
    collection_name=collection_name,
    vectors_config={
        "dense": models.VectorParams(
            size=384,
            distance=models.Distance.COSINE
            # Leave HNSW indexing ON for dense
        ),
        "colbert": models.VectorParams(
            size=128,
            distance=models.Distance.COSINE,
            multivector_config=models.MultiVectorConfig(
                comparator=models.MultiVectorComparator.MAX_SIM
            ),
            hnsw_config=models.HnswConfigDiff(m=0)  # Disable HNSW for reranking
        )
    }
)
```

Output:

```
True
```
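
To see why the `colbert` vector is the more expensive of the two to store, and why skipping its index saves work, here is a rough size estimate. It is a sketch under assumptions that are not in the example itself: float32 storage (4 bytes per value) and a hypothetical 32 token vectors per document. `vector_bytes` is an illustrative helper, not a Qdrant API.

```python
def vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of `num_vectors` vectors of dimension `dim` (no index overhead)."""
    return num_vectors * dim * bytes_per_value


# Dense: one 384-dimensional vector per document (size=384 above).
dense_bytes = vector_bytes(num_vectors=1, dim=384)

# ColBERT: one 128-dimensional vector per token (size=128 above).
tokens_per_doc = 32  # hypothetical value, chosen only for illustration
colbert_bytes = vector_bytes(num_vectors=tokens_per_doc, dim=128)

print(f"dense:   {dense_bytes} bytes per document")
print(f"colbert: {colbert_bytes} bytes per document")
print(f"ratio:   {colbert_bytes / dense_bytes:.1f}x")
```

Under these assumptions the multivector takes roughly ten times the raw space of the dense vector, before any index is built on top of it.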

### 6. Upload documents with both dense and multivector embeddings

Now upload the vectors:

```python
points = [
    models.PointStruct(
        id=i,
        vector={
            "dense": dense_documents[i],
            "colbert": colbert_documents[i]
        },
        payload={"text": documents[i]}
    )
    for i in range(len(documents))
]
client.upsert(collection_name=collection_name, points=points)
```

Output:

```
UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
```

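### 7. Search using dense vector (prefetch), then rerank with multivector in one query

Now let’s run a search. The `prefetch` stage retrieves candidates with the dense vector, and the outer query reranks them with the ColBERT multivector:

```python
results = client.query_points(
    collection_name=collection_name,
    # Stage 1: candidate retrieval with the dense vector
    prefetch=models.Prefetch(
        query=dense_query,
        using="dense",
    ),
    # Stage 2: rerank the candidates with the ColBERT multivector
    query=colbert_query,
    using="colbert",
    limit=3,
    with_payload=True
)
```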

### 8. Display final reranked results

```python
print(results)
```

Output:

```
points=[ScoredPoint(id=1, version=0, score=18.812855, payload={'text': 'Self-driving cars use AI to detect obstacles and make driving decisions.'}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=2, version=0, score=18.604212, payload={'text': 'AI is transforming customer service through chatbots and automation.'}, vector=None, shard_key=None, order_value=None), ScoredPoint(id=0, version=0, score=14.95095, payload={'text': 'Artificial intelligence is used in hospitals for cancer diagnosis and treatment.'}, vector=None, shard_key=None, order_value=None)]
```
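
The raw response is hard to scan, so a small formatter can help. The sketch below relies only on the attributes visible in the printed response (`points`, and on each point `id`, `score`, and `payload`). `format_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without a server; in the notebook you would pass `results` instead.

```python
from types import SimpleNamespace
from typing import List


def format_results(response) -> List[str]:
    """Turn a query response into one readable line per scored point."""
    lines = []
    for rank, point in enumerate(response.points, start=1):
        text = (point.payload or {}).get("text", "")
        lines.append(f"{rank}. id={point.id} score={point.score:.3f} | {text}")
    return lines


# Stand-in response mirroring two points from the printed output above.
fake_response = SimpleNamespace(points=[
    SimpleNamespace(id=1, score=18.812855,
                    payload={"text": "Self-driving cars use AI to detect obstacles and make driving decisions."}),
    SimpleNamespace(id=0, score=14.95095,
                    payload={"text": "Artificial intelligence is used in hospitals for cancer diagnosis and treatment."}),
])

lines = format_results(fake_response)
print("\n".join(lines))
```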

- The dense vector retrieves the top candidates quickly using its HNSW index.
- The ColBERT multivector reranks those candidates with token-level MaxSim scoring.
- The query returns the top 3 reranked results (`limit=3`).
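
The two stages can be sketched in plain Python to make the scoring concrete. This is a toy illustration, not Qdrant's implementation: it uses brute-force cosine similarity instead of an HNSW index, hand-made 2-dimensional vectors instead of real embeddings, and hypothetical helper names (`cosine`, `max_sim`, `retrieve_then_rerank`). MaxSim here takes, for each query token, the best similarity over the document's tokens, and sums those maxima.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def max_sim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Sum over query tokens of the best match among the document's tokens."""
    return sum(max(cosine(q, d) for d in doc_tokens) for q in query_tokens)


def retrieve_then_rerank(
    dense_query: Vector,
    token_query: Sequence[Vector],
    dense_docs: Dict[int, Vector],
    token_docs: Dict[int, Sequence[Vector]],
    prefetch_limit: int,
    limit: int,
) -> List[Tuple[int, float]]:
    # Stage 1: pick candidates by dense similarity (brute force in this sketch).
    candidates = sorted(
        dense_docs, key=lambda i: cosine(dense_query, dense_docs[i]), reverse=True
    )[:prefetch_limit]
    # Stage 2: rescore only those candidates with MaxSim and sort again.
    scored = [(i, max_sim(token_query, token_docs[i])) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy data: three documents, each with one dense vector and a few token vectors.
dense_docs = {0: [1.0, 0.0], 1: [0.8, 0.6], 2: [0.0, 1.0]}
token_docs = {
    0: [[1.0, 0.0], [1.0, 0.0]],
    1: [[1.0, 0.0], [0.0, 1.0]],
    2: [[0.0, 1.0]],
}
ranked = retrieve_then_rerank(
    dense_query=[1.0, 0.0],
    token_query=[[1.0, 0.0], [0.0, 1.0]],
    dense_docs=dense_docs,
    token_docs=token_docs,
    prefetch_limit=2,
    limit=2,
)
print(ranked)
```

In this toy data the dense stage prefers document 0, but document 1 matches both query tokens and moves to the top after reranking. Because MaxSim sums one maximum per query token, scores can exceed 1, as they do in the output above.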

## Conclusion

Multivector search is one of the most powerful features of a vector database when used correctly. With this functionality in Qdrant, you can:

- Store token-level embeddings natively.
- Disable indexing to reduce overhead.
- Run fast retrieval and accurate reranking in one API call.
- Efficiently scale late interaction.

Combining FastEmbed and Qdrant leads to a production-ready pipeline for ColBERT-style reranking without wasting resources. You can do this locally or use Qdrant Cloud. Qdrant offers an easy-to-use API to get started with your search engine, so if you’re ready to dive in, sign up for free at [Qdrant Cloud](https://qdrant.tech/cloud/) and start building.