MUVERA (**Mu**lti-**Ve**ctor **R**etrieval **A**lgorithm) is an algorithm that transforms variable-length sequences of vectors into Fixed Dimensional Encodings (FDEs), as described in the [MUVERA paper](https://arxiv.org/abs/2405.19504). This is particularly useful for converting multi-vector representations from late interaction models (like ColBERT) into fixed-size embeddings that can be efficiently stored and searched.

## 🤝 MUVERA with FastEmbed

The original paper suggests using the FDEs for initial retrieval and the original multi-vector representations for reranking, which gives the best result quality. FastEmbed implements the MUVERA algorithm as a postprocessor, not as a separate model, so you can pass the sequence of vectors from the late interaction model to the postprocessor and get the FDE back. This way, you don't need to encode your data with a multi-vector model twice if you decide to keep both representations.

If you have used a multi-vector model before, you have probably created an instance of it like this:

In [1]:
from fastembed.late_interaction.colbert import Colbert

model = Colbert(model_name="answerdotai/answerai-colbert-small-v1")

Fetching 5 files: 100%|██████████| 5/5 [00:03<00:00,  1.52it/s]


Adding MUVERA embeddings requires postprocessing the embeddings generated by your multi-vector encoder. Let's import FastEmbed's MUVERA implementation and set it up for the model we use. MUVERA needs to know the dimensionality of the individual vectors that your model creates, so we can either set it manually or pass a model instance to the helper class method `.from_multivector_model`:

In [2]:
from fastembed.postprocess.muvera import MuveraPostprocessor

muvera_postprocessor = MuveraPostprocessor.from_multivector_model(model)

## 🗂️ Ingesting the documents

The original paper treats document embeddings (the ones we store in the Qdrant collection to search over) separately from query embeddings and calculates them in a slightly different way. The postprocessor therefore exposes different methods, depending on whether you encode documents or queries.

In [3]:
multivectors = model.passage_embed(
    [
        "Paris is a capital of France",
        "Berlin is a capital of Germany",
        "The best chestnuts are in Place Pigalle",
    ]
)
fde_vectors = [muvera_postprocessor.process_document(v) for v in multivectors]
print(fde_vectors[0].shape)

(10240,)
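
To build some intuition for why documents and queries are handled differently, here is a minimal NumPy sketch of the idea from the paper. It is not FastEmbed's actual implementation. Token vectors are bucketed with SimHash, shrunk with a random ±1 projection, and then aggregated per bucket: document buckets hold the mean of their vectors, while query buckets hold the sum. The function `simplified_fde` and its parameter names (`k_sim`, `dim_proj`, `r_reps`) are illustrative assumptions, and the sketch skips the paper's handling of empty document buckets.

```python
import numpy as np


def simplified_fde(vectors, *, k_sim=3, dim_proj=4, r_reps=2, is_query=False, seed=42):
    """Toy fixed dimensional encoding (hypothetical helper, not the FastEmbed API)."""
    vectors = np.asarray(vectors, dtype=np.float64)
    dim = vectors.shape[1]
    num_buckets = 2**k_sim
    # The same seed yields the same random matrices for documents and queries,
    # which is required for their FDEs to be comparable.
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(r_reps):
        hyperplanes = rng.standard_normal((dim, k_sim))
        signs = rng.choice([-1.0, 1.0], size=(dim, dim_proj))
        # SimHash: the sign pattern against k_sim hyperplanes selects a bucket.
        bits = (vectors @ hyperplanes > 0).astype(int)
        bucket_ids = bits @ (1 << np.arange(k_sim))
        # Random +/-1 projection down to dim_proj dimensions.
        projected = (vectors @ signs) / np.sqrt(dim_proj)
        block = np.zeros((num_buckets, dim_proj))
        for bucket in range(num_buckets):
            members = projected[bucket_ids == bucket]
            if len(members) == 0:
                continue  # simplification: empty buckets stay zero
            # Queries sum their vectors, documents average them.
            block[bucket] = members.sum(axis=0) if is_query else members.mean(axis=0)
        blocks.append(block.ravel())
    return np.concatenate(blocks)


toy_rng = np.random.default_rng(0)
doc_fde = simplified_fde(toy_rng.standard_normal((7, 96)))
short_query_fde = simplified_fde(toy_rng.standard_normal((3, 96)), is_query=True)
print(doc_fde.shape, short_query_fde.shape)
```

Both outputs have `r_reps * 2**k_sim * dim_proj` entries (64 with the toy values used here), no matter how many token vectors go in. The `10240` printed above would be consistent with, for example, 20 repetitions, 5 SimHash bits and a 16-dimensional projection (20 × 32 × 16), but that particular combination is only a guess.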


## 🔎 Querying

When querying, we use a different method of the `MuveraPostprocessor` to convert the multi-vector representations into FDEs:

In [4]:
query_multivector = next(model.query_embed("French cuisine"))
query_fde = muvera_postprocessor.process_query(query_multivector)
print(query_fde.shape)

(10240,)
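
The reason for keeping both representations is the two-stage search the paper suggests: a cheap dot product between FDEs picks the candidates, and the original multi-vectors rerank them with MaxSim. Below is a minimal sketch of that flow, using hand-made toy arrays in place of real embeddings. The helper names `max_sim` and `retrieve_then_rerank` are hypothetical and not part of FastEmbed.

```python
import numpy as np


def max_sim(query_vectors, doc_vectors):
    """For each query token take its best-matching document token, then sum."""
    similarities = query_vectors @ doc_vectors.T
    return float(similarities.max(axis=1).sum())


def retrieve_then_rerank(query_fde, query_vectors, doc_fdes, doc_multivectors, shortlist=2):
    # Stage 1: one dot product per document over the fixed-size encodings.
    fde_scores = np.stack(doc_fdes) @ query_fde
    candidates = np.argsort(-fde_scores)[:shortlist]
    # Stage 2: rerank only the shortlist with the full multi-vectors.
    reranked = sorted(
        candidates,
        key=lambda i: max_sim(query_vectors, doc_multivectors[i]),
        reverse=True,
    )
    return [int(i) for i in reranked]


# Toy stand-ins: 2-dimensional "token" vectors and 2-dimensional "FDEs".
toy_doc_multivectors = [
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[1.0, 0.0]]),
    np.array([[-1.0, 0.0]]),
]
toy_doc_fdes = [np.array([0.5, 0.5]), np.array([0.9, 0.0]), np.array([-1.0, 0.0])]
toy_query_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_query_fde = np.array([1.0, 0.2])

ranking = retrieve_then_rerank(
    toy_query_fde, toy_query_vectors, toy_doc_fdes, toy_doc_multivectors
)
print(ranking)  # [0, 1]
```

In this toy data the FDE stage ranks document 1 first, and the MaxSim stage moves document 0 to the top. To run the same flow on real embeddings you would likely need to materialize the document multi-vectors first (for example with `list(...)`), since the embedding methods appear to return iterators that can only be consumed once.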


## 🦀 Usage with Qdrant

If you want to reproduce the whole process as described in the MUVERA paper, you need to create a single Qdrant collection with two [named vectors](https://qdrant.tech/documentation/concepts/vectors/#named-vectors): one for the multi-vector representation and one for the MUVERA embedding. The dimensionality of the latter depends on how you configure the parameters of the MUVERA projection, and you can check it through the `.embedding_size` property of the postprocessor. You will need this value to configure the collection properly.

In [5]:
muvera_postprocessor.embedding_size

10240
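
With that number in hand, the collection layout can be sketched as the JSON body of a Qdrant create-collection request. Treat this as one possible configuration under stated assumptions, not as the canonical one. The vector names `muvera_fde` and `colbert`, the distance functions, and the token dimensionality of 96 are all assumptions made for illustration, so check them against your own model and setup. Setting `m` to 0 in `hnsw_config` is intended to skip HNSW indexing for the multi-vectors, which makes sense when they are used only for reranking.

```python
import json


def build_collection_body(fde_size, token_dim):
    """Request body with two named vectors (names and distances are assumptions)."""
    return {
        "vectors": {
            # Single dense vector used for the initial retrieval.
            "muvera_fde": {"size": fde_size, "distance": "Dot"},
            # Multi-vector used only for reranking with MaxSim.
            "colbert": {
                "size": token_dim,
                "distance": "Cosine",
                "multivector_config": {"comparator": "max_sim"},
                "hnsw_config": {"m": 0},
            },
        }
    }


# fde_size would come from muvera_postprocessor.embedding_size;
# token_dim=96 is assumed for the model used in this notebook.
collection_body = build_collection_body(fde_size=10240, token_dim=96)
print(json.dumps(collection_body, indent=2))
```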