# ColPali v1.3 for FiftyOne Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/colpali_v1_3/blob/main/colpali_fiftyone_tutorial.ipynb)

This notebook demonstrates how to use ColPali v1.3 with FiftyOne for visual document retrieval.

## Overview

ColPali is a Vision Language Model based on PaliGemma-3B that generates ColBERT-style multi-vector representations for efficient document retrieval. This integration applies token pooling, followed by mean or max pooling, to reduce each multi-vector output to a single 128-dimensional vector that FiftyOne's similarity infrastructure can index.


## Setup

Install required packages:


In [None]:
%pip install fiftyone colpali-engine transformers torch huggingface-hub umap-learn


## Register the Zoo Model

Register this repository as a FiftyOne zoo model source:


In [None]:
import fiftyone.zoo as foz

# Register this repository as a remote zoo model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/colpali_v1_3",
    overwrite=True
)


## Load Dataset

Load a document dataset from Hugging Face:


In [None]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load document dataset from Hugging Face
dataset = load_from_hub(
    "Voxel51/document-haystack-10pages",
    overwrite=True
)


## Basic Workflow: Document Retrieval

### Load Model and Compute Embeddings


In [None]:
import fiftyone.zoo as foz

# Load ColPali model with desired pooling strategy
model = foz.load_zoo_model(
    "vidore/colpali-v1.3-merged",
    pooling_strategy="max",  # or "mean" (default)
    pool_factor=3  # Compression factor
)


In [None]:
# Compute embeddings for all documents
dataset.compute_embeddings(
    model=model,
    embeddings_field="copali_embeddings",
)

# Check embedding dimensions
print(dataset.first()['copali_embeddings'].shape)  # Should be (128,)
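
The shape check above inspects only the first sample. A minimal sketch of a dataset-wide check follows. `find_bad_embeddings` is a hypothetical helper (not part of FiftyOne) that accepts any list of vectors, and the 128-dimension expectation is taken from the comment above.

```python
def find_bad_embeddings(embeddings, expected_dim=128):
    """Return indices of entries that are missing or not `expected_dim` long.

    `embeddings` is any sequence of 1-D vectors (lists or arrays);
    `None` entries stand for samples whose embedding was never computed.
    """
    bad = []
    for i, vec in enumerate(embeddings):
        if vec is None or len(vec) != expected_dim:
            bad.append(i)
    return bad


# Toy data standing in for real embeddings: two valid vectors, one missing,
# one with the wrong length
toy = [[0.0] * 128, [0.1] * 128, None, [0.2] * 64]
bad_indices = find_bad_embeddings(toy)
print(bad_indices)
```

On the real dataset, the input list could come from `dataset.values("copali_embeddings")`, which returns one entry per sample.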


### Build Similarity Index


In [None]:
import fiftyone.brain as fob

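# Build a similarity index on the embeddings computed above.
# `embeddings` names the field to read from; `model` is still passed
# so that text queries can be embedded at search time.
text_img_index = fob.compute_similarity(
    dataset,
    model="vidore/colpali-v1.3-merged",
    embeddings="copali_embeddings",
    brain_key="copali_sim",
    model_kwargs={
        "pooling_strategy": "max",
        "pool_factor": 3,
    }
)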


### Query for Specific Content


In [None]:
# Query for specific content
sims = text_img_index.sort_by_similarity(
    "the secret office supply is pencil"
)

# Launch the FiftyOne App on the ranked results rather than the full dataset
session = fo.launch_app(sims, auto=False)
print(session.url)
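
Conceptually, a text query is embedded into the same space as the pooled document vectors, and the documents are then ranked by similarity to it. The sketch below illustrates that ranking with cosine similarity on toy 3-dimensional vectors. `rank_by_cosine` and the data are hypothetical, and the metric a real index uses depends on the configured backend.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_cosine(query, docs, k=None):
    """Return (doc_id, score) pairs sorted from most to least similar.

    `k=None` returns every document; otherwise only the top `k`.
    """
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy document vectors and a query that points mostly along the first axis
docs = {
    "page_a": [1.0, 0.0, 0.0],
    "page_b": [0.7, 0.7, 0.0],
    "page_c": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]

for doc_id, score in rank_by_cosine(query, docs, k=2):
    print(doc_id, round(score, 3))
```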


## Advanced Embedding Workflows

### 1. Embedding Visualization with UMAP

Create 2D visualizations of your document embeddings:


In [None]:
import fiftyone.brain as fob

# Create UMAP visualization
results = fob.compute_visualization(
    dataset,
    method="umap",  # Also supports "tsne", "pca"
    brain_key="copali_viz",
    embeddings="copali_embeddings"
)

# Explore in the App
session = fo.launch_app(dataset)


### 2. Similarity Search

Index the ColPali embeddings to find documents that resemble a given sample:


In [None]:
import fiftyone.brain as fob

# Build a similarity index on the embeddings computed earlier
results = fob.compute_similarity(
    dataset,
    backend="sklearn",  # Fast sklearn backend
    brain_key="colpali_sim",
    embeddings="copali_embeddings"  # field populated by compute_embeddings() above
)

# Find similar documents
sample_id = dataset.first().id
similar_samples = dataset.sort_by_similarity(
    sample_id,
    brain_key="colpali_sim",
    k=10  # Top 10 most similar
)

# View results
session = fo.launch_app(similar_samples)


### 3. Dataset Representativeness

Score how representative each sample is of your dataset:


In [None]:
import fiftyone.brain as fob

# Compute representativeness scores
fob.compute_representativeness(
    dataset,
    representativeness_field="colpali_represent",
    method="cluster-center",
    embeddings="copali_embeddings"  # field populated by compute_embeddings() above
)

# Find most representative samples
representative_view = dataset.sort_by("colpali_represent", reverse=True)


### 4. Duplicate Detection

Score each document's uniqueness to surface near-duplicates:


In [None]:
import fiftyone.brain as fob

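# Score uniqueness from the stored embeddings; the scores are written
# to the "uniqueness" field on each sample
fob.compute_uniqueness(
    dataset,
    embeddings="copali_embeddings"  # field populated by compute_embeddings() above
)

# Sort most unique first; near-duplicates receive the lowest scores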
unique_view = dataset.sort_by("uniqueness", reverse=True)
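
Uniqueness scores rank samples but do not say which documents duplicate each other. One way to make that explicit is to flag pairs whose embeddings are nearly parallel. The following is a brute-force sketch on toy vectors, assuming cosine similarity and an arbitrary 0.99 threshold. `near_duplicate_pairs` is a hypothetical helper, not a FiftyOne function.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(embeddings, threshold=0.99):
    """Return (i, j) index pairs whose cosine similarity is >= threshold.

    Every pair is compared, so the cost grows quadratically with the
    number of vectors; acceptable for a sketch, not for large datasets.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(embeddings), 2)
        if cosine(a, b) >= threshold
    ]


toy = [
    [1.0, 0.0, 0.0],
    [0.999, 0.01, 0.0],  # almost identical to the first vector
    [0.0, 1.0, 0.0],
]
print(near_duplicate_pairs(toy))
```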


## Configuration Options

### Pooling Strategy Comparison

Compare mean vs max pooling strategies:


In [None]:
# Mean pooling (default) - holistic document matching
model_mean = foz.load_zoo_model(
    "vidore/colpali-v1.3-merged",
    pooling_strategy="mean",
    pool_factor=3
)

# Max pooling - specific content/keyword matching
model_max = foz.load_zoo_model(
    "vidore/colpali-v1.3-merged",
    pooling_strategy="max",
    pool_factor=3
)


### Custom Pool Factor

Adjust compression level:


In [None]:
# More aggressive compression (faster, less accurate)
model_compressed = foz.load_zoo_model(
    "vidore/colpali-v1.3-merged",
    pool_factor=5
)


## Understanding the Compression Pipeline

ColPali natively produces variable-length multi-vector embeddings that are incompatible with FiftyOne's fixed-dimension requirements. This integration uses a two-stage compression approach:

### Stage 1: Token Pooling (Intelligent Compression)
- Images: `(1031, 128)` → `(~344, 128)`
- Queries: `(19, 128)` → `(~6, 128)`
- **Retains ~97.8% accuracy**
- Removes redundant patches (e.g., white backgrounds)
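
The shapes above are consistent with dividing the token count by `pool_factor`. A rough sketch of that arithmetic follows, assuming the pooled count is approximately `n_tokens / pool_factor` rounded to the nearest integer. The exact count the pooler produces may differ slightly, hence the `~`. `approx_pooled_tokens` is a hypothetical helper.

```python
def approx_pooled_tokens(n_tokens, pool_factor):
    """Approximate number of vectors left after token pooling (at least 1)."""
    return max(1, round(n_tokens / pool_factor))


# Token counts taken from the figures quoted above
for name, n_tokens in [("image", 1031), ("query", 19)]:
    for pool_factor in (3, 5):
        pooled = approx_pooled_tokens(n_tokens, pool_factor)
        print(f"{name}: pool_factor={pool_factor} -> ~{pooled} x 128")
```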

### Stage 2: Final Pooling (Fixed Dimensions)
- Mean or Max pooling: `(~344, 128)` → `(128,)`
- **FiftyOne compatible**
- Both strategies work for classification and retrieval
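
To make the final step concrete, here is a minimal pure-Python sketch of collapsing an `(n, d)` multi-vector into a single `(d,)` vector. It uses a toy `(3, 4)` input rather than real ColPali output, and `pool_vectors` is a hypothetical helper, not the integration's actual code.

```python
def pool_vectors(vectors, strategy="mean"):
    """Collapse a list of equal-length vectors into one vector.

    "mean" averages each dimension across vectors; "max" keeps the
    largest value seen in each dimension.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    columns = list(zip(*vectors))  # transpose: one tuple per dimension
    if strategy == "mean":
        return [sum(col) / len(col) for col in columns]
    if strategy == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown strategy: {strategy!r}")


# Three token vectors; only the second has a strong signal in dimension 0
tokens = [
    [0.0, 0.2, 0.1, 0.0],
    [0.9, 0.2, 0.1, 0.0],
    [0.0, 0.2, 0.4, 0.0],
]
print(pool_vectors(tokens, "mean"))
print(pool_vectors(tokens, "max"))
```

In this toy case the single strong activation in dimension 0 survives max pooling but is diluted to a third of its value by mean pooling, which fits the description of max pooling as suited to specific content matching and mean pooling to holistic matching.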

**Trade-off**: after both stages, the pipeline retains ~85-90% of native ColPali accuracy in exchange for full FiftyOne compatibility.

For production applications requiring native ColPali accuracy, consider using dedicated vector databases like [Qdrant](https://qdrant.tech/) or [Weaviate](https://weaviate.io/) that support multi-vector search natively.


## Resources

- **Original Repository**: [illuin-tech/colpali](https://github.com/illuin-tech/colpali)
- **Model Weights**: [vidore/colpali-v1.3-merged](https://huggingface.co/vidore/colpali-v1.3-merged)
- **Paper**: [ColPali: Efficient Document Retrieval with Vision Language Models](https://arxiv.org/abs/2407.01449)

### Citation

If you use ColPali in your research, please cite:

```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
      title={ColPali: Efficient Document Retrieval with Vision Language Models}, 
      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
      year={2024},
      eprint={2407.01449},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2407.01449},
}
```

## License

- **Model Weights**: [Gemma License](https://ai.google.dev/gemma/terms)
- **Integration Code**: Apache 2.0 License
