<a href="https://colab.research.google.com/github/itsloganmann/VLM2Vec/blob/main/notebooks/run_mmeb_eval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# VLM2Vec MMEB-V2 Multi-Vector Benchmark

This Colab notebook sets up an evaluation workflow for an NVIDIA A100 (40 GB or 80 GB). It installs pinned dependencies, clones the repo, materialises the retriever modules, and evaluates `vidore/colqwen2.5-v0.2`, `nvidia/llama-nemoretriever-colembed-3b-v1`, and `nomic-ai/colnomic-embed-multimodal-3b` on MMEB-V2.


## Runtime checklist
- Select a **GPU** runtime with an **NVIDIA A100 (40 GB or 80 GB)**.
- Grant Google Drive access so that caches, logs, metrics, and resume state persist across sessions.
- Run the cells in order; the smoke run validates the setup before the full benchmark.


In [None]:
import json
import subprocess
from pathlib import Path

GPU_QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def _parse_gpu_line(line: str) -> dict:
    parts = [segment.strip() for segment in line.split(",")]
    name = parts[0] if parts else line
    memory_mib = None
    if len(parts) > 1:
        try:
            memory_mib = int(parts[1].split()[0])
        except (ValueError, IndexError):
            memory_mib = None
    memory_gib = round(memory_mib / 1024, 2) if memory_mib else None
    tier = None
    if "A100" in name and memory_mib:
        if memory_mib >= 78000:
            tier = "80GB"
        elif memory_mib >= 39000:
            tier = "40GB"
        else:
            tier = "A100"
    return {
        "raw": line,
        "name": name,
        "memory_mib": memory_mib,
        "memory_gib": memory_gib,
        "tier": tier,
    }


try:
    raw_lines = subprocess.check_output(GPU_QUERY).decode("utf-8").strip().splitlines()
except Exception as exc:
    raw_lines = []
    print(f"nvidia-smi unavailable: {exc}")

devices = [_parse_gpu_line(line) for line in raw_lines if line.strip()]

print("Detected accelerators:")
if not devices:
    print(" - none")
else:
    for dev in devices:
        memory_txt = f" ({dev['memory_gib']} GiB)" if dev["memory_gib"] else ""
        print(f" - {dev['name']}{memory_txt}")

primary = devices[0] if devices else None
has_a100 = any("A100" in dev["name"] for dev in devices)
a100_tiers = sorted({dev["tier"] for dev in devices if dev["tier"]})

state = {
    "devices": devices,
    "has_a100": has_a100,
    "a100_tiers": a100_tiers,
    "primary": primary,
}

Path("/content/work").mkdir(parents=True, exist_ok=True)
with open("/content/work/gpu_detection.json", "w") as fp:
    json.dump(state, fp, indent=2)

if not devices:
    raise SystemExit("No NVIDIA GPU detected. Switch to an A100 runtime.")

if has_a100:
    if a100_tiers:
        tiers = ", ".join(a100_tiers)
        print(f"✅ NVIDIA A100 detected ({tiers}).")
    else:
        print("✅ NVIDIA A100 detected.")
else:
    suggested = {
        "device": primary["name"] if primary else "unknown",
        "adjustments": {
            "max_batch_queries": 2,
            "max_batch_docs": 3,
            "patch_budget": 768,
        },
    }
    print("⚠️ Running without an A100. Switch to the quick smoke preset or adjust batch sizes.")
    print(json.dumps(suggested, indent=2))
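
The fallback branch above only prints the suggested adjustments; nothing applies them. A minimal sketch of one way to fold them into a run configuration follows. It assumes the configuration is a flat dictionary whose keys match the adjustment names (`max_batch_queries`, `max_batch_docs`, `patch_budget`); the actual schema of the repository's YAML configs may differ, and both `apply_adjustments` and the sample values are hypothetical.

```python
from copy import deepcopy


def apply_adjustments(config: dict, adjustments: dict) -> dict:
    """Return a copy of config with the suggested caps applied.

    Hypothetical helper: assumes a flat config whose keys match the
    adjustment names. Values are only lowered, never raised.
    """
    updated = deepcopy(config)
    for key, limit in adjustments.items():
        current = updated.get(key)
        # Keep the smaller of the existing value and the suggested cap.
        updated[key] = limit if current is None else min(current, limit)
    return updated


# Illustrative values only; not taken from the repository configs.
example_config = {"max_batch_queries": 8, "max_batch_docs": 16, "patch_budget": 1024}
example_adjustments = {"max_batch_queries": 2, "max_batch_docs": 3, "patch_budget": 768}

adjusted = apply_adjustments(example_config, example_adjustments)
print(adjusted)
```

Taking the minimum rather than overwriting means a config that is already more conservative than the suggestion is left alone.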


In [None]:
%%capture install_log
%pip install -U pip==24.2
%pip install --force-reinstall --no-cache-dir 'numpy<2'
%pip install torch==2.2.1+cu121 torchvision==0.17.1+cu121 torchaudio==2.2.1+cu121 --index-url https://download.pytorch.org/whl/cu121
%pip install transformers==4.57.1 accelerate==0.34.2 pillow==10.3.0 tqdm==4.66.5 numpy==1.26.4 pyyaml==6.0.2 datasets==2.20.0 huggingface-hub==0.23.4 sentencepiece==0.2.0 safetensors==0.4.3 einops==0.8.0 timm==1.0.7 pandas==2.2.2 rich==13.7.1 keybert==0.7.0 umap-learn==0.5.5 hdbscan==0.8.33 bertopic==0.16.0 evaluate==0.4.1 pynvml==11.5.0 pytest==8.3.2 pytest-cov==4.1.0

In [None]:
import logging
import sys

import torch
from rich.console import Console

console = Console(record=True)
logging.basicConfig(level=logging.INFO, format='%(asctime)s | %(levelname)s | %(name)s | %(message)s')

torch.backends.cudnn.benchmark = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
torch.set_float32_matmul_precision('high')

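# The original fallback value was lost; 'CPU' is an assumed label for the no-CUDA case.
device_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'
console.print(f'[bold green]Torch device[/bold green]: {device_name}')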

In [None]:
from datetime import datetime
from pathlib import Path

from google.colab import drive

drive.mount('/content/drive', force_remount=True)
BASE = Path('/content/work')
PERSIST = Path('/content/drive/MyDrive/vlm2vec')
for path in [BASE, PERSIST, PERSIST / 'outputs', PERSIST / 'cache', PERSIST / 'logs', PERSIST / 'profiler']:
    path.mkdir(parents=True, exist_ok=True)
print('Workspace ready:', BASE)
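
The cell above creates `PERSIST / 'cache'`, but nothing in the notebook points a library at it. As a sketch, the Hugging Face cache could be redirected there through the `HF_HOME` environment variable, which `huggingface_hub` reads when it is imported. This assumes the evaluation harness does not set its own cache location; `point_hf_cache_at` is a hypothetical helper, not part of the repository.

```python
import os
from pathlib import Path


def point_hf_cache_at(cache_root: Path) -> Path:
    """Create cache_root/huggingface and export it as HF_HOME.

    Should run before huggingface_hub or transformers is imported,
    because the variable is read when those packages load.
    """
    hf_home = Path(cache_root) / "huggingface"
    hf_home.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(hf_home)
    return hf_home
```

In this notebook the call would be `point_hf_cache_at(PERSIST / 'cache')`. Processes launched afterwards from the notebook inherit the variable, so model downloads should survive a runtime reset as long as Drive stays mounted.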

In [None]:
%%bash
set -e
cd /content/work
rm -rf VLM2Vec
git clone https://github.com/itsloganmann/VLM2Vec.git
pip install -e VLM2Vec --no-deps

## Materialise project assets
The following cell copies the cloned repository (configs, retriever modules, the evaluation harness, and tests) into `/content/work/runtime` so it can be executed and cached within the Colab runtime.

In [None]:
import shutil
SRC = Path('/content/work/VLM2Vec')
DST = Path('/content/work/runtime')
if DST.exists():
    shutil.rmtree(DST)
shutil.copytree(SRC, DST)
print('Copied project to', DST)

In [None]:
import subprocess
subprocess.run(['pytest'], cwd='/content/work/runtime', check=True)

In [None]:
!python /content/work/runtime/evaluation/multi_vector_eval.py --config /content/work/runtime/configs/mmeb_quick_smoke.yaml

## Full evaluation
Run the full MMEB-V2 benchmark after the smoke test passes. Keep the runtime connected; expect several hours on an A100.

```python
!python /content/work/runtime/evaluation/multi_vector_eval.py --config /content/work/runtime/configs/mmeb_a100_full.yaml
```
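
The choice between the smoke and full configurations could also be driven by the `gpu_detection.json` file written in the first cell. The sketch below assumes that file's `has_a100` field and the two config filenames used in this notebook; `select_config` and `load_gpu_state` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path

CONFIG_DIR = Path("/content/work/runtime/configs")


def select_config(state: dict, config_dir: Path = CONFIG_DIR) -> Path:
    """Pick the full config on an A100, otherwise the quick smoke config."""
    name = "mmeb_a100_full.yaml" if state.get("has_a100") else "mmeb_quick_smoke.yaml"
    return config_dir / name


def load_gpu_state(path: Path) -> dict:
    """Read the detection state; treat a missing file as 'no A100'."""
    if not path.exists():
        return {"has_a100": False}
    with open(path) as fp:
        return json.load(fp)


state = load_gpu_state(Path("/content/work/gpu_detection.json"))
print(select_config(state))
```

The returned path can be passed to `multi_vector_eval.py` through `--config`, so a non-A100 runtime falls back to the smoke preset instead of starting a run it is unlikely to finish.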