In this repository, following a long line of prior work, we apply DINO(v2) to the pathology domain. If you are interested in helping out, check the open Issues.
Clone the repository, `cd` into it, then run the installation steps below:

```bash
pip install uv
uv sync
source .venv/bin/activate
# patch eva's model wrapper with the version shipped in this repo
cp _utils.py .venv/lib/python3.10/site-packages/eva/core/models/wrappers/
```

This creates a virtual environment called "pathologydino" with all necessary packages installed, located as a `.venv` folder in the same directory as `path-fm`.
We provide a script, `run.sh`, which activates the venv created above and runs training on a single node. If you modified the venv directory, make the same change in `run.sh`.
```bash
bash run.sh
```

By default, we make only 4 GPUs visible and run on those 4. To change which GPUs are used, modify the indices after `CUDA_VISIBLE_DEVICES=0,1,2,3`. If you change the number of GPUs, update `--nproc_per_node=4` to match.
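For reference, here is a minimal sketch of what such a launch script can look like. The training entry point (`dinov2/train/train.py`) and the config and output paths are assumptions for illustration; defer to the actual `run.sh` shipped in this repo:

```bash
#!/usr/bin/env bash
# Illustrative sketch only -- defer to the run.sh shipped in this repo.
source .venv/bin/activate

# Expose 4 GPUs and launch one training process per GPU;
# keep --nproc_per_node in sync with the CUDA_VISIBLE_DEVICES list.
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
  dinov2/train/train.py \
  --config-file dinov2/configs/train/config.yaml \
  --output_dir ./output
```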
By default, we train a ViT-S with 4 registers. This is reflected in the config.
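If you want to change the backbone, the relevant settings live in the training config. Below is a hedged sketch in the upstream DINOv2 config style; the key names (`arch`, `patch_size`, `num_register_tokens`) follow the upstream DINOv2 repo, and the exact file and values in this repo may differ:

```yaml
# illustrative excerpt; check the actual training config in this repo
student:
  arch: vit_small          # ViT-S backbone
  patch_size: 14
  num_register_tokens: 4   # the 4 registers mentioned above
```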
Output will be saved to the directory specified by `--output_dir`. Ensure that this directory does not contain old files from previous training runs, or the code will attempt to resume from them instead of starting fresh.
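One safe pattern is to archive any previous output before launching (the `output` directory name is illustrative):

```bash
# archive any previous run so training starts fresh instead of resuming
[ -d output ] && mv output "output.bak.$(date +%Y%m%d-%H%M%S)"
```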
At this time, we use Kaiko-Eva for evaluation. To test on the BACH dataset, run:

```bash
eva predict_fit --config dinov2/eval_config.yaml
```
Please modify `checkpoint_path` to match the checkpoint you wish to test. Trained checkpoints can be found under `output_dir/eval/training_X`.
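For example, to list the checkpoints a run has produced (the `teacher_checkpoint.pth` filename follows the upstream DINOv2 convention and is an assumption here, as is the iteration number):

```bash
# list checkpoint directories, newest last
ls -d output/eval/training_*/ | sort -V
# a typical checkpoint_path for eval_config.yaml would then look like:
#   output/eval/training_12499/teacher_checkpoint.pth
```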
The following table covers models that produce tile-level (patch) embeddings.

| Name | Group | Weights | Released | SSL | WSIs | Tiles | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Input size | Dataset | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CTransPath | Sichuan University / Tencent AI Lab | ✅ | Dec 2021* | SRCL | 32K | 16M | | | | Swin-Transformer | | 768 | 224 | TCGA, PAIP | |
| RetCCL | Sichuan University / Tencent AI Lab | ✅ | Dec 2021* | CCL | 32K | 16M | | | | ResNet-50 | | 2048 | 224 | TCGA, PAIP | |
| REMEDIS | Google Research | ✅ | May 2022* | SimCLR/BiT | 29K | 50M | 11K cases | 4096 | 1.2M | ResNet-50 | | 2048 | 224 | TCGA | |
| HIPT | Mahmood Lab | ✅ | Jun 2022* | DINOv1 | 11K | 100M | | 256 | 400K | ViT-S | | 384 | 256 | TCGA | |
| Lunit-DINO | Lunit | ✅ | Dec 2022* | DINOv1 | 21K | | | | | ViT-S | | 384 | 224 | TCGA | |
| Lunit-{BT,MoCoV2,SwAV} | Lunit | ✅ | Dec 2022* | {BT,MoCoV2,SwAV} | 21K | | | | | ResNet-50 | | 2048 | 224 | TCGA | |
| Phikon | Owkin | ✅ | Jul 2023* | iBOT | 6.1K | 43M | 5.6K | 1440 | 155K | ViT-B | 86M | 768 | 224 | TCGA | |
| CONCH (VL) | Mahmood Lab | ✅ | Jul 2023* | iBOT & vision-language pretraining | 21K | 16M | | 1024 | 80 epochs | ViT-B | 86M | 768 | 224 | proprietary | |
| UNI | Mahmood Lab | ✅ | Aug 2023* | DINOv2 | 100K | 100M | | | | ViT-L | | 1024 | 224 | proprietary (Mass-100K) | |
| **Virchow** | Paige / Microsoft | ✅ | Sep 2023* | DINOv2 | 1.5M | | 120K | | | ViT-H | 632M | 2560 | 224 | proprietary (from MSKCC) | |
| **Campanella et al. (DINO)** | Thomas Fuchs Lab | ✅ | Oct 2023* | DINOv1 | 420K | 3.3B | 77K | 1080 | 1.3K INE | ViT-S | 22M | 384 | 224 | proprietary (MSHS) | |
| **Campanella et al. (MAE)** | Thomas Fuchs Lab | ❌ | Oct 2023* | MAE | 420K | 3.3B | 77K | 1440 | 2.5K INE | ViT-L | 303M | 1024 | 224 | proprietary (MSHS) | |
| Path Foundation | Google Research | ✅ | Oct 2023* | SimCLR, MSN | 6K | 60M | | 1024 | | ViT-S | | 384 | 224 | TCGA | |
| PathoDuet | Shanghai Jiao Tong University | ✅ | Dec 2023* | inspired by MoCoV3 | 11K | 13M | | 2048 | 100 epochs | ViT-B | | 4096 | 224 | TCGA | |
| **RudolfV** | Aignostics | ❌ | Jan 2024* | DINOv2 | 130K | 750M | 36K | | | ViT-L | 300M | | 224 | proprietary (from EU & US), TCGA | |
| kaiko | kaiko.ai | ✅ | Mar 2024* | DINOv2 | 29K | 260M** | | 512 | 200 INE | ViT-L | | 1024 | 224 | TCGA | |
| **PLUTO** | PathAI | ❌ | May 2024* | DINOv2 (+ MAE and Fourier loss) | 160K | 200M | | | | FlexiViT-S | 22M | | 224 | proprietary (PathAI) | |
| BEPH | Shanghai Jiao Tong University | ✅ | May 2024* | BEiTv2 | 12K | 12M | | 1024 | | ViT-B | 193M | 1024 | 224 | TCGA | |
| **Prov-GigaPath** | Microsoft / Providence | ✅ | May 2024* | DINOv2 | 170K | 1.4B | 30K | 384 | | ViT | | 1536 | 224 | proprietary (Providence) | |
| **Hibou-B** | HistAI | ✅ | Jun 2024* | DINOv2 | 1.1M | 510M | 310K cases | 1024 | 500K | ViT-B | 86M | 768 | 224 | proprietary | |
| **Hibou-L** | HistAI | ✅ | Jun 2024* | DINOv2 | 1.1M | 1.2B | 310K cases | 1024 | 1.2M | ViT-L | 304M | 1024 | 224 | proprietary | |
| **H-optimus-0** | Bioptimus | ✅ | Jul 2024* | DINOv2/iBOT | 500K (across 4,000 clinics) | >100M | 200K | | | ViT-G with 4 registers | 1.1B | 1536 | 224 | proprietary | |
| mSTAR (VL) | Smart Lab | ❌ | Jul 2024* | mSTAR (multimodal) | 10K | | 10K | | | ViT-L | | | 224 | TCGA | |
| **Virchow 2** | Paige / Microsoft | ✅ | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 4096 | | ViT-H with 4 registers | 632M | 3584 | 224 | proprietary (from MSKCC and international sites) | |
| **Virchow 2G** | Paige / Microsoft | ❌ | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 3072 | | ViT-G with 8 registers | 1.9B | 3584 | 224 | proprietary (from MSKCC and international sites) | |
| Phikon-v2 | Owkin | ✅ | Sep 2024* | DINOv2 | 58.4K | 456M | | 4096 | 250K | ViT-L | 307M | 1024 | 224 | PANCAN-XL (TCGA, CPTAC, GTEx, proprietary) | |
| MUSK (VL) | Li Lab (Stanford) | ✅ | Jan 2025* | Unified masked modeling (MLM, MIM) + contrastive learning | 33K | 50M | 12K | 2048 | 20 epochs | BEiT3 | | | 384 | TCGA | |
| **RudolfV2** | Mayo, Charité, Aignostics | ❌ | Jan 2025* | | 1.2M | 3.4B | 490K cases | | | ViT-H | 632M | | | | |
| **UNI2-h** | Mahmood Lab | ✅ | Jan 2025* | DINOv2 | 350K | 200M | | | | ViT-H with 8 registers | 681M | 1536 | 224 | proprietary (Mass) | |
| **UNI2-g-preview** | Mahmood Lab | ❌ | Jan 2025* | DINOv2 | 350K | 200M | | | | ViT-G | | | | proprietary (Mass) | |
Notes:
- Models marked with VL indicate language-vision pretraining (others are vision-only)
- Models trained on >100K slides may be considered foundation models and are marked in bold
- # of WSIs, tiles, and patients are reported to 2 significant figures
- INE = ImageNet epochs
- Order is chronological
- Some of these feature extractors have been evaluated in a benchmarking study for whole slide classification here.
- ** means inferred from other numbers provided in the paper
This table includes models that produce slide-level or patient-level embeddings without supervision.
| Name | Group | Weights | Released | SSL | WSIs | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Patch size | Dataset | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GigaSSL | CBIO | ✅ | Dec 2022* | SimCLR | 12K | | | 1K epochs | ResNet-18 | | 256 | 256 | TCGA | |
| PRISM (VL) | Paige / Microsoft | ✅ | May 2024* | contrastive (with language) | 590K (190K text reports) | 190K | 64 (x4) | 75K (10 epochs) | Perceiver + BioGPT | | 1280 | 224 | proprietary | |
| Prov-GigaPath | Microsoft / Providence | ✅ | May 2024* | DINOv2 | 170K | 30K | | | LongNet | 86M | 1536 | 224 | proprietary (Providence) | |
| MADELEINE (VL) | Mahmood Lab | ✅ | Aug 2024* | contrastive (InfoNCE & OT) | 16K | 2K | 120 | 90 epochs | multi-head attention MIL | | 512 | 256 | ACROBAT, BWH Kidney (proprietary) | |
| CHIEF (VL) | Yu Lab | ✅ | Sep 2024* | | | | | | | | | | | |
| COBRA | Kather Lab | ✅ | Nov 2024* | COBRA (MoCo-v3 in FM embedding space) | 3K | 2.8K | 1024 | 2K epochs | Mamba-2 + ABMIL | 15M | 768 | 224 | TCGA (BRCA, CRC, LUAD, LUSC, STAD) | |
| TITAN (VL) | Mahmood Lab | ✅ | Dec 2024* | iBOT | 340K | | 1024 | 91K (270 epochs) | ViT (smaller) | 42M | | 224 | Mass-340K (proprietary) | |
| THREADS (WSI, RNA, DNA) | Mahmood Lab | ❌ | Jan 2025* | | 47K | | 1200 | up to 101 epochs | ViT-L | | | 224 | MBTG-47k (MGH, BWH, TCGA, GTEx) | |
This table collects the data-augmentation details reported for several models.

| Name | Link | Augmentations | Dataset |
|---|---|---|---|
| UNI | https://pmc.ncbi.nlm.nih.gov/articles/PMC11403354/pdf/nihms-2015612.pdf | To augment the data, we use the large-scale jittering (LSJ) augmentation, with a random scale sampled from a range of 0.5–2.0, followed by a fixed size crop to 896 × 896 pixels to accommodate the size constraints of CTransPath. At inference time, we resize the image dimensions to their nearest multiples of 224. | Private |
| Virchow | https://www.nature.com/articles/s41591-024-03141-0 | None reported | Private |
| Hibou | https://arxiv.org/abs/2406.05074 | None reported | Private |
| Rudolf | https://arxiv.org/pdf/2401.04079 | Data augmentation: In pathology it is known that staining and scanning outputs vary between labs and even within the same lab over a given period of time. Consequently, in histopathology studies, staining and scanner information can produce spurious correlations and so-called "Clever Hans" effects [46] when correlated with label information [1]. To address this shortcoming, we transferred and augmented stain and scanner color profiles between patches in addition to the standard color augmentations in the view generation process of DINOv2 [8]. For each view, we picked a random other patch in the batch and transferred the patch color profile to the slide color statistics of the selected patch [47]. This discourages the model from exploiting staining and scanner color features for learning representations. We further added 90 degree rotations as well as horizontal and vertical flipping to the augmentations in DINOv2, incorporating the prior that objects on histopathological slides have no canonical orientation. Following [48, 49], we removed the solarization augmentation from the DINOv2 standard augmentations. | Private |
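The orientation prior described in the RudolfV excerpt above (90-degree rotations plus flips, since slides have no canonical orientation) is straightforward to reproduce. Below is a minimal sketch using torchvision; it illustrates the idea only and is not the RudolfV implementation:

```python
import random

import torchvision.transforms.functional as TF
from torchvision import transforms


class RandomRightAngleRotation:
    """Rotate by a random multiple of 90 degrees.

    Histology tiles have no canonical orientation, so all four
    right-angle rotations are equally valid views.
    """

    def __call__(self, img):
        return TF.rotate(img, angle=random.choice([0, 90, 180, 270]))


# Orientation augmentations to prepend to a DINOv2-style view pipeline.
orientation_augs = transforms.Compose([
    RandomRightAngleRotation(),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
])
```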