
Pathology DinoV2

SophontAI

In this repository, following a plethora of works before us, we apply DINO(V2) to the pathology space. If you are interested in helping out, check the open Issues.

Installation

Clone the repository, cd into it, then run the following installation steps.

pip install uv        # install the uv package manager
uv sync               # create the .venv and install all pinned dependencies
source .venv/bin/activate
# patch eva's model-wrapper module with our modified _utils.py
cp _utils.py .venv/lib/python3.10/site-packages/eva/core/models/wrappers/

This will create a virtual environment named "pathologydino" with all necessary packages installed, located in a .venv folder at the root of path-fm.
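
As an optional sanity check, you can confirm that the eva package and the patched wrapper module resolve from inside the venv (this imports the module that the cp step above overwrote):

python -c "import eva.core.models.wrappers as w; print(w.__file__)"   # should print a path inside .venv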

Training

We provide a script, run.sh, which activates the venv created above and runs training on a single node. If you changed the venv directory, you will need to make that change in run.sh as well.

bash run.sh

By default, we make only 4 GPUs visible and run on those 4. To use different GPUs, change the indices after "CUDA_VISIBLE_DEVICES=0,1,2,3".

If you change the number of GPUs, you will also need to update the value of "--nproc_per_node=4" to match.
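
For orientation, here is a minimal sketch of the launch command inside run.sh, assuming the standard DINOv2 torchrun entry point; the script and config paths are illustrative, not the exact contents of run.sh:

source .venv/bin/activate
# make four GPUs visible and launch one training process per GPU
# (the config filename below is a hypothetical name for the ViT-S + 4 registers config)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 \
    dinov2/train/train.py \
    --config-file dinov2/configs/train/vits_reg4.yaml \
    --output_dir ./output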

By default, we use a ViT-S with 4 registers; this is reflected in the config.

Output will be saved into the directory specified by "--output_dir". Ensure that this directory does not contain old files from previous training runs, or the code will attempt to resume from them instead.
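
To start a fresh run rather than resume, clear (or rename) the old output directory first; the path below is an example, substitute whatever you pass as "--output_dir":

# remove stale checkpoints so training starts from scratch instead of resuming
rm -rf ./output/*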

Evaluation

At this time, we use Kaiko-Eva for evaluation. To test on the BACH dataset, run:

eva predict_fit --config dinov2/eval_config.yaml

Please modify the checkpoint_path in the config to match the checkpoint you wish to test. Trained checkpoints can be found under output_dir/eval/training_X.
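
For example, to point the config at a specific checkpoint and then run the benchmark, you could rewrite checkpoint_path in place; the checkpoint filename below follows the usual DINOv2 layout and is illustrative:

# set checkpoint_path in the eval config (example path), then evaluate
sed -i 's|checkpoint_path:.*|checkpoint_path: ./output/eval/training_12499/teacher_checkpoint.pth|' dinov2/eval_config.yaml
eva predict_fit --config dinov2/eval_config.yaml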

Prior Work

Patch-level models

| Name | Group | Released | SSL | WSIs | Tiles | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Input size | Dataset |
|------|-------|----------|-----|------|-------|----------|------------|------------|--------------|------------|-----------|------------|---------|
| CTransPath | Sichuan University / Tencent AI Lab | Dec 2021* | SRCL | 32K | 16M | | | | Swin-Transformer | | 768 | 224 | TCGA, PAIP |
| RetCCL | Sichuan University / Tencent AI Lab | Dec 2021* | CCL | 32K | 16M | | | | ResNet-50 | | 2048 | 224 | TCGA, PAIP |
| REMEDIS | Google Research | May 2022* | SimCLR/BiT | 29K | 50M | 11K cases | 4096 | 1.2M | ResNet-50 | | 2048 | 224 | TCGA |
| HIPT | Mahmood Lab | Jun 2022* | DINOv1 | 11K | 100M | | 256 | 400K | ViT-S | | 384 | 256 | TCGA |
| Lunit-DINO | Lunit | Dec 2022* | DINOv1 | 21K | | | | | ViT-S | | 384 | 224 | TCGA |
| Lunit-{BT,MoCoV2,SwAV} | Lunit | Dec 2022* | {BT,MoCoV2,SwAV} | 21K | | | | | ResNet-50 | | 2048 | 224 | TCGA |
| Phikon | Owkin | Jul 2023* | iBOT | 6.1K | 43M | 5.6K | 1440 | 155K | ViT-B | 86M | 768 | 224 | TCGA |
| CONCH (VL) | Mahmood Lab | Jul 2023* | iBOT & vision-language pretraining | 21K | 16M | | 1024 | 80 epochs | ViT-B | 86M | 768 | 224 | proprietary |
| UNI | Mahmood Lab | Aug 2023* | DINOv2 | 100K | 100M | | | | ViT-L | | 1024 | 224 | proprietary (Mass-100K) |
| **Virchow** | Paige / Microsoft | Sep 2023* | DINOv2 | 1.5M | | 120K | | | ViT-H | 632M | 2560 | 224 | proprietary (from MSKCC) |
| **Campanella et al. (DINO)** | Thomas Fuchs Lab | Oct 2023* | DINOv1 | 420K | 3.3B | 77K | 1080 | 1.3K INE | ViT-S | 22M | 384 | 224 | proprietary (MSHS) |
| **Campanella et al. (MAE)** | Thomas Fuchs Lab | Oct 2023* | MAE | 420K | 3.3B | 77K | 1440 | 2.5K INE | ViT-L | 303M | 1024 | 224 | proprietary (MSHS) |
| Path Foundation | Google | Oct 2023* | SimCLR, MSN | 6K | 60M | | 1024 | | ViT-S | | 384 | 224 | TCGA |
| PathoDuet | Shanghai Jiao Tong University | Dec 2023* | inspired by MoCoV3 | 11K | 13M | | 2048 | 100 epochs | ViT-B | | 4096 | 224 | TCGA |
| **RudolfV** | Aignostics | Jan 2024* | DINOv2 | 130K | 750M | 36K | | | ViT-L | 300M | | 224 | proprietary (from EU & US), TCGA |
| kaiko | kaiko.ai | Mar 2024* | DINOv2 | 29K | 260M** | | 512 | 200 INE | ViT-L | | 1024 | 224 | TCGA |
| **PLUTO** | PathAI | May 2024* | DINOv2 (+ MAE and Fourier loss) | 160K | 200M | | | | FlexiViT-S | 22M | | 224 | proprietary (PathAI) |
| BEPH | Shanghai Jiao Tong University | May 2024* | BEiTv2 | 12K | 12M | | 1024 | | ViT-B | 193M | 1024 | 224 | TCGA |
| **Prov-GigaPath** | Microsoft / Providence | May 2024* | DINOv2 | 170K | 1.4B | 30K | 384 | | ViT | | 1536 | 224 | proprietary (Providence) |
| **Hibou-B** | HistAI | Jun 2024* | DINOv2 | 1.1M | 510M | 310K cases | 1024 | 500K | ViT-B | 86M | 768 | 224 | proprietary |
| **Hibou-L** | HistAI | Jun 2024* | DINOv2 | 1.1M | 1.2B | 310K cases | 1024 | 1.2M | ViT-L | 304M | 1024 | 224 | proprietary |
| **H-optimus-0** | Bioptimus | Jul 2024* | DINOv2/iBOT | 500K (across 4,000 clinics) | >100M | | | 200K | ViT-G with 4 registers | 1.1B | 1536 | 224 | proprietary |
| mSTAR (VL) | Smart Lab | Jul 2024* | mSTAR (multimodal) | 10K | | 10K | | | ViT-L | | | 224 | TCGA |
| **Virchow 2** | Paige / Microsoft | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 4096 | | ViT-H with 4 registers | 632M | 3584 | 224 | proprietary (from MSKCC and international sites) |
| **Virchow 2G** | Paige / Microsoft | Aug 2024* | DINOv2 (+ ECT and KDE) | 3.1M | 2B | 230K | 3072 | | ViT-G with 8 registers | 1.9B | 3584 | 224 | proprietary (from MSKCC and international sites) |
| Phikon-v2 | Owkin | Sep 2024* | DINOv2 | 58.4K | 456M | | 4096 | 250K | ViT-L | 307M | 1024 | 224 | PANCAN-XL (TCGA, CPTAC, GTEx, proprietary) |
| MUSK (VL) | Li Lab (Stanford) | Jan 2025* | unified masked modeling (MLM, MIM) + contrastive learning | 33K | 50M | 12K | 2048 | 20 epochs | BEiT3 | | | 384 | TCGA |
| **RudolfV2** | Mayo, Charité, Aignostics | Jan 2025* | | 1.2M | 3.4B | 490K cases | | | ViT-H | 632M | | | |
| **UNI2-h** | Mahmood Lab | Jan 2025* | DINOv2 | 350K | 200M | | | | ViT-H with 8 registers | 681M | 1536 | 224 | proprietary (Mass) |
| **UNI2-g-preview** | Mahmood Lab | Jan 2025* | DINOv2 | 350K | 200M | | | | ViT-G | | | | proprietary (Mass) |

Notes:

  • Models marked with VL indicate language-vision pretraining (others are vision-only)
  • Models trained on >100K slides may be considered foundation models and are marked in bold
  • # of WSIs, tiles, and patients are reported to 2 significant figures
  • INE = ImageNet epochs
  • Order is chronological
  • Some of these feature extractors have been evaluated in a benchmarking study for whole slide classification here.
  • ** means inferred from other numbers provided in the paper

Slide-level / patient-level models

This table includes models that produce slide-level or patient-level embeddings without supervision.

| Name | Group | Released | SSL | WSIs | Patients | Batch size | Iterations | Architecture | Parameters | Embed dim | Patch size | Dataset |
|------|-------|----------|-----|------|----------|------------|------------|--------------|------------|-----------|------------|---------|
| GigaSSL | CBIO | Dec 2022* | SimCLR | 12K | | | 1K epochs | ResNet-18 | | 256 | 256 | TCGA |
| PRISM (VL) | Paige / Microsoft | May 2024* | contrastive (with language) | 590K (190K text reports) | 190K | 64 (x4) | 75K (10 epochs) | Perceiver + BioGPT | | 1280 | 224 | proprietary |
| Prov-GigaPath | Microsoft / Providence | May 2024* | DINOv2 | 170K | 30K | | | LongNet | 86M | 1536 | 224 | proprietary (Providence) |
| MADELEINE (VL) | Mahmood Lab | Aug 2024* | contrastive (InfoNCE & OT) | 16K | 2K | 120 | 90 epochs | multi-head attention MIL | | 512 | 256 | ACROBAT, BWH Kidney (proprietary) |
| CHIEF (VL) | Yu Lab | Sep 2024* | | | | | | | | | | |
| COBRA | Kather Lab | Nov 2024* | COBRA (MoCo-v3 in FM embedding space) | 3K | 2.8K | 1024 | 2K epochs | Mamba-2 + ABMIL | 15M | 768 | 224 | TCGA (BRCA, CRC, LUAD, LUSC, STAD) |
| TITAN (VL) | Mahmood Lab | Dec 2024* | iBOT | 340K | | 1024 | 91K (270 epochs) | ViT (smaller) | 42M | | 224 | Mass-340K (proprietary) |
| THREADS (WSI, RNA, DNA) | Mahmood Lab | Jan 2025* | | 47K | | 1200 | up to 101 epochs | ViT-L | | | 224 | MBTG-47k (MGH, BWH, TCGA, GTEx) |

Modern results

| Name | Link | Augmentations | Dataset |
|------|------|---------------|---------|
| UNI | https://pmc.ncbi.nlm.nih.gov/articles/PMC11403354/pdf/nihms-2015612.pdf | To augment the data, we use the large-scale jittering (LSJ) augmentation [135], with a random scale sampled from a range of 0.5–2.0, followed by a fixed size crop to 896 × 896 pixels to accommodate the size constraints of CTransPath. At inference time, we resize the image dimensions to their nearest multiples of 224. | Private |
| Virchow | https://www.nature.com/articles/s41591-024-03141-0 | Nothing | Private |
| Hibou | https://arxiv.org/abs/2406.05074 | Nothing | Private |
| Rudolf | https://arxiv.org/pdf/2401.04079 | In pathology it is known that staining and scanning outputs vary between labs and even within the same lab over a given period of time. Consequently, in histopathology studies, staining and scanner information can produce spurious correlations and so-called "Clever Hans" effects [46] when correlated with label information [1]. To address this shortcoming, we transferred and augmented stain and scanner color profiles between patches in addition to the standard color augmentations in the view generation process of DINOv2 [8]. For each view, we picked a random other patch in the batch and transferred the patch color profile to the slide color statistics of the selected patch [47]. This discourages the model from exploiting staining and scanner color features for learning representations. We further added 90 degree rotations as well as horizontal and vertical flipping to the augmentations in DINOv2, incorporating the prior that objects on histopathological slides have no canonical orientation. Following [48, 49], we removed the solarization augmentation from the DINOv2 standard augmentations. | Private |
