Repository of DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology. The code uses DINOv2 and is adapted from the original DINOv2 GitHub repository. DinoBloom is a family of vision transformer (ViT) models trained on a large cohort of 13 diverse, publicly available datasets of single cells in peripheral blood and bone marrow. The trained models can be downloaded from Zenodo in the variants DinoBloom-S, DinoBloom-B, DinoBloom-L and DinoBloom-G. We show that our models outperform existing medical and non-medical vision models by a large margin in (i) linear probing and k-nearest neighbor evaluations for cell-type classification on peripheral blood and bone marrow smears and (ii) weakly supervised multiple instance learning for acute myeloid leukemia subtyping.
Model | Feature dim | #params | Weights |
---|---|---|---|
DinoBloom-S | 384 | 22M | Download |
DinoBloom-B | 768 | 86M | Download |
DinoBloom-L | 1024 | 304M | Download |
DinoBloom-G | 1536 | 1136M | Download |
To train the model, specify the folder containing the .txt files that list the paths of your training images in dinov2/configs/train/custom.yaml. For training on a single GPU, run:
python dinov2/train/train.py --config-file dinov2/configs/train/custom.yaml
For multiple GPUs on one node, run the following, replacing #num_gpus with the number of GPUs:
torchrun --nproc_per_node=#num_gpus dinov2/train/train.py --config-file dinov2/configs/train/custom.yaml
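The training setup expects .txt files that hold image paths. A minimal sketch for generating such a file is shown below. It assumes one absolute path per line, which this README does not confirm, so check the expected format against the data loading code. The helper name `write_image_list` and the extension set `IMAGE_EXTENSIONS` are hypothetical and not part of this repository.

```python
from pathlib import Path

# Assumption: file extensions that count as images; adjust to your datasets.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def write_image_list(image_dir, out_txt):
    """Write the absolute path of every image under image_dir to out_txt.

    Assumption: one path per line. Returns the number of paths written.
    """
    image_dir = Path(image_dir)
    # Search subfolders recursively and sort for a reproducible file order.
    paths = sorted(
        str(p.resolve())
        for p in image_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
    out_txt = Path(out_txt)
    out_txt.parent.mkdir(parents=True, exist_ok=True)
    out_txt.write_text("\n".join(paths) + ("\n" if paths else ""))
    return len(paths)
```

Calling `write_image_list("path/to/images", "path/to/lists/train.txt")` with your own paths writes one list file. The folder that contains the list files is what you then reference in dinov2/configs/train/custom.yaml.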
We provide a sample Google Colab notebook that shows feature extraction and PCA visualization.
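As a rough illustration of the PCA step, the sketch below projects a matrix of embeddings onto its top three principal components and rescales them to [0, 1] so they can be displayed as RGB values. It is not taken from the notebook: the function names `pca_project` and `to_rgb` are hypothetical, and random vectors of width 384 (the DinoBloom-S feature dimension) stand in for real model features.

```python
import numpy as np


def pca_project(features, n_components=3):
    """Project (n_samples, feature_dim) features onto their top principal components."""
    features = np.asarray(features, dtype=np.float64)
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def to_rgb(projected):
    """Min-max scale each component to [0, 1] so three components can be shown as RGB."""
    lo = projected.min(axis=0, keepdims=True)
    hi = projected.max(axis=0, keepdims=True)
    return (projected - lo) / np.maximum(hi - lo, 1e-12)


# Stand-in for real embeddings: 256 random vectors of width 384.
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 384))
rgb = to_rgb(pca_project(features, n_components=3))
print(rgb.shape)
```

With real data, `features` would be replaced by the embeddings extracted from a DinoBloom model, one row per image or patch.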
If you find this repository useful, please consider citing our work:
@misc{koch2024dinobloom,
title={DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology},
author={Valentin Koch and Sophia J. Wagner and Salome Kazeminia and Ece Sancar and Matthias Hehr and Julia Schnabel and Tingying Peng and Carsten Marr},
year={2024},
eprint={2404.05022},
archivePrefix={arXiv},
primaryClass={cs.CV}
}