Add support for WSI-level classification (#542)
* Add base class for WSIs & `openslide` implementation (#365)

* Add WSI dataset classes (#368)

* Add baseline `panda` workflow (#373)

* add panda config

* adjust batch size

* addressed comments

* addressed comments

* addressed comments

* Add support for grid sampling to `WsiDataset` (#377)

* Replaced cached_property in WsiDataset by LRU cache (#388)

* Updated `EmbeddingsWriter` to support multi-embedding file outputs (#384)

* Simple baseline sampler for foreground patches (#394)

* add foreground kaggle

* added random foreground sampler kaggle

* refactored sampler

* moved get_mask

* typo

* refactor sampler

* refactor sampler

* addressed comments

* addressed comments

* Fixed linting in WSI feature branch (#407)

* added openslide-python to all dependencies

* Fix input batch class name (#414)

* Add lower bound for wsi resolution level during mask generation (#412)

* Move sampler logic to `samplers` module and add unit tests (#420)

* Add `WsiClassificationDataset` (#429)

* Retrieve MPP from WSIs (#432)

* add mpp conversion

* formatting

* addressed comments

* addressed comments

* addressed comments

* formatting

* formatting

* formatting

* updated panda config (#437)

* update WSI foreground segmentation algorithm (#456)

* Add `PANDA` dataset class (#430)

* fixed panda (#463)

* Update `EmbeddingsWriter` to store tensors as lists (#465)

* add tiffslide backend (#468)

* added panda cli unit tests (#470)

* move wsi initialization to setup method of `MultiWsiDataset` (#472)

* 459 add camelyon16 slide level task (#476)

* added panda dataset class

* clean up

* remove samples with noisy labels

* clean up table in dataset readme

* added function for stratified splits

* added unit tests

* cleanup

* addressed comments

* fixed issue with resource download

* validation fix

* updated readme

* added to mkdocs

* added image_dir to exception print

* updated root path in yaml config

* added panda to datasets overview table in docs

* added md5 hash for downloaded resources

* update init

* added camelyon16

* added camelyon16

* updated camelyon16 class

* added tests and config

* formatting

* formatting

* formatting

* formatting

* added test files

* formatting

* lint

* added target transforms

* formatting

* fixed dataset

* addressed comments

* addressed comments

* fix test

* fix test

* fixed test

* addressed comments

* updated loss

* fix annotations

* lint

---------

Co-authored-by: Nicolas Kaenzig <nkaenzig@gmail.com>

* 475 define slide level evaluation protocol (#511)

* updated configs

* adjust patience

* addressed comments

* fixed typo

* remove prefetch factor

* update 360-aggregated-feature before PR to main (#527)

* Updated developer guide (#418)

* Update `TotalSegmentator2D` dataset to fetch all the slices (#416)

* Move metrics to CPU when using single device (#446)

* Remove total segmentator classification dataset (#450)

* updated eva logo (#454)

* updated eva logo

* renamed files

* Update actions/checkout digest to a5ac7e5 (#458)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Add configuration logger (#466)

* Update `README` with paper citation (#474)

* update docs (#482)

* Update img shields of README (#480)

* Fix `torch` and `jsonargparse` versions (#483)

* update dependencies

* update

* bump micro version (#486)

* update config links (#487)

* Update paper citation format (#489)

* Update the vision dataset return types to `tv_tensors` (#478)

* Refactor embeddings writer (#461)

* fixed phikon configs (#493)

* Refactor embeddings datasets (#495)

* Add doc tests and minor fixes (#492)

* support setting download as env-variable (#514)

* updated configs and doc

* typo

* update datasets

* fixed types

* src/eva/core/callbacks/writers/embeddings/base.py

* formatting

* types

---------

Co-authored-by: Nicolas Känzig <36882833+nkaenzig@users.noreply.github.com>
Co-authored-by: ioangatop <johngatop@gmail.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Updated documentation with new datasets and leaderboard (#531)

* updated layout

* updated layout

* addressed comment

* 532 update leaderboard results with slide level tasks (#538)

* updated docs

* update leaderboard

* update docs and links

* updated configs (#539)

* updated leaderboard (#543)

---------

Co-authored-by: Nicolas Känzig <36882833+nkaenzig@users.noreply.github.com>
Co-authored-by: Nicolas Kaenzig <nkaenzig@gmail.com>
Co-authored-by: ioangatop <johngatop@gmail.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
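
Several of the bullets above introduce the WSI plumbing this feature builds on: a backend abstraction with `openslide` (and later `tiffslide`) implementations, LRU-cached slide handles, MPP retrieval from slide metadata, and foreground-aware grid sampling of patches. The snippet below is a minimal sketch of how these pieces typically fit together; the class and function names are illustrative assumptions, not the actual `eva` API.

```python
# Illustrative sketch only: an openslide-backed reader with an LRU-cached open
# handle, MPP retrieval, and a simple foreground-aware grid sampler.
from functools import lru_cache
from typing import Iterator, Tuple

import numpy as np
import openslide


@lru_cache(maxsize=32)
def open_slide(path: str) -> openslide.OpenSlide:
    """Keep a bounded number of slide handles alive (cf. the LRU-cache change)."""
    return openslide.OpenSlide(path)


def slide_mpp(slide: openslide.OpenSlide) -> float:
    """Microns per pixel at level 0, read from the slide metadata."""
    return float(slide.properties[openslide.PROPERTY_NAME_MPP_X])


def foreground_grid(
    slide: openslide.OpenSlide,
    patch_size: int = 224,
    target_mpp: float = 0.25,
    max_samples: int = 10_000,
    min_foreground: float = 0.5,
) -> Iterator[Tuple[int, int]]:
    """Yield level-0 (x, y) patch origins whose area is mostly tissue."""
    # Patch size expressed in level-0 pixels for the requested target MPP.
    size_0 = int(round(patch_size * target_mpp / slide_mpp(slide)))

    # Crude tissue mask from a small thumbnail: non-white pixels count as foreground.
    thumb = np.asarray(slide.get_thumbnail((512, 512)).convert("L"))
    mask = thumb < 220
    scale_x = slide.dimensions[0] / mask.shape[1]
    scale_y = slide.dimensions[1] / mask.shape[0]

    emitted = 0
    for y in range(0, slide.dimensions[1] - size_0, size_0):
        for x in range(0, slide.dimensions[0] - size_0, size_0):
            window = mask[
                int(y / scale_y) : int((y + size_0) / scale_y) + 1,
                int(x / scale_x) : int((x + size_0) / scale_x) + 1,
            ]
            if window.size and window.mean() >= min_foreground:
                yield x, y
                emitted += 1
                if emitted >= max_samples:
                    return
```

Reading an actual patch would then use `slide.read_region((x, y), 0, (size_0, size_0))`, resized to the requested `patch_size`.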
5 people committed Jun 24, 2024
1 parent 5752a24 commit 20da644
Showing 107 changed files with 3,627 additions and 235 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -2,5 +2,6 @@
tests/eva/assets/**/*.h5 filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.png filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.jpg filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.tif filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.tiff filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.csv filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.pt filter=lfs diff=lfs merge=lfs -text
6 changes: 6 additions & 0 deletions .github/workflows/ci.yaml
@@ -34,6 +34,12 @@ jobs:
          - "3.10"
    runs-on: ${{ matrix.os }}
    steps:
      - name: Install OS dependencies
        run: |
          sudo apt update
          sudo apt install -y software-properties-common
          sudo add-apt-repository ppa:openslide/openslide
          sudo apt install -y openslide-tools
      - name: Checkout
        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4
        with:
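
The added CI step installs the native OpenSlide library (`openslide-tools`) because `openslide-python`, which the feature branch adds as a dependency, is only a binding and cannot be imported without `libopenslide` present. As a quick sanity check — a snippet for illustration, not part of the repository — one can verify that the binding finds the native library:

```python
# Minimal check that the native libopenslide installed by the CI step above
# can actually be loaded by the Python binding.
try:
    import openslide
    print("OpenSlide native library:", openslide.__library_version__)
except (ImportError, OSError) as exc:
    raise SystemExit(
        "libopenslide is missing - on Ubuntu install it via "
        "`sudo apt install openslide-tools`."
    ) from exc
```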
30 changes: 14 additions & 16 deletions README.md
@@ -104,29 +104,27 @@ and [tutorials](https://kaiko-ai.github.io/eva/dev/user-guide/advanced/replicate

In this section you will find model benchmarks which were generated with _`eva`_.

### Table I: WSI patch-level benchmark
### Table I: WSI classification tasks

<br />

<div align="center">

| Model | BACH | CRC | MHIST | PCam/val | PCam/test |
|--------------------------------------------------|-------|-------|-------|----------|-----------|
| ViT-S/16 _(random)_ <sup>[1]</sup> | 0.410 | 0.617 | 0.501 | 0.753 | 0.728 |
| ViT-S/16 _(ImageNet)_ <sup>[1]</sup> | 0.695 | 0.935 | 0.831 | 0.864 | 0.849 |
| ViT-B/8 _(ImageNet)_ <sup>[1]</sup> | 0.710 | 0.939 | 0.814 | 0.870 | 0.856 |
| ViT-L/14 _(ImageNet)_ <sup>[1]</sup> | 0.707 | 0.916 | 0.832 | 0.873 | 0.888 |
| DINO<sub>(p=16)</sub> <sup>[2]</sup> | 0.801 | 0.934 | 0.768 | 0.889 | 0.895 |
| Phikon <sup>[3]</sup> | 0.725 | 0.935 | 0.777 | 0.912 | 0.915 |
| UNI <sup>[4]</sup> | 0.814 | 0.950 | 0.837 | 0.936 | 0.938 |
| ViT-S/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.797 | 0.943 | 0.828 | 0.903 | 0.893 |
| ViT-S/8 _(kaiko.ai)_ <sup>[5]</sup> | 0.834 | 0.946 | 0.832 | 0.897 | 0.887 |
| ViT-B/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.810 | 0.960 | 0.826 | 0.900 | 0.898 |
| ViT-B/8 _(kaiko.ai)_ <sup>[5]</sup> | 0.865 | 0.956 | 0.809 | 0.913 | 0.921 |
| ViT-L/14 _(kaiko.ai)_ <sup>[5]</sup> | 0.870 | 0.930 | 0.809 | 0.908 | 0.898 |
| Model | BACH | CRC | MHIST | PCam | Camelyon16 | PANDA |
|---------|-------|-------|-------|--------|------------|-------|
| ViT-S/16 _(random)_ <sup>[1]</sup> | 0.411|0.613|0.5|0.752|0.551|0.347|
| ViT-S/16 _(ImageNet)_ <sup>[1]</sup> | 0.675|0.936|0.827|0.861|0.751|0.676|
| DINO<sub>(p=16)</sub> <sup>[2]</sup> | 0.77|0.936|0.751|0.905|0.869|0.737|
| Phikon <sup>[3]</sup> | 0.715|0.942|0.766|0.925|0.879|0.784|
| UNI <sup>[4]</sup> | 0.797|0.95|0.835|0.939|0.933|0.774|
| ViT-S/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.8|0.949|0.831|0.902|0.897|0.77|
| ViT-S/8 _(kaiko.ai)_ <sup>[5]</sup> | 0.825|0.948|0.826|0.887|0.879|0.741|
| ViT-B/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.846|0.959|0.839|0.906|0.891|0.753|
| ViT-B/8 _(kaiko.ai)_ <sup>[5]</sup> | 0.867|0.952|0.814|0.921|0.939|0.761|
| ViT-L/14 _(kaiko.ai)_ <sup>[5]</sup> | 0.862|0.935|0.822|0.907|0.941|0.769|

_Table I: Linear probing evaluation of FMs on patch-level downstream datasets.<br> We report averaged balanced accuracy
over 5 runs, with an average standard deviation of ±0.003._
over 5 runs. Results are reported on the "test" split if available and otherwise on the "validation" split._

</div>

134 changes: 134 additions & 0 deletions configs/vision/dino_vit/offline/camelyon16.yaml
@@ -0,0 +1,134 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:DINO_BACKBONE, dino_vits16}/offline/camelyon16}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 100}
    callbacks:
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/BinaryAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 10}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:DINO_BACKBONE, dino_vits16}/camelyon16}
          save_every_n: 10_000
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.models.ModelFromFunction
            init_args:
              path: torch.hub.load
              arguments:
                repo_or_dir: ${oc.env:REPO_OR_DIR, facebookresearch/dino:main}
                model: ${oc.env:DINO_BACKBONE, dino_vits16}
                pretrained: ${oc.env:PRETRAINED, true}
                force_reload: ${oc.env:FORCE_RELOAD, false}
              checkpoint_path: ${oc.env:CHECKPOINT_PATH, null}
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 1
        projected_input_size: 128
    criterion: torch.nn.BCEWithLogitsLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_EPOCHS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.BinaryClassificationMetrics
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: 10_000
          target_transforms:
            class_path: eva.core.data.transforms.dtype.ArrayToFloatTensor
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.Camelyon16
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/camelyon16}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: 10_000
            width: 224
            height: 224
            target_mpp: 0.25
            split: train
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.Camelyon16
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.Camelyon16
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
      test:
        batch_size: *BATCH_SIZE
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
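
The classification head in this config, `eva.vision.models.networks.ABMIL`, turns the bag of patch embeddings produced by the `ClassificationEmbeddingsWriter` into a single slide-level logit via attention-based multiple-instance learning. The sketch below shows the underlying gated-attention pooling in the spirit of Ilse et al. (2018); it illustrates the mechanism, it is not eva's implementation, and the handling of padded patches (cf. `Pad2DTensor` above) is simplified.

```python
# Sketch of gated-attention MIL pooling for slide-level classification:
# a bag of patch embeddings in, one slide-level logit out.
from typing import Optional

import torch
from torch import nn


class GatedAttentionMIL(nn.Module):
    def __init__(self, input_size: int = 384, projected_size: int = 128,
                 hidden_size: int = 128, output_size: int = 1) -> None:
        super().__init__()
        self.project = nn.Linear(input_size, projected_size)
        self.attention_v = nn.Sequential(nn.Linear(projected_size, hidden_size), nn.Tanh())
        self.attention_u = nn.Sequential(nn.Linear(projected_size, hidden_size), nn.Sigmoid())
        self.attention_w = nn.Linear(hidden_size, 1)
        self.classifier = nn.Linear(projected_size, output_size)

    def forward(self, bags: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
        # bags: (batch, n_patches, input_size); mask: (batch, n_patches), True
        # for real patches and False for zero-padded ones.
        h = self.project(bags)
        scores = self.attention_w(self.attention_v(h) * self.attention_u(h)).squeeze(-1)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = torch.softmax(scores, dim=1)           # (batch, n_patches)
        slide = torch.einsum("bn,bnd->bd", weights, h)   # attention-weighted patch sum
        return self.classifier(slide)                    # (batch, output_size)


# Usage: two slides, each padded to 10_000 patches of 384-dim embeddings.
bags = torch.randn(2, 10_000, 384)
mask = torch.ones(2, 10_000, dtype=torch.bool)
logits = GatedAttentionMIL()(bags, mask)  # shape (2, 1), fed to BCEWithLogitsLoss
```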
133 changes: 133 additions & 0 deletions configs/vision/dino_vit/offline/panda.yaml
@@ -0,0 +1,133 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:DINO_BACKBONE, dino_vits16}/offline/panda}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 49}
    callbacks:
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/MulticlassAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 8}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:DINO_BACKBONE, dino_vits16}/panda}
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.models.ModelFromFunction
            init_args:
              path: torch.hub.load
              arguments:
                repo_or_dir: ${oc.env:REPO_OR_DIR, facebookresearch/dino:main}
                model: ${oc.env:DINO_BACKBONE, dino_vits16}
                pretrained: ${oc.env:PRETRAINED, true}
                force_reload: ${oc.env:FORCE_RELOAD, false}
              checkpoint_path: ${oc.env:CHECKPOINT_PATH, null}
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 6
        projected_input_size: 128
    criterion: torch.nn.CrossEntropyLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_EPOCHS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.MulticlassClassificationMetrics
          init_args:
            num_classes: *NUM_CLASSES
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: &N_PATCHES 1000
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.PANDA
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/panda/prostate-cancer-grade-assessment}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: *N_PATCHES
            width: 224
            height: 224
            target_mpp: 0.5
            split: train
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.PANDA
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.PANDA
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
      test:
        batch_size: *BATCH_SIZE
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
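
Note how the `&N_PATCHES` anchor ties the sampler's `max_samples` to the `Pad2DTensor` pad size: each slide contributes at most 1000 patch embeddings, and shorter bags are zero-padded so slides can be stacked into a batch. A rough illustration of that padding step — not eva's `Pad2DTensor` itself — could look like this:

```python
# Illustration of padding variable-length bags of patch embeddings to a fixed
# number of rows so slides can be batched together.
import torch


def pad_2d(embeddings: torch.Tensor, pad_size: int) -> torch.Tensor:
    """Zero-pad a (n_patches, dim) tensor to (pad_size, dim)."""
    n_patches, dim = embeddings.shape
    if n_patches >= pad_size:
        return embeddings[:pad_size]
    padding = torch.zeros(pad_size - n_patches, dim, dtype=embeddings.dtype)
    return torch.cat([embeddings, padding], dim=0)


slide_a = torch.randn(640, 384)   # slide with fewer than 1000 foreground patches
slide_b = torch.randn(1000, 384)  # slide that hit the sampler's max_samples
batch = torch.stack([pad_2d(slide_a, 1000), pad_2d(slide_b, 1000)])  # (2, 1000, 384)
```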