
Add support for WSI-level classification #542

Merged · 31 commits · Jun 24, 2024

Changes from all commits
1232ea2
Add base class for WSIs & `openslide` implementation (#365)
nkaenzig Apr 10, 2024
d71cb3c
Add WSI dataset classes (#368)
nkaenzig Apr 10, 2024
fef6819
Add baseline `panda` workflow (#373)
roman807 Apr 16, 2024
1faa3c0
Add support for grid sampling to `WsiDataset` (#377)
nkaenzig Apr 23, 2024
278b758
Replaced cached_property in WsiDataset by LRU cache (#388)
nkaenzig Apr 25, 2024
5ce416c
Updated `EmbeddingsWriter` to support multi-embedding file outputs (#…
nkaenzig Apr 26, 2024
0319661
Simple baseline sampler for foreground patches (#394)
roman807 Apr 30, 2024
2d28f61
Fixed linting in WSI feature branch (#407)
nkaenzig Apr 30, 2024
1a03603
Merge remote-tracking branch 'origin/main' into 360-aggregated-featur…
nkaenzig May 2, 2024
d504b2c
added openslide-python to all dependencies
nkaenzig May 6, 2024
ecaa5ee
Fix input batch class name (#414)
roman807 May 6, 2024
15874f5
Add lower bound for wsi resolution level during mask generation (#412)
nkaenzig May 7, 2024
b6c5f52
Move sampler logic to `samplers` module and add unit tests (#420)
nkaenzig May 7, 2024
4bdecbe
Add `WsiClassificationDataset` (#429)
nkaenzig May 8, 2024
1211e5a
Retrieve MPP from WSIs (#432)
roman807 May 10, 2024
7fb4445
updated panda config (#437)
roman807 May 13, 2024
e3e5671
update WSI foreground segmentation algorithm (#456)
nkaenzig May 21, 2024
b3f0f43
Add `PANDA` dataset class (#430)
nkaenzig May 22, 2024
28e69be
fixed panda (#463)
roman807 May 22, 2024
7afc09c
Update `EmbeddingsWriter` to store tensors as lists (#465)
nkaenzig May 23, 2024
bd0c83d
add tiffslide backend (#468)
nkaenzig May 24, 2024
e9021cf
added panda cli unit tests (#470)
nkaenzig May 27, 2024
2efbca5
move wsi initialization to setup method of `MultiWsiDataset` (#472)
nkaenzig May 29, 2024
80ec72e
459 add camelyon16 slide level task (#476)
roman807 Jun 6, 2024
b91f1b5
475 define slide level evaluation protocol (#511)
roman807 Jun 10, 2024
094b9e4
update 360-aggregated-feature before PR to main (#527)
roman807 Jun 11, 2024
ecfd966
Updated documentation with new datasets and leaderboard (#531)
roman807 Jun 12, 2024
4e714cb
532 update leaderboard results with slide level tasks (#538)
roman807 Jun 17, 2024
0656202
updated configs (#539)
roman807 Jun 18, 2024
b105d4e
merged main and solved conflicts
roman807 Jun 18, 2024
6de261f
updated leaderboard (#543)
roman807 Jun 21, 2024
1 change: 1 addition & 0 deletions .gitattributes
@@ -2,5 +2,6 @@
tests/eva/assets/**/*.h5 filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.png filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.jpg filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.tif filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.tiff filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.csv filter=lfs diff=lfs merge=lfs -text
tests/eva/assets/**/*.pt filter=lfs diff=lfs merge=lfs -text
6 changes: 6 additions & 0 deletions .github/workflows/ci.yaml
@@ -34,6 +34,12 @@ jobs:
          - "3.10"
    runs-on: ${{ matrix.os }}
    steps:
      - name: Install OS dependencies
        run: |
          sudo apt update
          sudo apt install -y software-properties-common
          sudo add-apt-repository ppa:openslide/openslide
          sudo apt install -y openslide-tools
      - name: Checkout
        uses: actions/checkout@a5ac7e51b41094c92402da3b24376905380afc29 # v4
        with:
30 changes: 14 additions & 16 deletions README.md
@@ -104,29 +104,27 @@ and [tutorials](https://kaiko-ai.github.io/eva/dev/user-guide/advanced/replicate

In this section you will find model benchmarks which were generated with _`eva`_.

### Table I: WSI classification tasks

<br />

<div align="center">

| Model                                | BACH  | CRC   | MHIST | PCam  | Camelyon16 | PANDA |
|--------------------------------------|-------|-------|-------|-------|------------|-------|
| ViT-S/16 _(random)_ <sup>[1]</sup>   | 0.411 | 0.613 | 0.500 | 0.752 | 0.551      | 0.347 |
| ViT-S/16 _(ImageNet)_ <sup>[1]</sup> | 0.675 | 0.936 | 0.827 | 0.861 | 0.751      | 0.676 |
| DINO<sub>(p=16)</sub> <sup>[2]</sup> | 0.770 | 0.936 | 0.751 | 0.905 | 0.869      | 0.737 |
| Phikon <sup>[3]</sup>                | 0.715 | 0.942 | 0.766 | 0.925 | 0.879      | 0.784 |
| UNI <sup>[4]</sup>                   | 0.797 | 0.950 | 0.835 | 0.939 | 0.933      | 0.774 |
| ViT-S/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.800 | 0.949 | 0.831 | 0.902 | 0.897      | 0.770 |
| ViT-S/8 _(kaiko.ai)_ <sup>[5]</sup>  | 0.825 | 0.948 | 0.826 | 0.887 | 0.879      | 0.741 |
| ViT-B/16 _(kaiko.ai)_ <sup>[5]</sup> | 0.846 | 0.959 | 0.839 | 0.906 | 0.891      | 0.753 |
| ViT-B/8 _(kaiko.ai)_ <sup>[5]</sup>  | 0.867 | 0.952 | 0.814 | 0.921 | 0.939      | 0.761 |
| ViT-L/14 _(kaiko.ai)_ <sup>[5]</sup> | 0.862 | 0.935 | 0.822 | 0.907 | 0.941      | 0.769 |

_Table I: Linear probing evaluation of FMs on patch- and slide-level downstream datasets.<br> We report averaged balanced accuracy over 5 runs. Results are reported on the "test" split if available and otherwise on the "validation" split._

</div>
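The "linear probing" protocol behind these numbers trains only a small head on frozen foundation-model embeddings. The following is a hypothetical minimal sketch in NumPy — softmax regression trained by gradient descent stands in for eva's actual torch-based head, and all names, shapes, and hyperparameters here are illustrative:

```python
import numpy as np

def train_linear_probe(embeddings, labels, num_classes, lr=0.1, epochs=200):
    """Fit a single linear layer (softmax regression) on frozen embeddings."""
    rng = np.random.default_rng(0)
    n, d = embeddings.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = embeddings @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # softmax cross-entropy gradient
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy stand-in for FM embeddings: two well-separated clusters.
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(-1, 0.3, (50, 8)), rng.normal(1, 0.3, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
W, b = train_linear_probe(X, y, num_classes=2)
acc = ((X @ W + b).argmax(axis=1) == y).mean()
```

The key design point is that the backbone's weights never change: only `W` and `b` are optimized, so the score measures embedding quality rather than fine-tuning capacity.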

134 changes: 134 additions & 0 deletions configs/vision/dino_vit/offline/camelyon16.yaml
@@ -0,0 +1,134 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:DINO_BACKBONE, dino_vits16}/offline/camelyon16}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 100}
    callbacks:
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/BinaryAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 10}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:DINO_BACKBONE, dino_vits16}/camelyon16}
          save_every_n: 10_000
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.models.ModelFromFunction
            init_args:
              path: torch.hub.load
              arguments:
                repo_or_dir: ${oc.env:REPO_OR_DIR, facebookresearch/dino:main}
                model: ${oc.env:DINO_BACKBONE, dino_vits16}
                pretrained: ${oc.env:PRETRAINED, true}
                force_reload: ${oc.env:FORCE_RELOAD, false}
              checkpoint_path: ${oc.env:CHECKPOINT_PATH, null}
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 1
        projected_input_size: 128
    criterion: torch.nn.BCEWithLogitsLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_EPOCHS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.BinaryClassificationMetrics
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: 10_000
          target_transforms:
            class_path: eva.core.data.transforms.dtype.ArrayToFloatTensor
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.Camelyon16
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/camelyon16}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: 10_000
            width: 224
            height: 224
            target_mpp: 0.25
            split: train
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.Camelyon16
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.Camelyon16
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
      test:
        batch_size: *BATCH_SIZE
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
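The slide-level head in this config, `eva.vision.models.networks.ABMIL`, aggregates one slide's patch embeddings into a single slide embedding via attention-based multiple-instance learning (Ilse et al.). The core pooling step can be sketched in NumPy — this is an illustrative re-implementation, not eva's code; the 384 → 128 shapes mirror the `input_size`/`projected_input_size` arguments above, while the attention-MLP width (64) and random weights are purely for demonstration:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def abmil_pool(patches, Wp, Wa, wa):
    """Attention-based MIL pooling: score each patch, then take a weighted sum.

    patches: (n_patches, input_size) frozen patch embeddings for one slide
    Wp:      (input_size, projected_size) projection, e.g. 384 -> 128
    Wa, wa:  small attention MLP producing one logit per patch
    """
    h = np.tanh(patches @ Wp)        # projected patch features, (n, 128)
    scores = np.tanh(h @ Wa) @ wa    # one attention logit per patch, (n,)
    attn = softmax(scores)           # attention weights sum to 1 over patches
    return attn @ h, attn            # slide embedding (128,), weights (n,)

rng = np.random.default_rng(0)
patches = rng.normal(size=(32, 384))           # a toy slide with 32 patches
Wp = rng.normal(scale=0.05, size=(384, 128))
Wa = rng.normal(scale=0.05, size=(128, 64))
wa = rng.normal(scale=0.05, size=64)
slide_emb, attn = abmil_pool(patches, Wp, Wa, wa)
```

Because the attention weights are learned, the model can upweight diagnostically relevant patches (e.g. tumor regions) instead of averaging all patches uniformly; the classifier head then operates on `slide_emb` alone.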
133 changes: 133 additions & 0 deletions configs/vision/dino_vit/offline/panda.yaml
@@ -0,0 +1,133 @@
---
trainer:
  class_path: eva.Trainer
  init_args:
    n_runs: &N_RUNS ${oc.env:N_RUNS, 5}
    default_root_dir: &OUTPUT_ROOT ${oc.env:OUTPUT_ROOT, logs/${oc.env:DINO_BACKBONE, dino_vits16}/offline/panda}
    max_epochs: &MAX_EPOCHS ${oc.env:MAX_EPOCHS, 49}
    callbacks:
      - class_path: lightning.pytorch.callbacks.LearningRateMonitor
        init_args:
          logging_interval: epoch
      - class_path: lightning.pytorch.callbacks.ModelCheckpoint
        init_args:
          filename: best
          save_last: true
          save_top_k: 1
          monitor: &MONITOR_METRIC ${oc.env:MONITOR_METRIC, val/MulticlassAccuracy}
          mode: &MONITOR_METRIC_MODE ${oc.env:MONITOR_METRIC_MODE, max}
      - class_path: lightning.pytorch.callbacks.EarlyStopping
        init_args:
          min_delta: 0
          patience: ${oc.env:PATIENCE, 8}
          monitor: *MONITOR_METRIC
          mode: *MONITOR_METRIC_MODE
      - class_path: eva.callbacks.ClassificationEmbeddingsWriter
        init_args:
          output_dir: &DATASET_EMBEDDINGS_ROOT ${oc.env:EMBEDDINGS_ROOT, ./data/embeddings/${oc.env:DINO_BACKBONE, dino_vits16}/panda}
          dataloader_idx_map:
            0: train
            1: val
            2: test
          metadata_keys: ["wsi_id"]
          backbone:
            class_path: eva.models.ModelFromFunction
            init_args:
              path: torch.hub.load
              arguments:
                repo_or_dir: ${oc.env:REPO_OR_DIR, facebookresearch/dino:main}
                model: ${oc.env:DINO_BACKBONE, dino_vits16}
                pretrained: ${oc.env:PRETRAINED, true}
                force_reload: ${oc.env:FORCE_RELOAD, false}
              checkpoint_path: ${oc.env:CHECKPOINT_PATH, null}
    logger:
      - class_path: lightning.pytorch.loggers.TensorBoardLogger
        init_args:
          save_dir: *OUTPUT_ROOT
          name: ""
model:
  class_path: eva.HeadModule
  init_args:
    head:
      class_path: eva.vision.models.networks.ABMIL
      init_args:
        input_size: ${oc.env:IN_FEATURES, 384}
        output_size: &NUM_CLASSES 6
        projected_input_size: 128
    criterion: torch.nn.CrossEntropyLoss
    optimizer:
      class_path: torch.optim.AdamW
      init_args:
        lr: ${oc.env:LR_VALUE, 0.001}
        betas: [0.9, 0.999]
    lr_scheduler:
      class_path: torch.optim.lr_scheduler.CosineAnnealingLR
      init_args:
        T_max: *MAX_EPOCHS
        eta_min: 0.0
    metrics:
      common:
        - class_path: eva.metrics.AverageLoss
        - class_path: eva.metrics.MulticlassClassificationMetrics
          init_args:
            num_classes: *NUM_CLASSES
data:
  class_path: eva.DataModule
  init_args:
    datasets:
      train:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args: &DATASET_ARGS
          root: *DATASET_EMBEDDINGS_ROOT
          manifest_file: manifest.csv
          split: train
          embeddings_transforms:
            class_path: eva.core.data.transforms.Pad2DTensor
            init_args:
              pad_size: &N_PATCHES 1000
      val:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: val
      test:
        class_path: eva.datasets.MultiEmbeddingsClassificationDataset
        init_args:
          <<: *DATASET_ARGS
          split: test
      predict:
        - class_path: eva.vision.datasets.PANDA
          init_args: &PREDICT_DATASET_ARGS
            root: ${oc.env:DATA_ROOT, ./data/panda/prostate-cancer-grade-assessment}
            sampler:
              class_path: eva.vision.data.wsi.patching.samplers.ForegroundGridSampler
              init_args:
                max_samples: *N_PATCHES
            width: 224
            height: 224
            target_mpp: 0.5
            split: train
            image_transforms:
              class_path: eva.vision.data.transforms.common.ResizeAndCrop
              init_args:
                size: ${oc.env:RESIZE_DIM, 224}
                mean: ${oc.env:NORMALIZE_MEAN, [0.485, 0.456, 0.406]}
                std: ${oc.env:NORMALIZE_STD, [0.229, 0.224, 0.225]}
        - class_path: eva.vision.datasets.PANDA
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: val
        - class_path: eva.vision.datasets.PANDA
          init_args:
            <<: *PREDICT_DATASET_ARGS
            split: test
    dataloaders:
      train:
        batch_size: &BATCH_SIZE ${oc.env:BATCH_SIZE, 32}
        shuffle: true
      val:
        batch_size: *BATCH_SIZE
      test:
        batch_size: *BATCH_SIZE
      predict:
        batch_size: &PREDICT_BATCH_SIZE ${oc.env:PREDICT_BATCH_SIZE, 64}
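Slides yield different numbers of foreground patches, so before batching, `eva.core.data.transforms.Pad2DTensor` pads each slide's bag of patch embeddings to a fixed `pad_size` (1,000 here, 10,000 in the Camelyon16 config). A hypothetical NumPy equivalent, illustrating the intended shape contract rather than eva's exact implementation (in particular, passing longer bags through unchanged is an assumption of this sketch):

```python
import numpy as np

def pad_2d(embeddings, pad_size, pad_value=0.0):
    """Pad a (n_patches, dim) array to (pad_size, dim) along the first axis."""
    n, dim = embeddings.shape
    if n >= pad_size:
        return embeddings  # assumption: bags at or above pad_size pass through
    padding = np.full((pad_size - n, dim), pad_value, dtype=embeddings.dtype)
    return np.concatenate([embeddings, padding], axis=0)

bag = np.ones((750, 384), dtype=np.float32)  # a slide that yielded 750 patches
padded = pad_2d(bag, pad_size=1000)          # fixed-size bag, zero rows appended
```

With every bag padded to the same length, the dataloader can stack bags into a `(batch, pad_size, dim)` tensor, which is what the ABMIL head consumes.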