GitHub - scjjb/Ovarian_Features: Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification

Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification

An extensive analysis of feature extraction techniques in attention-based multiple instance learning (ABMIL)

Ovarian cancer histological subtype classification using a total of 12 feature extraction techniques. This includes:

Three feature extraction model architectures (ResNet50, ResNet18, ViT-L)
Three pretraining strategies (ImageNet, Histo-ResNet18 with SimCLR, Histo-ViT ('UNI') with DINOv2)
Two normalisation strategies (Reinhard Normalisation, Macenko Normalisation)
Three quantities of colour-augmented training data (5x, 10x, 20x)
Two tissue segmentation strategies (CLAM-default saturation thresholding, Otsu saturation thresholding)

Hyperparameters

Final Hyperparamters Determined by Hyperparameter Tuning:

Model	Learning Rate	Weight Decay	First Moment Decay	Second Moment Decay	Stability Parameter	Model Size	Dropout	Max Patches	LR Decay Factor	LR Decay Patience
ResNet50 (RN50)	2e-3	1e-3	0.75	0.95	1e-2	20	0.75	[512,128]	0.4	800
RN50 Reinhard	2e-3	1e-3	0.75	0.95	1e-2	25	0.75	[512,256]	0.4	400
RN50 Macenko	2e-3	1e-3	0.85	0.95	1e-2	15	0.75	[512,128]	0.3	400
RN50 Otsu	2e-3	1e-3	0.75	0.95	1e-2	15	0.9	[512,256]	0.1	600
RN50 Otsu+Macenko	2e-3	1e-4	0.75	0.99	1e-3	25	0.9	[512,256]	0.3	1000
RN50 5Augs	1e-3	1e-4	0.8	0.99	1e-4	25	0.6	[128,32]	0.4	700
RN50 10Augs	2e-3	1e-3	0.8	0.99	1e-2	20	0.75	[512,256]	0.4	700
RN50 20 Augs	2e-3	1e-4	0.7	0.999	1e-3	20	0.75	[512,128]	0.6	1000
ResNet18 (RN18)	1e-4	1e-5	0.8	0.99	1e-4	20	0.9	[1024,256]	0.5	700
RN18 Histo	2e-4	1e-4	0.9	0.99	1e-4	20	0.9	[512,512]	0.6	1000
ViT	5e-5	1e-1	0.85	0.999	1e-3	10	0.35	[512,384]	0.0	800
ViT Histo	1e-5	1e-3	0.9	0.999	1e-5	10	0.75	[512,256]	0.0	1000

Hyperparameters were tuned in 19 stages in which 1-5 individual hyperparameters were altered and the rest were frozen. All specific configurations can be accessed in the folder tuning_configs. The tuning patience was set to 20 for stages 1-7.1, and 30 for stages 7.2-19. The overall maximum epochs was 300 for every evaluation.

Hyperparameter Tuning Stages

An issue with unstable random seeds effected some early experiments, but this was resolved before tuning stage 11 for each model. Models which were not effected by this were not subject to tuning stages 11 and 12, which repeated previous models using fixed random seeds.

Stage 1: Learning Rate, Model Size
Stage 2: Dropout, Max Patches
Stage 3: First Moment Decay, Second Moment Decay
Stage 4: Weight Decay, Learning Rate
Stage 5: First Moment Decay, Stability Parameter
Stage 6: Model Size, Max Patches
Stage 7: LR Decay Factor, LR Decay Patience
Stage 8: Learning Rate, Dropout
Stage 9: Model Size
Stage 10: Learning Rate, Model Size, LR Decay Patience
Stage 11: Repeat of stage 10 with fixed random seeds
Stage 12: Repeat of best from first 9 stages with fixed random seeds
Stage 13: Dropout, Max Patches
Stage 14: LR Decay Factor, LR Decay Patience
Stage 15: Learning Rate, Model Size
Stage 16: Max Patches, Weight Decay
Stage 17: Model Size
Stage 18: First Moment Decay, Second Moment Decay
Stage 19: Learning Rate, First Moment Decay, Model Size, Dropout, Max Patches

Results

Five-class ovarian cancer subtype classification results from three validations. Stratified 5-fold cross-validation used all available training data (1864 WSIs from 434 patients). Independent hold-out testing and external validation used an ensemble of the five cross-validation models. Hold-out testing was performed using 100 WSIs from 30 patients, and external validation was performed using 80 WSIs from 80 patients from the Transcanadian Study.

Confusion Matrices

ResNet50 Baseline Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1175	19	25	37	10
LGSC	62	22	2	4	2
CCC	60	4	120	6	8
EC	48	4	1	131	25
MC	8	0	6	40	45

class 0 precision: 0.86844 recall: 0.92812 f1: 0.89729

class 1 precision: 0.44898 recall: 0.23913 f1: 0.31206

class 2 precision: 0.77922 recall: 0.60606 f1: 0.68182

class 3 precision: 0.60092 recall: 0.62679 f1: 0.61358

class 4 precision: 0.50000 recall: 0.45455 f1: 0.47619

ResNet50 Baseline Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	6	6	0	4	4
CCC	9	2	8	1	0
EC	3	0	0	17	0
MC	3	0	2	0	15

class 0 precision: 0.48780 recall: 1.00000 f1: 0.65574

class 1 precision: 0.75000 recall: 0.30000 f1: 0.42857

class 2 precision: 0.80000 recall: 0.40000 f1: 0.53333

class 3 precision: 0.77273 recall: 0.85000 f1: 0.80952

class 4 precision: 0.78947 recall: 0.75000 f1: 0.76923

ResNet50 Baseline External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	26	0	0	4	0
LGSC	7	2	0	0	0
CCC	2	1	17	0	0
EC	2	0	0	9	0
MC	0	0	0	3	7

class 0 precision: 0.70270 recall: 0.86667 f1: 0.77612

class 1 precision: 0.66667 recall: 0.22222 f1: 0.33333

class 2 precision: 1.00000 recall: 0.85000 f1: 0.91892

class 3 precision: 0.56250 recall: 0.81818 f1: 0.66667

class 4 precision: 1.00000 recall: 0.70000 f1: 0.82353

ResNet50 Reinhard Normalised Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1158	8	60	25	15
LGSC	70	13	5	1	3
CCC	56	5	126	0	11
EC	54	8	3	89	55
MC	11	0	8	36	44

ResNet50 Reinhard Normalised Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	7	6	0	3	4
CCC	9	1	10	0	0
EC	5	1	0	13	1
MC	2	0	2	0	16

ResNet50 Reinhard Normalised External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	25	0	2	3	0
LGSC	4	4	1	0	0
CCC	3	1	16	0	0
EC	1	0	0	10	0
MC	0	0	0	2	8

class 0 precision: 0.75758 recall: 0.83333 f1: 0.79365

class 1 precision: 0.80000 recall: 0.44444 f1: 0.57143

class 2 precision: 0.84211 recall: 0.80000 f1: 0.82051

class 3 precision: 0.66667 recall: 0.90909 f1: 0.76923

class 4 precision: 1.00000 recall: 0.80000 f1: 0.88889

ResNet50 Macenko Normalised Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1154	23	50	33	6
LGSC	55	31	1	3	2
CCC	68	3	120	1	6
EC	48	9	1	130	21
MC	9	1	7	41	41

class 0 precision: 0.86507 recall: 0.91153 f1: 0.88769

class 1 precision: 0.46269 recall: 0.33696 f1: 0.38994

class 2 precision: 0.67039 recall: 0.60606 f1: 0.63660

class 3 precision: 0.62500 recall: 0.62201 f1: 0.62350

class 4 precision: 0.53947 recall: 0.41414 f1: 0.46857

ResNet50 Macenko Normalised Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	7	7	0	4	2
CCC	10	1	9	0	0
EC	4	2	0	14	0
MC	5	0	2	0	13

class 0 precision: 0.43478 recall: 1.00000 f1: 0.60606

class 1 precision: 0.70000 recall: 0.35000 f1: 0.46667

class 2 precision: 0.81818 recall: 0.45000 f1: 0.58065

class 3 precision: 0.77778 recall: 0.70000 f1: 0.73684

class 4 precision: 0.86667 recall: 0.65000 f1: 0.74286

ResNet50 Macenko Normalised External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	29	0	0	1	0
LGSC	6	3	0	0	0
CCC	2	2	16	0	0
EC	2	0	0	8	1
MC	0	0	0	1	9

class 0 precision: 0.74359 recall: 0.96667 f1: 0.84058

class 1 precision: 0.60000 recall: 0.33333 f1: 0.42857

class 2 precision: 1.00000 recall: 0.80000 f1: 0.88889

class 3 precision: 0.80000 recall: 0.72727 f1: 0.76190

class 4 precision: 0.90000 recall: 0.90000 f1: 0.90000

ResNet50 Otsu Thresholding Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1169	28	23	34	12
LGSC	62	25	1	2	2
CCC	73	4	113	3	5
EC	55	2	5	116	31
MC	10	0	7	45	37

class 0 precision: 0.85391 recall: 0.92338 f1: 0.88729

class 1 precision: 0.42373 recall: 0.27174 f1: 0.33113

class 2 precision: 0.75839 recall: 0.57071 f1: 0.65130

class 3 precision: 0.58000 recall: 0.55502 f1: 0.56724

class 4 precision: 0.42529 recall: 0.37374 f1: 0.39785

ResNet50 Otsu Thresholding Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	6	9	0	3	2
CCC	10	3	7	0	0
EC	3	1	0	16	0
MC	5	0	2	0	13

class 0 precision: 0.45455 recall: 1.00000 f1: 0.62500

class 1 precision: 0.69231 recall: 0.45000 f1: 0.54545

class 2 precision: 0.77778 recall: 0.35000 f1: 0.48276

class 3 precision: 0.84211 recall: 0.80000 f1: 0.82051

class 4 precision: 0.86667 recall: 0.65000 f1: 0.74286

ResNet50 Otsu Thresholding External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	30	0	0	0	0
LGSC	5	4	0	0	0
CCC	3	1	16	0	0
EC	2	0	0	9	0
MC	0	0	0	2	8

class 0 precision: 0.75000 recall: 1.00000 f1: 0.85714

class 1 precision: 0.80000 recall: 0.44444 f1: 0.57143

class 2 precision: 1.00000 recall: 0.80000 f1: 0.88889

class 3 precision: 0.81818 recall: 0.81818 f1: 0.81818

class 4 precision: 1.00000 recall: 0.80000 f1: 0.88889

ResNet50 Otsu+Macenko Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1163	34	33	32	4
LGSC	54	29	6	2	1
CCC	69	3	118	3	5
EC	50	6	2	127	24
MC	12	1	3	37	46

class 0 precision: 0.86276 recall: 0.91864 f1: 0.88982

class 1 precision: 0.39726 recall: 0.31522 f1: 0.35152

class 2 precision: 0.72840 recall: 0.59596 f1: 0.65556

class 3 precision: 0.63184 recall: 0.60766 f1: 0.61951

class 4 precision: 0.57500 recall: 0.46465 f1: 0.51397

ResNet50 Otsu+Macenko Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	8	7	0	3	2
CCC	9	2	9	0	0
EC	4	2	0	14	0
MC	7	0	2	2	9

class 0 precision: 0.41667 recall: 1.00000 f1: 0.58824

class 1 precision: 0.63636 recall: 0.35000 f1: 0.45161

class 2 precision: 0.81818 recall: 0.45000 f1: 0.58065

class 3 precision: 0.73684 recall: 0.70000 f1: 0.71795

class 4 precision: 0.81818 recall: 0.45000 f1: 0.58065

ResNet50 Otsu+Macenko External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	29	0	0	1	0
LGSC	5	4	0	0	0
CCC	2	0	18	0	0
EC	2	0	0	9	0
MC	1	0	0	0	9

class 0 precision: 0.74359 recall: 0.96667 f1: 0.84058

class 1 precision: 1.00000 recall: 0.44444 f1: 0.61538

class 2 precision: 1.00000 recall: 0.90000 f1: 0.94737

class 3 precision: 0.90000 recall: 0.81818 f1: 0.85714

class 4 precision: 1.00000 recall: 0.90000 f1: 0.94737

ResNet50 5Augs Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1144	41	23	51	7
LGSC	56	29	4	2	1
CCC	67	6	116	2	7
EC	53	6	0	121	29
MC	12	0	4	35	48

ResNet50 5Augs Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	6	7	1	4	2
CCC	10	3	7	0	0
EC	4	0	0	16	0
MC	3	0	2	0	15

ResNet50 5Augs External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	28	0	0	2	0
LGSC	4	4	0	1	0
CCC	2	1	17	0	0
EC	2	0	0	9	0
MC	0	0	0	3	7

class 0 precision: 0.77778 recall: 0.93333 f1: 0.84848

class 1 precision: 0.80000 recall: 0.44444 f1: 0.57143

class 2 precision: 1.00000 recall: 0.85000 f1: 0.91892

class 3 precision: 0.60000 recall: 0.81818 f1: 0.69231

class 4 precision: 1.00000 recall: 0.70000 f1: 0.82353

ResNet50 10Augs Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1164	29	25	42	6
LGSC	57	32	1	1	1
CCC	55	6	132	5	0
EC	47	6	1	131	24
MC	10	0	7	43	39

class 0 precision: 0.87322 recall: 0.91943 f1: 0.89573

class 1 precision: 0.43836 recall: 0.34783 f1: 0.38788

class 2 precision: 0.79518 recall: 0.66667 f1: 0.72527

class 3 precision: 0.59009 recall: 0.62679 f1: 0.60789

class 4 precision: 0.55714 recall: 0.39394 f1: 0.46154

ResNet50 10Augs Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	19	0	0	1	0
LGSC	6	7	0	4	3
CCC	11	3	6	0	0
EC	4	0	0	16	0
MC	2	0	2	0	16

class 0 precision: 0.45238 recall: 0.95000 f1: 0.61290

class 1 precision: 0.70000 recall: 0.35000 f1: 0.46667

class 2 precision: 0.75000 recall: 0.30000 f1: 0.42857

class 3 precision: 0.76190 recall: 0.80000 f1: 0.78049

class 4 precision: 0.84211 recall: 0.80000 f1: 0.82051

ResNet50 10Augs External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	28	0	0	2	0
LGSC	4	5	0	0	0
CCC	2	2	16	0	0
EC	1	1	0	9	0
MC	0	0	0	3	7

class 0 precision: 0.80000 recall: 0.93333 f1: 0.86154

class 1 precision: 0.62500 recall: 0.55556 f1: 0.58824

class 2 precision: 1.00000 recall: 0.80000 f1: 0.88889

class 3 precision: 0.64286 recall: 0.81818 f1: 0.72000

class 4 precision: 1.00000 recall: 0.70000 f1: 0.82353

ResNet50 20Augs Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1125	48	50	40	3
LGSC	53	31	4	2	2
CCC	51	10	127	3	7
EC	46	3	1	126	33
MC	10	0	5	36	48

class 0 precision: 0.87549 recall: 0.88863 f1: 0.88201

class 1 precision: 0.33696 recall: 0.33696 f1: 0.33696

class 2 precision: 0.67914 recall: 0.64141 f1: 0.65974

class 3 precision: 0.60870 recall: 0.60287 f1: 0.60577

class 4 precision: 0.51613 recall: 0.48485 f1: 0.50000

ResNet50 20Augs Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	19	0	0	1	0
LGSC	6	6	1	4	3
CCC	6	2	11	1	0
EC	4	0	0	16	0
MC	1	0	3	0	16

class 0 precision: 0.52778 recall: 0.95000 f1: 0.67857

class 1 precision: 0.75000 recall: 0.30000 f1: 0.42857

class 2 precision: 0.73333 recall: 0.55000 f1: 0.62857

class 3 precision: 0.72727 recall: 0.80000 f1: 0.76190

class 4 precision: 0.84211 recall: 0.80000 f1: 0.82051

ResNet50 20Augs External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	26	1	1	2	0
LGSC	3	6	0	0	0
CCC	2	1	17	0	0
EC	2	0	0	9	0
MC	0	0	0	2	8

class 0 precision: 0.78788 recall: 0.86667 f1: 0.82540

class 1 precision: 0.75000 recall: 0.66667 f1: 0.70588

class 2 precision: 0.94444 recall: 0.85000 f1: 0.89474

class 3 precision: 0.69231 recall: 0.81818 f1: 0.75000

class 4 precision: 1.00000 recall: 0.80000 f1: 0.88889

ResNet18 Baseline Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1179	37	32	12	6
LGSC	55	32	4	0	1
CCC	57	2	137	1	1
EC	56	7	10	98	38
MC	18	1	5	39	36

class 0 precision: 0.86374 recall: 0.93128 f1: 0.89624

class 1 precision: 0.40506 recall: 0.34783 f1: 0.37427

class 2 precision: 0.72872 recall: 0.69192 f1: 0.70984

class 3 precision: 0.65333 recall: 0.46890 f1: 0.54596

class 4 precision: 0.43902 recall: 0.36364 f1: 0.39779

ResNet18 Baseline Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	20	0	0	0	0
LGSC	6	8	0	2	4
CCC	9	3	8	0	0
EC	6	1	0	13	0
MC	3	0	2	0	15

class 0 precision: 0.45455 recall: 1.00000 f1: 0.62500

class 1 precision: 0.66667 recall: 0.40000 f1: 0.50000

class 2 precision: 0.80000 recall: 0.40000 f1: 0.53333

class 3 precision: 0.86667 recall: 0.65000 f1: 0.74286

class 4 precision: 0.78947 recall: 0.75000 f1: 0.76923

ResNet18 Baseline External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	28	0	0	2	0
LGSC	5	4	0	0	0
CCC	2	1	17	0	0
EC	2	0	0	9	0
MC	0	0	0	1	9

class 0 precision: 0.75676 recall: 0.93333 f1: 0.83582

class 1 precision: 0.80000 recall: 0.44444 f1: 0.57143

class 2 precision: 1.00000 recall: 0.85000 f1: 0.91892

class 3 precision: 0.75000 recall: 0.81818 f1: 0.78261

class 4 precision: 1.00000 recall: 0.90000 f1: 0.94737

ResNet18 Histo Cross-Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1165	26	29	43	3
LGSC	54	27	8	1	2
CCC	56	3	136	1	2
EC	56	5	1	121	26
MC	10	2	7	33	47

class 0 precision: 0.86875 recall: 0.92022 f1: 0.89375

class 1 precision: 0.42857 recall: 0.29348 f1: 0.34839

class 2 precision: 0.75138 recall: 0.68687 f1: 0.71768

class 3 precision: 0.60804 recall: 0.57895 f1: 0.59314

class 4 precision: 0.58750 recall: 0.47475 f1: 0.52514

ResNet18 Histo Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	19	0	0	1	0
LGSC	9	2	2	4	3
CCC	5	1	14	0	0
EC	6	1	0	13	0
MC	3	0	0	0	17

class 0 precision: 0.45238 recall: 0.95000 f1: 0.61290

class 1 precision: 0.50000 recall: 0.10000 f1: 0.16667

class 2 precision: 0.87500 recall: 0.70000 f1: 0.77778

class 3 precision: 0.72222 recall: 0.65000 f1: 0.68421

class 4 precision: 0.85000 recall: 0.85000 f1: 0.85000

ResNet18 Histo External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	22	0	0	8	0
LGSC	5	4	0	0	0
CCC	4	2	14	0	0
EC	1	0	0	6	4
MC	0	0	0	1	9

class 0 precision: 0.68750 recall: 0.73333 f1: 0.70968

class 1 precision: 0.66667 recall: 0.44444 f1: 0.53333

class 2 precision: 1.00000 recall: 0.70000 f1: 0.82353

class 3 precision: 0.40000 recall: 0.54545 f1: 0.46154

class 4 precision: 0.69231 recall: 0.90000 f1: 0.78261

ViT-L Baseline Cross-validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1149	53	27	29	8
LGSC	44	37	4	2	5
CCC	51	5	135	1	6
EC	44	5	0	120	40
MC	3	1	4	35	56

class 0 precision: 0.89001 recall: 0.90758 f1: 0.89871

class 1 precision: 0.36634 recall: 0.40217 f1: 0.38342

class 2 precision: 0.79412 recall: 0.68182 f1: 0.73370

class 3 precision: 0.64171 recall: 0.57416 f1: 0.60606

class 4 precision: 0.48696 recall: 0.56566 f1: 0.52336

ViT-L Baseline Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	10	0	0	0	0
LGSC	1	10	2	5	2
CCC	5	1	14	0	0
EC	2	0	0	15	3
MC	1	0	2	0	17

class 0 precision: 0.68966 recall: 1.00000 f1: 0.81633

class 1 precision: 0.90909 recall: 0.50000 f1: 0.64516

class 2 precision: 0.77778 recall: 0.70000 f1: 0.73684

class 3 precision: 0.75000 recall: 0.75000 f1: 0.75000

class 4 precision: 0.77273 recall: 0.85000 f1: 0.80952

ViT-L Baseline External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	28	0	1	1	0
LGSC	4	3	1	1	0
CCC	1	0	19	0	0
EC	2	0	0	9	0
MC	0	0	0	0	10

class 0 precision: 0.80000 recall: 0.93333 f1: 0.86154

class 1 precision: 1.00000 recall: 0.33333 f1: 0.50000

class 2 precision: 0.90476 recall: 0.95000 f1: 0.92683

class 3 precision: 0.81818 recall: 0.81818 f1: 0.81818

class 4 precision: 1.00000 recall: 1.00000 f1: 1.00000

ViT-L Histo (UNI) Cross-validation

	HGSC	LGSC	CCC	EC	MC
HGSC	1165	46	28	25	2
LGSC	39	43	7	3	0
CCC	29	10	154	3	2
EC	21	4	2	173	9
MC	1	0	4	28	66

class 0 precision: 0.92829 recall: 0.92022 f1: 0.92424

class 1 precision: 0.41748 recall: 0.46739 f1: 0.44103

class 2 precision: 0.78974 recall: 0.77778 f1: 0.78372

class 3 precision: 0.74569 recall: 0.82775 f1: 0.78458

class 4 precision: 0.83544 recall: 0.66667 f1: 0.74157

ViT-L Histo (UNI) Hold-out Testing

	HGSC	LGSC	CCC	EC	MC
HGSC	18	0	0	2	0
LGSC	0	14	2	2	2
CCC	3	0	17	0	0
EC	1	0	0	19	0
MC	0	0	0	0	20

class 0 precision: 0.81818 recall: 0.90000 f1: 0.85714

class 1 precision: 1.00000 recall: 0.70000 f1: 0.82353

class 2 precision: 0.89474 recall: 0.85000 f1: 0.87179

class 3 precision: 0.82609 recall: 0.95000 f1: 0.88372

class 4 precision: 0.90909 recall: 1.00000 f1: 0.95238

ViT-L Histo (UNI) External Validation

	HGSC	LGSC	CCC	EC	MC
HGSC	27	0	1	2	0
LGSC	0	9	0	0	0
CCC	0	1	19	0	0
EC	0	0	0	10	1
MC	0	0	0	1	9

class 0 precision: 1.00000 recall: 0.90000 f1: 0.94737

class 1 precision: 0.90000 recall: 1.00000 f1: 0.94737

class 2 precision: 0.95000 recall: 0.95000 f1: 0.95000

class 3 precision: 0.76923 recall: 0.90909 f1: 0.83333

class 4 precision: 0.90000 recall: 0.90000 f1: 0.90000

Code Examples

The following code includes examples from every stage of pre-processing, hyperparameter tuning, and model validation.

Tissue segmentation and tissue patch extraction

1024x1024 pixel patches at 40x native magnification for internal data and 512x512 at 20x native magnification for external data, so that after downsampling to apparent 10x magnification, patches will be 256x256.

## Internal data with CLAM default segmentation paramters 
python create_patches_fp.py --source "/mnt/data/Katie_WSI/edrive" --save_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --patch_size 1024 --step_size 1024 --seg --patch --stitch
## Internal data with Otsu thresholding segmentation and manually adjusted parameters
python create_patches_fp.py --source "/mnt/data/Katie_WSI/edrive" --save_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp_otsu" --patch_size 1024 --step_size 1024 --seg --patch --stitch --max_holes 100 --closing 20 --use_otsu --sthresh 0 --max_holes 20 --mthresh 15	
## External data with CLAM default segmentation parameters
python create_patches_fp.py --source "/mnt/data/transcanadian_WSI" --save_dir "/mnt/results/patches/transcanadian_mag20x_patch512_DGX_fp" --patch_size 512 --step_size 512 --seg --patch --stitch
## External data with Otsu thresholding segmentation and manually adjusted parameters
python create_patches_fp.py --source "/mnt/data/transcanadian_WSI" --save_dir "/mnt/results/patches/transcanadian_mag20x_patch512_DGX_fp_otsu" --patch_size 512 --step_size 512 --seg --patch --stitch --max_holes 100 --closing 20 --use_otsu --sthresh 0 --max_holes 20 --mthresh 15

Patch feature extraction

Feature extraction using 256x256 pixel patches at 10x apparent magnification, with various preprocessing and pretraining techniques, and model archiectures. Code here is for internal data, with the only notable difference in external data being a custom_downsample of 2 rather than 4 given the native 20x magnification rather than the internal 40x. All feature extraction models are ImageNet-pretrained unless explicitly listed as "histo-pretrained".

## Baseline ResNet50
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches_ESGO_train_staging.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX" --batch_size 32 --slide_ext .svs
## Baseline ResNet50 with Otsu thresholding in patch extraction
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp_otsu" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX_otsu" --batch_size 32 --slide_ext .svs 
## Reinhard normalised ResNet50
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX_reinhard" --batch_size 32 --slide_ext .svs --use_transforms reinhard
## Macenko normalised ResNet50
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX_macenko" --batch_size 32 --slide_ext .svs --use_transforms macenko
## Macenko normalised ResNet50 with Otsu thresholding
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp_otsu" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX_otsu_macenko" --batch_size 32 --slide_ext .svs --use_transforms macenko
## Colour augmented ResNet50 (Repeated 20 times)
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet50' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/set_edrivepatches.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet50_10x_features_DGX_colourjitternorm_1" --batch_size 32 --slide_ext .svs --use_transforms colourjitternorm
## Baseline ResNet18
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet18' --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/StagingAndIDSTrain_edrive.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet18_10x_features_DGX" --batch_size 32 --slide_ext .svs
## Histo-pretrained ResNet18
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'resnet18' --pretraining_dataset Histo --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/StagingAndIDSTrain_edrive.csv" --feat_dir "/mnt/results/features/ovarian_leeds_resnet18_10x_features_DGX_histotrained_fixed224" --batch_size 32 --slide_ext .svs --use_transforms histo_resnet18_224
## ViT-L Baseline
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'vit_l' --use_transforms uni_default --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/StagingAndIDSTrain_edrive.csv" --feat_dir "/mnt/results/features/ovarian_leeds_vitl_10x_features_DGX" --batch_size 32 --slide_ext .svs
## Histo-pretrained ViT-L (UNI)
python extract_features_fp.py --hardware DGX --custom_downsample 4 --model_type 'uni' --use_transforms uni_default --data_h5_dir "/mnt/results/patches/ovarian_leeds_mag40x_patch1024_DGX_fp" --data_slide_dir "/mnt/data/Katie_WSI/edrive" --csv_path "dataset_csv/StagingAndIDSTrain_edrive.csv" --feat_dir "/mnt/results/features/ovarian_leeds_uni_10x_features_DGX" --batch_size 32 --slide_ext .svs

Hyperparameter Tuning

Hyperparameter configurations can be found in the folder "tuning_configs". Separate main.py calls were used for each cross-validation fold to allow for parallelisation.

## ResNet50 Baseline Tuning Iteration 1, Fold 0
python main.py --tuning --hardware DGX --tuning_output_file /mnt/results/tuning_results/staging_only_resnet50_20x_tuning1_bce_fold0.csv --min_epochs 0 --max_epochs 100 --early_stopping --num_tuning_experiments 1 --split_dir "staging_and_ids_100" --k_start 0 --k_end 1 --results_dir /mnt/results --exp_code stagingandids_resnet50_20x_tuning1_100epochs_bce_NORMAL_fold0 --subtyping --weighted_sample --bag_loss balanced_ce --no_inst_cluster --task ovarian_5class  --model_type clam_sb --subtyping --csv_path 'dataset_csv/ESGO_train_all.csv' --data_root_dir "/mnt/results/features" --features_folder "ovarian_leeds_resnet50_20x_features_DGX" --tuning_config_file tuning_configs/esgo_stagingandids_resnet50_20x_NORMAL_config1.txt
## Combining results across five cross-validation folds
python combine_results.py --file_base_name "/mnt/results/tuning_results/staging_only_resnet50_20x_tuning1_bce"

## ResNet50 Baseline Tuning Iteration 19 (final iteration), Fold 4
python main.py --tuning --hardware DGX --tuning_output_file /mnt/results/tuning_results/stagingandids_resnet50_10x_tuning19_300epochs_30patience_bce_fold4.csv --min_epochs 0 --max_epochs 300 --early_stopping --num_tuning_experiments 1 --tuning_patience 30 --split_dir "staging_and_ids_100" --k_start 4 --k_end 5 --results_dir /mnt/results --exp_code stagingandids_resnet50_10x_tuning19_300epochs_30patience_bce_NORMAL_fold4 --subtyping --weighted_sample --bag_loss balanced_ce --no_inst_cluster --task ovarian_5class  --model_type clam_sb --subtyping --csv_path 'dataset_csv/ESGO_train_all.csv' --data_root_dir "/mnt/results/features" --features_folder "ovarian_leeds_resnet50_10x_features_DGX" --tuning_config_file tuning_configs/esgo_stagingandids_resnet50_10x_normal_config19.txt
## Combining results across five cross-validation folds
python combine_results.py --file_base_name "/mnt/results/tuning_results/stagingandids_resnet50_10x_tuning19_300epochs_30patience_bce"

Model Training

Training each model with the best hyperparameters from tuning.

## Baseline ResNet50
python main.py --hardware DGX --min_epochs 0 --max_epochs 300 --early_stopping --split_dir "staging_and_ids_100" --k 5 --results_dir /mnt/results --exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal --subtyping --weighted_sample --bag_loss balanced_ce --no_inst_cluster --task ovarian_5class  --model_type clam_sb --subtyping --csv_path 'dataset_csv/ESGO_train_all.csv' --data_root_dir "/mnt/results/features" --features_folder "ovarian_leeds_resnet50_10x_features_DGX" --reg 1e-3 --drop_out 0.4 --lr 2e-3 --max_patches_per_slide 800 --model_size smaller --beta1 0.75 --beta2 0.95 --eps 1e-2 --lr_factor 0.75 --lr_patience 20

Model Evaluation

Classifying each slide, then generating the final results using the mean and 95% CI from 10,000 iterations of bootstrapping.

## Five-fold cross-validation (baseline ResNet50)
python eval.py --drop_out 0.4 --model_size smaller --models_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_s1 --save_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_bootstrapping --task ovarian_5class --model_type clam_sb --results_dir /mnt/results --data_root_dir "/mnt/results/features" --k 5 --features_folder "ovarian_leeds_resnet50_10x_features_DGX" --csv_path 'dataset_csv/ESGO_train_all.csv' 
python bootstrapping.py --num_classes 5 --model_names stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_bootstrapping --bootstraps 10000 --run_repeats 1 --folds 5

## Ensembled hold-out test set (baseline ResNet50)
python eval.py --split_dir splits/esgo_test_splits --drop_out 0.4 --model_size smaller --models_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_s1 --save_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_holdouttest_s1 --task ovarian_5class --model_type clam_sb --results_dir /mnt/results --data_root_dir "/mnt/results/features" --k 5 --features_folder "ovarian_leeds_resnet50_10x_features_DGX" --csv_path 'dataset_csv/ESGO_test_set.csv'
python bootstrapping.py --ensemble --num_classes 5 --model_names stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_holdouttest_s1 --bootstraps 10000 --run_repeats 1 --folds 5

## Ensembled external validation set (baseline ResNet50)
python eval.py --split_dir splits/external_splits --drop_out 0.4 --model_size smaller --models_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_s1 --save_exp_code stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_externaltest_s1 --task ovarian_5class --model_type clam_sb --results_dir /mnt/results --data_root_dir "/mnt/results/features" --k 5 --features_folder "transcanadian_resnet50_10x_features_DGX" --csv_path 'dataset_csv/ExternalData.csv'
python bootstrapping.py --ensemble --num_classes 5 --model_names stagingandids_resnet50_10x_bestfrom19tuning_bce_normal_externaltest_s1 --bootstraps 10000 --run_repeats 1 --folds 5

Reference

This code is an extension of our previous repository, which was originally based on the CLAM repository with corresponding paper. This repository and the original CLAM repository are both available for non-commercial academic purposes under the GPLv3 License.

Name		Name	Last commit message	Last commit date
Latest commit History 1,455 Commits
dataset_csv		dataset_csv
datasets		datasets
docs		docs
heatmaps		heatmaps
models		models
presets		presets
splits		splits
tuning_configs		tuning_configs
utils		utils
vis_utils		vis_utils
wsi_core		wsi_core
.gitignore		.gitignore
LICENSE.md		LICENSE.md
bootstrapping.py		bootstrapping.py
combine_results.py		combine_results.py
count_patches.py		count_patches.py
create_heatmaps.py		create_heatmaps.py
create_patches_fp.py		create_patches_fp.py
create_splits_seq.py		create_splits_seq.py
download_uni_model.py		download_uni_model.py
eval.py		eval.py
extract_features_fp.py		extract_features_fp.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification

Hyperparameters

Results

Code Examples

Reference

About

Languages

License

scjjb/Ovarian_Features

Folders and files

Latest commit

History

Repository files navigation

Histopathology Foundation Models Enable Accurate Ovarian Cancer Subtype Classification

Hyperparameters

Results

Code Examples

Reference

About

Resources

License

Stars

Watchers

Forks

Languages