### 1.1 Preprocessing - Reinhard Normalization and WSI Tiling

As a first preprocessing step, all slides were color normalized with respect to a reference image selected by an expert neuropathologist. Color normalization was performed using the method described by [Reinhard et. al](https://ieeexplore.ieee.org/document/946629).

The resulting color normalized whole slide images were tiled using PyVips to generate 1536 x 1536 images patches.

In [1]:
import os
import glob
import numpy as np
import cv2
import matplotlib.pyplot as plt
import pyvips as Vips
from tqdm import tqdm

from utils import vips_utils, normalize

In [2]:
TRAIN_WSI_DIR = 'data/Dataset 1a Development_train/'              # WSIs in the training set
VAL_WSI_DIR = 'data/Dataset 1b Development_validation/'           # WSIs in the validation set

SAVE_DIR = 'data/norm_tiles/'

In [3]:
if not os.path.exists(SAVE_DIR):
        os.makedirs(SAVE_DIR)

In [4]:
ref_imagename = 'NA5002_2AB.svs'
#ref_imagename = 'NA3777-02_AB.svs'

In [5]:
wsi_train = os.listdir(TRAIN_WSI_DIR)
wsi_val = os.listdir(VAL_WSI_DIR)

imagenames = sorted(wsi_val + wsi_train)
imagenames.remove('NA5005-02_AB.svs')             # this WSI was digitalized at 40x, need resize down to 20x
imagenames.append('NA5005-02_AB.svs')

In [6]:
%%time
# Load reference image, fit Reinhard normalizer
ref_image = Vips.Image.new_from_file(TRAIN_WSI_DIR + ref_imagename, level=0)

normalizer = normalize.Reinhard()
normalizer.fit(ref_image)

CPU times: user 34min 43s, sys: 1min 14s, total: 35min 57s
Wall time: 11min 8s


In [7]:
stats_dict = {}
for imagename in tqdm(imagenames[:-1]):
    try:
        vips_img = Vips.Image.new_from_file(TRAIN_WSI_DIR + imagename, level=0)
        print("Loaded Image: " + TRAIN_WSI_DIR + imagename)
        #vips_img = Vips.Image.new_from_file(TEST_WSI_DIR + imagename, level=0)
    except:
        #vips_img = Vips.Image.new_from_file(VAL_WSI_DIR + imagename, level=0)
        vips_img = Vips.Image.new_from_file(TEST_WSI_DIR + imagename, level=0)
        print("Loaded Image: " + TEST_WSI_DIR + imagename)
    out = normalizer.transform(vips_img)
    out.filename = vips_img.filename
    vips_utils.save_and_tile(out, SAVE_DIR)
    stats_dict[imagename] = normalizer.image_stats

  0%|          | 0/32 [00:00<?, ?it/s]

Loaded Image: data/Dataset 1a Development_train/NA4009-02_AB.svs


  3%|▎         | 1/32 [17:28<9:01:56, 1048.91s/it]

Loaded Image: data/Dataset 1a Development_train/NA4072-02_AB.svs


  6%|▋         | 2/32 [42:57<9:56:21, 1192.72s/it]

Loaded Image: data/Dataset 1a Development_train/NA4137-02_AB.svs


  9%|▉         | 3/32 [1:00:16<9:14:16, 1146.77s/it]

Loaded Image: data/Dataset 1a Development_train/NA4144-02_AB.svs


 12%|█▎        | 4/32 [1:15:05<8:19:00, 1069.30s/it]

Loaded Image: data/Dataset 1a Development_train/NA4185-02_AB.svs


 16%|█▌        | 5/32 [1:29:19<7:32:10, 1004.82s/it]

Loaded Image: data/Dataset 1a Development_train/NA4229-02_AB.svs


 19%|█▉        | 6/32 [1:36:34<6:01:18, 833.80s/it] 

Loaded Image: data/Dataset 1a Development_train/NA4259-02_AB.svs


 22%|██▏       | 7/32 [1:51:09<5:52:32, 846.09s/it]

Loaded Image: data/Dataset 1a Development_train/NA4312-02_AB.svs


 25%|██▌       | 8/32 [2:05:25<5:39:39, 849.15s/it]

Loaded Image: data/Dataset 1a Development_train/NA4471-02_AB.svs


 28%|██▊       | 9/32 [2:22:31<5:45:50, 902.18s/it]

Loaded Image: data/Dataset 1a Development_train/NA4619-02_AB.svs


 31%|███▏      | 10/32 [2:45:09<6:21:00, 1039.09s/it]

Loaded Image: data/Dataset 1a Development_train/NA4711-02_AB.svs


 34%|███▍      | 11/32 [2:57:37<5:33:04, 951.63s/it] 

Loaded Image: data/Dataset 1a Development_train/NA4722-02_AB.svs


 38%|███▊      | 12/32 [3:18:04<5:44:47, 1034.38s/it]

Loaded Image: data/Dataset 1a Development_train/NA4749-02_AB.svs


 41%|████      | 13/32 [3:34:32<5:23:03, 1020.20s/it]

Loaded Image: data/Dataset 1a Development_train/NA4751-02_AB.svs


 44%|████▍     | 14/32 [3:51:49<5:07:36, 1025.36s/it]

Loaded Image: data/Dataset 1a Development_train/NA4757-02_AB.svs


 47%|████▋     | 15/32 [4:09:45<4:54:51, 1040.70s/it]

Loaded Image: data/Dataset 1a Development_train/NA4885-02_AB17-24.svs


 50%|█████     | 16/32 [4:24:12<4:23:37, 988.59s/it] 

Loaded Image: data/Dataset 1a Development_train/NA4898-02_AB17-24.svs


 53%|█████▎    | 17/32 [4:40:13<4:05:00, 980.06s/it]

Loaded Image: data/Dataset 1a Development_train/NA4918-02_AB17-24.svs


 56%|█████▋    | 18/32 [4:51:14<3:26:22, 884.45s/it]

Loaded Image: data/Dataset 1a Development_train/NA4951-02_AB17-24.svs


 59%|█████▉    | 19/32 [5:02:32<2:58:14, 822.65s/it]

Loaded Image: data/Dataset 1a Development_train/NA5001_2AB.svs


 62%|██████▎   | 20/32 [5:10:56<2:25:21, 726.79s/it]

Loaded Image: data/Dataset 1a Development_train/NA5002_2AB.svs


 66%|██████▌   | 21/32 [5:24:25<2:17:48, 751.64s/it]

Loaded Image: data/Dataset 1a Development_train/NA5003_2AB.svs


 69%|██████▉   | 22/32 [5:36:22<2:03:32, 741.23s/it]

Loaded Image: data/Dataset 1a Development_train/NA5004_02_AB.svs


 72%|███████▏  | 23/32 [5:51:35<1:58:55, 792.82s/it]

Loaded Image: data/Dataset 1a Development_train/NA_4865_02_AB1-40.svs


 75%|███████▌  | 24/32 [6:06:23<1:49:30, 821.27s/it]

Loaded Image: data/Dataset 1a Development_train/NA_4871_02_AB.svs


 78%|███████▊  | 25/32 [6:17:04<1:29:30, 767.25s/it]

Loaded Image: data/Dataset 1a Development_train/NA_4882_02_AB.svs


 81%|████████▏ | 26/32 [6:37:26<1:30:22, 903.71s/it]

Loaded Image: data/Dataset 1a Development_train/NA_4883_02_AB.svs


 84%|████████▍ | 27/32 [6:55:23<1:19:37, 955.59s/it]

Loaded Image: data/Dataset 1a Development_train/NA_4888_02_AB17-24.svs


 88%|████████▊ | 28/32 [7:10:55<1:03:14, 948.51s/it]

NameError: name 'TEST_WSI_DIR' is not defined

In [8]:
#  Resize the single 40x image down to 20x
for imagename in tqdm(imagenames[-1:]):
    vips_img = Vips.Image.new_from_file(TRAIN_WSI_DIR + imagename, level=0)
    vips_img = vips_img.resize(0.5)
    out = normalizer.transform(vips_img)
    out.filename = vips_img.filename
    vips_utils.save_and_tile(out, SAVE_DIR)
    stats_dict[imagename] = normalizer.image_stats


  0%|          | 0/1 [00:00<?, ?it/s][A
100%|██████████| 1/1 [49:03<00:00, 2943.39s/it][A

In [9]:
import pandas as pd
stats = pd.DataFrame(stats_dict)

In [10]:
stats = stats.transpose()

In [11]:
stats.columns = 'means', 'stds'

In [12]:
print(stats)

                                                                    means  \
NA4009-02_AB.svs        (88.79989640718921, 0.7796811779427336, 1.5307...   
NA4072-02_AB.svs        (85.45899856425426, 1.0290791363135563, -1.003...   
NA4137-02_AB.svs        (89.37619838928055, 1.1594221235255688, -1.090...   
NA4144-02_AB.svs        (87.65374174716155, 1.554181156613537, -0.2189...   
NA4185-02_AB.svs        (90.25360167112896, 0.7139148105981892, 0.4994...   
NA4229-02_AB.svs        (89.39547476784757, 0.4691240658320603, 1.1783...   
NA4259-02_AB.svs        (88.96922948784508, 0.6069457818993456, -0.060...   
NA4312-02_AB.svs        (91.08530689404986, 0.40543170409731605, 1.256...   
NA4471-02_AB.svs        (91.11351936054815, 0.6167910832749283, 1.1381...   
NA4619-02_AB.svs        (87.36757718530629, 0.972118135758152, -0.5145...   
NA4711-02_AB.svs        (89.64046849620684, 0.64518834848795, -0.08328...   
NA4722-02_AB.svs        (85.88080947016267, 1.0089842743077018, -1.180...   