### 1.1 Preprocessing - Reinhard Normalization and WSI Tiling

As a first preprocessing step, all slides were color-normalized with respect to a reference image selected by an expert neuropathologist. Color normalization was performed using the method described by [Reinhard et al.](https://ieeexplore.ieee.org/document/946629).
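
To make the idea concrete, here is a minimal sketch of the statistics-matching step at the core of Reinhard-style normalization: each channel of a source image is shifted and scaled so that its mean and standard deviation match those of the reference. It assumes the images have already been converted to a perceptual color space (such as LAB) and are held as NumPy arrays. The names `channel_stats` and `reinhard_transfer` are hypothetical and are not the API of the `normalize` module used below, whose internals are not shown in this notebook.

```python
import numpy as np


def channel_stats(img):
    """Return per-channel (mean, std) of an (H, W, C) array."""
    flat = img.reshape(-1, img.shape[-1]).astype(np.float64)
    return flat.mean(axis=0), flat.std(axis=0)


def reinhard_transfer(src, ref_mean, ref_std, eps=1e-8):
    """Match the per-channel mean/std of `src` to the reference statistics.

    Assumption: `src` is already in the working color space (e.g. LAB).
    `eps` guards against division by zero on constant channels.
    """
    src = src.astype(np.float64)
    src_mean, src_std = channel_stats(src)
    return (src - src_mean) / (src_std + eps) * ref_std + ref_mean


# Synthetic stand-ins for a reference slide and a slide to normalize.
rng = np.random.default_rng(0)
reference = rng.normal(loc=[60.0, 10.0, -5.0], scale=[12.0, 4.0, 3.0], size=(32, 32, 3))
source = rng.normal(loc=[75.0, 2.0, 8.0], scale=[5.0, 9.0, 6.0], size=(32, 32, 3))

ref_mean, ref_std = channel_stats(reference)   # analogous to "fit"
normalized = reinhard_transfer(source, ref_mean, ref_std)  # analogous to "transform"
```

After the transfer, the per-channel statistics of `normalized` agree with those of `reference` up to floating-point error. A real pipeline would also convert back to RGB afterwards.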

The resulting color-normalized whole slide images were tiled using PyVips to generate 1536 x 1536 image patches.
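
As an illustration of the tiling step, the sketch below computes the top-left corner of every 1536 x 1536 tile on a regular grid. It is not the implementation of `vips_utils.save_and_tile`, which is not shown here. The function name `tile_origins` is hypothetical, and skipping partial tiles at the right and bottom edges is an assumption made for simplicity.

```python
def tile_origins(width, height, tile_size=1536):
    """List the top-left (x, y) of each full tile in row-major order.

    Assumption: tiles that would extend past the image edge are skipped.
    """
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


# A 4000 x 3200 image holds a 2 x 2 grid of full 1536-pixel tiles.
origins = tile_origins(4000, 3200)
```

With pyvips, each origin could then be passed to `Image.crop(x, y, tile_size, tile_size)` to extract the corresponding patch.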

In [1]:
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
import pyvips as Vips
from tqdm import tqdm

# Helper modules that accompany this notebook
import vips_utils
import normalize

In [2]:
RAW_DIR = '/home/ziqi/Desktop/data/plaques_WSIs/train&validation/'
SAVE_DIR = '/home/ziqi/Desktop/data/norm_tiles/'

In [3]:
ref_imagename = 'NA5002_2AB.svs'

In [4]:
imagenames = sorted(os.listdir(RAW_DIR))

In [6]:
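# Move the single 40x slide to the end of the list so it can be
# handled separately from the remaining slides.
imagenames.remove('NA5005-02_AB.svs')
imagenames.append('NA5005-02_AB.svs')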

In [8]:
%%time
# Load reference image, fit Reinhard normalizer
ref_image = Vips.Image.new_from_file(RAW_DIR + ref_imagename, level=0)

normalizer = normalize.Reinhard()
normalizer.fit(ref_image)

CPU times: user 43min 14s, sys: 2min 28s, total: 45min 42s
Wall time: 3min 58s


In [9]:
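# Normalize and tile every slide except the last entry (the 40x slide)
stats_dict = {}
for imagename in tqdm(imagenames[:-1]):
    vips_img = Vips.Image.new_from_file(RAW_DIR + imagename, level=0)
    out = normalizer.transform(vips_img)
    # Carry the source filename over to the normalized image
    out.filename = vips_img.filename
    vips_utils.save_and_tile(out, SAVE_DIR)
    # Record the statistics the normalizer reports for this slide
    stats_dict[imagename] = normalizer.image_stats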

100%|██████████| 32/32 [2:57:12<00:00, 332.26s/it]  


In [10]:
# Resize the single 40x image down to 20x
for imagename in tqdm(imagenames[-1:]):
    vips_img = Vips.Image.new_from_file(RAW_DIR + imagename, level=0)
    vips_img = vips_img.resize(0.5)
    out = normalizer.transform(vips_img)
    out.filename = vips_img.filename
    vips_utils.save_and_tile(out, SAVE_DIR)
    stats_dict[imagename] = normalizer.image_stats

100%|██████████| 1/1 [29:59<00:00, 1799.83s/it]


In [9]:
# Takes 42 minutes to normalize and tile a 40x image.

In [10]:
import pandas as pd
stats = pd.DataFrame(stats_dict)

In [11]:
stats = stats.transpose()

In [12]:
stats.columns = ['means', 'stds']

In [13]:
stats.to_csv('/home/kangway/data/cnn_path/raw_images/' + "WSI_LAB_mean_std.csv")