# Histopathologic Cancer Detection: Check.
In this competition, you must create an algorithm to identify metastatic cancer in small image patches taken from larger digital pathology scans. The data for this competition is a slightly modified version of the PatchCamelyon (PCam) benchmark dataset: the original PCam dataset contains duplicate images due to its probabilistic sampling, whereas the version presented on Kaggle does not.

PCam is interesting for its size, for how simple it is to get started with, and for its approachability. In the authors' words:

> [PCam] packs the clinically-relevant task of metastasis detection into a straight-forward binary image classification task, akin to CIFAR-10 and MNIST. Models can easily be trained on a single GPU in a couple hours, and achieve competitive scores in the Camelyon16 tasks of tumor detection and whole-slide image diagnosis. Furthermore, the balance between task-difficulty and tractability makes it a prime suspect for fundamental machine learning research on topics as active learning, model uncertainty, and explainability.

## Challenges
* Normalize staining (https://towardsdatascience.com/image-augmentation-for-deep-learning-using-keras-and-histogram-equalization-9329f6ae5085)
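The staining challenge above links to an article on histogram equalization. As a rough illustration of that idea, here is a minimal NumPy-only sketch that equalizes each color channel of a patch independently. The function names `equalize_channel` and `equalize_patch` are hypothetical, and per-channel equalization is an assumption made for simplicity: it is not a dedicated stain-normalization method and it can shift colors. `skimage.exposure.equalize_hist` provides a library version of histogram equalization.

```python
import numpy as np


def equalize_channel(channel):
    """Histogram-equalize a single uint8 channel via its cumulative histogram."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # cumulative count at the darkest value present
    denom = cdf[-1] - cdf_min
    if denom == 0:                 # constant channel: nothing to stretch
        return channel.copy()
    # Lookup table mapping each input intensity to its equalized intensity.
    lut = np.round((cdf - cdf_min) / denom * 255).clip(0, 255).astype(np.uint8)
    return lut[channel]


def equalize_patch(patch):
    """Apply equalize_channel to every channel of an (H, W, C) uint8 patch."""
    channels = [equalize_channel(patch[..., c]) for c in range(patch.shape[-1])]
    return np.stack(channels, axis=-1)


# Synthetic low-contrast patch standing in for a real 96x96 RGB image.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 141, size=(96, 96, 3), dtype=np.uint8)
equalized = equalize_patch(low_contrast)
```

After equalization, each channel of the synthetic patch spans the full 0 to 255 range instead of the original narrow band, which is the effect that makes differently stained slides look more alike.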

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import os
from time import time
from tqdm import tqdm_notebook
import matplotlib

from keras.layers import Input, Dense, Flatten, Conv2D
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import TensorBoard

from skimage import exposure
from generators.augment import augmentor, show_augmentations

Using TensorFlow backend.


In [2]:
BATCH_SIZE = 32
ONLY_USE_SUBSET = True  # Set to True when running locally
DIMENSIONS = (96, 96)

## Importing dataset
In this dataset, you are provided with a large number of small pathology images to classify. Files are named with an image id. The train_labels.csv file provides the ground truth for the images in the train folder. You are predicting the labels for the images in the test folder. A positive label indicates that the center 32x32px region of a patch contains at least one pixel of tumor tissue. Tumor tissue in the outer region of the patch does not influence the label. This outer region is provided to enable fully-convolutional models that do not use zero-padding, to ensure consistent behavior when applied to a whole-slide image.
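To make the labeling rule concrete, the sketch below derives a label from a binary tumor mask by looking only at the center 32x32 region of a 96x96 patch. The helper names `center_region` and `label_from_mask` are hypothetical, and the tumor mask itself is an assumption for illustration: the competition data ships only the final labels, not per-pixel masks.

```python
import numpy as np

PATCH_SIZE = 96    # full patch edge length in pixels
CENTER_SIZE = 32   # edge length of the region that determines the label


def center_region(patch, center_size=CENTER_SIZE):
    """Return the centered center_size x center_size crop of an (H, W, ...) array."""
    height, width = patch.shape[:2]
    top = (height - center_size) // 2
    left = (width - center_size) // 2
    return patch[top:top + center_size, left:left + center_size]


def label_from_mask(tumor_mask):
    """Label is 1 if any tumor pixel lies in the center region, else 0."""
    return int(center_region(tumor_mask).any())


# Tumor pixel in the outer ring only: the label stays negative.
outer_only = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
outer_only[5, 5] = True

# A single tumor pixel inside the center region: the label is positive.
center_hit = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
center_hit[48, 48] = True

labels = (label_from_mask(outer_only), label_from_mask(center_hit))
```

For a 96x96 patch the center crop covers rows and columns 32 through 63, so the 32-pixel border on each side only provides context for the model.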

Apart from removing the duplicate images that PCam's probabilistic sampling produces, the Kaggle version keeps the same data and splits as the PCam benchmark.

In [None]:
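# NOTE: load_data is not defined in this notebook; it is assumed to return
# an (images, labels, metadata) tuple for each of the three splits.
(x_train, y_train, meta_train), (x_valid, y_valid, meta_valid), (x_test, y_test, meta_test) = load_data()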

In [None]:
# bit for the image augmentation

# -*- coding: utf-8 -*-
"""
Created on Mon May  6 17:14:07 2019

@author: Stephan
"""

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from imgaug import augmenters as iaa


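# NOTE: data_dir is not defined in this notebook; despite its name,
# it must hold the path to a single image file.
im = Image.open(data_dir)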
plt.imshow(im)

img = np.array(im)

# Basic image augmentations

''' 
To think about:
    1. In what way does one augmentation extend another?
    2. Do they (randomly) apply augmentation on top of augmentation? or something else?
'''


# BASIC DONE
'''
basic: 
    Operations
    ---
        1. 90 degree rotations
        2. horizontal/vertical image mirroring
'''

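# Rotate by a random multiple of 90 degrees, then mirror along each axis with probability 0.5
basic = iaa.Sequential([iaa.Affine(rotate=[0, 90, 180, 270]), iaa.Fliplr(0.5), iaa.Flipud(0.5)])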

basic_img = basic.augment_image(img)



'''
Morphology
    Operations
    ---
        1. scaling
        2. elastic deformation
        3. additive Gaussian noise (perturbing the signal-to-noise ratio), 
        4. Gaussian blurring (simulating out-of-focus artifacts).
'''

morphology = iaa.Sequential([iaa.GaussianBlur(sigma=0.35), 
                             iaa.AdditiveGaussianNoise(scale=0.05*255),
                             iaa.ElasticTransformation(alpha=0.35, sigma=0.5),
                             iaa.Affine(scale=(1, 1.5))])

morph_img = morphology.augment_image(img)
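# Show the original and the augmented image side by side
fig, (ax_orig, ax_aug) = plt.subplots(1, 2)
ax_orig.imshow(img)
ax_aug.imshow(morph_img)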

'''
Brightness & Contrast (BC)
    Operations
    ---
        1. Random brightness image perturbations
        2. Random contrast image perturbations
        3. Haeberli and Voorhies (1994)

'''

# Note: newer imgaug releases deprecate ContrastNormalization in favor of iaa.LinearContrast
bc = iaa.Sequential([iaa.ContrastNormalization((0.75, 1.25)),
                     iaa.Multiply((0.5, 1.5))])
bc_img = bc.augment_image(img)

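# Show the original and the augmented image side by side
fig, (ax_orig, ax_aug) = plt.subplots(1, 2)
ax_orig.imshow(img)
ax_aug.imshow(bc_img)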

'''
Hue Saturation Value (HSV)
    Operations
    ---
        1. Randomly shifting hue and saturations channels in the HSV color space
        2. Color variation strength: light and strong
        3. Van der Walt et al. 2014
        

'''

#Values are kind of extreme

hsv = iaa.WithColorspace(
    to_colorspace="HSV",
    from_colorspace="RGB",
    children=iaa.Sequential([
        iaa.WithChannels(1, iaa.Add((-100, 150))),  # shift the saturation channel
        iaa.WithChannels(0, iaa.Add((-100, 150))),  # shift the hue channel
    ]),
)


hsv_image = hsv.augment_image(img)

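# Show the original and the augmented image side by side
fig, (ax_orig, ax_aug) = plt.subplots(1, 2)
ax_orig.imshow(img)
ax_aug.imshow(hsv_image)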
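
The "basic" group above (90 degree rotations and horizontal/vertical mirroring) can also be sketched without imgaug, which helps when reasoning about how augmentations stack on top of each other. The following is a minimal NumPy sketch under the assumption that each step is sampled independently per image; the function name `random_rot90_flip` is hypothetical and not part of the notebook.

```python
import numpy as np


def random_rot90_flip(image, rng):
    """Apply a random multiple-of-90-degree rotation, then random mirroring."""
    k = int(rng.integers(0, 4))              # 0, 1, 2 or 3 quarter turns
    out = np.rot90(image, k=k, axes=(0, 1))
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)           # horizontal mirror
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)           # vertical mirror
    return np.ascontiguousarray(out)


# Synthetic patch standing in for a real 96x96 RGB image.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
augmented = random_rot90_flip(patch, rng)
```

Because every step only rearranges pixels, the augmented patch contains exactly the same pixel values as the input, and for square patches the shape is unchanged. The steps are applied one after another, so a single output can combine a rotation with one or both mirrors.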