# Capstone: Dataset Exploration

##### Project Statement / DS Question:
* Can a Convolutional Neural Network (ConvNet) be used to identify malignant cancer from histopathology slides? How can the ConvNet be interpreted?

1. Labeled Dataset 1:
>Classification of breast cancer histology images using Convolutional Neural Networks

    >http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0177544

    - resolution: 2040x1536px
    - from _BioImaging 2015 breast histology classification challenge_
    - four balanced classes: normal, benign lesion, _in situ_ carcinoma, invasive carcinoma
    - dataset composition: 249-image training set and 20-image test set
 
2. Labeled Dataset 2: 
> Spanhol, F., Oliveira, L. S., Petitjean, C., Heutte, L., A Dataset for Breast Cancer Histopathological Image Classification, IEEE Transactions on Biomedical Engineering (TBME), 63(7):1455-1462, 2016.


### Workflow for #1 

    - A CNN and a CNN+SVM classifier are used to compute patch-wise class probabilities.

Image-wise classification:

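    i. majority vote - the most common patch label is assigned to the image
    ii. maximum probability - the patch with the highest class probability decides the image label
    iii. sum of probabilities - patch class probabilities are summed per class and the class with the largest sum is chosen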
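
The three image-wise rules above can be sketched with NumPy. This is a minimal illustration, not the paper's code: the function names, the `(n_patches, n_classes)` array layout, and the toy probabilities are assumptions, and ties are broken by `argmax` (lowest class index), which may differ from the tie rule the authors used.

```python
import numpy as np

def majority_vote(probs):
    """Pick the class predicted most often across patches."""
    patch_labels = probs.argmax(axis=1)                 # one label per patch
    counts = np.bincount(patch_labels, minlength=probs.shape[1])
    return int(counts.argmax())                         # ties -> lowest index (assumption)

def maximum_probability(probs):
    """Pick the class of the single most confident patch."""
    _, cls = np.unravel_index(probs.argmax(), probs.shape)
    return int(cls)

def sum_of_probabilities(probs):
    """Sum class probabilities over patches and pick the largest total."""
    return int(probs.sum(axis=0).argmax())

# Toy example (made-up numbers): 3 patches x 4 classes, each row sums to 1.
probs = np.array([
    [0.45, 0.40, 0.10, 0.05],
    [0.45, 0.40, 0.10, 0.05],
    [0.05, 0.40, 0.00, 0.55],
])

print(majority_vote(probs))         # 0: two of the three patches vote for class 0
print(maximum_probability(probs))   # 3: the most confident patch (0.55) says class 3
print(sum_of_probabilities(probs))  # 1: class 1 has the largest summed probability (1.20)
```

The toy input is chosen so that the three rules disagree, which shows that the choice of rule can change the image-level label.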

1. Try the Augmented Patch Dataset
   >An augmented patch dataset is created from the normalized images in the training set. The
used dataset has a low number of samples when compared to other CNN classification problems
[18]. The network might thus be prone to overfit. Dividing images into patches allows to
increase the dataset complexity and dimension. Data augmentation through patch rotation
and mirroring further improves the dataset. This is possible because the studied problem is
rotation invariant, i.e., physicians can study breast cancer histological images from different
orientations without altering the diagnosis. Consequently, rotations and mirroring allow to
increase the size of the dataset without deteriorating its quality. Patching and dataset augmentation
have already been used successfully on similar histological classification problems [19].
However, they have not been used for carcinoma classification.
First, the image is divided in patches of 512 × 512 pixels size, with 50% overlap. Some example
patches are shown in Fig 1. Patch normalization is performed by subtracting the average
value to the red, green and blue channels separately. Each patch is then transformed into eight
different patches by combining k · π/2 rotations, with k = {0, 1, 2, 3}, and vertical reflections.
This results in a total of 70000 different patches from the original 250 training images. Each of
the patches is considered to have the same class label as the original image.
    >> PLOSone paper
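
The patching and eight-fold augmentation described in the quote can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the image is a synthetic array, and its size of 1536 × 2048 pixels is assumed so that a 512-pixel patch with a 256-pixel stride tiles it evenly.

```python
import numpy as np

def extract_patches(image, patch_size=512, overlap=0.5):
    """Slide a square window over the image with the given fractional overlap."""
    stride = int(patch_size * (1 - overlap))            # 256 px for 50% overlap
    height, width = image.shape[:2]
    patches = []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
    return patches

def normalize_patch(patch):
    """Subtract the per-channel mean (R, G, B handled separately)."""
    patch = patch.astype(np.float32)
    return patch - patch.mean(axis=(0, 1), keepdims=True)

def augment_patch(patch):
    """Return 8 variants: rotations by k * 90 degrees, each with and without a vertical flip."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k)                    # rotates in the (row, column) plane
        variants.append(rotated)
        variants.append(np.flipud(rotated))             # vertical reflection
    return variants

# Synthetic stand-in for one H&E image (assumed size, see lead-in).
image = np.zeros((1536, 2048, 3), dtype=np.uint8)
patches = extract_patches(image)
print(len(patches))                                     # 35 patches (5 rows x 7 columns)
print(len(augment_patch(normalize_patch(patches[0]))))  # 8 variants per patch
```

Under that size assumption, 35 patches × 8 variants × 250 images gives the 70000 patches mentioned in the quote. Each augmented patch would inherit the label of its source image.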

Keras includes a class to handle data augmentation: `ImageDataGenerator` 
>_Deep Learning with Python_, Chapter 5

In [2]:
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
   rotation_range=90,         # maximum random rotation, in degrees
   validation_split=0.2,      # reserve 20% of the training set for validation
   width_shift_range=0.2,     # maximum horizontal shift, as a fraction of image width
   height_shift_range=0.2,    # maximum vertical shift, as a fraction of image height
   shear_range=0.2,           # intensity of random shearing transformations
   zoom_range=0.2,            # range for random zooming inside pictures
   horizontal_flip=True,      # randomly flip half of the images horizontally
   fill_mode='nearest')       # strategy for filling in pixels created by a transformation


### Convolutional Neural Net Architecture

#### U-Net
![U-Net](https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/u-net-architecture.png)

#### PLOSone Paper
![PLOSone](http://journals.plos.org/plosone/article/figure/image?size=large&id=10.1371/journal.pone.0177544.g003)