Code Lab 2: PatchCamelyon (PCam) Histopathological Cancer Detection
===

## The Problem
PCam is an image classification dataset based on one of the Camelyon16 tasks. It consists of 96x96 image patches with 3 color channels, extracted from whole-slide images (WSIs). The slides are histopathologic scans of lymph node sections, and the task is to determine whether the lymph nodes contain metastases (cancer cells). From the Camelyon16 web page:
>  This task has a high clinical relevance but requires large amounts of reading time from pathologists. Therefore, a successful solution would hold great promise to reduce the workload of the pathologists while at the same time reduce the subjectivity in diagnosis. [1](https://camelyon16.grand-challenge.org/)

![pcam-cover](https://github.com/basveeling/pcam/raw/master/pcam.jpg)

In this lab, we will explore transfer learning and explainable AI (XAI) techniques on the PCam dataset.

## Imports

We will start off using a similar setup to Code Lab 1.

In [0]:
from __future__ import print_function, division
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

In [0]:
import os
import pandas as pd
from IPython.display import display, HTML

We will continue to use the Eager Execution API in TensorFlow 1.13. Let us make sure that it is enabled and check whether a GPU is visible.

In [0]:
import tensorflow as tf
tf.enable_eager_execution()

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
print('Tensorflow version: ',tf.__version__)
print('Is GPU available: %s' % str(tf.test.is_gpu_available()))
print('Is Eager Execution enabled?: %s' % str(tf.executing_eagerly()))

Tensorflow version:  1.13.1
Is GPU available: False
Is Eager Execution enabled?: True


## Dataset

### About PCam

PCam is divided into three data splits:

1. Training: 262K samples
2. Validation: 33K samples
3. Testing: 33K samples

A patch is labeled 'positive' (containing metastasis) if the central 32x32 region of the patch contains at least one pixel of segmented tumor tissue, according to the segmentation of the WSI from which the patch was extracted. Tumor tissue outside that central region does not affect the label. The dataset was constructed to be balanced, with a 50/50 split between 'present' and 'not present'.
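The labeling rule above can be sketched in a few lines of NumPy. This is only an illustration, under the assumption that a 96x96 boolean tumor mask is available for each patch. PCam itself ships the final labels rather than masks, and `center_label`, `border_mask`, and `center_mask` are hypothetical names introduced here.

```python
import numpy as np

PATCH_SIZE = 96    # patch height and width in pixels
CENTER_SIZE = 32   # side length of the central region that decides the label


def center_label(mask):
    """Return 1 if the central 32x32 region of a 96x96 mask has any tumor pixel."""
    mask = np.asarray(mask)
    if mask.shape != (PATCH_SIZE, PATCH_SIZE):
        raise ValueError('expected a %dx%d mask' % (PATCH_SIZE, PATCH_SIZE))
    start = (PATCH_SIZE - CENTER_SIZE) // 2   # first row/column of the center
    stop = start + CENTER_SIZE                # one past the last row/column
    return int(mask[start:stop, start:stop].any())


# Hypothetical mask with a tumor pixel only in the outer border: negative.
border_mask = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
border_mask[5, 5] = True

# Hypothetical mask with a single tumor pixel inside the center: positive.
center_mask = np.zeros((PATCH_SIZE, PATCH_SIZE), dtype=bool)
center_mask[40, 50] = True

print(center_label(border_mask), center_label(center_mask))
```

Note that a patch can show tumor tissue near its edges and still carry a negative label, which is worth keeping in mind when inspecting model explanations later in the lab.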

### Load the dataset

In [0]:
import boto3
import os

s3 = boto3.client('s3',
endpoint_url = 'https://s3.wasabisys.com',
aws_access_key_id = 'VM2WCNG36U812Y1NGCT3',
aws_secret_access_key='g3Dqovv3IYlIFDZyNWONXZSU5yhGZvWhKOJrBQRI')

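def download_files(filenames, save_dir):
    """Download every file in `filenames` from the bucket into `save_dir`."""
    for i, filename in enumerate(filenames):
        print('Downloading %d: %s' % (i, filename))
        download_file(filename, save_dir)

def download_file(filename, save_dir):
    """Download one file, skipping it if a local copy already exists."""
    full_filename = os.path.join(save_dir, filename)
    if os.path.exists(full_filename):
        print('\tAlready have: %s' % full_filename)
        return
    # Create the target directory first; the download fails if it is missing.
    target_dir = os.path.dirname(full_filename)
    if target_dir and not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    s3.download_file('curaedlhw', filename, full_filename)
    print('\tCOMPLETE: %s saved to %s' % (filename, save_dir))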

In [20]:
datasets = {
    'train' : {
        'x': 'camelyonpatch_level_2_split_train_x.h5.gz',
        'y': 'camelyonpatch_level_2_split_train_y.h5.gz',
        'meta': 'camelyonpatch_level_2_split_train_meta.csv'
    },
    'val' : {
        'x': 'camelyonpatch_level_2_split_valid_x.h5.gz',
        'y': 'camelyonpatch_level_2_split_valid_y.h5.gz',
        'meta': 'camelyonpatch_level_2_split_valid_meta.csv'
    },
    'test' : {
        'x': 'camelyonpatch_level_2_split_test_x.h5.gz',
        'y': 'camelyonpatch_level_2_split_test_y.h5.gz',
        'meta': 'camelyonpatch_level_2_split_test_meta.csv'
    }
}
FILELIST = []
for scenario in datasets.values():
  for filename in scenario.values():
    FILELIST.append('data/' + filename)
print(FILELIST)

['data/camelyonpatch_level_2_split_train_x.h5.gz', 'data/camelyonpatch_level_2_split_train_y.h5.gz', 'data/camelyonpatch_level_2_split_train_meta.csv', 'data/camelyonpatch_level_2_split_valid_x.h5.gz', 'data/camelyonpatch_level_2_split_valid_y.h5.gz', 'data/camelyonpatch_level_2_split_valid_meta.csv', 'data/camelyonpatch_level_2_split_test_x.h5.gz', 'data/camelyonpatch_level_2_split_test_y.h5.gz', 'data/camelyonpatch_level_2_split_test_meta.csv']


In [21]:
IS_ONEPANEL = False
NEED_DOWNLOAD = True
if IS_ONEPANEL:
    DATA_DIR='/onepanel/input/datasets/curae/skin-cancer-mnist/1'
else:
    DATA_DIR = 'data'
    if NEED_DOWNLOAD:
        # FILELIST entries already include the 'data/' prefix, so save_dir is left empty.
        download_files(FILELIST, '')

Downloading 0: data/camelyonpatch_level_2_split_train_x.h5.gz
	Already have: data/camelyonpatch_level_2_split_train_x.h5.gz
Downloading 1: data/camelyonpatch_level_2_split_train_y.h5.gz
	Already have: data/camelyonpatch_level_2_split_train_y.h5.gz
Downloading 2: data/camelyonpatch_level_2_split_train_meta.csv
	Already have: data/camelyonpatch_level_2_split_train_meta.csv
Downloading 3: data/camelyonpatch_level_2_split_valid_x.h5.gz
	COMPLETE: data/camelyonpatch_level_2_split_valid_x.h5.gz saved to 
Downloading 4: data/camelyonpatch_level_2_split_valid_y.h5.gz
	COMPLETE: data/camelyonpatch_level_2_split_valid_y.h5.gz saved to 
Downloading 5: data/camelyonpatch_level_2_split_valid_meta.csv
	COMPLETE: data/camelyonpatch_level_2_split_valid_meta.csv saved to 
Downloading 6: data/camelyonpatch_level_2_split_test_x.h5.gz
	COMPLETE: data/camelyonpatch_level_2_split_test_x.h5.gz saved to 
Downloading 7: data/camelyonpatch_level_2_split_test_y.h5.gz
	COMPLETE: data/camelyonpatch_level_2_split_t

The x and y files are gzip-compressed, so we need to extract them locally.

In [0]:
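import gzip
import shutil
for filename in FILELIST:
    # Only the .h5.gz files are compressed; the meta .csv files are used as-is.
    if not filename.endswith('.gz'):
        continue
    # Strip the trailing '.gz' to get the output filename.
    filename2 = '.'.join(filename.split('.')[:-1])
    print('Extracting %s' % filename2)
    with gzip.open(filename, 'rb') as f_in:
        with open(filename2, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)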
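After extraction, a quick sanity check can confirm that each output really is an HDF5 file and not a truncated or still-compressed download. The sketch below uses only the standard library and compares the first 8 bytes against the HDF5 format signature. It assumes the signature sits at offset 0, which is the usual case for files without a user block. `looks_like_hdf5` and the throwaway `demo.h5.gz` file are hypothetical names used for illustration, and the demo payload is not real PCam data.

```python
import gzip
import os
import shutil
import tempfile

# The 8-byte signature that opens an HDF5 file (assumed to be at offset 0).
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'


def looks_like_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature."""
    with open(path, 'rb') as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# Demo on a throwaway file: compress a fake payload, extract it, then check both.
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'demo.h5.gz')
h5_path = gz_path[:-len('.gz')]

with gzip.open(gz_path, 'wb') as f_out:
    f_out.write(HDF5_SIGNATURE + b'placeholder payload')

with gzip.open(gz_path, 'rb') as f_in, open(h5_path, 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)

demo_ok = looks_like_hdf5(h5_path)   # extracted file carries the signature
gz_ok = looks_like_hdf5(gz_path)     # compressed file starts with gzip magic instead
print(demo_ok, gz_ok)

shutil.rmtree(tmp_dir)
```

In the lab itself, the same function could be called on each extracted `.h5` path before the data is loaded.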