<a href="https://colab.research.google.com/github/finlaycm/tensorflow_tumor_detection/blob/master/part5_newslide_predictions_final.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
import os
colab_root_dir = '/content'
drive_dir='/content/drive'
project_root_dir = os.path.join(drive_dir,'My Drive','deeplearning','cancer_classification')
eval_dir = os.path.join(project_root_dir,'eval')
drive.mount(drive_dir)
test_slides = ['tumor_094','tumor_096','tumor_019','tumor_016','tumor_031','tumor_084']


Mounted at /content/drive


## About

This starter code shows how to read slides and tumor masks from the [CAMELYON16](https://camelyon17.grand-challenge.org/Data/) dataset. It will install [OpenSlide](https://openslide.org/) in Colab (the only non-Python dependency). Note that OpenSlide also includes a [DeepZoom viewer](https://github.com/openslide/openslide-python/tree/master/examples/deepzoom), shown in class. To use that, you'll need to install and run OpenSlide locally on your computer.
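Masks are read the same way as slides. Because `read_slide` keeps only the first channel for `*_mask.tif` files, a mask region comes back as a 2-D array in which nonzero pixels mark tumor. The notebook's `image_gen` labels a patch as tumor when the matching mask region sums to more than zero, which for non-negative masks means any nonzero pixel. Below is a minimal sketch of that rule. The `min_fraction` parameter is a hypothetical generalization that is not part of the original code. It would let you require a minimum tumor area before calling a patch positive.

```python
import numpy as np


def patch_label(mask_patch: np.ndarray, min_fraction: float = 0.0) -> int:
    """Return 1 if the tumor fraction of a 2-D mask patch exceeds min_fraction.

    With the default min_fraction=0.0 this matches the notebook's rule:
    any nonzero mask pixel makes the patch a tumor patch.
    """
    tumor_fraction = np.count_nonzero(mask_patch) / mask_patch.size
    return int(tumor_fraction > min_fraction)


empty = np.zeros((4, 4), dtype=np.uint8)
one_pixel = empty.copy()
one_pixel[0, 0] = 1  # a single tumor pixel
print(patch_label(empty))                        # 0
print(patch_label(one_pixel))                    # 1
print(patch_label(one_pixel, min_fraction=0.5))  # 0
```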

### Training data

The original slides and annotations are in an unusual format, so I converted a subset of them that you can read with OpenSlide as shown in this notebook. This [folder](https://drive.google.com/drive/folders/1rwWL8zU9v0M27BtQKI52bF6bVLW82RL5?usp=sharing) contains all the slides and tumor masks I converted (these should be *plenty* for your project). If you need more data, you'll have to convert it yourself with ASAP, as described on the competition website.

Note that even with the starter code, it will take some effort to understand how to work with this data (the various zoom levels and the coordinate system). I'm happy to help in office hours (OH) if you're stuck.
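In OpenSlide's `read_region(location, level, size)`, `location` is the top-left corner in level-0 pixel coordinates, while `size` is measured in pixels of the requested `level`. The helpers below sketch how to convert between levels. They assume you pass in the per-level downsample factors from `slide.level_downsamples`. The function names are hypothetical. The factors in the example are illustrative powers of two, not values read from a real slide.

```python
from typing import Sequence, Tuple


def to_level0(x: int, y: int, level: int, downsamples: Sequence[float]) -> Tuple[int, int]:
    """Map a pixel coordinate at `level` to level-0 coordinates."""
    factor = downsamples[level]
    return int(round(x * factor)), int(round(y * factor))


def from_level0(x0: int, y0: int, level: int, downsamples: Sequence[float]) -> Tuple[int, int]:
    """Map a level-0 coordinate to pixel coordinates at `level` (floored)."""
    factor = downsamples[level]
    return int(x0 // factor), int(y0 // factor)


def region_args(x: int, y: int, width: int, height: int, level: int,
                downsamples: Sequence[float]):
    """Build (location, level, size) for read_region from a box given in `level`
    pixels: location is converted to level 0, size stays in `level` pixels."""
    return to_level0(x, y, level, downsamples), level, (width, height)


# Illustrative factors only; on a real slide use slide.level_downsamples.
downsamples = (1.0, 2.0, 4.0, 8.0)
print(to_level0(100, 50, 3, downsamples))              # (800, 400)
print(from_level0(800, 400, 3, downsamples))           # (100, 50)
print(region_args(100, 50, 256, 256, 3, downsamples))  # ((800, 400), 3, (256, 256))
```

The notebook's `read_slide(slide_path, level, x, y, ...)` passes `x` and `y` straight to `read_region`. Those values must therefore already be level-0 coordinates, even when `level > 0`.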

### Reminder

The goal for your project is to build a thoughtful, end-to-end prototype, not to match the accuracy reported in the [paper](https://arxiv.org/abs/1703.02442) or to use all the available data.


In [None]:
#Install the OpenSlide C library and Python bindings
!apt-get install openslide-tools
!pip install openslide-python

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-430
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  libopenslide0
Suggested packages:
  libtiff-tools
The following NEW packages will be installed:
  libopenslide0 openslide-tools
0 upgraded, 2 newly installed, 0 to remove and 7 not upgraded.
Need to get 92.5 kB of archives.
After this operation, 268 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libopenslide0 amd64 3.4.1+dfsg-2 [79.8 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 openslide-tools amd64 3.4.1+dfsg-2 [12.7 kB]
Fetched 92.5 kB in 1s (172 kB/s)
Selecting previously unselected package libopenslide0.
(Reading database ... 145653 files and directories currently installed.)
Preparing to unpack .../libopenslide0_3.4.1+dfsg-2_

In [None]:
%tensorflow_version 2.x
import tensorflow as tf
print(tf.__version__)

TensorFlow 2.x selected.
2.0.0


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
from openslide import open_slide, __library_version__ as openslide_version
import os
from PIL import Image
from skimage.color import rgb2gray
import cv2 as cv
import json
from google.colab import drive
import pathlib
import shutil
import random
import time
import pandas as pd

from sklearn.model_selection import train_test_split

from tensorflow.keras.layers import Dense, Flatten, Input, MaxPooling2D, Conv2D
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import load_model
from tensorflow.keras import backend as K

print(tf.__version__)


2.0.0


In [None]:
# Same test slides as in the first cell; redefined so this cell runs on its own
test_slides = ['tumor_094','tumor_096','tumor_019','tumor_016','tumor_031','tumor_084']
# Whole-slide images and tumor masks, restricted to the test slides
slide_dir=os.path.join(project_root_dir,'myslides')
slide_files = [os.path.join(slide_dir,s) for s in os.listdir(slide_dir) if pathlib.Path(s).stem in test_slides ]
mask_dir=os.path.join(project_root_dir,'mymasks')
mask_files = [os.path.join(mask_dir,m) for m in os.listdir(mask_dir) if pathlib.Path(m).stem.replace('_mask','') in test_slides ]
# Load the trained Keras model
model = load_model(os.path.join(project_root_dir,'cancer6.h5'))
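Later, `predict` finds a slide's mask with a substring test (`slide_name in m`). That test would also match a longer name that merely contains the slide name. The sketch below pairs each slide with its mask by exact file stem instead. It assumes masks are named `<slide>_mask.tif`, as the filter above implies. `pair_slides_with_masks` is a hypothetical helper, and `tumor_01` in the example is a made-up name chosen to show the collision.

```python
import pathlib
from typing import Dict, Iterable


def pair_slides_with_masks(slide_paths: Iterable[str], mask_paths: Iterable[str]) -> Dict[str, str]:
    """Map each slide path to the mask path whose stem is '<slide stem>_mask'."""
    masks_by_stem = {pathlib.Path(m).stem: m for m in mask_paths}
    pairs = {}
    for s in slide_paths:
        mask = masks_by_stem.get(pathlib.Path(s).stem + '_mask')
        if mask is None:
            raise FileNotFoundError(f'No mask found for slide {s}')
        pairs[s] = mask
    return pairs


# 'tumor_01' is a substring of 'tumor_016_mask.tif'; exact stems avoid the mix-up.
slides = ['myslides/tumor_01.tif', 'myslides/tumor_016.tif']
masks = ['mymasks/tumor_016_mask.tif', 'mymasks/tumor_01_mask.tif']
print(pair_slides_with_masks(slides, masks))
```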


In [None]:
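def read_slide(slide_path, level, x=0, y=0, width=None, height=None, as_float=False, show=False):
    """Read a region of a slide (or mask) as a NumPy array.

    (x, y) is the top-left corner in level-0 coordinates; width and height are
    in pixels at `level` and default to the full level dimensions.
    Mask files (*_mask.tif) are returned as a 2-D array (first channel only).
    """
    slide = open_slide(slide_path)
    if not width:
        width = slide.level_dimensions[level][0]
    if not height:
        height = slide.level_dimensions[level][1]
    im = slide.read_region((x, y), level, (width, height))
    slide.close()  # release the file handle; read_region already returned a PIL image
    im = im.convert('RGB')  # drop the alpha channel
    if as_float:
        im = np.asarray(im, dtype=np.float32)
    else:
        im = np.asarray(im)
    assert im.shape == (height, width, 3)
    if '_mask.tif' in str(slide_path):
        im = im[:, :, 0]  # keep only the first channel for masks
    if show:
        plt.imshow(im)
        plt.show()
    return im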
def image_gen(img_paths, patch_locators, mask_path, img_size=(128, 128)):
    """Yield (patch name, model input, ground-truth label, (x, y, w, h)) per patch.

    Patch files are named '<prefix>_<index>', where <index> points into
    `patch_locators`, an array of (x, y, width, height) entries.
    """
    for img_path in img_paths:
        # Scale to [0, 1]. This assumes 8-bit patches (e.g. JPEG): matplotlib
        # already returns PNGs as floats in [0, 1], so /255 would be wrong for them.
        img = mpimg.imread(img_path) / 255.
        img = cv.resize(img, img_size)
        img = np.expand_dims(img, axis=0)  # add a batch axis -> (1, H, W, C)
        img_name = pathlib.Path(img_path).stem
        i = int(img_name.split('_')[-1])
        x, y, w, h = patch_locators[i]
        # Label the patch as tumor (1) if the same region of the mask has any nonzero pixel
        truth = 1 if read_slide(mask_path, 0, x, y, w, h).sum() > 0 else 0
        yield img_name, img, truth, (x, y, w, h)

def predict(slide_path):
    """Score every pre-extracted patch of a slide and save the results to Drive."""
    slide_name = pathlib.Path(slide_path).stem
    print('Predictions for slide {}'.format(slide_name))
    mask_path = [m for m in mask_files if slide_name in m][0]

    # Copy the zipped patches from Drive to local disk and extract them
    patches_folder_name = 'patches_' + slide_name
    patches_zip_path = os.path.join(eval_dir, slide_name, patches_folder_name + '.zip')
    shutil.copy(patches_zip_path, colab_root_dir)
    local_zip = os.path.join(colab_root_dir, patches_folder_name + '.zip')
    extract_dir = os.path.join(colab_root_dir, patches_folder_name)
    !unzip -q -o "$local_zip" -d "$extract_dir"

    # Patch locators: one (x, y, width, height) entry per patch index
    patches_locator_name = patches_folder_name + '_locators.npy'
    shutil.copy(os.path.join(eval_dir, slide_name, patches_locator_name), colab_root_dir)
    patch_locators = np.load(os.path.join(colab_root_dir, patches_locator_name), allow_pickle=True)

    img_paths = [os.path.join(extract_dir, f) for f in os.listdir(extract_dir)]
    eval_generator = image_gen(img_paths, patch_locators, mask_path)

    start_time = time.time()
    print("Start Time {} ".format(time.strftime("%H:%M", time.localtime(start_time))))
    cols = ['patch_name', 'x', 'y', 'width', 'height', 'truth', 'prediction']
    rows = []
    for patch_name, img, truth, (x, y, w, h) in eval_generator:
        prediction = model.predict(img).item()  # single score for this patch
        rows.append({'patch_name': patch_name, 'x': x, 'y': y, 'width': w, 'height': h,
                     'truth': truth, 'prediction': prediction})

    # Save the per-patch results locally, then copy them next to the patches on Drive
    predictions_df = pd.DataFrame(rows, columns=cols)
    pkl_name = 'pred_' + slide_name + '.pkl'
    local_pkl = os.path.join(colab_root_dir, pkl_name)
    predictions_df.to_pickle(local_pkl)
    shutil.copy(local_pkl, os.path.join(eval_dir, slide_name, pkl_name))
    end_time = time.time()
    print("End Time {} ".format(time.strftime("%H:%M", time.localtime(end_time))))
    print('....' * 10)
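The loop in `predict` calls `model.predict` once per patch. This adds per-call overhead for every patch; the run on `tumor_084` below took about 24 minutes. Keras `Model.predict` accepts a whole batch, so stacking patches and predicting in chunks is usually faster. The sketch below shows only the batching logic. It is written against any callable that maps an `(n, H, W, C)` array to `n` scores, so you can try it without the trained model. `predict_in_batches`, the stand-in `fake_predict`, and `batch_size=64` are illustrative assumptions.

```python
from typing import Callable, Iterable, List

import numpy as np


def predict_in_batches(predict_fn: Callable[[np.ndarray], np.ndarray],
                       images: Iterable[np.ndarray],
                       batch_size: int = 64) -> List[float]:
    """Run predict_fn on stacked batches of single images shaped (H, W, C).

    Returns one float score per input image, in input order.
    """
    scores: List[float] = []
    batch: List[np.ndarray] = []

    def flush() -> None:
        # Predict on whatever is buffered and append the flattened scores
        if batch:
            out = np.asarray(predict_fn(np.stack(batch)))
            scores.extend(out.reshape(-1).tolist())
            batch.clear()

    for img in images:
        batch.append(img)
        if len(batch) == batch_size:
            flush()
    flush()  # final, possibly smaller batch
    return scores


# Stand-in for model.predict: mean pixel value per image, shape (n, 1).
fake_predict = lambda x: x.mean(axis=(1, 2, 3)).reshape(-1, 1)
imgs = [np.full((128, 128, 3), v, dtype=np.float32) for v in (0.0, 0.25, 0.5, 1.0, 0.75)]
print(predict_in_batches(fake_predict, imgs, batch_size=2))  # [0.0, 0.25, 0.5, 1.0, 0.75]
```

`image_gen` yields each image with a leading batch axis. To use it here, you would pass `img[0]` and keep the patch names and labels in a parallel list so they stay aligned with the returned scores. `model.predict` would then serve as `predict_fn`.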

In [None]:
slide_files[4:5]

['/content/drive/My Drive/deeplearning/cancer_classification/myslides/tumor_084.tif']

In [None]:
for slide_path in slide_files[4:5]:
  predict(slide_path)

Predictions for slide tumor_084
Start Time 10:52 
End Time 11:16 
........................................
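Each saved `pred_<slide>.pkl` DataFrame holds a binary `truth` column and a single model score per patch in `prediction`. The sketch below summarizes one slide's results as confusion counts plus precision and recall. It assumes the score is a tumor probability and uses an illustrative threshold of 0.5. `patch_metrics` is a hypothetical helper that takes plain lists, for example `df['truth'].tolist()` after `pd.read_pickle`, so it does not depend on the Drive files.

```python
from typing import Dict, Sequence


def patch_metrics(truth: Sequence[int], scores: Sequence[float],
                  threshold: float = 0.5) -> Dict[str, float]:
    """Confusion counts, precision, and recall for binary patch labels."""
    tp = fp = tn = fn = 0
    for t, s in zip(truth, scores):
        pred = 1 if s >= threshold else 0  # threshold the model score
        if pred and t:
            tp += 1
        elif pred and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {'tp': tp, 'fp': fp, 'tn': tn, 'fn': fn,
            'precision': precision, 'recall': recall}


# Toy labels and scores: tp=2, fp=1, tn=1, fn=1, so precision = recall = 2/3
print(patch_metrics([1, 1, 0, 0, 1], [0.9, 0.4, 0.7, 0.1, 0.6]))
```

Tumor patches are usually a small minority on a slide. Recall on the tumor class is therefore more informative here than plain accuracy.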
