<a href="https://colab.research.google.com/github/sylvia31096/Histopathologic-Cancer-Detection/blob/master/Histopathologic_Train_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



Fetch the data from Kaggle and load it into Google Colab.



In [0]:
!pip install -U -q kaggle
!mkdir -p ~/.kaggle

In [0]:
from google.colab import files
files.upload()

In [0]:
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

In [0]:
!kaggle competitions download -c histopathologic-cancer-detection

Downloading sample_submission.csv.zip to /content
  0% 0.00/1.33M [00:00<?, ?B/s]
100% 1.33M/1.33M [00:00<00:00, 43.4MB/s]
Downloading train_labels.csv.zip to /content
  0% 0.00/5.10M [00:00<?, ?B/s]
100% 5.10M/5.10M [00:00<00:00, 83.8MB/s]
Downloading test.zip to /content
 98% 1.28G/1.30G [00:08<00:00, 156MB/s]
100% 1.30G/1.30G [00:08<00:00, 165MB/s]
Downloading train.zip to /content
100% 4.98G/4.98G [01:14<00:00, 87.6MB/s]
100% 4.98G/4.98G [01:14<00:00, 72.1MB/s]


Import the required libraries

In [0]:
import os
from zipfile import ZipFile

import numpy as np
import pandas as pd
from keras import backend as K
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential, load_model
from keras.preprocessing import image
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


Extract the training images from the zip file

In [0]:
with ZipFile("/content/train.zip","r") as zip_ref:
    zip_ref.extractall("content/train")

Get the list of file names

In [0]:
with ZipFile("train.zip", "r") as f:
    listOfFiles = f.namelist()
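
As a small illustration of what `namelist()` returns and how an image id can be recovered from each entry, the sketch below builds a tiny archive in memory. The member names are hypothetical and stand in for the real image files.

```python
import io
import os
from zipfile import ZipFile

# Build a small in-memory archive; the member names are hypothetical.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("abc123.tif", b"fake image bytes")
    archive.writestr("def456.tif", b"fake image bytes")

# namelist() returns the member names in the order they were written.
with ZipFile(buffer, "r") as archive:
    names = archive.namelist()

# Dropping the extension leaves the id used to look up the label.
ids = [os.path.splitext(name)[0] for name in names]
print(names)
print(ids)
```

The same `os.path.splitext` call is what the notebook uses in the next step to build an index that matches the `id` column of the labels file.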

Join the filenames and the corresponding targets.

In [0]:
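# Load the labels and index them by image id
targets = pd.read_csv('train_labels.csv.zip')
targets = targets.set_index('id')
# Index each file name by its id (the file name without its extension)
filenames = pd.DataFrame(listOfFiles, index=[os.path.splitext(base)[0] for base in listOfFiles])
# The inner join keeps only the ids present in both frames
filetarg = pd.concat([filenames, targets], axis=1, join='inner')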
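
To make the join concrete, here is a minimal sketch on toy data. The ids, file names, and labels are made up; only the pattern (index both frames by id, then concatenate column-wise with an inner join) mirrors the cell above.

```python
import os

import pandas as pd

# Toy stand-ins for the real data; the ids and labels are made up.
listOfFiles = ["a1.tif", "b2.tif", "c3.tif"]
targets = pd.DataFrame({"id": ["a1", "b2", "zz"], "label": [0, 1, 1]}).set_index("id")

# Index each file name by its id so both frames share the same index.
filenames = pd.DataFrame(
    listOfFiles, index=[os.path.splitext(name)[0] for name in listOfFiles]
)

# join='inner' keeps only the ids found in both frames.
filetarg = pd.concat([filenames, targets], axis=1, join="inner")
filetarg.columns = ["path", "label"]
print(filetarg)
```

Here `c3` has no label and `zz` has no file, so both are dropped and only `a1` and `b2` survive the join.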


Rename the columns

In [0]:
filetarg.columns= ['path','label']


In [0]:
# flow_from_dataframe expects string class labels in its default categorical mode
filetarg = filetarg.astype(str)


Pass the filetarg dataframe to the ImageDataGenerator to:


1.   Feed images to the model in batches
2.   Split the data into training and validation sets



In [0]:
train_datagen = image.ImageDataGenerator(validation_split=0.2)
train_generator = train_datagen.flow_from_dataframe(
    filetarg,
    directory='content/train',
    x_col='path',
    y_col='label',
    batch_size=470,
    target_size=(96, 96),
    subset="training")
validation_generator = train_datagen.flow_from_dataframe(
    filetarg,
    directory='content/train',
    x_col='path',
    y_col='label',
    batch_size=470,
    target_size=(96, 96),
    subset="validation")

Found 176020 images belonging to 2 classes.
Found 44005 images belonging to 2 classes.
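
The two counts reported above follow from `validation_split=0.2`. A minimal check of the arithmetic, assuming the validation subset receives `int(validation_split * n)` images and the training subset receives the remainder:

```python
# Image counts reported by the two generators above.
n_train_reported = 176020
n_val_reported = 44005
n_total = n_train_reported + n_val_reported

validation_split = 0.2

# Assumption: the validation subset gets int(validation_split * n) images
# and the training subset gets the rest.
n_val = int(validation_split * n_total)
n_train = n_total - n_val

print(n_total, n_train, n_val)
```

The computed sizes match the generator output, so every labelled image lands in exactly one of the two subsets.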


Load the VGG16 convolutional base with ImageNet weights

In [0]:
from keras.applications import VGG16
#Load the VGG model
image_size = 96
vgg_conv = VGG16(weights='imagenet', include_top=False, input_shape=(image_size, image_size, 3))


Instructions for updating:
Colocations handled automatically by placer.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5


Create the model

In [0]:
#create model
model = Sequential()

#add model layers
model.add(vgg_conv)
model.add(Flatten())
model.add(Dense(2, activation='softmax'))

Display the model

In [0]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Model)                (None, 3, 3, 512)         14714688  
_________________________________________________________________
flatten_1 (Flatten)          (None, 4608)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 9218      
=================================================================
Total params: 14,723,906
Trainable params: 14,723,906
Non-trainable params: 0
_________________________________________________________________
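
The numbers in the summary can be reproduced by hand. VGG16 halves the spatial size in each of its five max-pooling stages and ends with 512 channels, so a 96×96 input leaves a 3×3×512 feature map. A short check of the shapes and parameter counts:

```python
image_size = 96
n_pool_stages = 5        # VGG16 has five 2x2 max-pooling stages
n_channels = 512         # channels in the last VGG16 convolutional block
n_classes = 2

side = image_size // 2 ** n_pool_stages               # spatial size after pooling
flat_features = side * side * n_channels              # size of the Flatten output
dense_params = flat_features * n_classes + n_classes  # weights plus biases

vgg_params = 14714688    # taken from the summary above
total_params = vgg_params + dense_params
print(side, flat_features, dense_params, total_params)
```

Because the summary lists no non-trainable parameters, the VGG16 weights are fine-tuned together with the new dense layer rather than kept frozen.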


Compile the model: set the optimizer hyperparameters, the loss, and the metric.

In [0]:
from keras import optimizers
#compile model using accuracy to measure model performance
adam = optimizers.Adam(lr=0.00004, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model.compile(optimizer=adam, loss='binary_crossentropy', metrics=['accuracy'])
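
The model ends in a two-unit softmax but is compiled with `binary_crossentropy`. For one-hot labels and two outputs that sum to one, the binary cross-entropy averaged over both outputs equals the categorical cross-entropy, so the choice does not change the loss value. A small numeric check in plain Python, using made-up probabilities and hand-written loss functions rather than the Keras implementations:

```python
import math

def binary_crossentropy(y_true, y_pred):
    # Mean over the outputs of the per-output binary cross-entropy.
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_pred)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_pred):
    # Negative log-probability assigned to the true class.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

y_true = [0, 1]          # one-hot label for class 1
y_pred = [0.3, 0.7]      # hypothetical softmax output, sums to 1

bce = binary_crossentropy(y_true, y_pred)
cce = categorical_crossentropy(y_true, y_pred)
print(bce, cce)
```

The equality holds only for exactly two classes; with more classes the two losses differ and `categorical_crossentropy` would be the matching choice.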



Fit the model with the train data and validate using validation data.

In [0]:
model.fit_generator(
    train_generator,
    steps_per_epoch=450,
    validation_data=validation_generator,
    validation_steps=100,
    epochs=3,
    verbose=1)

Instructions for updating:
Use tf.cast instead.
Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7fe4f581bd68>
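
With a batch size of 470, one full pass over the 176,020 training images takes 375 batches, and one pass over the 44,005 validation images takes 94. The values used above (450 and 100) therefore run each epoch slightly past one pass, with the generator wrapping around. If exactly one pass per epoch is wanted, which is an assumption about intent rather than something the notebook states, the step counts could be derived like this:

```python
import math

batch_size = 470
n_train = 176020   # training images found by the generator
n_val = 44005      # validation images found by the generator

# Round up so the final, partially filled batch is included.
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)
print(steps_per_epoch, validation_steps)
```

In the notebook these values would be passed to `fit_generator` as `steps_per_epoch` and `validation_steps`.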

Save the trained model to Google Drive

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [0]:
model.save("drive/My Drive/model470.h5")