## Overview

  - Importing Dependencies
  - Data / Data Generator
  - Data Exploration / Data Visualization
  - Data Augmentation
  - Callbacks
  - Model Architecture (Model -> Compile -> Fit)
  - Evaluation
  - Showing the Result
  - Saving and Loading the Model
  - Testing the Model
  - Reference

## Note:

When starting a new project, I would normally begin with a simple neural network and a dataset. The overview would look something like this:
  - Importing the Dependencies
  - Data
  - Model Architecture (Model -> Compile -> Fit)
  - Evaluation
  - Showing the Result
  
To build a more accurate model, you need to evaluate its training accuracy and validation accuracy, and with them its training and validation losses. As far as I know today, there are two conceptual problems you need to identify in that evaluation in order to build a more accurate, more complex network and dataset.
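As a rough illustration of that evaluation, the sketch below compares the final training and validation accuracy of a run. It assumes the two problems are underfitting and overfitting, which is my reading rather than something stated above. The function name `diagnose_fit`, the thresholds, and the example numbers are all hypothetical, and the `acc` / `val_acc` keys follow older Keras versions (newer ones use `accuracy` / `val_accuracy`).

```python
def diagnose_fit(history, target_acc=0.90, max_gap=0.05):
    """Classify a training run from its final-epoch accuracies.

    target_acc and max_gap are illustrative thresholds, not values from this notebook.
    """
    train_acc = history["acc"][-1]
    val_acc = history["val_acc"][-1]
    if train_acc < target_acc:
        # The model does not even fit the training data well.
        return "underfitting"
    if train_acc - val_acc > max_gap:
        # The model fits the training data but does not generalize.
        return "overfitting"
    return "ok"


# Made-up numbers for illustration only (not results from this notebook).
example_history = {"acc": [0.70, 0.85, 0.97], "val_acc": [0.68, 0.78, 0.80]}
print(diagnose_fit(example_history))
```

In a real run, the dict would come from the `history` attribute of the object returned by `fit_generator`.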

### **Importing Dependencies**

In [1]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

In [2]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation, BatchNormalization

from sklearn.model_selection import train_test_split
from keras.preprocessing.image import ImageDataGenerator

!pip install livelossplot
from livelossplot import PlotLossesKeras

Using TensorFlow backend.


Collecting livelossplot
  Downloading https://files.pythonhosted.org/packages/07/9d/54f8a93d65eece0bcd475b191c4c9a3bff9dbf993db8d5e2d02b76c2d2c3/livelossplot-0.3.3-py3-none-any.whl
Installing collected packages: livelossplot
Successfully installed livelossplot-0.3.3


### **Output Files / Parameters of the Model**

In [3]:
TRAINING_LOGS_FILE = 'training_logs.csv'
MODEL_SUMMARY_FILE = 'model.summary.txt'
MODEL_FILE = 'histopathologic_cancer_detector.h5'

TRAINING_PLOT_FILE = 'training.png'
VALIDATION_PLOT_FILE = 'validation.png'
ROC_PLOT_FILE = 'roc.png'

INPUT_DIRECTORY = '../input/'

In [4]:
#Parameters of the Inputs
IMG_HEIGHT = 150
IMG_WIDTH = 150
IMG_CHANNEL = 3
IMG_SIZE = (IMG_HEIGHT, IMG_WIDTH)
BATCH_SIZE = 216

SAMPLE_COUNT = 85000
TRAINING_RATIO = 0.9
VERBOSITY = 1
TESTING_BATCH_SIZE = 5000
NUM_EPOCH = 10

### **Data & Data Preparation**

In [5]:
#Data (Preparing the Training Data)
filenames = os.listdir("../input/train/train")
categories = []

for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)

In [6]:
categories = [str(i) for i in categories]

df = pd.DataFrame({'filename': filenames,
                  'category': categories})

df['category'] = df['category'].astype('str')
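The labelling rule used in the two cells above can be sketched as a small standalone function. The name `label_from_filename` and the sample file names are hypothetical, and I am assuming the files are named like `dog.0.jpg` and `cat.0.jpg`. The labels are kept as strings because, as far as I understand, `flow_from_dataframe` with `class_mode='binary'` expects string values in the label column.

```python
def label_from_filename(filename):
    """Return '1' for dog images and '0' for everything else, as strings."""
    prefix = filename.split(".")[0]  # text before the first dot, e.g. 'dog'
    return "1" if prefix == "dog" else "0"


sample = ["dog.0.jpg", "cat.0.jpg", "dog.12499.jpg"]  # hypothetical file names
labels = [label_from_filename(name) for name in sample]
print(labels)
```

Note that anything that is not a dog gets the label `'0'`, so a stray file in the folder would silently be counted as a cat.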

In [7]:
#Splitting the data
train_dataframe, validate_dataframe = train_test_split(df, test_size=0.20, 
                                                         random_state=42)

train_dataframe = train_dataframe.reset_index(drop=True)
validate_dataframe = validate_dataframe.reset_index(drop=True)

In [8]:
#Data Preparation / Generator(Scaling ONLY)
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_dataframe(train_dataframe,
                                                       '../input/train/train',
                                                       x_col='filename',
                                                       y_col='category',
                                                       target_size=IMG_SIZE,
                                                       class_mode='binary',
                                                       batch_size=BATCH_SIZE)

validate_datagen = ImageDataGenerator(rescale=1./255)
validate_generator = validate_datagen.flow_from_dataframe(validate_dataframe,
                                                                 '../input/train/train/',
                                                                 x_col='filename',
                                                                 y_col='category',
                                                                 target_size=IMG_SIZE,
                                                                 class_mode='binary',
                                                                 batch_size=BATCH_SIZE)

Found 20000 images belonging to 2 classes.
Found 5000 images belonging to 2 classes.
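The training cells later pass `len(train_generator)` and `len(validate_generator)` as the number of steps. A minimal sketch of that arithmetic, using the image counts reported above and the batch size from the parameters cell (the helper name `steps_per_epoch` is only for illustration):

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)


BATCH_SIZE = 216
train_steps = steps_per_epoch(20000, BATCH_SIZE)
val_steps = steps_per_epoch(5000, BATCH_SIZE)
print(train_steps, val_steps)  # 93 24
```

So each epoch runs 93 training batches and 24 validation batches, with the final batch of each being partial.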


*In this project, I iterate on the model architecture, the data, the callbacks, the regularization, and the optimizer. The project starts from a simple neural network built with the Keras framework. After that, we work towards higher accuracy through an iterative process. The project was inspired by Andrew Ng's iterative machine learning cycle of Idea-Code-Experiment.*

## ***1st Iteration*** - Simple Neural Network

### **Neural Network Architecture**

In [9]:
#Model Architecture
model_1 = tf.keras.models.Sequential([
    tf.keras.layers.Convolution2D(128, kernel_size=(3, 3), input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),
    
    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

Instructions for updating:
Colocations handled automatically by placer.
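To see how large the input to the first `Dense` layer is, the spatial size can be traced through the five conv + pool blocks by hand. This is a sketch of the shape arithmetic only, assuming stride-1 convolutions and the default pooling stride equal to the pool size; the helper name `trace_same_padding` is hypothetical.

```python
import math


def trace_same_padding(size, num_blocks):
    """Spatial size after each conv ('SAME', stride 1) + 2x2 max-pool ('SAME', stride 2) block."""
    sizes = [size]
    for _ in range(num_blocks):
        # The convolution keeps the size; the pooling halves it, rounding up.
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


sizes = trace_same_padding(150, 5)
flattened = sizes[-1] * sizes[-1] * 32  # 32 filters in the last conv layer
dense_params = flattened * 16 + 16      # weights + biases of Dense(16)
print(sizes, flattened, dense_params)
```

The 150x150 input shrinks to 5x5, so `Flatten` produces 800 values and `Dense(16)` has 12,816 parameters.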


In [10]:
#Hyperparameters
'''learning_rate = 0.0001
num_epochs = 25

from keras.optimizers import SGD

option_1 = SGD(lr=learning_rate)'''


In [11]:
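from keras import optimizers

#NOTE: learning_rate is defined but not used; the 'sgd' string below selects SGD with its default learning rate
learning_rate = 0.0001

model_1.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

#NOTE: callbacks is defined here but is not passed to fit_generator in the next cell
callbacks = [PlotLossesKeras()]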

In [12]:
first_history = model_1.fit_generator(train_generator,
                              steps_per_epoch=len(train_generator), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Instructions for updating:
Use tf.cast instead.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***2nd Iteration*** - Training Data Augmentation

In [13]:
#Data Generator / Augmentation
train_datagen_augmented = ImageDataGenerator(rescale=1./255,
                                    shear_range=0.2,
                                    zoom_range=0.2,
                                    rotation_range=30,
                                    horizontal_flip=True,
                                    vertical_flip=True,
                                    width_shift_range=0.2,
                                    height_shift_range=0.2,
                                    data_format='channels_last')

train_generator_augmented = train_datagen_augmented.flow_from_dataframe(train_dataframe,
                                                                           '../input/train/train',
                                                                           x_col='filename',
                                                                           y_col='category',
                                                                           target_size=IMG_SIZE,
                                                                           class_mode='binary',
                                                                           batch_size=BATCH_SIZE)

Found 20000 images belonging to 2 classes.
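To make a few of the augmentation options concrete, here is a small pure-Python sketch of what the two flips do to an image grid, and how far `width_shift_range=0.2` can move a 150-pixel-wide image. It is an illustration on a tiny nested list, not how `ImageDataGenerator` is implemented; the function names are hypothetical.

```python
def horizontal_flip(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Reverse the order of the rows (top-to-bottom mirror)."""
    return image[::-1]


tiny = [[1, 2, 3],
        [4, 5, 6]]
print(horizontal_flip(tiny))  # [[3, 2, 1], [6, 5, 4]]
print(vertical_flip(tiny))    # [[4, 5, 6], [1, 2, 3]]

# A shift range below 1 is a fraction of the image width.
max_shift_px = round(0.2 * 150)
print(max_shift_px)
```

Vertical flips may be less natural for photos of cats and dogs than horizontal ones, so that option is worth revisiting if validation accuracy suffers.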


In [14]:
'''#Callbacks
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, CSVLogger, ModelCheckpoint

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, 
                  patience=5, verbose=0, mode='auto', 
                  min_delta=0.0001, cooldown=0, min_lr=0)

csv_logger = CSVLogger(TRAINING_LOGS_FILE, separator=',', append=False)

model_checkpoint = ModelCheckpoint(MODEL_FILE, monitor='val_loss', 
                                verbose=0, save_best_only=True, 
                                save_weights_only=False, mode='auto', 
                                period=1)

callbacks = [reduce_lr, csv_logger, model_checkpoint]'''
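The idea behind the disabled `ReduceLROnPlateau` callback can be sketched in a few lines of plain Python. This is a simplified version of the rule (no cooldown, no minimum learning rate), using the same `factor`, `patience`, and `min_delta` as the cell above; it is not the Keras implementation, and the loss values are made up.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.1, patience=5, min_delta=1e-4):
    """Simplified plateau rule: cut lr by `factor` after `patience` epochs without improvement."""
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
        lrs.append(lr)
    return lrs


losses = [0.70, 0.60, 0.60, 0.60, 0.60, 0.60, 0.60, 0.55]  # made-up values
lrs = simulate_reduce_on_plateau(losses, lr=0.001)
print(lrs)
```

With `patience=5` and only 10 epochs per iteration, a rule like this would fire at most once or twice per run.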

In [15]:
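#model_1 keeps its weights from the 1st iteration, so this continues training on the augmented data
second_history = model_1.fit_generator(train_generator_augmented,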
                              steps_per_epoch=len(train_generator_augmented), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***3rd Iteration*** - Regularization and Optimizer

In [16]:
from keras import regularizers

'''model.add(Dense(64, input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))'''

model_3 = tf.keras.models.Sequential([
    tf.keras.layers.Convolution2D(128, kernel_size=(3, 3), input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),
    
    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01), padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01), padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01), padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01), padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
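For reference, `regularizers.l2(0.01)` adds a penalty of 0.01 times the sum of the squared kernel weights to the loss. A minimal sketch of that term, with made-up weights and a hypothetical helper name:

```python
def l2_penalty(weights, l2=0.01):
    """L2 regularization term: l2 * sum of squared weights."""
    return l2 * sum(w * w for w in weights)


kernel = [0.5, -1.0, 2.0]  # made-up weights for illustration
penalty = l2_penalty(kernel)
print(penalty)
```

Large weights dominate the sum, which is why the penalty pushes the network towards smaller kernels. In the model above it applies to the second through fifth convolution layers only.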

In [17]:
model_3.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
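This iteration also switches the optimizer from SGD to Adam. The sketch below shows one textbook Adam update for a single scalar parameter, using the commonly documented defaults (learning rate 0.001, beta1 0.9, beta2 0.999). The epsilon value and the function name `adam_step` are assumptions, and Keras's own implementation may differ in small details.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


w, m, v = 1.0, 0.0, 0.0
w, m, v = adam_step(w, grad=4.0, m=m, v=v, t=1)
print(w)
```

On the first step the update is roughly `lr` in size regardless of how large the gradient is, which is one reason Adam is less sensitive to gradient scale than plain SGD.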

In [18]:
third_history = model_3.fit_generator(train_generator_augmented,
                              steps_per_epoch=len(train_generator_augmented), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***4th Iteration*** - Transfer Learning

In [19]:
from keras.applications.vgg19 import VGG19

initial_model = VGG19(include_top=False, weights='imagenet', 
                      input_tensor=None,  input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL))

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5


In [20]:
'''model_4 = tf.keras.models.Sequential([
    tf.keras.layers.add(initial_model),

    tf.keras.layers.add(Flatten()),
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    #tf.keras.layers.Dense(512, activation='relu'),
    #tf.keras.layers.Dropout(0.5)
    tf.keras.layers.Dense(1, activation='sigmoid')
])'''

model_4 = Sequential()
model_4.add(initial_model)

model_4.add(Flatten())
model_4.add(Dense(1024,activation='relu'))
model_4.add(Dropout(0.5))
model_4.add(Dense(1,activation='sigmoid'))
#model_4.summary()

Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
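The new head sits on top of the VGG19 feature map, so it helps to know how big that map is for a 150x150 input. The sketch below assumes VGG19's five 2x2, stride-2 pooling layers use 'valid' padding (so odd sizes round down) and that the last block has 512 channels; the helper name `vgg_feature_size` is hypothetical.

```python
def vgg_feature_size(input_size, num_pools=5):
    """Spatial size after VGG-style 2x2, stride-2 'valid' max-pooling layers (floor division)."""
    size = input_size
    for _ in range(num_pools):
        size //= 2
    return size


side = vgg_feature_size(150)
flattened = side * side * 512           # 512 channels in the last VGG19 block
dense_params = flattened * 1024 + 1024  # weights + biases of Dense(1024)
print(side, flattened, dense_params)
```

The 4x4x512 feature map flattens to 8,192 values, so the `Dense(1024)` layer alone adds about 8.4 million parameters on top of the pretrained base.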


In [21]:
model_4.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [22]:
fourth_history = model_4.fit_generator(train_generator_augmented,
                              steps_per_epoch=len(train_generator_augmented), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***5th Iteration*** - Freezing Concept

In [23]:
'''model_5 = tf.keras.models.Sequential([
    tf.keras.layers.add(initial_model)

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dropout(0.5)
    tf.keras.layers.Dense(1, activation='sigmoid'),
])'''

model_5 = Sequential()
model_5.add(initial_model)

model_5.add(Flatten())
model_5.add(Dense(1024,activation='relu'))
model_5.add(Dropout(0.5))
model_5.add(Dense(1,activation='sigmoid'))
#model_5.summary()

In [24]:
#Freeze every layer of the VGG19 base except the last 15
for layer in initial_model.layers[:-15]:
    layer.trainable = False
    
'''for layer in initial_model.layers:
    print(layer,layer.trainable)'''

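To see which layers the `[:-15]` slice actually freezes, the sketch below rebuilds the list of VGG19 layer names without the top and applies the same slice. The names follow the `blockN_convM` / `blockN_pool` pattern that Keras uses for VGG19, but they are written out by hand here as an assumption rather than read from `initial_model`.

```python
# Layer names of VGG19 without the top (assumed, not read from the model).
vgg19_layers = ["input_1"]
for block, num_convs in enumerate([2, 2, 4, 4, 4], start=1):
    vgg19_layers += [f"block{block}_conv{i}" for i in range(1, num_convs + 1)]
    vgg19_layers.append(f"block{block}_pool")

frozen = vgg19_layers[:-15]     # what `initial_model.layers[:-15]` selects
trainable = vgg19_layers[-15:]  # everything that stays trainable
print(len(vgg19_layers), frozen, trainable[0])
```

Under these assumptions only the input layer and the first two blocks are frozen, while blocks 3 to 5 are still fine-tuned.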

In [25]:
model_5.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [26]:
fifth_history = model_5.fit_generator(train_generator_augmented,
                              steps_per_epoch=len(train_generator_augmented), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***6th Iteration***

In [27]:
model_6 = tf.keras.models.Sequential([
    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), input_shape=(IMG_HEIGHT, IMG_WIDTH, IMG_CHANNEL), activation='relu', padding='SAME'),
    tf.keras.layers.Convolution2D(32, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.Convolution2D(64, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(128, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.Convolution2D(128, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),
    
    tf.keras.layers.Convolution2D(256, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.Convolution2D(256, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Convolution2D(512, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.Convolution2D(512, kernel_size=(3, 3), activation='relu', padding='SAME'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='SAME'),

    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

In [28]:
model_6.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [29]:
sixth_history = model_6.fit_generator(train_generator_augmented,
                              steps_per_epoch=len(train_generator_augmented), 
                              validation_data=validate_generator, 
                              validation_steps=len(validate_generator), 
                              epochs=NUM_EPOCH, verbose=VERBOSITY)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## ***References:***
  - https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A
  - https://github.com/Terrance-Whitehurst/Keras-Histopathologic-Cancer-Detection/blob/master/cancer-detect-keras.ipynb
  - https://towardsdatascience.com/image-classifier-cats-vs-dogs-with-convolutional-neural-networks-cnns-and-google-colabs-4e9af21ae7a8