# Parker Dunn

__Assignment for COURSERA: Introduction to Deep Learning (via CU Boulder)__  
__Assignment:__ Week 3 - CNN Cancer Detection Kaggle Mini-Project

## Section 3 - Model Architecture

#### Plan

Due to limited time and computing resources, I'll stick to a simple model. I plan to use the "building block-style" Convolution-Convolution-Pooling design pattern, with probably no more than 4 repetitions of the pattern. Since we previously experimented with neural network architectures, I am hoping to reuse a reliable dense NN structure from one of the example image classification models in the videos. In theory, the convolutional layers will extract the key features, and a dense structure borrowed from another image classification task can then be trained successfully on those new features.

Laid out below are my architecture plans, along with some thoughts on how I will train my CNN.

__Design parameters and Hyperparameters__
Decisions
* I will use ReLU (hidden layers) and sigmoid (output layer) as activation functions. This is not a design parameter that I plan to vary this time.
* I will primarily use 3x3xd convolutional filters, where d is the depth of the layer's input.
* As an optimization method, I will stick to SGD, which I am most familiar with, and plan to incorporate momentum if the Keras API allows it.

Hyperparameters
* Learning rate
    * Test: 0.01 | 0.001 | 0.0001 (3 values)
* Momentum
    * Test: 0.0 | 0.01 | 0.1 (3 values)
* Number of epochs (i.e., how much training)

Design
* Number of [Conv-Conv-Pool] layers
    - Test: 2, 3, 4
* Number of filters to use

Potential ways to improve a struggling model
* L2 regularization
* Batch normalization

I plan to start with moderate training parameters (e.g., learning rate = 0.001 and momentum = 0.01) while I experiment and narrow down a few viable convolution designs.
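
The search space listed above is small enough to enumerate. Here is a minimal sketch, assuming a full grid over the three dimensions that have listed values. The names `LEARNING_RATES`, `MOMENTA`, `N_BLOCKS`, and `grid` are illustrative and not part of this notebook.

```python
from itertools import product

# Values copied from the hyperparameter and design lists above.
LEARNING_RATES = [0.01, 0.001, 0.0001]
MOMENTA = [0.0, 0.01, 0.1]
N_BLOCKS = [2, 3, 4]  # number of [Conv-Conv-Pool] repetitions

def grid():
    """Yield one dict per combination of the listed values."""
    for lr, momentum, n_blocks in product(LEARNING_RATES, MOMENTA, N_BLOCKS):
        yield {"learning_rate": lr, "momentum": momentum, "n_blocks": n_blocks}

configs = list(grid())
print(len(configs))  # 3 * 3 * 3 = 27 combinations
```

A full grid already means 27 training runs before the number of epochs or filters is varied, which is one reason to fix moderate optimizer settings first and compare architectures on their own.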

#### Helper Functions

In [21]:
# HELPER FUNCTIONS

def partial_load_data(n):
    # n == total number of images to load
    # Returns the image array X, the list of image IDs, and the label table
    
    train_locs, test_locs, y_train_info = load_image_info()
    
    # Draw n random indices from the first 200000 training images
    # (sampled with replacement, so duplicates are possible)
    rand_idx = np.random.randint(0,200000,(n,))
    #print(len(train_locs))
    #print(rand_idx.shape)
    
    X = np.zeros((n, 96, 96, 3))
    X_IDs = []
    
    for i in range(n):
        ind = rand_idx[i]
        #print(i, ind)
        img_file = train_locs[ind]
        img = io.imread(img_file)        # NOTE: io.imread() reads images in as numpy.ndarray
        
        #img = img.reshape(1,96*96,3)
        
        X[i,:,:,:] = img /255.0  # NOTE: MODIFYING ALL VALUES TO 0-1 SCALE!!!
        
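        # ID = file path minus its first 6 and last 4 characters
        X_IDs.append(img_file[6:-4])
    
    # NOTE: y_train_info is the full label table; it is not subset or
    # reordered to match the rows of X
    return X, X_IDs, y_train_info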

def partial_train_val_split(X, y_info, split=(0.66, 0.34)):
    # generate indices for the training and validation sets based on 'split'
    sz = len(X)
    n_train, n_val = [int(split[i] * sz) for i in range(len(split))]
    print(n_train, n_val)
    
    rng = np.random.default_rng()
    idx_train = rng.choice(range(sz), (n_train,), replace=False, shuffle=False)
    idx_val = list(set(range(sz)) - set(idx_train))
    
    # separate X and y_info into separate datasets
    X_tr = X[idx_train,:,:,:]
    y_tr = y_info.iloc[idx_train,:].reset_index()
    
    X_val = X[idx_val,:,:,:]
    y_val = y_info.iloc[idx_val,:].reset_index()
    
    return X_tr, y_tr, X_val, y_val
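
Because the images are drawn at random, each row of `X` should be paired with the label of the same image. Here is a minimal NumPy sketch of drawing distinct indices and using the same indices for both images and labels. `sample_aligned` and the toy arrays are hypothetical stand-ins, not the loader used in this notebook.

```python
import numpy as np

def sample_aligned(n_total, n, seed=None):
    """Return n distinct indices in [0, n_total), usable for images and labels alike."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_total, size=n, replace=False)

# Toy stand-ins: 10 "images" whose pixel values all equal their label.
labels = np.arange(10)
images = labels.reshape(10, 1, 1, 1) * np.ones((10, 2, 2, 3))

idx = sample_aligned(len(labels), 4, seed=0)
X_small, y_small = images[idx], labels[idx]

# Each sampled image still carries its own label.
print(np.all(X_small[:, 0, 0, 0] == y_small))
```

Indexing both arrays with the same `idx` keeps them in step; sampling with `replace=False` also avoids loading the same image twice.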

___
#### Step 3 - Part 1: Trying to find a repeatable way to create a CNN!

In [27]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

#import numpy as np
from helperfunctions import *

In [3]:
layers_lst = ["input","conv", "maxpool","conv","conv","maxpool","flatten","dense","dense","dense"]
layer_design = [
    {"filters":24, "kernel_size":(3,3), "padding":"valid", "data_format":"channels_last", "use_bias":True, "input_shape":(96,96,3)},
    {"filters":48, "kernel_size":(3,3), "padding":"valid", "data_format":"channels_last", "use_bias":True},
    {"pool_size":(2,2)},
    {"filters":64, "kernel_size":(3,3), "padding":"valid", "data_format":"channels_last", "use_bias":True},
    {"filters":72, "kernel_size":(3,3), "padding":"valid", "data_format":"channels_last", "use_bias":True},
    {"pool_size":(2,2)},
    None,
    {"size":96, "activation":'relu'},
    {"size":48, "activation":'relu'},
    {"size":1, "activation":'sigmoid'}]

In [4]:
## BELOW CAN BE TURNED INTO A FUNCTION THAT TAKES THE PARAMETERS ABOVE AND
#  TURNS THEM INTO A MODEL!

model = tf.keras.Sequential()
for (l, d) in zip(layers_lst, layer_design):
    # Conv2D applies no activation by default, so ReLU is passed explicitly
    # to match the plan of using ReLU in the hidden layers.
    if l == "input":
        model.add(layers.Conv2D(d["filters"], d["kernel_size"], padding=d["padding"], use_bias=d["use_bias"], activation="relu", input_shape=d["input_shape"]))
    elif l == "conv":
        model.add(layers.Conv2D(d["filters"], d["kernel_size"], padding=d["padding"], use_bias=d["use_bias"], activation="relu"))
    elif l == "maxpool":
        model.add(layers.MaxPool2D(d["pool_size"]))
    elif l == "flatten":
        model.add(layers.Flatten())
    elif l == "dense":
        # the final dense layer (size 1, sigmoid) acts as the output layer
        model.add(layers.Dense(d["size"], activation=d["activation"]))
    else:
        raise ValueError(f"Invalid layer type for the model: {l!r}")

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 94, 94, 24)        672       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 92, 92, 48)        10416     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 46, 46, 48)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 44, 44, 64)        27712     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 42, 42, 72)        41544     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 21, 21, 72)        0         
_________________________________________________________________
flatten (Flatten)            (None, 31752)             0
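
The shapes and parameter counts in the summary follow from simple arithmetic. A stride-1 "valid" k x k convolution shrinks each spatial side by k - 1 and has k * k * c_in * c_out + c_out parameters, while a 2x2 max-pool halves each side (rounding down) and has none. Here is a pure-Python sketch that reproduces the rows shown above; the helper names are illustrative.

```python
def conv_valid(shape, filters, k=3):
    """Output shape and parameter count of a stride-1 'valid' k x k convolution with bias."""
    h, w, c_in = shape
    params = k * k * c_in * filters + filters
    return (h - k + 1, w - k + 1, filters), params

def max_pool(shape, p=2):
    """Output shape of a p x p max-pool with stride p (no trainable parameters)."""
    h, w, c = shape
    return (h // p, w // p, c), 0

shape = (96, 96, 3)
steps = [(conv_valid, 24), (conv_valid, 48), (max_pool, 2),
         (conv_valid, 64), (conv_valid, 72), (max_pool, 2)]

rows = []
for fn, arg in steps:
    shape, params = fn(shape, arg)
    rows.append((shape, params))
    print(shape, params)

flat = shape[0] * shape[1] * shape[2]
print(flat)  # 21 * 21 * 72 = 31752
```

By the same arithmetic, a 96-unit dense layer fed by the 31752 flattened values would hold 31752 * 96 + 96 = 3,048,288 parameters, far more than all four convolution layers combined.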

___

#### Step 3 - Part 2: Loading some image data and splitting into training and validation

I don't want to use all of the available images for preliminary testing of model designs. Therefore, I'll set up some specific functions for training and validating on a small subset of the available images.

In [22]:
%%time

# 3000 samples -> 9.91 s
# 5000 samples -> 15.5 s

X, X_ids, y_info = partial_load_data(3000)

X_tr, y_tr, X_val, y_val = partial_train_val_split(X, y_info)

#print(X_tr, y_tr, X_val, y_val)
y_tr.head(5)

1980 1020
CPU times: total: 3.66 s
Wall time: 9.74 s


   index                                        id  label
0    493  50bd266a907e0a5c648e959277745ddde0b88993      0
1    270  8e72e27b0fb601c881ef84125bf25f51bdcdfa65      1
2    482  2beb4425674591e7cf163717675f0a122e6541f3      1
3    755  f4f1243220c1f51190c4f1c54a9ec5e47358ea63      1
4    496  0ace7ee78233ac5acd8d84d96a544a5248cb8de2      1


In [26]:
## Just double-checking the X_tr, and X_val data
print(y_tr.describe(),"\n")
#print(y_val.describe())
print(X_tr[0,:,:,:].shape, "\n")
print(X_val.shape)

             index        label
count  1980.000000  1980.000000
mean   1497.747475     0.388889
std     875.370774     0.487621
min       1.000000     0.000000
25%     736.500000     0.000000
50%    1501.500000     0.000000
75%    2266.250000     1.000000
max    2999.000000     1.000000 

(96, 96, 3) 

(1020, 96, 96, 3)
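
The `label` mean in the `describe()` output is the fraction of positive samples in the training subset, which fixes the accuracy that a constant predictor would reach. Here is a short sketch of that arithmetic, using only the numbers printed above.

```python
n_train = 1980           # 'count' from describe()
pos_fraction = 0.388889  # 'mean' of the label column

n_pos = round(n_train * pos_fraction)  # samples with label == 1
n_neg = n_train - n_pos                # samples with label == 0

# Accuracy of always predicting the majority class (label 0 here).
baseline_acc = max(n_pos, n_neg) / n_train
print(n_pos, n_neg, round(baseline_acc, 3))  # 770 1210 0.611
```

A model that reports roughly 61% accuracy on this subset may therefore be doing no better than always predicting the negative class.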


#### Step 3 - Part 3: Testing model parameters

- [Done] Load some sample data
- [Done] Create a consistent way to create CNN object
- [*in progress*] Create a function to perform the final steps of model design (choosing the optimizer and loss, then compiling)
- [*in progress*] Training and validation (I believe Keras provides this through the built-in `model.fit`)

In [29]:
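def model_compiler(model, opt_params, metrics):
    # opt_params == (learning_rate, momentum)
    opt = optimizers.SGD(learning_rate=opt_params[0],
                         momentum=opt_params[1],
                         name='SGD')
    # The output layer already applies a sigmoid, so the loss receives
    # probabilities rather than logits: from_logits must be False.
    # (from_logits=True is meant for a final layer with no activation.)
    model.compile(optimizer=opt,
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                  metrics=metrics)
    return model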
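
The `from_logits` flag tells the loss whether it should apply the sigmoid itself, so it has to agree with the model's last layer. The following is a NumPy sketch (not TensorFlow's implementation) of what would happen if a sigmoid output were treated as a logit and squashed a second time. Some Keras versions may detect this mismatch and warn instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Binary cross-entropy of probability p against label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-10.0, 0.0, 10.0])  # raw scores (logits)
p = sigmoid(z)                    # what a sigmoid output layer emits

# Treating p as if it were a logit applies the sigmoid twice,
# which confines every prediction to roughly [0.5, 0.731].
p_twice = sigmoid(p)
print(p_twice.min() >= 0.5, p_twice.max() < 0.732)

# A confident, correct negative prediction is then scored as if it were a coin flip.
y = 0.0
print(bce(p[0], y) < bce(p_twice[0], y))
```

With a sigmoid output layer, the matching setting is `from_logits=False`; `from_logits=True` fits a final `Dense(1)` layer with no activation.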

In [32]:
model = model_compiler(model, (0.001, 0.01), ['accuracy'])

# check that the labels come out as a NumPy array, ready to pass to model.fit
type(y_tr.loc[:,"label"].to_numpy())

numpy.ndarray

In [None]:
# history.history stores the per-epoch loss and metrics for training and validation
history = model.fit(X_tr,
                    y_tr.loc[:,"label"].to_numpy(),
                    epochs=10,
                    validation_data=(X_val, y_val.loc[:,"label"].to_numpy())
                   )
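
`model.fit` returns a `History` object whose `.history` attribute is a dict mapping each metric name to a list with one value per epoch. Here is a sketch of picking the best epoch from such a dict. `best_epoch` is a hypothetical helper, and the numbers are made-up placeholders, not results from this notebook.

```python
def best_epoch(history_dict, key="val_loss", mode="min"):
    """Return (1-based epoch, value) at which `key` is best."""
    values = history_dict[key]
    pick = min if mode == "min" else max
    idx = pick(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

# Placeholder values only, shaped like `history.history` for metrics=['accuracy'].
fake_history = {
    "loss": [0.69, 0.66, 0.64],
    "accuracy": [0.55, 0.61, 0.63],
    "val_loss": [0.68, 0.65, 0.66],
    "val_accuracy": [0.58, 0.62, 0.61],
}

print(best_epoch(fake_history))                         # (2, 0.65)
print(best_epoch(fake_history, "val_accuracy", "max"))  # (2, 0.62)
```

Comparing the training and validation curves this way is also a quick check for overfitting: training loss that keeps falling while validation loss rises suggests fewer epochs or added regularization.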