# CNN architectures for CIFAR-10 classification

CIFAR-10 is a famous collection of small color images, each 32 x 32 pixels.  There are 6,000 images in each of 10 classes.

https://www.cs.toronto.edu/~kriz/cifar.html

In this assignment you will perform experiments with a CNN model, and then tune the model to obtain the best test accuracy on CIFAR-10.

The purpose of the assignment is for you to get a feeling for how changes to a CNN model will affect its performance.  In the "experiment" problems, focus on clear, thoughtful, quantitative discussion of the impact of the change to the model.

## Instructions

In the code below, a baseline CNN classifier is created.  Your job is to code and run a bunch of experiments to see the effect of changes to the CNN classifier. 

Read the code, and look for problem prompts.  There are 9 problems.

vF22.2

In [None]:
from tensorflow.keras import models, layers
from tensorflow.keras.datasets import mnist, cifar10
from tensorflow.keras.callbacks import EarlyStopping
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, HTML

In [None]:
# allow output to span multiple output lines in the console
pd.set_option('display.max_columns', 600)
pd.options.display.width = 120
pd.options.display.max_colwidth = 50
display(HTML("<style>.container { width:100% !important; }</style>"))

In [None]:
def plot_metric(history, metric='loss'):
    """ Plot training and test values for a metric. """

    val_metric = 'val_'+metric
    plt.plot(history.history[metric])
    plt.plot(history.history[val_metric])
    plt.title('model '+metric)
    plt.ylabel(metric)
    plt.xlabel('epoch')
    plt.legend(['train', 'test'])
    plt.show()

Seeding NumPy's random number generator helps with replicability, but it does not control all sources of randomness in the code.

In [None]:
np.random.seed(0)  # any fixed integer works; seed() with no argument gives a different state on every run
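
As a small illustration of what seeding does, the sketch below (not part of the original notebook) draws a random subset of indices twice with the same seed and checks that the two draws match. The helper name `draw_subset` and the seed values are assumptions chosen for the example. TensorFlow keeps its own random state, so things like weight initialization are not covered by the NumPy seed.

```python
import numpy as np

def draw_subset(seed, n_total=50000, n_keep=25000):
    """Seed NumPy's global generator, then draw n_keep distinct indices."""
    np.random.seed(seed)
    return np.random.choice(n_total, n_keep, replace=False)

first = draw_subset(42)
second = draw_subset(42)
other = draw_subset(43)

print(np.array_equal(first, second))  # same seed, same indices
print(np.array_equal(first, other))   # a different seed gives a different draw
```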

### Read the data

In [None]:
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(50000, 32, 32, 3)
(50000, 1)
(10000, 32, 32, 3)
(10000, 1)


In [None]:
print(X_train.min(), X_train.max())
print(np.unique(y_train))

0 255
[0 1 2 3 4 5 6 7 8 9]


#### We'll use a smaller version of the data to speed up the training process.

In [None]:
idx = np.random.choice(X_train.shape[0], 25000, replace=False)
X_train = X_train[idx]
y_train = y_train[idx]

idx = np.random.choice(X_test.shape[0], 8000, replace=False)
X_test = X_test[idx]
y_test = y_test[idx]

### Preprocess the data

In [None]:
# from integers in [0,255] to float in [0,1]
X_train = X_train.astype('float32') / 255
X_test  = X_test.astype('float32') / 255

# store the labels in 1D arrays, not 2D
y_train = np.squeeze(y_train)  # this could also be done using reshape
y_test = np.squeeze(y_test)

In [None]:
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(25000, 32, 32, 3)
(25000,)
(8000, 32, 32, 3)
(8000,)


### Create the baseline CNN model

#### Problem 1: fill in the input shape value below.

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=None))  # replace None with your code
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
# output layer
model.add(layers.Dense(10, activation='softmax'))

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 2304)              0         
_________________________________________________________________
dense (Dense)                (None, 64)                147520    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                650       
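
The parameter counts in the summary can be checked by hand. The sketch below is a plain-Python calculation, assuming the Keras defaults used by the baseline (unpadded, stride-1 convolutions and 2x2 pooling that rounds down). The helper names are made up for this example; the starting size of 32 and the 3 input channels come from the `X_train.shape` output above.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    """One weight per input per unit, plus one bias per unit."""
    return (in_features + 1) * units

def conv_out(size, kernel):
    """Output size of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of non-overlapping pooling (rounds down)."""
    return size // pool

# Trace the spatial size through conv -> pool -> conv -> pool, starting at 32.
size = pool_out(conv_out(pool_out(conv_out(32, 3)), 3))
flat = size * size * 64

counts = [
    conv2d_params(3, 3, 3, 32),   # first Conv2D
    conv2d_params(3, 3, 32, 64),  # second Conv2D
    dense_params(flat, 64),       # hidden Dense layer
    dense_params(64, 10),         # output layer
]
print(size, flat)   # 6 2304
print(counts)       # [896, 18496, 147520, 650]
print(sum(counts))  # 167562
```

Almost all of the parameters sit in the first dense layer, which is worth keeping in mind when an experiment changes the size of the flattened output.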

### Compile and train the model

In [None]:
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5

In [None]:
plot_metric(history)

## Experiments

In each experiment, compare an alternative model to the baseline model.

The amount of training that is needed will depend on the model.  You can adjust the number of epochs, and you can use early stopping if you like.
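
The `EarlyStopping` callback imported at the top of the notebook takes, among others, a `monitor` and a `patience` argument. To make the patience rule concrete, here is a plain-Python sketch of the stopping logic, assuming a monitored loss and no minimum improvement threshold. The function name `stopping_epoch` and the loss values are invented for illustration; this is not the Keras implementation.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: stop once this has happened `patience` times in a row.
            wait += 1
            if wait >= patience:
                return epoch
    return None

val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.73]  # illustrative values only
print(stopping_epoch(val_losses, patience=2))  # 4
print(stopping_epoch(val_losses, patience=5))  # None
```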

**After each experiment, think about and comment on**:
- the size of the model (in terms of number of parameters)
- how long the model takes to train
- the validation accuracy you can achieve
- the amount of overfitting

Avoid making vague statements like: "this model performs a little better than the baseline", or
"we see a little overfitting".  Try to be quantitative.  What are you using to compare
your model to the baseline: validation accuracy, validation loss, or ...?  
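
One possible way to make such comparisons quantitative is to pull a few numbers out of `history.history`. The sketch below assumes the dictionary has `accuracy` and `val_accuracy` lists, as it does when the model is compiled with `metrics=['accuracy']`. The helper name `summarize` and the numbers in `example_history` are made up for the example and are not results from this notebook.

```python
def summarize(history_dict):
    """Report the best validation accuracy and the train/validation gap at that epoch."""
    val_acc = history_dict["val_accuracy"]
    train_acc = history_dict["accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best + 1,  # 1-based, like the Keras log
        "best_val_accuracy": val_acc[best],
        "gap_at_best_epoch": train_acc[best] - val_acc[best],
        "gap_at_last_epoch": train_acc[-1] - val_acc[-1],
    }

example_history = {  # invented values, for illustration only
    "accuracy":     [0.40, 0.55, 0.63, 0.70, 0.76],
    "val_accuracy": [0.45, 0.56, 0.60, 0.62, 0.61],
}
summary = summarize(example_history)
print(summary)
```

With a real run you would call `summarize(history.history)`. A gap that keeps growing after the best epoch is one concrete sign of overfitting.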

### Experiment: Stack two convolutional layers

It is pretty common in CNN models to "stack" convolutional layers.  This means using two or more convolutional layers without pooling layers between them.

Create a model like the baseline, but add a second convolutional layer after the first convolutional layer.  You can use a layer with the same number of output channels and same filter size.
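
One common argument for stacking is that two stacked 3x3 layers see the same 5x5 input region as a single 5x5 layer, with fewer parameters. The sketch below checks that arithmetic in plain Python, assuming stride-1 convolutions and 32 channels in and out. The helper names are invented for the example.

```python
def conv2d_params(kernel, in_channels, filters):
    """Square-kernel Conv2D: weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def stacked_receptive_field(kernel, n_layers):
    """Input region seen by one output pixel of n stacked stride-1 convolutions."""
    return 1 + n_layers * (kernel - 1)

channels = 32
two_3x3 = 2 * conv2d_params(3, channels, channels)
one_5x5 = conv2d_params(5, channels, channels)

print(stacked_receptive_field(3, 2))  # 5
print(two_3x3, one_5x5)               # 18496 25632
```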

#### Problem 2: Copy your baseline model and modify it to add the convolutional layer.  You may modify the fit() call to control the number of epochs.  Also, add your comments in the markdown cell after the plot_metric() cell.

In [None]:
# YOUR MODEL HERE

In [None]:
model.summary()

In [None]:
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=5, batch_size=64, validation_data=(X_test, y_test))

In [None]:
plot_metric(history)

(YOUR COMMENTS HERE)

### Experiment: Stack both convolutional layers

#### Problem 3.  Create a model like the baseline, but with each of the original convolutional layers replaced by a pair of convolutional layers.

#### YOUR CELLS HERE

### Experiment: Remove the next-to-last dense layer

#### Problem 4.  Create a model like the baseline, but remove the dense layer before the final layer.

#### YOUR CELLS HERE

### Experiment: Modify the convolutional kernel size.

#### Problem 5. The baseline has 3x3 kernels.  Make a modified version of the baseline using a different size.  However, use the same kernel size in both convolutional layers.
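
Before training, it can help to see how the kernel size changes the size of the flattened output, since that drives the parameter count of the first dense layer. This is a rough sketch assuming the baseline layout (unpadded conv, 2x2 pooling, twice) on 32x32 inputs with 64 filters in the last convolutional layer. The function name `flattened_size` is hypothetical.

```python
def flattened_size(kernel, image=32, filters=64):
    """Length of the Flatten() output for the conv -> pool -> conv -> pool layout."""
    size = image
    for _ in range(2):
        size = (size - kernel + 1) // 2  # unpadded conv, then 2x2 pooling
    return size * size * filters

for k in (3, 5, 7):
    print(k, flattened_size(k))
```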

#### YOUR CELLS HERE

### Experiment: Replace ReLU activation functions with ELU

#### Problem 6.  Create a model like the baseline, but with the ReLU activation functions replaced by ELU activation functions.
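
For reference, the two activations differ only for negative inputs: ReLU outputs zero, while ELU decays smoothly toward `-alpha`. The plain-Python sketch below uses the standard definitions with `alpha=1.0`. It is meant as an illustration of the formulas, not as a replacement for the Keras activations.

```python
import math

def relu(x):
    """Rectified linear unit: zero for negative inputs."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """Exponential linear unit: alpha * (exp(x) - 1) for non-positive inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

for x in (-3.0, -1.0, 0.0, 2.0):
    print(x, relu(x), round(elu(x), 4))
```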

#### YOUR CELLS HERE

### Experiment: Replace the rmsprop optimizer with adam

#### Problem 7.  Create a model like the baseline, but use the adam optimizer instead of rmsprop.
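
To get a feeling for how the two optimizers differ, the sketch below applies the textbook RMSprop and Adam update rules to a single scalar gradient. The default values (`lr=0.001`, `rho=0.9`, `beta1=0.9`, `beta2=0.999`) are assumed to match the usual Keras defaults, and the Keras implementations may differ in details such as where epsilon is added. The returned step is the amount that would be subtracted from the weight.

```python
import math

def rmsprop_step(grad, v, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: scale the gradient by a running RMS of past gradients."""
    v = rho * v + (1 - rho) * grad ** 2
    return lr * grad / (math.sqrt(v) + eps), v

def adam_step(grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update: bias-corrected running mean divided by running RMS."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# Size of the very first step for gradients of very different magnitude.
for grad in (0.01, 1.0, 100.0):
    rms_delta, _ = rmsprop_step(grad, v=0.0)
    adam_delta, _, _ = adam_step(grad, m=0.0, v=0.0, t=1)
    print(grad, rms_delta, adam_delta)
```

In this sketch the first step of both optimizers is nearly independent of the gradient's scale, but Adam's bias correction makes its first step close to `lr`, while RMSprop's is about `lr / sqrt(1 - rho)`.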

#### YOUR CELLS HERE

### Challenge: Create the best small model you can

#### Problem 8.  Create a CNN model, using only Conv2D(), MaxPooling2D(), Flatten(), and Dense(), with the best possible validation accuracy.

Your model must have fewer than 60,000 parameters.

Do not try ingredients outside the scope of this assignment, like pre-defined models.  The idea is for you to get a good feeling for basic CNNs at this time.

#### YOUR CELLS HERE

### Challenge: Create the best model you can

#### Problem 9: Create a CNN model, using Conv2D(), MaxPooling2D(), Flatten(), and Dense(), with the best possible validation accuracy.

You can also use dropout, batch normalization, and weight regularization.  You can modify the optimizer and its parameters.

Note that, in CNNs, when dropout is used it is often applied after pooling.

**Only use ingredients covered in class**.  For example, don't use pre-defined models.  The one exception: you can use SpatialDropout (`SpatialDropout2D` in Keras), which is a form of dropout suited to CNNs.
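
The difference between ordinary dropout and spatial dropout can be seen on a small array. The NumPy sketch below is an illustration of the idea only, assuming a single feature map of shape (height, width, channels) and a drop rate of 0.5. It does not use the Keras layers, and the variable names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.5
features = np.ones((4, 4, 8), dtype="float32")  # height, width, channels

# Standard dropout: every activation is kept or dropped independently.
element_mask = rng.random(features.shape) >= rate
dropped = features * element_mask / (1 - rate)

# Spatial dropout: one decision per channel, broadcast over height and width.
channel_mask = rng.random((1, 1, features.shape[-1])) >= rate
spatially_dropped = features * channel_mask / (1 - rate)

# Every spatial position has the same pattern of kept and dropped channels.
per_position = spatially_dropped.reshape(-1, features.shape[-1])
print(np.unique(per_position, axis=0).shape[0])  # 1
```

Because neighboring pixels in a feature map are strongly correlated, dropping whole channels removes information that element-wise dropout would leave recoverable from the neighbors.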

Your model can have as many parameters as you like.  You can also change the activation function.

Remember to start with a model that overfits, and then control overfitting.

You are encouraged to use the grid search and/or random search code from last week's assignment.
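
If you do not have that code at hand, the two search strategies can be sketched with the standard library alone. This is a generic illustration, not last week's code: the `search_space` values and the function names are hypothetical, and each configuration would still need to be turned into a model and trained.

```python
import itertools
import random

search_space = {  # hypothetical values, for illustration only
    "filters": [32, 64],
    "dropout_rate": [0.2, 0.4],
    "learning_rate": [1e-3, 3e-4],
}

def grid(space):
    """Every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]

def random_sample(space, n_trials, seed=0):
    """n_trials configurations, each value picked independently at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]

print(len(grid(search_space)))  # 8
print(random_sample(search_space, n_trials=3))
```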

#### YOUR CELLS HERE