# Introduction
One way to learn deep learning is to put its concepts into practice and experiment with many different configurations.

In this notebook, I build several models based on Convolutional Neural Networks (CNNs) and compare them to find the best one.

I do not want to spend much time on training, so I chose CIFAR-10. The dataset is relatively small and its images are tiny (32x32 pixels), so the models train quickly.

#### Data
https://www.kaggle.com/c/cifar-10/overview

#### References
To build these models, I consulted the following sources:

https://medium.com/analytics-vidhya/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

https://towardsdatascience.com/a-guide-to-an-efficient-way-to-build-neural-network-architectures-part-ii-hyper-parameter-42efca01e5d7

**Version: 1.0**

# 1. Load data

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
print(tf.__version__)

In [None]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# 2. Data understanding

In [None]:
train_images.shape

In [None]:
test_images.shape

In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    # The CIFAR labels happen to be arrays, 
    # which is why you need the extra index
    plt.xlabel(class_names[train_labels[i][0]])
plt.show()

Frequency by label

In [None]:
np.unique(train_labels, return_counts=True)
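
The raw output of `np.unique` is a pair of arrays, which is hard to read. A minimal sketch of pairing each count with its class name follows. The helper `label_frequencies` and the tiny `demo_labels` array are hypothetical stand-ins for illustration, shaped like `train_labels` (one label per row).

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def label_frequencies(labels, names):
    """Map each class name to the number of times its index appears in labels."""
    # CIFAR-10 labels have shape (N, 1), so flatten them before counting.
    counts = np.bincount(np.asarray(labels).ravel(), minlength=len(names))
    return {name: int(count) for name, count in zip(names, counts)}

# Hypothetical stand-in for train_labels, with the same (N, 1) layout.
demo_labels = np.array([[0], [3], [3], [9], [0], [3]], dtype=np.uint8)
frequencies = label_frequencies(demo_labels, class_names)
for name, count in frequencies.items():
    print(f'{name:<12}{count}')
```

On the real data, `label_frequencies(train_labels, class_names)` should report the same count for every class, since the CIFAR-10 training set is balanced.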

# 3. Modeling

## 3.1. Transform data

In [None]:
# Normalizing data
train_images = train_images / 255.0
test_images = test_images / 255.0
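
Dividing a `uint8` array by `255.0` yields `float64` values in NumPy, which take twice the memory of `float32`. The sketch below uses a small random array as a hypothetical stand-in for the image data. It checks the value range after scaling and shows an optional cast to `float32`; the cast is an assumption about what might be preferable, not something this notebook does.

```python
import numpy as np

# Hypothetical stand-in for a batch of CIFAR-10 images: 4 RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
demo_images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)

# Same operation as in the notebook: pixel values move from [0, 255] to [0, 1].
scaled = demo_images / 255.0

# Optional variant: cast first so the result stays in float32.
scaled32 = demo_images.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
print(scaled32.dtype, scaled32.nbytes, 'bytes vs', scaled.nbytes, 'bytes')
```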

## 3.2. Metrics
For this project I use accuracy and loss as metrics, because they are the most common and I want to keep things simple.

Additionally, I use avoidable bias and variance to decide whether to focus on reducing bias (performance on the training data) or variance (performance on the test data).

In [None]:
# Fraction of images a human is expected to classify correctly
humanLevelPerformance = 0.9

# Fraction of images a human is expected to misclassify (1 - humanLevelPerformance)
humanLevelError = 0.1

## 3.3. Create model

In [None]:
# Input configuration
inputHeight = train_images.shape[1]
inputWeight = train_images.shape[2]
numberChannels = train_images.shape[3]

In [None]:
# Hyperparameters

# 1028 because large batches train fast and the dataset is small
batchSize = 1028

# 500 epochs because a large number of iterations is needed to get the best results
epochs = 500
AUTOTUNE = tf.data.experimental.AUTOTUNE
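
To make the effect of these hyperparameters concrete, the sketch below computes how many weight updates one epoch performs for a given batch size. It assumes the standard 50,000-image CIFAR-10 training set; `steps_per_epoch` is a hypothetical helper name.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

num_train = 50_000   # size of the CIFAR-10 training set
batch_size = 1028    # value used in the notebook
max_epochs = 500     # upper bound; early stopping may end training sooner

steps = steps_per_epoch(num_train, batch_size)
print('Steps per epoch:', steps)
print('Maximum number of updates:', steps * max_epochs)
```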

### 3.3.1. Create a starting model
This model is based on LeNet-5, because CIFAR-10 images are small, similar in size to the inputs LeNet-5 was designed for.

In [None]:
# Large kernels suit large images (many pixels).
# Small kernels suit small images or images with many fine details.

model = tf.keras.models.Sequential()

# 3x3 filters because the input image is small and I want to capture as many details as possible.
# 32 filters as a starting point, following the LeNet-5 recommendation.
# padding='same' because I want to capture the image's borders.
# activation='relu' because it is the most commonly recommended choice.
# MaxPooling2D((2, 2)) to shrink the feature maps, which speeds up training and reduces the risk of overfitting.
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(inputHeight, inputWeight, numberChannels)))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# 3x3 filters to keep capturing as many details as possible.
# 64 filters because more filters extract more information (feature maps).
# MaxPooling2D((2, 2)) to shrink the feature maps, which speeds up training and reduces the risk of overfitting.
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# 3x3 filters to keep capturing as many details as possible.
# 64 filters because more filters extract more information (feature maps).
# MaxPooling2D((2, 2)) to shrink the feature maps, which speeds up training and reduces the risk of overfitting.
model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))

# LeNet-5 recommendations
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(10))

model.summary()
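
The numbers reported by `model.summary()` can be checked by hand. The sketch below re-derives the output shapes and parameter counts for this architecture in plain Python, assuming stride-1 convolutions and non-overlapping 2x2 pooling (the Keras defaults used above). The helpers `conv2d`, `max_pool`, and `dense` are hypothetical names for this illustration only.

```python
def conv2d(h, w, c_in, filters, k=3, padding='valid'):
    """Return the output shape and parameter count of a stride-1 convolution."""
    if padding == 'valid':
        h, w = h - k + 1, w - k + 1
    params = (k * k * c_in + 1) * filters  # weights plus one bias per filter
    return (h, w, filters), params

def max_pool(h, w, c, size=2):
    """Return the output shape of non-overlapping max pooling (floor division)."""
    return (h // size, w // size, c)

def dense(n_in, units):
    """Return the parameter count of a fully connected layer."""
    return (n_in + 1) * units

shape, total = (32, 32, 3), 0
for filters, padding in [(32, 'same'), (64, 'valid'), (64, 'valid')]:
    shape, params = conv2d(*shape, filters, padding=padding)
    total += params
    shape = max_pool(*shape)

flat = shape[0] * shape[1] * shape[2]
for n_in, units in [(flat, 128), (128, 64), (64, 10)]:
    total += dense(n_in, units)

print('Shape before Flatten:', shape)
print('Flattened features:', flat)
print('Total parameters:', total)
```

Only the first convolution uses `padding='same'`, so the later layers lose a one-pixel border each time, and the feature maps are down to 2x2 before `Flatten`.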

In [None]:
# Adam optimizer because it is the most commonly recommended choice
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
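
The last `Dense(10)` layer has no activation, so the model outputs raw scores (logits), and `from_logits=True` tells the loss to apply softmax internally. A NumPy sketch of that computation is shown below. It is a simplified illustration, not the Keras implementation, and the logits and labels are made up.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean negative log-probability of the true class, given integer labels."""
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Hypothetical logits for 2 samples and 3 classes.
demo_logits = np.array([[2.0, 1.0, 0.1],
                        [0.0, 0.0, 0.0]])
demo_labels = np.array([0, 2])

probs = softmax(demo_logits)
loss = sparse_categorical_crossentropy_from_logits(demo_labels, demo_logits)
print('Probabilities:\n', probs)
print('Loss:', loss)
```

For the same reason, the output of `model.predict(...)` would need to pass through a softmax (for example `tf.nn.softmax`) before it can be read as class probabilities.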

In [None]:
# EarlyStopping halts training once the training loss has not improved for 5 consecutive epochs
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=batchSize, validation_data=(test_images, test_labels),
                    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)])
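
The patience rule behind `EarlyStopping` can be illustrated in a few lines of plain Python. This is a simplified sketch of the idea, not the Keras implementation; `stopping_epoch` and the loss values are hypothetical.

```python
def stopping_epoch(losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0  # epochs since the monitored value last improved
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical training-loss curve: the best value is reached at epoch 2.
demo_losses = [1.0, 0.9, 0.8, 0.85, 0.82, 0.81, 0.83, 0.84, 0.7]
print('Stops at epoch:', stopping_epoch(demo_losses, patience=5))
```

Monitoring `val_loss` instead of `loss` would make the rule react to overfitting rather than to the training loss. Whether that is preferable here is a judgment call.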

In [None]:
# Results from the last epoch
print('Loss:', history.history['loss'][-1])
print('Accuracy:', history.history['accuracy'][-1])
print('Val Loss:', history.history['val_loss'][-1])
print('Val Accuracy:', history.history['val_accuracy'][-1])
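
The last entry of `history.history` is the final epoch, which is not necessarily the best one. A possible way to look up the epoch with the lowest validation loss is sketched below. `best_epoch_metrics` is a hypothetical helper, and `demo_history` contains made-up values shaped like `history.history`, not results from this notebook.

```python
def best_epoch_metrics(history_dict, monitor='val_loss'):
    """Return the index of the epoch with the lowest monitored value and its metrics."""
    values = history_dict[monitor]
    best = min(range(len(values)), key=values.__getitem__)
    return best, {name: series[best] for name, series in history_dict.items()}

# Hypothetical values shaped like history.history; these are not results from the notebook.
demo_history = {
    'loss':         [1.60, 1.20, 0.90, 0.70],
    'accuracy':     [0.40, 0.57, 0.68, 0.75],
    'val_loss':     [1.50, 1.15, 1.05, 1.10],
    'val_accuracy': [0.45, 0.59, 0.63, 0.62],
}

epoch, metrics = best_epoch_metrics(demo_history)
print('Best epoch:', epoch)
print(metrics)
```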

In [None]:
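# Avoidable bias: gap between the training loss and the human-level error
avoidableBias = history.history['loss'][-1] - humanLevelError
# Variance: gap between the validation loss and the training loss
variance = history.history['val_loss'][-1] - history.history['loss'][-1]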

print('Avoidable bias:', avoidableBias)
print('Variance:', variance)

if avoidableBias < variance:
  print('It is necessary to reduce variance')
else:
  print('It is necessary to reduce bias')
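
The code above compares cross-entropy loss values with `humanLevelError`, which is an error rate, so the two quantities are on different scales. An alternative sketch, assuming the error rate is defined as `1 - accuracy`, keeps everything in the same units. `bias_variance` is a hypothetical helper, and the accuracies passed to it are made up for illustration.

```python
def bias_variance(train_accuracy, val_accuracy, human_level_error=0.1):
    """Return (avoidable bias, variance) computed from error rates."""
    train_error = 1.0 - train_accuracy
    val_error = 1.0 - val_accuracy
    avoidable_bias = train_error - human_level_error
    variance = val_error - train_error
    return avoidable_bias, variance

# Hypothetical accuracies, not results from the notebook.
bias, variance = bias_variance(train_accuracy=0.95, val_accuracy=0.70)
print('Avoidable bias:', round(bias, 4))
print('Variance:', round(variance, 4))
print('Reduce variance' if bias < variance else 'Reduce bias')
```

With real results, the inputs would be `history.history['accuracy'][-1]` and `history.history['val_accuracy'][-1]`.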

Observations
* Both accuracy and loss are close to human-level performance, but unfortunately the model overfits: the gap between training and validation metrics is large.
* Variance is greater than avoidable bias, which means the next step is to reduce variance.

### 3.3.2. Modify model to reduce variance
This model adds regularization (dropout and L2) to reduce the variance.

In [None]:
model = tf.keras.models.Sequential()

# Add Dropout; it performed better in the convolutional layers
model.add(tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(inputHeight, inputWeight, numberChannels)))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Conv2D(64, (3, 3), activation='relu'))
model.add(tf.keras.layers.MaxPooling2D((2, 2)))
model.add(tf.keras.layers.Dropout(0.2))

# Add L2 regularization; it performed better in the fully connected layers
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu', kernel_regularizer = tf.keras.regularizers.L2(0.01)))
model.add(tf.keras.layers.Dense(64, activation='relu', kernel_regularizer = tf.keras.regularizers.L2(0.01)))
model.add(tf.keras.layers.Dense(10))

model.summary()
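
What the two regularizers do numerically can be sketched in NumPy. During training, dropout zeroes a random fraction of activations and rescales the survivors, while L2 regularization adds a penalty proportional to the sum of squared weights to the loss. The functions below are simplified illustrations with hypothetical names, not the Keras implementations.

```python
import numpy as np

def inverted_dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

def l2_penalty(weights, factor=0.01):
    """Penalty added to the loss: factor times the sum of squared weights."""
    return factor * float(np.sum(np.square(weights)))

rng = np.random.default_rng(0)
activations = np.ones((4, 5))
dropped = inverted_dropout(activations, rate=0.2, rng=rng)
print(dropped)  # entries are either 0.0 or 1 / 0.8 = 1.25

demo_weights = np.array([[1.0, 2.0], [3.0, 4.0]])
print('L2 penalty:', l2_penalty(demo_weights))
```

Dropout is only active during training, which is one reason the training loss can end up higher than the validation loss for a regularized model.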

In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

In [None]:
history = model.fit(train_images, train_labels, epochs=epochs, batch_size=batchSize, validation_data=(test_images, test_labels),
                    callbacks=[tf.keras.callbacks.EarlyStopping(monitor='loss', patience=5)])

In [None]:
# Results from the last epoch
print('Loss:', history.history['loss'][-1])
print('Accuracy:', history.history['accuracy'][-1])
print('Val Loss:', history.history['val_loss'][-1])
print('Val Accuracy:', history.history['val_accuracy'][-1])

In [None]:
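# Avoidable bias: gap between the training loss and the human-level error
avoidableBias = history.history['loss'][-1] - humanLevelError
# Variance: gap between the validation loss and the training loss
variance = history.history['val_loss'][-1] - history.history['loss'][-1]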

print('Avoidable bias:', avoidableBias)
print('Variance:', variance)

if avoidableBias < variance:
  print('It is necessary to reduce variance')
else:
  print('It is necessary to reduce bias')

Observations:
* The overfitting was eliminated. In fact, the validation metrics are lower than the training metrics.
* Avoidable bias is greater than variance, so the next model will focus on reducing bias (training).

# 4. Conclusions
The first model focused on reducing bias, because my first step was to find a model that performs well on the training data. Once I had a good starting model, the next step was to eliminate overfitting.