# Work 02: Convolutional neural networks (CNNs)
# Leonardo Gargitter GRR20172145

## Part 01: Theoretical

### 1. What is deep learning?

Deep learning is a subset of machine learning based on neural networks with three or more layers. It removes some of the data pre-processing that machine learning typically requires: these algorithms can ingest and process unstructured data, such as text and images, and they automate feature extraction, which reduces the dependency on human experts.

### 2. Why do we prefer CNNs over shallow artificial neural networks for image data?

For image data we prefer CNNs because they scale better. Shallow artificial neural networks, and even vanilla deep neural networks, become very heavy as image size increases, since every neuron in a fully connected layer needs one weight per input pixel. Relying only on fully connected layers also makes the model more likely to overfit the training dataset. Inspired by how human vision works, the layers of a convolutional network arrange their neurons in three dimensions: width, height, and depth. Each neuron in a convolutional layer is connected only to a small, local region of the preceding layer, which avoids the wastefulness of full connectivity.
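
The scaling argument can be made concrete by counting parameters. The sketch below is only an illustration: the layer sizes (128 dense units, six 5x5 filters) and the 224x224 image are assumed values chosen for comparison, and the helper names are hypothetical.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Weights plus biases of a 2D convolution; independent of the image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


for side in (28, 224):
    n_pixels = side * side  # single-channel image flattened to a vector
    print(
        f"{side}x{side}: dense(128) = {dense_params(n_pixels, 128)} parameters, "
        f"conv(6 filters, 5x5) = {conv2d_params(5, 5, 1, 6)} parameters"
    )
```

The dense layer grows linearly with the number of pixels, while the convolutional layer keeps the same number of parameters for any image size.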

### 3. Explain the role of the convolution layer in a CNN design.

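Convolutional neural networks are neural networks that use convolution in place of general matrix multiplication in at least one of their layers. Convolution brings three properties: sparse interactions (each output depends on a small patch of the input), parameter sharing (the same kernel is applied at every position), and equivariant representations (when the input shifts, the output shifts in the same way).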
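
These three properties can be seen in a few lines of NumPy. This is a minimal sketch, assuming a single-channel image, a single kernel, and no padding; `conv2d_valid` is a hypothetical helper, not a Keras function.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' 2D cross-correlation, the operation deep learning libraries call convolution."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sparse interaction: each output value sees only a kh x kw patch.
            # Parameter sharing: the same kernel is reused at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))

# Equivariance to translation: shifting the input one pixel to the right
# shifts the feature map one pixel to the right (away from the wrapped border).
shifted = np.roll(image, shift=1, axis=1)
out = conv2d_valid(image, kernel)
out_shifted = conv2d_valid(shifted, kernel)
print(np.allclose(out[:, :-1], out_shifted[:, 1:]))
```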

### 4. What is the role of the fully connected (FC) layer in CNN?

### 5. Why do we use a pooling layer in a CNN?

Pooling is a subsampling layer in a CNN: it replaces the output at a location with a summary statistic of the nearby outputs. It is used for dimensionality reduction and to make the representation approximately invariant to small transformations of the input, most notably small translations and, to a lesser degree, small rotations.
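
As a small illustration of "summary statistic of nearby outputs", the sketch below applies three common statistics (maximum, mean, sum) to non-overlapping 2x2 windows. The feature map values and the `pool2d` helper are made up for the example.

```python
import numpy as np


def pool2d(feature_map: np.ndarray, size: int, reduce_fn) -> np.ndarray:
    """Non-overlapping pooling: summarize each size x size window with reduce_fn."""
    h, w = feature_map.shape
    # Drop any ragged border, then split the map into size x size windows.
    trimmed = feature_map[:h - h % size, :w - w % size]
    windows = trimmed.reshape(h // size, size, w // size, size)
    return reduce_fn(windows, axis=(1, 3))


feature_map = np.array([
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
], dtype=float)

max_pooled = pool2d(feature_map, 2, np.max)   # strongest response per window
avg_pooled = pool2d(feature_map, 2, np.mean)  # average response per window
sum_pooled = pool2d(feature_map, 2, np.sum)   # total response per window

print(max_pooled)
print(avg_pooled)
print(sum_pooled)
```

In every case the 4x4 map is reduced to 2x2, which is the dimensionality reduction mentioned above.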

### 6. Explain the characteristics of the following pooling approaches: max pooling, average pooling, and sum pooling.

Max pooling:

## Part 02: Practical

### 1. Test 5 setups for the CNNs, focusing on how accuracy improves (during the training and test phases) over 20 (or more) learning epochs.

### 2. What was the best setup in terms of performance metrics (accuracy or MSE, Mean Squared Error) for the CNNs evaluated on the MNIST and MNIST-like fashion product databases?

# Datasets

## MNIST

In [1]:
import matplotlib.pyplot as plt
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

rows, cols = 28,28

x_train = x_train[0:3000]
y_train = y_train[0:3000]

x_test = x_test[0:500]
y_test = y_test[0:500]

x_train = x_train.reshape(x_train.shape[0], rows, cols, 1)
x_test = x_test.reshape(x_test.shape[0], rows, cols, 1)

input_shape = (rows, cols, 1)

# Normalize the pixel values of the images (tf.keras.utils.normalize applies L2 normalization along the given axis)
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
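
The `tf.keras.utils.normalize` call above divides by the L2 norm along the chosen axis. It does not rescale pixels to the [0, 1] range. The NumPy sketch below mirrors that behaviour on a tiny made-up array and contrasts it with the more common division by 255. `l2_normalize` is a hypothetical helper written for this illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int) -> np.ndarray:
    """Divide by the L2 norm along `axis`, leaving all-zero slices unchanged."""
    norm = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norm[norm == 0] = 1.0  # avoid division by zero for blank columns
    return x / norm


# One hypothetical 2x2 single-channel "image" in (batch, rows, cols, channels) layout.
batch = np.array([[[[3.0], [0.0]],
                   [[4.0], [0.0]]]])

l2_scaled = l2_normalize(batch, axis=1)  # each image column gets unit L2 norm
range_scaled = batch / 255.0             # the more common [0, 1] rescaling

print(l2_scaled[0, :, 0, 0])     # first column after L2 normalization
print(range_scaled[0, :, 0, 0])  # first column after division by 255
```

With `axis=1` on a `(batch, rows, cols, channels)` array, each column of each image is scaled independently, so relative brightness between columns is not preserved.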

## Fashion MNIST

In [2]:
fashion_mnist = tf.keras.datasets.fashion_mnist
(fashion_x_train, fashion_y_train), (fashion_x_test, fashion_y_test) = fashion_mnist.load_data()

rows, cols = 28,28

fashion_x_train = fashion_x_train[0:3000]
fashion_y_train = fashion_y_train[0:3000]

fashion_x_test = fashion_x_test[0:500]
fashion_y_test = fashion_y_test[0:500]

fashion_x_train = fashion_x_train.reshape(fashion_x_train.shape[0], rows, cols, 1)
fashion_x_test = fashion_x_test.reshape(fashion_x_test.shape[0], rows, cols, 1)

input_shape = (rows, cols, 1)

# Normalize the pixel values of the images (tf.keras.utils.normalize applies L2 normalization along the given axis)
fashion_x_train = tf.keras.utils.normalize(fashion_x_train, axis=1)
fashion_x_test = tf.keras.utils.normalize(fashion_x_test, axis=1)

fashion_y_train = tf.keras.utils.to_categorical(fashion_y_train, 10)
fashion_y_test = tf.keras.utils.to_categorical(fashion_y_test, 10)



# Models

## Simple DNN

In this model we use only fully connected layers: three hidden layers of 128 units each, followed by a 10-unit softmax output layer.

In [10]:
dnn = tf.keras.Sequential()
dnn.add(tf.keras.layers.Flatten())
dnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
dnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
dnn.add(tf.keras.layers.Dense(units=128, activation='relu'))
dnn.add(tf.keras.layers.Dense(units=10, activation='softmax'))
dnn.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy', 'mse'])
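
The softmax output and the categorical cross-entropy loss used at compile time can be illustrated outside Keras. This is a simplified NumPy sketch with made-up logits and labels. It is not the Keras implementation, and the clipping constant `eps` is an assumption.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def categorical_crossentropy(y_true: np.ndarray, y_prob: np.ndarray, eps: float = 1e-7) -> float:
    """Mean negative log-probability assigned to the true class (one-hot targets)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-np.mean(np.sum(y_true * np.log(y_prob), axis=1)))


logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])  # hypothetical pre-activation outputs
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # one-hot labels, as produced by to_categorical

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1), loss)
```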

## LeNet-5
http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf

I replicated the LeNet-5 CNN and chose two different types of pooling layers.

### Max Pooling:

In [11]:
# LeNet using max pooling
lenet_max_pooling = tf.keras.Sequential()

# Convolution 1
lenet_max_pooling.add(tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
# Max pooling 1
lenet_max_pooling.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Convolution 2
lenet_max_pooling.add(tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5), strides=(1, 1), activation='relu'))
# Max pooling 2
lenet_max_pooling.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Flatten (convert the 2D feature maps into a vector)
lenet_max_pooling.add(tf.keras.layers.Flatten())
# Dense (fully connected)
lenet_max_pooling.add(tf.keras.layers.Dense(units=120, activation='relu'))
# Flatten (redundant: the input is already a vector, so this layer is a no-op)
lenet_max_pooling.add(tf.keras.layers.Flatten())
# Dense (fully connected)
lenet_max_pooling.add(tf.keras.layers.Dense(units=84, activation='relu'))
# Output
lenet_max_pooling.add(tf.keras.layers.Dense(units=10, activation='softmax'))
# Compile the model
lenet_max_pooling.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy', 'mse'])
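
To follow how the feature maps shrink through the layers defined above, the sketch below traces the spatial size by hand. It assumes the Keras default `'valid'` padding for both the convolution and pooling layers. `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a convolution or pooling window with 'valid' padding."""
    return (size - kernel) // stride + 1


size = 28                                          # input is 28 x 28 x 1
size = conv_output_size(size, kernel=5)            # Conv2D 5x5    -> 24 x 24 x 6
size = conv_output_size(size, kernel=2, stride=2)  # pooling 2x2   -> 12 x 12 x 6
size = conv_output_size(size, kernel=5)            # Conv2D 5x5    ->  8 x  8 x 16
size = conv_output_size(size, kernel=2, stride=2)  # pooling 2x2   ->  4 x  4 x 16
flattened = size * size * 16                       # Flatten       -> 256 features
print(size, flattened)
```

The same trace applies to the average pooling variant, since only the summary statistic changes and not the window geometry.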

### Average Pooling:

In [12]:
# LeNet using average pooling
lenet_avg_pooling = tf.keras.Sequential()

# Convolution 1
lenet_avg_pooling.add(tf.keras.layers.Conv2D(filters=6, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=input_shape))
# Average pooling 1
lenet_avg_pooling.add(tf.keras.layers.AvgPool2D(pool_size=(2, 2), strides=(2, 2)))
# Convolution 2
lenet_avg_pooling.add(tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5), strides=(1, 1), activation='relu'))
# Average pooling 2
lenet_avg_pooling.add(tf.keras.layers.AvgPool2D(pool_size=(2, 2), strides=(2, 2)))
# Flatten (convert the 2D feature maps into a vector)
lenet_avg_pooling.add(tf.keras.layers.Flatten())
# Dense (fully connected)
lenet_avg_pooling.add(tf.keras.layers.Dense(units=120, activation='relu'))
# Flatten (redundant: the input is already a vector, so this layer is a no-op)
lenet_avg_pooling.add(tf.keras.layers.Flatten())
# Dense (fully connected)
lenet_avg_pooling.add(tf.keras.layers.Dense(units=84, activation='relu'))
# Output
lenet_avg_pooling.add(tf.keras.layers.Dense(units=10, activation='softmax'))
# Compile the model
lenet_avg_pooling.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy', 'mse'])

## Multi-column Deep Neural Networks for Image Classification
Authors: Dan Ciresan, Ueli Meier, Jurgen Schmidhuber
https://arxiv.org/abs/1202.2745

I replicated the CNN from the article above and chose two different activation functions.

### ReLU:

In [13]:
ciresan_relu = tf.keras.Sequential()
ciresan_relu.add(tf.keras.layers.Conv2D(20, (3, 3), activation='relu', input_shape=input_shape))
ciresan_relu.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
ciresan_relu.add(tf.keras.layers.Conv2D(800, (5, 5), activation='relu'))
ciresan_relu.add(tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=(3, 3)))
ciresan_relu.add(tf.keras.layers.Flatten())
ciresan_relu.add(tf.keras.layers.Dense(150, activation='relu', kernel_initializer='he_uniform'))
ciresan_relu.add(tf.keras.layers.Dense(10, activation='softmax'))
ciresan_relu.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy', 'mse'])
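
A rough estimate of the size of the model as written above can be obtained by counting parameters layer by layer. This sketch assumes the Keras default `'valid'` padding and a 28x28x1 input. The helper and variable names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial size after a 'valid' convolution or pooling window."""
    return (size - kernel) // stride + 1


side, channels = 28, 1
params = {}

params["conv1"] = 3 * 3 * channels * 20 + 20    # Conv2D(20, 3x3): weights + biases
side, channels = conv_out(side, 3), 20          # -> 26 x 26 x 20
side = conv_out(side, 2, 2)                     # MaxPool2D 2x2 -> 13 x 13 x 20

params["conv2"] = 5 * 5 * channels * 800 + 800  # Conv2D(800, 5x5)
side, channels = conv_out(side, 5), 800         # -> 9 x 9 x 800
side = conv_out(side, 3, 3)                     # MaxPool2D 3x3 -> 3 x 3 x 800

flat = side * side * channels                   # Flatten -> 7200 features
params["dense1"] = flat * 150 + 150             # Dense(150)
params["dense2"] = 150 * 10 + 10                # Dense(10)

total = sum(params.values())
print(params, total)
```

Most of the parameters sit in the first dense layer and in the 800-filter convolution, which is consistent with this model being the slowest one to train in the cells that follow.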

### Sigmoid:

In [14]:
ciresan_sigmoid = tf.keras.Sequential()
ciresan_sigmoid.add(tf.keras.layers.Conv2D(20, (3, 3), activation='sigmoid', input_shape=input_shape))
ciresan_sigmoid.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
ciresan_sigmoid.add(tf.keras.layers.Conv2D(800, (5, 5), activation='sigmoid'))
ciresan_sigmoid.add(tf.keras.layers.MaxPool2D(pool_size=(3, 3), strides=(3, 3)))
ciresan_sigmoid.add(tf.keras.layers.Flatten())
ciresan_sigmoid.add(tf.keras.layers.Dense(150, activation='sigmoid', kernel_initializer='he_uniform'))
ciresan_sigmoid.add(tf.keras.layers.Dense(10, activation='softmax'))
ciresan_sigmoid.compile(loss='categorical_crossentropy', optimizer='Adam', metrics=['accuracy', 'mse'])

# Training and Evaluation

## Training MNIST

In [15]:
dnn.fit(x_train, y_train, epochs=20)


In [16]:
lenet_max_pooling.fit(x_train, y_train, epochs=20)


In [17]:
lenet_avg_pooling.fit(x_train, y_train, epochs=20)


In [18]:
ciresan_relu.fit(x_train, y_train, epochs=20)


In [19]:
ciresan_sigmoid.fit(x_train, y_train, epochs=20)


## Evaluation MNIST
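
Each `evaluate` call in this section returns the loss followed by the metrics requested at compile time (`accuracy` and `mse`). The sketch below shows how those two metrics can be computed from one-hot labels and softmax outputs. It assumes that `accuracy` resolves to categorical accuracy for this loss. The arrays are hypothetical values, not results from this notebook.

```python
import numpy as np


def categorical_accuracy(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


def mean_squared_error(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Squared difference averaged over every class and every sample."""
    return float(np.mean((y_true - y_prob) ** 2))


# Hypothetical predictions for three samples and three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.8, 0.1, 0.1], [0.3, 0.4, 0.3], [0.5, 0.3, 0.2]])

acc = categorical_accuracy(y_true, y_prob)
mse = mean_squared_error(y_true, y_prob)
print(acc, mse)
```

Accuracy only checks whether the top class is right, while MSE also penalizes low confidence in a correct prediction, so the two metrics can rank models differently.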

In [20]:
print('Simple DNN results:')
dnn_results_MNIST = dnn.evaluate(x_test, y_test)
print('LeNet Max Pooling results:')
lenet_max_pooling_results_MNIST = lenet_max_pooling.evaluate(x_test, y_test)
print('LeNet Average Pooling results:')
lenet_avg_pooling_results_MNIST = lenet_avg_pooling.evaluate(x_test, y_test)
print('Ciresan ReLU results:')
ciresan_relu_results_MNIST = ciresan_relu.evaluate(x_test, y_test)
print('Ciresan Sigmoid results:')
ciresan_sigmoid_results_MNIST = ciresan_sigmoid.evaluate(x_test, y_test)

Simple DNN results:
LeNet Max Pooling results:
LeNet Average Pooling results:
Ciresan ReLU results:
Ciresan Sigmoid results:


## Training Fashion MNIST

In [21]:
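# Note: the models in this section are the same objects already fitted on MNIST,
# so training continues from those weights rather than from a fresh initialization.
dnn.fit(fashion_x_train, fashion_y_train, epochs=20)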


In [22]:
lenet_max_pooling.fit(fashion_x_train, fashion_y_train, epochs=20)


In [23]:
lenet_avg_pooling.fit(fashion_x_train, fashion_y_train, epochs=20)


In [24]:
ciresan_relu.fit(fashion_x_train, fashion_y_train, epochs=20)


In [25]:
ciresan_sigmoid.fit(fashion_x_train, fashion_y_train, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20

KeyboardInterrupt: 

## Evaluation Fashion MNIST

In [26]:
print('Simple DNN results:')
dnn_results_FASHION = dnn.evaluate(fashion_x_test, fashion_y_test)
print('LeNet Max Pooling results:')
lenet_max_pooling_results_FASHION = lenet_max_pooling.evaluate(fashion_x_test, fashion_y_test)
print('LeNet Average Pooling results:')
lenet_avg_pooling_results_FASHION = lenet_avg_pooling.evaluate(fashion_x_test, fashion_y_test)
print('Ciresan ReLU results:')
ciresan_relu_results_FASHION = ciresan_relu.evaluate(fashion_x_test, fashion_y_test)
print('Ciresan Sigmoid results:')
ciresan_sigmoid_results_FASHION = ciresan_sigmoid.evaluate(fashion_x_test, fashion_y_test)

Simple DNN results:
LeNet Max Pooling results:
LeNet Average Pooling results:
Ciresan ReLU results:
Ciresan Sigmoid results:


# Results summary