# Feed-forward neural networks, Autoencoders and CNNs

This Jupyter notebook covers the following topics:

* Feed-forward neural network 
* Autoencoder
* Convolutional neural network

We're going to use the Keras module from the TensorFlow package. TensorFlow is an open-source library for machine learning and deep learning created by Google.

First, we import the MNIST dataset (a large database of handwritten digits) and prepare it for the neural network. This includes normalizing the images (scaling each pixel to a value between 0 and 1) and flattening them (turning each (28, 28) matrix into a vector of 784 values).

In [None]:
import numpy as np
import mnist
from matplotlib import pyplot as plt
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.optimizers import Adam

# Load the data
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images
train_images = (train_images / 255) 
test_images = (test_images / 255)

# Flatten the images
train_images = train_images.reshape((-1, 784))
test_images = test_images.reshape((-1, 784))

**Task:** Complete the code below to build a 3-layer model with the following conditions:
* the activation function for both hidden layers is **sigmoid**
* each hidden layer has **64 units**
* the activation function for the output layer is **softmax**

*Notes:* 
* the input_shape parameter is only needed in the first hidden layer
* the number of output units should equal the number of classes that we want to distinguish (in our case, the 10 digits)
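
The link between the 10 output units and the 10 classes can be sketched in plain NumPy. This is only an illustration under stated assumptions: the helper names `one_hot` and `softmax` are hypothetical, and the random scores stand in for the raw outputs of the last layer. The one-hot rows are the kind of target that `to_categorical` produces for the labels.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows, one column per class."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

labels = np.array([3, 0, 9])                               # three example digits
targets = one_hot(labels, num_classes=10)                  # shape (3, 10)
scores = np.random.default_rng(0).normal(size=(3, 10))     # made-up raw outputs
probabilities = softmax(scores)                            # each row sums to 1
```

Each row of `probabilities` can be read as the model's belief over the 10 digits, which is why the output layer needs exactly one unit per class.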



In [None]:
# Define variables
input_num_units = 784
hidden_units = 64 
output_units = 10 

# Build the model
model = Sequential([
    Dense(hidden_units, activation='sigmoid', input_shape=(input_num_units,)),
    Dense(hidden_units, activation='sigmoid'), 
    Dense(output_units, activation='softmax'), 
])

# Compile the model
model.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy', # using this loss function because we have more than 2 classes to classify
    metrics=['accuracy'],
)
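
As a rough illustration of what the 'categorical_crossentropy' loss measures, the sketch below computes it by hand for a made-up 3-class example. The function name and the clipping constant `eps` are assumptions of this sketch, not Keras internals.

```python
import numpy as np

def categorical_crossentropy(targets, probabilities, eps=1e-7):
    """Mean over samples of -sum_k t_k * log(p_k) for one-hot targets t."""
    clipped = np.clip(probabilities, eps, 1.0)   # avoid log(0)
    return float(np.mean(-np.sum(targets * np.log(clipped), axis=1)))

# Two samples, three classes (made-up numbers)
targets = np.array([[0.0, 1.0, 0.0],
                    [1.0, 0.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05],
                      [0.80, 0.10, 0.10]])
uniform = np.full((2, 3), 1.0 / 3.0)

loss_confident = categorical_crossentropy(targets, confident)
loss_uniform = categorical_crossentropy(targets, uniform)   # equals log(3)
```

Predictions that put most of the probability on the correct class give a lower loss than a uniform guess, which is the behaviour training tries to reinforce.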

**Task:** Copy your model from the cell above and add 2 Dropout layers (each with a 50% dropout rate), one after each hidden layer.
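
What a Dropout layer does to its input can be sketched in NumPy as follows. This is a minimal version of "inverted" dropout, assuming the hypothetical helper name `dropout`: during training a random fraction `rate` of the units is set to zero and the remaining ones are scaled by 1 / (1 - rate), while at inference time the input passes through unchanged.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 64))                                     # stand-in for hidden activations
dropped = dropout(hidden, rate=0.5, rng=rng)                     # training mode
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)   # inference mode
```

With a rate of 0.5, roughly half of the entries of `dropped` are 0 and the others are 2, so the average activation stays close to 1.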


In [None]:
# Define variables
input_num_units = 784
hidden_units = 64 
output_units = 10
dropout_rate = 0.5

# Build the model
model_drop = Sequential([
    Dense(hidden_units, activation='sigmoid', input_shape=(input_num_units,)),
    Dropout(dropout_rate), 
    Dense(hidden_units, activation='sigmoid'), 
    Dropout(dropout_rate),
    Dense(output_units, activation='softmax'),
])

# Compile the model
model_drop.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)

In [None]:
print("Model without dropout layers.\n")

# Train the model without dropout layers
history = model.fit(
  train_images,
  to_categorical(train_labels),
  epochs=5,
  batch_size=100,
)

print("\nModel with 2 dropout layers.\n")

# Train the model with dropout layers
history_drop = model_drop.fit(
  train_images,
  to_categorical(train_labels),
  epochs=5,
  batch_size=100,
)

As you can see, in our case the model with dropout layers converges more slowly than the model without them.
Next, we plot the results for both models.

In [None]:
# Plot the training loss for both models
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6))
ax1.plot(history.history['loss'], 'r', label='without_drop')
ax1.plot(history_drop.history['loss'], 'b', label='with_drop')
ax1.set_xlabel(r'Epoch', fontsize=20)
ax1.set_ylabel(r'Loss', fontsize=20)
ax1.legend()
ax1.tick_params(labelsize=20)

# Plot the training accuracy for both models
ax2.plot(history.history['accuracy'], 'r', label='without_drop')
ax2.plot(history_drop.history['accuracy'], 'b', label='with_drop')
ax2.set_xlabel(r'Epoch', fontsize=20)
ax2.set_ylabel(r'Accuracy', fontsize=20)
ax2.legend()
ax2.tick_params(labelsize=20)

In [None]:
# Evaluate model without dropout layers
model.evaluate(
  test_images,
  to_categorical(test_labels)
)

# Evaluate model with dropout layers
model_drop.evaluate(
  test_images,
  to_categorical(test_labels)
)
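
For a model compiled with metrics=['accuracy'], evaluate returns the loss followed by the accuracy. How such an accuracy value is obtained from the softmax outputs can be sketched as below. The helper name `accuracy` and the tiny 3-class example are made up for illustration.

```python
import numpy as np

def accuracy(true_labels, probabilities):
    """Fraction of samples whose highest-probability class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == true_labels))

true_labels = np.array([2, 0, 1, 1])
probabilities = np.array([
    [0.1, 0.2, 0.7],   # predicts class 2, correct
    [0.6, 0.3, 0.1],   # predicts class 0, correct
    [0.5, 0.4, 0.1],   # predicts class 0, wrong
    [0.2, 0.7, 0.1],   # predicts class 1, correct
])
score = accuracy(true_labels, probabilities)   # 3 of 4 predictions are correct
```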

**Question:** What is the difference between the accuracy values you see in the plot above and the accuracy values you received just now?

## Autoencoders

First, we show the structure of a simple autoencoder. It has only one layer for encoding and one layer for decoding.

We use the same MNIST dataset for this task. An autoencoder is a form of unsupervised learning: the input images also serve as the targets, which is why we don't need the labels.
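
The data flow through such a one-layer encoder and one-layer decoder can be sketched in NumPy. The weights here are random and untrained, and the names `encode` and `decode` are hypothetical; the sketch only shows how a 784-value image is squeezed into 32 values and expanded again, using the same ReLU and sigmoid activations as the Keras code in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, encoding_dim = 784, 32

# Randomly initialised (untrained) weights, for shape illustration only
w_enc = rng.normal(scale=0.05, size=(input_dim, encoding_dim))
b_enc = np.zeros(encoding_dim)
w_dec = rng.normal(scale=0.05, size=(encoding_dim, input_dim))
b_dec = np.zeros(input_dim)

def encode(x):
    """Dense layer with ReLU activation: 784 values -> 32 values."""
    return np.maximum(0.0, x @ w_enc + b_enc)

def decode(code):
    """Dense layer with sigmoid activation: 32 values -> 784 values."""
    return 1.0 / (1.0 + np.exp(-(code @ w_dec + b_dec)))

images = rng.random((5, input_dim))     # stand-in for 5 flattened images
codes = encode(images)                  # shape (5, 32)
reconstructions = decode(codes)         # shape (5, 784), values between 0 and 1
```

Training adjusts the weights so that `reconstructions` gets as close as possible to `images`, with no labels involved.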

In [None]:
# Define variables
encoding_dim = 32 # size of the encoded representation
input_dim = 784 # size of the input

input_img = Input(shape=(input_dim,)) # input image

# Maps input image to its encoded representation
encoded = Dense(encoding_dim, activation='relu')(input_img)

# Maps encoded representation to its reconstruction
decoded = Dense(input_dim, activation='sigmoid')(encoded)

# Maps input image to its reconstruction
simple_autoencoder = Model(input_img, decoded)

Now we train our simple autoencoder. The next cell takes some time to execute.

In [None]:
# Compile the autoencoder
simple_autoencoder.compile(
    optimizer='adam', 
    loss='binary_crossentropy')

# Train the autoencoder
simple_autoencoder.fit(train_images, train_images,
                epochs=25,
                batch_size=256,
                shuffle=True,
                validation_data=(test_images, test_images))
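
The 'binary_crossentropy' loss used above treats every pixel as a value between 0 and 1 and compares the original with the reconstruction pixel by pixel. A minimal NumPy sketch, assuming a hypothetical helper name, a clipping constant `eps`, and random arrays in place of real images:

```python
import numpy as np

def binary_crossentropy(originals, reconstructions, eps=1e-7):
    """Mean per-pixel binary cross-entropy between targets and predictions."""
    p = np.clip(reconstructions, eps, 1.0 - eps)   # avoid log(0)
    per_pixel = -(originals * np.log(p) + (1.0 - originals) * np.log(1.0 - p))
    return float(per_pixel.mean())

rng = np.random.default_rng(0)
originals = rng.random((4, 784))                    # stand-in for 4 images
# A reconstruction that is close to the original (small added noise)
good = np.clip(originals + rng.normal(scale=0.02, size=originals.shape), 0, 1)
# A reconstruction that is unrelated to the original
bad = rng.random((4, 784))

loss_good = binary_crossentropy(originals, good)
loss_bad = binary_crossentropy(originals, bad)
```

The closer reconstruction gets the lower loss. Note that for pixel values strictly between 0 and 1 this loss does not reach zero even for a perfect reconstruction.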

Now we build a more complex autoencoder with multiple layers for encoding and decoding.

The structure of a single layer looks like this:

current_layer_output = Dense(num_dimensions, activation="...")(previous_layer_output)

**Task:** Build an autoencoder with 3 layers for encoding and 3 layers for decoding. Use 'relu' as the activation function for all layers except the last decoding layer, which uses 'sigmoid'.

*Notes:* 
* num_dimensions for the last encoding layer should equal the desired size of the encoded representation (we use 32, as in the previous example)
* num_dimensions for the last decoding layer should equal the original image size (in our case, 784)
* you can use the following dimensions in your encoding layers: (128, 64, 32)
* you can use the following dimensions in your decoding layers: (64, 128, 784)
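
To get a feeling for how much larger the deep autoencoder is, the number of trainable parameters can be counted from the layer sizes alone. The helper `dense_parameter_count` is a hypothetical name for this sketch; it assumes every layer is a fully connected layer with a bias term.

```python
def dense_parameter_count(layer_sizes):
    """Weights plus biases for a chain of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

simple_sizes = [784, 32, 784]                     # input -> code -> output
deep_sizes = [784, 128, 64, 32, 64, 128, 784]     # suggested sizes from the notes

simple_params = dense_parameter_count(simple_sizes)   # 50992
deep_params = dense_parameter_count(deep_sizes)       # 222384
```

Both networks compress each image to 32 values, but the deep version has more than four times as many parameters to learn the mapping.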

In [None]:
# Define variables
encoding_dim = 32 # size of the encoded representation
input_dim = 784 # size of the input

input_img = Input(shape=(input_dim,)) # input image

# Maps input image to its encoded representation in multiple steps
encoded_1 = Dense(128, activation='relu')(input_img)
encoded_2 = Dense(64, activation='relu')(encoded_1) 
encoded_3 = Dense(32, activation='relu')(encoded_2) 

# Maps encoded representation to its reconstruction in multiple steps
decoded_1 = Dense(64, activation='relu')(encoded_3) 
decoded_2 = Dense(128, activation='relu')(decoded_1)
decoded_3 = Dense(784, activation='sigmoid')(decoded_2) 

# Maps an input image to its reconstruction
deep_autoencoder = Model(input_img, decoded_3)

Now we train our deep autoencoder. This cell also takes some time to execute.

In [None]:
deep_autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

deep_autoencoder.fit(train_images, train_images,
                epochs=25,
                batch_size=256,
                shuffle=True,
                validation_data=(test_images, test_images))

Now we compare the original images with the reconstructions produced by our two autoencoders. 
The reconstructions of the simple and the deep autoencoder look very similar, because we didn't do any extra processing of the images; we only added some layers in the deep_autoencoder.

We can still see that the value of the loss function has improved.

In [None]:
simple_encoded_imgs = simple_autoencoder.predict(test_images) # images reconstructed by simple_autoencoder
deep_encoded_imgs = deep_autoencoder.predict(test_images) # images reconstructed by deep_autoencoder

n = 10 # number of images we want to check

plt.figure(figsize=(20, 6))
for i in range(1, n + 1):
    # Display original
    ax = plt.subplot(3, n, i)
    plt.imshow(test_images[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # Display reconstruction from simple_autoencoder
    ax = plt.subplot(3, n, i + n)
    plt.imshow(simple_encoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # Display reconstruction from deep_autoencoder
    ax = plt.subplot(3, n, i + n + n)
    plt.imshow(deep_encoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()

## Convolutional neural network

We reuse the MNIST dataset, but this time we need to change the shape of the images after normalizing them, because a 2D convolution layer expects an explicit channel dimension.

In [None]:
train_images = mnist.train_images()
train_labels = mnist.train_labels()
test_images = mnist.test_images()
test_labels = mnist.test_labels()

# Normalize the images.
train_images = (train_images / 255)
test_images = (test_images / 255)

# Reshape the images.
train_images = np.expand_dims(train_images, axis=3) # now the images have the shape (28,28,1)
test_images = np.expand_dims(test_images, axis=3)

**Task:** Build a CNN using the Sequential class from the Keras module.

Your CNN should have the following layers:
* 2D convolution layer with 16 3x3 filters
* 2D max pooling layer with 2x2 pooling window
* 2D convolution layer with 16 3x3 filters
* fully connected layer with 'relu' activation function and 64 units (Dense layer)
* 2D max pooling layer with 2x2 pooling window
* Dropout layer with 50% Dropout rate
* fully connected layer with 'softmax' activation function and 10 units - output layer

*Note:* Don't forget to flatten the feature maps before the output layer.
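
To see what the output layer receives after flattening, the tensor shape can be traced through the layers in the order listed above. This sketch assumes the Keras defaults (stride-1 convolutions without padding, non-overlapping pooling) and uses the hypothetical helpers `conv_output` and `pool_output`.

```python
def conv_output(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_output(size, pool):
    """Spatial size after non-overlapping pooling; leftover rows/columns are dropped."""
    return size // pool

side, channels = 28, 1                      # input image: (28, 28, 1)
side, channels = conv_output(side, 3), 16   # first convolution: (26, 26, 16)
side = pool_output(side, 2)                 # first pooling: (13, 13, 16)
side, channels = conv_output(side, 3), 16   # second convolution: (11, 11, 16)
channels = 64                               # Dense(64) acts on the last axis: (11, 11, 64)
side = pool_output(side, 2)                 # second pooling: (5, 5, 64)
flattened = side * side * channels          # number of inputs to the softmax layer
```

Because the Dense layer with 64 units sits before the flattening step, it is applied at every spatial position and only changes the number of channels.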

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten

# Define variables
num_filters = 16
filter_size = 3
pool_size = 2
hidden_units = 64 
output_units = 10
dropout_rate = 0.5

# Build the model.
model = Sequential([
    Conv2D(num_filters, filter_size, input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=pool_size),
    Conv2D(num_filters, filter_size),
    Dense(hidden_units, activation='relu'),
    MaxPooling2D(pool_size=pool_size),
    Dropout(dropout_rate),
    Flatten(),
    Dense(output_units, activation='softmax'),
])
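
To make the two new layer types more concrete, here is a minimal NumPy sketch of a single-filter convolution and of max pooling on one small 2D array. The function names and the averaging kernel are illustrative assumptions. A real Conv2D layer learns its filters, adds a bias, and handles many filters and channels at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one square kernel over a 2-D image (stride 1, no padding)."""
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            out[row, col] = np.sum(image[row:row + k, col:col + k] * kernel)
    return out

def max_pool(feature_map, pool):
    """Keep the maximum of each non-overlapping pool x pool block."""
    h = feature_map.shape[0] // pool * pool
    w = feature_map.shape[1] // pool * pool
    blocks = feature_map[:h, :w].reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                     # averaging filter, for illustration
feature_map = conv2d_valid(image, kernel)          # shape (4, 4)
pooled = max_pool(feature_map, 2)                  # shape (2, 2)
```

The convolution shrinks each side by filter_size - 1, and the pooling step halves it, which matches the way the image size decreases through the model above.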


**Task:** Compile your model using the 'adam' optimizer, 'categorical_crossentropy' as the loss function and only 'accuracy' as a metric.

In [None]:
# Compile the model.
model.compile(
  optimizer='adam',
  loss='categorical_crossentropy',
  metrics=['accuracy'],
)
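
For readers curious about what the 'adam' optimizer does with each gradient, the sketch below implements the standard Adam update rule for a single scalar parameter and applies it to the toy function f(x) = x². The function name `adam_step` and the toy problem are assumptions of this sketch; the actual Keras implementation may differ in details.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update: moving averages of the gradient and its square."""
    m = beta_1 * m + (1.0 - beta_1) * grad          # first moment estimate
    v = beta_2 * v + (1.0 - beta_2) * grad ** 2     # second moment estimate
    m_hat = m / (1.0 - beta_1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(x) = x**2, whose gradient is 2 * x, starting from x = 1
x, m, v = 1.0, 0.0, 0.0
trajectory = [x]
for t in range(1, 101):
    grad = 2.0 * x
    x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
    trajectory.append(x)
```

Because the update divides by the running gradient magnitude, the very first step has a size of approximately `lr`, regardless of how large the gradient is.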

In [None]:
# Train the model.
history_cnn = model.fit(
    train_images,
    to_categorical(train_labels),
    batch_size=128,
    epochs=5,
    validation_data=(test_images, to_categorical(test_labels)),
)

In [None]:
model.evaluate(
  test_images,
  to_categorical(test_labels)
)

The accuracy of our CNN model on the test data is around 99%.

Next, we plot the loss function and the accuracy for the training and validation datasets. 

In [None]:
# Plot the loss function
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6))
ax1.plot(history_cnn.history['val_loss'], 'b', label='val')
ax1.plot(history_cnn.history['loss'], 'g', label='train')
ax1.set_xlabel(r'Epoch', fontsize=20)
ax1.set_ylabel(r'Loss', fontsize=20)
ax1.legend()
ax1.tick_params(labelsize=10)

# Plot the accuracy
ax2.plot(history_cnn.history['val_accuracy'], 'b', label='val')
ax2.plot(history_cnn.history['accuracy'], 'g', label='train')
ax2.set_xlabel(r'Epoch', fontsize=20)
ax2.set_ylabel(r'Accuracy', fontsize=20)
ax2.legend()
ax2.tick_params(labelsize=10)

Now we want to compare our feed-forward neural network (without dropout layers) with the convolutional neural network.

In [None]:
# Plot the training loss
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6))
ax1.plot(history.history['loss'], 'r', label='without_drop')
ax1.plot(history_cnn.history['loss'], 'g', label='cnn')
ax1.set_xlabel(r'Epoch', fontsize=20)
ax1.set_ylabel(r'Loss', fontsize=20)
ax1.legend()
ax1.tick_params(labelsize=10)

# Plot the training accuracy
ax2.plot(history.history['accuracy'], 'r', label='without_drop')
ax2.plot(history_cnn.history['accuracy'], 'g', label='cnn')
ax2.set_xlabel(r'Epoch', fontsize=20)
ax2.set_ylabel(r'Accuracy', fontsize=20)
ax2.legend()
ax2.tick_params(labelsize=10)


As you can see, the CNN performs better but needs more time for training.