# <center>Artificial Neural Networks</center>

<center>This notebook is part of the teaching material for CS-EJ3311 - Deep Learning with Python</center>
<center>Aalto University (Espoo, Finland)</center>
<center>fitech.io (Finland)</center>

# <center>Step 1. Data</center>

The [Fashion-MNIST](https://www.tensorflow.org/datasets/catalog/fashion_mnist) dataset consists of data points representing articles of an online shop. Each article is characterized by a $28 \times 28$ pixel grayscale image. Moreover, each article is associated with a label $y$ that indicates which of the $10$ classes (or product categories) the article belongs to.

In [None]:
#@title  Import Python libraries

import numpy as np                  # library for numerical computations (vectors, matrices, tensors)
import pandas as pd                 # library for handling data
import matplotlib.pyplot as plt     # library providing tools for plotting data 
import tensorflow as tf             # end-to-end open source platform for deep learning
from tensorflow.keras import layers # layers are the basic building blocks of neural networks in Keras

In [None]:
#@title Load Data

# load dataset
from tensorflow.keras.datasets import fashion_mnist
(trainval_images, trainval_labels), (test_images, test_labels) = fashion_mnist.load_data()

# shapes of the image arrays: (number of examples, image height, image width)
print(f'Shape of training and validation images: {trainval_images.shape}')
print(f'Shape of test images: {test_images.shape}')

# the label values are stored as integers in the range [0, 9];
# these numeric labels correspond to the classes of clothing items the images represent:

labels = np.unique(test_labels)

# map the numeric label values to class names
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal',      'Shirt',   'Sneaker',  'Bag',   'Ankle boot']

# display numeric label and corresponding class name 
print('label value \t category')
for class_name, label in zip(class_names, labels):
    print (f'{label} \t\t {class_name}')

In [None]:
#@title Visualize first 10 data points (images)

plt.figure(figsize=(10,10))
for i, (image, label) in enumerate(zip(test_images[:10], test_labels[:10])):
    plt.subplot(5,5,i+1)
    plt.xticks([]) # remove ticks on x-axis
    plt.yticks([]) # remove ticks on y-axis
    plt.imshow(image, cmap='binary') # set the colormap to 'binary'
    plt.xlabel(class_names[label])
plt.tight_layout()
plt.show()

In [None]:
#@title Illustrate features (grayscale values of each pixel) of one data point

# select the image to visualize
img = test_images[0]
# create figure and axis objects
fig, ax = plt.subplots(1,1,figsize = (10,10)) 
# display image
ax.imshow(img, cmap='gray')
height, width = img.shape  # array shape is (rows, columns)
# this value will be needed in order to change the color of annotations
thresh = img.max()/2.5

# display grayscale value of each pixel
for row in range(height):
    for col in range(width):
        val = img[row][col]
        # xy expects (x, y), i.e. (column, row)
        ax.annotate(str(val), xy=(col, row),
                    horizontalalignment='center',
                    verticalalignment='center',
                    # use white text on dark pixels and black text on bright pixels
                    color='white' if val < thresh else 'black')
plt.show()

In [None]:
#@title Preprocess data

# Choose only a subset for training data in order to reduce training time:

# select subset of trainval_images and trainval_labels
X_trainval = trainval_images[:16000]
y_trainval = trainval_labels[:16000]

# select whole test set
X_test = test_images
y_test = test_labels

# Reshape feature matrices: flatten each 28 x 28 image into a vector of 784 features

X_trainval = X_trainval.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)

# When training an ANN, it is good practice to normalize the input values (here, the pixel values) so that they lie between 0 and 1.
# Let's transform feature values of type uint8 in the range [0, 255] to feature values of type float in the range [0, 1]:

# Normalize data to have feature values between 0 and 1
X_trainval = X_trainval/ 255.0
X_test = X_test/ 255.0 
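
The reshaping and scaling steps above can be checked on a small synthetic array. The sketch below is only an illustration: `toy_images` and `toy_features` are hypothetical names, and the random pixel values stand in for real images.

```python
import numpy as np

# Hypothetical stand-in for a batch of images: 3 "images" of 28 x 28 uint8 pixels
rng = np.random.default_rng(seed=0)
toy_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image into a feature vector of length 784 (row-major order)
toy_features = toy_images.reshape(-1, 28 * 28)

# Scale pixel values from [0, 255] to [0, 1]; dividing by a float gives a float array
toy_features = toy_features / 255.0

print(toy_features.shape)  # (3, 784)
print(toy_features.dtype)  # float64
print(toy_features.min() >= 0.0, toy_features.max() <= 1.0)  # True True
```

Because the flattening is row-major, feature number 28 of each vector is the first pixel of the second image row.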

# <center>Step 2. Define ANN Structure</center>

The ANN reads in the features of a shop item, which are the grayscale values of the item picture. The outputs of the ANN are probabilities, or degrees of belonging, for each of the ten different categories.

In particular, we will use an ANN with the following (arbitrarily chosen) architecture:

- one input layer 
- one dense layer with 128 units and ReLU activation function
- one dense output layer with 10 units and softmax activation function

The first layer consists of the individual features and is the entry point to the ANN. The first layer is connected to a dense layer with 128 neurons with the ReLU activation function. This hidden layer is then followed by the final output layer with ten neurons and a softmax activation function. 

The output layer with 10 neurons (corresponding to the 10 classes) returns the probabilities of belonging to each class. Note that the output values of all 10 neurons always sum to one. This is the result of using the softmax activation function in the last layer. For example, if the output of the first neuron (out of 10) is close to 1, we are confident in assigning the data point to the first class, which has label value $0$ (T-shirt/top).
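
To make the sum-to-one property concrete, here is a minimal NumPy sketch of the softmax function. It uses the standard definition with the usual max-subtraction for numerical stability; the `scores` values are made up for illustration and are not produced by the model in this notebook.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array of scores."""
    shifted = z - np.max(z)        # subtracting the max does not change the result
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical pre-activation scores of the 10 output neurons
scores = np.array([4.0, 0.5, 1.0, 0.2, 0.1, -1.0, 2.0, -0.5, 0.0, 0.3])
probs = softmax(scores)

print(probs.sum())       # 1.0 up to floating-point rounding
print(np.argmax(probs))  # 0, the index of the neuron with the largest output
```

The predicted label is the index of the largest probability, so indices run from 0 to 9 just like the label values.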

In [None]:
#@title Define model architecture

model = tf.keras.Sequential([
    # hidden layer
    layers.Dense(128, activation='relu', input_shape=(784,), name='Hidden_layer'),
    # output layer
    layers.Dense(10, activation='softmax', name='Output_layer')
])

We can call the `summary()` method on the model to see information about it. This method provides the following textual information:
- The number of parameters (weights and biases) in each layer.
- The total number of parameters (weights and biases) in the model.
- The layers and their order in the model.
- The output shape of each layer. 
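
The parameter counts reported by the summary can be reproduced by hand. A dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_param_count` below is a hypothetical name used only for this worked example.

```python
def dense_param_count(n_inputs, n_units):
    """Parameters of a fully connected layer: n_inputs * n_units weights + n_units biases."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(784, 128)   # 784 * 128 + 128 = 100480
output_params = dense_param_count(128, 10)    # 128 * 10 + 10 = 1290
total_params = hidden_params + output_params  # 101770

print(hidden_params, output_params, total_params)  # 100480 1290 101770
```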

In [None]:
model.summary()

The textual model summary above is useful for simple models. However, this textual summary can be difficult to interpret for ANN structures with multiple inputs or outputs. To this end, `tf.keras` provides a function called `plot_model()` to create a graphical summary of the ANN that might be easier to understand.

In [None]:
#@title Plot model

tf.keras.utils.plot_model(
    model,
    show_shapes=True, 
    show_layer_names=True
)

# <center>Step 3. Choose Loss Function and Optimizer</center>

In [None]:
#@title Compile the model
model.compile(optimizer='RMSprop',
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
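
The loss and metric named in the cell above both work with integer labels rather than one-hot vectors. The following NumPy sketch illustrates the standard definitions (mean negative log-probability of the true class, and fraction of correct argmax predictions). It is not the Keras implementation, which additionally guards against taking the logarithm of zero, and the probabilities are invented for the example.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the true class) over all examples."""
    rows = np.arange(len(y_true))
    return -np.mean(np.log(y_prob[rows, y_true]))

# Hypothetical predicted probabilities for two examples over three classes
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.1, 0.8]])
y_true = np.array([0, 2])  # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(y_true, y_prob)
accuracy = np.mean(np.argmax(y_prob, axis=1) == y_true)

print(loss)      # -(ln 0.7 + ln 0.8) / 2, roughly 0.29
print(accuracy)  # 1.0
```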

# <center>Step 4. Training - Adjusting ANN Weights</center>

In [None]:
#@title Train the model

history = model.fit(X_trainval, y_trainval, validation_split=0.2, batch_size=32, epochs=20, verbose=1)
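
As a quick sanity check on the numbers passed to `fit()` above: with `validation_split=0.2`, Keras holds out the last 20% of the supplied samples for validation and trains on the rest. The arithmetic below is a plain-Python sketch of the resulting set sizes and the number of weight updates per epoch; the variable names are chosen for this illustration.

```python
import math

n_samples = 16000        # size of X_trainval
validation_split = 0.2
batch_size = 32

n_val = round(n_samples * validation_split)         # samples held out for validation
n_train = n_samples - n_val                         # samples used to update the weights
steps_per_epoch = math.ceil(n_train / batch_size)   # batches (weight updates) per epoch

print(n_train, n_val, steps_per_epoch)  # 12800 3200 400
```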

In [None]:
#@title Plot training log

pd.DataFrame(history.history).plot(figsize=(10,5))
plt.grid(True)
plt.xlabel('epoch', fontsize=14)
plt.show()

In [None]:
#@title Evaluation on Test Set

test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print('Accuracy on test dataset:', test_accuracy)