<a href="https://colab.research.google.com/github/schmelto/machine-learning/blob/main/Deeplearning/activation_functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Activation Functions

## Import the necessary libraries

We want to use TensorFlow 2.x. In Colab, the `%tensorflow_version` magic selects it explicitly.



In [None]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [None]:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

## Loading the MNIST dataset

In [None]:
(train_images, train_labels), (test_images, test_labels) = keras. \
  datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


The pixel values are not yet normalized: they range from 0 to 255. We normalize them first by dividing by the maximum pixel value, 255:

In [None]:
train_images = train_images / 255.0
test_images = test_images / 255.0
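
As a minimal sketch of what this division does, the following uses a small hypothetical array in place of a real MNIST image. The array `raw` and its values are assumptions made for illustration; only the division by 255.0 mirrors the cell above.

```python
import numpy as np

# Hypothetical 2x2 "image" using the same integer type as the raw MNIST arrays.
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by a float yields floating-point values in the range [0, 1].
normalized = raw / 255.0

print(normalized.dtype)
print(normalized.min(), normalized.max())
```

After the division, the darkest pixel maps to 0.0 and the brightest to 1.0.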

* The image of a handwritten 5 has the label 5.
* We want the label as a vector [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.], which is the form our network needs. This vector has a 1 at position 5 (counting from 0) and a 0 everywhere else.

In [None]:
total_classes = 10
train_vec_labels = keras.utils.to_categorical(train_labels, total_classes)
test_vec_labels = keras.utils.to_categorical(test_labels, total_classes)
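
The label vectors described above can be reproduced with plain NumPy. This is only a sketch of the idea, not how `keras.utils.to_categorical` is implemented; the helper `one_hot` and the example labels are hypothetical.

```python
import numpy as np

def one_hot(labels, total_classes):
    """Return one row per label with a 1.0 at the label's index and 0.0 elsewhere."""
    # Row i of the identity matrix is exactly the one-hot vector for class i.
    return np.eye(total_classes, dtype=np.float32)[labels]

example_labels = np.array([5, 0, 9])
vectors = one_hot(example_labels, 10)

print(vectors[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

Taking `argmax` along the last axis of such a vector recovers the original integer label.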

## Design of the networks

Now the input data is normalized and the labels are available as vectors, so we can finally start building the networks for recognizing the handwritten digits!

We want to define a very simple network with 3 layers (input layer, hidden layer and output layer):

* We use a keras.layers.Flatten layer as the input layer, which flattens each 28x28 input matrix into a vector of 28x28 = 784 values
* Next, we use a keras.layers.Dense layer with 128 neurons for the hidden layer
* We use a keras.layers.Dense layer with 10 neurons as the output layer, since we want to recognize 10 classes (digits from 0-9)

We define the individual networks with the different activation functions so that we can then compare them with one another.

In [None]:
# model_no_activation = keras.Sequential([
#     keras.layers.Flatten(input_shape=(28, 28)),
#     keras.layers.Dense(128), # , activation='sigmoid'),
#     keras.layers.Dense(10), #, activation='sigmoid')
# ])

model_relu = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='sigmoid')
])

model_linear = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='linear'),
    keras.layers.Dense(10, activation='linear')
])

model_sigmoid = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='sigmoid'),
    keras.layers.Dense(10, activation='sigmoid')
])

model_tanh = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='tanh'),
    keras.layers.Dense(10, activation='tanh')
])

models = [model_relu, model_linear, model_sigmoid, model_tanh]
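
To make the differences between the four variants concrete, here is a rough NumPy sketch of the forward pass through the Flatten, Dense(128) and Dense(10) layers. The weights are random and untrained, and the helper names (`relu`, `sigmoid`, `linear`, `forward`) are made up for illustration; the snippet only shows the layer shapes and the output range each activation produces.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # negative inputs become 0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # squashes inputs into (0, 1)

def linear(x):
    return x                         # identity: no nonlinearity

def forward(image, w_hidden, b_hidden, w_out, b_out, hidden_act, out_act):
    flat = image.reshape(-1)                         # Flatten: (28, 28) -> (784,)
    hidden = hidden_act(flat @ w_hidden + b_hidden)  # Dense(128)
    return out_act(hidden @ w_out + b_out)           # Dense(10)

# Random stand-ins for an input image and for the (untrained) layer weights.
rng = np.random.default_rng(0)
image = rng.random((28, 28))
w_hidden = rng.normal(scale=0.05, size=(784, 128))
b_hidden = np.zeros(128)
w_out = rng.normal(scale=0.05, size=(128, 10))
b_out = np.zeros(10)

for name, hidden_act, out_act in [
    ("relu", relu, sigmoid),
    ("linear", linear, linear),
    ("sigmoid", sigmoid, sigmoid),
    ("tanh", np.tanh, np.tanh),
]:
    out = forward(image, w_hidden, b_hidden, w_out, b_out, hidden_act, out_act)
    print(name, out.shape, float(out.min()), float(out.max()))
```

With `linear` in both layers, the two Dense layers compose into a single linear map of the input, so that variant can be read as a baseline without any nonlinearity.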

## Compiling the networks

After we have defined our networks, we need to *compile* them before we can begin training.

In this step we define important parameters for the training phase:
- The **optimizer** is the learning algorithm used during training to improve the network. Last week we covered *Gradient Descent* and its variant *Stochastic Gradient Descent* (SGD).
- The **loss** is the cost function. The aim during training is to minimize it.
- The **metrics** are the quantities evaluated during training. For classification problems, we are usually interested in the "accuracy".

In this example we use
- The *Stochastic Gradient Descent* (`"sgd"`) learning algorithm as our optimizer.
- The `"mean_squared_error"` cost function, which, unlike the plain *Squared Error* cost function, takes the mean rather than the sum of the squared errors of the output neurons.

In [None]:
[
  model.compile(
      optimizer='sgd',
      loss='mean_squared_error',
      metrics=['accuracy']
  ) for model in models
]

[None, None, None, None]
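
The difference between the summed and the mean squared error mentioned above can be checked by hand for a single example. The output values below are invented for illustration and do not come from any of the trained models.

```python
import numpy as np

# One-hot target for the digit 5 and a hypothetical network output.
target = np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0], dtype=float)
output = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.2, 0.0])

squared_errors = (target - output) ** 2

sum_squared_error = squared_errors.sum()    # adds up the errors of all 10 output neurons
mean_squared_error = squared_errors.mean()  # divides that sum by the number of neurons

print(sum_squared_error, mean_squared_error)
```

For 10 output neurons the two values differ only by a constant factor of 10, so minimizing one also minimizes the other; only the scale of the gradient changes.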

## Training the networks

Now we can finally train our networks. For this we use the `fit` method and pass our training images as inputs, with the associated label vectors as the desired outputs. The number of `epochs` indicates how often each network gets to see the entire training set. If we increase the number of epochs, we let the networks learn longer.
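
To get a feeling for what an epoch means in terms of weight updates, the following sketch counts the mini-batch steps of a training run. It assumes a batch size of 32, which is the Keras default when `fit` is called without `batch_size`; the function `count_updates` is hypothetical and performs no actual training.

```python
import numpy as np

def count_updates(num_samples, batch_size, epochs):
    """Count how many mini-batch updates a training run of this size performs."""
    rng = np.random.default_rng(0)
    updates = 0
    for _ in range(epochs):
        # Each epoch visits every training sample once, in a fresh random order.
        order = rng.permutation(num_samples)
        for start in range(0, num_samples, batch_size):
            batch_indices = order[start:start + batch_size]
            # A real training loop would compute the gradient on this batch
            # and update the weights here.
            updates += 1
    return updates

print(count_updates(num_samples=60000, batch_size=32, epochs=15))  # 28125
```

Under that assumption, each network sees 1875 batches per epoch, so 15 epochs correspond to 28125 weight updates.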

In [None]:
epochs=15
[
 model.fit(
    train_images,
    train_vec_labels,
    epochs=epochs,
    verbose=True
  ) for model in models
]

Train on 60000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Train on 60000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Train on 60000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
Train on 60000 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


[<tensorflow.python.keras.callbacks.History at 0x7f74304cfe80>,
 <tensorflow.python.keras.callbacks.History at 0x7f74303c8a58>,
 <tensorflow.python.keras.callbacks.History at 0x7f74303017b8>,
 <tensorflow.python.keras.callbacks.History at 0x7f743023f198>]

## Evaluate the networks

So far, the networks have only seen the training images and learned from them. The aim is to use our networks to recognize new images of handwritten digits. For this purpose we have the test data, which we now use to check how accurate our networks are on unseen data.

In [None]:
_, result_relu = model_relu.evaluate(test_images, test_vec_labels)
_, result_linear = model_linear.evaluate(test_images, test_vec_labels)
_, result_sigmoid = model_sigmoid.evaluate(test_images, test_vec_labels)
_, result_tanh = model_tanh.evaluate(test_images, test_vec_labels)
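
For one-hot labels, accuracy is commonly computed by comparing the index of the largest output with the index of the 1 in the label vector. The sketch below illustrates that idea on made-up predictions; `argmax_accuracy` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def argmax_accuracy(predictions, one_hot_labels):
    """Fraction of samples whose largest output matches the labelled class."""
    predicted = predictions.argmax(axis=1)
    actual = one_hot_labels.argmax(axis=1)
    return float((predicted == actual).mean())

# Four hypothetical samples with three classes each.
labels = np.eye(3)[[0, 1, 2, 1]]
predictions = np.array([
    [0.8, 0.1, 0.1],  # correct (class 0)
    [0.2, 0.7, 0.1],  # correct (class 1)
    [0.6, 0.3, 0.1],  # wrong (predicts 0, label is 2)
    [0.1, 0.5, 0.4],  # correct (class 1)
])

print(argmax_accuracy(predictions, labels))  # 0.75
```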



## Summary of the results

In [None]:
from prettytable import PrettyTable

# Collect the test accuracies of all four models in one table.
tbl = PrettyTable()
tbl.field_names = ["Activation function", f"Accuracy (after {epochs} epochs)"]
tbl.add_row(["Tanh", result_tanh])
tbl.add_row(["ReLU", result_relu])
tbl.add_row(["Linear", result_linear])
tbl.add_row(["Sigmoid", result_sigmoid])
print(tbl)

+---------------------+----------------------------+
| Activation function | Accuracy (after 15 epochs) |
+---------------------+----------------------------+
|         Tanh        |           0.905            |
|         ReLU        |           0.882            |
|        Linear       |           0.8577           |
|       Sigmoid       |           0.6573           |
+---------------------+----------------------------+
