# Exercise 03.1 - Keras Sequential Model 

Before you can start, you have to find a GPU on the system that is not heavily used by other users. Otherwise you cannot initialize your neural network.


**Hint:** the command is **nvidia-smi**, just in case it is displayed above in two lines because of a line break.

As a result you get a summary of the GPUs available in the system, their current memory usage (in MiB, i.e. mebibytes), and their current utilization (in %). There should be six or eight GPUs listed, numbered 0 to n-1 (n being the number of GPUs). The GPU id appears at the start of each GPU section, and the ids increase by 1 from top to bottom.

Find a GPU where the memory usage is low. The memory usage looks something like '365MiB / 16125MiB': the first value is the memory already in use and the second value is the total memory of the GPU. Look for a GPU where the difference between the two values is large.
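As a rough sketch of this selection rule, the snippet below picks the GPU with the most free memory from text in the form printed by `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`. The helper name `pick_gpu` and the sample values are made up for illustration; they are not measurements from the course server.

```python
def pick_gpu(csv_text):
    """Return the id of the GPU with the most free memory (total - used, in MiB)."""
    best_id, best_free = None, -1
    for line in csv_text.strip().splitlines():
        # Each line has the form "index, used, total"
        gpu_id, used, total = (int(field) for field in line.split(","))
        free = total - used
        if free > best_free:
            best_id, best_free = gpu_id, free
    return best_id

# Hypothetical sample (index, used MiB, total MiB)
sample = """0, 15800, 16125
1, 365, 16125
2, 9000, 16125"""

print(pick_gpu(sample))  # 1
```

Reading the table by eye is enough for the exercise; the sketch only makes the "largest difference between the two values" rule explicit.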

**Remember the GPU id and write it in the next line instead of the character X.**

In [1]:
# Change X to the GPU number you want to use,
# otherwise you will get a Python error
# e.g. USE_GPU = 4
USE_GPU = 1

In [2]:
!nvidia-smi

Wed Nov 10 11:30:44 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   65C    P8    30W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Choose one GPU

**The following code is very important and must always be executed before using TensorFlow in the exercises, so that only one GPU is used and that it is set in a way that not all its memory is used at once. Otherwise, the other students will not be able to work with this GPU.**

The following program code imports the TensorFlow library for Deep Learning and outputs the version of the library.

Then, TensorFlow is configured to see only the one GPU whose number you entered in the cell above (`USE_GPU = X`).

Finally, the GPU is set so that it does not immediately reserve all memory, but only uses more memory when needed. 

(The comments within the code cell explain a bit of what is happening, if you are interested in understanding it better. See also the TensorFlow documentation for an explanation of the methods used.)

In [3]:
# Import TensorFlow 
import tensorflow as tf

# Print the installed TensorFlow version
print(f'TensorFlow version: {tf.__version__}\n')

# Get all GPU devices on this server
gpu_devices = tf.config.list_physical_devices('GPU')

# Print the name and the type of all GPU devices
print('Available GPU Devices:')
for gpu in gpu_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set only one GPU to be visible.
# Note: index 0 is hard-coded here; on a multi-GPU server use gpu_devices[USE_GPU]
tf.config.set_visible_devices(gpu_devices[0], 'GPU')

# Get all visible GPU  devices on this server
visible_devices = tf.config.get_visible_devices('GPU')

# Print the name and the type of all visible GPU devices
print('\nVisible GPU Devices:')
for gpu in visible_devices:
    print(' ', gpu.name, gpu.device_type)
    
# Set the visible device(s) to not allocate all available memory at once,
# but rather let the memory grow whenever needed
for gpu in visible_devices:
    tf.config.experimental.set_memory_growth(gpu, True)

TensorFlow version: 2.7.0

Available GPU Devices:
  /physical_device:GPU:0 GPU

Visible GPU Devices:
  /physical_device:GPU:0 GPU


# Introduction to the Keras Sequential Model

This notebook is the first part of the tutorial 'Introduction to Keras sequential model' and introduces the ways to create a sequential model in Keras.

#### A typical Keras workflow includes the following steps:

-  configure the training data: *input tensors* and *target tensors*
-  define a stack of layers - `the model` - that encodes the mapping from the inputs to the targets
-  configure the learning process via the `compile()` method
-  train the model on the labeled data by calling the `.fit()` method of the model object
-  evaluate the model on the validation/test data
-  predict on test data
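
The steps above can be sketched end to end as follows. This is only a minimal illustration: the data is random noise standing in for real images, and the optimizer, loss, and metric passed to `compile()` are assumptions chosen for a 10-class problem, not settings prescribed by this tutorial.

```python
import numpy as np
import tensorflow as tf

# 1) Data: random stand-ins for 28x28 images and labels from 10 classes
rng = np.random.default_rng(0)
x_train = rng.random((64, 28, 28)).astype("float32")
y_train = rng.integers(0, 10, size=64)
x_test = rng.random((16, 28, 28)).astype("float32")
y_test = rng.integers(0, 10, size=16)

# 2) Model: a stack of layers
workflow_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# 3) Learning process (assumed settings)
workflow_model.compile(optimizer="adam",
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])

# 4) Training
workflow_model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)

# 5) Evaluation
loss, accuracy = workflow_model.evaluate(x_test, y_test, verbose=0)

# 6) Prediction: one row of 10 class probabilities per test image
probabilities = workflow_model.predict(x_test, verbose=0)
print(probabilities.shape)  # (16, 10)
```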

Keras provides several possibilities (3 API styles) for creating a deep learning model:

- [Sequential API](https://www.tensorflow.org/guide/keras/sequential_model?hl=en) (simple, single-input, single-output, sequential layer stacks)
- [Functional API](https://www.tensorflow.org/guide/keras/functional?hl=en) (multi-input, multi-output, arbitrary static graph configurations)
- *Model subclassing* (maximum flexibility, larger potential error surface)

In this notebook we will focus on the step of creating a neural network using the TF Keras Sequential API.

## Learning objectives
- understand the TF Keras Sequential API and the TF Keras layers API
- learn to create a sequential model object
- learn different methods to add layers to a model
- learn several types of layers implemented in the TF Keras layers API
- learn the required arguments for creating layers
- learn how to add non-linearities to the output of a layer


### Keras Sequential class

In this tutorial, we are going to introduce the **Keras Sequential class**. It is appropriate for building models with a simple topology, composed of a stack of layers, where each layer has exactly one input and one output.
In fact, you'll find that most of the neural networks that you work with can be built using the Sequential class.

##### Imports
We will import the `Sequential` class from `tensorflow.keras.models` and the `Flatten` and `Dense` layer classes from `tensorflow.keras.layers`.

In [4]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import  Flatten, Dense

## Creating a feed forward network (Method 1)

In the following example, we will create a model for solving a classification task, where the output can be assigned to one of 10 possible classes (categories).
The inputs to the model are images of size 28 x 28 x 1 pixels.

**Sequential**: Instantiate a `model` object using the [Sequential class](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential) and pass in a **list** of Keras **layers**. It defines a SEQUENCE of layers in the neural network.

The core building block of neural networks is the `layer`, a data-processing module which you can conceive as a "filter" for data.  Most of deep learning really consists of chaining together simple layers which will implement a form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a succession of increasingly refined data filters -- the  `layers`.



The model will consist of:

 * 1 reshaping operation
 * 1 fully connected layer (the hidden layer) with 64 neurons and a ReLU activation
 * 1 fully connected layer (the output layer) with 10 neurons (the number of classes) and a softmax activation


The possible types of layers to build a NN are defined [here](https://www.tensorflow.org/api_docs/python/tf/keras/layers). <br>
**Flatten**: Flatten takes a multidimensional tensor and turns it into a one-dimensional vector (per input example; the batch dimension is kept).

**Dense**: Adds a layer of neurons. **A fully connected layer** in Keras is implemented by the class `tensorflow.keras.layers.Dense`. To instantiate a [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer object we pass the number of neurons in the layer to the constructor as the required keyword argument `units`. Additionally, we can specify the type of activation, e.g. 'relu'. By default the activation is `None`, so no non-linearity is applied unless we specify one.
**ReLU** effectively means "if X > 0 return X, else return 0", so values greater than 0 are passed on unchanged to the next layer in the network and all negative values become 0.

**Softmax** takes a set of values and turns them into probabilities that sum to 1, with the biggest value getting the highest probability. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], softmax turns it into approximately [0, 0, 0, 0, 1, 0, 0, 0, 0].
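
To make the two activations concrete, here is a small plain-Python sketch, independent of Keras. The helper names `relu` and `softmax` are written for this illustration only. Softmax returns probabilities rather than a hard 0/1 vector; for the example values the result is merely very close to one-hot.

```python
import math

def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

scores = [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05]
probs = softmax(scores)

print([relu(v) for v in [-2.0, 0.0, 3.5]])  # [0.0, 0.0, 3.5]
print(round(sum(probs), 6))                  # 1.0
print(probs.index(max(probs)))               # 4
```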



#### Specifying the input shape

We explicitly specify the shape of the input data by passing an **input_shape argument** ONLY TO THE **first layer**.

In [5]:
model = Sequential([
    Flatten(input_shape = (28,28)), #(784,)
    Dense(units = 64, activation = 'relu'),
    Dense(units = 10, activation = 'softmax')
])

Here we are specifying that each input data example is a tensor of shape (28, 28), which the `Flatten` layer turns into a one-dimensional vector of size 784.
Because the input shape is provided, the weights and biases will be created and initialized straight away.
##### The usage of a Flatten layer
The Flatten layer has no parameters; its only role is to unroll a higher-dimensional tensor (e.g. an image) into a one-dimensional vector. It is commonly used for feeding higher-dimensional tensors to a fully connected layer (dense layer). <br>
Here, we have images/tensors of shape (28,28), which are converted into a long one-dimensional vector of size 784 before being sent through to the first dense layer. <br>
To check the existence of the weights and biases you can access the attribute `weights` of the object `model` or call the method `summary()` as in the following cell:


In [6]:
# model.weights

In [7]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 64)                50240     
                                                                 
 dense_1 (Dense)             (None, 10)                650       
                                                                 
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
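
The parameter counts in the summary can be checked by hand: a dense layer has one weight for every (input, neuron) pair plus one bias per neuron. The helper `dense_params` below is written for this check only and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 64)  # 784 inputs -> 64 neurons
output = dense_params(64, 10)       # 64 inputs -> 10 neurons
print(hidden, output, hidden + output)  # 50240 650 50890
```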


## Creating a feed forward network (Method 2)

Instead of passing in a list of layers, you can call the `.add()` method on the created model instance to append additional layers to the model.
The model in the following cell is equivalent to the one we built before.
This method is useful for adding layers one at a time, for example when the architecture depends on conditions or is built in a loop.

In [8]:
model = Sequential()
model.add(Flatten(input_shape = (28,28)))
model.add(Dense(64, activation = 'relu'))
model.add(Dense(units = 10, activation = 'softmax'))
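
The use of `.add()` together with conditions and loops might look like the following sketch. The variable names `hidden_units`, `num_classes`, and `loop_model` and their values are hypothetical; a separate model name is used so that the `model` defined above stays untouched.

```python
import tensorflow as tf

hidden_units = [128, 64]   # hypothetical hidden layer sizes
num_classes = 10           # hypothetical number of classes

loop_model = tf.keras.Sequential()
loop_model.add(tf.keras.Input(shape=(28, 28)))
loop_model.add(tf.keras.layers.Flatten())

# One hidden Dense layer per entry in the list
for units in hidden_units:
    loop_model.add(tf.keras.layers.Dense(units, activation="relu"))

# The output layer depends on the kind of task
if num_classes == 2:
    loop_model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
else:
    loop_model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))

print(loop_model.output_shape)  # (None, 10)
```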

In [9]:
model.count_params()

50890

In [10]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_1 (Flatten)         (None, 784)               0         
                                                                 
 dense_2 (Dense)             (None, 64)                50240     
                                                                 
 dense_3 (Dense)             (None, 10)                650       
                                                                 
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________


You'll notice that the first dimension of every tensor has the value `None`. That's because the first dimension is always the `batch size`. Because we can feed any number of training examples when training the model, the batch size is unknown at this moment (flexible). TensorFlow represents this with the value `None` in the tensor shape.
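
A short sketch of what the flexible batch dimension means in practice: the same model accepts batches of different sizes. The model name `batch_model` and the all-zero inputs are placeholders for illustration.

```python
import numpy as np
import tensorflow as tf

batch_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# The batch dimension is not fixed when the model is built
print(batch_model.output_shape)  # (None, 10)

# The same model handles a batch of 1 image and a batch of 5 images
for batch_size in (1, 5):
    batch = np.zeros((batch_size, 28, 28), dtype="float32")
    print(batch_model(batch).shape)
```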

#### Observation:
We specified the activations as readable strings, e.g. 'relu', 'softmax'. However, these readable strings refer to TF objects or functions; the same non-linearities are also available as layers: [tf.keras.layers.ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) and
 [tf.keras.layers.Softmax](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Softmax)
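
As a brief sketch of this observation, `tf.keras.activations.get` looks up the function behind a string name, and the same non-linearity can be requested in several ways. The variable names below are illustrative only.

```python
import tensorflow as tf

# Look up the function behind the string name
relu_fn = tf.keras.activations.get("relu")
print(relu_fn(tf.constant([-1.0, 0.0, 2.0])).numpy())

# Three ways to request a ReLU non-linearity after a dense layer
by_string = tf.keras.layers.Dense(64, activation="relu")
by_function = tf.keras.layers.Dense(64, activation=tf.keras.activations.relu)
separate_layer = [tf.keras.layers.Dense(64), tf.keras.layers.ReLU()]
```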
 

#### Exercise: Exploring the Sequential model

    1) You can substitute the softmax activation function in the last dense layer with a separate Softmax layer. Run `model.summary()` to check that the number of parameters is identical to the previous one.
    2) In addition, you can try to define a name for each layer by providing a string value to the keyword argument `name` when creating a new layer, e.g. name = 'first_layer'
    3) Change the activations of the dense layers to the hyperbolic tangent or the sigmoid function

In [11]:
model = Sequential()
model.add(Flatten(input_shape = (28,28)))
model.add(Dense(64, activation = 'tanh', name="first_layer"))
model.add(Dense(10, activation='tanh', name="second_layer"))
model.add(tf.keras.layers.Softmax())

In [12]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_2 (Flatten)         (None, 784)               0         
                                                                 
 first_layer (Dense)         (None, 64)                50240     
                                                                 
 second_layer (Dense)        (None, 10)                650       
                                                                 
 softmax (Softmax)           (None, 10)                0         
                                                                 
Total params: 50,890
Trainable params: 50,890
Non-trainable params: 0
_________________________________________________________________
