# Deep Learning: Ex.4 - Training Networks

Submitted by: [... **name & ID** ...]


In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

from tensorflow.keras.layers import Input, Dropout, BatchNormalization  # <-- new layers!
from tensorflow.keras.layers import RandomFlip, RandomRotation, RandomZoom, RandomTranslation # <-- new layers!

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix
from seaborn import heatmap 

print(tf.__version__)

### Load the CIFAR-10 Dataset

We will use the same CIFAR-10 dataset as in Ex.3:

In [None]:
# 1. load/download the data
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# 2. flatten the labels (easier to deal with)
train_labels = train_labels.flatten()  # (50000, 1) -> (50000,)
test_labels = test_labels.flatten()    # (10000, 1) -> (10000,)

# 3. convert uint8->float32 and normalize range to 0.0-1.0 
train_images = train_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0

# 4. define the 10 classes names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer','dog', 'frog', 'horse', 'ship', 'truck']

# 5. print the shapes
print('train_images.shape =',train_images.shape)
print('train_labels.shape =',train_labels.shape)
print('test_images.shape =',test_images.shape)
print('test_labels.shape =',test_labels.shape)


***

## 1. Testing SGD options

Our basic model will be the same as in Ex.3:

- `Input` layer
- 32 3x3-`Conv2D` + 2x2 `MaxPooling` 
- 64 3x3-`Conv2D` + 2x2 `MaxPooling` 
- 128 3x3-`Conv2D` + 2x2 `MaxPooling` 
- 128-`Dense` 
- 10-`Dense` - output layer 


Prepare a function that returns this model (without the `compile` step, just the layers):



In [None]:
def make_basic_model():
    # build our basic model
    the_model = Sequential()
    the_model.add(Input(shape=(32,32,3)))
    the_model.add(Conv2D(32, (3,3), activation='relu', padding='same'))
    the_model.add(MaxPooling2D((2,2)))
    the_model.add(Conv2D(64, (3,3), activation='relu', padding='same'))
    the_model.add(MaxPooling2D((2,2)))
    the_model.add(Conv2D(128, (3,3), activation='relu', padding='same'))
    the_model.add(MaxPooling2D((2,2)))
    the_model.add(Flatten())
    the_model.add(Dense(128, activation='relu'))
    the_model.add(Dense(10, activation='softmax'))
    return the_model


We will train the same model, each time using a different optimizer:
- SGD with `learning_rate = 0.01`  (the default value)
- SGD with `learning_rate = 0.001` 
- SGD with `learning_rate = 0.1`
- SGD with `learning_rate = 0.01` and `momentum = 0.9`
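To build intuition for what the two knobs change, here is a minimal NumPy sketch of the update rule that the Keras `SGD` optimizer documents for `nesterov=False`: `velocity = momentum * velocity - learning_rate * g`, then `w = w + velocity`. It is applied to a toy 1-D loss f(w) = 0.5·w², chosen purely for illustration; `sgd_step` and `run_sgd` are hypothetical helper names, not part of Keras.

```python
import numpy as np

def sgd_step(w, grad, velocity, learning_rate, momentum=0.0):
    """One SGD update; with momentum=0.0 this reduces to w - learning_rate * grad."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def run_sgd(learning_rate, momentum=0.0, steps=50, w0=1.0):
    """Minimize the toy loss f(w) = 0.5 * w**2 (gradient = w) and return the trajectory of w."""
    w, velocity = np.float64(w0), np.float64(0.0)
    trajectory = [w]
    for _ in range(steps):
        grad = w  # derivative of 0.5 * w**2
        w, velocity = sgd_step(w, grad, velocity, learning_rate, momentum)
        trajectory.append(w)
    return np.array(trajectory)

# Compare the four optimizer settings from the list above on the toy loss
for lr, mom in [(0.01, 0.0), (0.001, 0.0), (0.1, 0.0), (0.01, 0.9)]:
    final_w = run_sgd(lr, mom)[-1]
    print(f"lr={lr:<6} momentum={mom}: |w| after 50 steps = {abs(final_w):.4f}")
```

On this well-conditioned toy problem a larger step and added momentum both reach the minimum (w = 0) sooner; on a real CNN a large learning rate can also overshoot, so the actual training curves may tell a different story.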


In order to train each model from scratch (and not to continue training the same model again and again), create a new model each time (m1, m2, m3, m4).

Also, use a **different variable** to record the `history` of the training results (h1, h2, h3, h4).
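One way to guarantee that every run starts from scratch is to wrap model creation, compilation and training in a small helper. This is a sketch, not the required solution: `train_fresh` is a hypothetical name, and it assumes integer labels (as produced by the `flatten()` step above), hence `sparse_categorical_crossentropy`, and passes the test set as `validation_data` so that `fit` records `val_loss` and `val_accuracy`. The defaults follow the 100 epochs and batch size of 64 given in the exercise.

```python
import numpy as np
import tensorflow as tf

def train_fresh(make_model, optimizer, x_train, y_train, x_val, y_val,
                epochs=100, batch_size=64, verbose=1):
    """Build a brand-new model, compile it with `optimizer`, train it, and return (model, history)."""
    model = make_model()  # new, randomly initialized weights on every call
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',  # integer labels + softmax output
                  metrics=['accuracy'])
    history = model.fit(x_train, y_train,
                        epochs=epochs, batch_size=batch_size,
                        validation_data=(x_val, y_val),
                        verbose=verbose)
    return model, history

# Example usage (assumes make_basic_model and the CIFAR-10 arrays from the cells above):
# m1, h1 = train_fresh(make_basic_model, tf.keras.optimizers.SGD(learning_rate=0.01),
#                      train_images, train_labels, test_images, test_labels)
# m4, h4 = train_fresh(make_basic_model, tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
#                      train_images, train_labels, test_images, test_labels)
```

Passing a new optimizer object for each run is deliberate: an optimizer keeps internal state (for example the momentum velocities), so reusing one instance across models would not be a clean restart.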

Train each model for 100 epochs with a batch size of 64 (remember to use a GPU), and plot the usual graphs (loss&accuracy for train&test).
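The "usual graphs" can be produced by one reusable function. A minimal sketch, assuming the history dictionary contains the keys `loss`, `val_loss`, `accuracy` and `val_accuracy`, which Keras records when the model is compiled with `metrics=['accuracy']` and `fit` is given `validation_data`; `plot_history` is a hypothetical name.

```python
import matplotlib.pyplot as plt

def plot_history(history_dict, title=''):
    """Plot train/test loss and accuracy side by side from a Keras `History.history` dict."""
    epochs = range(1, len(history_dict['loss']) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    ax_loss.plot(epochs, history_dict['loss'], label='train')
    ax_loss.plot(epochs, history_dict['val_loss'], label='test')
    ax_loss.set_xlabel('epoch')
    ax_loss.set_ylabel('loss')
    ax_loss.legend()

    ax_acc.plot(epochs, history_dict['accuracy'], label='train')
    ax_acc.plot(epochs, history_dict['val_accuracy'], label='test')
    ax_acc.set_xlabel('epoch')
    ax_acc.set_ylabel('accuracy')
    ax_acc.legend()

    fig.suptitle(title)
    return fig

# Example usage (h1 is the History object returned by model.fit):
# plot_history(h1.history, 'SGD, learning_rate=0.01')
# plt.show()
```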



####  SGD with learning_rate = 0.01


In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

#### SGD with learning_rate = 0.001 



In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

#### SGD with learning_rate = 0.1 


In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

#### SGD with learning_rate = 0.01 and momentum = 0.9


In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

#### Graphical comparison 

Finally, in a single graph, plot the training loss curves of all 4 runs together (use a different color for each run and add a proper legend).
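A minimal sketch of the comparison plot, assuming each run's `History.history` dict is available (e.g. `h1.history` ... `h4.history`); `plot_training_losses` is a hypothetical name. Matplotlib's default color cycle already assigns a distinct color to each of the first ten lines, so no colors need to be set by hand.

```python
import matplotlib.pyplot as plt

def plot_training_losses(histories, labels):
    """Overlay the training-loss curves of several runs on one axis, one color and legend entry per run."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for history_dict, label in zip(histories, labels):
        epochs = range(1, len(history_dict['loss']) + 1)
        ax.plot(epochs, history_dict['loss'], label=label)  # next color from the default cycle
    ax.set_xlabel('epoch')
    ax.set_ylabel('training loss')
    ax.legend()
    return fig

# Example usage:
# plot_training_losses([h1.history, h2.history, h3.history, h4.history],
#                      ['lr=0.01', 'lr=0.001', 'lr=0.1', 'lr=0.01, momentum=0.9'])
# plt.show()
```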


In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

---
## 2. Add Dropout

To reduce over-training, we will add `Dropout` layers: one `Dropout` layer before each of the `Dense` layers. Use a 20% dropout rate.
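A minimal sketch of the basic model with dropout added, assuming "before each `Dense` layer" means one `Dropout` right after `Flatten` and one between the two `Dense` layers; `make_dropout_model` is a hypothetical name. Keras applies `Dropout` only during training (inside `fit`), where it zeroes a fraction `rate` of the inputs and scales the rest by 1/(1 - rate); in `evaluate` and `predict` it passes inputs through unchanged.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def make_dropout_model(rate=0.2):
    """Basic CNN from section 1 with a Dropout layer in front of each Dense layer."""
    model = Sequential()
    model.add(Input(shape=(32, 32, 3)))
    for filters in (32, 64, 128):
        model.add(Conv2D(filters, (3, 3), activation='relu', padding='same'))
        model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dropout(rate))   # dropout before the hidden Dense layer
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(rate))   # dropout before the output Dense layer
    model.add(Dense(10, activation='softmax'))
    return model
```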

Pick your favorite SGD optimizer and train the network for 50 epochs.

- Verify that you get better results in terms of over-training (a smaller gap between the training and validation curves is better).

- Did you also get a better accuracy on the validation data?
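To answer both questions with numbers rather than by eye, the train/validation accuracy gap and the best validation accuracy can be read directly from the history dicts. A small sketch; `overtraining_gap` is a hypothetical helper, and averaging over the last few epochs is an arbitrary choice made to smooth out epoch-to-epoch noise.

```python
def overtraining_gap(history_dict, last_n=5):
    """Mean (train accuracy - validation accuracy) over the last `last_n` epochs of a Keras History.history dict."""
    train = history_dict['accuracy'][-last_n:]
    val = history_dict['val_accuracy'][-last_n:]
    return sum(t - v for t, v in zip(train, val)) / len(train)

# Example usage (hypothetical variable names for the runs without and with dropout):
# print('gap without dropout:', overtraining_gap(h_basic.history))
# print('gap with dropout:   ', overtraining_gap(h_dropout.history))
# print('best val accuracy:  ', max(h_dropout.history['val_accuracy']))
```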

In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

---
## 3. Add Data Augmentation

Add 2-3 layers of data augmentation (of your choice) to the previous model (with the dropout).
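A minimal sketch of one possible choice: a horizontal flip (a mirrored car is still a car), a small translation and a small zoom, placed right after the `Input` layer of the dropout model. The factors (0.1) are illustrative assumptions, not tuned values, and `make_augmented_dropout_model` is a hypothetical name. These Keras layers transform images only when the model runs with `training=True` (as inside `fit`); at inference they return their inputs unchanged.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout,
                                     RandomFlip, RandomTranslation, RandomZoom)

# Three augmentation layers grouped into one block (illustrative choice of layers and factors)
data_augmentation = Sequential([
    RandomFlip('horizontal'),        # mirror images left-right
    RandomTranslation(0.1, 0.1),     # shift by up to 10% of height and width
    RandomZoom(0.1),                 # zoom in or out by up to 10%
], name='data_augmentation')

def make_augmented_dropout_model(rate=0.2):
    """Dropout model with the augmentation block inserted right after the Input layer."""
    model = Sequential()
    model.add(Input(shape=(32, 32, 3)))
    model.add(data_augmentation)     # active only during training
    for filters in (32, 64, 128):
        model.add(Conv2D(filters, (3, 3), activation='relu', padding='same'))
        model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dropout(rate))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(rate))
    model.add(Dense(10, activation='softmax'))
    return model
```

Putting augmentation inside the model (rather than preprocessing the arrays once) means every epoch sees freshly transformed images, while evaluation on the test set uses the original images.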

- Train the model (50 epochs)

- Did you get better results?


In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

---
## 4. Deeper model

Finally, let's make the model a bit deeper by doubling each `Conv2D` layer.
We will also add `BatchNormalization` layers in between, to help the training converge faster:

- 32 3x3-`Conv2D` + `BatchNormalization()` + 32 3x3-`Conv2D` + `BatchNormalization()` + 2x2 `MaxPooling` 
- 64 3x3-`Conv2D` + `BatchNormalization()` + 64 3x3-`Conv2D` + `BatchNormalization()` + 2x2 `MaxPooling` 
- 128 3x3-`Conv2D` + `BatchNormalization()` + 128 3x3-`Conv2D` + `BatchNormalization()` + 2x2 `MaxPooling` 
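To see what each `BatchNormalization()` in the list above computes during training, here is a minimal NumPy sketch of the training-mode forward pass for NHWC feature maps: each channel is standardized with the mean and variance of the current batch, then scaled by a learned `gamma` and shifted by a learned `beta`. `batch_norm_train` is a hypothetical name; `epsilon=1e-3` matches the Keras default. At inference, Keras instead uses moving averages of the mean and variance accumulated during training, which this sketch omits.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, epsilon=1e-3):
    """Training-mode batch normalization for x of shape (batch, height, width, channels)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)   # per-channel batch mean
    var = x.var(axis=(0, 1, 2), keepdims=True)     # per-channel batch variance
    x_hat = (x - mean) / np.sqrt(var + epsilon)    # roughly zero mean, unit variance per channel
    return gamma * x_hat + beta                    # learned scale and shift (broadcast over channels)

# Example: feature maps with mean 5 and std 3 come out roughly standardized per channel
rng = np.random.default_rng(0)
features = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 4, 2))
normalized = batch_norm_train(features, gamma=np.ones(2), beta=np.zeros(2))
print(normalized.mean(axis=(0, 1, 2)), normalized.std(axis=(0, 1, 2)))
```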

Train this model (including the data augmentation and drop-out) and plot the usual graphs.
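A minimal sketch of the deeper model, combining the doubled `Conv2D` blocks with `BatchNormalization`, an illustrative set of augmentation layers and the two `Dropout` layers. Assumptions: `add_double_conv_block` and `make_deep_model` are hypothetical names, the augmentation factors are not tuned, and the ReLU is kept inside each `Conv2D` (so `BatchNormalization` follows the activation, matching the order listed above; applying it before the activation is another common choice).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization, MaxPooling2D, Flatten,
                                     Dense, Dropout, RandomFlip, RandomTranslation, RandomZoom)

def add_double_conv_block(model, filters):
    """Append Conv2D -> BatchNormalization -> Conv2D -> BatchNormalization -> MaxPooling2D."""
    for _ in range(2):
        model.add(Conv2D(filters, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2)))

def make_deep_model(rate=0.2):
    """Deeper CNN with data augmentation, batch normalization and dropout."""
    model = Sequential()
    model.add(Input(shape=(32, 32, 3)))
    # data augmentation (active only during training)
    model.add(RandomFlip('horizontal'))
    model.add(RandomTranslation(0.1, 0.1))
    model.add(RandomZoom(0.1))
    # three doubled convolution blocks: 32, 64 and 128 filters
    for filters in (32, 64, 128):
        add_double_conv_block(model, filters)
    # classifier head with dropout before each Dense layer
    model.add(Flatten())
    model.add(Dropout(rate))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(rate))
    model.add(Dense(10, activation='softmax'))
    return model
```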

(Hopefully, you should get close to 90% accuracy..)

In [None]:
        ################################
        ###  your code goes here...  ###
        ################################

***
## Good Luck!