### LAB: DNN with Keras and TensorFlow

**Example 1: MNIST dataset (Digit recognition dataset)**

MNIST is available as a keras dataset (mnist)

Inputs: images of 28 x 28 pixels

Output: class representing the digit (10 classes, digits 0-9)

60k images for training and 10k images for test


In [None]:
from tensorflow.keras.datasets import mnist
import numpy as np
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels)= mnist.load_data()

#Check data dimensions

?

### Data pre-processing

**Classification problem => Convert outputs into categorical variables**

In [None]:
from tensorflow.keras.utils import to_categorical

# Reshape each image into a 1D vector (one row per example)
train_images = ?

#Scale pixel values to [0, 1] by dividing by 255
train_images = train_images.astype('float32') / 255

#Convert outputs into categorical variables 
train_labels = to_categorical(train_labels)

#Do the same for the test data
 ?
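The pre-processing steps above can be tried out on a small synthetic array before touching the real data. The sketch below is only an illustration: the `toy_` arrays are made-up stand-ins with the same layout as MNIST, and the one-hot step uses plain NumPy indexing as a rough equivalent of `to_categorical`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for MNIST: 6 images of 28 x 28 pixels with values 0..255
toy_images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)
toy_labels = np.array([5, 0, 4, 1, 9, 2])

# Check data dimensions
print(toy_images.shape, toy_labels.shape)

# Flatten each 28 x 28 image into a vector of 784 values
toy_flat = toy_images.reshape((toy_images.shape[0], 28 * 28))

# Scale the pixel values from [0, 255] to [0, 1]
toy_scaled = toy_flat.astype("float32") / 255

# One-hot encode the labels: row i has a single 1 in column toy_labels[i]
toy_one_hot = np.eye(10, dtype="float32")[toy_labels]

print(toy_scaled.shape, toy_one_hot.shape)
```

The same sequence of operations would apply to the test arrays, since the network expects both sets in the same format.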

### Model structure (feedforward DNN)

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers

network = models.Sequential()

#One hidden layer with 512 neurons (ReLU)
network.add(layers.Dense(512, activation='relu', 
                         input_shape=(28 * 28,)))

# Output layer with 10 neurons
# (softmax = 1 neuron for each class; one-hot encoding )
network.add(layers.Dense(10, activation='softmax'))
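To get a feeling for the size of this network, the number of trainable parameters can be worked out by hand: a dense layer with `n` inputs and `m` units has `n * m` weights plus `m` biases. The helper `dense_params` below is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(28 * 28, 512)  # hidden layer: 784 inputs, 512 units
output_params = dense_params(512, 10)       # output layer: 512 inputs, 10 units
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 401920 5130 407050
```

These numbers should match what `network.summary()` reports for the model defined above.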


### DNN training

In [None]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

#train the model (fit)
history= network.fit(train_images, train_labels, epochs=5, batch_size=128)

***history.history* is a dictionary. Check its keys, extract their values and plot them. You should get plots similar to the ones below.**

<img src="images/f1.jpg" style="width:250px;height:150px;">

<img src="images/f2.jpg" style="width:250px;height:150px;">
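One possible way to draw such curves is sketched below. The dictionary `example_history` contains made-up numbers standing in for `history.history`, and `plot_history` is a hypothetical helper, not part of Keras; the real keys depend on the metrics passed to `compile`.

```python
import matplotlib.pyplot as plt

# Made-up values with the same layout as history.history (one value per epoch)
example_history = {
    "loss": [0.26, 0.10, 0.07, 0.05, 0.04],
    "accuracy": [0.925, 0.969, 0.979, 0.985, 0.989],
}

def plot_history(history_dict):
    """Draw one panel per key, with the epoch number on the x axis."""
    n_panels = len(history_dict)
    fig, axes = plt.subplots(1, n_panels, figsize=(4 * n_panels, 3), squeeze=False)
    for ax, (name, values) in zip(axes[0], history_dict.items()):
        epochs = range(1, len(values) + 1)
        ax.plot(epochs, values, marker="o")
        ax.set_title(name)
        ax.set_xlabel("epoch")
    fig.tight_layout()
    return fig

print(list(example_history.keys()))
fig = plot_history(example_history)
```

With the trained network, the call would be `plot_history(history.history)`.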

**Predict the outputs (with *network.predict*)  for test images with the trained DNN network**. 

In [None]:
test_preds = ?

# test_preds contains the class probabilities, what is its dimension ?

?

In [None]:
#(choose the class with max probabilities)
test_classes = np.argmax(network.predict(test_images), axis=-1)

#what is the dimension of test_classes ?
 ?
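The step from class probabilities to class labels can be checked on a tiny hand-made example. The array `toy_probs` below is invented for illustration and plays the role of `test_preds`: one row per image, one column per class.

```python
import numpy as np

# Invented probabilities for 3 images and 10 classes (each row sums to 1)
toy_probs = np.full((3, 10), 0.02)
toy_probs[0, 2] = 0.82
toy_probs[1, 7] = 0.82
toy_probs[2, 0] = 0.82

# For each row, pick the index of the largest probability
toy_classes = np.argmax(toy_probs, axis=-1)

print(toy_probs.shape)    # (3, 10)
print(toy_classes.shape)  # (3,)
print(toy_classes)        # [2 7 0]
```

The reduction along the last axis removes the class dimension, which is why the result has one entry per image.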

### Evaluate the trained model for test and train images

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels, verbose =0)
print(test_loss, test_acc)

#Do the same for train data. Compare the results 
?
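To see what the two numbers returned by `evaluate` mean, the sketch below recomputes categorical cross-entropy and accuracy with NumPy on a made-up 3-class example. It is an approximation of the idea only; the Keras implementation may differ in details such as how probabilities are clipped.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over examples of -log(probability assigned to the true class)."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of examples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# Made-up one-hot labels and predicted probabilities for 4 examples
toy_true = np.eye(3)[[0, 1, 2, 1]]
toy_pred = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],  # wrong: true class is 1, predicted class is 0
])

toy_loss = categorical_crossentropy(toy_true, toy_pred)
toy_acc = accuracy(toy_true, toy_pred)
print(toy_loss, toy_acc)
```

Three of the four examples are classified correctly, so the accuracy is 0.75, while the loss also reflects how confident each prediction was.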

### Example 2:  DNN classifier for IMDB dataset

Dataset of movie reviews from IMDB, each classified into one of 2 classes: positive or negative

25k reviews for training + 25k for test

Balanced: 50% positive and 50% negative examples

IMDB is available as a keras dataset (imdb)

We will only consider the most common 10k words

In [None]:
from tensorflow.keras.datasets import imdb

#Use text dictionary with num_words=10000

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

In [None]:
#Check dimensions of train and test data, labels
?

#what is the content of train_data and train_labels ? 
?

#Print the content of the first review 
?
#The number of words differs from review to review
# How many words are in review 1 and in review 4?

? # ANSWER: 218 words / #550 words 

#What is the class of the first review
?

**Transform the word indices back into words** 

In [None]:
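# Largest word index in the training data (at most 9999 because num_words=10000)
print(max([max(sequence) for sequence in train_data]))

# word_index maps each word to an integer index
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Indices are offset by 3: 0, 1 and 2 are reserved for padding, start of sequence and unknown words
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[2]])
print(decoded_review)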
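The `i - 3` offset is easier to follow on a toy vocabulary. In the sketch below, `toy_word_index` and `toy_review` are invented for illustration; they mimic the default encoding of `imdb.load_data`, where 1 marks the start of a review, 2 stands for an unknown word, and real words are stored as their index plus 3.

```python
# Invented vocabulary in the same format as imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Invert the mapping: index -> word
toy_reverse = {index: word for word, index in toy_word_index.items()}

# Encoded review: start marker, four words (index + 3), unknown-word marker
toy_review = [1, 4, 5, 6, 7, 2]

# Reserved indices fall outside the vocabulary after subtracting 3, so they become '?'
toy_decoded = " ".join(toy_reverse.get(i - 3, "?") for i in toy_review)
print(toy_decoded)  # ? the movie was great ?
```

This also explains the `?` tokens that appear in the decoded real reviews: they mark the start of the review and the words outside the 10k-word dictionary.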

In [None]:
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

#Multi-hot encoding of the reviews (binary vectors: 1 if the word index occurs in the review)

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)

#Output to binary (0 / 1 Classes) 
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
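A small example makes the effect of `vectorize_sequences` visible. The function is repeated here so the sketch is self-contained; `toy_sequences` and the dimension of 5 are invented for illustration.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set every index that occurs in the review
    return results

# Two invented "reviews" over a vocabulary of 5 word indices
toy_sequences = [[1, 3, 3], [0, 2]]
toy_vectors = vectorize_sequences(toy_sequences, dimension=5)
print(toy_vectors)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 1. 0. 0.]]
```

Note that the repeated index 3 in the first review still yields a single 1, and that the word order is lost: the encoding only records which words occur.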

### Model structure (fully connected feedforward DNN)

Implement a DNN with 2 hidden layers

16 neurons in each hidden layer; ReLU activation

Output layer: 1 neuron with sigmoid activation (binary classification)

In [None]:
from tensorflow.keras import models, layers

?
?
?
model = ?
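One possible way to express the architecture described above is sketched here; it is not the only valid solution. The name `sketch_model` is chosen to avoid clashing with the `model` variable of the exercise, and the input is declared with an explicit `Input` layer, which is a stylistic assumption.

```python
from tensorflow.keras import layers, models

sketch_model = models.Sequential([
    layers.Input(shape=(10000,)),           # one multi-hot vector per review
    layers.Dense(16, activation="relu"),    # first hidden layer
    layers.Dense(16, activation="relu"),    # second hidden layer
    layers.Dense(1, activation="sigmoid"),  # probability of the positive class
])

sketch_model.summary()
```

The summary should list 160,305 trainable parameters: 10000 * 16 + 16 for the first hidden layer, 16 * 16 + 16 for the second, and 16 + 1 for the output neuron.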


### DNN Model training  for IMDB dataset

In [None]:
# Compile the model with optimizer (RMSprop), 
# loss function (binary_crossentropy ) and error metric (accuracy)

?

#Split train data into: 
# validation data (the first 10000 reviews in x_train)
# train data (the remaining reviews in x_train )

x_val = ?
y_val = ?

partial_x_train = ?
partial_y_train = ?

#Fit the model with partial train data and validation data 
#(check model.fit function of keras)
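The slicing pattern behind such a hold-out split can be tried on a toy array first. In this sketch the `toy_` names and the validation size of 3 are invented for illustration; the exercise uses the first 10000 reviews instead.

```python
import numpy as np

# Invented data: 10 examples with 2 features each, and 10 labels
toy_x = np.arange(20).reshape(10, 2)
toy_y = np.arange(10)

n_val = 3  # number of leading examples held out for validation

# The first n_val examples form the validation set ...
toy_x_val, toy_y_val = toy_x[:n_val], toy_y[:n_val]
# ... and the remaining ones are used for training
toy_partial_x, toy_partial_y = toy_x[n_val:], toy_y[n_val:]

print(toy_x_val.shape, toy_partial_x.shape)  # (3, 2) (7, 2)
```

The two parts are then passed to `model.fit` as the training arrays and as the `validation_data=(x_val, y_val)` tuple, so that the returned history also contains `val_` entries.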


Plot results similar to the figures below. 

<img src="images/f3.jpg" style="width:250px;height:150px;">

<img src="images/f4.jpg" style="width:250px;height:150px;">

In [None]:
# Predict test data 
 ?

In [None]:
# Evaluate the model on train data. 
# what are the loss and the accuracy ?

?

# Do the same for test data and compare

?

### Overfitting issues - change the number of hidden layer units

If you observe a significant gap between train and test accuracy (for example, 7 to 8 percentage points), the model is likely overfitting; try the approaches below.

Change the number of hidden layer units (h). 

In [None]:
# For example, repeat all training and testing process with h = 4 or 8. 
hidden = 4  

#build the model
?
#fit the model
?
#plot train and val acc, train and val loss during the training
?
# evaluate the model, what are the train and test final accuracies 
?


### Overfitting issues - Regularization  L2

The example below applies L2 regularization; it can be changed to L1 or to combined L1 + L2, for example:

regularizers.l1(0.001)

regularizers.l1_l2(l1=0.001, l2=0.001) 
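What these regularizers add to the loss can be computed by hand for a small weight matrix. The sketch below uses an invented matrix `toy_weights` and the factor 0.001 from the text: L2 adds the factor times the sum of squared weights, L1 adds the factor times the sum of absolute weights, and `l1_l2` adds both terms.

```python
import numpy as np

toy_weights = np.array([[0.5, -1.0],
                        [2.0,  0.0]])
factor = 0.001

# L2 penalty: factor * sum of squared weights = 0.001 * (0.25 + 1 + 4 + 0)
l2_penalty = factor * np.sum(np.square(toy_weights))

# L1 penalty: factor * sum of absolute weights = 0.001 * (0.5 + 1 + 2 + 0)
l1_penalty = factor * np.sum(np.abs(toy_weights))

# Combined penalty, as with l1_l2(l1=0.001, l2=0.001)
l1_l2_penalty = l1_penalty + l2_penalty

print(l2_penalty, l1_penalty, l1_l2_penalty)
```

Because the penalty is added to the training loss, the reported training loss of a regularized model tends to be higher than that of the same model without regularization, even when the predictions are equally good.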

In [None]:
from tensorflow.keras import regularizers

hidden = 16

model_l2 = models.Sequential()

model_l2.add(layers.Dense(hidden, activation='relu', 
  kernel_regularizer=regularizers.l2(0.001),input_shape=(10000,)))

model_l2.add(layers.Dense(hidden, activation='relu', 
  kernel_regularizer=regularizers.l2(0.001)))

model_l2.add(layers.Dense(1, activation='sigmoid'))

#Compile and fit the model 
?
#plot train and val acc, train and val loss during the training
?
# evaluate the model, what are the train and test final accuracies 
?


### Overfitting issues - Regularization  L1

Now apply L1 regularization to the same model and repeat the whole training and testing process.

In [None]:
?

### Overfitting issues - dropout

In [None]:
model_dr = models.Sequential()
model_dr.add(layers.Dense(hidden, activation='relu', 
                          input_shape=(10000,)))
model_dr.add(layers.Dropout(0.5))

model_dr.add(layers.Dense(hidden, activation='relu'))
model_dr.add(layers.Dropout(0.5))

model_dr.add(layers.Dense(1, activation='sigmoid'))


#Compile and fit the model 
?
#plot train and val acc, train and val loss during the training
?
# evaluate the model, what are the train and test final accuracies 
?
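The behaviour of a dropout layer can be imitated in a few lines of NumPy. The function `toy_dropout` below is a simplified sketch of the usual "inverted dropout" scheme, written for illustration and not taken from the Keras source: during training a fraction `rate` of the activations is set to zero and the rest is scaled up by `1 / (1 - rate)`; at prediction time the input passes through unchanged.

```python
import numpy as np

def toy_dropout(x, rate, training, rng):
    """Simplified inverted dropout on an array of activations."""
    if not training:
        return x  # no units are dropped when predicting
    keep_mask = rng.random(x.shape) >= rate  # True for units that survive
    return x * keep_mask / (1.0 - rate)      # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
toy_activations = np.ones((1000, 16))

train_out = toy_dropout(toy_activations, rate=0.5, training=True, rng=rng)
test_out = toy_dropout(toy_activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out))  # surviving units are scaled to 2.0, dropped ones are 0.0
print(train_out.mean())      # close to 1.0 on average
```

Since a different random subset of units is silenced at every training step, the network cannot rely on any single unit, which is what limits overfitting.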


### Overfitting issues -  EarlyStopping

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

hidden = 16

model = models.Sequential()
model.add(layers.Dense(hidden, activation='relu', 
                       input_shape=(10000,)))
model.add(layers.Dense(hidden, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

early = EarlyStopping(monitor='val_loss', min_delta=0, patience= 5, 
                      verbose= True, mode='auto')

callbacks = [early]
history_es = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=20,
                    batch_size=512,
                    validation_data=(x_val, y_val),
                    callbacks = callbacks)

#plot train and val acc, train and val loss during the training
?
# evaluate the model, what are the train and test final accuracies 
?
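The stopping rule configured above (`monitor='val_loss'`, `min_delta=0`, `patience=5`) can be mimicked in plain Python to see when training would end. The function `early_stopping_epoch` and the list `toy_val_losses` are invented for illustration; this is a sketch of the idea, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the (1-based) epoch after which training would stop."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs are run

# Invented validation losses: best value at epoch 4, then steadily worse
toy_val_losses = [0.40, 0.32, 0.29, 0.28, 0.30, 0.31, 0.33, 0.35, 0.38, 0.41]

stopped_at = early_stopping_epoch(toy_val_losses, patience=5)
print(stopped_at)  # 9
```

In this example the validation loss last improves at epoch 4, so with a patience of 5 training stops after epoch 9. The weights at that point are the ones from the last epoch, not from the best one, unless the callback is asked to restore them.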


### Overfitting issues -  Dropout & EarlyStopping

Apply a combination of mechanisms to deal with overfitting.
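As a starting point, the sketch below combines dropout layers with an `EarlyStopping` callback. To stay self-contained it trains on random data (`x_demo`, `y_demo`), so the resulting accuracy is meaningless; the variable names, the patience of 2 and the use of `restore_best_weights=True` are assumptions made for this illustration. In the lab, the IMDB arrays would be used instead.

```python
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping

# Random stand-in data with the same layout as the vectorized reviews
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(256, 10000)).astype("float32")
y_demo = rng.integers(0, 2, size=(256,)).astype("float32")

combo = models.Sequential([
    layers.Input(shape=(10000,)),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(16, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
combo.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Stop when val_loss has not improved for 2 epochs and keep the best weights
stopper = EarlyStopping(monitor="val_loss", patience=2,
                        restore_best_weights=True)

history_combo = combo.fit(x_demo[64:], y_demo[64:],
                          epochs=10,
                          batch_size=64,
                          validation_data=(x_demo[:64], y_demo[:64]),
                          callbacks=[stopper],
                          verbose=0)

print(len(history_combo.history["val_loss"]), "epochs were run")
```

L1 or L2 regularization could be added to the `Dense` layers in the same way as in the earlier sections, and the effect of each combination compared through the train, validation and test accuracies.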