## Deep Neural Networks with Keras ##

In this exercise, you are going to build a set of deep learning models on a real-world task using TensorFlow and Keras. TensorFlow is a deep learning framework developed by Google, and Keras is a frontend library built on top of TensorFlow (or Theano, CNTK) to provide an easier way to use standard layers and networks.

To complete this exercise, you will need to build deep learning models for precipitation nowcasting. You will build a subset of the models shown below:
- Fully Connected (Feedforward) Neural Network
- Two-Dimensional Convolutional Neural Network (2D-CNN)
- Recurrent Neural Network with Gated Recurrent Unit (GRU)

and one more model of your choice to achieve the highest score possible.

We provide the code for data cleaning and some starter code for Keras in this notebook, but feel free to modify those parts to suit your needs. You can also complete this exercise using only TensorFlow (without using Keras). Feel free to use additional libraries (e.g. scikit-learn) as long as you have a model for each type mentioned above.

This notebook assumes you have already installed TensorFlow and Keras with Python 3 and have a GPU enabled. If you run this exercise on GCloud using the provided disk image, you are all set.

As a reminder,

### Don't forget to shut down your instance on Gcloud when you are not using it ###

## Precipitation Nowcasting ##

Precipitation nowcasting is the task of predicting the amount of rainfall in a certain region given some kind of sensor data. The term nowcasting refers to tasks that try to predict the current or near-future conditions (within 6 hours).

You will be given satellite images in 3 different bands covering a 5 by 5 region from different parts of Thailand. In other words, your input will be a 5x5x3 image. Your task is to predict the amount of rainfall in the center pixel. You will first do the prediction using just a simple fully-connected neural network that views each pixel as a different input feature.

Since your input is basically an image, we will then view the input as an image and apply a CNN to do the prediction. Finally, we can also add a time component, since weather prediction can benefit greatly from previous time frames. Each data point actually contains 5 time steps, so each input data point has a size of 5x5x5x3 (time x height x width x channel), and the output data has a size of 5 (time). You will use this time information when you work with RNNs.

Finally, we would like to thank the Thai Meteorological Department for providing the data for this assignment.

In [10]:
import os
import numpy as np
import pickle
import keras
from keras.models import load_model
import pandas as pd
import matplotlib.pyplot as plt
import urllib

Using TensorFlow backend.


# Data Explanation #

The data is an hourly measurement of water vapor in the atmosphere, and two infrared measurements of cloud imagery on a latitude-longitude coordinate. Each measurement is illustrated below as an image. These three features are included as different channels in your input data.

<img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/wvapor.png" width="200"> <img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/cloud1.png" width="200"> <img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/cloud2.png" width="200">

We also provide the hourly precipitation (rainfall) records for the months of June, July, August, September, and October from weather stations spread around the country. A 5x5 grid around each weather station at a particular time is paired with the precipitation recorded at the corresponding station as input and output data. Finally, five adjacent timesteps are stacked into one sequence.

The months of June to August are provided as training data, while the months of September and October are used as validation and test sets, respectively.


# Reading data

In [11]:
def read_data(months, data_dir='dataset'):
    features = np.array([], dtype=np.float32).reshape(0,5,5,5,3)
    labels = np.array([], dtype=np.float32).reshape(0,5)
    for m in months:
        filename = 'features-m{}.pk'.format(m)
        with open(os.path.join(data_dir,filename), 'rb') as file:
            features_temp = pickle.load(file)
        features = np.concatenate((features, features_temp), axis=0)
        
        filename = 'labels-m{}.pk'.format(m)
        with open(os.path.join(data_dir,filename), 'rb') as file:
            labels_temp = pickle.load(file)
        labels = np.concatenate((labels, labels_temp), axis=0)
    
    return features, labels

In [12]:
# use data from month 6,7,8 as training set
x_train, y_train = read_data(months=[6,7,8])

# use data from month 9 as validation set
x_val, y_val = read_data(months=[9])

# use data from month 10 as test set
x_test, y_test = read_data(months=[10])

print('x_train shape:',x_train.shape)
print('y_train shape:', y_train.shape, '\n')
print('x_val shape:',x_val.shape)
print('y_val shape:', y_val.shape, '\n')
print('x_test shape:',x_test.shape)
print('y_test shape:', y_test.shape)

x_train shape: (229548, 5, 5, 5, 3)
y_train shape: (229548, 5) 

x_val shape: (92839, 5, 5, 5, 3)
y_val shape: (92839, 5) 

x_test shape: (111715, 5, 5, 5, 3)
y_test shape: (111715, 5)


**features** 
- dim 0: number of entries
- dim 1: number of time-steps in ascending order
- dim 2,3: a 5x5 grid around the rain-measuring station
- dim 4: water vapor and two cloud imagery channels

**labels**
- dim 0: number of entries
- dim 1: precipitation amount for each time-step
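
To make the axis layout above concrete, here is a small sketch that indexes a random array standing in for the real features. The names `demo_features` and `demo_labels` and the array sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the real data: 4 entries, 5 time steps, 5x5 grid, 3 channels.
rng = np.random.default_rng(0)
demo_features = rng.random((4, 5, 5, 5, 3)).astype(np.float32)
demo_labels = rng.random((4, 5)).astype(np.float32)

entry, t = 2, 0
frame = demo_features[entry, t]   # shape (5, 5, 3): one 5x5 image with 3 channels
center_pixel = frame[2, 2]        # shape (3,): the grid cell at the center of the 5x5 window
target = demo_labels[entry, t]    # scalar: precipitation paired with this entry and time step

print(frame.shape, center_pixel.shape, target.shape)
```

The center cell `[2, 2]` is the pixel whose rainfall the models are asked to predict.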

In [13]:
def normalize(X):
    mean = np.mean(X)
    var = np.var(X)
    return (X - mean) / var

In [14]:
x_train = normalize(x_train)
x_val = normalize(x_val)
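
The `normalize` function above divides by the variance and computes separate statistics for each split. A common alternative, sketched below as an assumption rather than as the method this assignment intends, is to standardize with the standard deviation and to reuse the training-set statistics for every split. The helper names `fit_standardizer` and `apply_standardizer` and the random demo arrays are hypothetical.

```python
import numpy as np

def fit_standardizer(x_train):
    """Compute per-channel mean and standard deviation from the training data only."""
    axes = tuple(range(x_train.ndim - 1))   # average over every axis except the channel axis
    mean = x_train.mean(axis=axes)
    std = x_train.std(axis=axes)
    return mean, std

def apply_standardizer(x, mean, std, eps=1e-8):
    """Standardize any split with statistics computed on the training data."""
    return (x - mean) / (std + eps)

# Hypothetical data with the same layout as the real features.
rng = np.random.default_rng(0)
demo_train = rng.normal(loc=3.0, scale=2.0, size=(100, 5, 5, 5, 3))
demo_val = rng.normal(loc=3.0, scale=2.0, size=(40, 5, 5, 5, 3))

mean, std = fit_standardizer(demo_train)
demo_train_std = apply_standardizer(demo_train, mean, std)
demo_val_std = apply_standardizer(demo_val, mean, std)
print(mean.shape, demo_train_std.shape, demo_val_std.shape)
```

Switching to this scheme would change the reported losses, so the numbers shown in this notebook would no longer apply.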

# Three-Layer Feedforward Neural Networks

Below, the code for creating a three-layer fully connected neural network in Keras is provided. Run the code and make sure you understand what you are doing. Then, report the results.

In [15]:
# The dataset needs to be reshaped to make it suitable for the feedforward model
def preprocess_for_ff(x_train, y_train, x_val, y_val):
    x_train_ff = x_train.reshape((-1, 5*5*3))
    y_train_ff = y_train.reshape((-1, 1))
    x_val_ff = x_val.reshape((-1, 5*5*3))
    y_val_ff = y_val.reshape((-1, 1))
    return x_train_ff, y_train_ff, x_val_ff, y_val_ff

x_train_ff, y_train_ff, x_val_ff, y_val_ff = preprocess_for_ff(x_train, y_train, x_val, y_val)
print(x_train_ff.shape, y_train_ff.shape)
print(x_val_ff.shape, y_val_ff.shape)

(1147740, 75) (1147740, 1)
(464195, 75) (464195, 1)
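
The reshape above merges the entry and time-step axes, which is why 229548 sequences become 1147740 rows. The sketch below uses a small random array (a hypothetical stand-in for the real data) to check which row each (entry, time step) pair ends up in.

```python
import numpy as np

# Hypothetical stand-in for the real data: 3 entries, 5 time steps, 5x5 grid, 3 channels.
rng = np.random.default_rng(0)
demo_x = rng.random((3, 5, 5, 5, 3))
demo_y = rng.random((3, 5))

flat_x = demo_x.reshape((-1, 5 * 5 * 3))   # shape (15, 75)
flat_y = demo_y.reshape((-1, 1))           # shape (15, 1)

# Row i * 5 + t of the flattened arrays corresponds to entry i at time step t.
i, t = 1, 3
row = i * 5 + t
same_features = np.array_equal(flat_x[row], demo_x[i, t].reshape(-1))
same_label = flat_y[row, 0] == demo_y[i, t]
print(flat_x.shape, flat_y.shape, same_features, same_label)
```

Because features and labels are reshaped in the same order, each row stays paired with its own label.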


In [16]:
from keras.layers import *
from keras.models import Model
from keras.optimizers import Adam

def get_feedforward_nn():    
    input1 = Input(shape=(75,))    
    x = Dense(200, activation='relu')(input1)    
    x = Dense(200, activation='relu')(x)
    x = Dense(200, activation='relu')(x)
    out = Dense(1)(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.001),
                loss='mse',
                metrics=['mse'])

    return model

In [17]:
from keras import backend as K
# This is called to clear the original model session in order to use TensorBoard
K.clear_session()

model_ff = get_feedforward_nn()
model_ff.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 75)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               15200     
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_3 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 201       
Total params: 95,801
Trainable params: 95,801
Non-trainable params: 0
_________________________________________________________________
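
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# Input size followed by the width of each Dense layer in the model above.
layer_sizes = [75, 200, 200, 200, 1]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)   # [15200, 40200, 40200, 201] 95801
```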


In [18]:
from keras.callbacks import ModelCheckpoint, TensorBoard, ReduceLROnPlateau

print('start training ff')

# Path to save model parameters
weight_path_model_ff ='model_ff_nn.h5'
# Path to write tensorboard
tensorboard_path_model_ff = 'Graphs/ff_nn'

callbacks_list_model_ff_nn = [
#     TensorBoard(log_dir=tensorboard_path_model_ff, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_ff,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
]

verbose = 1
epochs, batch_size = [10,1024]

model_ff.fit(x_train_ff, y_train_ff, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_ff_nn, validation_data=(x_val_ff, y_val_ff))

start training ff
Train on 1147740 samples, validate on 464195 samples
Epoch 1/10

Epoch 00001: val_loss improved from inf to 1.65876, saving model to model_ff_nn.h5
Epoch 2/10

Epoch 00002: val_loss did not improve from 1.65876
Epoch 3/10

Epoch 00003: val_loss did not improve from 1.65876
Epoch 4/10

Epoch 00004: val_loss did not improve from 1.65876
Epoch 5/10

Epoch 00005: val_loss did not improve from 1.65876
Epoch 6/10

Epoch 00006: val_loss did not improve from 1.65876
Epoch 7/10

Epoch 00007: val_loss did not improve from 1.65876
Epoch 8/10

Epoch 00008: val_loss did not improve from 1.65876
Epoch 9/10

Epoch 00009: val_loss did not improve from 1.65876
Epoch 10/10

Epoch 00010: val_loss did not improve from 1.65876


<keras.callbacks.History at 0x7f5025179e10>
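
To see what `ReduceLROnPlateau(factor=0.2, patience=2, min_lr=0.0001)` does when the validation loss stops improving, here is a simplified pure-Python re-implementation of the plateau rule. It is a sketch, not the Keras code: it ignores the callback's cooldown and minimum-improvement threshold options, and the loss values are hypothetical because the log above only reports that later epochs did not improve.

```python
def simulate_reduce_lr(val_losses, lr, factor=0.2, patience=2, min_lr=1e-4):
    """Return the learning rate in effect after each epoch (simplified plateau rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement this epoch
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)   # shrink the rate, but never below min_lr
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation losses: only the first epoch improves.
demo_losses = [1.65876] + [1.66] * 9
lr_history = simulate_reduce_lr(demo_losses, lr=0.001)
print(lr_history)
```

Under these assumptions the rate drops after two stalled epochs and reaches the `min_lr` floor after two more, so most of the run would train at the smallest allowed rate.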

In [19]:
################################################################################
# TODO#1:                                                                      #
# Write a function to evaluate your model. Your function must make predictions #
# using the input model and return the mean squared error of the model.        #
#                                                                              #
# Hint: https://keras.io/models/model#evaluate                                 #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def evaluate(features, labels, model):
    """
    Evaluate model on validation data
    """
    mse = model.evaluate(features, labels)[1]
    return mse

model_ff.save_weights('model_ff_nn_last.h5')
model_ff.load_weights('model_ff_nn.h5')
mse_ff = evaluate(x_val_ff, y_val_ff, model_ff)
print("Mean square error of feedforward:", mse_ff)

Mean square error of feedforward: 1.6587646545394852


In [20]:
# We will use majority rule as a baseline.
def majority_baseline(label_set):
    unique, counts = np.unique(label_set, return_counts=True)
    majority = unique[np.argmax(counts)]
    baseline = 0
    label_set = label_set.reshape(-1,1)
    for r in label_set:
        baseline += (majority - r) ** 2 / len(label_set)
    return baseline

In [21]:
print('baseline')
print('train', majority_baseline(y_train))
print('validate', majority_baseline(y_val))

baseline
train [1.94397725]
validate [1.6746546]
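
The baseline loop above can also be written without a Python loop. The function below is a hypothetical vectorized equivalent (it returns a scalar instead of a one-element array), shown on a tiny made-up label array.

```python
import numpy as np

def majority_baseline_vectorized(label_set):
    """MSE obtained by always predicting the most frequent label value."""
    labels = np.asarray(label_set).reshape(-1)
    unique, counts = np.unique(labels, return_counts=True)
    majority = unique[np.argmax(counts)]
    return np.mean((majority - labels) ** 2)

# Hypothetical labels for illustration: the most frequent value is 0.
demo_labels = np.array([[0.0, 0.0, 0.0, 1.5, 0.0],
                        [0.0, 2.0, 0.0, 0.0, 0.0]])
baseline_demo = majority_baseline_vectorized(demo_labels)
print(baseline_demo)   # (1.5**2 + 2.0**2) / 10 = 0.625
```

A model is only useful here if its validation MSE is clearly below this constant-prediction baseline.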


# (Optional) Tensorboard #
The code provided also supports TensorBoard (a visualization tool that comes with TensorFlow). Note the part that calls it: `TensorBoard(log_dir='./Graph/' + graph_name, histogram_freq=1, write_graph=True, write_grads=True)`. This tells TensorFlow to write extra outputs to the `log_dir`, which can then be used for visualization.

To start TensorBoard, run
```
tensorboard --logdir=/full_path_to_your_logs
```
from the command line. This will launch TensorBoard, and you will be able to access it from a web browser by pointing the URL to `<instance-ip>:6006`. You will need to enable additional firewall rules in Gcloud for this.

**Make sure your logs path is on the second drive (under /data). Otherwise, your main disk will fill up!**

In TensorBoard, you will be able to debug your computation graph, which can be hard to keep track of in code. This might seem trivial in Keras, but it is very helpful for TensorFlow. You can see a visualization of the computation graph in the `GRAPH` tab. If you see multiple dense layers (more than 4), this is caused by running the code several times without deleting the log dir. Delete the log dir and re-run the code.

Next, let's look at the scalars tab. Here we can see the loss and metric values on the training and validation sets as they change over each epoch. This can be useful to detect overfitting.

Another useful tab is the histograms tab. It plots histograms of the weights, biases, and outputs of each layer. The depth of the histograms shows the change over epochs, so we can see how the histograms of the weights change over the training period. This can be used to debug vanishing gradients or getting stuck in local minima.

There are other useful tabs in TensorBoard; you can read about them in the Keras [documentation](https://keras.io/callbacks/#tensorboard) for TensorBoard.

# Tensorboard observation #

**Optional TODO#1** Write your own interpretation of the logs from this example. A simple sentence or two for each tab is sufficient.

**Your answer:** 

# Dropout #

You might notice that the three-layer feedforward network does not use dropout at all. Now, try adding dropout to the model, run it, and report the result again.
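
As a reminder of what a dropout layer computes, here is a minimal NumPy sketch of inverted dropout, the variant in which surviving activations are rescaled during training so that nothing needs to change at inference time. The function name `dropout_forward` and the demo array are hypothetical; this illustrates the idea and is not the Keras implementation.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return x                              # inference: activations pass through unchanged
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob    # True for the units that are kept
    return x * mask / keep_prob               # rescale so the expected activation is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 200))
train_out = dropout_forward(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout_forward(activations, rate=0.2, training=False, rng=rng)
print(train_out.mean(), eval_out.mean())
```

With `rate=0.2`, roughly one unit in five is zeroed on each training pass, while the mean activation stays close to its original value.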

In [22]:
def get_fully_connected_with_dropout():    
    input1 = Input(shape=(75,))    
    x = Dropout(0.2)(input1)    
    x = Dense(200, activation='relu')(x)    
    x = Dropout(0.2)(x)    
    x = Dense(200, activation='relu')(x)
    x = Dropout(0.2)(x)    
    x = Dense(200, activation='relu')(x)
    x = Dropout(0.2)(x)    
    out = Dense(1)(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.005),
                loss='mse',
                metrics=['mse'])

    return model

In [23]:
from keras import backend as K
K.clear_session()

model_ff_dropout = get_fully_connected_with_dropout()
model_ff_dropout.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 75)                0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 75)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               15200     
_________________________________________________________________
dropout_2 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dropout_3 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 200)               40200     
__________

**TODO#2** Train your model with dropout below

In [24]:
################################################################################
# TODO#3:                                                                      #
# Complete the code to train your dropout model                                #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
print('start training ff dropout')
# Path to save model parameters
weight_path_model_ff_dropout ='model_ff_nn_dropout.h5'
# Path to write tensorboard
tensorboard_path_model_ff_dropout = 'Graphs/ff_nn_dropout'

callbacks_list_model_ff_nn_dropout = [
#     TensorBoard(log_dir=tensorboard_path_model_ff_dropout, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_ff_dropout,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
]

verbose = 1
epochs, batch_size = [10,1024]

model_ff_dropout.fit(x_train_ff, y_train_ff, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_ff_nn_dropout, validation_data=(x_val_ff, y_val_ff))

start training ff dropout
Train on 1147740 samples, validate on 464195 samples
Epoch 1/10

Epoch 00001: val_loss improved from inf to 1.66075, saving model to model_ff_nn_dropout.h5
Epoch 2/10

Epoch 00002: val_loss improved from 1.66075 to 1.66065, saving model to model_ff_nn_dropout.h5
Epoch 3/10

Epoch 00003: val_loss did not improve from 1.66065
Epoch 4/10

Epoch 00004: val_loss improved from 1.66065 to 1.65997, saving model to model_ff_nn_dropout.h5
Epoch 5/10

Epoch 00005: val_loss did not improve from 1.65997
Epoch 6/10

Epoch 00006: val_loss did not improve from 1.65997
Epoch 7/10

Epoch 00007: val_loss did not improve from 1.65997
Epoch 8/10

Epoch 00008: val_loss did not improve from 1.65997
Epoch 9/10

Epoch 00009: val_loss did not improve from 1.65997
Epoch 10/10

Epoch 00010: val_loss did not improve from 1.65997


<keras.callbacks.History at 0x7f502449eb00>

In [25]:
################################################################################
# TODO#4:                                                                      #
# Complete the code to evaluate your dropout model                             #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
model_ff_dropout.save_weights('model_ff_nn_dropout_last.h5')
model_ff_dropout.load_weights('model_ff_nn_dropout.h5')
mse_ff_dropout = evaluate(x_val_ff, y_val_ff, model_ff_dropout)
print("Mean square error of model with dropout:", mse_ff_dropout)
print("The non dropout - dropout difference is:", mse_ff - mse_ff_dropout)

Mean square error of model with dropout: 1.6599748849463718
The non dropout - dropout difference is: -0.0012102304068866143


# A fork in the road

In the next sections, we will discuss CNNs and GRUs. **PICK ONE** method to complete the homework. If you do both, the second method counts as an optional task. Then, do the **Final Section**.

# Convolution Neural Networks
Now, you are going to implement your own 2D convolutional neural network with the following structure.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param
=================================================================
input_1 (InputLayer)         (None, 5, 5, 3)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 200)         5600      
_________________________________________________________________
flatten_1 (Flatten)          (None, 1800)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               360200    
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 201       
=================================================================
Total params: 406,201
Trainable params: 406,201
Non-trainable params: 0
_________________________________________________________________
```
These parameters are simple guidelines to save you time.    
You can play with them in the final section, where you may choose any normalization method, activation function, and hyperparameters you like.

Hint: You should read the Keras documentation to see the list of available layers and options you can use.
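
The shapes and parameter counts in the table above can be reproduced with a little arithmetic. The sketch below assumes a 3x3 kernel with stride 1 and no padding, which is inferred from the table (output 3x3x200, 5600 parameters) rather than stated in the text; the helper names are hypothetical.

```python
def conv2d_output_size(size, kernel):
    """Spatial size after a stride-1 convolution without padding ('valid')."""
    return size - kernel + 1

def conv2d_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

out_size = conv2d_output_size(5, 3)           # 3
conv_params = conv2d_params(3, 3, 200)        # 5600
flat_size = out_size * out_size * 200         # 1800
total = (conv_params + dense_params(flat_size, 200)
         + dense_params(200, 200) + dense_params(200, 1))
print(out_size, conv_params, flat_size, total)   # 3 5600 1800 406201
```

Most of the parameters sit in the first dense layer after `Flatten`, not in the convolution itself.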

In [26]:
################################################################################
# TODO#A1:                                                                     #
# Complete the code for preparing data for training CNN                        #
# Input for CNN should not have time step.                                     #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def preprocess_for_cnn(x_train, y_train, x_val, y_val):
    x_train_cnn = x_train.reshape((-1, 5, 5, 3))
    y_train_cnn = y_train.reshape((-1, 1))
    x_val_cnn = x_val.reshape((-1, 5, 5, 3))
    y_val_cnn = y_val.reshape((-1, 1))
    return x_train_cnn, y_train_cnn, x_val_cnn, y_val_cnn

x_train_cnn, y_train_cnn, x_val_cnn, y_val_cnn = preprocess_for_cnn(x_train, y_train, x_val, y_val)
print(x_train_cnn.shape, y_train_cnn.shape)
print(x_val_cnn.shape, y_val_cnn.shape)

(1147740, 5, 5, 3) (1147740, 1)
(464195, 5, 5, 3) (464195, 1)


In [27]:
################################################################################
# TODO#A2:                                                                     #
# Write a function that returns a Keras convolutional neural network model.    #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def get_conv2d_nn():
    input1 = Input(shape=(5,5,3,))
    x = Conv2D(200, (3,3))(input1)
    x = Flatten()(x)
    x = Dense(200, activation='relu')(x)
    x = Dense(200, activation='relu')(x)
    out = Dense(1)(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.001),
                loss='mse',
                metrics=['mse'])

    return model

In [28]:
################################################################################
# TODO#A3:                                                                     #
# Write code that calls model.fit, or model.fit_generator if you have data     #
# generator, to train your models. Make sure you have validation_data as an    #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about callbacks_list argument on the documentation. You might     #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training conv2d')
model_cnn = get_conv2d_nn()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
model_cnn.summary()
# Path to save model parameters
weight_path_model_cnn ='model_cnn_nn.h5'
# Path to write tensorboard
tensorboard_path_model_cnn = 'Graphs/cnn_nn'

callbacks_list_model_cnn_nn = [
#     TensorBoard(log_dir=tensorboard_path_model_cnn, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_cnn,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
]

verbose = 2
epochs, batch_size = [10,256]

model_cnn.fit(x_train_cnn, y_train_cnn, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_cnn_nn, validation_data=(x_val_cnn, y_val_cnn))

start training conv2d
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 5, 5, 3)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 200)         5600      
_________________________________________________________________
flatten_1 (Flatten)          (None, 1800)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 200)               360200    
_________________________________________________________________
dense_6 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 201       
Total params: 406,201
Trainable params: 406,201
Non-trainable params: 0
________________________________________________

<keras.callbacks.History at 0x7f4fc408b048>

In [29]:
model_cnn.save_weights('model_cnn_nn_last.h5')
model_cnn.load_weights('model_cnn_nn.h5')
mse_cnn = evaluate(x_val_cnn, y_val_cnn, model_cnn)
print("Mean square error of CNN:", mse_cnn)
print("Different from feedforward:", mse_ff - mse_cnn)
print("Different from feedforward with dropout:", mse_ff_dropout - mse_cnn)

Mean square error of CNN: 1.6601089272342373
Different from feedforward: -0.0013442726947521244
Different from feedforward with dropout: -0.0001340422878655101


# Gated Recurrent Units

Now, you are going to implement your own GRU network with the following structure.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 5, 75)             0         
_________________________________________________________________
gru_1 (GRU)                  (None, 5, 200)            165600    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 200)            40200     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 1)              201       
_________________________________________________________________
flatten_1 (Flatten)          (None, 5)                 0         
=================================================================
Total params: 206,001
Trainable params: 206,001
Non-trainable params: 0
_________________________________________________________________
```


These parameters are simple guidelines to save you time.    
You can play with them in the final section, where you may choose any normalization method, activation function, and hyperparameters you like.    
The result should be better than the feedforward model and at least on par with your CNN model.    

Do consult the Keras documentation on how to use [GRUs](https://keras.io/layers/recurrent/).
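
The parameter counts in the GRU table can be checked the same way as for the dense layers. A GRU has three weight groups (update gate, reset gate, candidate state), each with input weights, recurrent weights, and a bias. The helper names below are hypothetical, and the `reset_after` argument mirrors the Keras option of the same name: as far as I know, newer Keras versions enable it by default, which adds a second bias per gate and would give 166200 instead of the 165600 shown in the table.

```python
def gru_params(input_dim, units, reset_after=False):
    """Parameters of a GRU layer: three gates, each with input, recurrent and bias weights."""
    bias_per_gate = 2 * units if reset_after else units
    return 3 * (units * input_dim + units * units + bias_per_gate)

def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

gru = gru_params(input_dim=75, units=200)     # 165600
# TimeDistributed shares one Dense layer across time steps, so it adds no extra parameters.
total = gru + dense_params(200, 200) + dense_params(200, 1)
print(gru, total)   # 165600 206001
```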


In [30]:
################################################################################
# TODO#B1:                                                                     #
# Complete the code for preparing data for training GRU                        #
# GRU's input should have 3 dimensions.                                        #
# The dimensions should consist of entries, time-steps, and features.          #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def preprocess_for_gru(x_train, y_train, x_val, y_val):
    x_train_gru = x_train.reshape(-1,5,75)
    x_val_gru = x_val.reshape(-1,5, 75)
    y_train_gru = y_train.reshape(-1, 5)
    y_val_gru = y_val.reshape(-1,5)
    return x_train_gru, y_train_gru, x_val_gru, y_val_gru
x_train_gru, y_train_gru, x_val_gru, y_val_gru = preprocess_for_gru(x_train, y_train, x_val, y_val)
print(x_train_gru.shape, y_train_gru.shape)
print(x_val_gru.shape, y_val_gru.shape)

(229548, 5, 75) (229548, 5)
(92839, 5, 75) (92839, 5)


In [31]:

################################################################################
# TODO#B2                                                                      #
# Write a function that returns keras GRU network model.                       #
# Your goal is to predict the precipitation at every time step.                #
#                                                                              #
# Hint: You should read keras documentation to see the list of available       #
# layers and options you can use.                                              #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################

def get_gru():    
    input1 = Input(shape=(5,75,))    
    x = GRU(200,return_sequences=True)(input1)
    x = TimeDistributed(Dense(200))(x)
    x = TimeDistributed(Dense(1))(x)
    out = Flatten()(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.0005),
                loss='mse',
                metrics=['mse'])

    return model

In [32]:
################################################################################
# TODO#B3                                                                      #
# Write code that calls model.fit, or model.fit_generator if you have data     #
# generator, to train your models. Make sure you have validation_data as an    #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about callbacks_list argument on the documentation. You might     #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training gru')
model_gru = get_gru()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
model_gru.summary()
# Path to save model parameters (best checkpoint)
weight_path_model_gru = 'model_gru_nn.h5'
# Path to write TensorBoard logs (separate directory so GRU runs do not mix with the feedforward runs)
tensorboard_path_model_gru = 'Graphs/gru_nn'

callbacks_list_model_gru_nn = [
#     TensorBoard(log_dir=tensorboard_path_model_gru, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_gru,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
]

verbose = 2
epochs, batch_size = 10, 512

model_gru.fit(x_train_gru, y_train_gru, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_gru_nn, validation_data=(x_val_gru, y_val_gru))

start training gru
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 5, 75)             0         
_________________________________________________________________
gru_1 (GRU)                  (None, 5, 200)            165600    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 200)            40200     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 1)              201       
_________________________________________________________________
flatten_2 (Flatten)          (None, 5)                 0         
Total params: 206,001
Trainable params: 206,001
Non-trainable params: 0
_________________________________________________________________
Train on 229548 samples, validate on 92839 samples
Epoch 1/10
 - 9s - loss: 1.9182 - mean_squared_error: 1.9182 - val

<keras.callbacks.History at 0x7f4faa0d9860>
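
The parameter counts in the summary above can be checked by hand. The sketch below is a minimal pure-Python calculation, assuming the classic GRU formulation with one bias vector per gate (which is what the 165,600 figure implies; newer Keras versions that default to `reset_after=True` carry a second bias and would report a larger number). The helper names `dense_params`, `gru_params`, and `lstm_params` are illustrative and not part of Keras.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, units):
    """Classic GRU: 3 gates, each with an input kernel, a recurrent kernel and one bias."""
    return 3 * (units * (n_in + units) + units)


def lstm_params(n_in, units):
    """LSTM: 4 gates, each with an input kernel, a recurrent kernel and one bias."""
    return 4 * (units * (n_in + units) + units)


# GRU model from get_gru(): 75 input features per time step, 200 hidden units.
# TimeDistributed(Dense) shares one Dense layer across time steps, so it is counted once.
gru_total = gru_params(75, 200) + dense_params(200, 200) + dense_params(200, 1)

print("GRU layer:", gru_params(75, 200))
print("Dense(200):", dense_params(200, 200))
print("Dense(1):", dense_params(200, 1))
print("Total:", gru_total)
```

The same helpers apply to stacked LSTM layers: the first layer sees 75 input features, and each following layer sees the 200 outputs of the layer before it.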

In [33]:
# Keep the final-epoch weights, then restore the best checkpoint (lowest val_loss)
model_gru.save_weights('model_gru_nn_last.h5')
model_gru.load_weights('model_gru_nn.h5')
mse_gru = evaluate(x_val_gru, y_val_gru, model_gru)
print("Mean square error of gru:", mse_gru)
print("Different from CNN:", mse_cnn - mse_gru)
print("Different from feedforward:", mse_ff - mse_gru)
print("Different from feedforward with dropout:", mse_ff_dropout - mse_gru)

Mean square error of gru: 1.6568418235959599
Different from CNN: 0.0032671036382774243
Different from feedforward: 0.0019228309435253
Different from feedforward with dropout: 0.0031330613504119142


# Final Section
# Keras playground

Now, train the best model you can for this task. You can use any model structure and function available.    
Remember that training time increases with the complexity of the model. You might find TensorBoard helpful when tuning complicated models.    
Your model should be better than your CNN or GRU model in the previous sections.

You should tune your model on the training and validation sets.    
**The test set should be used only for the last evaluation.**

In [34]:
################################################################################
# TODO#5                                                                       #
# Write a function that returns your best Keras model. You can use anything    #
# you want. The goal here is to create the best model you can think of.        #
#                                                                              #
# Hint: You should read the Keras documentation to see the list of available   #
# layers and options you can use.                                              #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################

def get_my_best_model():
    input1 = Input(shape=(5,75,))    
    x = LSTM(200,return_sequences=True)(input1)
    x = LSTM(200,return_sequences=True)(x)
    x = Dropout(0.2)(x)
    x = TimeDistributed(Dense(200, activation='relu'))(x)
    x = Dropout(0.2)(x)
    x = TimeDistributed(Dense(200, activation='relu'))(x)
    x = TimeDistributed(Dense(1))(x)
    out = Flatten()(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.0005),
                loss='mse',
                metrics=['mse'])

    return model

In [35]:
################################################################################
# TODO#6                                                                       #
# Write code that calls model.fit, or model.fit_generator if you have a data   #
# generator, to train your models. Make sure you have validation_data as an    #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about the callbacks argument in the documentation. You might      #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training the best model')
model_best = get_my_best_model()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
x_train_best = x_train_gru.copy()
y_train_best = y_train_gru.copy()
x_val_best = x_val_gru.copy()
y_val_best = y_val_gru.copy()
model_best.summary()
# Path to save model parameters (best checkpoint)
weight_path_model_best = 'model_best_nn.h5'
# Path to write TensorBoard logs (separate directory so these runs do not mix with the feedforward runs)
tensorboard_path_model_best = 'Graphs/best_nn'

callbacks_list_model_best_nn = [
#     TensorBoard(log_dir=tensorboard_path_model_best, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_best,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.8, patience=2, min_lr=0.00001)
]

verbose = 2
epochs, batch_size = 50, 256

model_best.fit(x_train_best, y_train_best, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_best_nn, validation_data=(x_val_best, y_val_best))

start training the best model
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 5, 75)             0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 5, 200)            220800    
_________________________________________________________________
lstm_2 (LSTM)                (None, 5, 200)            320800    
_________________________________________________________________
dropout_5 (Dropout)          (None, 5, 200)            0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 200)            40200     
_________________________________________________________________
dropout_6 (Dropout)          (None, 5, 200)            0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 200)   

 - 31s - loss: 1.9104 - mean_squared_error: 1.9104 - val_loss: 1.6616 - val_mean_squared_error: 1.6616

Epoch 00040: val_loss did not improve from 1.65656
Epoch 41/50
 - 31s - loss: 1.9106 - mean_squared_error: 1.9106 - val_loss: 1.6605 - val_mean_squared_error: 1.6605

Epoch 00041: val_loss did not improve from 1.65656
Epoch 42/50
 - 31s - loss: 1.9104 - mean_squared_error: 1.9104 - val_loss: 1.6595 - val_mean_squared_error: 1.6595

Epoch 00042: val_loss did not improve from 1.65656
Epoch 43/50
 - 31s - loss: 1.9106 - mean_squared_error: 1.9106 - val_loss: 1.6611 - val_mean_squared_error: 1.6611

Epoch 00043: val_loss did not improve from 1.65656
Epoch 44/50
 - 31s - loss: 1.9105 - mean_squared_error: 1.9105 - val_loss: 1.6611 - val_mean_squared_error: 1.6611

Epoch 00044: val_loss did not improve from 1.65656
Epoch 45/50
 - 31s - loss: 1.9104 - mean_squared_error: 1.9104 - val_loss: 1.6603 - val_mean_squared_error: 1.6603

Epoch 00045: val_loss did not improve from 1.65656
Epoch 46/5

<keras.callbacks.History at 0x7f4f8dc666a0>
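
The log above shows `val_loss` stuck near 1.65656 for many epochs, which is exactly the situation `ReduceLROnPlateau` reacts to. The following pure-Python sketch simulates how the learning rate would shrink under the settings used for this model (`Adam(0.0005)`, `factor=0.8`, `patience=2`, `min_lr=0.00001`). It is a simplified model of the callback, not the Keras implementation: it ignores the cooldown option, the function name `simulate_reduce_lr` is made up for illustration, and the `min_delta` default of `1e-4` is assumed to match the Keras default.

```python
def simulate_reduce_lr(val_losses, initial_lr, factor, patience, min_lr, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified, no cooldown)."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # consecutive epochs without a sufficient improvement
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved enough: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Hypothetical validation curve: one epoch that sets the best value,
# followed by 36 epochs with no improvement.
plateau = [1.66] * 37
lrs = simulate_reduce_lr(plateau, initial_lr=0.0005, factor=0.8, patience=2, min_lr=0.00001)

for epoch in (1, 3, 11, 37):
    print("epoch", epoch, "learning rate", lrs[epoch - 1])
```

With a gentle factor such as 0.8, the learning rate needs many plateau epochs to approach `min_lr`, whereas the GRU run's `factor=0.2` reaches its floor of 0.0001 after a single reduction.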

In [36]:
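# Keep the final-epoch weights, then restore the best checkpoint (lowest val_loss)
model_best.save_weights('model_best_nn_last.h5')
model_best.load_weights('model_best_nn.h5')
# Build the test inputs; note that these calls also reassign the *_train_* variables
x_train_cnn, y_train_cnn, x_test_cnn, y_test_cnn = preprocess_for_cnn(x_train, y_train, x_test, y_test)
x_train_gru, y_train_gru, x_test_gru, y_test_gru = preprocess_for_gru(x_train, y_train, x_test, y_test)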
mse_best = evaluate(x_val_best, y_val_best, model_best)
print("Last model")
print("Mean square error of best:", mse_best)
print("Different from feedforward:", mse_ff - mse_best)
print("Different from feedforward with dropout:", mse_ff_dropout - mse_best)
print("Different from CNN:", mse_cnn - mse_best)
print("Different from GRU:", mse_gru - mse_best)

print("Test set evaluate")
test_cnn = evaluate(x_test_cnn, y_test_cnn, model_cnn)
test_gru = evaluate(x_test_gru, y_test_gru, model_gru)
test_best = evaluate(x_test_gru, y_test_gru, model_best)
print("Test of CNN:", test_cnn)
print("Test of GRU:", test_gru)
print("Test of best:", test_best)
# Also evaluate your fully-connected model and CNN/GRU model on the test set.

Last model
Mean square error of best: 1.6565593197344861
Different from feedforward: 0.0022053348049990706
Different from feedforward with dropout: 0.003415565211885685
Different from CNN: 0.003549607499751195
Different from GRU: 0.0002825038614737707
Test set evaluate
Test of CNN: 32782.73145409426
Test of GRU: 2.128476027049198
Test of best: 1.1619463834452906


To get full credit for this part, your best model should be better than the previous models on the **test set**. The top 5 students will receive 2 additional points. The top student will receive another 2 additional points on top of that.