## Deep Neural Networks with Keras ##

In this exercise, you will build a set of deep learning models for a real-world task using TensorFlow and Keras. TensorFlow is a deep learning framework developed by Google, and Keras is a frontend library built on top of TensorFlow (or Theano, CNTK) that provides an easier way to use standard layers and networks.

To complete this exercise, you will need to build deep learning models for precipitation nowcasting. You will build a subset of the models shown below:
- Fully Connected (Feedforward) Neural Network
- Two-Dimensional Convolutional Neural Network (2D-CNN)
- Recurrent Neural Network with Gated Recurrent Unit (GRU)

and one more model of your choice to achieve the highest score possible.

We provide the code for data cleaning and some starter code for Keras in this notebook, but feel free to modify those parts to suit your needs. You can also complete this exercise using only TensorFlow (without Keras). Feel free to use additional libraries (e.g. scikit-learn) as long as you have a model for each type mentioned above.

This notebook assumes you have already installed TensorFlow and Keras for Python 3 and have a GPU enabled. If you run this exercise on GCloud using the provided disk image, you are all set.

As a reminder,

### Don't forget to shut down your instance on GCloud when you are not using it ###

## Precipitation Nowcasting ##

Precipitation nowcasting is the task of predicting the amount of rainfall in a certain region given some kind of sensor data. The term nowcasting refers to tasks that try to predict the current or near-future conditions (within 6 hours).

You will be given satellite images in 3 different bands covering a 5 by 5 region from different parts of Thailand. In other words, your input will be a 5x5x3 image. Your task is to predict the amount of rainfall in the center pixel. You will first do the prediction using a simple fully connected neural network that views each pixel as a separate input feature.

Since your input is basically an image, we will then treat it as an image and apply a CNN to do the prediction. Finally, we can also add a time component, since weather prediction can benefit greatly from previous time frames. Each data point actually contains 5 time steps, so each input data point has a size of 5x5x5x3 (time x height x width x channel), and the output data has a size of 5 (time). You will use this time information when you work with RNNs.
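
To make the three views of the data concrete, here is a minimal NumPy sketch that uses a synthetic array with the shape described above. The array contents and the variable names (`x`, `x_flat`, `x_img`, `x_seq`, `center`) are assumptions for illustration and are not part of the provided starter code.

```python
import numpy as np

# Synthetic stand-in for the real data: 4 sequences of 5 time steps,
# each a 5x5 patch with 3 channels (time x height x width x channel).
x = np.arange(4 * 5 * 5 * 5 * 3, dtype=np.float32).reshape(4, 5, 5, 5, 3)

# Fully connected view: every time step becomes one sample of 75 features.
x_flat = x.reshape(-1, 5 * 5 * 3)

# Image view: every time step becomes one 5x5x3 image.
x_img = x.reshape(-1, 5, 5, 3)

# Sequence view: each sample keeps its 5 time steps, with 75 features per step.
x_seq = x.reshape(-1, 5, 5 * 5 * 3)

# The center pixel of each patch sits at row 2, column 2.
center = x_img[:, 2, 2, :]

print(x_flat.shape, x_img.shape, x_seq.shape, center.shape)
```

Because all three views are plain reshapes of the same array, no values are copied or reordered; only the way the axes are grouped changes.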

Finally, we would like to thank the Thai Meteorological Department for providing the data for this assignment.

In [None]:
import os
import numpy as np
import pickle
import keras
from keras.models import load_model
import pandas as pd
import matplotlib.pyplot as plt
import urllib

# Data Explanation #

The data consists of hourly measurements of water vapor in the atmosphere and two infrared measurements of cloud imagery on a latitude-longitude grid. Each measurement is illustrated below as an image. These three features are included as different channels in your input data.

<img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/wvapor.png" width="200"> <img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/cloud1.png" width="200"> <img src="https://raw.githubusercontent.com/burin-n/pattern-recognition/master/HW4/images/cloud2.png" width="200">

We also provide the hourly precipitation (rainfall) records for June, July, August, September, and October from weather stations spread around the country. A 5x5 grid around each weather station at a particular time is paired with the precipitation recorded at the corresponding station as input and output data. Finally, five adjacent time steps are stacked into one sequence.

The months of June through August are provided as training data, while September and October are used as the validation and test sets, respectively.


# Reading data

In [None]:
def read_data(months, data_dir='dataset'):
    features = np.array([], dtype=np.float32).reshape(0,5,5,5,3)
    labels = np.array([], dtype=np.float32).reshape(0,5)
    for m in months:
        filename = 'features-m{}.pk'.format(m)
        with open(os.path.join(data_dir,filename), 'rb') as file:
            features_temp = pickle.load(file)
        features = np.concatenate((features, features_temp), axis=0)
        
        filename = 'labels-m{}.pk'.format(m)
        with open(os.path.join(data_dir,filename), 'rb') as file:
            labels_temp = pickle.load(file)
        labels = np.concatenate((labels, labels_temp), axis=0)
    
    return features, labels

In [None]:
# use data from month 6,7,8 as training set
x_train, y_train = read_data(months=[6,7,8])

# use data from month 9 as validation set
x_val, y_val = read_data(months=[9])

# use data from month 10 as test set
x_test, y_test = read_data(months=[10])

print('x_train shape:',x_train.shape)
print('y_train shape:', y_train.shape, '\n')
print('x_val shape:',x_val.shape)
print('y_val shape:', y_val.shape, '\n')
print('x_test shape:',x_test.shape)
print('y_test shape:', y_test.shape)

**features** 
- dim 0: number of entries
- dim 1: time steps, in ascending order
- dim 2,3: a 5x5 grid around the rain-measuring station
- dim 4: water vapor and two cloud imagery channels

**labels**
- dim 0: number of entries
- dim 1: precipitation at each time step

In [None]:
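def normalize(X):
    # Standardize to zero mean and unit variance using global statistics of X.
    # Dividing by the standard deviation (not the variance) gives unit scale.
    mean = np.mean(X)
    std = np.std(X)
    return (X - mean) / std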

In [None]:
x_train = normalize(x_train)
x_val = normalize(x_val)
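
The cell above normalizes each split with its own statistics. A common alternative is to compute the mean and standard deviation once on the training set and reuse them for the validation and test sets, so that all splits are scaled identically. The sketch below illustrates this on synthetic data; the helper names `fit_standardizer` and `apply_standardizer` are hypothetical and not part of the starter code.

```python
import numpy as np

def fit_standardizer(x_fit):
    """Return the mean and standard deviation of the fitting (training) data."""
    return float(np.mean(x_fit)), float(np.std(x_fit))

def apply_standardizer(x, mean, std):
    """Standardize x with statistics computed elsewhere (e.g. on the training set)."""
    return (x - mean) / std

# Synthetic stand-ins for the training and validation features.
rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=2.0, size=(1000, 5, 5, 5, 3))
val = rng.normal(loc=10.0, scale=2.0, size=(200, 5, 5, 5, 3))

# Statistics come from the training split only.
mean, std = fit_standardizer(train)
train_n = apply_standardizer(train, mean, std)
val_n = apply_standardizer(val, mean, std)

print(train_n.mean(), train_n.std(), val_n.mean(), val_n.std())
```

With this approach the training split ends up with zero mean and unit standard deviation, while the other splits are only approximately standardized, which is expected.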

# Three-Layer Feedforward Neural Networks

Below, the code for creating a 3-layer fully connected neural network in Keras is provided. Run the code and make sure you understand what you are doing. Then, report the results.

In [None]:
# The dataset needs to be reshaped to make it suitable for the feedforward model
def preprocess_for_ff(x_train, y_train, x_val, y_val):
    x_train_ff = x_train.reshape((-1, 5*5*3))
    y_train_ff = y_train.reshape((-1, 1))
    x_val_ff = x_val.reshape((-1, 5*5*3))
    y_val_ff = y_val.reshape((-1, 1))
    return x_train_ff, y_train_ff, x_val_ff, y_val_ff

x_train_ff, y_train_ff, x_val_ff, y_val_ff = preprocess_for_ff(x_train, y_train, x_val, y_val)
print(x_train_ff.shape, y_train_ff.shape)
print(x_val_ff.shape, y_val_ff.shape)

In [None]:
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam

def get_feedforward_nn():    
    input1 = Input(shape=(75,))    
    x = Dense(200, activation='relu')(input1)    
    x = Dense(200, activation='relu')(x)
    x = Dense(200, activation='relu')(x)
    out = Dense(1)(x)

    model = Model(inputs=input1, outputs=out)
    model.compile(optimizer=Adam(0.001),
                loss='mse',
                metrics=['mse'])

    return model

In [None]:
from keras import backend as K
# This is called to clear the original model session in order to use TensorBoard
K.clear_session()

model_ff = get_feedforward_nn()
model_ff.summary()

In [None]:
from keras.callbacks import ModelCheckpoint, TensorBoard, ReduceLROnPlateau

print('start training ff')

# Path to save model parameters
weight_path_model_ff ='model_ff_nn.h5'
# Path to write tensorboard
tensorboard_path_model_ff = 'Graphs/ff_nn'

callbacks_list_model_ff_nn = [
#     TensorBoard(log_dir=tensorboard_path_model_ff, histogram_freq=1, write_graph=True, write_grads=True),
    ModelCheckpoint(
            weight_path_model_ff,
            save_best_only=True,
            save_weights_only=True,
            monitor='val_loss',
            mode='min',
            verbose=1
        ),
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.0001)
]

verbose = 1
epochs, batch_size = [10,1024]

model_ff.fit(x_train_ff, y_train_ff, epochs=epochs, batch_size=batch_size, verbose=verbose,
                callbacks=callbacks_list_model_ff_nn, validation_data=(x_val_ff, y_val_ff))
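
The `ReduceLROnPlateau` callback used above lowers the learning rate when the monitored validation loss stops improving. The pure-Python sketch below imitates that behavior with the same `factor`, `patience`, and `min_lr` values. It is a simplified illustration under stated assumptions: the real callback also has `min_delta` and `cooldown` options, which are ignored here, and the function name `simulate_reduce_lr` and the loss values are made up.

```python
def simulate_reduce_lr(val_losses, lr=0.001, factor=0.2, patience=2, min_lr=0.0001):
    """Simplified plateau schedule: return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and reduce once patience runs out.
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up validation losses with two plateaus.
rates = simulate_reduce_lr([1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.72])
print(rates)
```

In this example the rate drops after the two non-improving epochs that follow the loss of 0.8, and the second reduction is clipped at `min_lr`.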

In [None]:
################################################################################
# TODO#1:                                                                      #
# Write a function to evaluate your model. Your function must make predictions #
# using the input model and return the mean squared error of the model.        #
#                                                                              #
# Hint: https://keras.io/models/model#evaluate                                 #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def evaluate(features, labels, model):
    """
    Evaluate model on validation data
    """
    pass
    return mse

In [None]:
# We will use majority rule as a baseline.
def majority_baseline(label_set):
    # The most frequent label value is used as the constant prediction
    unique, counts = np.unique(label_set, return_counts=True)
    majority = unique[np.argmax(counts)]
    # Mean squared error of always predicting the majority value
    return np.mean((majority - label_set) ** 2)

In [None]:
print('baseline')
print('train', majority_baseline(y_train))
print('validate', majority_baseline(y_val))

# (Optional) Tensorboard #
The code provided also supports TensorBoard (a visualization tool that comes with TensorFlow). Note the commented-out callback `TensorBoard(log_dir=tensorboard_path_model_ff, histogram_freq=1, write_graph=True, write_grads=True)` in the training cell; uncomment it to enable logging. This tells TensorFlow to write extra outputs to the `log_dir`, which can then be used for visualization.

To start tensorboard do
```
tensorboard --logdir=/full_path_to_your_logs
```
from the command line. This will launch TensorBoard, and you will be able to access it from a web browser by pointing the URL to `<instance-ip>:6006`. You will need to enable additional firewall rules in GCloud for this.

**Make sure your logs path is on the second drive (under /data). Otherwise, your main disk will fill up!**

In TensorBoard, you will be able to debug your computation graph, which can be hard to keep track of in code. This might seem trivial in Keras, but it is very helpful for TensorFlow. You can see a visualization of the computation graph in the `GRAPH` tab. If you see multiple dense layers (more than 4), this is caused by running the code several times without deleting the log dir. Delete the log dir and re-run the code.

Next, let's look at the scalars tab. Here we can see the loss and metrics (MSE in this example) on the training and validation sets as they change over each epoch. This can be useful for detecting overfitting.

Another useful tab is the histograms tab. It plots histograms of the weights, biases, and outputs of each layer. The depth of the histograms shows the change over epochs, so we can see how the histograms of weights change over the training period. This can be used to debug vanishing gradients or getting stuck in local minima.

There are other useful tabs in TensorBoard. You can read about them in the Keras [documentation](https://keras.io/callbacks/#tensorboard) for TensorBoard.

# Tensorboard observation #

**Optional TODO#1** Write your own interpretation of the logs from this example. A simple sentence or two for each tab is sufficient.

**Your answer:** 

# Dropout #

You might notice that the 3-layer feedforward network does not use dropout at all. Now, try adding dropout to the model, run it, and report the result again.
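
As a reminder of what dropout does, here is a minimal NumPy sketch of the commonly used "inverted" variant: during training a random fraction of activations is zeroed and the survivors are rescaled so that their expected value is unchanged, while at inference time the input passes through untouched. The function name `dropout_forward`, the rate of 0.5, and the all-ones activations are assumptions for illustration; in Keras you would use the built-in `Dropout` layer instead.

```python
import numpy as np

def dropout_forward(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        # At inference time dropout is the identity function.
        return x
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((4, 200), dtype=np.float32)

train_out = dropout_forward(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout_forward(activations, rate=0.5, training=False, rng=rng)

print(np.unique(train_out), float((train_out == 0).mean()))
```

With a rate of 0.5, each surviving activation is doubled, so the layer's average output stays close to its value without dropout.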

In [None]:
################################################################################
# TODO#2:                                                                      #
# Write a function that returns a feedforward model with dropout               #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def get_fully_connected_with_dropout():    
    pass    
    return model

In [None]:
from keras import backend as K
K.clear_session()

model_ff_dropout = get_fully_connected_with_dropout()
model_ff_dropout.summary()

**TODO#3** Train your model with dropout below

In [None]:
################################################################################
# TODO#3:                                                                      #
# Complete the code to train your dropout model                                #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
print('start training ff dropout')
pass

In [None]:
################################################################################
# TODO#4:                                                                      #
# Complete the code to evaluate your dropout model                             #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
pass

# A fork in the road

In the next sections, we will discuss CNNs and GRUs. **PICK ONE** method to complete in order to finish the homework. If you do both, the second one counts as an optional task. Then, do the **Final Section**.

# Convolutional Neural Networks
Now, you are going to implement your own 2D convolutional neural network with the following structure.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 5, 5, 3)           0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 3, 3, 200)         5600      
_________________________________________________________________
flatten_1 (Flatten)          (None, 1800)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               360200    
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 201       
=================================================================
Total params: 406,201
Trainable params: 406,201
Non-trainable params: 0
_________________________________________________________________
```
These parameters are simple guidelines to save you time.    
You can experiment with them in the final section, where you can choose any normalization method, activation function, and hyperparameters you want.         

Hint: Read the Keras documentation to see the list of available layers and options you can use.                         
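
The parameter counts in the summary above can be checked by hand. The sketch below does the arithmetic, assuming a 3x3 kernel with no padding and a stride of 1, which is consistent with the 5x5 input shrinking to 3x3 and with the 5,600 convolution parameters. The helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units

# Assumption: 3x3 kernel, no padding, stride 1, so the spatial size is 5 - 3 + 1 = 3.
out_size = 5 - 3 + 1
flat = out_size * out_size * 200  # size of the flattened feature map

counts = [
    conv2d_params(3, 3, 3, 200),  # conv2d_1
    dense_params(flat, 200),      # dense_1
    dense_params(200, 200),       # dense_2
    dense_params(200, 1),         # dense_3
]
print(counts, sum(counts))  # [5600, 360200, 40200, 201] 406201
```

Most of the parameters sit in the first dense layer, because it connects all 1,800 flattened convolution outputs to 200 units.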

In [None]:
################################################################################
# TODO#A1:                                                                     #
# Complete the code for preparing data for training CNN                        #
# Input for CNN should not have time step.                                     #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def preprocess_for_cnn(x_train, y_train, x_val, y_val):
    pass
    return x_train_cnn, y_train_cnn, x_val_cnn, y_val_cnn

In [None]:
################################################################################
# TODO#A2:                                                                     #
# Write a function that returns a Keras convolutional neural network model.    #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def get_conv2d_nn():
    pass
    return model

In [None]:
################################################################################
# TODO#A3:                                                                     #
# Write code that calls model.fit, or model.fit_generator if you have a data   #
# generator, to train your model. Make sure you have validation_data as an     #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about the callbacks argument in the documentation. You might      #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training conv2d')
model_cnn = get_conv2d_nn()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
pass

In [None]:
evaluate(x_val_cnn, y_val_cnn, model_cnn)

# Gated Recurrent Units

Now, you are going to implement your own GRU network with the following structure.
```
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 5, 75)             0         
_________________________________________________________________
gru_1 (GRU)                  (None, 5, 200)            165600    
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 200)            40200     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 1)              201       
_________________________________________________________________
flatten_1 (Flatten)          (None, 5)                 0         
=================================================================
Total params: 206,001
Trainable params: 206,001
Non-trainable params: 0
_________________________________________________________________
```


These parameters are simple guidelines to save you time.    
You can experiment with them in the final section, where you can choose any normalization method, activation function, and hyperparameters you want.         
The result should be better than the feedforward model and at least on par with your CNN model.    

Do consult the Keras documentation on how to use [GRUs](https://keras.io/layers/recurrent/).
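
The parameter counts in the GRU summary above can also be reproduced by hand. A GRU layer has three weight blocks (update gate, reset gate, candidate state), each acting on the input and on the previous hidden state. The sketch below assumes one bias vector per block, which matches the 165,600 figure in the table. As far as we know, Keras versions whose GRU defaults to `reset_after=True` use two bias vectors per block and would report a slightly larger count. The helper names are hypothetical.

```python
def gru_params(input_dim, units, reset_after=False):
    """Parameter count of one GRU layer (three weight blocks).

    With reset_after=True each block carries two bias vectors instead of one.
    """
    bias_per_block = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + bias_per_block)

def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units

counts = [
    gru_params(75, 200),      # gru_1
    dense_params(200, 200),   # time_distributed_1 (Dense shared across time steps)
    dense_params(200, 1),     # time_distributed_2
]
print(counts, sum(counts))  # [165600, 40200, 201] 206001
print(gru_params(75, 200, reset_after=True))  # 166200
```

The `TimeDistributed` layers apply the same dense weights at every time step, so their parameter counts do not depend on the sequence length.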


In [None]:
################################################################################
# TODO#B1:                                                                     #
# Complete the code for preparing data for training GRU                        #
# GRU's input should have 3 dimensions.                                        #
# The dimensions should be entries, time steps, and features.                  #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
def preprocess_for_gru(x_train, y_train, x_val, y_val):
    pass
    return x_train_gru, y_train_gru, x_val_gru, y_val_gru

In [None]:
################################################################################
# TODO#B2                                                                      #
# Write a function that returns keras GRU network model.                       #
# Your goal is to predict the precipitation at every time step.                #
#                                                                              #
# Hint: You should read keras documentation to see the list of available       #
# layers and options you can use.                                              #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################

def get_gru():    
    pass
    return model

In [None]:
################################################################################
# TODO#B3                                                                      #
# Write code that calls model.fit, or model.fit_generator if you have a data   #
# generator, to train your model. Make sure you have validation_data as an     #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about the callbacks argument in the documentation. You might      #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training gru')
model_gru = get_gru()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
pass

In [None]:
evaluate(x_val_gru, y_val_gru, model_gru)

# Final Section
# Keras playground

Now, train the best model you can for this task. You can use any model structure and function available.    
Remember that training time increases with the complexity of the model. You might find TensorBoard helpful when tuning complicated models.    
Your model should be better than your CNN or GRU model from the previous sections.

You should tune your model on training and validation set.    
**The test set should be used only for the last evaluation.**

In [None]:
################################################################################
# TODO#5                                                                       #
# Write a function that returns your best Keras model. You can use anything    #
# you want. The goal here is to create the best model you can think of.        #
#                                                                              #
# Hint: You should read keras documentation to see the list of available       #
# layers and options you can use.                                              #
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################

def get_my_best_model():
    pass
    return model

In [None]:
################################################################################
# TODO#6                                                                       #
# Write code that calls model.fit, or model.fit_generator if you have a data   #
# generator, to train your model. Make sure you have validation_data as an     #
# argument and use verbose=2 to generate one log line per epoch. Select your   #
# batch size carefully as it will affect your model's ability to converge and  #
# time needed for one epoch.                                                   #
#                                                                              #
# Hint: Read about the callbacks argument in the documentation. You might      #
# find  ReduceLROnPlateau() and ModelCheckpoint() useful for your training     #
# process. Feel free to use any other callback function available.             #
################################################################################
print('start training the best model')
my_best_model = get_my_best_model()
################################################################################
#                            WRITE YOUR CODE BELOW                             #
################################################################################
pass

In [None]:
evaluate(x_val_best, y_val_best, my_best_model)
evaluate(x_test_best, y_test_best, my_best_model)
# Also evaluate your fully connected model and CNN/GRU model on the test set.

To get full credit for this part, your best model should be better than the previous models on the **test set**. The top 5 students will receive 2 additional points. The top student will receive another 2 additional points on top of that.