# Week 11 - Review and Beyond MLP

### Aims

By the end of this notebook you will understand:

>* The basics of Keras
>* Linear regression with Keras
>* Working with the Project II data

The exercises here are designed to reinforce the basics of Keras for further use. Additionally, you will see some simple tasks to try and comment on.

- For the last time, you will have lighter tasks tagged (CORE) and (EXTRA).

- If you have already submitted at least 5 hand-in scripts (marked as 1), you can start Project II directly during the WS.

- Some of the experiments below already use the hotel data.

# Imports

We're only going to need a couple of standard libraries this week, as well as Keras.

In [13]:
# Display plots inline
%matplotlib inline  

# Data libraries
import pandas as pd
import numpy as np

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

In [3]:
import tensorflow as tf
from tensorflow import keras

In [None]:
# Not necessary in general !
import os, datetime
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

%load_ext tensorboard
%tensorboard --port=5036 --logdir $logdir
tensorboard_callback = keras.callbacks.TensorBoard(logdir, histogram_freq=1) 

# Basics

**Just to highlight some differences**

TensorFlow is an infrastructure layer for differentiable programming. At its heart, it's a framework for manipulating N-dimensional arrays (tensors), much like NumPy. But as you have already experienced, there are three key differences between NumPy and TensorFlow:

- TensorFlow can leverage hardware accelerators such as GPUs and TPUs.

- TensorFlow can automatically compute the gradient of arbitrary differentiable tensor expressions.

- TensorFlow computation can be distributed to large numbers of devices on a single machine, and to large numbers of machines (potentially with multiple devices each).
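To make the second point concrete, here is a minimal sketch of automatic differentiation with `tf.GradientTape`. The function $y = x^2 + 2x$ and the starting value 3.0 are arbitrary choices made for illustration, not something taken from the lectures.

```python
import tensorflow as tf

# A variable to differentiate with respect to (the value 3.0 is arbitrary).
x = tf.Variable(3.0)

# Operations executed inside the tape context are recorded for differentiation.
with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x

# Analytically dy/dx = 2x + 2, which is 8 at x = 3.
grad = tape.gradient(y, x)
print(grad.numpy())  # 8.0
```

The same mechanism is what Keras relies on internally when it computes gradients of the loss with respect to the weights during `fit`.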

In [None]:
x = tf.constant([[5, 2], [1, 3]])
print(x)

In [6]:
# You can get its value as a NumPy array by calling .numpy():
x.numpy()

array([[5, 2],
       [1, 3]], dtype=int32)

In [7]:
# Much like a NumPy array, it features the attributes dtype and shape:
print("dtype:", x.dtype)
print("shape:", x.shape)

dtype: <dtype: 'int32'>
shape: (2, 2)


In [8]:
# You can also create random constant tensors:
x = tf.random.normal(shape=(2, 2), mean=0.0, stddev=1.0)
x

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[ 0.7950265 , -0.5505485 ],
       [ 0.75827163,  0.11674298]], dtype=float32)>

## Variables

Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a Variable using some initial value:

In [9]:
initial_value = tf.random.normal(shape=(2, 2))
a = tf.Variable(initial_value)
print(a)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[-1.0459622 ,  0.7224666 ],
       [-0.13016361, -0.787909  ]], dtype=float32)>

**Doing math in TensorFlow:** 

If you've used NumPy, doing math in TensorFlow will look very familiar. The main difference is that your TensorFlow code can run on GPU and TPU.


In [11]:
a = tf.random.normal(shape=(2, 2))
b = tf.random.normal(shape=(2, 2))

c = a + b
d = tf.square(c)
e = tf.exp(d)
print(c, d, e)

tf.Tensor(
[[ 0.4076662   0.01619387]
 [-0.38453448 -1.6682957 ]], shape=(2, 2), dtype=float32) tf.Tensor(
[[1.6619174e-01 2.6224132e-04]
 [1.4786677e-01 2.7832108e+00]], shape=(2, 2), dtype=float32) tf.Tensor(
[[ 1.1807995  1.0002623]
 [ 1.1593584 16.170858 ]], shape=(2, 2), dtype=float32)
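Coming back to variables: unlike the constant tensors produced by the arithmetic above, a `Variable` holds mutable state, so its value can be updated in place. The sketch below uses the `assign`, `assign_add` and `assign_sub` methods; the name `v` and the values are made up for illustration.

```python
import tensorflow as tf

# Start from a 2x2 variable filled with zeros (illustrative values only).
v = tf.Variable(tf.zeros(shape=(2, 2)))

v.assign(tf.ones(shape=(2, 2)))            # overwrite: every entry becomes 1.0
v.assign_add(tf.ones(shape=(2, 2)))        # add in place: every entry becomes 2.0
v.assign_sub(0.5 * tf.ones(shape=(2, 2)))  # subtract in place: every entry becomes 1.5

print(v.numpy())
```

This in-place updating is how an optimizer changes the weights of a network after each gradient step.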


# Exercise 1 (CORE)

You can convert a dataframe column to a tensor object like so: `tf.constant(df['column_name'])`

Treat your hotel data set as a data frame:

- Convert the numerical variables `lead_time` and `adr` into tensor objects

- Print the shape and type of the created tensor objects
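As a hint, the conversion can be sketched on a small made-up data frame. The frame `df_toy` and its columns `height_cm` and `age` are hypothetical and have nothing to do with the hotel data; the explicit `.to_numpy()` call and the cast to `float32` are optional choices, made here because Keras layers work with `float32` by default.

```python
import pandas as pd
import tensorflow as tf

# Hypothetical toy data; the column names and values are made up.
df_toy = pd.DataFrame({"height_cm": [150.0, 160.5, 172.0], "age": [31, 45, 27]})

# Convert each column to a tensor, then cast to float32.
height = tf.cast(tf.constant(df_toy["height_cm"].to_numpy()), tf.float32)
age = tf.cast(tf.constant(df_toy["age"].to_numpy()), tf.float32)

print("shape:", height.shape, "dtype:", height.dtype)
print("shape:", age.shape, "dtype:", age.dtype)
```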

# Keras layers

You have already applied some layers directly in Weeks 9-10, but let us recall some of their properties once again:

- While TensorFlow is an infrastructure layer for differentiable programming, dealing with tensors, variables, and gradients, Keras is a user interface for deep learning, dealing with layers, models, optimizers, loss functions, metrics, and more.

- Keras serves as the high-level API for TensorFlow: Keras is what makes TensorFlow simple and productive.

- The Layer class is the fundamental abstraction in Keras. A Layer encapsulates a state (weights) and some computation (defined in the call method).

A simple layer looks like this:

In [17]:
class Linear(keras.layers.Layer):
    """y = w.x + b"""

    def __init__(self, units=32, input_dim=32):
        super().__init__()
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_dim, units), dtype="float32"),
            trainable=True,
        )
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(units,), dtype="float32"), trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

In [18]:
# You would use a Layer instance much like a Python function:
linear_layer = Linear(units=4, input_dim=2)
linear_layer

<__main__.Linear at 0x7f4cd2faec70>

In [19]:
# The layer can be treated as a function.
# Here we call it on some data.
t = linear_layer(tf.ones((2, 2)))
t

<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[-0.00755919, -0.07351511,  0.0961083 , -0.05432985],
       [-0.00755919, -0.07351511,  0.0961083 , -0.05432985]],
      dtype=float32)>
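For comparison, Keras's built-in `Dense` layer with no activation computes the same kind of affine map, `inputs @ kernel + bias`, and infers the input dimension the first time it is called. The sketch below is only an illustration of that correspondence; the names `dense_layer`, `out` and `manual` are made up here.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# A built-in fully-connected layer with 4 units and no activation.
dense_layer = keras.layers.Dense(units=4)

# Calling the layer on a (2, 2) input creates a (2, 4) kernel and a (4,) bias.
out = dense_layer(tf.ones((2, 2)))
print(out.shape)
print(dense_layer.kernel.shape, dense_layer.bias.shape)

# Recompute the output by hand from the layer's weights.
manual = np.ones((2, 2), dtype="float32") @ dense_layer.kernel.numpy() + dense_layer.bias.numpy()
print(np.allclose(out.numpy(), manual, atol=1e-6))
```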

# Exercise 2 (CORE)

Consider the `Linear` class definition given above.

- Explain each line of code to demystify the meaning of this class, `Linear`

- Discuss the meaning of the `Linear(units=4, input_dim=2)` call above.

# Exercise 3 (CORE)

For the tensors you created above:

- Build a linear regression model in Keras. The model should consist of an input layer and a fully-connected output layer. See the lecture notes or previous WS materials for details of how to create these objects, or ask your tutors.

- Compile the model. At this stage you need to select a loss function (specified via the `loss` keyword) and an optimizer. 

- Train the model with `model.fit`, using a small number of `epochs` (such as 50) to keep the computation time manageable. As in previous labs, pass the keyword argument

```
# callbacks=[tensorboard_callback]
```

- You might also want to split the dataset into a training and validation component via  


```
# validation_split=0.3
```



In [18]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
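The build, compile and fit steps can be sketched on synthetic data before you move to the hotel tensors. Everything specific here is an assumption made for illustration: the data are simulated from $y = 3x + 1$ plus noise, and the SGD optimizer with learning rate 0.1 and the mean squared error loss are just one reasonable choice, not the required answer.

```python
import numpy as np
from tensorflow import keras

# Synthetic data (made up for illustration): y = 3x + 1 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 1)).astype("float32")
y = (3.0 * x + 1.0 + rng.normal(scale=0.1, size=(200, 1))).astype("float32")

# Input layer followed by a single fully-connected output unit.
model = keras.Sequential([
    keras.Input(shape=(1,)),
    keras.layers.Dense(units=1),
])

# One possible choice of optimizer and loss for regression.
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")

# Hold out 30% of the data for validation and train silently for 50 epochs.
history = model.fit(x, y, epochs=50, validation_split=0.3, verbose=0)

# The fitted slope and intercept should be close to the simulated 3 and 1.
w, b = model.layers[-1].get_weights()
print("slope:", w.ravel(), "intercept:", b)
```

With real data such as `lead_time`, the inputs are on a much larger scale, so a learning rate this large may not be appropriate without normalizing first.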

# Exercise 4 (CORE)

- Now create a new single-feature model by adding a fully-connected hidden layer with 2 neurons between the input and output layers above, again using `linear` activations.

- Train the new model and comment on your fitted model

# Exercise 5 (CORE)

To run a single-variable linear regression in Keras, we can also make use of the `Sequential` model.

- See the details of the added normalization layer below

- Compile the model created below and plot its training progress using the statistics stored in the history object.

For more details, see the example given here: https://www.tensorflow.org/tutorials/keras/regression

In [38]:
from tensorflow import keras
from tensorflow.keras import layers

# Variable selection
lead_time = np.array(df_hotel['lead_time'])

# About normalization by keras
lead_time_normalizer = layers.Normalization(input_shape=[1,], axis=None)
lead_time_normalizer.adapt(lead_time)
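To see what the normalization layer does, the sketch below adapts a `Normalization` layer with `axis=None` to a tiny made-up array and compares the result with standardizing by hand. With `axis=None` the layer learns a single scalar mean and variance over all values. The array `values` is hypothetical.

```python
import numpy as np
from tensorflow import keras

# Hypothetical values standing in for a single numerical feature.
values = np.array([[10.0], [20.0], [30.0], [40.0]], dtype="float32")

# axis=None: one scalar mean and variance is computed over the whole array.
normalizer = keras.layers.Normalization(axis=None)
normalizer.adapt(values)

# Output of the layer versus manual standardization (population standard deviation).
scaled = np.asarray(normalizer(values))
manual = (values - values.mean()) / values.std()

print(scaled.ravel())
print(manual.ravel())
```

After `adapt`, the statistics are fixed, so the same shift and scale are applied to any data passed through the model later.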

In [None]:
# Created model including normalization
linear_model = tf.keras.Sequential([
    lead_time_normalizer,
    layers.Dense(units=1, activation='linear')
])

linear_model.summary()

In [40]:
# Compile the model 


In [None]:
# Use Keras Model.fit to execute the training for 50 epochs:
history = linear_model.fit(
    df_hotel['lead_time'], df_hotel['adr'],
    epochs = 50,
    # Calculate validation results on 30% of the training data.
    validation_split = 0.3)

In [None]:
# Visualize the model's training progress using the stats stored in the history object:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [20]:
# Plot the training and validation loss against the epoch number.
# The simple helper function below can be reused for later models.
def plot_loss(history):
  plt.plot(history.history['loss'], label='loss')
  plt.plot(history.history['val_loss'], label='val_loss')
  plt.xlabel('Epoch')
  plt.ylabel('Error')
  plt.legend()
  plt.grid(True)

In [None]:
plot_loss(history)

# Exercise 6 (CORE)

Consider your hotel data once again. The actual response variable in this data set is the binary `is_canceled`, so we are interested in predicting a binary response and the problem is one of classification.

- Fit a logistic regression model in Keras, similar to Exercise 3. The model should consist of an input layer and a fully-connected output layer. Choose a single numerical predictor, such as `lead_time`, for this experiment.

- Which activation function did you use, and how does the model fit progress with `epochs = 50` and `validation_split=0.3`? (You can make use of the plotting function introduced before.)

In [45]:
# Define the response with a single predictor


# Exercise 7 (CORE)

- Calculate your model predictions on the original data set (using 0.5 as the threshold for the decision boundary)

- Compare your findings with the ground truth (the true observations of the `is_canceled` variable)
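The thresholding and comparison steps can be sketched with plain NumPy. The probabilities and labels below are made up; in the exercise they would come from `model.predict` and from `is_canceled`. Treating a probability of exactly 0.5 as class 1 is a convention chosen here, not something fixed by the exercise.

```python
import numpy as np

# Hypothetical predicted probabilities and true binary labels.
probs = np.array([0.10, 0.80, 0.55, 0.30, 0.95, 0.45])
y_true = np.array([0, 1, 0, 0, 1, 1])

# Apply the 0.5 decision threshold to obtain class predictions.
y_pred = (probs >= 0.5).astype(int)

# Fraction of observations classified correctly.
accuracy = (y_pred == y_true).mean()

# 2x2 confusion table: rows are true classes, columns are predicted classes.
confusion = np.zeros((2, 2), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

print(y_pred)     # [0 1 1 0 1 0]
print(accuracy)
print(confusion)
```

The off-diagonal entries of the table count the two kinds of error: predicted cancellations that did not happen, and cancellations that were missed.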

# Exercise 8 (EXTRA)

For the hotel data set, as an extension of the method above:

1. Create the feature matrix (including several variables) and the response variable (`is_canceled`). Choose a set of predictors that are suitable and ready to use.

2. Split the data into training and test sets as usual. Use $30\%$ of the whole sample as the test set.

3. Re-create the logistic model above (the model should consist of an input layer and a fully-connected output layer) and train it using only the training data. (Remember that validation over the training set still makes sense!)

4. Compute the predictions of the fitted model over the test data (data unseen by the NN model)


**WARNING:** Note that these steps assume that all the necessary data cleaning has already been completed.
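Step 2 can be sketched with NumPy alone, as below. The arrays `X` and `y` are randomly generated placeholders, not the hotel data, and shuffling indices by hand is just one way to do the split; a library helper for splitting, if you used one in earlier weeks, does the same job.

```python
import numpy as np

# Placeholder data: 10 observations, 3 features, binary response (all made up).
rng = np.random.default_rng(42)
X = rng.normal(size=(10, 3))
y = rng.integers(0, 2, size=10)

# Shuffle the row indices and reserve 30% of them for the test set.
idx = rng.permutation(len(X))
n_test = int(round(0.3 * len(X)))
test_idx, train_idx = idx[:n_test], idx[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)
```

Only `X_train` and `y_train` would then be passed to `fit`, with `validation_split` carving a validation subset out of the training data.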

# Exercise 9 (EXTRA)

Now change the NN model a bit further:

1. Create a new model by adding a fully-connected hidden layer with 2 neurons between your input and output above.

2. Experiment with the activation functions (for example, using `relu` instead of the logistic function)

3. Train the new model over the training data and consider your predictions over the test data again. Compare your predictions with the true test values

4. Visualize the model's training progress as in the previous graphical outputs
