## Keras

***
My goal is not to fit a good model.
My goal is to show you how to implement what is covered in the slides.
***

#### Table of Contents

- [Preliminaries](#Preliminaries)
- [Null Model](#Null-Model)
- [Initialization](#Initialization)
- [Compilation](#Compilation)
- [Fitting](#Fitting)
- [Evaluation](#Evaluation)
- [Prediction](#Prediction)


****************
# Preliminaries
[TOP](#Keras)

Remember, standardizing your features affects the cost function; it helps the optimizer find the optimal solution correctly and quickly.
Let us grab our custom `stdz()` function.

In [None]:
%run metrics.py
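
The contents of `metrics.py` are not shown here. As a rough sketch, assuming `stdz()` subtracts the column mean and divides by the column standard deviation, a function of this kind could look like the following. The name `stdz_sketch` and the use of the sample standard deviation are assumptions, not necessarily what the course file does.

```python
import pandas as pd

def stdz_sketch(col):
    """Hypothetical stand-in for stdz(): center a column and scale it to unit variance."""
    return (col - col.mean()) / col.std()

# made-up data with two features on very different scales
toy = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 60.0]})

# applied column by column, in the same way as x_train.apply(stdz)
toy_std = toy.apply(stdz_sketch)
```

After the transformation, every column has mean zero and standard deviation one, so no single feature dominates the gradient.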

Here are the packages and functions that we will need.

In [1]:
# utilities
import numpy as np
import pandas as pd

# processing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# algorithms
from tensorflow import keras

# plotting
import matplotlib.pyplot as plt

# Setting the Seed....

It is quite an involved process. 
Not only do we need to set the seed for `TensorFlow`, but we also need to set it for `NumPy` and in the backend because they are all used when fitting a neural network.

Check out the [documentation](https://keras.io/getting_started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development) for more details.

In [3]:
import tensorflow
tensorflow.__version__

'2.4.1'

In [None]:
import os
import random
import tensorflow as tf

np.random.seed(490)
os.environ['PYTHONHASHSEED'] = '0'
random.seed(490)
tf.random.set_seed(490)
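
A quick way to convince yourself that seeding works is to reseed and draw again. The sketch below does this for `random` and `NumPy` only; the helper name `draw` is purely illustrative, and the same idea applies to `TensorFlow` via `tf.random.set_seed()`.

```python
import random
import numpy as np

def draw(seed):
    """Reseed both generators, then draw a few numbers from each."""
    random.seed(seed)
    np.random.seed(seed)
    return random.random(), np.random.normal(size = 3)

py_a, np_a = draw(490)
py_b, np_b = draw(490)

same_python = (py_a == py_b)             # identical draws from the `random` module
same_numpy = np.array_equal(np_a, np_b)  # identical draws from NumPy
```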

Okay, let's load in our data.

Predicting a categorical label requires much more setup than predicting a continuous one.

- With continuous labels, you simply proceed as we have throughout the class up to this point.
- Discrete labels must first be transformed via `OneHotEncoder()`.

In [2]:
df = pd.read_pickle('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.pkl')
df.shape

(50834, 12)

In [None]:
df_prepped = df.drop(columns = ['year']).join([
    pd.get_dummies(df['year'], drop_first = True)
])

In [None]:
y = df_prepped['urate_bin']
x = df_prepped.drop(columns = 'urate_bin')

Here is a step-by-step walkthrough of how to prepare categorical data for `OneHotEncoder()`.

In [None]:
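y                           # the label as a pd.Series
np.array(y)                 # converted to a 1-D NumPy array
np.array(y).reshape(-1, 1)  # reshaped to one column; OneHotEncoder() expects 2-D input
ohe = OneHotEncoder().fit(np.array(y).reshape(-1, 1))  # learn the categories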
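
To make the target format concrete, here is a NumPy-only sketch of the one-hot layout on a toy label. The label values are made up (the actual `urate_bin` categories may differ), and this is an illustration of the output shape rather than a replacement for `OneHotEncoder()`. As with `OneHotEncoder()`, the columns follow the sorted categories.

```python
import numpy as np

y_toy = np.array(['low', 'high', 'mid', 'low'])  # made-up labels

# sorted unique categories and, for each row, the index of its category
categories, codes = np.unique(y_toy, return_inverse = True)

# row i gets a 1 in the column of its category and 0 everywhere else
y_onehot = np.eye(len(categories))[codes]
```

Each row sums to one, which is what lets us read the network's softmax output as class probabilities later on.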

Okay, time to split the data.

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                   train_size = 2/3,
                                                   random_state = 490)
x_train = x_train.apply(stdz)
x_test  = x_test.apply(stdz)

y_train = np.array(y_train).reshape(-1, 1)
y_test  = np.array(y_test).reshape(-1, 1)

y_train = ohe.transform(y_train).toarray()
y_test  = ohe.transform(y_test).toarray()

Just to reiterate how we have transformed our label:

In [None]:
y_train[0:5, :]

***********
# Null Model
[TOP](#Keras)

We have to fit the null model differently than before because the label is now a one-hot encoded `np.array()`, not a `pd.Series()`.

In [None]:
y_train_counts = y_train.sum(axis = 0)
yhat_null = np.argmax(y_train_counts)

acc_null = y_test[:, yhat_null].sum()/y_test.sum()
acc_null
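
The same logic on a tiny made-up example may be easier to follow. Summing the one-hot columns counts each class, `np.argmax()` picks the most common one, and the null accuracy is the share of test rows that belong to that class. The `_toy` names are illustrative only.

```python
import numpy as np

# toy one-hot labels with three classes (made-up data)
y_train_toy = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]])
y_test_toy  = np.array([[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])

# most frequent class in the training labels
majority = np.argmax(y_train_toy.sum(axis = 0))

# share of test rows whose true class is the majority class
acc_null_toy = y_test_toy[:, majority].mean()
```

Because every one-hot row sums to one, taking the mean of the majority column is equivalent to dividing its sum by `y_test.sum()`.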

*****************
# Initialization
[TOP](#Keras)

I would not recommend actually fitting this model.
This is purely for expositional purposes.

I am going to show you how to add activation functions in multiple ways:

In [None]:
# For leaky ReLU and to show you how to adjust hyperparameters
from tensorflow.keras.layers import LeakyReLU
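
For reference, the three hidden-layer activations used in the network below can be written in a few lines of NumPy. This is a sketch of the standard textbook definitions, not the Keras source code.

```python
import numpy as np

def relu(z):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)

def leaky_relu(z, alpha = 0.1):
    # small slope alpha for negative inputs instead of a flat zero
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha = 1.0):
    # smooth exponential curve for negative inputs, bounded below by -alpha
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, 0.0, 3.0])
```

All three agree for positive inputs; they differ only in how they treat negative ones, which is what the `alpha` hyperparameter controls.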

Let's determine our input and output `shape`.

In [None]:
x_train.shape

In [None]:
y_train.shape

Okay, let's define our first neural network!

In [None]:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape = (x_train.shape[1],))) # shape as a tuple
model.add(keras.layers.Dense(300, activation = 'relu'))
model.add(keras.layers.Dense(200, activation = LeakyReLU(alpha = 0.1))) 
model.add(keras.layers.Dense(100, activation = 'elu'))
model.add(keras.layers.Dense(100, activation = keras.layers.ELU(alpha=1.0))) # alpha = 1 is default
model.add(keras.layers.Dense(y_train.shape[1], activation = 'softmax'))

Want to see something crazy?

In [None]:
model.summary()

Holy parameters, Batman!

Let's see how we got so many:

In [None]:
# dense - first hidden layer (receives the input features)
(1 + x_train.shape[1])*300

In [None]:
# dense_1
(1 + 300)*200

In [None]:
# dense_2
(1 + 200)*100

In [None]:
# dense_3
(1 + 100)*100

In [None]:
# dense_4 - output layer
(1 + 100)*y_train.shape[1]
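
The pattern above generalizes: each dense layer has one weight per input-unit pair plus one bias per unit. A small helper makes this explicit. The function name and the feature and class counts below are hypothetical placeholders; in practice you would plug in `x_train.shape[1]` and `y_train.shape[1]`.

```python
def dense_param_counts(n_inputs, layer_sizes):
    """Return the parameter count of each dense layer: (1 + inputs) * units."""
    counts = []
    for units in layer_sizes:
        counts.append((1 + n_inputs) * units)  # the 1 accounts for the bias
        n_inputs = units                       # this layer's output feeds the next
    return counts

n_features = 20  # hypothetical; use x_train.shape[1] in practice
n_classes = 3    # hypothetical; use y_train.shape[1] in practice

counts = dense_param_counts(n_features, [300, 200, 100, 100, n_classes])
total = sum(counts)
```

The total should match the "Total params" line reported by `model.summary()` when the real dimensions are used.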

Here is an alternative way to define a sequential NN:

In [None]:
model2 = keras.models.Sequential([
    keras.layers.Input(shape = (x_train.shape[1],)),
    keras.layers.Dense(300, activation = 'relu'),
    keras.layers.Dense(200, activation = LeakyReLU(alpha = 0.1)),
    keras.layers.Dense(100, activation = 'elu'),
    keras.layers.Dense(100, activation = keras.layers.ELU(alpha=1.0)),
    keras.layers.Dense(y_train.shape[1], activation = 'softmax')
])
model2.summary()

*********************
# Compilation
[TOP](#Keras)

I am going to show you three different ways to compile the model:

1. the easy way
2. with a custom optimizer
3. with a custom optimizer and custom learning rate

The easy way:

In [None]:
model.compile(loss = 'categorical_crossentropy', # training data
             metrics = ['accuracy', 'categorical_crossentropy'], # validation data
             optimizer = 'rmsprop')
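
To see what the loss and the accuracy metric measure, here is a NumPy sketch of both on two made-up rows. It follows the usual textbook definitions; the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps = 1e-7):
    # mean over rows of -log(probability assigned to the true class)
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(np.sum(y_true * np.log(y_prob), axis = 1))

def accuracy(y_true, y_prob):
    # share of rows where the most probable class is the true class
    return np.mean(np.argmax(y_prob, axis = 1) == np.argmax(y_true, axis = 1))

y_true = np.array([[1, 0, 0], [0, 1, 0]])              # one-hot labels
y_prob = np.array([[0.7, 0.2, 0.1], [0.4, 0.5, 0.1]])  # predicted probabilities

loss = categorical_crossentropy(y_true, y_prob)
acc = accuracy(y_true, y_prob)
```

Note that the loss rewards confident correct predictions, while accuracy only checks which class has the highest probability.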

Custom optimizer:

In [None]:
# Default RMSprop values
model.compile(loss = 'categorical_crossentropy', # training data
             metrics = ['accuracy'], # validation data
             optimizer = keras.optimizers.RMSprop(learning_rate = 0.001,
                                                 rho = 0.9,
                                                 momentum = 0.0,
                                                 epsilon = 1e-7))
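
To give a feel for what `rho`, `momentum`, and `epsilon` do, here is a NumPy sketch of a textbook RMSprop update applied to a simple quadratic. The exact placement of `epsilon` and other details vary across implementations, so treat this as an illustration rather than what Keras computes internally. The function name `rmsprop_step` is hypothetical.

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, velocity,
                 lr = 0.001, rho = 0.9, momentum = 0.0, epsilon = 1e-7):
    # running average of squared gradients; rho controls how long the memory is
    avg_sq = rho * avg_sq + (1 - rho) * grad**2
    # gradient scaled by its recent magnitude, optionally accumulated with momentum
    velocity = momentum * velocity + lr * grad / (np.sqrt(avg_sq) + epsilon)
    return w - velocity, avg_sq, velocity

w_start = np.array([1.0, -2.0])
w = w_start.copy()
avg_sq = np.zeros_like(w)
velocity = np.zeros_like(w)

for _ in range(200):
    grad = 2 * w  # gradient of f(w) = sum(w**2)
    w, avg_sq, velocity = rmsprop_step(w, grad, avg_sq, velocity)
```

Because each step is divided by the recent gradient magnitude, both coordinates move toward zero at a similar pace even though their gradients differ in size.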

Now we shall set up a custom learning rate schedule.
The default values for an exponential decay learning rate schedule are too extreme for our setting.
We are going to do some math to adjust the default values.

When we fit, we will have:

- all the training data over batches of size 32 (the default)
- a 20% validation split (`validation_split = 0.2`)
- 30 epochs

In [None]:
# divide by batch size
# multiply by 1 minus validation size
# multiply by number of epochs
x_train.shape[0]/32*(1-0.2)*30

In [None]:
# Default values
lr_exp = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate = 0.1,
    decay_steps = 100000,
    decay_rate = 0.96)

# Adjusted values for our setting (this overwrites the schedule above)
lr_exp = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate = 0.1,
    decay_steps = 5000,
    decay_rate = 0.1)
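
To compare the two schedules, we can evaluate the exponential decay formula, `initial_learning_rate * decay_rate ** (step / decay_steps)`, at the last training step. The sketch below assumes roughly 33,889 training rows (about two thirds of 50,834) and that early stopping never kicks in; both are assumptions, and `exp_decay` is a hypothetical helper rather than a Keras function.

```python
n_train = 33889  # assumed number of training rows (about 2/3 of 50,834)

# rows used for fitting, divided into batches of 32, over 30 epochs
steps_per_epoch = n_train * (1 - 0.2) / 32
total_steps = steps_per_epoch * 30

def exp_decay(step, initial_lr, decay_steps, decay_rate):
    # continuous (non-staircase) exponential decay
    return initial_lr * decay_rate ** (step / decay_steps)

lr_end_slow = exp_decay(total_steps, 0.1, 100000, 0.96)  # first schedule above
lr_end_fast = exp_decay(total_steps, 0.1, 5000, 0.1)     # second schedule above
```

Under these assumptions, the first schedule barely moves the learning rate over the whole run, while the second shrinks it by a factor of ten every 5,000 steps.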

In [None]:
model.compile(loss = 'categorical_crossentropy', # training data
             metrics = ['accuracy'], # validation data
             optimizer = keras.optimizers.RMSprop(learning_rate = lr_exp,
                                                 rho = 0.9,
                                                 momentum = 0.5,
                                                 epsilon = 1e-7))

*********
# Fitting 
[TOP](#Keras)

Before we can fit our final model, we will set up early stopping.
This is the same idea as in boosting ensembles.

In [None]:
# Callbacks
es = keras.callbacks.EarlyStopping(patience = 4)
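
The idea behind `patience` can be sketched in plain Python: keep track of the best validation loss so far and stop once it has failed to improve for `patience` epochs in a row. This is a simplified illustration with made-up losses and a hypothetical function name, not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience = 4):
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

toy_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.6]  # made-up validation losses
stop_at = early_stopping_epoch(toy_losses)
```

In this toy run, training stops before the final epoch, so the late improvement to 0.6 is never seen. That is the trade-off of a small `patience`.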

Here we go!

In [None]:
history = model.fit(x_train, y_train,
                   batch_size = 32,
                   epochs = 30,
                   validation_split = 0.2,
                   callbacks = [es])
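
As far as the Keras documentation describes it, `validation_split` holds out the last fraction of the rows, before any shuffling. A NumPy sketch of that split on ten made-up rows looks like this; the variable names are illustrative.

```python
import numpy as np

x_toy = np.arange(10).reshape(-1, 1)  # ten made-up rows
validation_split = 0.2

# index of the first validation row
split_at = int(np.floor(len(x_toy) * (1 - validation_split)))

x_fit = x_toy[:split_at]  # used to update the weights
x_val = x_toy[split_at:]  # held out to monitor validation performance
```

Since `train_test_split()` already shuffled our rows, taking the last 20% is as good as taking a random 20%.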

We can take a look at the performance of training over time:

In [None]:
history.history
train_results = pd.DataFrame(history.history)
train_results.head()

And plot it!

In [None]:
train_results.plot()

plt.grid(True)
plt.legend()

plt.ylim(0, 2)
plt.show()

***************
# Evaluation
[TOP](#Keras)

Evaluation is perhaps one of the easiest parts:

In [None]:
model_perf = model.evaluate(x_test, y_test)
model_perf

Let's save our accuracy.

In [None]:
acc_nn = model_perf[1]
acc_nn

Yikes.
What was the null accuracy again?

In [None]:
acc_null

What was the percentage-point gain over the null model?

In [None]:
acc_nn - acc_null

Well... at least it is positive...

****************
# Prediction
[TOP](#Keras)

And now for some fancy prediction.

In [None]:
yhat = model.predict(x_test.iloc[0:5, :])
yhat
np.argmax(yhat, axis = 1)
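
The integers returned by `np.argmax()` are column positions, not the original labels. A sketch of how to map them back is shown below with made-up probabilities; the `categories` array stands in for `ohe.categories_[0]`, and its values are invented.

```python
import numpy as np

categories = np.array(['high', 'low', 'mid'])  # made-up; stands in for ohe.categories_[0]

# made-up softmax output: one row per observation, rows sum to 1
yhat_toy = np.array([[0.2, 0.5, 0.3],
                     [0.6, 0.3, 0.1]])

pred_idx = np.argmax(yhat_toy, axis = 1)  # most probable class per row
pred_labels = categories[pred_idx]        # back to the original category names
```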