# Build Machine Learning Dataset


This notebook describes aspects of RNN model architectures.

In this project, we use TensorFlow's Keras Functional API for two reasons:
1. It allows for dynamic hidden layer building. With the Functional API, we can loop over a list of hidden layers and add an arbitrary number of layers. The Sequential API does not make this easy.
2. When visualizing models with `model.summary()`, the Functional API treats the input as a layer and shows the input shape, while the Sequential API does not, which makes its summary less informative.
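
As a rough illustration of the first point, the loop below stacks one LSTM layer per entry in a list. The list `hidden_units`, its values, and the model name are assumptions made for this sketch; they are not settings used in this project.

```python
import tensorflow as tf

# Hypothetical layer sizes, chosen only for illustration
hidden_units = [16, 8, 4]
timesteps, features = 12, 3

inputs = tf.keras.Input(shape=(timesteps, features))
x = inputs
for i, units in enumerate(hidden_units):
    # Every recurrent layer except the last returns the full sequence,
    # so that the next recurrent layer still receives 3-D input
    is_last = i == len(hidden_units) - 1
    x = tf.keras.layers.LSTM(units, return_sequences=not is_last)(x)
outputs = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inputs, outputs, name="Looped_Hidden_Layers")
model.summary()
```

Changing the length of `hidden_units` changes the depth of the network without touching the rest of the code.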

## Setup

In [None]:
import numpy as np
import tensorflow as tf
import sys
sys.path.append("../src")
import reproducibility
from utils import read_pkl, hash_weights

In [None]:
dat = read_pkl("../data/test_data/test_rnn_dat.pkl")
dat.scale_data()

## Input Shape

### Overview

All RNN layers have an input shape of `(batch_size, timesteps, features)`. This applies to SimpleRNN layers as well as LSTM and GRU layers. (It does NOT apply to attention layers and transformers.) These shape hyperparameters control how data is fed into the network:

* `batch_size`: the number of samples in a batch (aka minibatch) of training. After all samples in a batch are passed through the network *independently*, the loss is calculated and model weights are updated.
* `timesteps`: the number of timesteps that defines a single input sample. Also referred to as "sequence length".
* `features`: the dimensionality of the predictors/covariates/independent variables.

So in a given batch, samples of shape `(timesteps, features)` are passed through the network. Each sample is processed by each recurrent cell (e.g. an LSTM cell). In TensorFlow, the `Input()` layer is used to control the input shape.

See the [Keras documentation](https://keras.io/api/layers/core_layers/input/) for more details.
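
To make the three dimensions concrete, here is a minimal sketch that builds one random batch and passes it through a single LSTM layer. The sizes and the 4-unit layer are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes only
batch_size, timesteps, features = 10, 12, 3

# One batch: batch_size samples, each of shape (timesteps, features)
batch = np.random.randn(batch_size, timesteps, features).astype("float32")
print(batch.shape)     # (10, 12, 3)
print(batch[0].shape)  # (12, 3)

inputs = tf.keras.Input(shape=(timesteps, features))
outputs = tf.keras.layers.LSTM(4)(inputs)
model = tf.keras.Model(inputs, outputs)

# Without return_sequences, the LSTM emits one 4-vector per sample
print(model(batch).shape)  # (10, 4)
```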

### Flexible vs Fixed Input Shapes

The `batch_size` and `timesteps` hyperparameters can either be fixed to an integer value or be set to `None`. There are two programmatically equivalent ways to set `batch_size` to `None` in TensorFlow:

* `Input(shape=(timesteps, features))`: implicitly sets `batch_size` to `None`
* `Input(batch_shape=(None, timesteps, features))`: explicitly sets `batch_size` to `None`
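
A quick way to check that the two forms agree is to compare the shapes of the resulting symbolic inputs. This is a small sketch; the variable names `implicit` and `explicit` are chosen here for illustration.

```python
import tensorflow as tf

timesteps, features = 12, 3

# batch_size left out entirely
implicit = tf.keras.Input(shape=(timesteps, features))
# batch_size written out as None
explicit = tf.keras.Input(batch_shape=(None, timesteps, features))

print(tuple(implicit.shape))  # (None, 12, 3)
print(tuple(explicit.shape))  # (None, 12, 3)
```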

Setting each of these hyperparameters to `None` has different consequences.

* `batch_size`: If set to `None`, the network can accept batches containing any positive integer number of samples.
    * The model still needs a batch size to run gradient descent during training, and TensorFlow defaults to a `batch_size` of 32 unless told otherwise. If `batch_size` was initially set to `None`, it is set through the `batch_size` argument of `.fit()`.
    * This *will NOT* work with a `stateful` model, which requires a consistent batch size because the hidden state of each sample in one batch is carried over to the sample at the same position in the next batch.

* `timesteps`: If set to `None`, the network can accept input data with any positive integer number of timesteps.
    * In practice, it is determined by the input array passed to the `.fit` or `.predict` call.
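
The sketch below shows what this flexibility looks like in practice: a model with both dimensions set to `None` accepts arrays with different numbers of samples and timesteps. The model, the name `flexible`, and the array sizes are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

features = 3

# Both batch_size and timesteps are left as None
inputs = tf.keras.Input(batch_shape=(None, None, features))
x = tf.keras.layers.LSTM(4)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
flexible = tf.keras.Model(inputs, outputs)

# Arrays with different numbers of samples and timesteps
for n_samples, n_steps in [(10, 12), (15, 12), (15, 17)]:
    X = np.random.randn(n_samples, n_steps, features).astype("float32")
    preds = flexible.predict(X, verbose=0)
    print(X.shape, "->", preds.shape)
```

Each call returns one prediction per sample, regardless of how many timesteps the samples contain.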


In this project, we fix `batch_size` and `timesteps` during training since it allows for a more systematic hyperparameter tuning procedure. In other words, you can test various values of these hyperparameters and evaluate which leads to the most accurate models. However, when predicting with a trained model, we want to be able to forecast values at an arbitrary number of locations and arbitrarily far into the future. So for forecasting, we want these hyperparameters to be `None`. Fortunately, these hyperparameters do not change the number of connections or weights within the network. We can therefore train a "training model" with the fixed hyperparameters and then copy its weights to a network with the same number of trainable parameters but a more flexible input shape, the so-called "prediction model".
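
The weight-copying step can be sketched as follows. The helper `build_rnn`, the model names, the layer sizes, and the random data are all hypothetical and only meant to show the mechanics with `get_weights()` and `set_weights()`.

```python
import numpy as np
import tensorflow as tf

def build_rnn(batch_shape, name):
    """Hypothetical helper: same layers, different input shape."""
    inputs = tf.keras.Input(batch_shape=batch_shape)
    x = tf.keras.layers.LSTM(4)(inputs)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs, name=name)

# Training model: fixed batch size and timesteps
train_model = build_rnn((10, 12, 3), "training_model")
train_model.compile(loss="mean_squared_error", optimizer="adam")
X = np.random.randn(10, 12, 3).astype("float32")
y = np.random.randn(10, 1).astype("float32")
train_model.fit(X, y, batch_size=10, epochs=1, verbose=0)

# Prediction model: flexible batch size and timesteps
pred_model = build_rnn((None, None, 3), "prediction_model")

# The input shape does not affect the number of weights
assert train_model.count_params() == pred_model.count_params()
pred_model.set_weights(train_model.get_weights())

# The prediction model accepts shapes the training model was not built for
X_new = np.random.randn(15, 17, 3).astype("float32")
print(pred_model.predict(X_new, verbose=0).shape)  # (15, 1)
```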

Below we demonstrate the various input shapes and how data can or cannot be passed through the network. For each one, we print a unique hash value of the model weights to demonstrate that the weights of these networks are identical on initialization and following training.
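
The implementation of `hash_weights` is not shown in this notebook. As an assumption about how such a check could work, the sketch below hashes the raw bytes of every weight array with SHA-256. The function `weights_digest` is a hypothetical stand-in, and `tf.keras.utils.set_random_seed` (available in recent TensorFlow releases) is used here in place of `reproducibility.set_seed`.

```python
import hashlib
import numpy as np
import tensorflow as tf

def weights_digest(model):
    """Hypothetical stand-in for hash_weights: SHA-256 over all weight arrays."""
    h = hashlib.sha256()
    for w in model.get_weights():
        h.update(np.ascontiguousarray(w).tobytes())
    return h.hexdigest()

def build(batch_shape):
    inputs = tf.keras.Input(batch_shape=batch_shape)
    x = tf.keras.layers.LSTM(4)(inputs)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

# Re-seed before each build so both models draw the same initial weights
tf.keras.utils.set_random_seed(123)
fixed = build((10, 12, 3))
tf.keras.utils.set_random_seed(123)
flexible = build((None, None, 3))

print(weights_digest(fixed))
print(weights_digest(flexible))
```

If the two digests match, the models hold identical weights even though their input shapes differ.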

In [None]:
# Hyperparameters to use below
features = 3
timesteps = 12
batch_size = 10

# Random Data of various shapes to illustrate compatibility
# Assume response variable is 1-d 

rand1 = np.random.randn(batch_size, timesteps, features)
yrand1 = np.random.randn(batch_size, timesteps, 1)

rand2 = np.random.randn(batch_size+5, timesteps, features)
yrand2 = np.random.randn(batch_size+5, timesteps, 1)

rand3 = np.random.randn(batch_size+5, timesteps+5, features)
yrand3 = np.random.randn(batch_size+5, timesteps+5, 1)

print(rand1.shape)
print(rand2.shape)
print(rand3.shape)

### Example: Stateful

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.LSTM(4, stateful=True)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Stateful")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.summary()

In [None]:
# rand1 has exactly batch_size samples, matching the fixed batch dimension
model1.fit(x=rand1, y = yrand1)

In [None]:
try:
    # rand2 has batch_size+5 samples, which conflicts with the fixed batch size
    model1.fit(x=rand2, y = yrand2)
except Exception as e:
    print(f"Error due to incompatible shapes: {type(e).__name__}")

### Example: Fixed Batch and Fixed Timesteps

Fixed batch size is *required* for stateful models.

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.LSTM(4)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Fixed_Batch-Fixed_Timesteps")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.summary()

In [None]:
rand2.shape

In [None]:
model1.fit(x=rand2, y = yrand2)

### Example: Flexible Batch, Fixed Timesteps

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, timesteps, features))
x = tf.keras.layers.LSTM(4)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Fixed_Timesteps")
print(f"Initial Weights Hash: {hash_weights(model2)}")

model2.summary()

### Example: Flexible Batch, Flexible Timesteps

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, None, features))
x = tf.keras.layers.LSTM(4)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model3 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Flexible_Timesteps")
print(f"Initial Weights Hash: {hash_weights(model3)}")

model3.summary()

## Return Sequences