# Build Machine Learning Dataset


This notebook describes aspects of RNN model architectures.

In this project, we use the TensorFlow Functional API for two reasons:
1. It allows hidden layers to be built dynamically. With the Functional API, we loop over a list of hidden layers and add an arbitrary number of layers, which the Sequential API does not make as easy.
2. When visualizing models with `model.summary()`, the Functional API treats the input as a layer and shows the input shape. The Sequential API does not, which makes its summary less informative.
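The dynamic layer building in point 1 can be sketched as follows. `build_rnn` and its `hidden_units` argument are hypothetical names for illustration, not this project's code. The sketch assumes each entry in the list is the number of units for one `SimpleRNN` layer, and that every layer except the last returns its full sequence so the next recurrent layer still receives a 3-D input.

```python
import numpy as np
import tensorflow as tf

def build_rnn(hidden_units, timesteps=5, features=3):
    """Hypothetical helper: stack one SimpleRNN layer per entry in hidden_units."""
    inputs = tf.keras.Input(shape=(timesteps, features))
    x = inputs
    for i, units in enumerate(hidden_units):
        # All but the last recurrent layer must return the full sequence so the
        # following recurrent layer gets a (batch, timesteps, units) input.
        is_last = i == len(hidden_units) - 1
        x = tf.keras.layers.SimpleRNN(units, return_sequences=not is_last)(x)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

# Three hidden recurrent layers built from a plain Python list
model = build_rnn([8, 4, 2])
model.summary()
```

Changing the list (for example to `[16]` or `[32, 16, 8, 4]`) changes the depth of the network without touching the model-building code.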

## Setup

In [None]:
import numpy as np
import tensorflow as tf
import sys
sys.path.append("../src")
import reproducibility
from utils import read_pkl, hash_weights

In [None]:
# dat = read_pkl("../data/test_data/test_rnn_dat.pkl")
# dat.scale_data()

## Input Shape

### Overview

All RNN layers take an input of shape `(batch_size, timesteps, features)`. This applies to SimpleRNN layers as well as to LSTM and GRU layers (but NOT to attention layers and transformers). These shape hyperparameters control how data is fed into the network.

* `batch_size`: the number of samples in a batch (aka minibatch) of training. After all samples in a batch are passed through the network *independently*, the loss is calculated and the model weights are updated.
* `timesteps`: the number of timesteps that make up a single sample input, also referred to as the "sequence length".
* `features`: the dimensionality of the predictors/covariates/independent variables.

So in a given batch, samples of shape `(timesteps, features)` are passed through the network. Each sample is processed by each recurrent cell (e.g. an LSTM cell). In TensorFlow, the `Input()` layer is used to control the input shape.

See the [Keras documentation](https://keras.io/api/layers/core_layers/input/) for more details.
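To make these shapes concrete, here is a NumPy-only sketch of slicing a `(samples, timesteps, features)` array into minibatches. It mimics in-order batching without shuffling and is not TensorFlow's own data pipeline; the sizes (9 samples, batch size 4) are arbitrary choices that produce a partial final batch.

```python
import numpy as np

batch_size, timesteps, features = 4, 5, 3
n_samples = 9

# A dataset of 9 samples, each of shape (timesteps, features)
data = np.random.randn(n_samples, timesteps, features)
print(data[0].shape)  # one sample: (5, 3)

# Walk the sample axis in consecutive chunks of batch_size (no shuffling)
batches = [data[i:i + batch_size] for i in range(0, n_samples, batch_size)]
print([b.shape for b in batches])  # [(4, 5, 3), (4, 5, 3), (1, 5, 3)]
```

The last batch holds only the leftover sample, which is the "partial batch" situation that matters for stateful models later in this notebook.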

### Flexible vs Fixed Input Shapes

The `batch_size` and `timesteps` hyperparameters can either be fixed to an integer value or set to `None`. There are two programmatically equivalent ways to set `batch_size` to `None` in TensorFlow:

* `Input(shape=(timesteps, features))`: implicitly sets `batch_size` to `None`
* `Input(batch_shape=(None, timesteps, features))`: explicitly sets `batch_size` to `None`
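A quick check that the two forms produce the same symbolic input shape. `tuple(...)` is used so the comparison behaves the same whether the shape comes back as a `TensorShape` or as a plain tuple, depending on the installed Keras version.

```python
import tensorflow as tf

timesteps, features = 5, 3

implicit = tf.keras.Input(shape=(timesteps, features))
explicit = tf.keras.Input(batch_shape=(None, timesteps, features))

# Both report a leading None, i.e. a flexible batch dimension
print(tuple(implicit.shape))  # (None, 5, 3)
print(tuple(explicit.shape))  # (None, 5, 3)
```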

The batch size can also be set when calling `fit`: `model.fit(batch_size=___)`. This may seem redundant, but if you do not set `batch_size` explicitly and pass input data of a different shape, TensorFlow will try to reconcile the shapes dynamically to make things work, which can lead to unexpected results.
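The `batch_size` passed to `fit` determines how many gradient updates happen per epoch. Keras' `fit` falls back to a batch size of 32 when none is given. The arithmetic can be sketched with a small hypothetical helper, `steps_per_epoch` (not a TensorFlow function):

```python
import math

def steps_per_epoch(n_samples, batch_size=None):
    """Hypothetical helper: number of gradient updates in one epoch.

    None falls back to 32, mirroring the default batch size of Keras' fit().
    """
    if batch_size is None:
        batch_size = 32
    return math.ceil(n_samples / batch_size)

print(steps_per_epoch(9, 4))  # 3: batches of 4, 4, 1
print(steps_per_epoch(9))     # 1: a single batch holding all 9 samples
print(steps_per_epoch(100))   # 4: batches of 32, 32, 32, 4
```

This is why a fixed-batch model fed 9 samples without an explicit `batch_size` sees one batch of 9 rather than batches of 4.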

Setting each of these hyperparameters to `None` has different consequences:

* `batch_size`: If set to `None`, the network can accept input data with any positive number of samples along the batch dimension.
    * The model still needs a batch size to run minibatch gradient descent during training, and TensorFlow defaults to a `batch_size` of 32 unless told otherwise. If the input's `batch_size` was set to `None`, set it in the `.fit(batch_size=__)` call.
    * This *will NOT* work with a `stateful` model, which requires a consistent batch size because the final hidden state of each sample in one batch is used as the initial hidden state of the corresponding sample in the next batch.

* `timesteps`: If set to `None`, the network can accept input data with any positive number of timesteps.

    * In practice, the number of timesteps is determined by the input array passed to the `.fit` or `.predict` call.
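A minimal sketch of a model with both `batch_size` and `timesteps` left as `None`. The layer sizes are arbitrary and the weights are untrained; the point is only that one model accepts inputs whose first two dimensions differ between calls.

```python
import numpy as np
import tensorflow as tf

features = 3

# shape=(None, features): timesteps is None, and batch_size is implicitly None
inputs = tf.keras.Input(shape=(None, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

# The same model handles different batch sizes and sequence lengths
short_out = model.predict(np.random.randn(4, 5, features), verbose=0)
long_out = model.predict(np.random.randn(9, 10, features), verbose=0)
print(short_out.shape, long_out.shape)  # (4, 1) (9, 1)
```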


In this project, we fix `batch_size` and `timesteps` during training because it allows for a more systematic hyperparameter tuning procedure: you can test various values of these hyperparameters and evaluate which leads to the most accurate models. However, when predicting with a trained model, we want to be able to forecast values at an arbitrary number of locations and arbitrarily far into the future, so for forecasting we want these hyperparameters to be `None`.

Fortunately, these hyperparameters do not change the number of connections or weights within the network. We can therefore train a "training model" with fixed hyperparameters, and then copy its weights over to a network with the same number of trainable parameters but a more flexible input shape, the so-called "prediction model".
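The training-model/prediction-model pattern can be sketched with Keras' `get_weights`/`set_weights`. `make_model` is a hypothetical helper and the architecture (one 2-unit `SimpleRNN` plus `Dense(1)`) simply matches the examples below; the random training data is a placeholder. The parameter count works out the same for both models: the `SimpleRNN` has `units * (features + units + 1) = 2 * (3 + 2 + 1) = 12` weights and the `Dense` layer has `2 * 1 + 1 = 3`.

```python
import numpy as np
import tensorflow as tf

features, timesteps, batch_size = 3, 5, 4

def make_model(batch_shape):
    """Hypothetical helper: identical layers, different input shape."""
    inputs = tf.keras.Input(batch_shape=batch_shape)
    x = tf.keras.layers.SimpleRNN(2)(inputs)
    outputs = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

# Training model: fixed batch size and timesteps
train_model = make_model((batch_size, timesteps, features))
train_model.compile(loss="mean_squared_error", optimizer="adam")
# 8 placeholder samples = two full batches of 4
train_model.fit(np.random.randn(8, timesteps, features), np.random.randn(8, 1),
                batch_size=batch_size, epochs=1, verbose=0)

# Prediction model: flexible batch size and timesteps
pred_model = make_model((None, None, features))
print(train_model.count_params(), pred_model.count_params())  # 15 15

# Copy the trained weights into the flexible model
pred_model.set_weights(train_model.get_weights())

# Forecast with a batch size and sequence length never seen in training
preds = pred_model.predict(np.random.randn(7, 12, features), verbose=0)
print(preds.shape)  # (7, 1)
```

Because the weight shapes depend only on `features` and the number of units, the copy succeeds regardless of the two models' batch and timestep settings.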

Below we demonstrate the various input shapes and how data can or cannot be passed through the network. For each model, we print a hash of the model weights to show that the weights of these networks are identical on initialization and after training.
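`hash_weights` is imported from this project's `utils` module, whose implementation is not shown here. One plausible way such a function could work, offered as an assumption rather than the project's code, is to hash the raw bytes of every weight array. `hash_arrays` below is a hypothetical stand-in that takes a list of NumPy arrays (with a Keras model you would pass `model.get_weights()`).

```python
import hashlib
import numpy as np

def hash_arrays(arrays):
    """Hypothetical stand-in for utils.hash_weights: hash a list of arrays."""
    h = hashlib.sha256()
    for arr in arrays:
        arr = np.ascontiguousarray(arr)
        h.update(str(arr.shape).encode())  # include shape so reshapes differ
        h.update(str(arr.dtype).encode())  # include dtype so casts differ
        h.update(arr.tobytes())            # the raw weight values
    return h.hexdigest()

w = [np.ones((3, 2), dtype=np.float32), np.zeros(2, dtype=np.float32)]
print(hash_arrays(w) == hash_arrays([a.copy() for a in w]))  # True
print(hash_arrays(w) == hash_arrays([w[0] * 2, w[1]]))       # False
```

Identical weights always give identical hashes, and any change to a single value changes the hash, which is what makes a hash a compact way to compare models.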

In [None]:
# Hyperparameters to use below
features = 3
timesteps = 5
batch_size = 4

# Random Data of various shapes to illustrate compatibility
# Assume response variable is 1-d 

rand1 = np.random.randn(batch_size, timesteps, features)
yrand1 = np.random.randn(batch_size, timesteps, 1)

rand2 = np.random.randn(batch_size+5, timesteps, features)
yrand2 = np.random.randn(batch_size+5, timesteps, 1)

rand3 = np.random.randn(batch_size+5, timesteps+5, features)
yrand3 = np.random.randn(batch_size+5, timesteps+5, 1)

print(rand1.shape)
print(rand2.shape)
print(rand3.shape)

### Example: Stateful

A stateful model forces a consistent batch size, so it will not process partial batches.
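One common workaround, sketched here as an assumption rather than something this project does, is to drop trailing samples so the sample count is an exact multiple of the batch size. `trim_to_batch_multiple` is a hypothetical helper; the array shapes mirror `rand2`/`yrand2` below (9 samples with a batch size of 4).

```python
import numpy as np

def trim_to_batch_multiple(x, y, batch_size):
    """Hypothetical helper: drop trailing samples so every batch is full."""
    n_full = (x.shape[0] // batch_size) * batch_size
    return x[:n_full], y[:n_full]

batch_size, timesteps, features = 4, 5, 3
x = np.random.randn(batch_size + 5, timesteps, features)  # 9 samples
y = np.random.randn(batch_size + 5, timesteps, 1)

print(x.shape[0] % batch_size)  # 1 leftover sample, i.e. a partial final batch
x_trim, y_trim = trim_to_batch_multiple(x, y, batch_size)
print(x_trim.shape)  # (8, 5, 3)
```

The trimmed arrays would still need `batch_size=batch_size` in `fit`, since the default of 32 would otherwise produce one batch of 8. For a stateful model you would typically also pass `shuffle=False` so that consecutive batches stay in order.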

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2, stateful=True)(inputs)  # 2 units, matching the non-stateful examples
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Stateful")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.summary()

In [None]:
model1.fit(x=rand1, y = yrand1)
print(f"Trained model weights: {hash_weights(model1)}")

In [None]:
try:
    model1.fit(x=rand2, y = yrand2)
except Exception as e:
    print("Error due to incompatible shapes")

In [None]:
try:
    model1.fit(x=rand3, y = yrand3)
except Exception as e:
    print("Error due to incompatible shapes")

### Example: Fixed Batch and Fixed Timesteps

Here `stateful` is left at its default of `False`. Given the same data, the trained model is the same as the stateful model, but in this case the model can accept incomplete batches and process them without error.

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Fixed_Batch-Fixed_Timesteps")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.summary()

In [None]:
model1.fit(x=rand1, y = yrand1, batch_size = batch_size)
print(f"Trained model weights: {hash_weights(model1)}")

In [None]:
rand2.shape

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Fixed_Batch-Fixed_Timesteps")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.fit(x=rand2, y = yrand2, batch_size=batch_size)

print(f"Trained model weights: {hash_weights(model1)}")

In [None]:
rand3.shape

In [None]:
# NOTE: the data below throws an error on the FIRST call to fit, but not
# if the model has already been fit with properly shaped data. TODO: Dig into this
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Fixed_Batch-Fixed_Timesteps")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")
try:
    model1.fit(x=rand3, y = yrand3)
except Exception as e:
    print("Error due to incompatible shapes")

### Example: Flexible Batch, Fixed Timesteps

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Fixed_Timesteps")
model2.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model2)}")

model2.summary()

In [None]:
model2.fit(x=rand1, y = yrand1)

print(f"Trained model weights: {hash_weights(model2)}")

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Fixed_Timesteps")
model2.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model2)}")

model2.fit(x=rand2, y = yrand2, batch_size = None)

print(f"Trained model weights: {hash_weights(model2)}")

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Fixed_Timesteps")
model2.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model2)}")

model2.fit(x=rand2, y = yrand2, batch_size = rand2.shape[0])

print(f"Trained model weights: {hash_weights(model2)}")

In [None]:
# NOTE: the data below throws an error on the FIRST call to fit, but not
# if the model has already been fit with properly shaped data. TODO: Dig into this
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, timesteps, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Fixed_Timesteps")
model2.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model2)}")
try:
    model2.fit(x=rand3, y = yrand3)
except Exception as e:
    print("Error due to incompatible shapes")

### Example: Flexible Batch, Flexible Timesteps

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, None, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model3 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Flexible_Timesteps")
model3.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model3)}")

model3.summary()

In [None]:
model3.fit(x=rand1, y = yrand1)

print(f"Trained model weights: {hash_weights(model3)}")

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, None, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model3 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Flexible_Timesteps")
model3.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model3)}")

model3.fit(x=rand2, y = yrand2, batch_size=None)

print(f"Trained model weights: {hash_weights(model3)}")

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(None, None, features))
x = tf.keras.layers.SimpleRNN(2)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model3 = tf.keras.Model(inputs, outputs, name = "Flexible_Batch-Flexible_Timesteps")
model3.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model3)}")

model3.fit(x=rand3, y = yrand3, batch_size=None)

print(f"Trained model weights: {hash_weights(model3)}")

## Return Sequences

All recurrent layers expect an input shape of `(batch_size, timesteps, features)`. The output shape of a recurrent layer depends on the number of cells (units) and on the `return_sequences` parameter:
* If `return_sequences=True`, the layer returns the hidden state at every timestep, so the output has shape `(batch_size, timesteps, units)`.
* If `return_sequences=False` (the default), the layer returns only the hidden state at the final timestep, so the output has shape `(batch_size, units)`.
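The shape difference can be seen in a minimal NumPy re-implementation of a SimpleRNN forward pass. This is an illustrative sketch of the standard recurrence `h_t = tanh(x_t W + h_{t-1} U + b)` with a zero initial state, not TensorFlow's code; `simple_rnn_forward` and the random weights are hypothetical. It also shows why a `Dense(1)` layer after a `return_sequences=True` layer yields one prediction per timestep: `Dense` acts on the last axis.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """Minimal NumPy SimpleRNN with h_0 = 0.

    x: (batch_size, timesteps, features), W: (features, units),
    U: (units, units), b: (units,)
    """
    batch_size, timesteps, _ = x.shape
    h = np.zeros((batch_size, U.shape[0]))
    states = []
    for t in range(timesteps):  # the same W, U, b are reused at every step
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # (batch_size, timesteps, units)
    return h                             # (batch_size, units): last step only

rng = np.random.default_rng(0)
features, units = 3, 2
W = rng.normal(size=(features, units))
U = rng.normal(size=(units, units))
b = np.zeros(units)

x = rng.normal(size=(4, 5, features))
seq = simple_rnn_forward(x, W, U, b, return_sequences=True)
last = simple_rnn_forward(x, W, U, b, return_sequences=False)
print(seq.shape, last.shape)             # (4, 5, 2) (4, 2)
print(np.allclose(seq[:, -1, :], last))  # True: the last state is the final step

# A Dense(1)-style projection on the last axis
V = rng.normal(size=(units, 1))
print((seq @ V).shape, (last @ V).shape)  # (4, 5, 1) (4, 1)
```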

In [None]:
# Redefine Hyperparameters for clarity
features = 3
timesteps = 5
batch_size = 4

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2, return_sequences=True)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model1 = tf.keras.Model(inputs, outputs, name = "Return_Sequences_True")
model1.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model1)}")

model1.summary()

In [None]:
reproducibility.set_seed(123)

inputs = tf.keras.Input(batch_shape=(batch_size, timesteps, features))
x = tf.keras.layers.SimpleRNN(2, return_sequences=False)(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model2 = tf.keras.Model(inputs, outputs, name = "Return_Sequences_False")
model2.compile(loss = "mean_squared_error", optimizer="Adam")
print(f"Initial Weights Hash: {hash_weights(model2)}")

model2.summary()

In [None]:
rand1.shape

In [None]:
yrand1.shape

In [None]:
reproducibility.set_seed(123)
model1.fit(rand1, yrand1)
print(f"Trained Model Weights: {hash_weights(model1)}")

In [None]:
reproducibility.set_seed(123)
model2.fit(rand1, yrand1)
print(f"Trained Model Weights: {hash_weights(model2)}")

In [None]:
rand1.shape

In [None]:
preds1 = model1.predict(rand1)
preds2 = model2.predict(rand1)
print(f"{preds1.shape=}")
print(f"{preds2.shape=}")

In [None]:
rand2.shape

In [None]:
preds1 = model1.predict(rand2)
preds2 = model2.predict(rand2)
print(f"{preds1.shape=}")
print(f"{preds2.shape=}")

In [None]:
rand3.shape

In [None]:
preds1 = model1.predict(rand3)
preds2 = model2.predict(rand3)
print(f"{preds1.shape=}")
print(f"{preds2.shape=}")