 # Task 1 Lecture Examples
 The code in this notebook is closely aligned with the code shown in the lecture.
 Here, we use an artificial time series data set that is roughly based on a sine wave. We run three deep learning models on it using a sliding window approach:
 - Fully Connected
 - Convolutional (operating in one dimension only, unlike images, where it operates in two dimensions)
 - Recurrent NN.

**TASK**: Work through the code and try to understand it. After that, try the **Exercises** as written in the Lab Week 10 Instructions. 

**NOTE**: Some parts of the code are marked with the keyword `ADVANCED CODE`. You do not need to understand what these parts of the code do; simply read the comment next to them.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import tensorflow
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

# setting the random seeds
tensorflow.random.set_seed(42)
np.random.seed(42)

# turn scientific notation off
#  this is useful for printing out arrays as otherwise they are printed similar to 1.23456789e+01 which is not very readable
np.set_printoptions(suppress=True)

# setting the size of the plots
plt.rcParams['figure.figsize'] = [16, 4]

# Data Ingestion and Preparation
Usually we would load the data from some source, but for this example we will generate it ourselves.
We will generate a sine wave with some noise and an upward trend and use it as our univariate time series data. If you run the code below, you will see what this data looks like. The x axis is the time step and the y axis is the value of the series at that time step.
The goal is for the model to predict the next value in the sequence and learn to ignore the noise.

In [None]:
# ADVANCED CODE: create a time series with trend and seasonality (e.g. create artificial data)
def create_time_series(length, trend, seasonality, noise):
    time = np.arange(length)
    season = seasonality * np.sin(time / seasonality)
    trend = trend * time
    noise = noise * np.random.randn(length)

    # normalize
    series = (season + trend + noise)
    mean = series.mean()
    std = series.std()
    series = (series - mean) / std

    return time, series

# plot the time series data
time, series = create_time_series(length=4000, trend=0.1, seasonality=50, noise=3)
plt.plot(time, series)
plt.legend(['Variable 1'])
plt.show()


In the lecture we covered various ways to create predictions and to partition the data.
For this example, we will use a train/val/test split. 

This works, because we a) have a lot of data and b) we are using a sliding window approach.

If we had a much smaller dataset, it would be more beneficial to use a cross-validation approach, adapted for time series data.
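
As a rough illustration of what cross-validation adapted for time series could look like, the sketch below builds expanding-window folds with plain NumPy. Each fold trains only on values that come before its validation block, so no future values leak into training. The helper `expanding_window_splits` and its parameters are hypothetical names introduced here; they are not part of the lecture code.

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits, val_size):
    """Yield (train_idx, val_idx) pairs (hypothetical helper).

    Each fold validates on a block of val_size consecutive steps and
    trains on every step that comes before that block.
    """
    for fold in range(n_splits):
        val_end = n_samples - (n_splits - 1 - fold) * val_size
        val_start = val_end - val_size
        yield np.arange(0, val_start), np.arange(val_start, val_end)

# toy example: 10 time steps, 3 folds, 2 validation steps per fold
splits = list(expanding_window_splits(n_samples=10, n_splits=3, val_size=2))
for train_idx, val_idx in splits:
    print("train:", train_idx, "val:", val_idx)
```

The training part grows from fold to fold, while the validation block always lies strictly after it.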

Recall that the sliding window approach uses a fixed number of previous time steps to predict the next time step. The other approach we looked at in the lecture uses **all** previous time steps to predict the next one. With a sliding window, every prediction uses the same number of time steps, which suits neural networks, including fully connected ones. The other approach would require a different number of time steps for each prediction, which is not possible with fully connected networks. The sliding window approach is also more efficient and lets us make efficient use of the validation set.

What we need to define is the *window size* that we want to look at in the past to predict the next time step.

For example, if we have a sequence of 5 time steps such as [1, 2, 3, 4, 5], and we want to use the previous 2 time steps (i.e. window size = 2) to predict the next time step, we would have the following input/output pairs:
- [1, 2] -> 3
- [2, 3] -> 4
- [3, 4] -> 5
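
The pairs above can be reproduced with a few lines of NumPy. This is only a minimal sketch of the windowing idea, using `sliding_window_view` on the toy sequence; the variable names `toy_series` and `window_size` are introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

toy_series = np.array([1, 2, 3, 4, 5])
window_size = 2

# all windows of length window_size; the last one is dropped
# because no value follows it that could serve as its label
windows = sliding_window_view(toy_series, window_size)[:-1]

# the labels start at the index where the first window ends
labels = toy_series[window_size:]

for window, label in zip(windows, labels):
    print(window, "->", label)
```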

We can use the Keras function `timeseries_dataset_from_array` to create the dataset for us, after we have extracted the labels.

In [None]:
# simple train/val/test split. Note, you could use any arbitrary split you want which is suited for your problem.
# we use all but the last 1000 time steps for training
train_series = series[:-1000]
# from the last 1000 steps we use the first 500 for validation
val_series = series[-1000:-500]
# and the last 500 for testing
test_series = series[-500:]

# Note: It is always useful to check how many time steps you are working with in total and to print the shape of the data
print("Series Length")
print("Train:", train_series.shape)
print("Val:", val_series.shape)
print("Test:", test_series.shape)


# the size of the window or length of the sequence we look at
sequence_length = 45

# labels : shifted by sequence length
# what [sequence_length:] does is to start the labels at the index where the sequence ends
# in the [1, 2, 3, 4, 5] example, the labels would be [3, 4, 5] as we use a window size of 2, therefore starting with the third element as the first label
train_labels = train_series[sequence_length:]
val_labels = val_series[sequence_length:]
test_labels = test_series[sequence_length:]

# drop the last value of the training series: it only serves as a label, and keeping it as an input would otherwise cause issues with the RNN.
train_series = train_series[:-1]


# Datasets for tensorflow: https://www.tensorflow.org/api_docs/python/tf/keras/utils/timeseries_dataset_from_array 
# A batch size of 8 is used and the sampling rate is 1, meaning that we use every time step in the sequence
train_dataset = timeseries_dataset_from_array(train_series, train_labels, 
        sequence_length=sequence_length, sampling_rate=1, 
        batch_size=8, shuffle=True)
val_dataset = timeseries_dataset_from_array(val_series, val_labels,
        sequence_length=sequence_length, sampling_rate=1,
        batch_size=8, shuffle=True)
test_dataset = timeseries_dataset_from_array(test_series, test_labels,
        sequence_length=sequence_length, sampling_rate=1,
         batch_size=8)

# Let's look at the first sample of the training dataset
for batch in train_dataset.take(1):
    inputs, targets = batch
    print("Input shape:", inputs.numpy().shape)
    print("Target shape:", targets.numpy().shape)
    print("Input Sample:", inputs.numpy()[0])
    print("Input Sample Label:", targets.numpy()[0])

In [None]:
# ADVANCED CODE: printing an example of the model's predictions based on a random sample
def print_prediction(model, dataset):
    """prints the model's predictions for a random sample from the dataset

    Args:
        model (keras.Model): Model that creates the prediction
        dataset (tensorflow.data.Dataset): Dataset to sample from
    """
    for batch in dataset.take(1):
        inputs, targets = batch
        print("Input shape:", inputs.numpy().shape)
        print("Target shape:", targets.numpy().shape)
        print("Input Sample:", inputs.numpy()[0])
        print("Input Sample Label:", targets.numpy()[0])
        print("Predictions:", model.predict(inputs).flatten()[0])

# ADVANCED CODE: plotting the model's predictions for a certain number of steps
def plot_results(model, test_series, sequence_length, n=1):
    """Plots the model's predictions for the next n steps

    Args:
        model (keras.Model): Model that creates the prediction
        test_series (numpy.array): Test data
        sequence_length (int): Length of the sequence that the model uses to predict the next step
        n (int, optional): Number of steps to predict. Defaults to 1.
    """
    if n > len(test_series) - sequence_length:
        n = len(test_series) - sequence_length

    time = np.arange(len(test_series[:sequence_length+n]))
    forecast = []
    for step in range(n):       
        # create the sequence from the test series of length sequence_length
        test_d = np.expand_dims(test_series[step:step + sequence_length], axis=[0, -1])
        # forecast the next step
        prediction = model.predict(test_d, verbose=0)

        forecast.append(prediction)
        model.reset_states()
    
    # plot test series
    plt.plot(time, test_series[:sequence_length+n], label='Test Series')
    plt.plot(time[sequence_length:], np.squeeze(forecast), label="Forecast")
    
    return (time, forecast)

# Fully Connected Neural Network

Fully connected neural networks can be used for time series forecasting; however, they are not the best choice. This is because they do not take into account the sequential nature of the data: every time step is treated as just another input feature, with no built-in notion of order.
In this example, we use a simple network that takes the sequence as an input, then has 64 fully connected units before passing it to the single output unit. Note that we do not use an output activation. This is because in our example here, we have a regression problem: we want to predict a continuous-valued number.

We use the Huber loss, as it is a very robust regression loss. You can read more about it [here](https://en.wikipedia.org/wiki/Huber_loss).
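
To make the robustness concrete, here is a minimal NumPy sketch of the Huber loss. It is quadratic for small errors and linear for large ones, so single outliers influence it less than they would a squared error. The function name `huber` is introduced here for illustration, and the threshold `delta=1.0` is an assumption chosen to match the Keras default.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss (illustrative sketch, delta assumed to be 1.0)."""
    error = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_error = np.abs(error)
    quadratic = 0.5 * error ** 2                 # used where |error| <= delta
    linear = delta * (abs_error - 0.5 * delta)   # used where |error| > delta
    return np.mean(np.where(abs_error <= delta, quadratic, linear))

print(huber([0.0], [0.5]))  # small error, quadratic branch: 0.125
print(huber([0.0], [3.0]))  # large error, linear branch: 2.5
```

For comparison, half the squared error for the second case would be 4.5, so the large error is penalised much less under the Huber loss.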


In [None]:
# input shape is set to be the length of the sequence
model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=[sequence_length]),
    layers.Dense(1)])
model.summary()


model.compile(loss="huber", optimizer="adam")
history = model.fit(train_dataset, epochs=10, validation_data=val_dataset)    

# continue training on the combined training and validation data
history = model.fit(train_dataset.concatenate(val_dataset), epochs=10)


print_prediction(model, test_dataset)
time, fc_predictions = plot_results(model, test_series, sequence_length, n=500) 


# Convolutional Neural Network
This example uses a convolutional layer. Remember, convolutional layers slide a window over the input data and apply a function to the window. This is very helpful with sequential data, because each filter combines neighbouring time steps within the sequence and can therefore model dependencies between them. This example simply uses a single convolutional layer with a window (kernel) size of 7 and a stride of 1. It applies 64 filters to the input data before we pass its output to a dense layer.

We need to reshape the input data to be 3-dimensional, as the convolutional layer expects a 3-dimensional input, i.e. (batch_size, sequence_length, number of features). In our case, we only have one feature, so the input shape is (batch_size, sequence_length, 1), with the batch size being omitted from the input shape parameter.
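
The shapes involved can be checked with a small NumPy sketch. It assumes the convolution uses no padding (the Keras default, `"valid"`), so the number of output steps is `(sequence_length - kernel_size) // stride + 1`. The variable names are introduced here for illustration only.

```python
import numpy as np

sequence_length = 45
kernel_size = 7
stride = 1
filters = 64

# a batch of 8 windows, first without and then with the feature axis
batch = np.zeros((8, sequence_length))
batch_3d = batch[..., np.newaxis]
print(batch_3d.shape)  # (8, 45, 1)

# number of positions the kernel can take along the sequence (no padding)
output_steps = (sequence_length - kernel_size) // stride + 1
print(output_steps)  # 39

# size of the flattened vector that is passed to the dense layer
print(output_steps * filters)  # 2496
```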

In [None]:
model = keras.Sequential([
    layers.Input(shape=[sequence_length, 1]),
    layers.Conv1D(filters=64, kernel_size=7, strides=1, activation="relu"),
    layers.Flatten(),
    layers.Dense(1)])


model.summary()


model.compile(loss="huber", optimizer="adam")

history = model.fit(train_dataset, epochs=10, validation_data=val_dataset, verbose=1)

# continue training on the combined training and validation data
history = model.fit(train_dataset.concatenate(val_dataset), epochs=10)


print_prediction(model, test_dataset)
_ , cnn_predictions = plot_results(model, test_series, sequence_length, n=500) 



# Recurrent Neural Network

The last model is a recurrent neural network with a single LSTM layer. The LSTM layer processes the window one time step at a time while carrying an internal state along the sequence. It is often better at capturing dependencies across the sequence than the previous models, but it is also much slower to train.
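
For a rough sense of model size, the sketch below counts the trainable parameters of the three networks in this notebook using the standard formulas for dense, 1D convolutional and LSTM layers. The helper functions are hypothetical names introduced here, and the totals are expected to agree with what `model.summary()` reports. The counts turn out to be of similar magnitude, which suggests that the slower LSTM training comes mainly from processing the 45 time steps one after another rather than from the number of weights.

```python
sequence_length = 45

def dense_params(inputs, units):
    # one weight per input and unit, plus one bias per unit
    return inputs * units + units

def conv1d_params(kernel_size, in_channels, filters):
    # one kernel per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def lstm_params(input_dim, units):
    # four gate blocks, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

fc_total = dense_params(sequence_length, 64) + dense_params(64, 1)
conv_steps = sequence_length - 7 + 1  # no padding, stride 1
cnn_total = conv1d_params(7, 1, 64) + dense_params(conv_steps * 64, 1)
rnn_total = lstm_params(1, 32) + dense_params(32, 1)

print(fc_total, cnn_total, rnn_total)  # 3009 3009 4385
```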

In [None]:
model = keras.Sequential([
    layers.LSTM(32, input_shape=[sequence_length, 1]),
    layers.Dense(1)])

model.summary()
model.compile(loss="huber", optimizer="adam")

history = model.fit(train_dataset, epochs=10, validation_data=val_dataset, verbose=1)

# continue training on the combined training and validation data
history = model.fit(train_dataset.concatenate(val_dataset), epochs=10)


_ , rnn_predictions = plot_results(model, test_series, sequence_length, n=500) 

# Comparing The Results
We can now compare the models, both numerically via the mean absolute percentage error (MAPE) on the test series and visually by plotting their forecasts together.
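
As a small worked example of the metric, the sketch below evaluates MAPE on toy numbers. The function `toy_mape` and the values are made up for illustration. The two calls use the same absolute errors (0.5 each), but in the second one a true value lies close to zero, which inflates the percentage error. This is worth keeping in mind when interpreting MAPE on standardised data.

```python
import numpy as np

def toy_mape(y_true, y_pred):
    """Mean absolute percentage error on toy data (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# true values well away from zero
print(toy_mape([2.0, 4.0], [1.5, 4.5]))      # 18.75

# same absolute errors, but one true value is close to zero
print(toy_mape([0.125, 4.0], [0.625, 4.5]))  # 206.25
```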

In [None]:
# calculate the mean absolute percentage error (MAPE)

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

print(test_series.shape)
print(np.array(fc_predictions).shape)

fc_mape = mape(test_series[sequence_length:], np.squeeze(fc_predictions))
cnn_mape = mape(test_series[sequence_length:], np.squeeze(cnn_predictions))
rnn_mape = mape(test_series[sequence_length:], np.squeeze(rnn_predictions))

print("Fully Connected mape:", fc_mape)
print("CNN mape:", cnn_mape)
print("RNN mape:", rnn_mape)

plt.plot(time, test_series, label='Test Series')
plt.plot(time[sequence_length:], np.squeeze(fc_predictions), label="FC Forecast")
plt.plot(time[sequence_length:], np.squeeze(cnn_predictions), label="CNN Forecast")
plt.plot(time[sequence_length:], np.squeeze(rnn_predictions), label="RNN Forecast")
plt.legend()
