Supplement to Part 3 of **time series forecasting with energy**

In [0]:
# LOAD THE REPOSITORY
# if you are working from outside the repository
# this happens if you use colab like I do
!git clone https://github.com/sandeshbhatjr/energy-prediction.git

# Basic Deep Learning Models

Deep learning has shown promising results in many areas of machine learning, and it is natural to wonder whether it can have a significant impact on time-series forecasting. In the image and NLP domains, ConvNets and LSTMs respectively reign supreme; the question is: which architecture is the right one for SOTA time-series prediction?

In [0]:
%tensorflow_version 2.x
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, SimpleRNN, GRU, LSTM, BatchNormalization
# builds windowed (inputs, target) batches for the recurrent models below
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

**Some preprocessing for NNs**

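The preprocessing is fairly standard: we centre every continuous feature and divide it by its variance, since NNs are very sensitive to the scale of their inputs, and we one-hot encode the categorical variables. Note that the more common z-score standardisation divides by the standard deviation rather than the variance.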

In [0]:
# NNs are sensitive to range, so rescale each continuous column:
# subtract the mean and divide by the variance
for col in ['Day Ahead Price', 'Total Load', 'Total Generation']:
    mean = german_df_with_load_and_gen[col].mean()
    var = german_df_with_load_and_gen[col].var()
    german_df_with_load_and_gen[col + ' (rescaled)'] = (german_df_with_load_and_gen[col] - mean) / var
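
Since the models are trained on rescaled values, their predictions have to be mapped back to original units before they can be read as prices. The following is a minimal sketch of a rescale/unscale pair on toy data; the helper names `rescale` and `unscale` and the toy prices are assumptions made for illustration, not part of the repository. It divides by the standard deviation by default, with the variance available as an option to match the cell above.

```python
import pandas as pd

def rescale(series, use_std=True):
    """Centre a series and divide by its std (or variance); also return the statistics."""
    mean = series.mean()
    scale = series.std() if use_std else series.var()
    return (series - mean) / scale, mean, scale

def unscale(values, mean, scale):
    """Invert rescale(): map rescaled values back to the original units."""
    return values * scale + mean

# toy prices, purely illustrative
prices = pd.Series([30.0, 42.0, 35.0, 50.0, 28.0])

scaled, mean, scale = rescale(prices)
restored = unscale(scaled, mean, scale)
```

Keeping `mean` and `scale` around per column is what makes the inverse transform possible later on.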

We have a few categorical variables in our data: deep models will need them to be one-hot encoded for use.

In [0]:
ohe_holiday_df = pd.get_dummies(german_df_with_load_and_gen[['DE', 'LU']])
ohe_bidding_zones_df = pd.get_dummies(german_df_with_load_and_gen['Bidding Zone'])

proc_german_df = pd.concat([german_df_with_load_and_gen, ohe_bidding_zones_df], axis=1)
proc_german_df2 = pd.concat([proc_german_df, ohe_holiday_df], axis=1)
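
To make the one-hot step concrete, here is a small sketch on a toy frame. The zone labels `zone_a` and `zone_b` are made up for the example and are not the values found in the actual dataset.

```python
import pandas as pd

# toy stand-in for the 'Bidding Zone' column; the labels are hypothetical
toy_df = pd.DataFrame({'Bidding Zone': ['zone_a', 'zone_a', 'zone_b', 'zone_a']})

# one indicator column per distinct category
toy_ohe = pd.get_dummies(toy_df['Bidding Zone'])

# iterating over a DataFrame yields its column labels, so sorted(...)
# gives a stable, alphabetical list of the new column names
ohe_columns = sorted(toy_ohe)

# attach the indicator columns to the original frame
toy_proc = pd.concat([toy_df, toy_ohe], axis=1)
```

This is also why an expression like `sorted(ohe_bidding_zones_df)` can be used to select the encoded columns by name.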

Finally, we split the dataset into a training and a validation set.

In [0]:
train_df, val_df = test_train_timesplit(proc_german_df2, train_size=0.9, test_size=0.1)
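
`test_train_timesplit` is a helper from the repository and its implementation is not shown here. As a rough illustration of what a time-ordered split looks like in general, the sketch below cuts a toy frame at a fixed position instead of shuffling, so that validation data always lies after the training data. The function name `chronological_split` is hypothetical.

```python
import pandas as pd

def chronological_split(df, train_size=0.9):
    """Split a time-ordered frame into an early part and a late part, without shuffling."""
    cut = int(len(df) * train_size)
    return df.iloc[:cut], df.iloc[cut:]

# toy daily series, purely illustrative
toy = pd.DataFrame(
    {'value': range(10)},
    index=pd.date_range('2020-01-01', periods=10, freq='D'))

toy_train, toy_val = chronological_split(toy, train_size=0.9)
```

Splitting by position rather than at random avoids leaking future information into the training set.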

In [0]:
# convert each hour to multi-sequential form
price_train_X_data = extractHourlyData(train_df, ['Day Ahead Price (rescaled)', 'Day of Week'] + sorted(ohe_bidding_zones_df) + sorted(ohe_holiday_df))
price_train_y_data = extractHourlyData(train_df, ['Day Ahead Price (rescaled)'])
price_val_X_data = extractHourlyData(val_df, ['Day Ahead Price (rescaled)', 'Day of Week'] + sorted(ohe_bidding_zones_df) + sorted(ohe_holiday_df))
price_val_y_data = extractHourlyData(val_df, ['Day Ahead Price (rescaled)'])
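
`extractHourlyData` also comes from the repository, so the following is only a guess at the general idea and not its actual implementation: an hourly series is rearranged so that each row holds one day and each column one hour of that day. The toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# three complete days of hourly values (0, 1, ..., 71), purely illustrative
index = pd.date_range('2020-01-01', periods=72, freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.arange(72, dtype=float), index=index)

# one row per day, one column per hour of that day;
# this assumes every day has exactly 24 observations (no gaps, no clock changes)
daily = hourly.to_numpy().reshape(-1, 24)
```

A layout like this fits the models below, whose output layers have 24 units, one per hour of the day.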

## Feedforward NN
The simplest deep model we can come up with is a feed-forward neural network that takes as input the time and date of the price, together with the bidding zone and holiday indicators. It does not see the recent price history, so it cannot pick up local trends based on the previous days' prices.

In [0]:
X_train = train_df[[
                  'Day', 
                  'Month', 
                  'Year', 
                  'Hour', 
                  'Day of Week', 
                  'Daylight Savings Time'] \
                   + sorted(ohe_bidding_zones_df) + sorted(ohe_holiday_df)].to_numpy()
y_train = train_df['Day Ahead Price (rescaled)'].to_numpy()

# define model
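# ReLU activations on the hidden layers: without a nonlinearity,
# stacked Dense layers collapse into a single linear map
ffmodelA = Sequential()
ffmodelA.add(Dense(80, activation='relu'))
ffmodelA.add(Dense(30, activation='relu'))
ffmodelA.add(Dense(1))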

# train the model
ffmodelA.compile(loss='mae', optimizer='Adam')
ffmodelA.fit(X_train, y_train, epochs=50, validation_split=0.01, verbose=0)
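
A point worth keeping in mind for the feed-forward model: a stack of `Dense` layers is only more expressive than a linear regression if there is a nonlinear activation between the layers. The NumPy sketch below checks this on random weights; all names and shapes are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# two dense layers with random weights: 4 inputs -> 8 hidden -> 1 output
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)
X = rng.normal(size=(5, 4))

# without an activation, the two layers compose into one affine map
two_layer = (X @ W1 + b1) @ W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b
collapsed = np.allclose(two_layer, one_layer)

# with a ReLU in between, the output is no longer that affine map
hidden = np.maximum(X @ W1 + b1, 0)
nonlinear = hidden @ W2 + b2
differs = not np.allclose(nonlinear, one_layer)
```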

## RNN

First up, we will tackle standard architectures for sequential data: RNN, GRU and LSTM. `Keras` makes it a relatively easy task to set them up, and see how they perform.

The simple RNN is the most basic of the three. One likely shortcoming is that, in practice, it only retains information from the last few terms of the sequence, so weekly, monthly and annual patterns are probably not captured by this model. But enough speculation: it is time to see how it actually performs.
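
Each training sample for the recurrent models is a window of `window_size` consecutive rows, paired with the target of the row that follows the window. The sketch below spells out that sliding-window idea in plain NumPy; `make_windows` is a hypothetical helper written for this illustration, and it leaves out batching and shuffling.

```python
import numpy as np

def make_windows(data, targets, length):
    """Pair each window data[i:i+length] with the target at position i+length."""
    X = np.stack([data[i:i + length] for i in range(len(data) - length)])
    y = targets[length:]
    return X, y

# toy series with one feature per step, purely illustrative
data = np.arange(10, dtype=float).reshape(10, 1)
targets = np.arange(10, dtype=float) * 10

X, y = make_windows(data, targets, length=3)
```

The resulting `X` has shape `(samples, length, features)`, which is the input layout recurrent layers in Keras expect.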

In [0]:
# generate windowed datasets
window_size = 30

generator = TimeseriesGenerator(
    price_train_X_data, 
    price_train_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 32,
    shuffle = True)

val_generator = TimeseriesGenerator(
    price_val_X_data, 
    price_val_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 32)

rmodelA = Sequential()
rmodelA.add(SimpleRNN(20))
rmodelA.add(Dense(24, activation='linear'))

rmodelA.compile(loss='mae', optimizer='Adam')
rmodelA_history = rmodelA.fit(generator, validation_data=val_generator, epochs=5000, verbose=0)

plt.figure(figsize=(15,8))
plt.plot(rmodelA_history.history['loss'])
plt.plot(rmodelA_history.history['val_loss'])

In [0]:
rmodelA_error = error_analysis(lambda X: rmodelA.predict(X))
rmodelA_error.total_smape(val_generator)
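
The `error_analysis` class is part of the repository, so the exact definition behind `total_smape` is not visible here. For reference, one common form of the symmetric mean absolute percentage error can be sketched as follows; the function name `smape` and the toy numbers are assumptions, and other variants of the formula exist.

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent: mean of 2|y - yhat| / (|y| + |yhat|)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(2.0 * np.abs(y_true - y_pred)
                           / (np.abs(y_true) + np.abs(y_pred)))

# toy values in original price units, purely illustrative
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
score = smape(actual, predicted)
```

Because the denominator shrinks when values are close to zero, this kind of metric tends to be more meaningful on prices in their original units than on centred, rescaled values.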

## Gated architectures: GRU and LSTM

### GRU

In [0]:
window_size = 7

generator = TimeseriesGenerator(
    price_train_X_data, 
    price_train_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 16)

val_generator = TimeseriesGenerator(
    price_val_X_data, 
    price_val_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 16)

rmodelB = Sequential()
rmodelB.add(GRU(100, dropout=0.3, return_sequences=True))
rmodelB.add(BatchNormalization())
rmodelB.add(GRU(100, dropout=0.3, return_sequences=True))
rmodelB.add(BatchNormalization())
rmodelB.add(GRU(50, dropout=0.2, return_sequences=True))
rmodelB.add(BatchNormalization())
rmodelB.add(GRU(24))

rmodelB.compile(loss='mae', optimizer='Adam')

rmodelB_history = rmodelB.fit(generator, validation_data=val_generator, epochs=100, verbose=0)

plt.figure(figsize=(15,8))
plt.plot(rmodelB_history.history['loss'])
plt.plot(rmodelB_history.history['val_loss'])

In [0]:
rmodelB_error = error_analysis(lambda X: rmodelB.predict(X))
rmodelB_error.total_smape(val_generator)

### LSTM

In [0]:
# generate windowed datasets
window_size = 3

generator = TimeseriesGenerator(
    price_train_X_data, 
    price_train_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 4)

val_generator = TimeseriesGenerator(
    price_val_X_data, 
    price_val_y_data, 
    length = window_size, 
    sampling_rate = 1,
    batch_size = 4)

rmodelC = Sequential()
# rmodelC.add(LSTM(200, dropout=0.1, return_sequences=True))
# rmodelC.add(BatchNormalization())
rmodelC.add(LSTM(50, dropout=0.1, return_sequences=True))
rmodelC.add(BatchNormalization())
rmodelC.add(LSTM(50, dropout=0.1, return_sequences=True))
rmodelC.add(BatchNormalization())
rmodelC.add(LSTM(24))

rmodelC.compile(loss='mae', optimizer='Adam')

rmodelC_history = rmodelC.fit(generator, validation_data=val_generator, epochs=100, verbose=0)

plt.figure(figsize=(15,8))
plt.plot(rmodelC_history.history['loss'])
plt.plot(rmodelC_history.history['val_loss'])

In [0]:
rmodelC_error = error_analysis(lambda X: rmodelC.predict(X))
rmodelC_error.total_smape(val_generator)

## Convolutional LSTM

# Inception Time

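InceptionNet is one of the SOTA models in image classification. It builds a very wide CNN out of modules that apply convolutions of several kernel sizes in parallel, using 1×1 convolutions as bottlenecks to keep the computation manageable; InceptionTime carries the same idea over to time series with 1D convolutions. The ...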

# LSTM-MS Net

The SOTA approach for multi-seasonal sequences is reported to be the LSTM-MSNet model [BBH19]. The idea is quite simple: deseasonalise the time series, feed the remainder into a deep neural network, and then reseasonalise the forecast.
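
The three steps can be sketched on a toy series as follows. This is not the LSTM-MSNet implementation: it handles a single additive seasonal pattern, estimates it with simple per-position means, and uses a plain average as a stand-in for the neural network. All function names are hypothetical.

```python
import numpy as np

def seasonal_profile(series, period):
    """Additive seasonal component: mean at each cycle position, centred on zero."""
    profile = np.array([series[k::period].mean() for k in range(period)])
    return profile - profile.mean()

def deseasonalise(series, profile):
    """Subtract the seasonal component from every point of the series."""
    positions = np.arange(len(series)) % len(profile)
    return series - profile[positions]

def reseasonalise(values, profile, start):
    """Add the seasonal component back, for steps start, start+1, ..."""
    positions = (start + np.arange(len(values))) % len(profile)
    return values + profile[positions]

# toy series: constant level plus a period-4 pattern, purely illustrative
t = np.arange(40)
pattern = np.array([2.0, -1.0, 0.5, -1.5])
series = 10.0 + pattern[t % 4]

# step 1: remove the seasonality
profile = seasonal_profile(series, period=4)
residual = deseasonalise(series, profile)

# step 2: stand-in for the neural network, here just the mean of recent values
horizon = 4
residual_forecast = np.full(horizon, residual[-8:].mean())

# step 3: put the seasonality back onto the forecast
forecast = reseasonalise(residual_forecast, profile, start=len(series))
```

With several seasonalities (for example daily and weekly), the same subtract-then-add-back pattern would be applied with one component per period.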

# Bibliography

**[CHN11]** Sven F. Crone, Michèle Hibon, Konstantinos Nikolopoulos, *Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction*, International Journal of Forecasting, Volume 27, Issue 3, July–September 2011, Pages 635-660.

**[JFB17]** J. Lago, F. De Ridder, B. De Schutter, *Forecasting spot electricity prices: Deep learning approaches and empirical comparison of traditional algorithms*, https://doi.org/10.1016/j.apenergy.2018.02.069 (2017).

**[BBH19]** K. Bandara, C. Bergmeir, H. Hewamalage, *LSTM-MSNet: Leveraging Forecasts on Sets of Related Time Series with Multiple Seasonal Patterns*, arXiv:1909.04293, Sep. (2019). [https://arxiv.org/pdf/1909.04293.pdf]

Supplements:  
[The Uber approach and M4 winner: ES-RNN]()  

Next part: Analysis and discussion `WIP`