# First approach to a univariate Long Short-Term Memory model for predicting the solar output  #

For our optimisation we need solar output predictions. In this notebook we use a univariate Long Short-Term Memory (LSTM) model to predict the solar output.


 https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM


First import all the dependencies:

In [None]:
import pandas as pd
import numpy as np
import os

import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
import seaborn as sns

from itertools import permutations

from sklearn.metrics import mean_squared_error
from math import sqrt
from statsmodels.tsa.stattools import adfuller,kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA  # statsmodels.tsa.arima_model was removed in statsmodels 0.13
from statsmodels.graphics.tsaplots import plot_pacf

import statsmodels.graphics.tsaplots as tsaplot
from statsmodels.tsa.holtwinters import Holt, ExponentialSmoothing, SimpleExpSmoothing

import tensorflow as tf

from tensorflow import keras

from tensorflow.keras import layers
from tensorflow.keras import regularizers
from tensorflow.keras.utils import plot_model

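# use the Keras bundled with TensorFlow throughout (a separate `import keras` would rebind the name `keras`)
from tensorflow.keras.models import Sequential  # initialize the ANN
from tensorflow.keras.layers import Dense, Activation, Dropout, LSTM, Conv1D, MaxPooling1D  # create layers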


from sklearn.model_selection import train_test_split
 
np.random.seed(42)
tf.random.set_seed(42)


We start by loading the pickle file with our full dataset into this notebook.



In [None]:
df = pd.read_pickle("../data/final_dataframe.pkl")

The column names are hard to read and awkward to work with, so we rename them.

In [None]:
def col_names(df):
    ''' This function renames the columns to make them easier to read.
      Setting the date as index is currently disabled (see the commented-out line). '''
    column_names = {'Photovoltaics [MWh] Original resolutions': 'Solar_generation_MWh',
                'Photovoltaics [MW] Calculated resolutions': 'Solar_installed_MW',
                'Total (grid load) [MWh] Original resolutions': 'Total_consumption_MWh',
                'Germany/Luxembourg [€/MWh] Calculated resolutions': 'DE_LU_price_per_MWh',}
    df.rename(columns=column_names, inplace=True)
    #df.set_index('Date', inplace=True)
    return df

col_names(df)


Now we can split the data into train and test sets. Note that shuffling must be disabled; otherwise the split is not appropriate for time series analysis. I will use the approach previously defined in the 20230704_train_val_test_split notebook.

In [None]:
# The zeros cause a huge problem; one idea (currently disabled) is to add 1 to all data points
#df['Solar_generation_MWh_normalized'] = df['Solar_generation_MWh_normalized'] + 1

In [None]:
target = ['Date', 'Solar_generation_MWh_normalized']

In [None]:
def test_train_timeseries(df):
    ''' In the first part we select the train and test data.
    In the second part we select the columns we want to use for our predictions '''
    
    test = df[df.Date >= '2022-06-01']
    train = df[df.Date < '2022-06-01']

    # now we select the columns we want to use for our predictions
    # (.copy() so that the later scaling does not write into a view of df)

    test = test[target].copy()
    train = train[target].copy()
    return test, train

test, train = test_train_timeseries(df)

In [None]:
#Alternatively we could also use 
#train, test = train_test_split(df[['Solar_generation_MWh_normalized']], test_size=.25, shuffle=False)
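
A minimal sketch of why the split must stay chronological, using a toy array in place of the real data. The array and the cut-off position are illustrative assumptions, not values from the dataset:

```python
import numpy as np

# Toy series standing in for the solar data: 10 ordered observations.
series = np.arange(10)

# Chronological split: everything before the cut-off is training data.
cutoff = 7  # hypothetical cut-off position
train_part, test_part = series[:cutoff], series[cutoff:]

# Shuffled split for comparison: later values can end up in the training set.
rng = np.random.default_rng(42)
shuffled = rng.permutation(series)
leaky_train, leaky_test = shuffled[:cutoff], shuffled[cutoff:]

print("chronological:", train_part, test_part)
print("shuffled:     ", leaky_train, leaky_test)
```

In the chronological split every training value precedes every test value, which is the property a forecasting evaluation relies on.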

In [None]:
test

In [None]:
train

It worked nicely.

In [None]:
#Let's scale the data
from sklearn.preprocessing import StandardScaler, MinMaxScaler

In [None]:
# Scale the data for the model  #! TODO: test MinMaxScaler

scalable = ['Solar_generation_MWh_normalized']

scaler = StandardScaler()
train[scalable] = scaler.fit_transform(train[scalable])
test[scalable] = scaler.transform(test[scalable])
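
The scaler is fitted on the training data only and then applied to the test data, so no statistics from the test period enter the scaling. A rough numpy sketch of what this amounts to, assuming the default StandardScaler behaviour (mean and population standard deviation); the toy values are made up:

```python
import numpy as np

train_values = np.array([0.0, 2.0, 4.0, 6.0])  # made-up training values
test_values = np.array([8.0, 10.0])            # made-up test values

# "fit": the statistics come from the training data only
mean = train_values.mean()
std = train_values.std()  # ddof=0, i.e. the population standard deviation

# "transform": the same statistics are applied to both sets
train_scaled = (train_values - mean) / std
test_scaled = (test_values - mean) / std

# "inverse_transform": undo the scaling to get back the original units
test_restored = test_scaled * std + mean

print(train_scaled.mean(), train_scaled.std())
print(test_restored)
```

The scaled test values are not centred on zero themselves, which is expected: they are expressed relative to the training distribution.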

In [None]:
test

Old version (kept for reference):
 Split a univariate sequence into smaller samples to feed into the LSTM
def split_sequence(input, n_steps, pred_size):
    x, y = list(), list()
    for i in range(len(input)):
        end_ix = i + n_steps # find the end of this pattern
        if end_ix+pred_size > len(input)-1: # check if we are beyond the sequence
            break
        seq_x, seq_y = input[i:end_ix], input[end_ix: end_ix+pred_size]# gather input and output parts of the pattern
        x.append(seq_x)
        y.append(seq_y)
    return np.array(x), np.squeeze(np.array(y))

In [None]:
# split a univariate sequence into smaller samples to feed into the LSTM
def split_sequence(input, n_steps, pred_size, target = []):
    ''' This function splits our time series into supervised time series snippets. 
    input = dataframe to be split (must contain a 'Date' column)
    n_steps = length of the X_variable 
    pred_size = length of the y_variable
    target = list of target columns to be split
    At the same time we collect the corresponding timestamps in two additional arrays '''
    input_array = input[target]
    date_array = input['Date']

    x_index, y_index = list(), list()
    x, y = list(), list()
    for i in range(len(input_array)):
        end_ix = i + n_steps # find the end of this pattern
        if end_ix+pred_size > len(input)-1: # check if we are beyond the sequence
            break
        seq_x, seq_y = input_array[i:end_ix], input_array[end_ix: end_ix+pred_size]# gather input and output parts of the pattern
        ind_x, ind_y = date_array[i:end_ix], date_array[end_ix: end_ix+pred_size]# gather input and output Dates of the pattern
        x.append(seq_x)
        y.append(seq_y)
        x_index.append(ind_x)
        y_index.append(ind_y)

    
    return np.array(x), np.squeeze(np.array(y)), np.array(x_index), np.squeeze(np.array(y_index)) 
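
To make the windowing concrete, here is a small sketch on a toy sequence. `make_windows` is a hypothetical, simplified helper (no dates, plain numpy) and not the function used in this notebook; unlike `split_sequence` above, it also keeps the last complete window:

```python
import numpy as np

def make_windows(values, n_steps, pred_size):
    """Slide over `values` and return input windows and the targets that follow them."""
    x, y = [], []
    for start in range(len(values) - n_steps - pred_size + 1):
        end = start + n_steps
        x.append(values[start:end])           # n_steps past values
        y.append(values[end:end + pred_size])  # the next pred_size values
    return np.array(x), np.array(y)

values = np.arange(10)
X_toy, y_toy = make_windows(values, n_steps=4, pred_size=2)

print(X_toy.shape, y_toy.shape)  # (5, 4) (5, 2)
print(X_toy[0], y_toy[0])        # [0 1 2 3] [4 5]
```

With `n_steps = 672` and `pred_size = 96` the same pattern yields one week of quarter-hourly inputs followed by one day of targets per sample.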

In [None]:
# define input sequence
input = train
# choose a number of time steps
n_steps = 672
# prediction size 
pred_size= 96

target = ['Solar_generation_MWh_normalized']

# split into samples
X, y, X_train_index, Y_train_index = split_sequence(input, n_steps, pred_size, target)
# summarize the data
print(len(X), len(y))


In [None]:
print(X.shape, y.shape)

print(X_train_index.shape, Y_train_index.shape)

In [None]:
X_test, y_test, X_test_index, Y_test_index = split_sequence(test , n_steps, pred_size, target)

In [None]:
print(X_test.shape, y_test.shape)

print(X_test_index.shape, Y_test_index.shape)

In [None]:
# Now we define the validation set for our model. #! The manual split (old approach below) was not so useful, so train_test_split is used instead. Note: with shuffle=False the last 20 % of the windows form the validation set, i.e. the latest data is not used for training.
def val_set(X,y):
    X, X_val, y, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)
    return X, X_val, y, y_val
    #! old approach
    #train_size = round(len(X) * 0.8)
    #X = X[:train_size, :]
    #X_val = X[train_size:, :]
    #y = y[:train_size, :]
    #y_val = y[train_size:, :]
    
X, X_val, y, y_val = val_set(X, y)

In [None]:
y_val.shape

In [None]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
 #! correction still necessary

def reshape_for_LSTM(X, y, features):
    ''' X becomes [samples, timesteps, features], y stays [samples, pred_size] '''
    X = X.reshape((X.shape[0], X.shape[1], features))
    y = y.reshape((y.shape[0], y.shape[1]))
    return X, y
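
A tiny sketch of the reshape on a toy array (the sizes are arbitrary): adding a trailing feature axis changes only the shape, not the values.

```python
import numpy as np

X_2d = np.arange(12).reshape(3, 4)                       # 3 samples, 4 time steps
X_3d = X_2d.reshape((X_2d.shape[0], X_2d.shape[1], 1))   # add a feature axis of size 1

print(X_2d.shape, "->", X_3d.shape)
```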

In [None]:
X, y = reshape_for_LSTM(X, y, 1)

In [None]:
X_val, y_val = reshape_for_LSTM(X_val, y_val, 1)

In [None]:
X_test, y_test = reshape_for_LSTM(X_test, y_test, 1)

In [None]:
X_val.shape

In [None]:
X.shape

## Let's start modeling with the Long Short-Term Memory model ##



In [None]:
# Define dictionary to store results
history = {}

# Define number of epochs and learning rate decay
N_TRAIN = len(X)
EPOCHS = 50
BATCH_SIZE = 2600 # total sample size = 97593 each batch 2600 samples (49 batches ) #! has to be adjusted further to improve
STEPS_PER_EPOCH = N_TRAIN // BATCH_SIZE
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay( 
    0.001,  #! please adjust and finetune ? should be fine like this 
    decay_steps=STEPS_PER_EPOCH*1000,
    decay_rate=1,
    staircase=False)


# Define optimizer used for modelling
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=lr_schedule, name='Adam')  # due to a warning message I used the legacy.Adam 
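
To get a feeling for the schedule: with `staircase=False`, `InverseTimeDecay` computes `initial_learning_rate / (1 + decay_rate * step / decay_steps)`. The sketch below evaluates that formula in plain Python. The value of 30 steps per epoch is an assumption for illustration only; the real value depends on `N_TRAIN` and `BATCH_SIZE`:

```python
def inverse_time_decay(step, initial_lr=0.001, decay_steps=30_000, decay_rate=1.0):
    """Learning rate after `step` optimizer steps (continuous, non-staircase form)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

steps_per_epoch = 30  # hypothetical value, so decay_steps = 30 * 1000
epochs = 50

lr_start = inverse_time_decay(0)
lr_end = inverse_time_decay(steps_per_epoch * epochs)
lr_half = inverse_time_decay(30_000)

print(lr_start, lr_end, lr_half)
```

Under these assumptions the learning rate falls by only about 5 % over 50 epochs and is halved only after `decay_steps` steps, so the schedule is close to a constant learning rate for this training length.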

In [None]:
# Define path where checkpoints should be stored
checkpoint_path = "modeling/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=0) # Set verbose != 0 if you want output during training 

#create a callback to stop early once there is no improvement in the loss
cp_early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5, mode='min',
                                restore_best_weights=True,
                                verbose = True)
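
The early-stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration of the patience logic, not the Keras implementation, and the loss values are made up:

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.6, 3.8, 3.9, 3.4]  # made-up validation losses
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=5)
print(stop_epoch, best_epoch)
```

Here training stops after five epochs without improvement, so the later, better loss is never reached; with `restore_best_weights=True` the weights from the best epoch seen so far are kept.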

Note: how many output layers are needed for predicting several timestamps? One output layer is enough, but some of its parameters have to be adjusted.

The input shape of the first layer is (n_steps, n_features), i.e. (X.shape[1], X.shape[2]).

Reason for not having activation functions between the LSTM layers: https://datascience.stackexchange.com/questions/66594/activation-function-between-lstm-layers
https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTMCell

output layer structure : https://stackoverflow.com/questions/46797891/output-shape-of-lstm-model#46799544

https://shiva-verma.medium.com/understanding-input-and-output-shape-in-lstm-keras-c501ee95c65e

In [None]:
#from keras.layers import Dense, Activation, Dropout, LSTM , Conv1D, MaxPooling1D, LeakyReLU

In [None]:
def get_simple_LSTM_model():
    simple_LSTM = tf.keras.Sequential([
      tf.keras.layers.Conv1D(32, kernel_size = 5, activation ='relu', input_shape =(X.shape[1], X.shape[2])),
      tf.keras.layers.MaxPooling1D(pool_size = 2),
      tf.keras.layers.LSTM(45, kernel_initializer = 'uniform', return_sequences=True), # ! units are not set in stone yet 
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.LSTM(32, return_sequences=True),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.LSTM(32, return_sequences=False),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(y.shape[1] ,kernel_initializer = 'uniform', activation='linear') #96 to predict a day 
    ])

    simple_LSTM.compile(optimizer=optimizer,
                  loss=tf.keras.losses.MeanAbsolutePercentageError(), 
                  metrics=[tf.keras.losses.MeanAbsolutePercentageError()])
    return simple_LSTM

In [None]:
with tf.device('/cpu:0'):
    simple_LSTM = get_simple_LSTM_model()
    print(simple_LSTM.summary())
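
The output shapes in the summary can be checked by hand. The sketch below walks through the layers in plain Python (batch dimension omitted), assuming the Keras defaults of 'valid' padding, a convolution stride of 1 and a pooling stride equal to the pool size; the helper names are hypothetical:

```python
def conv1d_len(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1

def maxpool1d_len(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride = pool_size."""
    return (length - pool_size) // pool_size + 1

n_steps, n_features, pred_size = 672, 1, 96

shapes = [("input", (n_steps, n_features))]
steps = conv1d_len(n_steps, kernel_size=5)
shapes.append(("Conv1D(32, 5)", (steps, 32)))
steps = maxpool1d_len(steps, pool_size=2)
shapes.append(("MaxPooling1D(2)", (steps, 32)))
shapes.append(("LSTM(45, return_sequences=True)", (steps, 45)))
shapes.append(("LSTM(32, return_sequences=True)", (steps, 32)))
shapes.append(("LSTM(32, return_sequences=False)", (32,)))  # only the last step is returned
shapes.append(("Dense(96)", (pred_size,)))

for name, shape in shapes:
    print(f"{name:35s} {shape}")
```

The last LSTM returns a single vector per sample, which is why one Dense layer with 96 units is enough to emit all 96 predicted time steps at once.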

In [None]:
with tf.device('/cpu:0'):
    history = simple_LSTM.fit(X,
                        y,
                        batch_size= BATCH_SIZE,
                        validation_data= (X_val, y_val),   ##### probably best to make validation data D #! TO DO 
                        verbose=10,
                        steps_per_epoch=STEPS_PER_EPOCH,
                        epochs=EPOCHS,
                        shuffle = False, 
                        callbacks=[cp_callback, cp_early_stop]) # try without cp_early_stop to check whether something is wrong #! patience=5 helps to avoid getting stuck in local minima

In [None]:
#with tf.device('/cpu:0'):
#    simple_LSTM_reloaded = tf.keras.models.load_model('saved_model/simple_LSTM')
STOP
# Check its architecture
#simple_LSTM_reloaded.summary()

In [None]:
history.history

In [None]:
scores = simple_LSTM.evaluate(X, y)

In [None]:
print("\n%s: %.2f%%" % (simple_LSTM.metrics_names[1], scores[1]))

In [None]:
x_input = X_test
#x_input = x_input.reshape((x_input.shape[0], x_input.shape[1], 1))
y_pred = simple_LSTM.predict(x_input, verbose=0)
print(y_pred)

In [None]:
print("Evaluate on test data")
results = simple_LSTM.evaluate(X_test, y_test, batch_size=2600)
print("test loss, test MAPE:", results)

In [None]:
#! simple_LSTM.save('../models/saved_model/simple_LSTM_70error')

Let's now make a new time series from our predicted values so we can plot them nicely.

In [None]:
# First we get the timestamps from our original test dataset, which we split into features and target.
# We used the first n_steps (672) entries to predict the next 96, so the first predicted value belongs to index 672.
# split_sequence yields fewer windows than there are remaining rows, so we keep only as many rows as we have predictions.
df_trial = test.iloc[n_steps:n_steps + len(y_pred)].copy() 

In [None]:
# Then we make a new column containing the first predicted step of each window
df_trial['predicted_values'] = y_pred[:, 0]

In [None]:
# the scaler expects a 2D input, so pass a one-column dataframe and flatten the result
df_trial['Solar_generation_MWh_normalized_inverse_transformed'] = scaler.inverse_transform(df_trial[['Solar_generation_MWh_normalized']]).ravel()

In [None]:
y_pred[:1]

The results are still not very good. There seem to be problems with handling the zeros and also with the seasonality. More work is needed. 
    Option 1: I will remove the seasonality from the data before it goes into the model 
    Option 2: I will include the price and weather data to make the model more robust. 
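
One likely reason for the trouble with zeros is the loss itself: the mean absolute percentage error divides by the true value, so targets at or near zero dominate the average. A small numpy sketch; the `eps` guard is an assumption meant to resemble the way Keras avoids a division by zero, and all numbers are made up:

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-7):
    """MAPE in percent; eps only prevents a division by exactly zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

daytime = mape([100.0, 200.0], [110.0, 180.0])  # both predictions are 10 % off
night = mape([0.0, 200.0], [1.0, 200.0])        # one true value is zero, error of just 1 unit

print(daytime)
print(night)
```

A tiny absolute error on a zero target produces an enormous percentage error. The same can happen after standardisation, where values close to the training mean are mapped close to zero, so a loss such as MAE or MSE may be worth testing.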

### Let's now create a proper output table which can then be used for the optimisation ###
Now we will transform our output back into the original units and add it to a dataframe. 

In [None]:
def reverse_and_frame(X, y, X_test_index):
    ''' The inputs are the X_test array, the predicted y array and the date index of the original X_test array.

     1st. Inverse transform the arrays to the original units needed by the optimizer 
     2nd. Create a dataframe for both X_test and y_pred 
     3rd. Merge all the columns representing the timesteps into a single array in one column 
     4th. Create an empty dataframe and add the last date of X_test_index as the timepoint from which the prediction started
     5th. Concatenate the two dataframes into one 
    '''
    # the scaler was fitted on a single column, so flatten to one column and restore the shape afterwards
    inversed_y_pred = scaler.inverse_transform(y.reshape(-1, 1)).reshape(y.shape)
    inversed_X_test = scaler.inverse_transform(X.reshape(-1, 1)).reshape(X.shape[0], X.shape[1])

    X_test = pd.DataFrame(inversed_X_test)
    y_pred = pd.DataFrame(inversed_y_pred)

    X_test['input_array'] = X_test.apply(lambda row: np.array(row), axis=1)
    y_pred['output_array'] = y_pred.apply(lambda row: np.array(row), axis=1)

    df_final = pd.DataFrame()
    df_final["Date"]= X_test_index[:, -1]

    df_final['output'] = y_pred['output_array']
    df_final['input'] = X_test['input_array']
    return df_final

In [None]:
solar_predictions= reverse_and_frame(X_test, y_pred, X_test_index)
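
The structure of the resulting table can be illustrated with toy numbers. The scaler statistics, dates and predictions below are made up, and the inverse scaling is written out by hand instead of using the fitted scaler:

```python
import numpy as np
import pandas as pd

mean, std = 3.0, 2.0  # hypothetical scaler statistics
y_scaled = np.array([[0.0, 1.0, -1.0],
                     [0.5, 0.5, 0.5]])  # 2 windows x 3 predicted steps
dates = pd.to_datetime(["2022-06-08 00:00", "2022-06-08 00:15"])  # made-up prediction start times

# the same scaling formula applies element-wise to every step of the horizon
y_original = y_scaled * std + mean

# one row per window: the start date and the whole predicted horizon as one array
table = pd.DataFrame({"Date": dates, "output": list(y_original)})
print(table)
```

Each row holds one forecast origin together with its complete predicted horizon, which is the layout of the `output` column written to the pickle file.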

### Include the date and time column in the output dataframe ###

In [None]:
solar_predictions

In [None]:
df_solar_predictions = solar_predictions[['Date', 'output']].copy()

In [None]:
df_solar_predictions.to_pickle("../predictions/solar_predictions.pkl")

In [None]:
# Save the entire small model as a SavedModel.
#!mkdir -p ../models/saved_model
#simple_LSTM.save('../models/saved_model/simple_LSTM_2')

### Short summary ###
The model is performing poorly: the overall MAPE is still around 65 %, which is not helpful for our approach. Based on the output, it is not able to handle the zeros very well. I will now look a bit deeper into the zero problem. 

### Alternative approach: get rid of the seasonality beforehand ###

TO DO
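
One possible way to do this, sketched here on a toy series, is seasonal differencing: subtract the value one season earlier and add it back after predicting. This is only one of several options (decomposition would be another) and has not been tested on the solar data. The toy season length is 4; for the quarter-hourly data a daily season would be 96 steps:

```python
import numpy as np

period = 4  # toy season length (a day of quarter-hourly data would be 96)
t = np.arange(16)
series = 10 + np.tile([0.0, 5.0, 8.0, 5.0], 4) + 0.5 * t  # repeating pattern plus a trend

# seasonal differencing: subtract the value one period earlier
diffed = series[period:] - series[:-period]

# inversion: start from the known first period and add the differences back
restored = np.empty_like(series)
restored[:period] = series[:period]
for i in range(period, len(series)):
    restored[i] = restored[i - period] + diffed[i - period]

print(diffed)
```

In this toy case the differenced series is constant because the repeating pattern cancels and only the trend remains; a model would then be trained on `diffed` and its predictions inverted as shown.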