<a href="https://www.kaggle.com/code/gianpieroandrenacci/energy-prediction-with-lstm-deep-learning?scriptVersionId=154877806" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Introduction - 🕥 LSTM Timeseries forecasting with Tensorflow and Keras

## <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">LSTM (Long Short-Term Memory) for timeseries</h1> 



In this notebook, we will be exploring LSTM (Long Short-Term Memory) for timeseries forecasting with the help of two popular deep learning libraries: Tensorflow and Keras. 

**LSTM is a type of Recurrent Neural Network (RNN) that is well-suited for timeseries** data analysis and prediction tasks. We will learn how to implement LSTM models using Keras and Tensorflow, and use them to forecast future values of a timeseries.

## <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">The Timeseries Dataset - Italian Energy Price</h1> 


<div class="alert alert-block alert alert-info" style="font-size:14px; font-family:verdana; line-height: 1.7em; ">
    📌 &nbsp; The PUN (Italian acronym for Prezzo Unico Nazionale, "National Single Price") is the wholesale reference price of electricity purchased on the Borsa Elettrica Italiana market (IPEX - Italian Power Exchange).
</div>



The Italian Power Exchange, active since 2007 following the entry into force of the Legislative Decree governing the liberalization of the electricity market, regulates the transactions between producers and suppliers of electricity. The PUN therefore represents the national weighted average of the zonal sales prices of electricity for each hour and for each day. It takes into account the quantities and prices formed in the different areas of Italy and at different times of the day.
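
As a rough illustration of the weighted average described above, the sketch below computes a volume-weighted price for a single hour. The zone labels, prices, and volumes are invented for illustration and are not taken from the dataset; the actual PUN follows the market operator's own rules.

```python
# Hypothetical zonal prices (€/MWh) and purchased volumes (MWh) for one hour.
zonal_prices = {"NORD": 110.0, "CSUD": 120.0, "SUD": 100.0}
zonal_volumes = {"NORD": 20_000.0, "CSUD": 5_000.0, "SUD": 5_000.0}

# Weight each zonal price by the volume purchased in that zone.
total_volume = sum(zonal_volumes.values())
weighted_price = sum(zonal_prices[z] * zonal_volumes[z] for z in zonal_prices) / total_volume

print(f"Volume-weighted price: {weighted_price:.2f} €/MWh")
```

Zones with larger volumes pull the national figure towards their own price.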


**How the PUN affects the price of energy**

The wholesale price of electricity is established directly on the market through trades between producers and energy suppliers (who purchase energy from producers to supply their end customers). Fluctuations of the PUN are a determining factor in the final energy costs on the bill: when the PUN rises, costs tend to rise, and when it falls, costs tend to fall.

Energy suppliers generally offer final consumers tariffs in which the energy component has either a fixed price or an indexed price. With an indexed price, the energy component varies over time with the performance of the PUN on the Italian Power Exchange. With a fixed price, it remains unchanged for a period that depends on the offer chosen, generally one or two years.


The PUN is measured in €/MWh.

Source: https://www.enel.it/en/supporto/faq/cos-e-il-pun

## <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Dataset Details</h1> 

**PUN** is the National Single Price of energy. It is the wholesale reference price of electricity purchased on the Borsa Elettrica Italiana market.

**Foreign Virtual Zone**: point of interconnection with neighboring countries. It includes: France (FRAN), Switzerland (SVIZ), Austria (AUST), Slovenia (SLOV), Slovenia Coupling representing the interconnection dedicated to Market Coupling between Italy and Slovenia (BSP); Corsica (CORS), Corsica AC (COAC), Greece (GREC), France coupling (XFRA), Austria coupling (XAUS), Malta (MALT), Montenegro (MONT) and Italy coupling (COUP).

# <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Import libraries</h1> 

In [None]:
#pip install numpy==1.19

In [None]:
# standard library
import os
import math
import calendar
import datetime

# data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import plotly.express as px

# scikit-learn
import sklearn
from sklearn import metrics
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# deep learning
import tensorflow as tf
from tensorflow import keras
from packaging import version

# Matplotlib 3.6 renamed the bundled seaborn styles to "seaborn-v0_8-*"; use whichever name is available
style_name = "seaborn-v0_8-whitegrid" if "seaborn-v0_8-whitegrid" in plt.style.available else "seaborn-whitegrid"
plt.style.use(style_name)
plt.rc("figure", autolayout=True)
plt.rc("axes", labelweight="bold", labelsize="large", titleweight="bold", titlesize=14, titlepad=10)

In [None]:
version.parse(tf.__version__)

# <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Custom Functions</h1> 

In [None]:
def num_to_time(num):
    """ 
    Convert a numeric hour to a time string in HH:MM:SS format.
    num: hour as a number (24 is treated as midnight, 00:00:00)
    """
    time = num
    hours = int(time)
    # hour
    if hours == 24:
        hours = 0
    # minutes and sec
    minutes = 0
    seconds = 0
    out_time = "%02d:%02d:%02d" % (hours, minutes, seconds)
    
    return out_time

def plot_two_lines(last_period, X1,y1,X2,y2,data1_label,data2_label, ylabel, title,
                   legend_pos = 'lower right',color1="#1f77b4",
                   color2="#ff7f0e",linestyle1='solid', linestyle2='dashed'):
    """
    Plot two data series (e.g. actual and predicted values) on a single plot.

    Keyword arguments:
    - last_period (int): Number of most recent points to plot; 0 plots all the data.
    - X1 (array): The X values for the first set of data.
    - y1 (array): The Y values for the first set of data.
    - X2 (array): The X values for the second set of data.
    - y2 (array): The Y values for the second set of data.
    - data1_label (str): The label for the first set of data.
    - data2_label (str): The label for the second set of data.
    - ylabel (str): The label for the Y axis.
    - title (str): The title of the plot.
    - legend_pos (str): The legend position.
    """
    #if we want only a slice of the data
    if last_period != 0:
        X1 = X1[-last_period:] 
        y1 = y1.iloc[-last_period:]
        X2 = X2[-last_period:] 
        y2 = y2.iloc[-last_period:]

    # Create the plot
    plt.figure(figsize=(10, 7))
    plt.plot(X1, y1, color=color1, label=data1_label,linestyle =linestyle1 )
    plt.plot(X2, y2, color=color2, label=data2_label, linestyle = linestyle2)
    plt.ylabel(ylabel, fontsize=14)
    plt.grid(axis='x')
    plt.legend(fontsize=14, loc=legend_pos)

    # Remove the top and right borders
    plt.gca().spines['top'].set_visible(False)
    plt.gca().spines['right'].set_visible(False)

    # Remove the bottom and left borders
    plt.gca().spines['bottom'].set_visible(False)
    plt.gca().spines['left'].set_visible(False)

    plt.title(title, fontsize=16)
    plt.show()  


def error_metrics(y_test, y_pred):
    """
    Calculate the most common forecast error metrics.
    y_test: observed values (y)
    y_pred: predicted values (yhat)
    """
    #R2 - coefficient of determination
    R2 = sklearn.metrics.r2_score(y_test, y_pred)

    #Mean squared error
    MSE = sklearn.metrics.mean_squared_error(y_test, y_pred)

    #Root Mean squared error
    RMSE = math.sqrt(MSE)

    #Mean absolute error
    MAE =  sklearn.metrics.mean_absolute_error(y_test, y_pred)

    #Median absolute error 
    MdAE = sklearn.metrics.median_absolute_error(y_test, y_pred)

    #Mean absolute percentage error
    MAPE = sklearn.metrics.mean_absolute_percentage_error(y_test, y_pred)

    if np.ndim(R2) > 0: # if the metrics aren't already scalars, reduce them to scalars by averaging
        R2 = tf.reduce_mean(R2)
        MSE  = tf.reduce_mean(MSE)
        RMSE = tf.reduce_mean(RMSE)
        MAE = tf.reduce_mean(MAE)
        MdAE = tf.reduce_mean(MdAE)
        MAPE = tf.reduce_mean(MAPE)

    error_dic =  {
          "R2": R2,
          "mse": MSE,
          "rmse": RMSE,
          "mae": MAE,
          "MdAE": MdAE,
          "MAPE": MAPE
          }

    
    print('R2: %.3f' % R2)
    print('MSE (Mean squared error): ' f"{MSE:,.0f}")
    print('RMSE (Root mean squared error): ' f"{RMSE:,.0f}")
    print('MAE (Mean absolute error): ' f"{MAE:,.0f}")
    print('MdAE (Median absolute error ): ' f"{MdAE:,.0f}")
    print('MAPE (Mean absolute percentage error): ' f"{MAPE:,.2%}")
   
    
    return error_dic    

# <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Load the dataset</h1> 

In this section, we will start our analysis by loading the dataset into our Python environment. For this purpose, we are using the widely popular data analysis library in Python, **pandas**. We load the "energy_pun_main_zones.csv" file which resides in our Kaggle input directory.

First, we'll use the pandas **read_csv()** function to import the .csv file. Notice that we are parsing the 'DATE' column as date type right at the time of loading. This is a handy trick to ensure our date data is in the right format from the outset, making subsequent time series analysis more convenient.

Once the data is loaded into a pandas dataframe (df), we proceed to drop the 'IDX' column. This step is based on the understanding that 'IDX' is an identifier column, not needed for our analysis.

Finally, **we check the completeness of the dataset by using the info() function**, which provides a concise summary of the dataframe including the number of non-null entries in each column. This initial check for missing values is an important step in data cleaning, as missing values can potentially affect our analysis results.
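
The loading steps described above can be tried on a tiny in-memory table first. This is only a sketch: the rows below are invented, and the column names (IDX, DATE, HOUR, PUN) are assumed to mirror the real file.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the CSV file; the rows are invented for illustration.
sample_csv = io.StringIO(
    "IDX,DATE,HOUR,PUN\n"
    "0,2021-01-01,1,50.9\n"
    "1,2021-01-01,2,48.2\n"
    "2,2021-01-01,3,\n"
)

# Parse the DATE column as datetime while reading.
sample = pd.read_csv(sample_csv, parse_dates=["DATE"])

# Drop the identifier column and lowercase the column names.
sample = sample.drop(columns="IDX")
sample.columns = sample.columns.str.lower()

print(sample.dtypes)
print(sample.isna().sum())  # number of missing values per column
```

The last line complements info(): it reports the missing values per column directly.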

Let's proceed with loading our dataset.

In [None]:
# Input data files are available in the read-only "../input/" directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
# read file and parse date
df = pd.read_csv("/kaggle/input/energy-pun-main-zones/energy_pun_main_zones.csv",parse_dates = ['DATE'])

In [None]:
df.head(5)

In [None]:
# drop id , we don't need it
df = df.drop(columns = 'IDX')

In [None]:
# make all columns lowercase
df.columns = df.columns.str.lower()

In [None]:
# check if there are nulls
df.info()

<p style="font-size:130%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Select only PUN</p>

In this section, we are narrowing our focus to only three columns from the entire dataset: **'date', 'hour', and 'pun'**. This is done to simplify our dataset and keep our analysis focused on the necessary information.

Upon further inspection of the 'hour' column, we noticed some inconsistencies: some rows had values greater than 24, which is not applicable to our **24-hour format**. These values come from **daylight saving time** changes (a day on which the clocks are set back has 25 hours). To keep our dataset consistent, we filter out these rows, ensuring the 'hour' column only contains values less than or equal to 24.

Next, we apply the custom function num_to_time on the 'hour' column, converting numerical representations into time format. We also take care of instances where the hour is '24', treating them as '0' (midnight) for consistency in our time series analysis.

The 'hour' column is then converted to an integer data type for ease of further processing and analysis.

Next, we create a new 'date_time' column by **concatenating 'date' and 'time'**. We ensure this column is of type datetime by using the pandas to_datetime() function.

The 'date' column is also converted into a datetime format if it wasn't already. This format is preferred as it enables more straightforward date-related operations.

Lastly, we set **'date_time' as the index** of our dataframe and sort our dataframe in ascending order. This allows for a more intuitive and efficient time series analysis going forward, as data indexed and sorted by time makes analyses and visualizations more manageable.
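
A compact sketch of the same idea on invented rows is shown below. Note that it builds the timestamp by adding the hour as a timedelta to the date, which is an alternative to the string concatenation used in this notebook, not the notebook's own method.

```python
import pandas as pd

# Invented rows: hour 25 mimics the extra hour on a clock-change day.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-31"] * 4),
    "hour": [1, 2, 24, 25],
    "pun": [60.0, 58.0, 70.0, 65.0],
})

toy = toy[toy["hour"] <= 24].copy()        # drop out-of-range hours
toy["hour"] = toy["hour"].replace(24, 0)   # treat hour 24 as midnight
toy["date_time"] = toy["date"] + pd.to_timedelta(toy["hour"], unit="h")

# Index by timestamp and sort chronologically.
toy = toy.set_index("date_time").sort_index()
print(toy)
```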

Let's proceed with these transformations to refine our dataset for further analysis.

In [None]:
df = df[['date','hour','pun']]

In [None]:
df.head(5)

In [None]:
# there are some rows with hour greater than 24
df[df['hour']  > 24]

In [None]:
# These rows come from daylight saving time changes. We can delete them
# out of range hour filter (copy to avoid pandas SettingWithCopyWarning on later assignments)

df = df[df['hour'] <= 24].copy()

In [None]:
# convert number to time format
df['time'] = df['hour'].apply(num_to_time)

# convert 24 to 0 (midnight)
df['hour'] = df['hour'].apply(lambda x: 0 if x == 24  else x)

# convert to int
df['hour'] = df['hour'].astype(int) 

# convert date and time to datetime
df['date_time'] = pd.to_datetime(df.date.astype(str) + ' ' + df.time.astype(str))

#convert date to date
df['date'] = pd.to_datetime(df['date'])

# set index and sort ascending
df = df.set_index('date_time')
df = df.sort_index(ascending = True)

In [None]:
df.head(2)

# <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Preprocess the data</h1>  

In [None]:
## Final Dataframe
pun = df[['pun']].copy()
pun

## <p style="font-size:130%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Baseline</p>

**We track data from the past 672 timestamps (4 weeks of hourly data). This data will be used to predict the energy price in the next hour.**


In this section, we will conduct a baseline evaluation of our data. Our primary objective here is to predict the energy price for the next hour using the past 672 timestamps, equivalent to 4 weeks of hourly data.

To begin, we split our data into a **training-validation set** and a test set, using a 95-5 split. We assign the majority of the data to the training-validation set to adequately train our models. The remaining 5% will serve as our test set for evaluating the model's performance on unseen data.

Our baseline model will use **a naive method of timeseries forecasting**, predicting the next hour's energy price as the energy price 24 hours prior. This method creates a lagged dataset where the current 'pun' value is predicted by the 'pun' value from the previous day at the same time.
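
To make the lagging idea concrete, here is a minimal sketch on a synthetic hourly series with a perfect daily cycle. The series is invented, so the error is zero by construction; on real prices the naive forecast will of course be less accurate.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering three days with an exact 24-hour cycle.
hours = np.arange(72)
idx = pd.Timestamp("2021-01-01") + pd.to_timedelta(hours, unit="h")
series = pd.Series(50 + 10 * np.sin(2 * np.pi * hours / 24), index=idx, name="pun")

# The forecast for time t is the value observed at t - 24 hours.
naive = series.shift(24).dropna()
actual = series.loc[naive.index]

print("Forecasts available:", len(naive))
print("MAE of the naive forecast:", (actual - naive).abs().mean())
```

The first 24 rows have no value from the previous day, which is why they are dropped before comparing.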


In [None]:
past = 672

In [None]:
# create test split

split_fraction_test = 0.95
train_val_split = int(split_fraction_test * int(pun.shape[0]))

train_data_val = pun.iloc[0 : train_val_split]
test_data = pun.iloc[train_val_split:]

len(train_data_val), len(test_data)

**We create a baseline for comparing our model. We assume that the PUN of the next hour will be the same as the value at the same time on the previous day.**

In [None]:
# This code creates a baseline model for timeseries forecasting.
y_base_pred = pd.DataFrame()
y_base_pred.index = test_data.index
shift = 24
# sets the value of the predicted pun timeseries to be the same as the pun timeseries 
# shifted by 24 time steps (i.e., one day, since the data is hourly).

y_base_pred['pun'] = test_data['pun'].shift(shift)

# removes the rows of y_base_pred that contain missing values,
# which are created because the first 24 time steps have no value 24 hours earlier

y_base_pred.dropna(inplace = True)

In [None]:
y_base_pred

In [None]:
# baseline vs true values

plot_two_lines(past, y_base_pred.index, test_data['pun'].iloc[shift:],y_base_pred.index, y_base_pred ,\
               ylabel="Pun",data1_label= 'Test Data', data2_label = 'Pred Data' ,
               title="Baseline Prediction VS Actual", legend_pos = 'upper left',
               color1 = '#001a4d', color2 = "#ffa83d",
               )

In [None]:
# calculate error metrics
error_dic = error_metrics(test_data['pun'].iloc[shift:], y_base_pred)

After generating the predictions, we compute several **error metrics** including R-squared (R2), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Median Absolute Error (MdAE), and Mean Absolute Percentage Error (MAPE). These metrics provide different perspectives on the model's prediction accuracy.
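
For readers who want to see what these metrics compute, the sketch below works them out by hand with NumPy on three invented values. It is an illustration only and is separate from the error_metrics function used in this notebook.

```python
import numpy as np

# Invented observed and predicted values.
y_true = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 180.0, 400.0])
err = y_true - y_hat                    # [-10, 20, 0]

mae = np.mean(np.abs(err))              # mean absolute error
mse = np.mean(err ** 2)                 # mean squared error
rmse = np.sqrt(mse)                     # root mean squared error
mdae = np.median(np.abs(err))           # median absolute error
mape = np.mean(np.abs(err) / np.abs(y_true))  # mean absolute percentage error (as a fraction)

print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} MdAE={mdae:.2f} MAPE={mape:.2%}")
```

MAE and MdAE are in the same unit as the target (€/MWh here), MSE is in squared units, and MAPE is relative to the observed value.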


These values provide a reference point, or a "**baseline**", against which we can measure the performance of more sophisticated forecasting models. Remember, the goal of introducing more complexity to our model is to reduce these error metrics further, indicating improved forecast accuracy.

Now that we've established our baseline, let's proceed to develop more advanced predictive models.

# <h1 style="background-color:#336600;font-family:cursive;font-size:200%;color:#fff;text-align:center;border-radius:40px;height:40px;line-height:40px;">Timeseries forecasting</h1> 

## <p style="font-size:100%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Prepare the dataset for training</p>

In this section, we're preparing our data for the **LSTM (Long Short-Term Memory)** model, a type of recurrent neural network that is commonly used for time series prediction tasks.

First, we further **split our training-validation** data into a separate training and validation set, using an 80-20 split. The training set will be used to train the model, while the validation set will be used to monitor its performance during training and to help detect overfitting.
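
The two chronological splits (95-5, then 80-20) can be sketched with plain index arithmetic. The row count below is a hypothetical placeholder, not the size of the real dataset.

```python
n_rows = 10_000  # hypothetical number of hourly observations

# First split: 95% train+validation, 5% test (kept in time order, no shuffling).
train_val_end = int(0.95 * n_rows)

# Second split: 80% of the train+validation part for training, 20% for validation.
train_end = int(0.8 * train_val_end)

sizes = {
    "train": train_end,
    "val": train_val_end - train_end,
    "test": n_rows - train_val_end,
}
print(sizes)
```

Because the rows are never shuffled, the validation set always lies after the training set in time, and the test set after both.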

Next, we normalize our data. Normalization is a common preprocessing step for deep learning models, especially for neural networks, as it scales all input features to the same range. This makes the model less sensitive to the scale of features and helps to speed up the training process.

We **normalize** the training, validation, and test datasets using the maximum and minimum values from the training dataset. It's important to note that we only use the maximum and minimum of the training set to avoid data leakage, which refers to the usage of information from the test set in the model training process, potentially leading to overoptimistic results.

By the end of this section, our data is adequately prepared and normalized for the LSTM model. This marks an important step in our time series analysis, as we are transitioning from simpler prediction techniques to more complex and powerful deep learning models. Let's proceed to train our LSTM model with this prepared data.

In [None]:
# split train and validation
split_fraction_train = 0.8
train_split = int(split_fraction_train * len(train_data_val))

train_data = train_data_val.iloc[0 : train_split]
val_data = train_data_val.iloc[train_split:]

len(train_data), len(val_data)

In [None]:
# visualize train, validation and test date indexes 

train_data.index, val_data.index, test_data.index

<p style="font-size:100%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Normalize</p>

This code performs min-max normalization on the training, validation, and test data before training the LSTM model. Min-max normalization maps the minimum of the training data to 0 and its maximum to 1, so all training values fall between 0 and 1. Because the validation and test sets are scaled with the training minimum and maximum, their values can fall outside this range.
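
A small numeric sketch of this scaling, using invented values, shows what happens when the statistics come from the training set only:

```python
import numpy as np

# Invented values: the test set contains prices outside the training range.
train = np.array([20.0, 40.0, 60.0])
test = np.array([10.0, 50.0, 80.0])

# Statistics come from the training set only.
lo, hi = train.min(), train.max()

train_norm = (train - lo) / (hi - lo)
test_norm = (test - lo) / (hi - lo)

print("train:", train_norm)
print("test: ", test_norm)
```

The test values 10 and 80 lie outside the training range, so their scaled values are below 0 and above 1 respectively.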

In [None]:
# Normalize the data using the training set minimum and maximum
train_max = train_data.max()
train_min = train_data.min()

train_data_norm = (train_data - train_min)  / (train_max - train_min)
val_data_norm = (val_data - train_min) / (train_max - train_min)
test_data_norm = (test_data - train_min) / (train_max - train_min)

In [None]:
train_data_norm.head(3)

<p style="font-size:100%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Build the train, validation and test set with Keras</p>

In this section, we are preparing our train and validation datasets to be compatible with the Keras API for model training. We're employing a method of data preparation called batch training, which is commonly used when training deep learning models.

We set our **learning rate** to 0.001 and batch size to 256. The learning rate is a hyperparameter that determines how much the weights of our network will change in each step of learning. The batch size is the number of training examples used in one iteration.

The '**sequence_length**' parameter denotes the number of past time steps (or 'lags') that will be used as input features to predict the next time step. In this case, we set it equal to 'past', which represents the number of previous timestamps we're using for our predictions.

We prepare our 'X_train' and 'y_train' data arrays. 'X_train' represents the input features for our model, while 'y_train' represents the target variable we're trying to predict (the 'pun' variable in our case).

We use the **'timeseries_dataset_from_array'** function from the Keras preprocessing module to generate a time-series dataset suitable for training our LSTM model. This function slices our input data into fixed-length windows and pairs each window with its label, which is the format needed for time series prediction tasks.

We repeat the same process for our validation data to create 'dataset_val'.

Finally, to verify that our data is correctly formatted, we print the shape of one batch of inputs and targets from our training set. The shapes should match our expectations, with each input batch having a shape of (batch size, sequence_length, number of features) and each target batch having a shape of (batch size, 1).
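
To build intuition for these shapes, the following plain NumPy sketch creates sliding windows by hand and pairs each one with the value that follows it. It only illustrates the windowing idea on a toy series and is not the Keras implementation; `window` stands in for 'sequence_length'.

```python
import numpy as np

series = np.arange(10, dtype=float)   # toy series: 0, 1, ..., 9
window = 3                            # stands in for sequence_length

# All windows of length 3, shape (8, 3).
windows = np.lib.stride_tricks.sliding_window_view(series, window)

# The last window has no following value, so drop it;
# add a trailing axis for the single feature.
X = windows[:-1][..., np.newaxis]     # shape (7, 3, 1)
y = series[window:].reshape(-1, 1)    # shape (7, 1): the value after each window

print("X shape:", X.shape, "| y shape:", y.shape)
print("first window:", X[0, :, 0], "-> target:", y[0, 0])
```

Each row of X is one input sequence of shape (sequence_length, number of features), and the matching row of y is the next value in the series.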

Now that we've transformed our data into a suitable format for the **LSTM model**, we're ready to proceed with the model building and training process.

In [None]:
learning_rate = 0.001
batch_size = 256
sequence_length = int(past)

In [None]:
# the labels start at the `past`-th observation
# because each prediction is made from the previous `past` timesteps
start = past 

#end = start + train_split
X_train = train_data_norm.values
y_train = train_data_norm[['pun']].iloc[start:].values


print(len(X_train))
print(len(y_train))

The timeseries_dataset_from_array function takes in a sequence of data-points gathered at equal intervals, along with time series parameters such as length of the sequences to produce batches of sub-timeseries inputs and targets sampled from the main timeseries.

In [None]:
# create train keras dataset

dataset_train =  keras.preprocessing.timeseries_dataset_from_array(
    X_train,
    y_train,
    sequence_length=sequence_length,
    batch_size=batch_size
   
)

In [None]:
for batch in dataset_train.take(1):
    inputs, targets = batch

print("Input shape:", inputs.numpy().shape)
print("Target shape:", targets.numpy().shape)

In [None]:
# create validation tf.keras dataset

X_val = val_data_norm.values
# offset the labels by `past` so each window is paired with the value that follows it
y_val = val_data_norm[['pun']].iloc[past:].values

dataset_val = tf.keras.preprocessing.timeseries_dataset_from_array(
    X_val,
    y_val,
    sequence_length=sequence_length,
    batch_size=batch_size,
)


for batch in dataset_val.take(1):
    inputs, targets = batch

print("Input shape:", inputs.numpy().shape)
print("Target shape:", targets.numpy().shape)

In [None]:
print(len(X_val))
print(len(y_val))


In [None]:
train_data.tail(1)

In [None]:
val_data.head(1)

## <p style="font-size:130%; background-color:#336600;color:#fff;border-radius: 0px 30px;text-align:center;font-family:Sans-serif;font-weight:bold">Build and train the LSTM model</p>

In this section, we set up and train our **LSTM (Long Short-Term Memory) model** using the Keras API. Our LSTM model is a type of recurrent neural network that's well-suited to time series data as it can learn long-term dependencies, which is useful for our task of predicting the 'pun' variable based on past observations.

First, we set the number of training epochs to 10. An epoch is a full pass through the entire training dataset.

We then define the **architecture of our LSTM model**:

1. The Input layer accepts data of the shape that corresponds to the number of past steps and features in our dataset.
2. The first LSTM layer with 64 units returns sequences, meaning it outputs a hidden state for each input time step.
3. A Dropout layer follows, which helps prevent overfitting by randomly setting a fraction (20% here) of input units to 0 at each update during training time.
4. Another LSTM layer follows, this time with 32 units and it does not return sequences.
5. This is followed by another Dropout layer for regularisation.
6. Finally, a Dense layer (or fully connected layer) with a single unit and a ReLU (Rectified Linear Unit) activation function. This is our output layer, which will provide the predicted 'pun' value.

After defining the architecture, we **compile the model**. During model compilation, we specify the optimizer and the loss function. We use the RMSprop optimizer with the previously defined learning rate, and Mean Absolute Error (MAE) as our loss function, which is a common choice for regression problems.

We print the summary of our model architecture, which gives a neat tabular overview of the model layers, their type, output shape and the number of parameters.
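
The parameter counts reported in the summary can be checked by hand. The sketch below assumes standard LSTM and Dense layers with bias terms and a single input feature ('pun'); the helper functions are written for this illustration and are not part of Keras.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer with bias.

    Four gates, each with input weights, recurrent weights and a bias vector.
    """
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * units + units


n_features = 1  # assumption: the only input feature is 'pun'

lstm_1 = lstm_params(n_features, 64)   # first LSTM layer
lstm_2 = lstm_params(64, 32)           # second LSTM layer, fed by the 64 outputs of the first
dense = dense_params(32, 1)            # output layer
total = lstm_1 + lstm_2 + dense

print(lstm_1, lstm_2, dense, total)
```

Dropout layers have no trainable parameters, so they do not contribute to the total.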

Once our LSTM model is set up, the next step is to **train the model** using our training and validation datasets. Remember that the aim of training this LSTM model is to achieve lower error metrics than our baseline model.



In [None]:
epochs = 10

# Define the model architecture
inputs = tf.keras.layers.Input(shape=(inputs.shape[1], inputs.shape[2]))
x = tf.keras.layers.LSTM(64, return_sequences=True)(inputs)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.LSTM(32)(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(1, activation='relu')(x)

# Build and compile the model
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=learning_rate), loss="mae")

# Print the model summary
model.summary()


This part of the code sets up callbacks for our model and commences the training process.

The line **path_checkpoint** = "model_checkpoint.h5" sets the file path where the model checkpoints will be saved during training. The '.h5' extension indicates that the model is saved in the HDF5 file format, which is a common format for storing large quantities of numerical data like the weights of a trained neural network.

The **tf.keras.callbacks.EarlyStopping** is a callback function that stops training when a monitored quantity has stopped improving. In this case, we monitor the validation loss ("val_loss") and stop the training if it hasn't decreased (min_delta=0) for 5 consecutive epochs (**patience**=5). This is done to prevent overfitting and reduce computational costs.
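
The patience logic can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the Keras source code, and the validation losses are invented.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember the new best loss and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch and stop once patience is exhausted.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented validation losses: the best value (0.40) is reached at epoch 2.
losses = [0.50, 0.40, 0.41, 0.42, 0.40, 0.43, 0.45, 0.39]
stop_epoch = early_stopping_epoch(losses, patience=5)
print("Training stops after epoch", stop_epoch)
```

Here the loss at epoch 5 only equals the best value, so with min_delta=0 it does not count as an improvement, and training stops before the better loss at epoch 8 is ever seen.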

The tf.keras.callbacks.ModelCheckpoint callback saves the model or weights (in this case, only weights) at some interval, so the model or weights can be loaded later to continue the training from the state saved. Here, it saves the weights of the model that has the **lowest validation loss** (monitor="val_loss" and save_best_only=True).

Finally, **model.fit** starts the training of the model for a specified number of epochs (iterations on a dataset). It takes the training data, validation data, and the callbacks as arguments. The training data is fed to the model, and the model learns to make accurate predictions. The validation data is used to evaluate the model's performance at the end of each epoch. The callbacks we defined earlier are used during training to implement early stopping and save the model weights. The training process outputs a History object, which is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable).

In [None]:
path_checkpoint = "model_checkpoint.h5"
es_callback = tf.keras.callbacks.EarlyStopping(monitor="val_loss", min_delta=0, patience=5)

modelckpt_callback = tf.keras.callbacks.ModelCheckpoint(
    monitor="val_loss",
    filepath=path_checkpoint,
    verbose=1,
    save_weights_only=True,
    save_best_only=True,
)

history = model.fit(
    dataset_train,
    epochs=epochs,
    validation_data=dataset_val,
    callbacks=[es_callback, modelckpt_callback],
)

## Evaluate the model

This part of the code evaluates the trained LSTM model on both the training-validation and the test datasets.

The tf.keras.preprocessing.timeseries_dataset_from_array function is used again to create datasets for both the training-validation and test data. The 'past' variable defines how many previous time steps we use as input features for our model.

**The LSTM model is then used to make predictions** (model.predict) on the test dataset.

The predicted values (y_pred) are initially in the normalized range (between 0 and 1), as the model was trained on normalized data. To interpret these predictions in the context of our original data, we need to rescale them back to the original range. This is achieved by **reversing the normalization process** ((pred_df * (train_max - train_min)) + train_min), converting our predictions to the same scale as the original PUN values.
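
A quick round trip on invented numbers shows that the inverse transformation recovers the original scale. The minimum and maximum below are hypothetical, not the values computed from the training data.

```python
import numpy as np

# Hypothetical training minimum and maximum (€/MWh).
demo_min, demo_max = 20.0, 220.0

prices = np.array([20.0, 70.0, 220.0])

# Forward transformation: min-max scaling.
scaled = (prices - demo_min) / (demo_max - demo_min)

# Inverse transformation: back to the original unit.
restored = scaled * (demo_max - demo_min) + demo_min

print("scaled:  ", scaled)
print("restored:", restored)
```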

In [None]:
# create train validation dataset
start = past 

X_train_val = train_data_norm.values
y_train_val = train_data_norm[['pun']].iloc[start:].values


# create test  dataset
start = past 

X_test = test_data_norm.values
y_test = test_data_norm[['pun']].iloc[start:].values



dataset_train_val =  tf.keras.preprocessing.timeseries_dataset_from_array(
    X_train_val,
    y_train_val,
    sequence_length=sequence_length,
    batch_size=batch_size,
   
)

dataset_test =  tf.keras.preprocessing.timeseries_dataset_from_array(
    X_test,
    y_test,
    sequence_length=sequence_length,
    batch_size=batch_size,
   
)

In [None]:
# evaluate (rather than refit) the trained model on the training windows
model.evaluate(dataset_train_val)

In [None]:
# predict on the windowed test dataset so each forecast uses the previous `past` hours
y_pred = model.predict(dataset_test)

In [None]:
# index the predictions by the timestamps they refer to
# (the first `past` rows of the test set have no forecast)
pred_df = pd.DataFrame(y_pred, columns=['pun'], index=test_data.index[past:])

In [None]:
# rescale the predictions back to the original PUN range
pred_df = (pred_df *  (train_max - train_min))  + train_min 
pred_df

In [None]:
test_data

In [None]:
# actual values aligned with the predictions
y_true = test_data['pun'].iloc[past:]

plot_two_lines(0, y_true.index, y_true, pred_df.index, pred_df['pun'],
               ylabel="Pun", data1_label='Test Data', data2_label='Pred Data',
               title="LSTM Prediction VS Actual",
               color1='#001a4d', color2="#ff7f50"
               )

In [None]:
error_metrics(y_true, pred_df['pun'])

Finally, various error metrics (R2 score, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Median Absolute Error (MdAE) and Mean Absolute Percentage Error (MAPE)) are calculated using the error_metrics function. These metrics provide us with different ways to understand the performance of our model. For instance, the R2 score (the coefficient of determination) tells us the proportion of the variance in the dependent variable that is predictable from the independent variable(s). MSE, RMSE, MAE and MdAE measure the differences between the values predicted by the model and the values actually observed. The MAPE gives us a percentage error, helping to understand the accuracy of the model in relative terms.
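
As a worked illustration of the R2 score, the sketch below computes it by hand from the residual and total sums of squares. The four values are invented for illustration.

```python
import numpy as np

# Invented observed and predicted values.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.5, 2.0, 2.5, 4.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # unexplained variation: 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean: 5.0
r2 = 1 - ss_res / ss_tot

print(f"R2 = {r2:.2f}")
```

An R2 of 1 means perfect predictions, 0 means the model does no better than always predicting the mean, and negative values mean it does worse.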

**The LSTM model performance seems to be substantially better than the baseline model, as the errors are significantly lower in all metrics, and the R2 score is higher.**