# Building an LSTM neural network for Time Series analysis

### Sliding window with multi-step forecasting method

*This LSTM model was initially tested using data from WTW. For this presentation, however, I used randomly generated dummy data. No data from WTW has been used here, and any resemblance is purely coincidental.

In [22]:
import pandas as pd
dataframe = pd.read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test1.csv', header=0, sep=',')
dataframe.head(5)

Unnamed: 0,Year,Month,Sales
0,2017,1,15000
1,2017,2,14895
2,2017,3,16558
3,2017,4,17042
4,2017,5,17821


###### Before jumping directly into the code, we will explore how it is built so you can better understand each section.

While researching the best approach for building an LSTM neural network for time series forecasting, I found that many of the trusted sources I consulted (books and websites) recommended building time windows within the dataset. This helps the algorithm learn patterns across periods of time.

Before explaining what a time window is and how it works, we need to reshape the existing dataset. Instead of keeping year and month in separate columns, we will combine them into a single column.

In [14]:
from pandas import read_csv
from datetime import datetime   # pandas.datetime has been removed from recent pandas versions

def parse(x):
    # parse a combined 'Year Month' string such as '2017 1'
    return datetime.strptime(x, '%Y %m')

dataset = read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test1.csv',
                   parse_dates = [['Year', 'Month']],
                   header=0,
                   date_parser=parse)
dataset.to_csv('C:\\Users\\mukle\\Desktop\\lstm\\test-parse.csv')
dataset.head(5)

Unnamed: 0,Year_Month,Sales
0,2017-01-01,15000
1,2017-02-01,14895
2,2017-03-01,16558
3,2017-04-01,17042
4,2017-05-01,17821
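
Newer pandas releases no longer ship `pandas.datetime`, and the `date_parser` argument of `read_csv` is deprecated. As a minimal sketch of an alternative, assuming the same `Year`, `Month` and `Sales` columns, the date column can be built with `pd.to_datetime` after loading. The small inline DataFrame below only stands in for the CSV file and is not the original data file.

```python
import pandas as pd

# Stand-in for the loaded CSV (illustrative values taken from the table above)
raw = pd.DataFrame({
    'Year': [2017, 2017, 2017],
    'Month': [1, 2, 3],
    'Sales': [15000, 14895, 16558],
})

# pd.to_datetime assembles dates from columns named year, month and day
dates = pd.to_datetime(pd.DataFrame({
    'year': raw['Year'],
    'month': raw['Month'],
    'day': 1,
}))

reshaped = pd.DataFrame({'Year_Month': dates, 'Sales': raw['Sales']})
print(reshaped)
```

The result has the same two columns as the reshaped file shown above, with every date set to the first day of its month.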


### Sliding Window With Multi-Step Forecasting

Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. We can do this by using previous time steps as input variables and use the next time step as the output variable.
Let’s make this concrete with an example. Imagine we have a time series as follows:

In [22]:
# 2017-01-01  15000
# 2017-02-01  14895
# 2017-03-01  16558
# 2017-04-01  17042
# 2017-05-01  17821

We can restructure this time series dataset as a supervised learning problem by using the value at the previous time step to predict the value at the next time-step. Re-organizing the time series dataset this way, the data would look as follows:

In [21]:
#   x      y
#   ?    15000
# 15000  14895
# 14895  16558
# 16558  17042
# 17042  17821
# 17821    ?

Here are some observations:

- We can see that the previous time step is the input (X) and the next time step is the output (y) in our supervised learning problem.
- We can see that the order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model.
- We can see that we have no previous value that we can use to predict the first value in the sequence. We will delete this row as we cannot use it.
- We can also see that we do not have a known next value to predict for the last value in the sequence. We may want to delete this value while training our supervised model also.

The use of prior time steps to predict the next time step is called the sliding window method. For short, it may be called the window method in some literature. In statistics and time series analysis, this is called a lag or lag method. The number of previous time steps is called the window width or size of the lag. Careful thought and experimentation are needed on your problem to find a window width that results in acceptable model performance.
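
To make the idea of a window width concrete, here is a minimal sketch in plain Python. The helper name `sliding_window` is hypothetical and only serves this illustration; it pairs each run of `width` consecutive values with the value that follows it.

```python
def sliding_window(values, width=1):
    """Pair each run of `width` consecutive values with the value that follows it."""
    pairs = []
    for end in range(width, len(values)):
        pairs.append((values[end - width:end], values[end]))
    return pairs

sales = [15000, 14895, 16558, 17042, 17821]

# window width of 2: two previous months are used to predict the next one
for inputs, target in sliding_window(sales, width=2):
    print(inputs, '->', target)
```

A wider window gives the model more context per sample but leaves fewer samples to train on, which is why the width is worth experimenting with.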

### The window_method() Function

We can use the shift() function in Pandas to automatically create new framings of time series problems given the desired length of input and output sequences.

This would be a useful tool as it would allow us to explore different framings of a time series problem with machine learning algorithms to see which might result in better performing models.
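
As a small illustration of what shift() does on its own, the sketch below uses the first five dummy values. The column labels are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({'Sales': [15000, 14895, 16558, 17042, 17821]})

framed = pd.DataFrame({
    'Sales(t-1)': df['Sales'].shift(1),   # previous value, NaN in the first row
    'Sales(t)': df['Sales'],
    'Sales(t+1)': df['Sales'].shift(-1),  # next value, NaN in the last row
})

# rows with a missing lag or lead value cannot be used for training
framed = framed.dropna()
print(framed)
```

A positive shift pushes values down (creating lags), while a negative shift pulls future values up (creating forecast targets).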

In this section, we will define a new Python function named window_method() that takes a univariate or multivariate time series and frames it as a supervised learning dataset.

The function takes four arguments:
- data: Sequence of observations as a list or 2D NumPy array. Required.
- lookBack: Number of lag observations as input (X). Values may be between [1..len(data)]. Optional. Defaults to 1.
- delay: Number of observations as output (y). Values may be between [0..len(data)-1]. Optional. Defaults to 1.
- dropnan: Boolean whether or not to drop rows with NaN values. Optional. Defaults to True.

The function returns a single value:
- return: Pandas DataFrame of series framed for supervised learning.

In [47]:
from pandas import DataFrame
from pandas import concat

def window_method(data, lookBack=1, delay=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(lookBack, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, delay):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg


df = pd.read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test-parse.csv', header=0, sep=',')
df.values
data = window_method(df)
print(data)

     var1(t-1)  var2(t-1)     var1(t)  var2(t)
1   2017-01-01    15000.0  2017-02-01    14895
2   2017-02-01    14895.0  2017-03-01    16558
3   2017-03-01    16558.0  2017-04-01    17042
4   2017-04-01    17042.0  2017-05-01    17821
5   2017-05-01    17821.0  2017-06-01    18600
6   2017-06-01    18600.0  2017-07-01    19379
7   2017-07-01    19379.0  2017-08-01    20158
8   2017-08-01    20158.0  2017-09-01    20937
9   2017-09-01    20937.0  2017-10-01    21716
10  2017-10-01    21716.0  2017-11-01    22495
11  2017-11-01    22495.0  2017-12-01    23274


In this section, we've seen how to reframe a dataset with the shift() function to turn it into a one-step or multi-step supervised learning problem.

Looking at the code, we take as an example looking back 1 month and trying to predict the following 3 months:

In [9]:
from pandas import DataFrame
from pandas import concat
from pandas import read_csv

# convert time series into supervised learning problem
def window_method(data, lookBack=1, delay=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(lookBack, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, delay):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg
 
# transform series into train and test sets for supervised learning
def prepare_data(series, test_samples, lookBack, delay):
    # extract raw values
    raw_values = series.values
    raw_values = raw_values.reshape(len(raw_values), 1)
    # transform into supervised learning problem X, y
    supervised = window_method(raw_values, lookBack, delay)
    supervised_values = supervised.values
    # split into train and test sets
    train, test = supervised_values[0:-test_samples], supervised_values[-test_samples:]
    return train, test

def reference_number_print(train, delay):
    # print every (input, output) pair of each training row
    for row in range(train.shape[0]):
        for column in range(delay):
            print('look back:', train[row][0], '(t-1)', 'delay:', train[row][column+1], '(t+'+str(column+1)+')')

# load dataset
series = read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test-parse.csv', header=0, index_col=0, sep=',')
# configure
lookBack = 1
delay = 3
test_samples = 3
# prepare data
train, test = prepare_data(series, test_samples, lookBack, delay)
print(train)
print(' ')
reference_number_print(train, delay)   # prints directly and returns nothing
print(' ')
print('Train: %s, Test: %s' % (train.shape, test.shape))

[[15000. 14895. 16558. 17042.]
 [14895. 16558. 17042. 17821.]
 [16558. 17042. 17821. 18600.]
 [17042. 17821. 18600. 19379.]
 [17821. 18600. 19379. 20158.]
 [18600. 19379. 20158. 20937.]]
 
look back: 15000.0 (t-1) delay: 14895.0 (t+1)
look back: 15000.0 (t-1) delay: 16558.0 (t+2)
look back: 15000.0 (t-1) delay: 17042.0 (t+3)
look back: 14895.0 (t-1) delay: 16558.0 (t+1)
look back: 14895.0 (t-1) delay: 17042.0 (t+2)
look back: 14895.0 (t-1) delay: 17821.0 (t+3)
look back: 16558.0 (t-1) delay: 17042.0 (t+1)
look back: 16558.0 (t-1) delay: 17821.0 (t+2)
look back: 16558.0 (t-1) delay: 18600.0 (t+3)
look back: 17042.0 (t-1) delay: 17821.0 (t+1)
look back: 17042.0 (t-1) delay: 18600.0 (t+2)
look back: 17042.0 (t-1) delay: 19379.0 (t+3)
look back: 17821.0 (t-1) delay: 18600.0 (t+1)
look back: 17821.0 (t-1) delay: 19379.0 (t+2)
look back: 17821.0 (t-1) delay: 20158.0 (t+3)
look back: 18600.0 (t-1) delay: 19379.0 (t+1)
look back: 18600.0 (t-1) delay: 20158.0 (t+2)
look back: 18600.0 (t-1) dela

Looking at the randomly generated dummy dataset, the first value (15000) corresponds to 2017-01-01, while the next 3 numbers are the values for the following 3 months.

### Building LSTM neural network

As our model is univariate (only 1 input), we will use the Sequential() model from the Keras library.

#### Transform Time Series to Scale

Like other neural networks, LSTMs expect data to be within the scale of the activation function used by the network.

The default activation function for LSTMs is the hyperbolic tangent (tanh), which outputs values between -1 and 1. This is the preferred range for the time series data. To make the experiment fair, the scaling coefficients (min and max) values must be calculated on the training dataset and applied to scale the test dataset and any forecasts. This is to avoid contaminating the experiment with knowledge from the test dataset, which might give the model a small edge.
We can transform the dataset to the range [-1, 1] using the MinMaxScaler class. Like other scikit-learn transform classes, it requires data provided in a matrix format with rows and columns. Therefore, we must reshape our NumPy arrays before transforming.
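
The rule above (compute the scaling coefficients on the training data only, then reuse them everywhere) can be sketched by hand with NumPy. This is an illustration under stated assumptions, not the notebook's own code: the helper names `fit_min_max`, `scale` and `unscale` are hypothetical, and the split into four training values and one test value is arbitrary. The formula is the same linear mapping that MinMaxScaler applies with feature_range=(-1, 1).

```python
import numpy as np

def fit_min_max(train_values):
    """Return the (min, max) of the training values only."""
    return float(np.min(train_values)), float(np.max(train_values))

def scale(values, lo, hi):
    """Map values linearly so that lo -> -1 and hi -> 1."""
    return 2.0 * (np.asarray(values, dtype=float) - lo) / (hi - lo) - 1.0

def unscale(scaled, lo, hi):
    """Invert scale(): map scaled values back to the original units."""
    return (np.asarray(scaled, dtype=float) + 1.0) * (hi - lo) / 2.0 + lo

train = np.array([15000, 14895, 16558, 17042])
test = np.array([17821])

lo, hi = fit_min_max(train)          # coefficients come from the training split
train_scaled = scale(train, lo, hi)
test_scaled = scale(test, lo, hi)    # reuses the training coefficients

print(train_scaled)
print(test_scaled)
print(unscale(test_scaled, lo, hi))
```

Because the test value is larger than anything seen in training, its scaled value falls outside [-1, 1]. That is the price of not letting test data influence the scaling.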

In [5]:
from pandas import read_csv
from pandas import Series
from sklearn.preprocessing import MinMaxScaler

series = read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test-parse.csv', header=0, index_col=0, sep=',')
print("not changed dataset:")
print(series.head(5))

def transform(series):
    series = series.values                        # transform into numpy arrays
    series = series.reshape(len(series), 1)       # reshape the dataset
    scaler = MinMaxScaler(feature_range=(-1, 1))  # range values to the transformed
    scaler = scaler.fit(series)                   # see how to range values
    scaled_serie = scaler.transform(series)       # apply changes to the values
    scaled_series = Series(scaled_serie[:, 0])    # create a new series of data transformed
    return scaled_series, scaler, scaled_serie

scaled_series, scaler, scaled_array = transform(series)
print(" ")
print("transformed dataset between range -1,1:")
print(scaled_series.head(5))

def invert_transform(scaler, scaled_array):
    original_array = scaler.inverse_transform(scaled_array)   # revert the scaling
    inverted_series = Series(original_array[:, 0])            # create a new series from reverted data
    return inverted_series

original_dataset = invert_transform(scaler, scaled_array)
print(" ")
print("Inverted back the dataset to the original numbers:")
print(original_dataset.head(5))

not changed dataset:
            Sales
Year_Month       
2017-01-01  15000
2017-02-01  14895
2017-03-01  16558
2017-04-01  17042
2017-05-01  17821
 
transformed dataset between range -1,1:
0   -0.974937
1   -1.000000
2   -0.603055
3   -0.487528
4   -0.301587
dtype: float64
 
Inverted back the dataset to the original numbers:
0    15000.0
1    14895.0
2    16558.0
3    17042.0
4    17821.0
dtype: float64




The red box in the notebook output tells us that the function converts the values from integers (for example, 1) to floats (for example, 1.0). This is expected, as we are transforming the values into the range -1 to 1.

Rescaling values to a smaller range is also common with other types of neural networks. It helps the network learn faster, whereas large ranges may require more time and more training rounds for the neurons to find relationships.

Looking at the code, the transformation of the input values occurs in the prepare_data() function, and the inverse transformation occurs after forecasting, in inverse_transform(). The scaler (which holds the ranges used when transforming the values) must be returned by prepare_data() and passed to inverse_transform().

In [None]:
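# preprocess data
from numpy import array
from sklearn.preprocessing import MinMaxScaler

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, lookBack, delay):
    # extract raw values
    raw_values = series.values
    raw_values = raw_values.reshape(len(raw_values), 1)
    # rescale values to -1, 1
    # NOTE: the scaler is fitted on the whole series here, test period included;
    # fit it on the training portion only to avoid leaking test information
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaled_values = scaler.fit_transform(raw_values)
    scaled_values = scaled_values.reshape(len(scaled_values), 1)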
    # transform into supervised learning problem X, y
    supervised = window_method(scaled_values, lookBack, delay)
    supervised_values = supervised.values
    # split into train and test sets
    train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
    return scaler, train, test

# process predictions
def inverse_transform(series, forecasts, scaler, n_test):   # call back scaler from prepare_data()
    inverted = list()
    for i in range(len(forecasts)):
        # create array from forecast
        forecast = array(forecasts[i])
        forecast = forecast.reshape(1, len(forecast))
        # invert scaling
        inv_scale = scaler.inverse_transform(forecast)
        inv_scale = inv_scale[0, :]
        # store
        inverted.append(inv_scale)
    return inverted

#### LSTM input reshape

By default, an LSTM layer in Keras maintains state between data within one batch. A batch of data is a fixed-sized number of rows from the training dataset that defines how many patterns to process before updating the weights of the network. State in the LSTM layer between batches is cleared by default, therefore we must make the LSTM stateful. This gives us fine-grained control over when state of the LSTM layer is cleared, by calling the reset_states() function.

The LSTM layer expects input to be in a matrix with the dimensions: [samples, time steps, features].
- Samples: These are independent observations from the domain, typically rows of data.
- Time steps: These are separate time steps of a given variable for a given observation (for this dataset, each line is a time step).
- Features: These are separate measures observed at the time of observation.

This first requires that the training dataset be transformed from a 2D array [samples, features] to a 3D array [samples, timesteps, features]. We will fix time steps at 1, so this change is straightforward.

In [6]:
from pandas import read_csv

series = read_csv('C:\\Users\\mukle\\Desktop\\lstm\\test-parse.csv', header=0, index_col=0, sep=',')
series = series.values  # transform to numpy array

original = series.shape        # original shape, 2D array
print("Original shape:")
print(original)
print('rows:', original[0], 'columns:', original[-1])   

print("Transformed shape for LSTM:")
transformed = series.reshape(series.shape[0], 1, series.shape[-1])    # changing the shape to 3D array
print(transformed.shape)
print('rows:', transformed.shape[0], 'Time-Steps:', transformed.shape[1], 'columns:', transformed.shape[-1])

Original shape:
(12, 1)
rows: 12 columns: 1
Transformed shape for LSTM:
(12, 1, 1)
rows: 12 Time-Steps: 1 columns: 1


Looking at the code, the reshape occurs in the fit_lstm() function. Once the dataset has been split into features and labels (X and y), the features (X) must be reshaped using NumPy's built-in reshape() method.

In [None]:
def fit_lstm(train, lookBack, delay, n_batch, n_epochs):
    # reshape training into [samples, timesteps, features]
    X, y = train[:, 0:lookBack], train[:, lookBack:]
    X = X.reshape(X.shape[0], 1, X.shape[1])    # Reshape input values to 3D Array
    
    
    ...
    
    
    return model
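
The split and reshape step can be tried in isolation with NumPy. This is a minimal sketch that reuses the first three training rows printed earlier (lookBack = 1, delay = 3); no Keras is needed to check the shapes.

```python
import numpy as np

lookBack = 1
train = np.array([
    [15000., 14895., 16558., 17042.],
    [14895., 16558., 17042., 17821.],
    [16558., 17042., 17821., 18600.],
])

# the first lookBack columns are the inputs, the remaining ones are the targets
X, y = train[:, :lookBack], train[:, lookBack:]

# [samples, features] -> [samples, timesteps, features] with timesteps fixed at 1
X = X.reshape(X.shape[0], 1, X.shape[1])

print(X.shape, y.shape)  # (3, 1, 1) (3, 3)
```

Only X is reshaped to 3D; y stays a 2D array with one column per forecast step.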

#### LSTM model development

Next, we need to design an LSTM network. We will use a simple structure with 1 hidden layer with 1 LSTM unit, then an output layer with linear activation and 3 output values (look back: 1 month, delay: 3 months). The network will use a mean squared error loss function and the efficient ADAM optimization algorithm.

The shape of the input data must be specified in the LSTM layer using the “batch_input_shape” argument as a tuple that specifies the expected number of observations to read each batch, the number of time steps, and the number of features.
The batch size is often much smaller than the total number of samples. It, along with the number of epochs, defines how quickly the network learns the data (how often the weights are updated).
The final important parameter in defining the LSTM layer is the number of neurons, also called the number of memory units or blocks.

The line below creates a single LSTM hidden layer that also specifies the expectations of the input layer via the “batch_input_shape” argument.

In [None]:
from keras.layers import LSTM

layer = LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True)

The network requires a single neuron in the output layer with a linear activation to predict the sales at the next time step. Once the network is specified, it must be compiled into an efficient symbolic representation using a backend mathematical library, such as TensorFlow or Theano.

In [None]:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True)) # LSTM layer
model.add(Dense(1)) # a single neuron, which means there is only 1 output
model.compile(loss='mean_squared_error', optimizer='adam')

Looking at the code, the model is designed in fit_lstm(), where we also split the data, reshape it and build the layers. The output layer takes the shape of "y". This makes things easy when we want more than 1 outcome. Say we want to forecast three consecutive months: we would then have 3 outputs (3 neurons) that predict t+1, t+2 and t+3 (1 neuron each).

In [None]:
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.optimizers import Adam

def fit_lstm(neurons, train, lookBack, delay, n_batch, n_epochs, learning_rate):
    # reshape training into [samples, timesteps, features]
    X, y = train[:, 0:lookBack], train[:, lookBack:]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # design network
    model = Sequential()
    model.add(LSTM(neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(y.shape[1]))
    # older Keras versions name this argument lr instead of learning_rate
    model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=learning_rate))
    fit = model.fit(X, y, epochs=n_epochs, batch_size=n_batch, verbose=1, shuffle=False)

    loss = fit.history['loss']
    epochs = range(1, n_epochs+1)
    plt.figure()
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.title('Training loss')
    plt.legend()
    plt.show()

    return model

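To add more layers to the code, we simply add another LSTM layer, defining the number of neuron units and whether it is stateful. The batch input shape is only needed on the first layer. When LSTM layers are stacked, every LSTM layer except the last one must set return_sequences=True so that the next layer receives a 3D input:

In [None]:
model.add(LSTM(10, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True, return_sequences=True))
model.add(LSTM(10, stateful=True))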

###### Understanding stateful = True or stateful = False

Deciding when to set stateful to true or false can be difficult to grasp at first. One would expect the state to be transferred from the last sample of one batch to the first sample of the next batch. In fact, the state is propagated across batches between samples with the same index.

By default, an LSTM layer in Keras maintains state between data within one batch (stateful = False). A batch of data is a fixed-sized number of rows from the training dataset that defines how many patterns to process before updating the weights of the network. State in the LSTM layer between batches is cleared by default. When we set stateful to true (stateful = True), the state is propagated from sample "i" of one batch to sample "i" of the next batch.
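
The propagation rule can be illustrated without Keras. The plain Python sketch below, with a hypothetical helper `state_chains`, lists which sample indices end up sharing one running state when stateful = True. It assumes the samples are fed in order (shuffle=False) and that the number of samples is a multiple of the batch size.

```python
def state_chains(n_samples, batch_size):
    """Group sample indices that share one running state when stateful=True.

    Sample i of a batch passes its state to sample i of the next batch,
    so indices i, i + batch_size, i + 2 * batch_size, ... form one chain.
    """
    return [list(range(i, n_samples, batch_size)) for i in range(batch_size)]

print(state_chains(6, 1))  # [[0, 1, 2, 3, 4, 5]]
print(state_chains(6, 3))  # [[0, 3], [1, 4], [2, 5]]
```

With a batch size of 1, the state flows through consecutive samples, which is usually what we want for a single time series. With larger batches, the state skips between samples that are a full batch apart.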

The reason Keras sets stateful to false by default (stateful = False) is that, on large datasets, carrying the state over from batch to batch and into the next epoch lets it keep accumulating, which can make training unstable. On small datasets, resetting the state or keeping it for the next iteration shouldn't be an issue, but it is worth trying both settings to see which one performs better.

If you want to reset the state after each iteration, you don't need to switch to stateful = False. Simply wrap model.fit() in a loop over the number of epochs and call model.reset_states() after each iteration, so that no state is carried into the next epoch. The following code resets the state after each epoch.

In [None]:
for i in range(nb_epoch):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    model.reset_states()

During training, the internal state is reset after each epoch. While forecasting, we will not want to reset the internal state between forecasts. In fact, we would like the model to build up state as we forecast each time step in the test dataset.
This raises the question as to what would be a good initial state for the network prior to forecasting the test dataset.
In our code, we will seed the state by making a prediction on all samples in the training dataset. In theory, the internal state should be set up ready to forecast the next time step.

By default, the samples within an epoch are shuffled prior to being exposed to the network. Again, this is undesirable for the LSTM because we want the network to build up state as it learns across the sequence of observations. We can disable the shuffling of samples by setting “shuffle” to “False“.

Also by default, the network reports a lot of debug information about the learning progress and skill of the model at the end of each epoch. We can disable this by setting the “verbose” argument to the level of “0“.

In [None]:
model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)

In the code above, we set stateful to true, which builds up state across iterations during training.

#### Overfitting, dropout, recurrent dropout

There is no direct solution to overfitting. One of the main issues when working with small datasets is that the risk of overfitting is greater. To deal with it, dropout, recurrent dropout or batch normalization may be useful (recurrent dropout is set through the recurrent_dropout argument of the LSTM layer). A dropout layer can be added as follows:

In [None]:
from keras.layers import Dropout

model.add(Dropout(0.05))

#### Forecasts

To make a forecast, we can call the predict() function on the model. This requires a 3D NumPy array input as an argument. In this case, it will be an array of one value, the observation at the previous time step.

The predict() function returns an array of predictions, one row for each input row provided. Because we are providing a single input, the output will be a 2D NumPy array with a single row, holding one value per forecast step.


In [None]:
def forecast_lstm(model, X, n_batch):
    # reshape input pattern to [samples, timesteps, features]
    X = X.reshape(1, 1, len(X))
    # make forecast
    forecast = model.predict(X, batch_size=n_batch)
    # convert to array
    return [x for x in forecast[0, :]]

#### Save model

Every time a new model is created, it is saved automatically by the function model_save(). You may want to change the model name, but an existing model is very unlikely to be overwritten because a random integer is appended to the file name each time a new model is saved.

In [None]:
import random

def model_save(save_model_directory, model, lookBack, delay, n_test, n_epochs):
    """
    Automatically saves each newly trained model. Make sure to change:
        - model_name: A new name for your model
    Note:
        - random_number: Random number appended to the name to avoid overwriting an existing model
    """
    random_number = str(random.randint(1,99999999999))
    model_stats = '_lookBack_'+str(lookBack)+'_delay_'+str(delay)+'_test_'+str(n_test)+'_epochs_'+str(n_epochs)
    model_name = '_modelName_'+'changeName'
    model_random_number = '_'+ random_number
    model.save(save_model_directory+model_stats+model_name+model_random_number)
    print("\n model saved as:", save_model_directory+model_stats+model_name+model_random_number)
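
A random integer can, in principle, repeat. As a hedged alternative sketch (not part of the original program), the file name could be built with a UUID from the standard library instead. The helper name `build_model_path` and the `'models'` directory are assumptions made for this illustration.

```python
import os
import uuid

def build_model_path(directory, lookBack, delay, n_test, n_epochs, name='changeName'):
    """Build a file path that is very unlikely to collide with an existing one."""
    stats = 'lookBack_%d_delay_%d_test_%d_epochs_%d' % (lookBack, delay, n_test, n_epochs)
    unique = uuid.uuid4().hex   # 32 random hexadecimal characters
    return os.path.join(directory, '%s_modelName_%s_%s' % (stats, name, unique))

print(build_model_path('models', 1, 3, 3, 10))
```

Using os.path.join also removes the need for the directory string to end with a path separator, which the plain string concatenation in model_save() relies on.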

## How to execute the program

The program has two built-in functions: one to execute a new model and one to reformat the dataset.

When the program runs, two options will appear:

1. Execute a new model. execute_new_model()
2. Reformat the dataset. change_date_to_one_column()

In [2]:
def main():
    """
    OPTIONS MENU
    Lets you select between different tasks:
        1. Execute model will allow you to train a new model
        2. Change the dataset will allow you to reformat the dataset so it can run using models
    """
    options = 'What would you like to do?'
    options += '\n Enter 1 to execute a new model'
    options += '\n Enter 2 to format the dataset'
    options += '\n Your option: '
    choice = int(input(options))  # int() instead of eval(), which would execute arbitrary input
    
    if choice == 1:
        print("You selected to execute a new model")
        execute_new_model()
    elif choice == 2:
        print("You decided to reformat the dataset")
        change_date_to_one_column()
    else:
        print('You entered a wrong number')

Before running a new model, make sure you are using a valid dataset. The change_date_to_one_column() function contains more information. At the beginning of this document, I also explained how the dataset columns need to be changed.

## How to change parameters

To change the parameters when executing a new model, go to the execute_new_model() function.
Make sure to follow the instructions there when executing a new model, or it might not run.

In [None]:
def execute_new_model():
    
    ....
    
    neurons = 10
    lookBack = 12
    delay = 2
    test_samples = 12     
    n_epochs = 10
    n_batch = 1
    learning_rate = 0.01
    
    ....

I haven't explored many of the different combinations. I am sure the model can give better results by tuning the above parameters (or adding more layers).
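
One simple way to explore combinations is to enumerate a small grid of candidate values. The sketch below only builds the list of parameter sets; the candidate values are hypothetical, and training a model for each set is left to execute_new_model() or your own loop.

```python
from itertools import product

# Hypothetical candidate values; adjust them to your own data and time budget
grid = {
    'neurons': [5, 10],
    'lookBack': [6, 12],
    'n_epochs': [10, 50],
    'learning_rate': [0.01, 0.001],
}

# one dictionary per combination of candidate values
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(combinations))  # 16
for params in combinations[:3]:
    print(params)
```

The number of combinations grows quickly with each added parameter, so it is worth starting with a few values per parameter and comparing the results on the test samples.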