# **Part 3: Deep Learning for Time Series**


## Instructors: 

* Rockefeller
* Dr Yae Gaba
* Dr Colleen Farrelly
    
    
Time Series Applications, AMLD Africa, Morocco 2022. 

At the end of this tutorial, you will know the basics of:

*   Data preprocessing for time series forecasting
*   The concepts of the forward and backward passes when training RNNs
*   Customizing RNNs and LSTMs and their learnable parameters
*   Processing input sequences through the network (forward pass)
*   Propagating gradients back into the network’s parameters (backward pass)
*   Updating the weights of the network
*   Performing a simple update rule: `weight = weight - learning_rate * gradient`
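The update rule in the last objective can be sketched in a few lines of plain Python. The toy loss $(w - 3)^2$, its gradient $2(w - 3)$, the learning rate of 0.1, and the names `gradient_descent_step` and `loss_gradient` are illustrative assumptions, not part of this tutorial's material. The point is only that repeatedly applying `weight = weight - learning_rate * gradient` moves the weight toward the minimizer $w = 3$.

```python
def gradient_descent_step(weight, gradient, learning_rate):
    """Apply one plain gradient-descent update: w <- w - lr * grad."""
    return weight - learning_rate * gradient


def loss_gradient(weight):
    """Gradient of the toy loss (weight - 3)**2 with respect to weight."""
    return 2.0 * (weight - 3.0)


weight = 0.0          # initial guess
learning_rate = 0.1   # step size (an assumed value)

for step in range(50):
    # Forward/backward passes are replaced here by a closed-form gradient.
    weight = gradient_descent_step(weight, loss_gradient(weight), learning_rate)

print(round(weight, 4))  # approaches the minimizer w = 3
```

In an RNN or LSTM the same rule is applied to every learnable parameter, with the gradients coming from the backward pass rather than from a hand-written formula.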






```python
from google.colab import drive
drive.mount('/content/drive')
```

```python
%cd "/content/drive/MyDrive/AMLD_files"
```

# **Time Series as Supervised Learning**

# 1. Introduction

Time series forecasting can be framed as a supervised learning problem. This reframing of your
time series data gives you access to the suite of standard linear and nonlinear machine learning
algorithms.

In this practical, you will discover how to reframe your time series problem as a supervised learning problem for machine learning. After reading this notebook,
you will know:


1. What supervised learning is and why it underpins most predictive modeling with machine learning.


2. The sliding window method for framing a time series dataset and how to use it.


3. How to use the sliding window for multivariate data and multi-step forecasting.


Let's get started.

# 2. Supervised Machine Learning


The majority of practical machine learning uses supervised learning. Supervised learning is
where you have `input variables (X)` and an `output variable (y)` and you use an algorithm to
learn the mapping function from the input to the output.

$$y = f(X)$$


The goal is to approximate the real underlying mapping so well that when you have new
input data $(X)$, you can predict the output variable $(y)$ for that data. Below is a contrived
example of a supervised learning dataset where each row is an observation consisting of one
input variable $(X)$ and one output variable to be predicted $(y)$.

```text
X, y
5, 0.9
4, 0.8
5, 1.0
3, 0.7
4, 0.9
```

It is called supervised learning because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers; the algorithm iteratively makes predictions on the training data and is corrected by
making updates. 

Learning stops when the algorithm achieves an acceptable level of performance.

Supervised learning problems can be further grouped into regression and classification problems.


* **Classification:** A classification problem is when the output variable is a category, such
as "red" or "blue", or "disease" or "no disease".


* **Regression:** A regression problem is when the output variable is a real value, such as
dollars or weight. The contrived example above is a regression problem.
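To make the teacher-and-correction picture concrete, here is a minimal sketch, assuming NumPy and a linear model $\hat{y} = wX + b$ (a choice made here for illustration), that fits the contrived regression dataset above by gradient descent. "Acceptable performance" is interpreted, as an assumption, as the point where the mean squared error stops improving by more than a small tolerance.

```python
import numpy as np

# Contrived dataset from the table above: one input X and one output y per row.
X = np.array([5.0, 4.0, 5.0, 3.0, 4.0])
y = np.array([0.9, 0.8, 1.0, 0.7, 0.9])

w, b = 0.0, 0.0        # model y_hat = w * X + b, starting from zero
learning_rate = 0.02   # assumed step size; much larger values diverge on unscaled X
tolerance = 1e-12      # assumed stopping rule: loss no longer improves by more than this
max_epochs = 50_000

previous_loss = np.inf
for epoch in range(max_epochs):
    y_hat = w * X + b                  # predict on the training data
    error = y_hat - y
    loss = np.mean(error ** 2)         # mean squared error
    if previous_loss - loss < tolerance:
        break                          # performance is no longer improving
    previous_loss = loss
    # Gradients of the MSE with respect to w and b.
    grad_w = 2.0 * np.mean(error * X)
    grad_b = 2.0 * np.mean(error)
    # The "teacher" correction: move each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"stopped after {epoch} epochs: w={w:.3f}, b={b:.3f}, mse={loss:.4f}")
```

The parameters should end up close to the ordinary least-squares solution for these five rows, $w \approx 0.121$ and $b \approx 0.35$.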

# 3. Sliding Window

Time series data can be phrased as supervised learning. Given a sequence of numbers for a time
series dataset, we can restructure the data to look like a supervised learning problem. We can
do this by using previous time steps as input variables and using the next time step as the output
variable. Let's make this concrete with an example. Imagine we have a time series as follows:

```text
time, measure
1,    100
2,    110
3,    108
4,    115
5,    120
```

We can restructure this time series dataset as a supervised learning problem by using the
value at the previous time step to predict the value at the next time-step. Re-organizing the
time series dataset this way, the data would look as follows:

```text
X,    y
?,   100
100, 110
110, 108
108, 115
115, 120
120,  ?
```
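If the series is held in a pandas `DataFrame` (an assumption; the notebook does not load the data), the lag-1 reframing can be sketched with `Series.shift`, which moves values down by the given number of rows and fills the gap with `NaN`. The `NaN` plays the role of the `?` in the first row; the final `120, ?` row of the table is not generated because it has no target.

```python
import pandas as pd

# The univariate series from the example above.
df = pd.DataFrame({"time": [1, 2, 3, 4, 5], "measure": [100, 110, 108, 115, 120]})

# Window width of one: the input X is the previous value, the output y is the current value.
supervised = pd.DataFrame({
    "X": df["measure"].shift(1),   # value at t-1 (NaN for the first row)
    "y": df["measure"],            # value at t
})
print(supervised)
```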

Take a look at the above transformed dataset and compare it to the original time series.
Here are some observations:

* We can see that the previous time step is the input (X) and the next time step is the output (y) in our supervised learning problem.


* We can see that the order between the observations is preserved, and must continue to be preserved when using this dataset to train a supervised model.


* We can see that we have no previous value that we can use to predict the first value in the sequence. We will delete this row, as we cannot use it.


* We can also see that we do not have a known next value to predict for the last value in the sequence. We may want to delete this row while training our supervised model as well.
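Dropping those unusable rows can also be done directly with NumPy slicing. This is a minimal sketch, assuming the series is stored as a 1-D array: pairing `series[:-1]` with `series[1:]` discards the first row (no input) and the last row (no target) in one step.

```python
import numpy as np

series = np.array([100, 110, 108, 115, 120])

# Window width of one: pair each value with the value that follows it.
# Slicing drops the row with no input (first) and the row with no target (last).
X = series[:-1].reshape(-1, 1)   # inputs as a single column: 100, 110, 108, 115
y = series[1:]                   # targets: 110, 108, 115, 120

for inputs, target in zip(X, y):
    print(inputs, "->", target)
```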
The use of prior time steps to predict the next time step is called the sliding window method,
or simply the window method in some literature.


In statistics and time series analysis, this is called a **lag** or the **lag method**.

The number of previous time steps is called the
**window width or size of the lag**. This sliding window is the basis for how we can turn any time
series dataset into a supervised learning problem. From this simple example, we can notice a
few things:


1. We can see how this can work to turn a time series into either a regression or a classification
supervised learning problem for real-valued or labeled time series values.


2. We can see that once a time series dataset is prepared this way, any of the standard
linear and nonlinear machine learning algorithms may be applied, as long as the order of
the rows is preserved.


3. We can see how the width of the sliding window can be increased to include more previous time
steps.


4. We can see how the sliding window approach can be used on a time series that has more than one value, or so-called multivariate time series.
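The window width can be made a parameter. The helper below, `make_windows`, is a hypothetical name introduced for this sketch (NumPy assumed). It builds one row per complete window and skips rows that lack a full window or a target, so a width of two on the example series yields three training rows.

```python
import numpy as np


def make_windows(series, width):
    """Frame a 1-D series as supervised learning with `width` lagged inputs.

    Row i uses series[i : i + width] as inputs and series[i + width] as the target.
    Rows without a full window or without a target are omitted.
    """
    series = np.asarray(series)
    if width < 1 or width >= len(series):
        raise ValueError("width must be between 1 and len(series) - 1")
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]
    return X, y


X, y = make_windows([100, 110, 108, 115, 120], width=2)
print(X)  # each row holds the two previous values
print(y)  # the value that follows each window
```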

# 4. Sliding Window With Multivariate Data

The number of observations recorded for a given time in a time series dataset matters. Traditionally, different names are used:
    
* Univariate Time Series: These are datasets where only a single variable is observed
at each time, such as temperature each hour. The example in the previous section is a
univariate time series dataset.


* Multivariate Time Series: These are datasets where two or more variables are observed
at each time.


Most time series analysis methods, and even books on the topic, focus on univariate data,
because it is the simplest to understand and work with.


Multivariate time series analysis considers multiple time series simultaneously. It is, in general,
much more complicated than univariate time series analysis: multivariate data is harder to model,
and many of the classical methods do not perform well on it.
The sweet spot for using machine learning for time series is where classical methods fall
short. This may be with complex univariate time series, and is more likely with multivariate
time series given the additional complexity. Below is another worked example to make the
sliding window method concrete for multivariate time series. Assume we have the contrived
multivariate time series dataset below with two observations at each time step. Let's also assume
that we are only concerned with predicting **measure2**.

```text
time, measure1, measure2
1,      0.2,      88
2,      0.5,      89
3,      0.7,      87
4,      0.4,      88
5,      1.0,      90
```

We can reframe this time series dataset as a supervised learning problem with a window
width of one. This means that we will use the previous time step values of `measure1` and
`measure2`. We will also assume that the value of `measure1` at the next time step is available.
We will then predict the next time step value of **measure2**. This gives us three input features
and one output value to predict for each training pattern.

```text
X1,    X2,    X3,      y
?,      ?,    0.2,    88
0.2,    88,   0.5,    89
0.5,    89,   0.7,    87
0.7,    87,   0.4,    88
0.4,    88,   1.0,    90
1.0,    90,    ?,      ?
```
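A sketch of the same multivariate framing in pandas (assumed library; column names follow the table above). `shift(1)` supplies the lagged values of both measures, the current `measure1` is kept as `X3`, and `dropna()` removes the first row. The final `1.0, 90, ?, ?` row is never generated because it has no target.

```python
import pandas as pd

df = pd.DataFrame({
    "measure1": [0.2, 0.5, 0.7, 0.4, 1.0],
    "measure2": [88, 89, 87, 88, 90],
})

framed = pd.DataFrame({
    "X1": df["measure1"].shift(1),  # measure1 at t-1
    "X2": df["measure2"].shift(1),  # measure2 at t-1
    "X3": df["measure1"],           # measure1 at t (assumed known when forecasting)
    "y": df["measure2"],            # measure2 at t, the target
})

train = framed.dropna()  # drop the first row, which has no lagged inputs
print(train)
```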

As in the univariate example above, we may need to remove the first and last rows in order to
train our supervised learning model. This example raises a question: what if we wanted to predict
both measure1 and measure2 for the next time step?


The sliding window approach can also be used in this case. Using the same time series dataset
above, we can phrase it as a supervised learning problem where we predict both `measure1` and
`measure2` with the same window width of one, as follows.

```text
X1,    X2,    y1,    y2
?,     ?,     0.2,   88
0.2,   88,    0.5,   89
0.5,   89,    0.7,   87
0.7,   87,    0.4,   88
0.4,   88,    1.0,   90
1.0,   90,    ?,     ?
```
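For the two-output case, NumPy slicing (assumed here) gives the inputs and a two-column target matrix directly; the variable names are illustrative. The first and last rows of the table above are dropped automatically.

```python
import numpy as np

# Rows are time steps; columns are measure1 and measure2.
data = np.array([
    [0.2, 88],
    [0.5, 89],
    [0.7, 87],
    [0.4, 88],
    [1.0, 90],
])

# Window width of one: both measures at t-1 are the inputs,
# both measures at t are the outputs.
X = data[:-1]   # columns X1, X2
Y = data[1:]    # columns y1, y2 (two outputs per row)

print(X.shape, Y.shape)
```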

Not many supervised learning methods can handle the prediction of multiple output values
without modification, but some methods, like artificial neural networks, have little trouble. **We
can think of predicting more than one value as predicting a sequence.** In this case, we were
predicting two different output variables, but we may want to predict multiple time-steps ahead
of one output variable. This is called multi-step forecasting.

# 5. Sliding Window With Multiple Steps

The number of time steps ahead to be forecasted is important. Again, it is traditional to use
different names for the problem depending on the number of time-steps to forecast:
    
* One-step Forecast: This is where the next time step (t+1) is predicted.
* Multi-step Forecast: This is where two or more future time steps are to be predicted.



All of the examples we have looked at so far have been one-step forecasts. There are a
number of ways to model multi-step forecasting as a supervised learning problem. Here, we
focus on framing a multi-step forecast with the sliding window method. Consider the
same univariate time series dataset from the first sliding window example above:

```text
time, measure
1,    100
2,    110
3,    108
4,    115
5,    120
```

We can frame this time series as a two-step forecasting dataset for supervised learning with
a window width of one, as follows:

```text
X1,     y1,    y2
?,      100,   110
100,    110,   108
110,    108,   115
108,    115,   120
115,    120,    ?
120,     ?,     ?
```
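The two-step framing can be sketched in pandas (assumed library) by shifting the series in both directions: `shift(1)` provides the lagged input and `shift(-1)` provides the value one step further ahead. `dropna()` then keeps only the three complete rows; the table's final `120, ?, ?` row is not generated at all.

```python
import pandas as pd

measure = pd.Series([100, 110, 108, 115, 120])

framed = pd.DataFrame({
    "X1": measure.shift(1),    # value at t-1 (window width of one)
    "y1": measure,             # value at t   (first step ahead)
    "y2": measure.shift(-1),   # value at t+1 (second step ahead)
})

train = framed.dropna()  # drops rows missing X1 (first) or y2 (last)
print(train)
```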

We can see that the first row and the last two rows cannot be used to train a supervised
model. The example also shows the burden placed on the input variables: the supervised model
has only X1 to work with in order to predict both y1 and y2. Careful
thought and experimentation are needed on your problem to find a window width that results
in acceptable model performance.
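One way to run that experimentation is to score several window widths on held-out data that comes after the training data in time. The sketch below makes several assumptions that are not part of this tutorial: a linear least-squares model, an 80/20 chronological split, and a synthetic noisy sine wave in place of real data (the five-point examples above are far too short for this). It uses NumPy's `sliding_window_view` (NumPy 1.20 or later) to build each framing; `holdout_mse` is a hypothetical helper name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def holdout_mse(series, width, train_fraction=0.8):
    """Score a linear model that predicts the next value from `width` lagged values."""
    windows = sliding_window_view(series, width + 1)  # each row: width inputs + 1 target
    X = np.column_stack([windows[:, :-1], np.ones(len(windows))])  # add a bias column
    y = windows[:, -1]
    split = int(len(X) * train_fraction)  # chronological split: no shuffling
    coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    predictions = X[split:] @ coef
    return float(np.mean((predictions - y[split:]) ** 2))


# Synthetic data for illustration only: a noisy sine wave with period 25.
rng = np.random.default_rng(seed=0)
t = np.arange(200)
series = np.sin(2 * np.pi * t / 25) + 0.1 * rng.standard_normal(len(t))

scores = {width: holdout_mse(series, width) for width in range(1, 6)}
for width, mse in scores.items():
    print(f"window width {width}: held-out MSE {mse:.4f}")
```

The split is chronological rather than random so that the model is never trained on observations that come after the ones it is scored on, which keeps the row order preserved as the sliding window method requires.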

The end

# References:

1. Brownlee, Jason. *Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python*. Machine Learning Mastery, 2018.
2. Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. *Deep Learning*. MIT Press, 2016.
3. yijingchen. *Deep Learning for Time Series Forecasting*. GitHub repository, https://github.com/Azure/DeepLearningForTimeSeriesForecasting
