# Deep Learning for Time Series Forecasting

Sources: 
- https://machinelearningmastery.com/how-to-get-started-with-deep-learning-for-time-series-forecasting-7-day-mini-course/
- Machine Learning with Python Cookbook by Chris Albon
- Talk by Ilja Rasin (https://www.linkedin.com/in/iljarasin/) at IBM Developer Unconference June 2019, Switzerland

I found this course via a LinkedIn post by Steve Nouri (https://www.linkedin.com/in/stevenouri/).

## Lesson 01: Promise of Deep Learning

In this lesson, you will discover the promise of deep learning methods for time series forecasting. <br>
Generally, neural networks like Multilayer Perceptrons or MLPs provide capabilities that are offered by few algorithms, such as:<br>
 - Robust to Noise. Neural networks are robust to noise in input data and in the mapping function and can even support learning and prediction in the presence of missing values.
 - Nonlinear. Neural networks do not make strong assumptions about the mapping function and readily learn linear and nonlinear relationships.
 - Multivariate Inputs. An arbitrary number of input features can be specified, providing direct support for multivariate forecasting.
 - Multi-step Forecasts. An arbitrary number of output values can be specified, providing direct support for multi-step and even multivariate forecasting.

#### From Machine Learning with Python Cookbook 
> Multilayer perceptrons are feedforward neural networks and represent the simplest artificial neural network used in any real-world setting. The name feedforward comes from the fact that an observation's feature values are fed forward through the network. Each layer aims to transform the feature values so that the output at the end is the same as the target's value. 

> Forward propagation means that an observation (usually a set of observations called a batch) is fed through the network and the output is compared with the true value of the observation using a loss function. 

> Backward propagation means that after the forward propagation the algorithm goes back through the network identifying how much each parameter has contributed to the error between predicted and true value. The optimization algorithm determines at each parameter how much each weight should be adjusted to improve the output.

> The way neural networks learn is by repeating this process of forward and backpropagation for every observation multiple times. Each time all observations have been sent through the network is called an epoch. 
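
The forward/backward loop described in the quotes above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not code from the Cookbook: the function name `train_mlp`, the single hidden layer with `tanh`, the learning rate, the full-batch updates, and the toy sine-wave data are all choices made for this example.

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, lr=0.02, epochs=500, seed=0):
    """Train a one-hidden-layer MLP with full-batch gradient descent.

    Returns the mean squared error measured at the start of each epoch.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], y.shape[1]
    W1 = rng.normal(0.0, 0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):  # one epoch = all observations sent through once
        # forward propagation: inputs -> hidden layer -> output
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2
        err = y_hat - y
        losses.append(float(np.mean(err ** 2)))  # loss function: MSE

        # backward propagation: how much each parameter contributed to the error
        g_out = 2.0 * err / err.size
        g_W2 = h.T @ g_out
        g_b2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)  # derivative of tanh
        g_W1 = X.T @ g_h
        g_b1 = g_h.sum(axis=0)

        # optimization step: plain gradient descent
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return losses

# toy data (assumption): predict the next value of a sine wave from 3 lags
series = np.sin(np.linspace(0.0, 8.0 * np.pi, 200))
X = np.stack([series[i:i + 3] for i in range(len(series) - 3)])
y = series[3:].reshape(-1, 1)

losses = train_mlp(X, y)
print(f"MSE in first epoch: {losses[0]:.4f}, in last epoch: {losses[-1]:.4f}")
```

Libraries such as Keras hide these steps behind `fit()`, and usually update the weights per mini-batch rather than once per epoch as this sketch does.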

### Your Task

For this lesson you must suggest one capability from both Convolutional Neural Networks and Recurrent Neural Networks that may be beneficial in modeling time series forecasting problems.




### Answer

#### Convolutional Neural Networks (CNN)

Sources:
- https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
        
CNNs have proven very effective in areas such as image recognition and classification. A CNN takes an input image and learns weights and biases that assign importance to different features in it. The architecture is loosely inspired by the connectivity pattern of neurons in the visual cortex. Unlike an MLP, the convolutional layers are not dense: each unit is connected only to a small local region of its input, and the same kernel weights are shared across all positions. Only the final classification stage is dense. Typically, a CNN consists of several layers built from four main operations, often stacked in a sequence such as CONV, RELU, CONV, RELU, POOL, CONV, RELU, CONV, RELU, POOL, ...:
    
1. Convolution
    Extracts features from the input image using a feature detector (kernel). The resulting (filtered) image is called a Feature Map. A CNN typically applies many different filters, so the stacked Feature Maps have a certain depth.
2. ReLU (Rectified Linear Unit)
    Introduces non-linearity through an additional element-wise operation (on each pixel) after every convolution. It replaces all negative pixel values in the Feature Map with zero.
3. Pooling or Sub Sampling
    Reduces the dimensionality of each Feature Map while retaining the most important information. Spatial pooling can be Max, Average, Sum, ... In Max Pooling, for example, a window (e.g. 2x2) is slid over the rectified Feature Map and the largest element within each window is kept. Pooling progressively reduces the spatial size of the input representation, which lowers the number of parameters and computations in the network and helps to control overfitting. It also makes the network invariant to small transformations and helps to arrive at an almost scale-invariant representation of the image.
4. Classification
    In the last step, an MLP is combined with, e.g., a Softmax activation function to output class probabilities, i.e. what is displayed in the image. 
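
The four operations above can be sketched on a one-dimensional series, which is the form they would take for time series rather than images. This is a rough illustration only: the helper names (`conv1d`, `relu`, `max_pool1d`, `softmax`), the hand-picked kernel, the toy input, and the two hypothetical output classes are assumptions, and the dense layer is random and untrained.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1D convolution (cross-correlation, without flipping the kernel)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def softmax(z):
    """Turn arbitrary scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

x = np.array([1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0, 5.0, 6.0])
kernel = np.array([-1.0, 0.0, 1.0])  # responds to rising values

feature_map = conv1d(x, kernel)       # [ 3.  1. -3. -3.  1.  5.  4.]
rectified = relu(feature_map)         # [3. 1. 0. 0. 1. 5. 4.]
pooled = max_pool1d(rectified, 2)     # [3. 0. 5.]

# classification: an untrained dense layer with two hypothetical classes
rng = np.random.default_rng(0)
W = rng.normal(size=(len(pooled), 2))
probs = softmax(pooled @ W)

print(feature_map, rectified, pooled, probs)
```

In a trained CNN the kernel values are learned rather than hand-picked, and many kernels run in parallel to produce a Feature Map with depth.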
        

#### Recurrent Neural Networks (RNN) 
Sources:
- https://machinelearningmastery.com/promise-recurrent-neural-networks-time-series-forecasting/

RNNs (such as LSTMs) explicitly handle the order of observations when learning a mapping function from inputs to outputs. For example, the network can be laid out so that the connection between the input layer and the first encoder layer is dense, the connections within the network are recurrent rather than dense, and the connection between the decoder and the output layer is dense again. That way the encoder layer learns to "read" the data and produces an output, but the decoder layer ignores that output. It only receives the internal state (the "thought" process) that the encoder built up and continues from there.

The promise of recurrent neural networks is that the temporal dependence in the input data can be learned, so a fixed set of lagged observations does not need to be specified. In an encoder-decoder layout, rather than having a single multi-tasking cell, the model uses two specialised cells: one for memorising important events of the past (encoder) and one for converting those events into a prediction of the future (decoder).
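
The encoder/decoder data flow described above can be sketched with a plain recurrent cell in NumPy. This is a rough sketch under assumptions: the names `rnn_step`, `encode`, `decode`, and `init_params` are hypothetical, a simple `tanh` cell stands in for an LSTM, and the weights are random and untrained, so the forecast values themselves carry no meaning.

```python
import numpy as np

def init_params(n_in, n_hidden, rng, with_output=False):
    """Random (untrained) weights for one recurrent cell."""
    params = {
        "W_x": rng.normal(0.0, 0.5, size=(n_hidden, n_in)),
        "W_h": rng.normal(0.0, 0.5, size=(n_hidden, n_hidden)),
        "b": np.zeros(n_hidden),
    }
    if with_output:  # dense connection from the decoder state to the output
        params["W_out"] = rng.normal(0.0, 0.5, size=(1, n_hidden))
        params["b_out"] = np.zeros(1)
    return params

def rnn_step(x_t, h_prev, p):
    """One recurrent step: the new state depends on the input and the old state."""
    return np.tanh(p["W_x"] @ x_t + p["W_h"] @ h_prev + p["b"])

def encode(sequence, p):
    """Read the observations in order and return only the final state."""
    h = np.zeros(p["b"].shape[0])
    for x_t in sequence:
        h = rnn_step(np.atleast_1d(x_t), h, p)
    return h

def decode(h, p, n_steps):
    """Start from the encoder state and feed each forecast back in as input."""
    forecasts = []
    y_prev = np.zeros(1)
    for _ in range(n_steps):
        h = rnn_step(y_prev, h, p)
        y_prev = p["W_out"] @ h + p["b_out"]
        forecasts.append(float(y_prev[0]))
    return forecasts

rng = np.random.default_rng(0)
encoder = init_params(n_in=1, n_hidden=4, rng=rng)
decoder = init_params(n_in=1, n_hidden=4, rng=rng, with_output=True)

sequence = [0.1, 0.4, 0.3, 0.8]
state = encode(sequence, encoder)
state_reversed = encode(sequence[::-1], encoder)
forecast = decode(state, decoder, n_steps=3)

print("encoder state:", state)
print("state for reversed input:", state_reversed)
print("3-step forecast:", forecast)
```

Feeding the same values in reverse order yields a different encoder state, which is the sense in which the order of observations is handled explicitly. The sequence can have any length, so no fixed number of lags has to be chosen.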


## Lesson 02: How to Transform Data for Time Series

In this lesson, you will discover how to transform your time series data into a supervised learning format.

The majority of practical machine learning uses supervised learning.

Supervised learning is where you have input variables (X) and an output variable (y) and you use an algorithm to learn the mapping function from the input to the output. The goal is to approximate the real underlying mapping so well that when you have new input data, you can predict the output variables for that data.

Time series data can be phrased as supervised learning.

Given a sequence of numbers for a time series dataset, we can restructure the data to look like a supervised learning problem. We can do this by using previous time steps as input variables and the next time step as the output variable.
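
As a small worked example of this restructuring, the sketch below turns a toy sequence into input/output pairs using plain Python lists. The function name `series_to_supervised`, the parameter `n_inputs`, and the toy values are assumptions made for illustration, not part of the course material.

```python
def series_to_supervised(values, n_inputs):
    """Split a sequence into (X, y): n_inputs previous steps -> next step."""
    X, y = [], []
    for i in range(len(values) - n_inputs):
        X.append(list(values[i:i + n_inputs]))  # previous time steps (inputs)
        y.append(values[i + n_inputs])          # next time step (output)
    return X, y

X, y = series_to_supervised([10, 20, 30, 40, 50, 60], n_inputs=3)
for inputs, target in zip(X, y):
    print(inputs, "->", target)
# [10, 20, 30] -> 40
# [20, 30, 40] -> 50
# [30, 40, 50] -> 60
```

A sequence of length n yields n - `n_inputs` rows, because the first `n_inputs` values have no complete history.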

### Your Task

For this lesson you must develop Python code to transform the daily female births dataset into a supervised learning format with some number of inputs and one output.

You can download the dataset from here: daily-total-female-births.csv


### Answer


In [3]:
import pandas as pd

df = pd.read_csv('./data/time_series_course_data/daily-total-female-births.csv')
df.head()

         Date  Births
0  1959-01-01      35
1  1959-01-02      32
2  1959-01-03      30
3  1959-01-04      31
4  1959-01-05      44


In [44]:
import numpy as np

# define the window width
window_size = 3

# make a test series to develop algorithm
s = pd.Series(np.arange(1,100,1))

# loop to collect one window per row, then build the dataframe in one go
# (DataFrame.append was removed in pandas 2.0)
rows = []
for ii in range(0, len(s) - window_size):
    t = s.shift(-ii).values[0:window_size+1]
    rows.append(t)
df_window = pd.DataFrame(np.vstack(rows))
    
#     print(t, np.isnan(t).any()) # I used this to check if one of the rows contain a nan
print("Head of df:")
print(df_window.head(),"\n")

print("Tail of df:")
print(df_window.tail(),"\n")

Head of df:
     0    1    2    3
0  1.0  2.0  3.0  4.0
1  2.0  3.0  4.0  5.0
2  3.0  4.0  5.0  6.0
3  4.0  5.0  6.0  7.0
4  5.0  6.0  7.0  8.0 

Tail of df:
       0     1     2     3
91  92.0  93.0  94.0  95.0
92  93.0  94.0  95.0  96.0
93  94.0  95.0  96.0  97.0
94  95.0  96.0  97.0  98.0
95  96.0  97.0  98.0  99.0 



In [None]:
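# make a function

def transformDataSetTimeSeries(s, window_size=3):
    """
    Takes a series as an input and returns a dataframe in which each row
    holds window_size consecutive values as inputs plus the next value as output.
    """
    values = pd.Series(s).to_numpy()
    rows = [values[ii:ii + window_size + 1] for ii in range(len(values) - window_size)]
    return pd.DataFrame(rows)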

In [6]:
df['Births']

0      35
1      32
2      30
3      31
4      44
5      29
6      45
7      43
8      38
9      27
10     38
11     33
12     55
13     47
14     45
15     37
16     50
17     43
18     41
19     52
20     34
21     53
22     39
23     32
24     37
25     43
26     39
27     35
28     44
29     38
       ..
335    32
336    46
337    41
338    34
339    33
340    36
341    49
342    43
343    43
344    34
345    39
346    35
347    52
348    47
349    52
350    39
351    40
352    42
353    42
354    53
355    39
356    40
357    38
358    44
359    34
360    37
361    52
362    48
363    55
364    50
Name: Births, Length: 365, dtype: int64