# Re-framing time series as supervised learning problems

From a single sequence to pairs of input and output sequences.

Supervised learning concept:   
https://machinelearningmastery.com/time-series-forecasting-supervised-learning/

Example:  
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/


In [94]:
import pandas as pd
import numpy as np

## Time series vs supervised learning (Concepts)

**Time series:** A sequence of numbers ordered by a time index  

**Supervised learning:** A supervised learning problem has input patterns (X) and output patterns or variables (y). We want to predict y from X, and we use an algorithm to learn the mapping function.  

y = f(X)  

**Categories of supervised learning:**
- Classification
- Regression
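
As a rough illustration of learning the mapping y = f(X), the sketch below fits a straight line to made-up data with `np.polyfit`. The toy data, the linear form of the mapping, and the helper name `f` are assumptions chosen for illustration; this is a regression example, not a method prescribed by these notes.

```python
import numpy as np

# Hypothetical toy data: the outputs follow y = 2*X + 1 exactly
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 1.0

# "Learn" the mapping by fitting a degree-1 polynomial with least squares
slope, intercept = np.polyfit(X, y, deg=1)

def f(x):
    """Learned mapping from an input value to a predicted output."""
    return slope * x + intercept

# Predict the output for an input that was not in the training data
prediction = f(5.0)
print(round(prediction, 2))
```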

**Sliding window (Lag method):** The use of prior steps to predict the next step.

**Window width:** The number of previous time steps used as input.

**Univariate time series:** Only a single variable is observed at each time

```
# Univariate, two-step forecasting. Window width of 1 (one lagged input, X1)
X1,  y1,  y2
?,   100, 110
100, 110, 108
110, 108, 115
108, 115, 120
115, 120, ?
120, ?,    ?
```
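
The complete rows of the table above (those without `?`) can be reproduced with a sliding window. The sketch below uses `sliding_window_view`, which is available in NumPy 1.20 and later; the variable names `n_in` and `n_out` are borrowed from later in these notes, and the approach is only one possible way to build the rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# The series used in the table above
series = np.array([100, 110, 108, 115, 120])

n_in = 1   # number of prior steps used as input (X1)
n_out = 2  # number of future steps to forecast (y1, y2)

# Each row holds n_in + n_out consecutive observations
windows = sliding_window_view(series, n_in + n_out)

X = windows[:, :n_in]   # input columns
y = windows[:, n_in:]   # output columns

print(X.tolist())
print(y.tolist())
```

Rows that would contain `?` are never produced, because only full windows are returned.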

**Multivariate time series:** Two or more variables observed at each time
```
# Multivariate
X1,  X2, y1,  y2
?,   ?,  0.2, 88
0.2, 88, 0.5, 89
0.5, 89, 0.7, 87
0.7, 87, 0.4, 88
0.4, 88, 1.0, 90
1.0, 90, ?,    ?
```
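
A minimal sketch of how the multivariate table above could be built with pandas follows. The column names `measure1` and `measure2` are hypothetical labels for the two observed variables; the values are the ones shown in the table.

```python
import pandas as pd

# Two variables observed at each time step (names are illustrative)
df = pd.DataFrame({
    "measure1": [0.2, 0.5, 0.7, 0.4, 1.0],
    "measure2": [88, 89, 87, 88, 90],
})

# Inputs: both variables at the previous time step
lagged = df.shift(1)
lagged.columns = ["X1", "X2"]

# Outputs: both variables at the current time step
current = df.copy()
current.columns = ["y1", "y2"]

# Join side by side and drop the first row, which has no previous step
framed = pd.concat([lagged, current], axis=1).dropna()
print(framed)
```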

## Convert a Time Series to Supervised Learning in Python

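The pandas `shift()` function moves a column down (positive argument) or up (negative argument) by a given number of rows. This creates lagged or forward-looking copies of a series.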

```
df['t-1'] = df['t'].shift(1)        # Past observations
df['Col+1'] = df['Col'].shift(-1)   # Forecasts (like in example above)
```

We use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).
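
To make the effect of `shift()` concrete, here is a small runnable sketch. The four sample values and the column names `t-1` and `t+1` are made up for illustration.

```python
import pandas as pd

# Hypothetical series with four observations
df = pd.DataFrame({"t": [10, 20, 30, 40]})

df["t-1"] = df["t"].shift(1)    # previous observation; the first row becomes NaN
df["t+1"] = df["t"].shift(-1)   # next observation; the last row becomes NaN
print(df)

# Only rows with a complete set of inputs and outputs are usable for training
supervised = df.dropna()
print(supervised)
```

Shifting introduces NaN values at the edges, which is why the rows at the start and end of the series are usually dropped.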


### The `series_to_supervised()` Function

#### Arguments & results

* **data:** Sequence of observations as a list or 2D NumPy array. Required.
* **n_in:** Number of lag observations as input (X). Values may be in the range [1..len(data)]. Optional, defaults to 1.
* **n_out:** Number of observations as output (y). Values may be in the range [0..len(data)-1]. Optional, defaults to 1.
* **dropnan:** Boolean, whether or not to drop rows with NaN values. Optional, defaults to True.

* **return:** pandas DataFrame of the series framed for supervised learning.


In [95]:
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	"""
	Frame a time series as a supervised learning dataset.
	Arguments:
		data: Sequence of observations as a list or NumPy array.
		n_in: Number of lag observations as input (X).
		n_out: Number of observations as output (y).
		dropnan: Boolean whether or not to drop rows with NaN values.
	Returns:
		Pandas DataFrame of series framed for supervised learning.
	"""
	# build the DataFrame first so that lists, nested lists and arrays are all handled
	df = pd.DataFrame(data)
	n_vars = df.shape[1]
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = pd.concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

In [96]:
# Univariate sample
values = np.around(np.random.default_rng().random(10) + np.arange(0,10), 1)
values = values.tolist()  # plain Python floats, so the printed list stays readable
print(values)

[0.1, 1.8, 2.2, 3.6, 4.3, 5.5, 6.5, 7.1, 8.4, 9.8]


In [97]:
# Example of a t-1 (lagged) df, framed to predict the current time (t)
print(series_to_supervised(values))

   var1(t-1)  var1(t)
1        0.1      1.8
2        1.8      2.2
3        2.2      3.6
4        3.6      4.3
5        4.3      5.5
6        5.5      6.5
7        6.5      7.1
8        7.1      8.4
9        8.4      9.8


In [98]:
# Example with two lagged inputs (t-2, t-1)
print(series_to_supervised(values, n_in=2))

   var1(t-2)  var1(t-1)  var1(t)
2        0.1        1.8      2.2
3        1.8        2.2      3.6
4        2.2        3.6      4.3
5        3.6        4.3      5.5
6        4.3        5.5      6.5
7        5.5        6.5      7.1
8        6.5        7.1      8.4
9        7.1        8.4      9.8


### Multi-Step or Sequence Forecasting

**Sequence (multi-step) forecasting:** Use of a sequence of past observations to forecast a sequence of future observations.



In [99]:
# A sequence of 2 past observations to forecast 2 future observations 
print(series_to_supervised(values, n_in=2, n_out=2))

   var1(t-2)  var1(t-1)  var1(t)  var1(t+1)
2        0.1        1.8      2.2        3.6
3        1.8        2.2      3.6        4.3
4        2.2        3.6      4.3        5.5
5        3.6        4.3      5.5        6.5
6        4.3        5.5      6.5        7.1
7        5.5        6.5      7.1        8.4
8        6.5        7.1      8.4        9.8
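
Once the series is framed like this, a typical next step is to split the columns into inputs (X) and outputs (y). The sketch below rebuilds the same frame directly with `shift()` so that it runs on its own; the split by column position is an assumption about how the frame would be consumed, not part of `series_to_supervised()`.

```python
import pandas as pd

# Same sample values as printed above
values = [0.1, 1.8, 2.2, 3.6, 4.3, 5.5, 6.5, 7.1, 8.4, 9.8]
series = pd.Series(values)

# Two lagged inputs and two forecast steps, with matching column names
framed = pd.concat(
    {
        "var1(t-2)": series.shift(2),
        "var1(t-1)": series.shift(1),
        "var1(t)": series,
        "var1(t+1)": series.shift(-1),
    },
    axis=1,
).dropna()

n_in = 2  # the first n_in columns are inputs, the rest are outputs
X = framed.iloc[:, :n_in].to_numpy()
y = framed.iloc[:, n_in:].to_numpy()

print(X.shape, y.shape)
```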


### Multivariate Forecasting

In [103]:
# Multivariate sample
values = pd.DataFrame()
values['ob1'] = np.around(np.random.default_rng().random(10) + np.arange(0,10), 1)
values['ob2'] = np.around(np.random.default_rng().random(10) + np.arange(50,60), 1)
print(values)

   ob1   ob2
0  0.1  50.7
1  1.8  51.3
2  2.2  52.2
3  3.1  53.5
4  4.8  54.2
5  5.5  55.8
6  6.9  56.1
7  7.8  57.5
8  8.2  58.3
9  9.2  59.5


In [104]:
# Example of a t-1 (lagged) df, framed to predict the current time (t)
print(series_to_supervised(values.values))

   var1(t-1)  var2(t-1)  var1(t)  var2(t)
1        0.1       50.7      1.8     51.3
2        1.8       51.3      2.2     52.2
3        2.2       52.2      3.1     53.5
4        3.1       53.5      4.8     54.2
5        4.8       54.2      5.5     55.8
6        5.5       55.8      6.9     56.1
7        6.9       56.1      7.8     57.5
8        7.8       57.5      8.2     58.3
9        8.2       58.3      9.2     59.5


In [105]:
# Example of a reframed df with 1 time step as input and 2 time steps as forecast sequence.
print(series_to_supervised(values.values, 1, 2))

   var1(t-1)  var2(t-1)  var1(t)  var2(t)  var1(t+1)  var2(t+1)
1        0.1       50.7      1.8     51.3        2.2       52.2
2        1.8       51.3      2.2     52.2        3.1       53.5
3        2.2       52.2      3.1     53.5        4.8       54.2
4        3.1       53.5      4.8     54.2        5.5       55.8
5        4.8       54.2      5.5     55.8        6.9       56.1
6        5.5       55.8      6.9     56.1        7.8       57.5
7        6.9       56.1      7.8     57.5        8.2       58.3
8        7.8       57.5      8.2     58.3        9.2       59.5
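
In a multivariate setting it is common to forecast only one of the variables. As a hedged sketch, the code below uses the lagged values of both variables as inputs and keeps only `ob2` at time t as the target. The column suffixes and the choice of `ob2` as the target are assumptions for illustration; the data is the first five rows of the sample above.

```python
import pandas as pd

# First five rows of the multivariate sample shown earlier
data = pd.DataFrame({
    "ob1": [0.1, 1.8, 2.2, 3.1, 4.8],
    "ob2": [50.7, 51.3, 52.2, 53.5, 54.2],
})

# Lagged inputs and current-step values, labelled by time step
lagged = data.shift(1).add_suffix("(t-1)")
current = data.add_suffix("(t)")
framed = pd.concat([lagged, current], axis=1).dropna()

# Inputs: both variables at t-1. Target: only ob2 at t.
X = framed[["ob1(t-1)", "ob2(t-1)"]]
y = framed["ob2(t)"]

print(X)
print(y)
```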
