## Time series to supervised learning conversion

### Adapted from:
- https://machinelearningmastery.com/time-series-forecasting-supervised-learning/
- https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

In [1]:
%reset -f

import pandas as pd
import numpy as np

In [4]:
def series_to_supervised(data, n_in=1, n_out=1, drop_nan=True):
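    """
    Frame a time series as a supervised learning dataset.

    Arguments:
        data: Sequence of observations as a list or 2-D NumPy array.
        n_in: Number of lag observations used as input (X).
        n_out: Number of observations used as output (y).
        drop_nan: Whether to drop rows containing NaN values.

    Returns:
        pandas DataFrame with one column per variable and time step.
    """

    df = pd.DataFrame(data)
    # Count columns on the DataFrame so lists and 1-D arrays also work
    n_vars = df.shape[1]
    cols, names = list(), list()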
    
    # Input sequence --> t-n, t+1-n, ..., t-1
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
        
    # Forecast sequence --> t, t+1, ..., t+n-1
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
            
    # Aggregate data
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    
    # Drop rows containing NaN values
    if drop_nan:
        agg.dropna(inplace=True)
    
    return agg
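
The function relies on `DataFrame.shift` to line up past and future values with the current row. As a minimal sketch (the toy values and the `lagged`, `ahead`, and `frame` names are illustrative assumptions, not part of the notebook), this is what a single lag and a single lead look like before the NaN rows are dropped:

```python
import pandas as pd

# Toy series; the values are illustrative only
series = pd.DataFrame({"var1": [10, 20, 30, 40]})

lagged = series.shift(1)    # value at t-1 aligned with row t; first row becomes NaN
ahead = series.shift(-1)    # value at t+1 aligned with row t; last row becomes NaN

frame = pd.concat([lagged, series, ahead], axis=1)
frame.columns = ["var1(t-1)", "var1(t)", "var1(t+1)"]
print(frame)

# Only rows with a complete history and future survive dropna()
print(frame.dropna())
```

The first and last rows contain NaN because there is no observation before the start or after the end of the series, which is why the function drops them by default.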


## One-Step Univariate Forecasting

- It is standard practice in time series forecasting to use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

- This is called one-step forecasting.

The example below uses three lagged time steps (t-3, t-2, t-1) to predict the current time step (t).

In [5]:
dataTimeSeries = [x for x in range(10)]
dataSupervisedOneStep = series_to_supervised(dataTimeSeries, 3)

print(dataTimeSeries)
print("------------")
print(dataSupervisedOneStep)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
------------
   var1(t-3)  var1(t-2)  var1(t-1)  var1(t)
3        0.0        1.0        2.0        3
4        1.0        2.0        3.0        4
5        2.0        3.0        4.0        5
6        3.0        4.0        5.0        6
7        4.0        5.0        6.0        7
8        5.0        6.0        7.0        8
9        6.0        7.0        8.0        9
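
Once the series is framed this way, the lag columns serve as model inputs and the `var1(t)` column as the target. The sketch below rebuilds the same three-lag frame with plain shifts so that it runs by itself, then splits it; the names `X` and `y` are assumptions chosen for illustration:

```python
import pandas as pd

series = pd.DataFrame(list(range(10)))

# Rebuild the three-lag frame shown above with plain shifts
frame = pd.concat([series.shift(3), series.shift(2), series.shift(1), series], axis=1)
frame.columns = ["var1(t-3)", "var1(t-2)", "var1(t-1)", "var1(t)"]
frame = frame.dropna()

# Lag columns are the inputs, the current time step is the target
X = frame.drop(columns=["var1(t)"]).values
y = frame["var1(t)"].values

print(X.shape, y.shape)
```

Each row of `X` holds the three previous observations and the matching entry of `y` holds the value to predict.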


## Multi-Step or Sequence Forecasting

- A different type of forecasting problem is using past observations to forecast a sequence of future observations.

- This may be called sequence forecasting or multi-step forecasting.

- We can frame a time series for sequence forecasting by setting the third argument, n_out, to the number of time steps to forecast. For example, an input sequence of 2 past observations (n_in=2) used to forecast 2 future observations (n_out=2) is framed as follows:

In [6]:
dataSupervisedMultiStep = series_to_supervised(dataTimeSeries, 2, 2)
print(dataSupervisedMultiStep)

   var1(t-2)  var1(t-1)  var1(t)  var1(t+1)
2        0.0        1.0        2        3.0
3        1.0        2.0        3        4.0
4        2.0        3.0        4        5.0
5        3.0        4.0        5        6.0
6        4.0        5.0        6        7.0
7        5.0        6.0        7        8.0
8        6.0        7.0        8        9.0
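
Note that the output has 7 rows rather than 10: the first `n_in` rows lack a full history and the last `n_out - 1` rows lack a full future. A small check of that row count, written with plain shifts so it runs on its own (the variable names are illustrative assumptions):

```python
import pandas as pd

n_obs, n_in, n_out = 10, 2, 2
series = pd.DataFrame(list(range(n_obs)))

# Lagged inputs t-n_in ... t-1, then outputs t ... t+n_out-1
cols = [series.shift(i) for i in range(n_in, 0, -1)]
cols += [series.shift(-i) for i in range(n_out)]
frame = pd.concat(cols, axis=1).dropna()

# Rows lost: n_in at the start and n_out - 1 at the end
expected_rows = n_obs - n_in - (n_out - 1)
print(len(frame), expected_rows)
```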


## Multivariate Forecasting

- Another important type of time series is called multivariate time series.

- This is where we may have observations of multiple different measures and an interest in forecasting one or more of them.

- For example, we may have two sets of time series observations obs1 and obs2 and we wish to forecast one or both of these.

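- We can call series_to_supervised() in exactly the same way, passing the observations as a 2-D array. With the default n_in=1 and n_out=1, each row pairs the previous values of both series with their current values.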

In [7]:
dataTimeSeriesMultiVariate = pd.DataFrame()
dataTimeSeriesMultiVariate['Obs1'] = [x for x in range(10)]
dataTimeSeriesMultiVariate['Obs2'] = [x for x in range(50, 60)]

dataSupervisedOneStepMultiVariate = series_to_supervised(dataTimeSeriesMultiVariate.values)

print(dataTimeSeriesMultiVariate)
print("-----------------------")
print(dataSupervisedOneStepMultiVariate)

   Obs1  Obs2
0     0    50
1     1    51
2     2    52
3     3    53
4     4    54
5     5    55
6     6    56
7     7    57
8     8    58
9     9    59
-----------------------
   var1(t-1)  var2(t-1)  var1(t)  var2(t)
1        0.0       50.0        1       51
2        1.0       51.0        2       52
3        2.0       52.0        3       53
4        3.0       53.0        4       54
5        4.0       54.0        5       55
6        5.0       55.0        6       56
7        6.0       56.0        7       57
8        7.0       57.0        8       58
9        8.0       58.0        9       59
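
If only one of the series is to be forecast from past values alone, the lagged columns can be kept as inputs and a single column at time t used as the target. A minimal sketch, assuming Obs2 is the series of interest (the choice of target and the names `X` and `y` are assumptions for illustration):

```python
import pandas as pd

obs = pd.DataFrame({"Obs1": list(range(10)), "Obs2": list(range(50, 60))})

# One-step multivariate frame, rebuilt with plain shifts
frame = pd.concat([obs.shift(1), obs], axis=1)
frame.columns = ["var1(t-1)", "var2(t-1)", "var1(t)", "var2(t)"]
frame = frame.dropna()

# Inputs: both series at t-1; target: the second series at t
X = frame[["var1(t-1)", "var2(t-1)"]]
y = frame["var2(t)"]

print(X.head(3))
print(y.head(3))
```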


In [8]:
# Treat both lagged values and var1(t) as inputs (X1..X3) and var2(t) as the target (Y)
dataSupervisedOneStepMultiVariate.columns = ['X1', 'X2', 'X3', 'Y']
print(dataSupervisedOneStepMultiVariate)

    X1    X2  X3   Y
1  0.0  50.0   1  51
2  1.0  51.0   2  52
3  2.0  52.0   3  53
4  3.0  53.0   4  54
5  4.0  54.0   5  55
6  5.0  55.0   6  56
7  6.0  56.0   7  57
8  7.0  57.0   8  58
9  8.0  58.0   9  59


In [9]:
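def conversion(inData, outData, inNames, outName='Y', n_in=1):
    """
    Frame input series and one output series as a supervised learning dataset,
    including lagged copies of both the inputs and the output.
    """
    dfInput = pd.DataFrame(inData)
    dfOutput = pd.DataFrame(outData)

    # Count columns on the DataFrame so lists and 1-D arrays also work
    n_vars = dfInput.shape[1]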
    
    cols, names = list(), list()
    
    for i in range(n_in, 0, -1):
        cols.append(dfInput.shift(i))
        cols.append(dfOutput.shift(i))
        names += [('%s(t-%d)' % (inNames[j], i)) for j in range(n_vars)]
        names += [('%s(t-%d)' % (outName, i))]
    
    cols.append(dfInput)
    cols.append(dfOutput)
    names += [('%s(t)' % inNames[j]) for j in range(n_vars)]
    names += [('%s(t)' % outName)]
    
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    
    # Drop rows containing NaN values
    agg.dropna(inplace=True)
    return agg  
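
The small table below is consistent with a call on three input series A, B, C and a binary output Y with two lags. The notebook does not show that call, so the inputs in this sketch are an assumption inferred from the table values; the function is restated in compact form only so that the snippet runs by itself:

```python
import numpy as np
import pandas as pd

def conversion(inData, outData, inNames, outName='Y', n_in=1):
    # Compact restatement of the function above so the snippet is self-contained
    dfInput, dfOutput = pd.DataFrame(inData), pd.DataFrame(outData)
    n_vars = dfInput.shape[1]
    cols, names = [], []
    for i in range(n_in, 0, -1):
        cols += [dfInput.shift(i), dfOutput.shift(i)]
        names += ['%s(t-%d)' % (inNames[j], i) for j in range(n_vars)]
        names += ['%s(t-%d)' % (outName, i)]
    cols += [dfInput, dfOutput]
    names += ['%s(t)' % inNames[j] for j in range(n_vars)] + ['%s(t)' % outName]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    return agg.dropna()

# Assumed toy inputs, inferred from the table rather than taken from the notebook
toyIn = np.array([[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]])
toyOut = [1, 0, 0, 1]

toyFrame = conversion(toyIn, toyOut, ['A', 'B', 'C'], 'Y', 2)
print(toyFrame)
```

With four observations and two lags, only the rows with index 2 and 3 have a complete history, so the result has two rows.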


   A(t-2)  B(t-2)  C(t-2)  Y(t-2)  A(t-1)  B(t-1)  C(t-1)  Y(t-1)  A(t)  B(t)  C(t)  Y(t)
2     1.0     5.0     9.0     1.0     2.0     6.0    10.0     0.0     3     7    11     0
3     2.0     6.0    10.0     0.0     3.0     7.0    11.0     0.0     4     8    12     1


In [10]:
eegData = np.genfromtxt('EEGEyeState.arff.csv', delimiter=',', skip_header=1)
eegHeader = np.genfromtxt('EEGEyeState.arff.csv', delimiter=',', max_rows=1, dtype=str)

eegInData = eegData[:, 0:14]
eegOutData = eegData[:, 14]

eegInNames = eegHeader[0:14]
eegOutName = eegHeader[14]

conversion(eegInData, eegOutData, eegInNames, eegOutName, 2)


    AF3(t-2)  F7(t-2)  F3(t-2)  FC5(t-2)  T7(t-2)  P7(t-2)  O1(t-2)  O2(t-2)  P8(t-2)  T8(t-2)  ...    P7(t)    O1(t)    O2(t)    P8(t)    T8(t)   FC6(t)    F4(t)    F8(t)   AF4(t)  eyeDetection(t)
 2   4329.23  4009.23  4289.23   4148.21  4350.26  4586.15  4096.92  4641.03  4222.05  4238.46  ...  4583.59  4096.92  4630.26  4207.69  4222.05  4206.67  4282.05  4628.72  4389.23              0.0
 3   4324.62  4004.62  4293.85   4148.72  4342.05  4586.67  4097.44  4638.97  4210.77  4226.67  ...  4582.56  4097.44  4630.77  4217.44  4235.38  4210.77  4287.69  4632.31  4396.41              0.0
 4   4327.69  4006.67  4295.38   4156.41  4336.92  4583.59  4096.92  4630.26  4207.69  4222.05  ...  4586.67  4095.90  4627.69  4210.77  4244.10  4212.82  4288.21  4632.82  4398.46              0.0
 5   4328.72  4011.79  4296.41   4155.90  4343.59  4582.56  4097.44  4630.77  4217.44  4235.38  ...  4587.18  4093.33  4616.92  4202.56  4232.82  4209.74  4281.03  4628.21  4389.74              0.0
 6   4326.15  4011.79  4292.31   4151.28  4347.69  4586.67  4095.90  4627.69  4210.77  4244.10  ...  4584.62  4089.74  4615.90  4212.31  4226.67  4201.03  4269.74  4625.13  4378.46              0.0
 7   4321.03  4004.62  4284.10   4153.33  4345.64  4587.18  4093.33  4616.92  4202.56  4232.82  ...  4583.08  4087.18  4614.87  4205.64  4230.26  4195.90  4266.67  4622.05  4380.51              0.0
 8   4319.49  4001.03  4280.51   4151.79  4343.59  4584.62  4089.74  4615.90  4212.31  4226.67  ...  4584.10  4091.28  4608.21  4187.69  4229.74  4202.05  4273.85  4627.18  4389.74              0.0
 9   4325.64  4006.67  4278.46   4143.08  4344.10  4583.08  4087.18  4614.87  4205.64  4230.26  ...  4582.56  4092.82  4608.72  4194.36  4228.72  4212.82  4277.95  4637.44  4393.33              0.0
10   4326.15  4010.77  4276.41   4139.49  4345.13  4584.10  4091.28  4608.21  4187.69  4229.74  ...  4579.49  4087.69  4615.90  4206.15  4228.72  4210.77  4272.82  4631.79  4382.56              0.0
11   4326.15  4011.28  4276.92   4142.05  4344.10  4582.56  4092.82  4608.72  4194.36  4228.72  ...  4581.54  4086.15  4615.38  4195.90  4223.59  4197.44  4262.05  4613.33  4370.77              0.0
