# Introduction

<div class="alert alert-block alert-warning">
<font color=black><br>

**What?** Feature engineering for time series

<br></font>
</div>

# Three types of features

<div class="alert alert-block alert-info">
<font color=black><br>

- **Date Time Features**: these are components of the time step itself for each observation.
- **Lag Features**: these are values at prior time steps.
- **Window Features**: these are a summary of values over a fixed window of prior time steps.

<br></font>
</div>

# Import modules

In [2]:
from pandas import read_csv
from pandas import DataFrame
from pandas import concat

# Import dataset

In [3]:
series = read_csv('../DATASETS/daily-min-temperatures.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')  # read_csv's squeeze argument was removed in pandas 2.0

# Date Time Features

In [4]:

dataframe = DataFrame()
dataframe['month'] = series.index.month
dataframe['day'] = series.index.day
dataframe['temperature'] = series.values  # raw values, avoiding positional lookups on a date index
print(dataframe.head(5))

   month  day  temperature
0      1    1         20.7
1      1    2         17.9
2      1    3         18.8
3      1    4         14.6
4      1    5         15.8


<div class="alert alert-block alert-info">
<font color=black><br>

- Month and day alone are weak predictors of temperature and will likely result in a poor model.
- Nevertheless, combined with additional engineered features, this information may ultimately contribute to a better model.

<br></font>
</div>
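
Other calendar components can be derived in the same way. The sketch below is only an illustration: it reuses the first five temperatures printed above, assumes the series starts on 1981-01-01 (the year is an assumption, since the output shows only month and day), and uses hypothetical column names.

```python
import pandas as pd

# First five temperatures from the output above; the year 1981 is an assumption.
index = pd.date_range('1981-01-01', periods=5, freq='D')
sample = pd.Series([20.7, 17.9, 18.8, 14.6, 15.8], index=index)

features = pd.DataFrame({
    'month': sample.index.month,
    'day': sample.index.day,
    'dayofweek': sample.index.dayofweek,  # Monday=0 ... Sunday=6
    'dayofyear': sample.index.dayofyear,  # 1 for 1 January
    'temperature': sample.values,
})
print(features)
```

Whether such features help depends on the problem; day of week, for instance, is unlikely to matter for temperature but often matters for sales or traffic data.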

# Lag Features

<div class="alert alert-block alert-info">
<font color=black><br>

- Adding lag features is called the sliding window method, in this case with a window width of 1.
- It is as though we slide our focus along the time series and, for each observation, consider only the values that fall within the window width.

<br></font>
</div>

In [5]:
temps = DataFrame(series.values)
dataframe = concat([temps.shift(1), temps], axis=1) 
dataframe.columns = ['t', 't+1'] 
print(dataframe.head(5))

      t   t+1
0   NaN  20.7
1  20.7  17.9
2  17.9  18.8
3  18.8  14.6
4  14.6  15.8


<div class="alert alert-block alert-info">
<font color=black><br>

- We can expand the window width and include more lagged features.
- For example, the code below modifies the previous case to use the last 3 observed values to predict the value at the next time step.

<br></font>
</div>

In [6]:
temps = DataFrame(series.values)
dataframe = concat([temps.shift(3), temps.shift(2), temps.shift(1), temps], axis=1) 
dataframe.columns = ['t-2', 't-1', 't', 't+1']
print(dataframe.head(5))

    t-2   t-1     t   t+1
0   NaN   NaN   NaN  20.7
1   NaN   NaN  20.7  17.9
2   NaN  20.7  17.9  18.8
3  20.7  17.9  18.8  14.6
4  17.9  18.8  14.6  15.8


<div class="alert alert-block alert-info">
<font color=black><br>

- A difficulty with the sliding window approach is deciding how large to make the window for your problem.
- A good starting point is a sensitivity analysis: try a suite of window widths, build a view of the dataset for each, and see which yields the better-performing models.
- There will be a point of diminishing returns, beyond which wider windows add little.

<br></font>
</div>
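
The sensitivity analysis described above could be sketched as follows. This is a minimal illustration, not a recommended evaluation procedure: `make_lag_frame` is a hypothetical helper, the six values are the first observations that appear in this notebook's printed outputs, and the "model" is simply the mean of the lagged values, standing in for whatever estimator you would actually fit.

```python
import pandas as pd

def make_lag_frame(values, width):
    """Hypothetical helper: `width` lagged columns plus the target column 't+1'."""
    s = pd.Series(values, dtype=float)
    # shift(k) places the value observed k steps earlier on each row
    columns = {f'lag_{k}': s.shift(k) for k in range(width, 0, -1)}
    columns['t+1'] = s
    # Rows without a full window contain NaN and are dropped
    return pd.DataFrame(columns).dropna().reset_index(drop=True)

values = [20.7, 17.9, 18.8, 14.6, 15.8, 15.8]

for width in (1, 2, 3):
    frame = make_lag_frame(values, width)
    # Naive stand-in forecast: the mean of the lagged values
    prediction = frame.drop(columns='t+1').mean(axis=1)
    mae = (frame['t+1'] - prediction).abs().mean()
    print(f'width={width} rows={len(frame)} MAE={mae:.3f}')
```

Note that wider windows leave fewer usable rows, so on a real dataset the comparison should be made on the same held-out period for every width.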

# Rolling Window Statistics

<div class="alert alert-block alert-info">
<font color=black><br>

- We can calculate the mean of the previous two values and use it to predict the next value.
- For the temperature data, two observations must be available before the first mean can be computed, so the first usable feature appears at the 3rd time step, where the mean of the first 2 values is used to predict the 3rd.

<br></font>
</div>

In [9]:
temps = DataFrame(series.values)
shifted = temps.shift(1)
window = shifted.rolling(window=2)
means = window.mean()
dataframe = concat([means, temps], axis=1) 
dataframe.columns = ['mean(t-1,t)', 't+1']
print(dataframe.head(5))

   mean(t-1,t)   t+1
0          NaN  20.7
1          NaN  17.9
2        19.30  18.8
3        18.35  14.6
4        16.70  15.8


<div class="alert alert-block alert-info">
<font color=black><br>

- The third row shows the expected value of 19.30 (the mean of 20.7 and 17.9), which is used to predict the 3rd value in the series, 18.8.
- The next example uses a window width of 3 and builds a dataset with more summary statistics: the minimum, mean, and maximum value in the window.

<br></font>
</div>

In [10]:
temps = DataFrame(series.values)
width = 3
shifted = temps.shift(1)  # shift by one step so each window ends at the observation just before the target
window = shifted.rolling(window=width)
dataframe = concat([window.min(), window.mean(), window.max(), temps], axis=1)
dataframe.columns = ['min', 'mean', 'max', 't+1']
print(dataframe.head(5))

    min       mean   max   t+1
0   NaN        NaN   NaN  20.7
1   NaN        NaN   NaN  17.9
2   NaN        NaN   NaN  18.8
3  17.9  19.133333  20.7  14.6
4  14.6  17.100000  18.8  15.8


<div class="alert alert-block alert-info">
<font color=black><br>

- We can spot-check the correctness of the values on the 4th row (array index 3).
- Indeed, 17.9 is the minimum and 20.7 is the maximum of the values in the window [20.7, 17.9, 18.8], which are the three observations immediately preceding the target 14.6.

<br></font>
</div>

# Expanding Window Statistics

<div class="alert alert-block alert-info">
<font color=black><br>

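- An expanding window includes all observations up to and including the current time step, rather than a fixed number of them.
- Below is an example of calculating the minimum, mean, and maximum values of the expanding window on the daily temperature dataset.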

<br></font>
</div>

In [11]:
temps = DataFrame(series.values)
window = temps.expanding()
dataframe = concat([window.min(), window.mean(), window.max(), temps.shift(-1)], axis=1) 
dataframe.columns = ['min', 'mean', 'max', 't+1']
print(dataframe.head(5))

    min       mean   max   t+1
0  20.7  20.700000  20.7  17.9
1  17.9  19.300000  20.7  18.8
2  17.9  19.133333  20.7  14.6
3  14.6  18.000000  20.7  15.8
4  14.6  17.560000  20.7  15.8
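
As a sanity check on the expanding statistics above, the following sketch recomputes them with cumulative operations on the first five temperatures. It is only an illustration of what `expanding()` computes; the variable names are arbitrary.

```python
import pandas as pd

# First five temperatures from the outputs above
temps = pd.Series([20.7, 17.9, 18.8, 14.6, 15.8])

expanding_mean = temps.expanding().mean()
# Running sum divided by the number of observations seen so far
manual_mean = temps.cumsum() / pd.Series(range(1, len(temps) + 1))

expanding_min = temps.expanding().min()
manual_min = temps.cummin()  # smallest value seen so far

expanding_max = temps.expanding().max()
manual_max = temps.cummax()  # largest value seen so far

print(pd.concat([expanding_min, manual_min, expanding_mean, manual_mean], axis=1))
```

The two approaches should agree up to floating-point rounding, which makes the cumulative form a convenient cross-check.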


# References

<div class="alert alert-warning">
<font color=black>

- https://machinelearningmastery.com/?s=time+series&post_type=post&submit=Search
- The dataset can be downloaded from: https://github.com/jbrownlee/Datasets

</font>
</div>