In [None]:
import featuretools as ft
from featuretools.primitives import RollingMean, NumericLag, RollingMin

# Feature Engineering for Time Series Problems

- Explain time series problem and how it presents a different feature engineering than other problem types.
Time series forecasting differs from other machine learning problems in that the data has an inherent temporal ordering. The ordering comes from a time index column, so at a specific point in time, we may have knowledge of earlier observations but not later ones. If the data is unordered, it would be hard to see any overall trend or seasonality, but when sorted by date, any relationships that exist in the data can be seen and used when making predictions (winter is cold; summer is hot!). Notice how this is different from non-time series data, which can be presented in any order without affecting the resulting predictions. Therefore, in time series problems, we need to set a `time_index` column that is just as important for defining our problem as the `target_column` is.

In the predict-remaining-useful-life demo, the primary impact of the data's temporal ordering on modeling occurs when splitting the data into training and test sets: a `cutoff_time` determines where that split occurs and ensures that no future values are exposed during feature engineering. We'll have to account for the same thing in this demo, since a careless split would expose future observations in the feature engineering stage.
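
A temporal split of this kind can be sketched in plain Python. The observations and the `cutoff_time` value below are made up for illustration, and sending the boundary row to the test set is an assumption of this sketch rather than a statement about how Featuretools treats cutoff times.

```python
from datetime import date, timedelta

# Hypothetical daily observations as (date, temperature) pairs, sorted by date
start = date(1990, 1, 1)
observations = [(start + timedelta(days=i), 15.0 + i % 3) for i in range(10)]

# Hypothetical split point; rows on or after this date are held out
cutoff_time = date(1990, 1, 8)

# Split by time rather than at random, so training never sees later rows
train = [row for row in observations if row[0] < cutoff_time]
test = [row for row in observations if row[0] >= cutoff_time]

print(len(train), "training rows,", len(test), "test rows")
```

Because the split is made on the time index, every training date precedes every test date.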

But once the data is split, the predict-remaining-useful-life demo calculates aggregations over the entire life of the engine up until the last available time. In that case, that makes sense! The goal is to predict one future value (the remaining useful life) per engine, so we should look at its entire available history when making that prediction. There is an entire dataframe of engines for which we're predicting this value, but the individual engines are not dependent on one another.

However, in this demo and in many time series problems, we're trying to predict a sequential series of values that are highly dependent on one another. In these cases, we can exploit the fact that more recent observations are more predictive than more distant ones--when trying to determine tomorrow's temperature, knowing today's temperature may be the most predictive piece of information we can get. We also only have one table in our dataset, so any aggregations have to be calculated over earlier data from the same column.

- Introduce dataset - weather 
We'll be working with a dataframe of minimum daily temperatures that includes two columns: `Temp` and `Date`. `Date` is our time index, and `Temp` is our target column, which means that the end goal of our feature engineering is to help us predict future temperatures.

This is a single-table dataset, so the concepts of cutoff times and last time indexes do not apply in the same way. A cutoff time assumes that each row becomes available only after its time index value; it does not account for a feature engineering window that lies entirely before the instance itself.


- Introduce problem 
**Note: this demo assumes that the data has evenly spaced intervals; support for unevenly spaced intervals is ongoing.**

The fact that we can build features from our target column comes from its temporal nature. If we are at a point in time `t`, we have access to information from times less than `t`, and we do not have information from times greater than `t`. Our limitations in feature engineering, then, will come from when exactly before `t` we have access to the data. Consider an example where we're recording data that takes a week to ingest; the most recent data we have access to is from seven days ago, or `t - 7`. We'll call this our `gap`. We also need to determine how far back in time before `t - 7` we can go. Too far back, and we may lose the potency of our recent observations, but too recent, and we may not capture the full spectrum of behaviors displayed by the data. In this example, let's say that we only want to look at 5 days' worth of data at a time. We'll call this our `window_length`.

With these two parameters (`gap` and `window_length`) set, we define our feature engineering window. We can aggregate features over this window as if it were a child DataFrame. 
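
To make the window concrete, here is a minimal sketch that lists the dates falling inside the feature engineering window for a single prediction time. The helper `feature_window` and the date used for `t` are hypothetical, and the sketch assumes daily data with a window that holds exactly `window_length` observations ending at `t - gap`.

```python
from datetime import date, timedelta

gap = 7            # days between t and the newest observation we can use
window_length = 5  # number of observations in the window


def feature_window(t, gap, window_length):
    """Return the dates in the window for time t, oldest first (hypothetical helper)."""
    newest = t - timedelta(days=gap)
    return [newest - timedelta(days=offset)
            for offset in range(window_length - 1, -1, -1)]


t = date(1990, 1, 20)  # made-up prediction time
window = feature_window(t, gap, window_length)
print(window[0], "to", window[-1])
```

For this `t`, the window runs from eleven days back to seven days back, and nothing more recent than `t - gap` is included.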



In [None]:
gap = 7
window_length = 5

- Introduce features - use pictures with windows? 

There are three types of primitives we'll focus on for time series problems. One of them will extract features from the time index, and the other two types will extract features from our target column. 

### Datetime Transform Primitives

We need a way of incorporating time into our time series features. Yes, recent temperatures are highly predictive of future temperatures, but there is also a whole host of historical data suggesting that the month of the year is a pretty good indicator of the temperature outside. However, if we look at the data, we'll see that, though the day changes, the observations are always taken at the same hour, so the `Hour` primitive is unlikely to be useful. Of course, in a dataset that is measured at an hourly or finer frequency, `Hour` may be incredibly predictive.
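
As a rough illustration of what datetime features look like, the sketch below pulls calendar components out of a single date using only the standard library. It is not Featuretools code: the helper `datetime_features` is hypothetical, and the weekday numbering shown is Python's own convention (Monday is 0).

```python
from datetime import date


def datetime_features(d):
    """Extract calendar components from a date (hypothetical helper)."""
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        "weekday": d.weekday(),  # Python convention: Monday is 0, Sunday is 6
    }


features = datetime_features(date(1990, 1, 20))
print(features)
```

Each component becomes its own column, which lets a model pick up on seasonal patterns such as the month without seeing any temperature values.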

In [None]:
# "Month" is included because the text above singles out the month as a useful signal
datetime_primitives = ["Day", "Year", "Weekday", "Month"]

### Delaying Primitives

The simplest thing we can do with our target column is to build features that are delayed (or lagging) versions of the target column. We'll make one feature per observation in our feature engineering window, so the lags range from `t - gap` back to `t - (gap + window_length - 1)`. With `gap = 7` and `window_length = 5`, that means lags of 7 through 11 rows.

For this purpose, we can use our `NumericLag` primitive and create one primitive for each instance in our window. 
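
What a lag feature computes can be sketched in plain Python. The `lag` helper and the sample values are hypothetical and only illustrate the idea; this is not how `NumericLag` is implemented.

```python
def lag(values, periods):
    """Shift values later by `periods` rows, padding the start with None."""
    padded = [None] * periods + list(values)
    return padded[:len(values)]


gap = 7
window_length = 5
temps = [float(i) for i in range(15)]  # made-up, evenly spaced observations

# One lagged column per row of the window: lags 7, 8, 9, 10 and 11
lagged = {periods: lag(temps, periods)
          for periods in range(gap, gap + window_length)}

last_row = len(temps) - 1
print({periods: column[last_row] for periods, column in lagged.items()})
```

The earliest rows of each lagged column are empty because there is no observation that far back. In pandas, `Series.shift(periods)` produces the same kind of shifted column.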

In [None]:
delaying_primitives = [NumericLag(periods=i + gap) for i in range(window_length)]

### Rolling Transform Primitives

Since we have access to the entire feature engineering window, we can aggregate over that window. Featuretools has several rolling primitives with which we can achieve this. Here, we'll use the `RollingMean` and `RollingMin` primitives, setting the `gap` and `window_length` accordingly. The gap is incredibly important here: when the gap is zero, the current observation's target value is present in the window, which exposes our target.

This concern also exists for other primitives that reference earlier values in the dataframe. Because of this, when using primitives for time series feature engineering, one must be incredibly careful to not use primitives on the target column that incorporate the current observation when calculating a feature value.
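
The effect of the gap on leakage can be seen in a small plain-Python sketch. The `rolling_mean` function and the sample temperatures are hypothetical and are not the Featuretools implementation; rows without a full window are left empty, which mirrors setting `min_periods` equal to `window_length`.

```python
def rolling_mean(values, window_length, gap):
    """Mean of the `window_length` values that end `gap` rows before each row."""
    result = []
    for t in range(len(values)):
        end = t - gap + 1            # exclusive end of the window
        start = end - window_length  # inclusive start of the window
        if start < 0:
            result.append(None)      # not enough history for a full window
        else:
            result.append(sum(values[start:end]) / window_length)
    return result


temps = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]  # made-up target values

leaky = rolling_mean(temps, window_length=3, gap=0)  # window includes row t itself
safe = rolling_mean(temps, window_length=3, gap=1)   # window stops one row before t

print(leaky)
print(safe)
```

With `gap=0`, the value at each row is computed from a window that contains that row's own target, so the feature leaks the answer. With `gap=1`, the same averages appear one row later and depend only on earlier observations.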

In [None]:
rolling_mean_primitive = RollingMean(window_length=window_length,
                                     gap=gap,
                                     min_periods=window_length)

rolling_min_primitive = RollingMin(window_length=window_length,
                                   gap=gap,
                                   min_periods=window_length)

## Run DFS

Now that we've defined our time series primitives, we can pass them into DFS and get our feature matrix!

In [None]:
from featuretools.demo.weather import load_weather
es = load_weather()

es['temperatures'].head()

In [None]:




# Combine all three groups of primitives into a single list of transform primitives
fm, f = ft.dfs(entityset=es,
               target_dataframe_name='temperatures',
               trans_primitives=(datetime_primitives +
                                 delaying_primitives +
                                 [rolling_mean_primitive, rolling_min_primitive]))

fm