# Handling time

https://featuretools.alteryx.com/en/stable/getting_started/handling_time.html

Set a time index on each dataframe and pass a cutoff time during feature calculation; Featuretools then automatically filters out data recorded after the cutoff. This is useful, for example, when building training/test set splits.

## Time index

The time index is the column of a dataframe that serves as each row's timestamp, i.e. the time at which the row's data became known.

In [6]:
import featuretools as ft
import pandas as pd

In [3]:
es = ft.demo.load_mock_customer(return_entityset=True, random_seed=0)
es["transactions"].head()

transaction_id,session_id,transaction_time,product_id,amount,_ft_last_time
298,1,2014-01-01 00:00:00,5,127.64,2014-01-01 00:00:00
2,1,2014-01-01 00:01:05,2,109.48,2014-01-01 00:01:05
308,1,2014-01-01 00:02:10,3,95.06,2014-01-01 00:02:10
116,1,2014-01-01 00:03:15,4,78.92,2014-01-01 00:03:15
371,1,2014-01-01 00:04:20,3,31.54,2014-01-01 00:04:20


In this example, `transaction_time` is the time index. The `_ft_last_time` column was generated by Featuretools.

In [4]:
es["customers"]

customer_id,zip_code,join_date,birthday,_ft_last_time
5,60091,2010-07-17 05:27:50,1984-07-28,2014-01-01 08:09:40
4,60091,2011-04-08 20:08:14,2006-08-15,2014-01-01 05:31:30
1,60091,2011-04-17 10:48:33,1994-07-18,2014-01-01 07:26:20
3,13244,2011-08-13 15:42:34,2003-11-21,2014-01-01 09:00:35
2,13244,2012-04-15 23:31:04,1986-08-18,2014-01-01 08:23:45


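Note that this dataframe has two datetime columns, but only `join_date` should be used as the time index: it marks when the customer first appears in the data. `birthday` is just an attribute of the customer and should not be used for that purpose.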

## Cutoff time

The cutoff time specifies the last point in time from which data can be used to calculate features. For example, to set a cutoff time of 4 AM on Jan 1, 2014:

In [7]:
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=pd.Timestamp("2014-1-1 04:00"),
    instance_ids=[1, 2, 3],
    cutoff_time_in_index=True,
)
fm

customer_id,time,zip_code,COUNT(sessions),MODE(sessions.device),NUM_UNIQUE(sessions.device),COUNT(transactions),MAX(transactions.amount),MEAN(transactions.amount),MIN(transactions.amount),MODE(transactions.product_id),NUM_UNIQUE(transactions.product_id),...,STD(sessions.SKEW(transactions.amount)),STD(sessions.SUM(transactions.amount)),SUM(sessions.MAX(transactions.amount)),SUM(sessions.MEAN(transactions.amount)),SUM(sessions.MIN(transactions.amount)),SUM(sessions.NUM_UNIQUE(transactions.product_id)),SUM(sessions.SKEW(transactions.amount)),SUM(sessions.STD(transactions.amount)),MODE(transactions.sessions.device),NUM_UNIQUE(transactions.sessions.device)
1,2014-01-01 04:00:00,60091,4,tablet,3,67,139.23,74.002836,5.81,4,5,...,0.500353,271.917637,540.04,304.6017,27.62,20.0,-0.505043,169.572874,tablet,3
2,2014-01-01 04:00:00,13244,4,desktop,2,49,146.81,84.7,12.07,4,5,...,0.324809,307.743859,569.29,340.791792,105.24,20.0,0.045171,157.262738,desktop,2
3,2014-01-01 04:00:00,13244,1,tablet,1,15,146.31,62.791333,8.19,1,5,...,,,146.31,62.791333,8.19,5.0,0.618455,47.264797,tablet,1
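The filtering that a cutoff time implies can be sketched in plain Python. This is only an illustration of the semantics, not how Featuretools implements it; the `rows_up_to_cutoff` helper and the chosen cutoff are assumptions made for the example, and the rows are the first five transactions shown earlier.

```python
from datetime import datetime

# (transaction_time, amount) pairs copied from the transactions preview.
transactions = [
    (datetime(2014, 1, 1, 0, 0, 0), 127.64),
    (datetime(2014, 1, 1, 0, 1, 5), 109.48),
    (datetime(2014, 1, 1, 0, 2, 10), 95.06),
    (datetime(2014, 1, 1, 0, 3, 15), 78.92),
    (datetime(2014, 1, 1, 0, 4, 20), 31.54),
]


def rows_up_to_cutoff(rows, cutoff):
    """Keep rows whose time index is at or before the cutoff (hypothetical helper)."""
    return [row for row in rows if row[0] <= cutoff]


# Hypothetical cutoff chosen so that only some rows are visible.
cutoff = datetime(2014, 1, 1, 0, 2, 10)
visible = rows_up_to_cutoff(transactions, cutoff)

# Any aggregate computed from `visible` ignores the two later transactions.
max_amount = max(amount for _, amount in visible)
```

Features such as `MAX(transactions.amount)` are then computed only from the rows that survive this filter.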


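Cutoff times can also be provided as a dataframe with one row per instance and cutoff time. Each instance then gets its own cutoff, and the same instance can appear more than once (customer 1 below). Extra columns such as `label` are carried through to the feature matrix.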

In [9]:
cutoff_times = pd.DataFrame()
cutoff_times["customer_id"] = [1, 2, 3, 1]
cutoff_times["time"] = pd.to_datetime(
    ["2014-1-1 04:00", "2014-1-1 05:00", "2014-1-1 06:00", "2014-1-1 08:00"]
)
cutoff_times["label"] = [True, True, False, True]
cutoff_times
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
)
fm

customer_id,time,zip_code,COUNT(sessions),MODE(sessions.device),NUM_UNIQUE(sessions.device),COUNT(transactions),MAX(transactions.amount),MEAN(transactions.amount),MIN(transactions.amount),MODE(transactions.product_id),NUM_UNIQUE(transactions.product_id),...,STD(sessions.SUM(transactions.amount)),SUM(sessions.MAX(transactions.amount)),SUM(sessions.MEAN(transactions.amount)),SUM(sessions.MIN(transactions.amount)),SUM(sessions.NUM_UNIQUE(transactions.product_id)),SUM(sessions.SKEW(transactions.amount)),SUM(sessions.STD(transactions.amount)),MODE(transactions.sessions.device),NUM_UNIQUE(transactions.sessions.device),label
1,2014-01-01 04:00:00,60091,4,tablet,3,67,139.23,74.002836,5.81,4,5,...,271.917637,540.04,304.6017,27.62,20.0,-0.505043,169.572874,tablet,3,True
2,2014-01-01 05:00:00,13244,5,desktop,2,62,146.81,83.149355,12.07,4,5,...,266.912832,688.14,418.096407,127.06,25.0,-0.269747,190.987775,desktop,2,True
3,2014-01-01 06:00:00,13244,4,desktop,2,44,146.31,65.174773,6.65,1,5,...,417.557763,493.07,290.968018,126.66,16.0,0.860577,119.136697,desktop,2,False
1,2014-01-01 08:00:00,60091,8,mobile,3,126,139.43,71.631905,5.81,4,5,...,279.510713,1057.97,582.193117,78.59,40.0,-0.476122,312.745952,mobile,3,True


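A training window additionally limits how far back before each cutoff time data is used. With `training_window="2 hour"`, only data from the 2 hours leading up to each cutoff is used when performing DFS.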

In [10]:
window_fm, window_features = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    training_window="2 hour",
)

window_fm

customer_id,time,zip_code,COUNT(sessions),MODE(sessions.device),NUM_UNIQUE(sessions.device),COUNT(transactions),MAX(transactions.amount),MEAN(transactions.amount),MIN(transactions.amount),MODE(transactions.product_id),NUM_UNIQUE(transactions.product_id),...,STD(sessions.SUM(transactions.amount)),SUM(sessions.MAX(transactions.amount)),SUM(sessions.MEAN(transactions.amount)),SUM(sessions.MIN(transactions.amount)),SUM(sessions.NUM_UNIQUE(transactions.product_id)),SUM(sessions.SKEW(transactions.amount)),SUM(sessions.STD(transactions.amount)),MODE(transactions.sessions.device),NUM_UNIQUE(transactions.sessions.device),label
1,2014-01-01 04:00:00,60091,2,desktop,2,27,139.09,76.95037,5.81,4,5,...,18.667619,271.81,155.6045,12.59,10.0,-0.604638,86.730914,desktop,2,True
2,2014-01-01 05:00:00,13244,3,desktop,2,31,146.81,84.051935,12.07,4,5,...,203.331699,404.04,253.240615,90.35,15.0,-0.110009,109.500185,desktop,2,True
3,2014-01-01 06:00:00,13244,3,desktop,1,29,128.26,66.407586,6.65,1,5,...,477.281339,346.76,228.176684,118.47,11.0,0.242122,71.8719,desktop,1,False
1,2014-01-01 08:00:00,60091,3,mobile,2,47,139.43,66.471277,5.91,4,5,...,330.655558,384.44,198.98475,24.61,15.0,-0.003438,107.128899,mobile,2,True
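The effect of a training window can be sketched as a two-sided time filter. This is a minimal illustration, assuming the window covers `(cutoff - window, cutoff]`; the exact boundary handling in Featuretools is not stated in these notes, and `in_training_window` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def in_training_window(event_time, cutoff, window):
    """True if event_time lies in (cutoff - window, cutoff] (assumed boundaries)."""
    return cutoff - window < event_time <= cutoff


cutoff = datetime(2014, 1, 1, 4, 0)
window = timedelta(hours=2)

# Hypothetical event times, invented for this sketch.
event_times = [
    datetime(2014, 1, 1, 1, 30),  # more than 2 hours before the cutoff
    datetime(2014, 1, 1, 2, 30),  # inside the window
    datetime(2014, 1, 1, 4, 0),   # exactly at the cutoff
    datetime(2014, 1, 1, 4, 30),  # after the cutoff
]

kept = [t for t in event_times if in_training_window(t, cutoff, window)]
```

Only the 02:30 and 04:00 events are kept, which is why the windowed feature matrix reports smaller counts than the one computed without a window.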


Further options tweak this behavior. For example, `include_cutoff_time=False` makes the upper bound exclusive, so rows whose timestamp equals the cutoff time are not used.
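The difference between the inclusive and exclusive upper bound only matters for rows stamped exactly at the cutoff. A minimal sketch, where `is_usable` is a hypothetical helper that mimics the flag rather than any Featuretools function:

```python
from datetime import datetime


def is_usable(event_time, cutoff, include_cutoff_time=True):
    """Mimic an inclusive or exclusive cutoff bound (illustrative only)."""
    if include_cutoff_time:
        return event_time <= cutoff
    return event_time < cutoff


cutoff = datetime(2014, 1, 1, 4, 0)
at_cutoff = datetime(2014, 1, 1, 4, 0)  # a row stamped exactly at the cutoff

inclusive = is_usable(at_cutoff, cutoff)
exclusive = is_usable(at_cutoff, cutoff, include_cutoff_time=False)
```

Here `inclusive` is `True` and `exclusive` is `False`; rows strictly before the cutoff are usable in both modes.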

## Rounding cutoff times

Cutoff times can be rounded to a chosen time unit, so that instances whose cutoffs fall into the same unit can share calculations. This speeds up DFS at the cost of losing a (small) bit of information.
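The idea can be sketched by flooring timestamps to a fixed unit and counting how many distinct cutoffs remain. This is an illustration under assumptions: it rounds down, the `round_down` helper is hypothetical, and the sample cutoffs are invented. In Featuretools the corresponding setting is, to my knowledge, the `approximate` argument of `ft.dfs`, which these notes do not show.

```python
from datetime import datetime, timedelta


def round_down(ts, unit):
    """Floor a timestamp to a multiple of `unit`, counted from midnight of its day."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    return midnight + (elapsed // unit) * unit


# Hypothetical cutoff times, invented for this sketch.
cutoffs = [
    datetime(2014, 1, 1, 4, 0),
    datetime(2014, 1, 1, 4, 20),
    datetime(2014, 1, 1, 4, 59),
    datetime(2014, 1, 1, 5, 10),
]

rounded = [round_down(ts, timedelta(hours=1)) for ts in cutoffs]

# Fewer distinct cutoffs means fewer separate calculations are needed.
distinct_before = len(set(cutoffs))
distinct_after = len(set(rounded))
```

Four distinct cutoffs collapse to two after rounding to the hour. The price is that an instance with a 04:59 cutoff is treated as if nothing after 04:00 were known, which is the small information loss mentioned above.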