# Featuretools toy example

In [1]:
# suppress warnings
import warnings
warnings.filterwarnings("ignore")

In [2]:
# Import featuretools module
import featuretools as ft
import pandas as pd

### Import dataset

In [3]:
# Load toy dataset
data = ft.demo.load_mock_customer()

In [4]:
type(data)

dict

**Prepare data**

In this toy dataset, there are 3 tables.
Each **table is called an entity** in Featuretools.

* **customers**: unique customers who had sessions

* **sessions**: unique sessions and associated attributes

* **transactions**: list of events in each session

In [5]:
# TABLE 1: Customers
customers_df = data["customers"]
customers_df

,customer_id,zip_code,join_date,date_of_birth
0,1,60091,2011-04-17 10:48:33,1994-07-18
1,2,13244,2012-04-15 23:31:04,1986-08-18
2,3,13244,2011-08-13 15:42:34,2003-11-21
3,4,60091,2011-04-08 20:08:14,2006-08-15
4,5,60091,2010-07-17 05:27:50,1984-07-28


In [6]:
# TABLE 2: Sessions
sessions_df = data["sessions"]
sessions_df.head(n=7)

,session_id,customer_id,device,session_start
0,1,2,desktop,2014-01-01 00:00:00
1,2,5,mobile,2014-01-01 00:17:20
2,3,4,mobile,2014-01-01 00:28:10
3,4,1,mobile,2014-01-01 00:44:25
4,5,4,mobile,2014-01-01 01:11:30
5,6,1,tablet,2014-01-01 01:23:25
6,7,3,tablet,2014-01-01 01:39:40


In [7]:
# TABLE 3: Transactions
transactions_df = data["transactions"]
transactions_df.head(n=10)

,transaction_id,session_id,transaction_time,product_id,amount
0,298,1,2014-01-01 00:00:00,5,127.64
1,2,1,2014-01-01 00:01:05,2,109.48
2,308,1,2014-01-01 00:02:10,3,95.06
3,116,1,2014-01-01 00:03:15,4,78.92
4,371,1,2014-01-01 00:04:20,3,31.54
5,486,1,2014-01-01 00:05:25,3,23.76
6,271,1,2014-01-01 00:06:30,3,43.63
7,192,1,2014-01-01 00:07:35,4,42.27
8,341,1,2014-01-01 00:08:40,3,47.68
9,10,1,2014-01-01 00:09:45,5,57.39


## Featuretools workflow

### 1. First, we specify a dictionary with all the entities in our dataset.

Each entry maps an entity name to a tuple of (dataframe, index column, optional time index column).

In [8]:
entities = {"customers" : (customers_df, "customer_id"),
            "sessions" : (sessions_df, "session_id", "session_start"),
            "transactions" : (transactions_df, "transaction_id", "transaction_time")}

### 2. Second, we specify how the entities are related. 
When two entities have a **one-to-many** relationship, we call the “one” entity the “parent entity” and the “many” entity the “child entity”.

A relationship between a parent and child is defined like this: (parent_entity, parent_variable, child_entity, child_variable)
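
Before declaring a relationship, it can help to confirm that the key columns really are one-to-many. The sketch below is plain Python, not part of Featuretools: `check_one_to_many` is a hypothetical helper, and the key values are copied from the customers and sessions rows displayed earlier (first seven sessions only).

```python
# Key columns copied from the tables displayed above (first rows only).
customer_ids = [1, 2, 3, 4, 5]                # customers.customer_id (parent)
session_customer_ids = [2, 5, 4, 1, 4, 1, 3]  # sessions.customer_id (child), sessions 1-7


def check_one_to_many(parent_keys, child_keys):
    """Hypothetical helper: return (parent_is_unique, orphan_child_keys)."""
    parent_set = set(parent_keys)
    # The "one" side must not repeat a key.
    parent_is_unique = len(parent_set) == len(parent_keys)
    # Every child key should point at an existing parent row.
    orphans = sorted(set(child_keys) - parent_set)
    return parent_is_unique, orphans


parent_is_unique, orphans = check_one_to_many(customer_ids, session_customer_ids)
print(parent_is_unique, orphans)
```

A unique parent key and an empty orphan list are what the ("customers", "customer_id", "sessions", "customer_id") relationship assumes.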

In [9]:
relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]

### 3. Run Deep Feature Synthesis

A minimal input to Deep Feature Synthesis (DFS) is:
1. a set of entities,
2. a list of relationships, and 
3. the **“target_entity”** to calculate features for. 

The output of DFS is a **feature matrix** and the corresponding **list of feature definitions**.
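
To make the idea of an aggregation feature concrete, here is a minimal plain-Python sketch of what two such features compute for the "customers" target: child rows are grouped by the parent key and reduced to one value per parent. It is an illustration, not Featuretools' implementation, and it uses only the seven session rows displayed earlier, so the numbers will not match a matrix built from the full dataset.

```python
from collections import defaultdict

# (session_id, customer_id, device) for the seven sessions displayed above.
sessions = [
    (1, 2, "desktop"), (2, 5, "mobile"), (3, 4, "mobile"), (4, 1, "mobile"),
    (5, 4, "mobile"), (6, 1, "tablet"), (7, 3, "tablet"),
]

# Group the child rows (sessions) by the parent key (customer_id).
devices_by_customer = defaultdict(list)
for _session_id, customer_id, device in sessions:
    devices_by_customer[customer_id].append(device)

# Reduce each group to one value per feature, one row per customer.
features = {
    customer_id: {
        "COUNT(sessions)": len(devices),
        "NUM_UNIQUE(sessions.device)": len(set(devices)),
    }
    for customer_id, devices in sorted(devices_by_customer.items())
}

for customer_id, row in features.items():
    print(customer_id, row)
```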

In [14]:
# Let’s first create a feature matrix for each customer in the data

feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                 relationships=relationships,
                                                 target_entity="customers")

In [15]:
feature_matrix_customers

customer_id,zip_code,MIN(transactions.amount),NUM_UNIQUE(transactions.product_id),MAX(transactions.amount),WEEKDAY(date_of_birth),MEAN(transactions.amount),COUNT(transactions),MONTH(join_date),NUM_UNIQUE(sessions.device),YEAR(date_of_birth),...,MIN(sessions.MAX(transactions.amount)),MIN(sessions.SKEW(transactions.amount)),MIN(sessions.NUM_UNIQUE(transactions.product_id)),STD(sessions.COUNT(transactions)),MIN(sessions.COUNT(transactions)),MIN(sessions.MEAN(transactions.amount)),SKEW(sessions.COUNT(transactions)),STD(sessions.SUM(transactions.amount)),MODE(sessions.YEAR(session_start)),STD(sessions.MAX(transactions.amount))
1,60091,5.81,5,139.43,0,71.631905,126,4,3,1994,...,118.9,-1.038434,5,4.062019,12,50.623125,1.946018,279.510713,2014,7.322191
2,13244,8.73,5,146.81,0,77.422366,93,4,3,1986,...,100.04,-0.763603,5,3.450328,8,61.91,-0.303276,251.609234,2014,17.221593
3,13244,5.89,5,149.15,4,67.06043,93,8,3,2003,...,126.74,-0.289466,4,2.428992,11,55.579412,-1.507217,219.02142,2014,10.724241
4,60091,5.73,5,149.95,1,80.070459,109,4,3,2006,...,139.2,-0.711744,4,3.335416,10,70.638182,0.282488,235.992478,2014,3.514421
5,60091,7.55,5,149.02,5,80.375443,79,7,3,1984,...,128.51,-0.53906,5,3.600926,8,66.666667,-0.317685,402.775486,2014,7.928001
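
Columns such as WEEKDAY(date_of_birth) and YEAR(date_of_birth) in the matrix above are computed row by row from a single column of the target entity. The sketch below reproduces them with the standard library, using the birth dates from the customers table. It assumes Monday is numbered 0, which is consistent with the values shown in the matrix.

```python
from datetime import date

# customer_id -> date_of_birth, copied from the customers table above.
birth_dates = {
    1: date(1994, 7, 18),
    2: date(1986, 8, 18),
    3: date(2003, 11, 21),
    4: date(2006, 8, 15),
    5: date(1984, 7, 28),
}

# One output value per input row; no grouping across entities is involved.
transform_features = {
    customer_id: {
        "WEEKDAY(date_of_birth)": born.weekday(),  # Monday == 0 (assumed convention)
        "YEAR(date_of_birth)": born.year,
    }
    for customer_id, born in birth_dates.items()
}

for customer_id, row in transform_features.items():
    print(customer_id, row)
```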


Documentation: <br/>
**agg_primitives**: List of Aggregation Feature types to apply. <br/>

Default: [“sum”, “std”, “max”, “skew”, “min”, “mean”, “count”, “percent_true”, “num_unique”, “mode”]
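
As a rough guide to what several of these primitive names mean, the sketch below maps them to plain-Python stand-ins and applies them to the ten session 1 transactions displayed earlier. The `primitives` dictionary is hypothetical and is not how Featuretools implements them. Because only the first ten rows are used, the results will not match the full feature matrices.

```python
from statistics import mean, mode

# transactions.amount and transactions.product_id for the ten displayed rows of session 1.
amounts = [127.64, 109.48, 95.06, 78.92, 31.54, 23.76, 43.63, 42.27, 47.68, 57.39]
product_ids = [5, 2, 3, 4, 3, 3, 3, 4, 3, 5]

# Hypothetical stand-ins keyed by primitive name (illustration only).
primitives = {
    "sum": sum,
    "max": max,
    "min": min,
    "mean": mean,
    "count": len,
    "num_unique": lambda values: len(set(values)),
    "mode": mode,
}

# Numeric primitives applied to the amounts.
amount_features = {
    name: primitives[name](amounts) for name in ("sum", "max", "min", "mean", "count")
}
# Categorical-style primitives applied to the product ids.
product_features = {
    name: primitives[name](product_ids) for name in ("num_unique", "mode")
}

print(amount_features)
print(product_features)
```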

### Change target entity

One of the reasons DFS is so powerful is that it can create a feature matrix for any entity in our data. <br/>
For example, we can build features for sessions instead of customers:

In [13]:
feature_matrix_sessions, features_defs = ft.dfs(entities=entities,
                                                relationships=relationships,
                                                target_entity="sessions")

In [16]:
feature_matrix_sessions.head(8)

session_id,device,customer_id,DAY(session_start),STD(transactions.amount),MONTH(session_start),MAX(transactions.amount),MIN(transactions.amount),YEAR(session_start),NUM_UNIQUE(transactions.product_id),SKEW(transactions.amount),...,customers.MODE(sessions.device),customers.SUM(transactions.amount),customers.MONTH(join_date),customers.MEAN(transactions.amount),MODE(transactions.MONTH(transaction_time)),customers.NUM_UNIQUE(transactions.product_id),MODE(transactions.WEEKDAY(transaction_time)),customers.WEEKDAY(join_date),customers.DAY(date_of_birth),customers.MIN(transactions.amount)
1,desktop,2,1,41.600976,1,141.66,20.91,2014,5,0.295458,...,desktop,7200.28,4,77.422366,1,5,2,6,18,8.73
2,mobile,5,1,45.893591,1,135.25,9.32,2014,5,-0.16055,...,mobile,6349.66,7,80.375443,1,5,2,5,28,7.55
3,mobile,4,1,46.240016,1,147.73,8.7,2014,5,-0.324012,...,mobile,8727.68,4,80.070459,1,5,2,4,15,5.73
4,mobile,1,1,40.187205,1,129.0,6.29,2014,5,0.234349,...,mobile,9025.62,4,71.631905,1,5,2,6,18,5.81
5,mobile,4,1,48.918663,1,139.2,7.43,2014,5,0.336381,...,mobile,8727.68,4,80.070459,1,5,2,4,15,5.73
6,tablet,1,1,42.654755,1,139.23,8.74,2014,5,-0.134754,...,mobile,9025.62,4,71.631905,1,5,2,6,18,5.81
7,tablet,3,1,47.264797,1,146.31,8.19,2014,5,0.618455,...,desktop,6236.62,8,67.06043,1,5,2,5,21,5.89
8,tablet,4,1,44.213242,1,143.85,12.59,2014,5,0.200676,...,mobile,8727.68,4,80.070459,1,5,2,4,15,5.73
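
The columns prefixed with `customers.` in the matrix above are values looked up from each session's parent customer. The sketch below shows that lookup in plain Python for customers.MONTH(join_date), using the customers table and the first seven sessions displayed earlier. It is an illustration of the idea, not Featuretools' implementation.

```python
from datetime import datetime

# customer_id -> join_date, copied from the customers table above.
join_dates = {
    1: "2011-04-17 10:48:33",
    2: "2012-04-15 23:31:04",
    3: "2011-08-13 15:42:34",
    4: "2011-04-08 20:08:14",
    5: "2010-07-17 05:27:50",
}
# session_id -> customer_id for sessions 1-7, copied from the sessions table above.
session_customer = {1: 2, 2: 5, 3: 4, 4: 1, 5: 4, 6: 1, 7: 3}

# Step 1: compute MONTH(join_date) once per parent row (customer).
join_month = {
    customer_id: datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").month
    for customer_id, stamp in join_dates.items()
}

# Step 2: each child row (session) copies the value of its parent.
session_join_month = {
    session_id: join_month[customer_id]
    for session_id, customer_id in session_customer.items()
}

print(session_join_month)  # {1: 4, 2: 7, 3: 4, 4: 4, 5: 4, 6: 4, 7: 8}
```

The printed values agree with the customers.MONTH(join_date) column for sessions 1 to 7 in the matrix above.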
