# FeatureTools

**FeatureTools** is a framework for automated feature engineering.

In [1]:
import featuretools as ft

## Loading the mock data

In [2]:
data = ft.demo.load_mock_customer()

This notebook uses three tables from the dataset. Each table is called an **entity** in FeatureTools.

In [3]:
customers_df = data["customers"]
sessions_df = data["sessions"]
transactions_df = data["transactions"]

* **Customers**: unique customers who had sessions.

In [4]:
customers_df.sample(5)

|   | customer_id | zip_code | join_date | date_of_birth |
|---|---|---|---|---|
| 3 | 4 | 60091 | 2011-04-08 20:08:14 | 2006-08-15 |
| 2 | 3 | 13244 | 2011-08-13 15:42:34 | 2003-11-21 |
| 0 | 1 | 60091 | 2011-04-17 10:48:33 | 1994-07-18 |
| 4 | 5 | 60091 | 2010-07-17 05:27:50 | 1984-07-28 |
| 1 | 2 | 13244 | 2012-04-15 23:31:04 | 1986-08-18 |


* **Sessions**: unique sessions and their associated attributes.

In [5]:
sessions_df.sample(5)

|   | session_id | customer_id | device | session_start |
|---|---|---|---|---|
| 13 | 14 | 1 | tablet | 2014-01-01 03:28:00 |
| 6 | 7 | 3 | tablet | 2014-01-01 01:39:40 |
| 1 | 2 | 5 | mobile | 2014-01-01 00:17:20 |
| 29 | 30 | 5 | desktop | 2014-01-01 07:27:25 |
| 18 | 19 | 3 | desktop | 2014-01-01 04:27:35 |


* **Transactions**: list of events associated with each session.

In [6]:
transactions_df.sample(5)

|   | transaction_id | session_id | transaction_time | product_id | amount |
|---|---|---|---|---|---|
| 74 | 232 | 5 | 2014-01-01 01:20:10 | 1 | 139.2 |
| 231 | 27 | 17 | 2014-01-01 04:10:15 | 2 | 90.79 |
| 434 | 36 | 31 | 2014-01-01 07:50:10 | 3 | 62.35 |
| 420 | 56 | 30 | 2014-01-01 07:35:00 | 3 | 72.7 |
| 54 | 444 | 4 | 2014-01-01 00:58:30 | 4 | 43.59 |


## Entities

You need to specify a dictionary containing all the entities in the dataset.

In [7]:
# Each entry maps an entity id to (dataframe, index column[, time index column]).
entities = {
    "customers" : (customers_df, "customer_id"),
    "sessions" : (sessions_df, "session_id", "session_start"),
    "transactions" : (transactions_df, "transaction_id", "transaction_time")
}

## Relationships

You need to specify the relationships between the entities.

In [8]:
# Each tuple is (parent entity, parent key, child entity, child key).
relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]
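The relationships above are one-to-many links: one customer has many sessions, and one session has many transactions. As a rough illustration of what such a link implies, the sketch below checks in plain Python that every key in a child table points at an existing row in its parent table. The rows are a handful of illustrative values, and `orphan_keys` is a hypothetical helper, not part of FeatureTools.

```python
# Illustrative rows only (a few made-up values, not the full mock dataset).
customers = [{"customer_id": 1}, {"customer_id": 3}, {"customer_id": 5}]
sessions = [
    {"session_id": 2, "customer_id": 5},
    {"session_id": 7, "customer_id": 3},
    {"session_id": 14, "customer_id": 1},
]
transactions = [
    {"transaction_id": 100, "session_id": 2},
    {"transaction_id": 101, "session_id": 2},
    {"transaction_id": 102, "session_id": 14},
]


def orphan_keys(parent_rows, parent_key, child_rows, child_key):
    """Return the child key values that have no matching parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    child_ids = {row[child_key] for row in child_rows}
    return sorted(child_ids - parent_ids)


# Same (parent, parent key, child, child key) order as the tuples above.
print(orphan_keys(sessions, "session_id", transactions, "session_id"))
print(orphan_keys(customers, "customer_id", sessions, "customer_id"))
```

An empty list for both calls means every transaction belongs to a known session and every session to a known customer.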

## Deep Feature Synthesis (DFS)

The minimum input to DFS is a set of entities, a list of relationships, and the "target_entity" for which the features are computed.

In [9]:
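# `entities` and `target_entity` are the argument names used by Featuretools releases before 1.0.
# The call returns the feature matrix and the list of feature definitions.
feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                 relationships=relationships,
                                                 target_entity="customers")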
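To make feature names such as `COUNT(sessions)` or `SUM(sessions.MAX(transactions.amount))` less opaque, here is a hand-rolled sketch of the kind of aggregation DFS performs along the relationships, for a single customer. It uses plain Python and a few made-up rows. It illustrates the idea only and is not how FeatureTools is implemented.

```python
from collections import defaultdict

# Made-up rows for one customer (customer_id = 1): two sessions, four transactions.
sessions = [
    {"session_id": 10, "customer_id": 1},
    {"session_id": 11, "customer_id": 1},
]
transactions = [
    {"session_id": 10, "amount": 20.0},
    {"session_id": 10, "amount": 50.0},
    {"session_id": 11, "amount": 5.0},
    {"session_id": 11, "amount": 30.0},
]

# First level: MAX(transactions.amount) computed for each session.
amounts_by_session = defaultdict(list)
for t in transactions:
    amounts_by_session[t["session_id"]].append(t["amount"])
max_per_session = {sid: max(vals) for sid, vals in amounts_by_session.items()}

# Features for the customer, aggregated over its sessions and transactions.
customer_sessions = [s["session_id"] for s in sessions if s["customer_id"] == 1]
features = {
    "COUNT(sessions)": len(customer_sessions),
    "SUM(transactions.amount)": sum(
        t["amount"] for t in transactions if t["session_id"] in customer_sessions
    ),
    # Stacked feature: the per-session maxima are aggregated up to the customer.
    "SUM(sessions.MAX(transactions.amount))": sum(
        max_per_session[sid] for sid in customer_sessions
    ),
}
print(features)
```

The last entry shows why the approach is called "deep": a feature computed on one entity becomes the input of an aggregation on its parent.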

In [10]:
feature_matrix_customers

| customer_id | zip_code | COUNT(sessions) | NUM_UNIQUE(sessions.device) | MODE(sessions.device) | SUM(transactions.amount) | STD(transactions.amount) | MAX(transactions.amount) | SKEW(transactions.amount) | MIN(transactions.amount) | MEAN(transactions.amount) | ... | NUM_UNIQUE(sessions.WEEKDAY(session_start)) | MODE(sessions.MONTH(session_start)) | MODE(sessions.DAY(session_start)) | MODE(sessions.YEAR(session_start)) | MODE(sessions.MODE(transactions.product_id)) | MODE(sessions.WEEKDAY(session_start)) | NUM_UNIQUE(transactions.sessions.device) | NUM_UNIQUE(transactions.sessions.customer_id) | MODE(transactions.sessions.device) | MODE(transactions.sessions.customer_id) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 60091 | 8 | 3 | mobile | 9025.62 | 40.442059 | 139.43 | 0.019698 | 5.81 | 71.631905 | ... | 1 | 1 | 1 | 2014 | 4 | 2 | 3 | 1 | mobile | 1 |
| 2 | 13244 | 7 | 3 | desktop | 7200.28 | 37.705178 | 146.81 | 0.098259 | 8.73 | 77.422366 | ... | 1 | 1 | 1 | 2014 | 3 | 2 | 3 | 1 | desktop | 2 |
| 3 | 13244 | 6 | 3 | desktop | 6236.62 | 43.683296 | 149.15 | 0.41823 | 5.89 | 67.06043 | ... | 1 | 1 | 1 | 2014 | 1 | 2 | 3 | 1 | desktop | 3 |
| 4 | 60091 | 8 | 3 | mobile | 8727.68 | 45.068765 | 149.95 | -0.036348 | 5.73 | 80.070459 | ... | 1 | 1 | 1 | 2014 | 1 | 2 | 3 | 1 | mobile | 4 |
| 5 | 60091 | 6 | 3 | mobile | 6349.66 | 44.09563 | 149.02 | -0.025941 | 7.55 | 80.375443 | ... | 1 | 1 | 1 | 2014 | 3 | 2 | 3 | 1 | mobile | 5 |


In [11]:
features_defs

[<Feature: zip_code>,
 <Feature: COUNT(sessions)>,
 <Feature: NUM_UNIQUE(sessions.device)>,
 <Feature: MODE(sessions.device)>,
 <Feature: SUM(transactions.amount)>,
 <Feature: STD(transactions.amount)>,
 <Feature: MAX(transactions.amount)>,
 <Feature: SKEW(transactions.amount)>,
 <Feature: MIN(transactions.amount)>,
 <Feature: MEAN(transactions.amount)>,
 <Feature: COUNT(transactions)>,
 <Feature: NUM_UNIQUE(transactions.product_id)>,
 <Feature: MODE(transactions.product_id)>,
 <Feature: DAY(date_of_birth)>,
 <Feature: DAY(join_date)>,
 <Feature: YEAR(date_of_birth)>,
 <Feature: YEAR(join_date)>,
 <Feature: MONTH(date_of_birth)>,
 <Feature: MONTH(join_date)>,
 <Feature: WEEKDAY(date_of_birth)>,
 <Feature: WEEKDAY(join_date)>,
 <Feature: SUM(sessions.NUM_UNIQUE(transactions.product_id))>,
 <Feature: SUM(sessions.SKEW(transactions.amount))>,
 <Feature: SUM(sessions.STD(transactions.amount))>,
 <Feature: SUM(sessions.MAX(transactions.amount))>,
 <Feature: SUM(sessions.MIN(transactions.amo

# Iris example

In [12]:
import seaborn as sns
iris = sns.load_dataset('iris')

In [13]:
es = ft.EntitySet(id = 'iris')

# iris has no id column, so make_index=True asks FeatureTools to create one named 'index'.
es.entity_from_dataframe(entity_id = 'data', 
                         dataframe = iris, 
                         make_index = True, 
                         index = 'index')

Entityset: iris
  Entities:
    data [Rows: 150, Columns: 6]
  Relationships:
    No relationships
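The summary reports 6 columns although the iris data has 5. This is consistent with `make_index = True` creating the column named by `index` rather than expecting it in the dataframe. The snippet below mimics that step in plain Python on two iris rows copied from the output in this notebook. It is only an analogy for what the call appears to do, not FeatureTools code.

```python
# Two iris rows copied from the feature matrix shown in this notebook.
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4,
     "petal_width": 0.2, "species": "setosa"},
]

# Add an integer "index" column holding one unique value per row.
indexed = [{"index": i, **row} for i, row in enumerate(rows)]

print(len(rows[0]), "columns before,", len(indexed[0]), "columns after")
print([row["index"] for row in indexed])
```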

In [14]:
feature_matrix, feature_defs = ft.dfs(entityset = es, 
                                      target_entity = 'data',
                                      trans_primitives = ['add_numeric', 'multiply_numeric'])
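Because this entity set has a single table and no relationships, DFS can only apply transform primitives here: `add_numeric` and `multiply_numeric` combine pairs of numeric columns, and stacking them yields products of sums. The sketch below reproduces a few of those values by hand for the first iris row (5.1, 3.5, 1.4, 0.2), using plain Python rather than FeatureTools. It also suggests that a name like `petal_length + petal_width * petal_width` should be read as `(petal_length + petal_width) * petal_width`, which agrees with the 0.32 shown for that column in the feature matrix.

```python
from itertools import combinations

# First row of the iris dataset, as shown in the feature matrix.
row = {"sepal_length": 5.1, "sepal_width": 3.5,
       "petal_length": 1.4, "petal_width": 0.2}

# Pairwise sums and products of the numeric columns (4 columns give 6 pairs each).
pairs = list(combinations(sorted(row), 2))
sums = {f"{a} + {b}": row[a] + row[b] for a, b in pairs}
products = {f"{a} * {b}": row[a] * row[b] for a, b in pairs}

# Stacked feature: a product whose left operand is itself a sum.
stacked = sums["petal_length + petal_width"] * row["petal_width"]

print(len(sums), len(products))
print(round(sums["petal_length + petal_width"], 2))
print(round(stacked, 2))
```

Values are rounded before printing because floating-point addition does not give exactly 1.6 for 1.4 + 0.2.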

In [15]:
feature_matrix.head()

| index | sepal_length | sepal_width | petal_length | petal_width | species | petal_length + petal_width | sepal_length + sepal_width | petal_length + sepal_width | petal_width + sepal_width | petal_width + sepal_length | ... | petal_length + sepal_length * petal_width | petal_length + sepal_width * sepal_length + sepal_width | petal_length + sepal_length * sepal_length | petal_length + petal_width * sepal_length | petal_length * petal_width + sepal_length | petal_width + sepal_length * sepal_length + sepal_width | petal_length + petal_width * petal_width | sepal_length + sepal_width * sepal_width | petal_width + sepal_width * sepal_length | petal_length + sepal_width * petal_width + sepal_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa | 1.6 | 8.6 | 4.9 | 3.7 | 5.3 | ... | 1.3 | 42.14 | 33.15 | 8.16 | 7.42 | 45.58 | 0.32 | 30.1 | 18.87 | 25.97 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa | 1.6 | 7.9 | 4.4 | 3.2 | 5.1 | ... | 1.26 | 34.76 | 30.87 | 7.84 | 7.14 | 40.29 | 0.32 | 23.7 | 15.68 | 22.44 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa | 1.5 | 7.9 | 4.5 | 3.4 | 4.9 | ... | 1.2 | 35.55 | 28.2 | 7.05 | 6.37 | 38.71 | 0.3 | 25.28 | 15.98 | 22.05 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa | 1.7 | 7.7 | 4.6 | 3.3 | 4.8 | ... | 1.22 | 35.42 | 28.06 | 7.82 | 7.2 | 36.96 | 0.34 | 23.87 | 15.18 | 22.08 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa | 1.6 | 8.6 | 5.0 | 3.8 | 5.2 | ... | 1.28 | 43.0 | 32.0 | 8.0 | 7.28 | 44.72 | 0.32 | 30.96 | 19.0 | 26.0 |


In [16]:
feature_defs

[<Feature: sepal_length>,
 <Feature: sepal_width>,
 <Feature: petal_length>,
 <Feature: petal_width>,
 <Feature: species>,
 <Feature: petal_length + petal_width>,
 <Feature: sepal_length + sepal_width>,
 <Feature: petal_length + sepal_width>,
 <Feature: petal_width + sepal_width>,
 <Feature: petal_width + sepal_length>,
 <Feature: petal_length + sepal_length>,
 <Feature: sepal_length * sepal_width>,
 <Feature: petal_width * sepal_length>,
 <Feature: petal_length * sepal_length>,
 <Feature: petal_length * petal_width>,
 <Feature: petal_length * sepal_width>,
 <Feature: petal_width * sepal_width>,
 <Feature: petal_width + sepal_width * sepal_width>,
 <Feature: petal_width + sepal_length * sepal_width>,
 <Feature: petal_length * sepal_length + sepal_width>,
 <Feature: petal_length + petal_width * petal_width + sepal_width>,
 <Feature: petal_length + petal_width * sepal_width>,
 <Feature: petal_width * petal_width + sepal_length>,
 <Feature: petal_width * petal_width + sepal_width>,
 <Feat