### Featuretools

Featuretools is an automated feature engineering library that transforms transactional and relational datasets into feature matrices for machine learning. It uses Deep Feature Synthesis (DFS) to generate the features. DFS was used to build the “Data Science Machine”, which automatically builds predictive models for complex, multi-table datasets.<br>
Each table is called an entity in Featuretools. When two entities have a one-to-many relationship, the entity on the “one” side is called the “parent entity” and the entity on the “many” side is the “child entity”. A relationship between a parent and a child is defined like this:<br>
(parent_entity, parent_variable, child_entity, child_variable) <br>

The minimal input to DFS is a set of entities, a list of relationships, and the “target_entity” to calculate features for. The output of DFS is a feature matrix and the corresponding list of feature definitions.

Example:<br>
feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                 relationships=relationships,
                                                 target_entity="customers")
<br>
We can change the target entity to get a feature matrix for any entity of our choice.
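
To make the idea concrete, the following is a minimal pandas-only sketch of the kind of aggregation features DFS derives across a single parent-child relationship. It does not call Featuretools; the `customers` and `transactions` tables, their columns, and their values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical parent table: one row per customer.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "zip_code": ["60091", "02139"],
})

# Hypothetical child table: many transactions per customer.
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13, 14],
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 30.0, 5.0, 15.0, 10.0],
})

# The relationship written in the
# (parent_entity, parent_variable, child_entity, child_variable) form.
relationship = ("customers", "customer_id", "transactions", "customer_id")

# Aggregate the child rows per parent key and label the columns
# in the style DFS uses for its feature names.
aggregated = transactions.groupby("customer_id")["amount"].agg(["sum", "mean", "count"])
aggregated.columns = [
    "SUM(transactions.amount)",
    "MEAN(transactions.amount)",
    "COUNT(transactions)",
]

# Join the aggregates onto the target (parent) table: one row per customer.
feature_matrix = customers.set_index("customer_id").join(aggregated)
print(feature_matrix)
```

The result has one row per instance of the target entity, which is the shape DFS returns as well. DFS automates this step and can apply many primitives across several tables.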

In [5]:
import pandas as pd
import featuretools as ft
import numpy as np

# Appliances energy prediction dataset: one reading every 10 minutes
url = "https://raw.githubusercontent.com/LuisM78/Appliances-energy-prediction-data/master/energydata_complete.csv"
data = pd.read_csv(url)
data.head(5)

Unnamed: 0,date,Appliances,lights,T1,RH_1,T2,RH_2,T3,RH_3,T4,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,2016-01-11 17:00:00,60,30,19.89,47.596667,19.2,44.79,19.79,44.73,19.0,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,2016-01-11 17:10:00,60,30,19.89,46.693333,19.2,44.7225,19.79,44.79,19.0,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2016-01-11 17:20:00,50,30,19.89,46.3,19.2,44.626667,19.79,44.933333,18.926667,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,2016-01-11 17:30:00,50,40,19.89,46.066667,19.2,44.59,19.79,45.0,18.89,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,2016-01-11 17:40:00,60,40,19.89,46.333333,19.2,44.53,19.79,45.0,18.89,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097


### Converting the dataset into an entity

In [25]:
entities = {
    "energy" : (data, "id")
}

In [39]:
es = ft.EntitySet("my-entity-set", entities)
es



Entityset: my-entity-set
  Entities:
    energy (shape = [19735, 30])
  Relationships:
    No relationships
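
The `energy` entity is indexed by `"id"`, a column that does not appear in the CSV preview. In the later `es["energy"].head()` output it holds consecutive integers, so it appears to have been generated as a running row number. A hedged alternative is to create that key explicitly in pandas before building the entity set, and to parse `date` as a datetime at load time. The sketch below uses three rows copied from the preview and a reduced set of columns; the in-memory CSV stands in for the real file.

```python
import io

import pandas as pd

# Three rows copied from the preview above (only a few columns, for brevity).
csv_text = """date,Appliances,lights
2016-01-11 17:00:00,60,30
2016-01-11 17:10:00,60,30
2016-01-11 17:20:00,50,30
"""

# Parse the timestamp column while reading.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Add an explicit integer key as the first column instead of relying on
# one being generated when the entity is created.
sample.insert(0, "id", range(len(sample)))

print(sample.dtypes)
```

With the key present in the DataFrame, the `(data, "id")` tuple refers to a real column, and the index values are under your control.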

In [45]:
es["energy"].variables

[<Variable: date (dtype: datetime, format: None)>,
 <Variable: Appliances (dtype = numeric, count = 19735)>,
 <Variable: lights (dtype = numeric, count = 19735)>,
 <Variable: T1 (dtype = numeric, count = 19735)>,
 <Variable: RH_1 (dtype = numeric, count = 19735)>,
 <Variable: T2 (dtype = numeric, count = 19735)>,
 <Variable: RH_2 (dtype = numeric, count = 19735)>,
 <Variable: T3 (dtype = numeric, count = 19735)>,
 <Variable: RH_3 (dtype = numeric, count = 19735)>,
 <Variable: T4 (dtype = numeric, count = 19735)>,
 <Variable: RH_4 (dtype = numeric, count = 19735)>,
 <Variable: T5 (dtype = numeric, count = 19735)>,
 <Variable: RH_5 (dtype = numeric, count = 19735)>,
 <Variable: T6 (dtype = numeric, count = 19735)>,
 <Variable: RH_6 (dtype = numeric, count = 19735)>,
 <Variable: T7 (dtype = numeric, count = 19735)>,
 <Variable: RH_7 (dtype = numeric, count = 19735)>,
 <Variable: T8 (dtype = numeric, count = 19735)>,
 <Variable: RH_8 (dtype = numeric, count = 19735)>,
 <Variable: T9 (dtype

In [55]:
es = es.normalize_entity(base_entity_id="energy",
                         new_entity_id="lights",
                         index="Appliances",
                         additional_variables=["T3", "T4", "T5"])

es["lights"].variables



[<Variable: Appliances (dtype = index, count = 92)>,
 <Variable: T3 (dtype = numeric, count = 92)>,
 <Variable: T4 (dtype = numeric, count = 92)>,
 <Variable: T5 (dtype = numeric, count = 92)>]
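
What the normalization step does to the data can be approximated in plain pandas. This is a sketch, not the Featuretools implementation: it assumes that the first observed value of each additional variable is kept per unique index value, which is consistent with the `lights` rows shown in this notebook. The five rows are taken from the preview of `data`, and only `T3` and `T4` are moved because `T5` is elided there.

```python
import pandas as pd

# First five rows of the energy data, reduced to the columns involved.
energy = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "Appliances": [60, 60, 50, 50, 60],
    "T3": [19.79, 19.79, 19.79, 19.79, 19.79],
    "T4": [19.0, 19.0, 18.926667, 18.89, 18.89],
})

index = "Appliances"
additional = ["T3", "T4"]

# Parent table: one row per unique index value, keeping the first
# occurrence of each additional variable (an assumption, see above).
parent = (
    energy.drop_duplicates(subset=index, keep="first")[[index] + additional]
    .set_index(index)
    .sort_index()
)

# Child table: the moved columns are dropped, the key column stays
# behind as the link to the parent.
child = energy.drop(columns=additional)

print(parent)
print(child)
```

Because `T4` varies between readings that share the same `Appliances` value, only one reading per value survives in the parent table. This is worth keeping in mind when choosing which variables to move.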

In [56]:
es

Entityset: my-entity-set
  Entities:
    energy (shape = [19735, 24])
    lights (shape = [92, 4])
  Relationships:
    energy.Appliances -> lights.Appliances

In [61]:
es["lights"].head()

Appliances,Appliances,T3,T4,T5
10,10,19.7,18.6,17.1
20,20,20.29,20.533333,18.7
30,30,20.2,20.566667,19.2
40,40,20.23,20.7,19.2
50,50,19.79,18.926667,17.166667
60,60,19.79,19.0,17.166667
70,70,19.79,18.89,17.1
80,80,20.2,18.963333,17.2
90,90,20.2,18.926667,17.166667
100,100,20.033333,19.0,17.1


In [62]:
es["energy"].head()

id,id,date,Appliances,lights,RH_1,RH_2,RH_3,RH_4,RH_5,RH_6,...,T9,RH_9,T_out,Press_mm_hg,RH_out,Windspeed,Visibility,Tdewpoint,rv1,rv2
0,0,2016-01-11 17:00:00,60,30,47.596667,44.79,44.73,45.566667,55.2,84.256667,...,17.033333,45.53,6.6,733.5,92.0,7.0,63.0,5.3,13.275433,13.275433
1,1,2016-01-11 17:10:00,60,30,46.693333,44.7225,44.79,45.9925,55.2,84.063333,...,17.066667,45.56,6.483333,733.6,92.0,6.666667,59.166667,5.2,18.606195,18.606195
2,2,2016-01-11 17:20:00,50,30,46.3,44.626667,44.933333,45.89,55.09,83.156667,...,17.0,45.5,6.366667,733.7,92.0,6.333333,55.333333,5.1,28.642668,28.642668
3,3,2016-01-11 17:30:00,50,40,46.066667,44.59,45.0,45.723333,55.09,83.423333,...,17.0,45.4,6.25,733.8,92.0,6.0,51.5,5.0,45.410389,45.410389
4,4,2016-01-11 17:40:00,60,40,46.333333,44.53,45.0,45.53,55.09,84.893333,...,17.0,45.4,6.133333,733.9,92.0,5.666667,47.666667,4.9,10.084097,10.084097
5,5,2016-01-11 17:50:00,50,40,46.026667,44.5,44.933333,45.73,55.03,85.766667,...,17.0,45.29,6.016667,734.0,92.0,5.333333,43.833333,4.8,44.919484,44.919484
6,6,2016-01-11 18:00:00,60,50,45.766667,44.5,44.9,45.79,54.966667,86.09,...,17.0,45.29,5.9,734.1,92.0,5.0,40.0,4.7,47.233763,47.233763
7,7,2016-01-11 18:10:00,60,50,45.56,44.5,44.9,45.863333,54.9,86.423333,...,17.0,45.29,5.916667,734.166667,91.833333,5.166667,40.0,4.683333,33.03989,33.03989
8,8,2016-01-11 18:20:00,60,40,45.5975,44.433333,44.79,45.79,55.0,87.226667,...,17.0,45.29,5.933333,734.233333,91.666667,5.333333,40.0,4.666667,31.455702,31.455702
9,9,2016-01-11 18:30:00,70,40,46.09,44.4,44.863333,46.096667,55.0,87.626667,...,17.0,45.29,5.95,734.3,91.5,5.5,40.0,4.65,3.089314,3.089314


In [None]:
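# Parent variable first, child variable second.
# normalize_entity has already registered this relationship (see the
# EntitySet summary above), so it does not need to be added again.
new_relationship = ft.Relationship(es["lights"]["Appliances"],
                                   es["energy"]["Appliances"])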

In [63]:
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="lights")


In [64]:
feature_matrix

Appliances,T3,T4,T5,SUM(energy.lights),SUM(energy.RH_1),SUM(energy.RH_2),SUM(energy.RH_3),SUM(energy.RH_4),SUM(energy.RH_5),SUM(energy.RH_6),...,MEAN(energy.rv2),COUNT(energy),NUM_UNIQUE(energy.DAY(date)),NUM_UNIQUE(energy.YEAR(date)),NUM_UNIQUE(energy.MONTH(date)),NUM_UNIQUE(energy.WEEKDAY(date)),MODE(energy.DAY(date)),MODE(energy.YEAR(date)),MODE(energy.MONTH(date)),MODE(energy.WEEKDAY(date))
10,19.700000,18.600000,17.100000,10,388.211667,392.656667,377.687500,386.090000,460.230825,829.831111,...,18.903283,9,7,1,2,5,28,2016,1,3
20,20.290000,20.533333,18.700000,170,14075.958929,14260.389544,14008.221917,14003.102333,17767.378927,27685.161240,...,24.866657,343,31,1,5,7,27,2016,1,4
30,20.200000,20.566667,19.200000,780,28573.419167,29275.997506,28520.481087,28059.508548,36518.674745,50411.846572,...,25.960188,723,31,1,5,7,18,2016,1,4
40,20.230000,20.700000,19.200000,2730,81469.864211,82956.507901,81010.238017,80232.707178,103386.057881,144598.758299,...,24.878402,2019,31,1,5,7,19,2016,1,1
50,19.790000,18.926667,17.166667,5210,174458.232753,178660.487326,171553.540294,170013.238402,219820.662122,256046.016930,...,25.104459,4368,31,1,5,7,19,2016,4,1
60,19.790000,19.000000,17.166667,10590,131585.699674,134149.038077,127859.407501,127638.785237,163948.620873,166927.484823,...,25.181673,3282,31,1,5,7,12,2016,4,1
70,19.790000,18.890000,17.100000,7880,62253.329592,62556.575831,60458.791107,60447.136386,79456.418826,77628.319468,...,24.501307,1560,31,1,5,7,25,2016,4,2
80,20.200000,18.963333,17.200000,5460,47813.004756,47304.958316,46461.487159,46449.237613,61777.688307,56761.166294,...,24.656351,1205,31,1,5,7,20,2016,5,2
90,20.200000,18.926667,17.166667,5700,40606.643236,39995.960522,39323.172769,39211.660532,52370.610106,46845.591517,...,25.640040,1015,31,1,5,7,24,2016,3,2
100,20.033333,19.000000,17.100000,5730,39718.010959,38903.498476,37888.602407,37957.009800,51424.245282,44340.338765,...,25.172715,978,31,1,5,7,14,2016,2,6
