## Using Featuretools for Feature Generation

### Import the necessary packages

In [1]:

import pandas as pd
import featuretools as ft

### Reading the file

In [2]:
df = pd.read_csv("feature.csv",header=0,encoding='utf-8')

In [3]:
df

Unnamed: 0,issue_d,grade,sub_grade,term,addr_state
0,11-Dec-19,B,B2,36 months,AZ
1,11-Dec-19,C,C4,60 months,GA
2,11-Dec-19,C,C5,36 months,IL
3,11-Dec-19,C,C1,36 months,CA
4,11-Dec-19,B,B5,60 months,OR
...,...,...,...,...,...
887374,15-Jan-19,B,B5,36 months,CA
887375,15-Jan-19,B,B5,36 months,NJ
887376,15-Jan-19,D,D2,60 months,TN
887377,15-Jan-19,E,E3,60 months,MA


### Make an entityset and add the entity

In [4]:

es = ft.EntitySet(id = 'data')
es.entity_from_dataframe(entity_id = 'data', dataframe = df, 
                         make_index = True, index = 'index')



Entityset: data
  Entities:
    data [Rows: 887379, Columns: 6]
  Relationships:
    No relationships

### List all the available primitives

In [6]:
ft.list_primitives()

Unnamed: 0,name,type,description
0,std,aggregation,Computes the dispersion relative to the mean v...
1,max,aggregation,"Calculates the highest value, ignoring `NaN` v..."
2,count,aggregation,"Determines the total number of values, excludi..."
3,avg_time_between,aggregation,Computes the average number of seconds between...
4,min,aggregation,"Calculates the smallest value, ignoring `NaN` ..."
...,...,...,...
73,negate,transform,Negates a numeric value.
74,multiply_numeric,transform,Element-wise multiplication of two lists.
75,divide_numeric_scalar,transform,Divide each element in the list by a scalar.
76,hour,transform,Determines the hour value of a datetime.


### Run deep feature synthesis with transform primitives

In [7]:

feature_matrix, feature_defs = ft.dfs(entityset = es, target_entity = 'data',
                                      trans_primitives = ['year', 'month'])

feature_matrix.head()

index,grade,sub_grade,term,addr_state,YEAR(issue_d),MONTH(issue_d)
0,B,B2,36 months,AZ,2019,12
1,C,C4,60 months,GA,2019,12
2,C,C5,36 months,IL,2019,12
3,C,C1,36 months,CA,2019,12
4,B,B5,60 months,OR,2019,12
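
For comparison, the two generated columns can also be derived by hand with pandas. The following is only a minimal sketch, not what Featuretools does internally. It assumes the `issue_d` strings follow a day, month abbreviation, two-digit year pattern, and the column names `issue_year` and `issue_month` are hypothetical. The sample rows are copied from the output shown earlier.

```python
import pandas as pd

# A few rows in the same shape as the notebook's data
# (values copied from the DataFrame output above).
sample = pd.DataFrame({
    "issue_d": ["11-Dec-19", "11-Dec-19", "15-Jan-19"],
    "grade": ["B", "C", "D"],
})

# Assumption: the strings are day, month abbreviation, two-digit year.
issue_dates = pd.to_datetime(sample["issue_d"], format="%d-%b-%y")

# Manual counterparts of the YEAR(issue_d) and MONTH(issue_d) columns.
# The names issue_year and issue_month are hypothetical.
manual = sample.drop(columns="issue_d").assign(
    issue_year=issue_dates.dt.year,
    issue_month=issue_dates.dt.month,
)
print(manual)
```

As in the feature matrix, the raw date column is dropped and only the derived year and month remain.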


### Save the feature matrix to feature_gained.csv

In [8]:
feature_matrix.to_csv('feature_gained.csv')

### Featuretools vs. manual feature engineering

Featuretools helps a user generate features that come from standard mathematical calculations, such as average, min, max, and count.
It also has a lot of limitations: for example, the entities must be set up properly before any features can be generated from them.
So if a user needs features based on mathematical functions, Featuretools is useful. It works best with numeric data.
The user still has to decide manually which features are needed and then pass the matching primitives. In short, feature selection remains manual.
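
To make the comparison concrete, here is a minimal pandas sketch of computing count, min, max, and mean by hand. In Featuretools, aggregation primitives are applied across relationships between entities, and the entity set in this notebook has none, so the sketch uses a plain `groupby` instead. The grouping by `grade` and the helper column `term_months` are assumptions made for illustration. The sample rows are copied from the first five rows of the data.

```python
import pandas as pd

# First five rows of the notebook's data (grade and term only).
sample = pd.DataFrame({
    "grade": ["B", "C", "C", "C", "B"],
    "term": ["36 months", "60 months", "36 months", "36 months", "60 months"],
})

# Turn "36 months" into the number 36 so it can be aggregated.
# term_months is a hypothetical helper column.
sample["term_months"] = (
    sample["term"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Manual counterpart of the count, min, max, and mean aggregations,
# computed per grade.
per_grade = sample.groupby("grade")["term_months"].agg(
    ["count", "min", "max", "mean"]
)
print(per_grade)
```

Either way, someone still has to choose the grouping column and the aggregations, which is the manual step described above.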