# Explainable Boosting Machine 
### Author: Francesca Naretto
### Dataset: German dataset
#### Download EBM at https://github.com/interpretml/interpret
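The Explainable Boosting Machine (EBM) is a tree-based enhancement of Generalized Additive Models (GAMs). It is transparent by design: each feature's contribution to a prediction can be read directly from the fitted model.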
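
For reference, the additive form behind an EBM can be written as below. The notation is chosen here for illustration and does not come from this notebook: $g$ is the link function (the logit for classification), $\beta_0$ is the intercept, each $f_j$ is a shape function learned for a single feature, and each $f_{ij}$ is an optional pairwise interaction term.

$$
g\left(\mathbb{E}[y]\right) = \beta_0 + \sum_{j} f_j(x_j) + \sum_{(i,j)} f_{ij}(x_i, x_j)
$$

Because the prediction is a plain sum, the effect of each feature can be plotted and inspected on its own.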

In [1]:
from interpret.glassbox import ExplainableBoostingClassifier
import pickle
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

In [2]:
source_file = 'datasets/german_credit.csv'
class_field = 'default'
# Load and transform dataset 
df = pd.read_csv(source_file, skipinitialspace=True, na_values='?', keep_default_na=True)

In [10]:
df.head()

Unnamed: 0,account_check_status,duration_in_month,credit_history,purpose,credit_amount,savings,present_emp_since,installment_as_income_perc,personal_status_sex,other_debtors,present_res_since,property,age,other_installment_plans,housing,credits_this_bank,job,people_under_maintenance,telephone,foreign_worker
0,< 0 DM,6,critical account/ other credits existing (not ...,domestic appliances,1169,unknown/ no savings account,.. >= 7 years,4,male : single,none,4,real estate,67,none,own,2,skilled employee / official,1,"yes, registered under the customers name",yes
1,0 <= ... < 200 DM,48,existing credits paid back duly till now,domestic appliances,5951,... < 100 DM,1 <= ... < 4 years,2,female : divorced/separated/married,none,2,real estate,22,none,own,1,skilled employee / official,1,none,yes
2,no checking account,12,critical account/ other credits existing (not ...,(vacation - does not exist?),2096,... < 100 DM,4 <= ... < 7 years,2,male : single,none,3,real estate,49,none,own,1,unskilled - resident,2,none,yes
3,< 0 DM,42,existing credits paid back duly till now,radio/television,7882,... < 100 DM,4 <= ... < 7 years,2,male : single,guarantor,4,if not A121 : building society savings agreeme...,45,none,for free,1,skilled employee / official,2,none,yes
4,< 0 DM,24,delay in paying off in the past,car (new),4870,... < 100 DM,1 <= ... < 4 years,3,male : single,none,4,unknown / no property,53,none,for free,2,skilled employee / official,2,none,yes


Split the dataset into train and test

In [5]:
test_size = 0.3
random_state = 42
# Separate the target column (named in class_field above) from the features
labels = df.pop(class_field)
features = list(df.columns)
X_train, X_test, Y_train, Y_test = train_test_split(df, labels,
                                                        test_size=test_size,
                                                        random_state=random_state,
                                                        stratify=labels)

Define and fit the EBM

In [6]:
import time 
start = time.time()
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, Y_train)
end = time.time()
print('Time for the creation of EBM model ', end - start)

Time for the creation of EBM model  2.7158799171447754


Now we can visualize how the model reaches its predictions. EBM provides both global and local interpretations.
Starting with the global ones, we call explain_global.

In [7]:
from interpret import show

ebm_global = ebm.explain_global(name='EBM German Global')
show(ebm_global)
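
To make the global summary easier to read, here is a minimal sketch of how an overall importance ranking can be derived from per-sample contributions. It assumes importance is defined as the mean absolute contribution of a term, which is our understanding of the InterpretML default and should be checked against its documentation. The feature names come from the dataset, but the contribution values are invented for illustration and were not produced by the model above.

```python
# Invented per-sample contributions (in log-odds) for three terms.
# These numbers are illustrative only; they do not come from the fitted EBM.
contributions = {
    "duration_in_month": [0.40, -0.10, 0.25, -0.35],
    "credit_amount": [0.05, -0.02, 0.10, -0.03],
    "age": [-0.20, 0.30, -0.10, 0.20],
}


def mean_abs_importance(per_sample_scores):
    """Average magnitude of a term's contribution across samples."""
    return sum(abs(score) for score in per_sample_scores) / len(per_sample_scores)


importances = {
    name: mean_abs_importance(scores) for name, scores in contributions.items()
}

# Rank terms from most to least important.
ranking = sorted(importances, key=importances.get, reverse=True)
for name in ranking:
    print(f"{name}: {importances[name]:.3f}")
```

A term that pushes predictions strongly in either direction ranks high, even if its positive and negative effects cancel out on average.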

In [8]:
ebm_local = ebm.explain_local(X_test.iloc[:10], Y_test.iloc[:10])
show(ebm_local)
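
A local explanation shows, for one instance, how much each term adds to or subtracts from the score. The sketch below illustrates the arithmetic under the assumption that the model uses a logit link: the intercept and the per-term scores are summed and passed through the logistic function. The intercept and scores are hypothetical values, not output from the model fitted above.

```python
import math

# Hypothetical values for a single instance (log-odds scale).
intercept = -0.85
term_scores = {
    "account_check_status": 0.60,
    "duration_in_month": 0.35,
    "credit_history": -0.40,
    "age": -0.10,
}

# The additive model sums the intercept and every term's contribution.
logit = intercept + sum(term_scores.values())

# The logistic function maps the log-odds to a probability for the positive class.
probability = 1.0 / (1.0 + math.exp(-logit))

# Order the terms by the magnitude of their contribution, as a local plot would.
ranked_terms = sorted(term_scores, key=lambda name: abs(term_scores[name]), reverse=True)

print(f"logit = {logit:.2f}, probability = {probability:.3f}")
for name in ranked_terms:
    print(f"{name}: {term_scores[name]:+.2f}")
```

Positive scores push the prediction toward the positive class and negative scores push it away, so the sign and size of each bar in the local plot can be read directly.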

Other kinds of plots are also available. Here we plot the ROC curve on the test set.

In [9]:
from interpret.perf import ROC

ebm_perf = ROC(ebm.predict_proba).explain_perf(X_test, Y_test, name='EBM German')

show(ebm_perf)
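
For readers unfamiliar with the ROC curve, the following sketch shows how its points and the area under it (AUC) can be computed by hand. It is a simplified illustration, not the implementation used by interpret: it assumes binary labels in {0, 1} and no tied scores, and the labels and scores are toy values.

```python
def roc_points(y_true, y_score):
    """Return (false positive rate, true positive rate) pairs.

    Assumes labels are 0 or 1 and that no two scores are equal.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    # Lower the threshold one sample at a time, from highest score to lowest.
    for _, label in sorted(zip(y_score, y_true), reverse=True):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: three positives and three negatives.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, y_score)
auc = area_under_curve(points)
print(f"AUC = {auc:.4f}")
```

The AUC equals the fraction of positive/negative pairs in which the positive sample receives the higher score, so a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.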