# Data Science for Business Nanodegree Program

## by Rafael Tomaz, Identify Fraud from Enron Email

In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy because of a fraud that involved a large part of the corporation. As a result of the federal investigation that followed, much data that is normally confidential became public, including tens of thousands of emails and financial details for the company's top executives. In this project, you will play detective and put your skills to work building a predictive model that determines whether or not an employee is a person of interest (POI). A person of interest is an employee who took part in the Enron scandal. To help with this detective work, we have combined the financial and email data of the employees investigated in this fraud case, that is, people who were indicted, reached settlements with the government, or testified in exchange for immunity from prosecution.

### Optimizing Feature Selection/Engineering

In [1]:
import sys
import pickle

from feature_format import featureFormat, targetFeatureSplit
from tester import dump_classifier_and_data, test_classifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in scikit-learn 0.20

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Financial features: payments (total_payments must stay last; the consistency check below uses [:-1])
all_features_finacial_payments = [
    'salary',
    'bonus',
    'long_term_incentive',
    'deferred_income',
    'deferral_payments',
    'loan_advances',
    'other',
    'expenses',
    'director_fees',
    'total_payments']

# Financial features: stocks
all_features_finacial_stocks = [
    'exercised_stock_options',
    'restricted_stock',
    'restricted_stock_deferred',
    'total_stock_value']

# Email features
all_features_email = [
    'to_messages',
    'from_poi_to_this_person',
    'from_messages',
    'from_this_person_to_poi',
    'shared_receipt_with_poi']

email_address = 'email_address'
target_label = 'poi'
pessoas_para_serem_removidas = set()
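pois_candidates = set()


def normalize_value_to_numpy_nan(value):
    """Convert the dataset's missing-value markers (None or the string 'NaN') to np.nan."""
    if value is None or value == 'NaN':
        value = np.nan
    return value


def print_index_df(df):
    """Print each row label (person name) of the DataFrame."""
    for x in df.index:
        print(x)


def get_array_string_df(df):
    """Return the DataFrame's row labels as a list of strings."""
    return [str(x) for x in df.index]


def remove_outliers_df(df, set_key):
    """Return a copy of df without the rows whose labels are in set_key."""
    return df.drop(labels=list(set_key), axis=0)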

# Load the dataset: a dict keyed by person name, holding one dict of features per person
with open("final_project_dataset.pkl", "rb") as data_file:
    data_dict = pickle.load(data_file)

features_list = [target_label] + all_features_finacial_payments + all_features_finacial_stocks + all_features_email # You will need to use more features

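# One row per person, one column per feature
df = pd.DataFrame.from_dict(data_dict, orient='index')
# Replace the dataset's 'NaN' strings (and None) with real missing values
df = df.applymap(normalize_value_to_numpy_nan)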

is_poi = df[target_label] == True
pois_count = df['poi'].value_counts()
total_items = df.shape[0]

# Missing values per column, from most to least missing
missing_values_by_columns = df.shape[0] - df.count()
missing_values_by_columns.sort_values(axis=0, ascending=False, inplace=True)

# Filled vs. missing cells across the whole DataFrame
total_celulas_preenchidas = sum(df.count())
total_celulas_faltantes = sum(missing_values_by_columns)
total_celulas = total_celulas_preenchidas + total_celulas_faltantes
# POIs with no email counts (both to_messages and from_messages missing)
pois_var = df[df['to_messages'].isnull() & df['from_messages'].isnull() & is_poi]

# People with no value in any of the 14 financial columns
financial_columns = all_features_finacial_payments + all_features_finacial_stocks
user_without_financial_information = df[df[financial_columns].isnull().all(axis=1)]

# 'THE TRAVEL AGENCY IN THE PARK' is not a person; remove it along with everyone lacking financial data
travel_agency = 'THE TRAVEL AGENCY IN THE PARK'
pessoas_para_serem_removidas.update(get_array_string_df(user_without_financial_information))
pessoas_para_serem_removidas.add(travel_agency)

# People missing both financial totals
user_without_financial_total_payments_and_total_stock_values_information = df[
    df['total_payments'].isnull() &
    df['total_stock_value'].isnull()
]

user_without_financial_total_payments_and_total_stock_values_information[all_features_finacial_payments]
user_without_financial_total_payments_and_total_stock_values_information[all_features_email + [target_label]]

df.fillna(value=0, inplace=True)
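# Records whose payment components (every payment column except total_payments) do not add up to total_payments
erros = df[df[all_features_finacial_payments[:-1]].sum(axis='columns') != df['total_payments']]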
def update_row(df, row_label, new_values):
    """Overwrite the given columns of one row in place.

    Writing through df.loc on the original DataFrame avoids chained
    assignment (df[:].loc[...] = ...), which may update a copy instead of df.
    """
    for column, value in new_values.items():
        df.loc[row_label, column] = value


# Correct the inconsistent financial records found by the check above
update_row(df, 'BELFER ROBERT', {
    'salary': 0.0,
    'bonus': 0.0,
    'long_term_incentive': 0.0,
    'deferred_income': -102500.0,
    'deferral_payments': 0.0,
    'loan_advances': 0.0,
    'other': 0.0,
    'expenses': 3285.0,
    'director_fees': 102500.0,
    'total_payments': 3285.0,
    'exercised_stock_options': 0.0,
    'restricted_stock': 44093.0,
    'restricted_stock_deferred': -44093.0,
    'total_stock_value': 0.0})

update_row(df, 'BHATNAGAR SANJAY', {
    'salary': 0.0,
    'bonus': 0.0,
    'long_term_incentive': 0.0,
    'deferred_income': 0.0,
    'deferral_payments': 0.0,
    'loan_advances': 0.0,
    'other': 0.0,
    'expenses': 137864.0,
    'director_fees': 0.0,
    'total_payments': 137864.0,
    'exercised_stock_options': 15456290.0,
    'restricted_stock': 2604490.0,
    'restricted_stock_deferred': -2604490.0,
    'total_stock_value': 15456290.0})
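# Inspect extreme values: salary of at least 1 million or bonus of at least 5 million
salary_gte1MI_or_bonus_gte5MI = df[(df['salary'] >= 1000000) | (df['bonus'] >= 5000000)]
pois_candidates.update(['FREVERT MARK A', 'LAVORATO JOHN J'])
# 'TOTAL' is the spreadsheet's totals row, not a person
pessoas_para_serem_removidas.add('TOTAL')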
df = remove_outliers_df(df, pessoas_para_serem_removidas)
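
A minimal, library-free sketch of the consistency check behind `erros`: the payment columns other than `total_payments` should add up to `total_payments`. The analogous stock check (the stock components adding up to `total_stock_value`) is an assumption that the notebook does not run. The helper names `payments_consistent` and `stocks_consistent` are hypothetical, and the record is a plain dict holding the corrected BELFER ROBERT values from the cell above rather than a DataFrame row.

```python
# Hypothetical helpers mirroring the `erros` check on plain dicts instead of a DataFrame
PAYMENT_COMPONENTS = ['salary', 'bonus', 'long_term_incentive', 'deferred_income',
                      'deferral_payments', 'loan_advances', 'other', 'expenses',
                      'director_fees']
STOCK_COMPONENTS = ['exercised_stock_options', 'restricted_stock',
                    'restricted_stock_deferred']


def payments_consistent(record, tol=1e-6):
    """True if the payment components add up to total_payments (missing values count as 0)."""
    total = sum(record.get(name, 0.0) for name in PAYMENT_COMPONENTS)
    return abs(total - record.get('total_payments', 0.0)) <= tol


def stocks_consistent(record, tol=1e-6):
    """Assumed analogous check: the stock components add up to total_stock_value."""
    total = sum(record.get(name, 0.0) for name in STOCK_COMPONENTS)
    return abs(total - record.get('total_stock_value', 0.0)) <= tol


# Corrected values for BELFER ROBERT (zero-valued columns omitted)
belfer = {'deferred_income': -102500.0, 'expenses': 3285.0,
          'director_fees': 102500.0, 'total_payments': 3285.0,
          'restricted_stock': 44093.0, 'restricted_stock_deferred': -44093.0,
          'total_stock_value': 0.0}

print(payments_consistent(belfer), stocks_consistent(belfer))  # True True
```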



In [None]:
# Task 3: Create new feature(s)
### Creating new features (related to the mini-project: Lesson 11)
# Total number of messages exchanged with POIs (received from + sent to)
df['sum_total_poi_messages'] = df['from_poi_to_this_person'] + df['from_this_person_to_poi']

# Share of total payments received as bonus and as salary
df['bonus_over_total_salary'] = df['bonus'] / df['total_payments']
df['salary_over_total_salary'] = df['salary'] / df['total_payments']

# Messages received together with a POI (shared receipt), relative to messages sent and received
df['shared_receipt_over_from_messages'] = df['shared_receipt_with_poi'] / df['from_messages']
df['shared_receipt_over_to_messages'] = df['shared_receipt_with_poi'] / df['to_messages']

new_features = ['sum_total_poi_messages', 
                'bonus_over_total_salary', 
                'salary_over_total_salary', 
                'shared_receipt_over_from_messages',
                'shared_receipt_over_to_messages']

In [None]:
# Ratios with a zero denominator yield inf (x / 0) or NaN (0 / 0); treat both as 0
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.fillna(value=0.0, inplace=True)
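
The ratio features divide by `total_payments`, `from_messages` or `to_messages`, which are 0 for many people after `fillna(0)`, so the cell above maps the resulting `inf` and `NaN` values back to 0. A plain-Python sketch of the same rule for a single record, using hypothetical helpers `safe_ratio` and `engineered_features` and made-up numbers that are not taken from the dataset:

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0 (mirrors the inf/NaN -> 0 step)."""
    if denominator == 0:
        return 0.0
    return float(numerator) / denominator


def engineered_features(record):
    """Compute the five new features for one person given as a dict (hypothetical helper)."""
    return {
        'sum_total_poi_messages': record['from_poi_to_this_person'] + record['from_this_person_to_poi'],
        'bonus_over_total_salary': safe_ratio(record['bonus'], record['total_payments']),
        'salary_over_total_salary': safe_ratio(record['salary'], record['total_payments']),
        'shared_receipt_over_from_messages': safe_ratio(record['shared_receipt_with_poi'],
                                                        record['from_messages']),
        'shared_receipt_over_to_messages': safe_ratio(record['shared_receipt_with_poi'],
                                                      record['to_messages']),
    }


# Illustrative values only; from_messages is 0 to exercise the zero-denominator rule
person = {'from_poi_to_this_person': 10, 'from_this_person_to_poi': 5,
          'bonus': 200.0, 'salary': 100.0, 'total_payments': 400.0,
          'shared_receipt_with_poi': 30, 'from_messages': 0, 'to_messages': 60}
print(engineered_features(person))
```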

In [None]:
# Build my_dataset from the DataFrame after data cleaning and feature creation
my_dataset = df.to_dict(orient='index')

# Candidate features: the original list plus the new features
my_feature_list = features_list + new_features

### Create two local variables, labels and features, for local testing
# Feature selection done intelligently (related to the mini-project: Lesson 11)
labels, features = df[my_feature_list[0]], df[my_feature_list[1:]]

from sklearn.feature_selection import SelectKBest

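# Score each candidate feature (SelectKBest uses the ANOVA F-value, f_classif, by default)
k = 4
k_best = SelectKBest(k=k)
k_best.fit(features, labels)
scores = k_best.scores_
# Pair each feature name with its score, from highest to lowest
sorted_pairs = sorted(zip(my_feature_list[1:], scores), key=lambda x: x[1], reverse=True)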
k_best_features = dict(sorted_pairs[:k])
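
`SelectKBest` scores each feature with `f_classif` by default, that is, the one-way ANOVA F-value between the POI and non-POI groups: the between-group mean square divided by the within-group mean square. A plain-Python sketch of that statistic and of keeping the top k, with toy values (`anova_f` and `top_k` are hypothetical names, not scikit-learn functions):

```python
def anova_f(values, labels):
    """One-way ANOVA F-value of a single feature across the classes in labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(float(value))
    n_total = len(values)
    n_groups = len(groups)
    grand_mean = sum(float(v) for v in values) / n_total
    # Between-group sum of squares: spread of the class means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of each value around its own class mean
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    return ms_between / ms_within


def top_k(columns, labels, k):
    """Rank features (name -> list of values) by F-value and keep the k best."""
    scored = [(name, anova_f(values, labels)) for name, values in columns.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: 'a' separates the two classes well, 'b' has identical class means
toy_labels = [0, 0, 0, 1, 1, 1]
toy_columns = {'a': [1, 2, 3, 4, 5, 6], 'b': [1, 5, 3, 2, 4, 3]}
print(top_k(toy_columns, toy_labels, 1))  # [('a', 13.5)]
```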

In [64]:

print("{0} best features to be used: {1}\n".format(k, list(k_best_features.keys())))
print(sorted_pairs)

print("Adding the k-best features and the new features created to my feature list")
my_feature_list_old = my_feature_list[:]
my_feature_list = [target_label] + list(k_best_features.keys()) + new_features
print(my_feature_list)

# Print the selected features
print("{0} selected features: {1}\n".format(len(my_feature_list) - 1, my_feature_list[1:]))

4 best features to be used: ['bonus', 'exercised_stock_options', 'salary', 'total_stock_value']

[('total_stock_value', 22.510549090242055), ('exercised_stock_options', 22.348975407306217), ('bonus', 20.792252047181535), ('salary', 18.289684043404513), ('deferred_income', 11.424891485418364), ('long_term_incentive', 9.922186013189823), ('total_payments', 9.283873618427371), ('restricted_stock', 8.825442219916463), ('shared_receipt_with_poi', 8.589420731682381), ('loan_advances', 7.184055658288725), ('expenses', 5.418900189407036), ('from_poi_to_this_person', 5.243449713374958), ('other', 4.202436300271228), ('from_this_person_to_poi', 2.382612108227674), ('director_fees', 2.1314839924612046), ('to_messages', 1.6463411294420076), ('restricted_stock_deferred', 0.7681463447871311), ('deferral_payments', 0.22885961902145746), ('from_messages', 0.16970094762175533)]
Adding the k-best features and the new features created to my feature list
['poi

In [65]:
### Create two local variables, labels and features, for local testing
# Feature scaling done correctly
df2 = df.copy()
labels, features = df[my_feature_list[0]], df[my_feature_list[1:]]

# Min-max feature scaling
from sklearn import preprocessing
scaler = preprocessing.MinMaxScaler()
features = scaler.fit_transform(features)

# Copy the scaled features back into the DataFrame
df2[my_feature_list[1:]] = features

# Keep a copy of the unscaled my_dataset to evaluate the initial features
my_dataset_old = my_dataset.copy()

# Create my_dataset from the DataFrame holding the scaled data
my_dataset = pd.DataFrame.to_dict(df2,orient='index')
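
Min-max scaling maps each feature to x' = (x - min) / (max - min), using the minimum and maximum learned during `fit`. A plain-Python sketch (the `MinMax` class is a hypothetical stand-in for `preprocessing.MinMaxScaler`) showing that values seen only at transform time can fall outside [0, 1]; as a simplification, it always maps a constant column to 0. Fitting the scaler on the training split only, for example inside a scikit-learn `Pipeline`, is a common way to keep test statistics out of it; the cell above instead fits on the whole dataset before `test_classifier` splits it.

```python
class MinMax:
    """Per-column min-max scaler for lists of rows (hypothetical stand-in for MinMaxScaler)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        scaled = []
        for row in rows:
            new_row = []
            for value, lo, hi in zip(row, self.mins, self.maxs):
                span = hi - lo
                # Simplification: a constant column has no range, so this sketch maps it to 0
                new_row.append((value - lo) / span if span else 0.0)
            scaled.append(new_row)
        return scaled


train_rows = [[0.0, 7.0], [10.0, 7.0], [5.0, 7.0]]
test_rows = [[20.0, 7.0]]
scaler_sketch = MinMax().fit(train_rows)
print(scaler_sketch.transform(train_rows))  # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
print(scaler_sketch.transform(test_rows))   # [[2.0, 0.0]]
```

Decision trees split on per-feature thresholds, so a monotonic rescaling such as min-max leaves the set of possible splits unchanged; differences between the scaled and unscaled tree runs more likely come from the different feature lists and, for unseeded trees, run-to-run randomness.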

## Choosing and Tuning an Algorithm

In [66]:
### Task 4: Try a variety of classifiers
### Please name your classifier clf for easy export below.
### Note that if you want to do PCA or other multi-stage operations,
### you'll need to use Pipelines. For more info:
### http://scikit-learn.org/stable/modules/pipeline.html

### Gaussian Naive Bayes

In [67]:
### Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB
nf_clf = GaussianNB()
print("Testing with the unscaled dataset and the full feature list")
test_classifier(nf_clf, my_dataset_old, my_feature_list_old)

Testing with the unscaled dataset and the full feature list
GaussianNB(priors=None)
	Accuracy: 0.76353	Precision: 0.24564	Recall: 0.37350	F1: 0.29637	F2: 0.33828
	Total predictions: 15000	True positives:  747	False positives: 2294	False negatives: 1253	True negatives: 10706
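
The tester's metrics follow the standard definitions from the confusion counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 1 for F1 and beta = 2 for F2, which weights recall more heavily. A short plain-Python sketch (`metrics_from_counts` is a hypothetical name) that reproduces the Gaussian Naive Bayes figures printed above from its counts:

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and F2 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = float(tp + tn) / total
    precision = float(tp) / (tp + fp)
    recall = float(tp) / (tp + fn)

    def f_beta(beta):
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    return {'accuracy': accuracy, 'precision': precision, 'recall': recall,
            'f1': f_beta(1), 'f2': f_beta(2)}


# Counts from the unscaled Gaussian Naive Bayes run
for name, value in sorted(metrics_from_counts(747, 2294, 1253, 10706).items()):
    print('{0}: {1:.5f}'.format(name, value))
# accuracy: 0.76353, f1: 0.29637, f2: 0.33828, precision: 0.24564, recall: 0.37350 (one per line)
```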



In [68]:
test_classifier(nf_clf, my_dataset, my_feature_list)

GaussianNB(priors=None)
	Accuracy: 0.83729	Precision: 0.41055	Recall: 0.31900	F1: 0.35903	F2: 0.33389
	Total predictions: 14000	True positives:  638	False positives:  916	False negatives: 1362	True negatives: 11084



In [69]:
print("Tuning")
print("No tuning performed")

Tuning
No tuning performed

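Gaussian Naive Bayes is fully determined by the class priors and by one mean and variance per feature and class, all estimated directly from the training data; the `GaussianNB(priors=None)` shown above exposes only `priors`, which fits the decision not to tune it. A compact plain-Python sketch of the idea, not scikit-learn's implementation: it assumes feature independence within each class and adds a small hypothetical `var_floor` to avoid zero variances (recent scikit-learn versions provide a `var_smoothing` parameter for the same purpose).

```python
import math


class TinyGaussianNB:
    """Minimal Gaussian Naive Bayes on lists of rows (illustrative, not sklearn's code)."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # hypothetical guard against zero variance

    def fit(self, rows, labels):
        self.stats = {}
        n = float(len(rows))
        for cls in set(labels):
            members = [row for row, y in zip(rows, labels) if y == cls]
            columns = list(zip(*members))
            means = [sum(col) / len(col) for col in columns]
            variances = [sum((v - m) ** 2 for v in col) / len(col) + self.var_floor
                         for col, m in zip(columns, means)]
            self.stats[cls] = (len(members) / n, means, variances)
        return self

    def _log_posterior(self, row, cls):
        prior, means, variances = self.stats[cls]
        # log prior + sum of independent Gaussian log-likelihoods (up to a shared constant)
        total = math.log(prior)
        for x, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return total

    def predict(self, rows):
        return [max(self.stats, key=lambda cls: self._log_posterior(row, cls))
                for row in rows]


toy_rows = [[1.0, 0.1], [1.2, 0.2], [0.9, 0.0], [5.0, 3.1], [5.5, 2.9], [4.8, 3.0]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_nb = TinyGaussianNB().fit(toy_rows, toy_y)
print(toy_nb.predict([[1.1, 0.1], [5.2, 3.0]]))  # [0, 1]
```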

### Decision Tree Classifier

In [70]:
### Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
dt_clf = DecisionTreeClassifier()

test_classifier(dt_clf, my_dataset_old, my_feature_list_old)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
	Accuracy: 0.79633	Precision: 0.22426	Recall: 0.21450	F1: 0.21927	F2: 0.21638
	Total predictions: 15000	True positives:  429	False positives: 1484	False negatives: 1571	True negatives: 11516



In [71]:
test_classifier(dt_clf, my_dataset, my_feature_list)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')
	Accuracy: 0.79500	Precision: 0.28487	Recall: 0.28800	F1: 0.28642	F2: 0.28737
	Total predictions: 14000	True positives:  576	False positives: 1446	False negatives: 1424	True negatives: 10554



In [73]:
parameters = {"criterion": ['gini', 'entropy'],
              "splitter":['best'],
              "max_depth" : [15,20]
              }
opt_model_dt_clf = GridSearchCV(dt_clf, param_grid=parameters)
test_classifier(opt_model_dt_clf, my_dataset, my_feature_list)

GridSearchCV(cv=None, error_score='raise',
       estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'splitter': ['best'], 'criterion': ['gini', 'entropy'], 'max_depth': [15]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=0)
	Accuracy: 0.80071	Precision: 0.29254	Recall: 0.27850	F1: 0.28535	F2: 0.28120
	Total predictions: 14000	True positives:  557	False positives: 1347	False negatives: 1443	True negatives: 10653



In [74]:
opt_model_dt_clf.best_estimator_

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=15,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')

In [None]:
opt_model_dt_clf.best_estimator_.feature_importances_
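
The grid search above compares the two split criteria of `DecisionTreeClassifier`. For a node whose class proportions are p_k, Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k * log2(p_k)); a split is chosen to reduce the size-weighted impurity of the child nodes. A plain-Python sketch of both measures (all helper names are hypothetical):

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction of the node's samples in each class."""
    counts = Counter(labels)
    n = float(len(labels))
    return [c / n for c in counts.values()]


def gini(labels):
    """Gini impurity: chance that two samples drawn with replacement have different classes."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(p * math.log2(p) for p in class_proportions(labels) if p > 0)


def weighted_impurity(left, right, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = float(len(left) + len(right))
    return len(left) / n * measure(left) + len(right) / n * measure(right)


node = [0, 0, 1, 1]
print(gini(node), entropy(node))                # 0.5 1.0
print(weighted_impurity([0, 0], [1, 1], gini))  # 0.0 (a perfect split)
```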

### Decision Tree Regressor

In [76]:
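### Decision Tree Regressor
# Note: a regressor predicts continuous values, while test_classifier expects 0/1 predictions.
# An unconstrained tree usually grows pure leaves and so outputs only 0 or 1, but the constrained
# trees explored in the grid search below produce fractional values, which triggers the tester's
# "All predictions should take value 0 or 1." message.
from sklearn import tree
dtr_model = tree.DecisionTreeRegressor()
test_classifier(dtr_model, my_dataset_old, my_feature_list_old)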

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')
	Accuracy: 0.79753	Precision: 0.23037	Recall: 0.22150	F1: 0.22585	F2: 0.22322
	Total predictions: 15000	True positives:  443	False positives: 1480	False negatives: 1557	True negatives: 11520



In [77]:
test_classifier(dtr_model, my_dataset, my_feature_list)

DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
           max_leaf_nodes=None, min_impurity_decrease=0.0,
           min_impurity_split=None, min_samples_leaf=1,
           min_samples_split=2, min_weight_fraction_leaf=0.0,
           presort=False, random_state=None, splitter='best')
	Accuracy: 0.79729	Precision: 0.29481	Recall: 0.30100	F1: 0.29787	F2: 0.29974
	Total predictions: 14000	True positives:  602	False positives: 1440	False negatives: 1398	True negatives: 10560
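
A regression tree trained on 0/1 labels predicts, for each leaf, the fraction of POIs among the training samples in that leaf, so its output has to be converted to 0/1 before it can be scored as a classifier. A minimal sketch, assuming a cut-off of 0.5 (an arbitrary choice that the notebook does not use); `to_class_labels` is a hypothetical helper:

```python
def to_class_labels(predictions, threshold=0.5):
    """Map continuous regressor outputs to 0/1 class labels (the threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in predictions]


# Leaf means from a constrained tree are fractions, e.g. 2 POIs out of 3 samples in a leaf
leaf_means = [0.0, 0.25, 2.0 / 3.0, 1.0]
print(to_class_labels(leaf_means))  # [0, 0, 1, 1]
```

In scikit-learn the usual alternative is to use `DecisionTreeClassifier` directly, whose `predict` already returns class labels.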



In [78]:
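# 1 x 2 x 2 x 4 x 2 x 2 x 2 = 128 parameter combinations
# (range objects are wrapped in list() so the grid is a plain list under Python 3 as well)
parameters = {"criterion": ['mse'],
              "splitter": ['best', 'random'],
              "presort": [True, False],
              "max_depth": list(range(1, 5)),
              "min_samples_split": [10, 40],
              "min_samples_leaf": list(range(1, 3)),
              "random_state": [20, 40]
              }
opt_model_dtr_model = GridSearchCV(dtr_model, param_grid=parameters)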

test_classifier(opt_model_dtr_model, my_dataset, my_feature_list)

All predictions should take value 0 or 1.
Evaluating performance for processed predictions:
KeyboardInterrupt: 
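
The grid search above explores 1 x 2 x 2 x 4 x 2 x 2 x 2 = 128 parameter combinations, and `GridSearchCV` fits every combination once per cross-validation fold (3 folds by default before scikit-learn 0.22, 5 afterwards) plus a final refit, all of it repeated inside each of the tester's train/test splits, which helps explain why the cell had to be interrupted. A plain-Python sketch of how such a grid expands into combinations, in the spirit of scikit-learn's `ParameterGrid` (`expand_grid` is a hypothetical helper):

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


regressor_grid = {"criterion": ['mse'],
                  "splitter": ['best', 'random'],
                  "presort": [True, False],
                  "max_depth": list(range(1, 5)),
                  "min_samples_split": [10, 40],
                  "min_samples_leaf": list(range(1, 3)),
                  "random_state": [20, 40]}

combinations = list(expand_grid(regressor_grid))
print(len(combinations))      # 128
print(len(combinations) * 3)  # 384 fits per GridSearchCV.fit with 3-fold CV, before the refit
```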

In [None]:
opt_model_dtr_model.best_estimator_

In [None]:
opt_model_dtr_model.best_estimator_.feature_importances_