# Credit Card Churn

[데이터 출처](https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers)

- 은행의 한 관리자는 점점 더 많은 고객이 신용 카드 서비스를 떠나는 것에 대해 걱정
- 고객이 이탈할 가능성을 예측하여 능동적으로 고객에게 다가가 더 나은 서비스를 제공하고 고객의 결정을 반대 방향으로 돌릴 수 있도록 이탈 고객 예측 모델을 만들어보도록 함. 
- 데이터 세트는 연령, 급여, 결혼 상태, 신용 카드 한도, 신용 카드 범주 등을 포함하는 10,000명의 고객으로 구성

- CLIENTNUM: int,  고객번호
- Atrition_Flag: str, 현재 상태(Target)
- Customer_Age: int,  고객 연령
- Gender: str, 결혼여부
- Dependent_count: int,  부양가족수
- Education_Level: str, 교육수준
- Marital_Status: str, 결혼 상태
- Income_Category: str, 연간 소득액 기준
- Card_Category: str, 카드 등급
- Months_on_book: int, 은행거래 기간 (월 단위)
- Total_Relationship_Count: int,  총 보유계좌
- Months_Inactive_12_mon: int,  거래내역 없는 거래월 수
- Contacts_Count_12_mon: float, 거래기간 있는 거래월 수
- Credit_Limit: int,  신용한도
- Total_Revolving_Bal: int,  신용카드에서 결제해야 할 평균 금액
- Avg_Open_To_Buy: float, 추가 결제를 위한 신용한도 승인액
- Total_Amt_Chng_Q4_Q1: float, 1분기 대비 4분기 카드 거래금액 비율
- Total_Trans_Amt: int, 12개월 동안의 총 거래금액
- Total_Trans_Ct: int, 12개월 동안의 총 거래건수
- Total_Ct_Chng_Q4_Q1: float,	1분기 대비 4분기 거래건수 비율
- Total_Utilization_Ratio : float, 평균 카드 이용률

## Import package

In [3]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
from glob import glob
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
import os
from tqdm import tqdm
import time

from sklearn.preprocessing import LabelEncoder

pd.set_option('display.max_columns',None)

In [4]:
glob('*.csv')

['predictor_1.csv',
 'predictor_2.csv',
 'predictor_3.csv',
 'predictor.csv',
 'predictor2.csv',
 'predictor3.csv',
 'predictor1.csv',
 'BankChurners.csv']

In [5]:
data = pd.read_csv('BankChurners.csv')
data.head()

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,Total_Relationship_Count,Months_Inactive_12_mon,Contacts_Count_12_mon,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,5,1,3,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3e-05,0.99991
1,818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,6,1,2,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,5.7e-05,0.99994
2,713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,4,1,0,3418.0,0,3418.0,2.594,1887,20,2.333,0.0,2.1e-05,0.99998
3,769911858,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,34,3,4,1,3313.0,2517,796.0,1.405,1171,20,2.333,0.76,0.000134,0.99987
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,5,1,0,4716.0,0,4716.0,2.175,816,28,2.5,0.0,2.2e-05,0.99998


In [6]:
data.columns

Index(['CLIENTNUM', 'Attrition_Flag', 'Customer_Age', 'Gender',
       'Dependent_count', 'Education_Level', 'Marital_Status',
       'Income_Category', 'Card_Category', 'Months_on_book',
       'Total_Relationship_Count', 'Months_Inactive_12_mon',
       'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal',
       'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
       'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio',
       'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
       'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2'],
      dtype='object')

In [7]:
data.head()

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,Total_Relationship_Count,Months_Inactive_12_mon,Contacts_Count_12_mon,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,5,1,3,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3e-05,0.99991
1,818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,6,1,2,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,5.7e-05,0.99994
2,713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,4,1,0,3418.0,0,3418.0,2.594,1887,20,2.333,0.0,2.1e-05,0.99998
3,769911858,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,34,3,4,1,3313.0,2517,796.0,1.405,1171,20,2.333,0.76,0.000134,0.99987
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,5,1,0,4716.0,0,4716.0,2.175,816,28,2.5,0.0,2.2e-05,0.99998


In [8]:
# pr = data.profile_report()
# pr

In [10]:
data.head()

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,Total_Relationship_Count,Months_Inactive_12_mon,Contacts_Count_12_mon,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,5,1,3,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3e-05,0.99991
1,818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,6,1,2,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,5.7e-05,0.99994
2,713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,4,1,0,3418.0,0,3418.0,2.594,1887,20,2.333,0.0,2.1e-05,0.99998
3,769911858,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,34,3,4,1,3313.0,2517,796.0,1.405,1171,20,2.333,0.76,0.000134,0.99987
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,5,1,0,4716.0,0,4716.0,2.175,816,28,2.5,0.0,2.2e-05,0.99998


In [11]:
# categorical data -> numeric data

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

In [12]:
dic = {'Existing Customer':0,'Attrited Customer':1}
data['class'] = data['Attrition_Flag'].apply(lambda x: dic[x])

In [13]:
data[['Attrition_Flag','class']]

Unnamed: 0,Attrition_Flag,class
0,Existing Customer,0
1,Existing Customer,0
2,Existing Customer,0
3,Existing Customer,0
4,Existing Customer,0
...,...,...
10122,Existing Customer,0
10123,Attrited Customer,1
10124,Attrited Customer,1
10125,Attrited Customer,1


In [14]:
data.drop('Attrition_Flag', axis=1,inplace=True)

In [15]:
# Kaggle 설명란을 통해 확인한 결과, 두 feature는 Target feature와의 상관관계가 높다 보니 제거한 이후 분석할 것을 권장함.
# 그에 따라 두 변수를 제거하고 예측 모델 설계.

data.rename(columns={'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1':'NB_mon1',
                     'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2':'NB_mon2'},inplace=True)

In [16]:
data = data.drop(['NB_mon1','NB_mon2'],axis=1)

In [17]:
data.columns

Index(['CLIENTNUM', 'Customer_Age', 'Gender', 'Dependent_count',
       'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category',
       'Months_on_book', 'Total_Relationship_Count', 'Months_Inactive_12_mon',
       'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal',
       'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
       'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio',
       'class'],
      dtype='object')

In [18]:
cat_features = data.select_dtypes(include=['object']).columns.to_list()
num_features = data.select_dtypes(exclude=['object']).columns.to_list()

In [19]:
cat_features

['Gender',
 'Education_Level',
 'Marital_Status',
 'Income_Category',
 'Card_Category']

In [20]:
#sex
for feature in cat_features:
    le = LabelEncoder()
    le.fit(data[feature].drop_duplicates())
    data[feature] = le.transform(data[feature])

In [21]:
data.corr()['class'].sort_values()

Total_Trans_Ct             -0.371403
Total_Ct_Chng_Q4_Q1        -0.290054
Total_Revolving_Bal        -0.263053
Avg_Utilization_Ratio      -0.178410
Total_Trans_Amt            -0.168598
Total_Relationship_Count   -0.150005
Total_Amt_Chng_Q4_Q1       -0.131063
CLIENTNUM                  -0.046430
Gender                     -0.037272
Credit_Limit               -0.023873
Card_Category              -0.006038
Avg_Open_To_Buy            -0.000285
Education_Level             0.005551
Months_on_book              0.013687
Income_Category             0.017584
Customer_Age                0.018203
Marital_Status              0.018597
Dependent_count             0.018991
Months_Inactive_12_mon      0.152449
Contacts_Count_12_mon       0.204491
class                       1.000000
Name: class, dtype: float64

In [22]:
data.columns

Index(['CLIENTNUM', 'Customer_Age', 'Gender', 'Dependent_count',
       'Education_Level', 'Marital_Status', 'Income_Category', 'Card_Category',
       'Months_on_book', 'Total_Relationship_Count', 'Months_Inactive_12_mon',
       'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal',
       'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
       'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio',
       'class'],
      dtype='object')

In [35]:
# Generate x and y sets
x = data.drop('class', axis=1).values
y = data['class']

In [36]:
from sklearn.model_selection import train_test_split

train,test = train_test_split(data, test_size = 0.2, random_state=1234)

In [37]:
# Importing packages for SMOTE
from imblearn.over_sampling import SMOTE
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler

In [38]:
sm = SMOTE(sampling_strategy='auto', random_state=1234)
x_sm, y_sm = sm.fit_resample(x, y)

In [39]:
from imblearn.under_sampling import TomekLinks

In [40]:
tl = TomekLinks(sampling_strategy='majority')
x_tmk, y_tmk = tl.fit_resample(x, y)

In [41]:
from imblearn.combine import SMOTETomek

In [42]:
smoteto = SMOTETomek(tomek=TomekLinks(sampling_strategy='majority'))
x_smoteto, y_smoteto = smoteto.fit_resample(x, y)

In [43]:
x_sm = pd.DataFrame(x_sm)
x_sm[20] = y_sm

In [44]:
x_tmk = pd.DataFrame(x_tmk)
x_tmk[20] = y_tmk

In [45]:
x_smoteto = pd.DataFrame(x_smoteto)
x_smoteto[20] = y_smoteto

In [46]:
from autogluon.tabular import TabularDataset, TabularPredictor

In [47]:
# split data

train_sm, test_sm = train_test_split(x_sm, test_size=0.2, random_state=1234)
train_tmk, test_tmk = train_test_split(x_tmk, test_size=0.2, random_state=1234)
train_smoteto, test_smoteto = train_test_split(x_smoteto, test_size=0.2, random_state=1234)

In [48]:
train_data = TabularDataset(train)
test_data= TabularDataset(train)

In [49]:
train_data1 = TabularDataset(train_sm)
test_data1 = TabularDataset(test_sm)

train_data2 = TabularDataset(train_tmk)
test_data2 = TabularDataset(test_tmk)

train_data3 = TabularDataset(train_smoteto)
test_data3 = TabularDataset(test_smoteto)

In [50]:
# define target column
label = 20

In [53]:
predictor = TabularPredictor(label='class', eval_metric='precision').fit(train_data)

No path specified. Models will be saved in: "AutogluonModels/ag-20230604_113302/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230604_113302/"
AutoGluon Version:  0.7.0
Python Version:     3.8.16
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
Train Data Rows:    8101
Train Data Columns: 20
Label Column: class
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [1, 0]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Availab

In [55]:
predictor1 = TabularPredictor(label=label, eval_metric='precision').fit(train_data1)

No path specified. Models will be saved in: "AutogluonModels/ag-20230604_113329/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230604_113329/"
AutoGluon Version:  0.7.0
Python Version:     3.8.16
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
Train Data Rows:    13600
Train Data Columns: 20
Label Column: 20
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [1, 0]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available

In [56]:
predictor2 = TabularPredictor(label=label, eval_metric='precision').fit(train_data2)

No path specified. Models will be saved in: "AutogluonModels/ag-20230604_113414/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230604_113414/"
AutoGluon Version:  0.7.0
Python Version:     3.8.16
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
Train Data Rows:    7519
Train Data Columns: 20
Label Column: 20
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [0, 1]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available 

In [57]:
predictor3 = TabularPredictor(label=label, eval_metric='precision').fit(train_data3)

No path specified. Models will be saved in: "AutogluonModels/ag-20230604_113435/"
Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230604_113435/"
AutoGluon Version:  0.7.0
Python Version:     3.8.16
Operating System:   Darwin
Platform Machine:   arm64
Platform Version:   Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
Train Data Rows:    12786
Train Data Columns: 20
Label Column: 20
Preprocessing data ...
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [0, 1]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Selected class <--> label mapping:  class 1 = 1, class 0 = 0
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available

[1000]	valid_set's binary_logloss: 0.0335077	valid_set's precision: 0.982533


	0.9826	 = Validation score   (precision)
	6.55s	 = Training   runtime
	0.02s	 = Validation runtime
Fitting model: LightGBM ...
	0.9896	 = Validation score   (precision)
	4.08s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: RandomForestGini ...
	0.9722	 = Validation score   (precision)
	0.82s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: RandomForestEntr ...
	0.9708	 = Validation score   (precision)
	0.83s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: CatBoost ...
	0.9926	 = Validation score   (precision)
	4.6s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: ExtraTreesGini ...
	0.9654	 = Validation score   (precision)
	0.41s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: ExtraTreesEntr ...
	0.9654	 = Validation score   (precision)
	0.41s	 = Training   runtime
	0.04s	 = Validation runtime
Fitting model: NeuralNetFastAI ...
	0.967	 = Validation score   (precision)
	6.6s	 = Training   runtime
	0

In [58]:
predictor.leaderboard(test_data, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,LightGBMLarge,1.0,1.0,0.004121,0.002011,6.240703,0.004121,0.002011,6.240703,1,True,13
1,LightGBMXT,1.0,1.0,0.004586,0.001896,1.872593,0.004586,0.001896,1.872593,1,True,3
2,WeightedEnsemble_L2,1.0,1.0,0.0056,0.004011,2.391273,0.001014,0.002115,0.51868,2,True,14
3,RandomForestEntr,0.996115,0.954545,0.077225,0.037706,0.497312,0.077225,0.037706,0.497312,1,True,6
4,ExtraTreesGini,0.996066,0.946809,0.084845,0.039235,0.325103,0.084845,0.039235,0.325103,1,True,8
5,ExtraTreesEntr,0.995276,0.935484,0.088508,0.040451,0.349861,0.088508,0.040451,0.349861,1,True,9
6,RandomForestGini,0.994569,0.9375,0.07516,0.043162,0.497481,0.07516,0.043162,0.497481,1,True,5
7,NeuralNetTorch,0.989796,1.0,0.025189,0.007959,2.399579,0.025189,0.007959,2.399579,1,True,12
8,CatBoost,0.988345,0.966942,0.007266,0.002078,2.032738,0.007266,0.002078,2.032738,1,True,7
9,LightGBM,0.983182,0.984615,0.00313,0.001734,1.916659,0.00313,0.001734,1.916659,1,True,4


In [59]:
predictor1.leaderboard(test_data1, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,CatBoost,0.989814,0.988183,0.005431,0.002799,3.797137,0.005431,0.002799,3.797137,1,True,7
1,LightGBM,0.989228,0.988183,0.033932,0.013509,5.378546,0.033932,0.013509,5.378546,1,True,4
2,WeightedEnsemble_L2,0.98867,0.992604,0.123587,0.063368,11.438449,0.00179,0.00158,0.697695,2,True,14
3,XGBoost,0.987455,0.98966,0.014326,0.005883,1.733371,0.014326,0.005883,1.733371,1,True,11
4,LightGBMXT,0.986944,0.988287,0.026431,0.010993,4.797402,0.026431,0.010993,4.797402,1,True,3
5,LightGBMLarge,0.982738,0.985316,0.021867,0.007814,9.893825,0.021867,0.007814,9.893825,1,True,13
6,RandomForestGini,0.974601,0.969653,0.062553,0.040946,0.853414,0.062553,0.040946,0.853414,1,True,5
7,NeuralNetFastAI,0.974224,0.969914,0.033516,0.012982,6.664492,0.033516,0.012982,6.664492,1,True,10
8,RandomForestEntr,0.972861,0.970972,0.062223,0.04095,0.884294,0.062223,0.04095,0.884294,1,True,6
9,NeuralNetTorch,0.972765,0.966906,0.011547,0.008144,7.450125,0.011547,0.008144,7.450125,1,True,12


In [60]:
predictor2.leaderboard(test_data2, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,LightGBM,1.0,1.0,0.001947,0.001803,1.842805,0.001947,0.001803,1.842805,1,True,4
1,LightGBMLarge,1.0,1.0,0.002443,0.001908,6.021473,0.002443,0.001908,6.021473,1,True,13
2,LightGBMXT,1.0,1.0,0.002538,0.001711,2.103195,0.002538,0.001711,2.103195,1,True,3
3,WeightedEnsemble_L2,1.0,1.0,0.003565,0.002721,2.604957,0.001027,0.00101,0.501762,2,True,14
4,CatBoost,0.945455,0.964912,0.008494,0.001753,0.855575,0.008494,0.001753,0.855575,1,True,7
5,XGBoost,0.94,0.960938,0.008211,0.003091,1.145676,0.008211,0.003091,1.145676,1,True,11
6,RandomForestGini,0.928315,0.941667,0.050263,0.036839,0.454869,0.050263,0.036839,0.454869,1,True,5
7,ExtraTreesEntr,0.927273,0.954545,0.059196,0.037971,0.325768,0.059196,0.037971,0.325768,1,True,9
8,RandomForestEntr,0.92674,0.957265,0.048707,0.03733,0.454463,0.048707,0.03733,0.454463,1,True,6
9,ExtraTreesGini,0.909091,0.964706,0.059909,0.037655,0.325297,0.059909,0.037655,0.325297,1,True,8


In [61]:
predictor3.leaderboard(test_data3, silent=True)

Unnamed: 0,model,score_test,score_val,pred_time_test,pred_time_val,fit_time,pred_time_test_marginal,pred_time_val_marginal,fit_time_marginal,stack_level,can_infer,fit_order
0,CatBoost,0.985882,0.992604,0.007307,0.002628,4.602699,0.007307,0.002628,4.602699,1,True,7
1,XGBoost,0.981829,0.986686,0.01393,0.006233,1.642772,0.01393,0.006233,1.642772,1,True,11
2,LightGBMXT,0.981319,0.982583,0.03824,0.015265,6.553326,0.03824,0.015265,6.553326,1,True,3
3,LightGBMLarge,0.981297,0.983824,0.047092,0.020137,14.024193,0.047092,0.020137,14.024193,1,True,13
4,LightGBM,0.980679,0.989645,0.020155,0.009312,4.081542,0.020155,0.009312,4.081542,1,True,4
5,WeightedEnsemble_L2,0.97792,0.995582,2.385619,1.054183,13.94618,0.002979,0.001498,0.697067,2,True,14
6,RandomForestEntr,0.967986,0.97076,0.064436,0.040839,0.830897,0.064436,0.040839,0.830897,1,True,6
7,RandomForestGini,0.96686,0.972222,0.063043,0.040652,0.818707,0.063043,0.040652,0.818707,1,True,5
8,NeuralNetTorch,0.965725,0.968037,0.012948,0.008713,6.577344,0.012948,0.008713,6.577344,1,True,12
9,ExtraTreesGini,0.963499,0.965418,0.074227,0.040635,0.40878,0.074227,0.040635,0.40878,1,True,8


In [63]:
import joblib

joblib.dump(predictor1, 'predictor1.pkl')
joblib.dump(predictor2, 'predictor2.pkl')
joblib.dump(predictor3, 'predictor3.pkl')

['predictor3.pkl']