# Introduction
- This notebook focuses on demonstrating and evaluating machine learning algorithms. Feature engineering is not the emphasis.
- The data set is the Telco Customer Churn data set. I used a cleaned, compact version (7032 rows × 20 columns) to keep training time short.
- The task is binary classification (target: churn or not churn). The ratio of churn to non-churn samples is 0.3620, so the classes are imbalanced.
- I used the h2o.ai machine learning package for this demonstration because its tree-based models handle categorical data directly.
- I will demonstrate the Random Forest and Gradient Boosting models in h2o.
- Assuming feature engineering has already been done, the procedure consists of:
    - data splitting
    - (no one-hot encoding needed, since h2o handles categorical data directly)
    - model training: n-fold cross-validation and grid search for hyper-parameters
    - model evaluation: confusion matrix, accuracy, precision, recall, AUC, AUCPR (for imbalanced data), F1, F0.5 and F2 scores, etc.
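The 0.3620 above is churners per non-churner. As a quick arithmetic sketch (not part of the original pipeline), a ratio r corresponds to a churn rate of r / (1 + r), i.e. roughly one customer in four:

```python
def churn_rate_from_ratio(ratio: float) -> float:
    """Convert a churn/not-churn ratio r into the fraction of churners, r / (1 + r)."""
    return ratio / (1.0 + ratio)


rate = churn_rate_from_ratio(0.3620)
print(f"churn rate = {rate:.4f}")  # churn rate = 0.2658
```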
    
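For reference, the F-scores in the list above are the standard weighted harmonic mean of precision $P$ and recall $R$, with churn as the positive class: $\beta = 1$ weights them equally, $\beta = 2$ favors recall and $\beta = 0.5$ favors precision.

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_\beta = \frac{(1 + \beta^2)\, P\, R}{\beta^2 P + R}
$$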
### To do later
- Include and compare other models, e.g. logistic regression, KNN, Naive Bayes, decision tree, SVM.
- Compare the performance with sklearn models, which generally need categorical features to be one-hot encoded.
- Visualize the evaluation results, e.g. the ROC curve.
- Case-wise analysis of precision and recall.

In [4]:
import pandas as pd
import h2o
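# nthreads=-1 uses all available cores; max_mem_size=16 requests 16 GB
# (both only apply if h2o.init() has to start a new instance rather than connect to a running one)
h2o.init(nthreads = -1, max_mem_size = 16)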
# h2o.connect(verbose= False)

Checking whether there is an H2O instance running at http://localhost:54321 . connected.


| Property | Value |
|---|---|
| H2O cluster uptime | 1 min 04 secs |
| H2O cluster timezone | America/Los_Angeles |
| H2O data parsing timezone | UTC |
| H2O cluster version | 3.28.0.3 |
| H2O cluster version age | 21 days, 13 hours and 58 minutes |
| H2O cluster name | H2O_from_python_kefei_ywe3pq |
| H2O cluster total nodes | 1 |
| H2O cluster free memory | 16 Gb |
| H2O cluster total cores | 12 |
| H2O cluster allowed cores | 12 |


In [5]:
df = h2o.import_file('./data/Telco-Customer-Churn_clean.csv')
df

Parse progress: |█████████████████████████████████████████████████████████| 100%


gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
Female,No,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
Male,No,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
Male,No,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
Male,No,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
Female,No,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
Female,No,No,No,8,Yes,Yes,Fiber optic,No,No,Yes,No,Yes,Yes,Month-to-month,Yes,Electronic check,99.65,820.5,Yes
Male,No,No,Yes,22,Yes,Yes,Fiber optic,No,Yes,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No
Female,No,No,No,10,No,No phone service,DSL,Yes,No,No,No,No,No,Month-to-month,No,Mailed check,29.75,301.9,No
Female,No,Yes,No,28,Yes,Yes,Fiber optic,No,No,Yes,Yes,Yes,Yes,Month-to-month,Yes,Electronic check,104.8,3046.05,Yes
Male,No,No,Yes,62,Yes,No,DSL,Yes,Yes,No,No,No,No,One year,No,Bank transfer (automatic),56.15,3487.95,No




In [6]:
df.describe()

Rows:7032
Cols:20




,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
type,enum,enum,enum,enum,int,enum,enum,enum,enum,enum,enum,enum,enum,enum,enum,enum,enum,real,real,enum
mins,,,,,1.0,,,,,,,,,,,,,18.25,18.8,
mean,,,,,32.42178612059158,,,,,,,,,,,,,64.79820819112628,2283.300440841866,
maxs,,,,,72.0,,,,,,,,,,,,,118.75,8684.8,
sigma,,,,,24.545259709263256,,,,,,,,,,,,,30.085973884049842,2266.771361883145,
zeros,,,,,0,,,,,,,,,,,,,0,0,
missing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,Female,No,Yes,No,1.0,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,Male,No,No,No,34.0,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,Male,No,No,No,2.0,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes


In [7]:
df.columns

['gender',
 'SeniorCitizen',
 'Partner',
 'Dependents',
 'tenure',
 'PhoneService',
 'MultipleLines',
 'InternetService',
 'OnlineSecurity',
 'OnlineBackup',
 'DeviceProtection',
 'TechSupport',
 'StreamingTV',
 'StreamingMovies',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod',
 'MonthlyCharges',
 'TotalCharges',
 'Churn']

In [8]:
# specify the target column and the feature columns
y = 'Churn'
x = list(df.columns)
x.remove(y)

A quick check with df.types (or df.type(y) for a single column) shows whether each column has the desired data type. If the target were not categorical, we could convert it to an enum with df[y] = df[y].asfactor().
A nice thing about h2o is that it handles categorical (enum) columns directly, without one-hot encoding.

In [7]:
df.types

{'gender': 'enum',
 'SeniorCitizen': 'enum',
 'Partner': 'enum',
 'Dependents': 'enum',
 'tenure': 'int',
 'PhoneService': 'enum',
 'MultipleLines': 'enum',
 'InternetService': 'enum',
 'OnlineSecurity': 'enum',
 'OnlineBackup': 'enum',
 'DeviceProtection': 'enum',
 'TechSupport': 'enum',
 'StreamingTV': 'enum',
 'StreamingMovies': 'enum',
 'Contract': 'enum',
 'PaperlessBilling': 'enum',
 'PaymentMethod': 'enum',
 'MonthlyCharges': 'real',
 'TotalCharges': 'real',
 'Churn': 'enum'}

To create training and testing sets, we use H2O's split_frame() function instead of sklearn's train_test_split(). With ratios=[0.7, 0.15], the frame is split into three parts, and the last part receives the remaining ~15%.
The third part, valid_df, is a hold-out set that is never touched during training, which makes it useful for comparing performance across different models.

In [11]:
splits = df.split_frame(ratios=[0.7, 0.15], seed=1)
train_df = splits[0]
test_df = splits[1]
valid_df = splits[2]

We can check the size of each set with the .nrow attribute. Note that each set contains both the feature columns (x) and the target (y); in sklearn, features and target are usually kept in separate arrays.

In [12]:
print(df.nrow)
print(train_df.nrow)
print(test_df.nrow)
print(valid_df.nrow)

7032
4940
1028
1064
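The sizes above (4940, 1028 and 1064 rows) are close to, but not exactly, 70/15/15 percent of 7032. H2O's documentation notes that split_frame() does not produce an exact split: it assigns rows probabilistically, which is efficient on large data. The pure-Python sketch below only imitates that idea (one uniform draw per row, compared against the cumulative ratios); it is not H2O's implementation and will not reproduce H2O's exact counts.

```python
import random


def approximate_split(n_rows, ratios, seed=1):
    """Assign each row to a split with one uniform draw per row (illustrative only)."""
    rng = random.Random(seed)
    # cumulative boundaries, e.g. [0.7, 0.85]; the last split takes the remainder
    bounds, total = [], 0.0
    for r in ratios:
        total += r
        bounds.append(total)
    sizes = [0] * (len(ratios) + 1)
    for _ in range(n_rows):
        u = rng.random()
        # first split whose boundary exceeds u, or the remainder split
        idx = next((i for i, b in enumerate(bounds) if u < b), len(ratios))
        sizes[idx] += 1
    return sizes


sizes = approximate_split(7032, [0.7, 0.15])
print(sizes, sum(sizes))  # sizes vary around [4922, 1055, 1055] and always sum to 7032
```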


# Model training
### Random forest

In [13]:
from h2o.estimators.random_forest import H2ORandomForestEstimator
# use the default hyper-parameters; fix the seed for reproducibility
rf = H2ORandomForestEstimator(seed=1)

##### Fitting & Predicting Outcomes
- To fit the model, we pass the training_frame and the target column y, and usually the feature columns x.
- If x is omitted, h2o uses all columns except y as features.

In [27]:
rf.train(x=x, y=y, training_frame=train_df)

drf Model Build progress: |███████████████████████████████████████████████| 100%


#### Predictions
- predict() returns an H2OFrame with the predicted label (predict) and the predicted probability of each class (No, Yes).
- The per-class probabilities are handy when we want to choose our own decision threshold instead of relying on the predicted label.

In [28]:
y_pred = rf.predict(test_data=test_df)
y_pred

drf prediction progress: |████████████████████████████████████████████████| 100%


predict,No,Yes
Yes,0.530905,0.469095
Yes,0.338218,0.661782
Yes,0.443333,0.556667
No,0.883333,0.116667
No,0.8815,0.1185
Yes,0.553333,0.446667
Yes,0.351667,0.648333
No,0.856667,0.143333
Yes,0.3185,0.6815
Yes,0.446667,0.553333
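Note the first row: P(Yes) = 0.469 is below 0.5, yet the label is Yes. As I understand it, for binomial models H2O does not cut at 0.5 by default; the predict column uses the threshold that maximizes F1 on the model's own metrics. The plain-Python sketch below uses the ten rows shown above to illustrate. The 0.3 threshold is hypothetical (H2O's actual value for this model is not shown); any value between 0.1185 and 0.4467 reproduces these ten labels.

```python
# First ten P(Yes) values and labels copied from the random-forest prediction output above
p_yes = [0.469095, 0.661782, 0.556667, 0.116667, 0.1185,
         0.446667, 0.648333, 0.143333, 0.6815, 0.553333]
h2o_labels = ["Yes", "Yes", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes"]


def label(probs, threshold):
    """Return 'Yes' where P(Yes) >= threshold, otherwise 'No'."""
    return ["Yes" if p >= threshold else "No" for p in probs]


at_half = label(p_yes, 0.5)
# rows where a naive 0.5 cut-off disagrees with H2O's labels
disagree_at_half = [i for i, (a, b) in enumerate(zip(at_half, h2o_labels)) if a != b]
print(disagree_at_half)                 # [0, 5]

# 0.3 is a hypothetical threshold, not the value H2O actually used
print(label(p_yes, 0.3) == h2o_labels)  # True
```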




##### Performance Evaluation
- Understanding the evaluation metrics is essential for deciding whether a model is good enough to deploy.
- model_performance() computes a report on how well the model does on a given frame; displaying the result prints the report.

In [24]:
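# performance on training set
# (the report header says "Reported on test data" for any frame passed to model_performance(), including train_df)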
rf_performance = rf.model_performance(train_df)
rf_performance


ModelMetricsBinomial: drf
** Reported on test data. **

MSE: 0.03347883296161848
RMSE: 0.18297221909792338
LogLoss: 0.14837589775523233
Mean Per-Class Error: 0.01678270187242603
AUC: 0.9985989596435744
AUCPR: 0.993732244214413
Gini: 0.9971979192871487

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.47532213866710665: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 3612 | 27 | 0.0074 | (27/3639) |
| Yes | 51 | 1250 | 0.0392 | (51/1301) |
| Total | 3663 | 1277 | 0.0158 | (78/4940) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.475322 | 0.969744 | 191 |
| max f2 | 0.394189 | 0.977879 | 209 |
| max f0point5 | 0.541754 | 0.980612 | 177 |
| max accuracy | 0.475322 | 0.984211 | 191 |
| max precision | 0.998783 | 1.0 | 0 |
| max recall | 0.264298 | 1.0 | 250 |
| max specificity | 0.998783 | 1.0 | 0 |
| max absolute_mcc | 0.475322 | 0.959139 | 191 |
| max min_per_class_accuracy | 0.400314 | 0.982321 | 207 |
| max mean_per_class_accuracy | 0.394189 | 0.983217 | 209 |



Gains/Lift Table: Avg response rate: 26.34 %, avg score: 26.53 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010121 | 0.969455 | 3.797079 | 3.797079 | 1.0 | 0.981859 | 1.0 | 0.981859 | 0.038432 | 0.038432 | 279.707917 | 279.707917 |
| 2 | 0.02004 | 0.951028 | 3.797079 | 3.797079 | 1.0 | 0.960652 | 1.0 | 0.971362 | 0.037663 | 0.076095 | 279.707917 | 279.707917 |
| 3 | 0.030162 | 0.933878 | 3.797079 | 3.797079 | 1.0 | 0.940568 | 1.0 | 0.961029 | 0.038432 | 0.114527 | 279.707917 | 279.707917 |
| 4 | 0.040081 | 0.920456 | 3.797079 | 3.797079 | 1.0 | 0.927113 | 1.0 | 0.952636 | 0.037663 | 0.152191 | 279.707917 | 279.707917 |
| 5 | 0.05 | 0.898233 | 3.797079 | 3.797079 | 1.0 | 0.90915 | 1.0 | 0.944009 | 0.037663 | 0.189854 | 279.707917 | 279.707917 |
| 6 | 0.1 | 0.822999 | 3.797079 | 3.797079 | 1.0 | 0.85966 | 1.0 | 0.901835 | 0.189854 | 0.379708 | 279.707917 | 279.707917 |
| 7 | 0.15 | 0.75 | 3.781706 | 3.791955 | 0.995951 | 0.787727 | 0.99865 | 0.863799 | 0.189085 | 0.568793 | 278.170638 | 279.195491 |
| 8 | 0.2 | 0.675019 | 3.766334 | 3.78555 | 0.991903 | 0.715541 | 0.996964 | 0.826734 | 0.188317 | 0.75711 | 276.633359 | 278.554958 |
| 9 | 0.3 | 0.306791 | 2.390469 | 3.320523 | 0.629555 | 0.507165 | 0.874494 | 0.720211 | 0.239047 | 0.996157 | 139.046887 | 232.052267 |
| 10 | 0.4 | 0.16633 | 0.038432 | 2.5 | 0.010121 | 0.224712 | 0.658401 | 0.596336 | 0.003843 | 1.0 | -96.156802 | 150.0 |

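In the gains/lift table, lift is a group's response (churn) rate divided by the overall response rate, and gain is (lift - 1) × 100. A quick plain-Python check against the first row of the training-set table above, using the training counts from the confusion matrix (1301 churners out of 4940 rows):

```python
def lift_and_gain(group_response_rate, overall_response_rate):
    """Lift = group churn rate / overall churn rate; gain (%) = (lift - 1) * 100."""
    lift = group_response_rate / overall_response_rate
    return lift, (lift - 1.0) * 100.0


overall = 1301 / 4940                      # 26.34 %, the 'Avg response rate' in the table header
lift, gain = lift_and_gain(1.0, overall)   # group 1: every row in the top 1 % churned
print(round(lift, 6), round(gain, 6))      # 3.797079 279.707917
```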






In [25]:
print(rf_performance.auc())
print(rf_performance.aucpr())
print(rf_performance.confusion_matrix())

0.9985989596435744
0.993732244214413

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.47532213866710665: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 3612 | 27 | 0.0074 | (27/3639) |
| Yes | 51 | 1250 | 0.0392 | (51/1301) |
| Total | 3663 | 1277 | 0.0158 | (78/4940) |





In [19]:
# performance on test set
rf_performance = rf.model_performance(test_df)
rf_performance


ModelMetricsBinomial: drf
** Reported on test data. **

MSE: 0.14908868060776614
RMSE: 0.38612003393733163
LogLoss: 0.4597128851706852
Mean Per-Class Error: 0.240963687927487
AUC: 0.8256733210995569
AUCPR: 0.6278695608597819
Gini: 0.6513466421991139

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3105555552244187: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 575 | 159 | 0.2166 | (159/734) |
| Yes | 78 | 216 | 0.2653 | (78/294) |
| Total | 653 | 375 | 0.2305 | (237/1028) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.310556 | 0.64574 | 215 |
| max f2 | 0.194162 | 0.742164 | 272 |
| max f0point5 | 0.5344 | 0.628492 | 122 |
| max accuracy | 0.5344 | 0.786965 | 122 |
| max precision | 0.983571 | 1.0 | 0 |
| max recall | 0.001693 | 1.0 | 395 |
| max specificity | 0.983571 | 1.0 | 0 |
| max absolute_mcc | 0.310556 | 0.48634 | 215 |
| max min_per_class_accuracy | 0.293333 | 0.758503 | 225 |
| max mean_per_class_accuracy | 0.310556 | 0.759036 | 215 |



Gains/Lift Table: Avg response rate: 28.60 %, avg score: 27.55 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0107 | 0.938863 | 2.542981 | 2.542981 | 0.727273 | 0.959163 | 0.727273 | 0.959163 | 0.027211 | 0.027211 | 154.298083 | 154.298083 |
| 2 | 0.020428 | 0.896517 | 1.748299 | 2.164561 | 0.5 | 0.911132 | 0.619048 | 0.936291 | 0.017007 | 0.044218 | 74.829932 | 116.456106 |
| 3 | 0.030156 | 0.86631 | 3.146939 | 2.481457 | 0.9 | 0.884165 | 0.709677 | 0.919476 | 0.030612 | 0.07483 | 214.693878 | 148.14571 |
| 4 | 0.040856 | 0.850742 | 2.860853 | 2.580823 | 0.818182 | 0.859198 | 0.738095 | 0.903689 | 0.030612 | 0.105442 | 186.085343 | 158.082281 |
| 5 | 0.050584 | 0.808473 | 3.146939 | 2.689691 | 0.9 | 0.82835 | 0.769231 | 0.889201 | 0.030612 | 0.136054 | 214.693878 | 168.969126 |
| 6 | 0.100195 | 0.682484 | 2.605309 | 2.64791 | 0.745098 | 0.743511 | 0.757282 | 0.817063 | 0.129252 | 0.265306 | 160.530879 | 164.790965 |
| 7 | 0.150778 | 0.596661 | 2.084511 | 2.458898 | 0.596154 | 0.639645 | 0.703226 | 0.757542 | 0.105442 | 0.370748 | 108.451073 | 145.88984 |
| 8 | 0.200389 | 0.51711 | 2.056823 | 2.359355 | 0.588235 | 0.557034 | 0.674757 | 0.707902 | 0.102041 | 0.472789 | 105.682273 | 135.935539 |
| 9 | 0.300584 | 0.381127 | 1.595535 | 2.104749 | 0.456311 | 0.448786 | 0.601942 | 0.62153 | 0.159864 | 0.632653 | 59.55353 | 110.47487 |
| 10 | 0.399805 | 0.281475 | 1.302654 | 1.905689 | 0.372549 | 0.328415 | 0.545012 | 0.548786 | 0.129252 | 0.761905 | 30.26544 | 90.56888 |







In [23]:
print(rf_performance.auc())
print(rf_performance.aucpr())
print(rf_performance.confusion_matrix())

0.8256733210995569
0.6278695608597819

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3105555552244187: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 575 | 159 | 0.2166 | (159/734) |
| Yes | 78 | 216 | 0.2653 | (78/294) |
| Total | 653 | 375 | 0.2305 | (237/1028) |

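As a cross-check, the headline metrics can be recomputed by hand from the confusion matrix, treating Yes (churn) as the positive class. At the max-F1 threshold this reproduces the F1 of 0.64574 in the maximum-metrics table and the 0.2305 total error (accuracy about 0.7695). A plain-Python sketch:

```python
def binary_metrics(tp, fp, fn, tn, beta=1.0):
    """Precision, recall, accuracy and F-beta from confusion-matrix counts ('Yes' = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, accuracy, f_beta


# Counts from the random-forest test-set confusion matrix above (threshold = 0.3106)
p, r, acc, f1 = binary_metrics(tp=216, fp=159, fn=78, tn=575)
print(f"precision={p:.4f} recall={r:.4f} accuracy={acc:.4f} F1={f1:.5f}")
# precision=0.5760 recall=0.7347 accuracy=0.7695 F1=0.64574

# the same counts with beta=2 weight recall more heavily (F2 at this threshold)
_, _, _, f2 = binary_metrics(tp=216, fp=159, fn=78, tn=575, beta=2.0)
print(f"F2={f2:.4f}")
```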




### Random forest result summary
- AUC drops from 0.999 on the training set to 0.826 on the test set, and AUCPR (area under the precision-recall curve, a good metric for imbalanced data) drops from 0.994 to 0.628. A gap this large indicates overfitting.
- n-fold cross-validation gives a more reliable estimate of out-of-sample performance than scores computed on the training set.
- Grid search over hyper-parameters, scored with cross-validation, can then pick settings that generalize better.
- Cross-validation and grid search are demonstrated with the gradient boosting model below.

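The idea behind n-fold cross-validation, as a plain-Python sketch: each row is randomly assigned to one of k folds, and each fold is held out once while the model is trained on the others. This is illustrative only and not H2O's implementation (fold_assignment controls how H2O does it).

```python
import random


def kfold_indices(n_rows, k=5, seed=1):
    """Randomly assign each row to one of k folds; yield (train_idx, holdout_idx) per fold."""
    rng = random.Random(seed)
    fold_of = [rng.randrange(k) for _ in range(n_rows)]
    for f in range(k):
        holdout = [i for i in range(n_rows) if fold_of[i] == f]
        train = [i for i in range(n_rows) if fold_of[i] != f]
        yield train, holdout


folds = list(kfold_indices(4940, k=5))
print([len(h) for _, h in folds])  # roughly 988 rows held out per fold
```

Each row is scored exactly once, by the model that did not see it during training; these holdout predictions are what the cross-validation metrics are based on.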
### Gradient Boosting Model

In [29]:
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

##### Train GBM
- We use the GBM model provided by the H2O framework (H2OGradientBoostingEstimator) to predict the target.
- The predictor columns, the target column and the training frame are passed as parameters of the train() function.
- We start with the default GBM model, and add cross-validation and a hyper-parameter grid search later.

In [31]:
# initialize an H2O GBM with the default hyper-parameters
gbm = H2OGradientBoostingEstimator()
# train the model on the training frame
gbm.train(x=x, y=y, training_frame=train_df)

gbm Model Build progress: |███████████████████████████████████████████████| 100%


In [32]:
# gbm model information
gbm.summary()


Model Summary: 


| number_of_trees | number_of_internal_trees | model_size_in_bytes | min_depth | max_depth | mean_depth | min_leaves | max_leaves | mean_leaves |
|---|---|---|---|---|---|---|---|---|
| 50 | 50 | 20207 | 5 | 5 | 5 | 18 | 32 | 27.46 |




##### Prediction

In [33]:
gbm.predict(test_df)

gbm prediction progress: |████████████████████████████████████████████████| 100%


predict,No,Yes
No,0.661758,0.338242
Yes,0.212834,0.787166
Yes,0.606558,0.393442
No,0.861522,0.138478
No,0.823234,0.176766
No,0.793385,0.206615
Yes,0.298959,0.701041
No,0.878658,0.121342
Yes,0.362589,0.637411
Yes,0.612607,0.387393




##### Evaluation on training set and test set

In [36]:
gbm.model_performance(train_df)


ModelMetricsBinomial: gbm
** Reported on test data. **

MSE: 0.10792629896414406
RMSE: 0.32852138281114074
LogLoss: 0.3418011092079633
Mean Per-Class Error: 0.17119422584652266
AUC: 0.9067797003974578
AUCPR: 0.7775125725104265
Gini: 0.8135594007949156

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.33958781119919873: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 3056 | 583 | 0.1602 | (583/3639) |
| Yes | 240 | 1061 | 0.1845 | (240/1301) |
| Total | 3296 | 1644 | 0.1666 | (823/4940) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.339588 | 0.720543 | 215 |
| max f2 | 0.200625 | 0.807145 | 279 |
| max f0point5 | 0.537324 | 0.722433 | 133 |
| max accuracy | 0.473069 | 0.847773 | 158 |
| max precision | 0.922804 | 1.0 | 0 |
| max recall | 0.014881 | 1.0 | 395 |
| max specificity | 0.922804 | 1.0 | 0 |
| max absolute_mcc | 0.339588 | 0.612544 | 215 |
| max min_per_class_accuracy | 0.320962 | 0.82825 | 223 |
| max mean_per_class_accuracy | 0.320962 | 0.828806 | 223 |



Gains/Lift Table: Avg response rate: 26.34 %, avg score: 26.35 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010121 | 0.880687 | 3.721138 | 3.721138 | 0.98 | 0.894501 | 0.98 | 0.894501 | 0.037663 | 0.037663 | 272.113759 | 272.113759 |
| 2 | 0.02004 | 0.859507 | 3.564605 | 3.643662 | 0.938776 | 0.868631 | 0.959596 | 0.881697 | 0.035357 | 0.073021 | 256.460493 | 264.366183 |
| 3 | 0.030162 | 0.840592 | 3.417371 | 3.567725 | 0.9 | 0.85075 | 0.939597 | 0.871312 | 0.034589 | 0.10761 | 241.737125 | 256.772539 |
| 4 | 0.040081 | 0.816878 | 3.642096 | 3.58613 | 0.959184 | 0.828573 | 0.944444 | 0.860735 | 0.036126 | 0.143736 | 264.209635 | 258.613033 |
| 5 | 0.05 | 0.790573 | 3.177148 | 3.504996 | 0.836735 | 0.807556 | 0.923077 | 0.850185 | 0.031514 | 0.17525 | 217.714788 | 250.499616 |
| 6 | 0.1 | 0.677635 | 3.166795 | 3.335895 | 0.834008 | 0.735674 | 0.878543 | 0.79293 | 0.15834 | 0.33359 | 216.679477 | 233.589547 |
| 7 | 0.15 | 0.58932 | 2.628747 | 3.100179 | 0.692308 | 0.63111 | 0.816464 | 0.73899 | 0.131437 | 0.465027 | 162.874712 | 210.017935 |
| 8 | 0.2 | 0.518804 | 2.290546 | 2.897771 | 0.603239 | 0.553774 | 0.763158 | 0.692686 | 0.114527 | 0.579554 | 129.054573 | 189.777095 |
| 9 | 0.3 | 0.376719 | 1.844735 | 2.546759 | 0.48583 | 0.443292 | 0.670715 | 0.609554 | 0.184473 | 0.764028 | 84.473482 | 154.67589 |
| 10 | 0.4 | 0.25583 | 1.137586 | 2.194466 | 0.299595 | 0.313847 | 0.577935 | 0.535628 | 0.113759 | 0.877786 | 13.758647 | 119.44658 |







In [34]:
gbm.model_performance(test_df)


ModelMetricsBinomial: gbm
** Reported on test data. **

MSE: 0.1415976375832351
RMSE: 0.3762946154055823
LogLoss: 0.4360557825489725
Mean Per-Class Error: 0.2287808856512632
AUC: 0.8418923427681699
AUCPR: 0.6631084194181939
Gini: 0.6837846855363399

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.24369977807040588: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 541 | 193 | 0.2629 | (193/734) |
| Yes | 59 | 235 | 0.2007 | (59/294) |
| Total | 600 | 428 | 0.2451 | (252/1028) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.2437 | 0.65097 | 243 |
| max f2 | 0.147856 | 0.768116 | 294 |
| max f0point5 | 0.45984 | 0.651326 | 150 |
| max accuracy | 0.45984 | 0.798638 | 150 |
| max precision | 0.859491 | 0.947368 | 12 |
| max recall | 0.014433 | 1.0 | 395 |
| max specificity | 0.901962 | 0.998638 | 0 |
| max absolute_mcc | 0.349543 | 0.50199 | 193 |
| max min_per_class_accuracy | 0.272048 | 0.76158 | 229 |
| max mean_per_class_accuracy | 0.202609 | 0.771219 | 263 |



Gains/Lift Table: Avg response rate: 28.60 %, avg score: 26.73 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.011673 | 0.867665 | 3.205215 | 3.205215 | 0.916667 | 0.879297 | 0.916667 | 0.879297 | 0.037415 | 0.037415 | 220.521542 | 220.521542 |
| 2 | 0.020428 | 0.848054 | 3.108088 | 3.163589 | 0.888889 | 0.861209 | 0.904762 | 0.871545 | 0.027211 | 0.064626 | 210.808768 | 216.358925 |
| 3 | 0.030156 | 0.822345 | 2.447619 | 2.932631 | 0.7 | 0.839647 | 0.83871 | 0.861255 | 0.02381 | 0.088435 | 144.761905 | 193.263112 |
| 4 | 0.040856 | 0.808379 | 2.225108 | 2.747328 | 0.636364 | 0.815824 | 0.785714 | 0.849357 | 0.02381 | 0.112245 | 122.510823 | 174.73275 |
| 5 | 0.050584 | 0.776384 | 2.797279 | 2.756934 | 0.8 | 0.791593 | 0.788462 | 0.838248 | 0.027211 | 0.139456 | 179.727891 | 175.693354 |
| 6 | 0.100195 | 0.693 | 2.67387 | 2.715805 | 0.764706 | 0.73247 | 0.776699 | 0.785872 | 0.132653 | 0.272109 | 167.386955 | 171.580477 |
| 7 | 0.150778 | 0.602396 | 2.420722 | 2.616809 | 0.692308 | 0.645246 | 0.748387 | 0.738694 | 0.122449 | 0.394558 | 142.072214 | 161.68093 |
| 8 | 0.200389 | 0.52783 | 1.919701 | 2.444224 | 0.54902 | 0.568095 | 0.699029 | 0.696458 | 0.095238 | 0.489796 | 91.970121 | 144.422429 |
| 9 | 0.300584 | 0.367288 | 1.66343 | 2.18396 | 0.475728 | 0.444588 | 0.624595 | 0.612502 | 0.166667 | 0.656463 | 66.343042 | 118.395967 |
| 10 | 0.399805 | 0.253728 | 1.234094 | 1.948226 | 0.352941 | 0.312002 | 0.557178 | 0.537925 | 0.122449 | 0.778912 | 23.409364 | 94.822649 |







#### The GBM is a modest improvement over the random forest: on the test set, AUC rises from 0.826 to 0.842 and AUCPR from 0.628 to 0.663, and the drop from training to test scores is much smaller. Can we do better? Let's try cross-validation and a hyper-parameter grid search.

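The smaller drop can be made concrete by subtracting test scores from training scores. A plain-Python sketch using the values reported above, rounded to four decimals:

```python
# AUC / AUCPR reported above, as (training frame, test frame)
reported = {
    "rf":  {"auc": (0.9986, 0.8257), "aucpr": (0.9937, 0.6279)},
    "gbm": {"auc": (0.9068, 0.8419), "aucpr": (0.7775, 0.6631)},
}

# train-minus-test gap per model and metric; a larger gap suggests more overfitting
gaps = {
    model: {metric: round(train - test, 4) for metric, (train, test) in scores.items()}
    for model, scores in reported.items()
}
for model, gap in gaps.items():
    print(model, gap)
# rf {'auc': 0.1729, 'aucpr': 0.3658}
# gbm {'auc': 0.0649, 'aucpr': 0.1144}
```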
### GBM with cross-validated grid search


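##### Training and prediction
The parameter settings worth mentioning are:
- "nfolds=" in "H2OGradientBoostingEstimator" (number of cross-validation folds)
- "hyper_params=" in "H2OGridSearch" (the hyper-parameter values to search over)
- "search_criteria=" in "H2OGridSearch" (the search strategy, here Cartesian)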

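With the Cartesian strategy every combination of hyper-parameter values is trained, so the grid in the next cell produces 2 × 2 × 2 = 8 models. With nfolds=5, H2O also fits 5 cross-validation models per grid point in addition to the final model trained on all rows. The arithmetic, sketched in plain Python (hyper_params mirrors the grid in the next cell):

```python
from itertools import product

hyper_params = {
    'learn_rate': [0.01, 0.02],
    'max_depth': [4, 8],
    'ntrees': [50, 250]}

# Cartesian search: one model per combination of the listed values
combos = [dict(zip(hyper_params, values)) for values in product(*hyper_params.values())]
nfolds = 5
fits = len(combos) * (nfolds + 1)  # 5 cross-validation models + 1 final model per combination
print(len(combos), fits)           # 8 48
```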
In [37]:
grid_search_gbm = H2OGradientBoostingEstimator(
    nfolds=5,                                    # 5-fold cross-validation for each grid model
    keep_cross_validation_fold_assignment=True,
    stopping_rounds=25,                          # early stopping based on AUC
    stopping_metric="AUC",
    col_sample_rate=0.65,                        # fraction of columns sampled per split
    sample_rate=0.65,                            # fraction of rows sampled per tree
    seed=1
)

hyper_params = {
    'learn_rate': [0.01, 0.02],
    'max_depth': [4, 8],
    'ntrees': [50, 250]}

# Caution: if grid_id matches an existing grid, H2O may add the new models to that grid
# instead of creating a new one, so use a fresh grid_id when re-running with other settings.
grid = H2OGridSearch(model=grid_search_gbm, hyper_params=hyper_params,
                     grid_id='grid_depth',
                     search_criteria={'strategy': "Cartesian"})
# train the grid search
grid.train(x=x,
           y=y,
           training_frame=train_df,
           validation_frame=test_df)

gbm Grid Build progress: |████████████████████████████████████████████████| 100%


In [38]:
# grid.predict() scores test_df with every model in the grid and returns a dict keyed by model id
y_pred = grid.predict(test_data=test_df)
y_pred

gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%
gbm prediction progress: |████████████████████████████████████████████████| 100%


predict,No,Yes
Yes,0.630534,0.369466
Yes,0.323741,0.676259
Yes,0.56587,0.43413
No,0.801564,0.198436
No,0.809507,0.190493
Yes,0.670385,0.329615
Yes,0.404066,0.595934
No,0.855866,0.144134
Yes,0.492185,0.507815
Yes,0.55291,0.44709


predict,No,Yes
Yes,0.652397,0.347603
Yes,0.363855,0.636145
Yes,0.635737,0.364263
No,0.828525,0.171475
No,0.829359,0.170641
No,0.773996,0.226004
Yes,0.419765,0.580235
No,0.827561,0.172439
Yes,0.4583,0.5417
Yes,0.620064,0.379936


predict,No,Yes
Yes,0.642713,0.357287
Yes,0.410368,0.589632
Yes,0.641558,0.358442
No,0.820258,0.179742
No,0.810654,0.189346
No,0.757197,0.242803
Yes,0.446579,0.553421
No,0.817283,0.182717
Yes,0.49783,0.50217
Yes,0.6482,0.3518


predict,No,Yes
Yes,0.660722,0.339278
Yes,0.421206,0.578794
Yes,0.6248,0.3752
No,0.781449,0.218551
No,0.789353,0.210647
Yes,0.682619,0.317381
Yes,0.484871,0.515129
No,0.820542,0.179458
Yes,0.583738,0.416262
Yes,0.597632,0.402368


predict,No,Yes
Yes,0.676723,0.323277
Yes,0.485833,0.514167
Yes,0.6516,0.3484
No,0.783486,0.216514
No,0.803066,0.196934
No,0.753448,0.246552
Yes,0.523638,0.476362
No,0.811103,0.188897
Yes,0.542029,0.457971
Yes,0.667347,0.332653


predict,No,Yes
Yes,0.679657,0.320343
Yes,0.475131,0.524869
Yes,0.642698,0.357302
No,0.767323,0.232677
No,0.782961,0.217039
Yes,0.668672,0.331328
Yes,0.523762,0.476238
No,0.803229,0.196771
Yes,0.614551,0.385449
Yes,0.644674,0.355326


predict,No,Yes
Yes,0.68209,0.31791
Yes,0.535061,0.464939
Yes,0.670279,0.329721
No,0.765256,0.234744
No,0.781764,0.218236
Yes,0.745108,0.254892
Yes,0.57477,0.42523
No,0.791234,0.208766
Yes,0.581874,0.418126
Yes,0.677177,0.322823


predict,No,Yes
Yes,0.690357,0.309643
Yes,0.542895,0.457105
Yes,0.6677,0.3323
No,0.757913,0.242087
No,0.768492,0.231508
Yes,0.686109,0.313891
Yes,0.581225,0.418775
No,0.785259,0.214741
Yes,0.647945,0.352055
Yes,0.6664,0.3336


{'grid_depth_model_6': ,
 'grid_depth_model_8': ,
 'grid_depth_model_4': ,
 'grid_depth_model_2': ,
 'grid_depth_model_7': ,
 'grid_depth_model_5': ,
 'grid_depth_model_3': ,
 'grid_depth_model_1': }

##### Select the "best" model
- As expected, the grid search generates one model per hyper-parameter combination: 2 × 2 × 2 = 8 models in this demo, each trained with 5-fold cross-validation.
- We can sort the models by a chosen metric.
- Here we use 'auc' as the criterion for choosing the "best" model.
- Note: according to the H2O documentation for get_grid(), when nfolds > 1 the models are sorted by their cross-validation metrics, even if a validation frame is provided.
- The selected model has max_depth 4 and 93 trees. Since 93 exceeds the 50-tree option, it comes from the ntrees=250 setting, cut short by early stopping (stopping_rounds=25).

In [39]:
grid_sorted = grid.get_grid(sort_by='auc',decreasing=True)
best_gbm = grid_sorted.models[0]
best_gbm

Model Details
H2OGradientBoostingEstimator :  Gradient Boosting Machine
Model Key:  grid_depth_model_6


Model Summary: 


| number_of_trees | number_of_internal_trees | model_size_in_bytes | min_depth | max_depth | mean_depth | min_leaves | max_leaves | mean_leaves |
|---|---|---|---|---|---|---|---|---|
| 93 | 93 | 23906 | 4 | 4 | 4 | 15 | 16 | 15.860215 |




ModelMetricsBinomial: gbm
** Reported on train data. **

MSE: 0.12936928723651864
RMSE: 0.35967942287058713
LogLoss: 0.4057941496350746
Mean Per-Class Error: 0.21089491056724075
AUC: 0.8685273699242915
AUCPR: 0.7018662908739439
Gini: 0.7370547398485829

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3286812161917472: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 2989 | 650 | 0.1786 | (650/3639) |
| Yes | 342 | 959 | 0.2629 | (342/1301) |
| Total | 3331 | 1609 | 0.2008 | (992/4940) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.328681 | 0.659107 | 205 |
| max f2 | 0.183352 | 0.768689 | 296 |
| max f0point5 | 0.48868 | 0.662613 | 122 |
| max accuracy | 0.447636 | 0.818623 | 144 |
| max precision | 0.778001 | 1.0 | 0 |
| max recall | 0.056283 | 1.0 | 398 |
| max specificity | 0.778001 | 1.0 | 0 |
| max absolute_mcc | 0.353108 | 0.525707 | 194 |
| max min_per_class_accuracy | 0.292733 | 0.783732 | 226 |
| max mean_per_class_accuracy | 0.278142 | 0.789105 | 235 |



Gains/Lift Table: Avg response rate: 26.34 %, avg score: 26.30 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010121 | 0.732607 | 3.645196 | 3.645196 | 0.96 | 0.751215 | 0.96 | 0.751215 | 0.036895 | 0.036895 | 264.5196 | 264.5196 |
| 2 | 0.02004 | 0.711713 | 3.254639 | 3.45189 | 0.857143 | 0.723743 | 0.909091 | 0.737618 | 0.032283 | 0.069178 | 225.463929 | 245.189015 |
| 3 | 0.030162 | 0.692228 | 3.34143 | 3.414823 | 0.88 | 0.701149 | 0.899329 | 0.72538 | 0.03382 | 0.102998 | 234.142967 | 241.482288 |
| 4 | 0.040486 | 0.676076 | 3.424817 | 3.417371 | 0.901961 | 0.684574 | 0.9 | 0.714974 | 0.035357 | 0.138355 | 242.481651 | 241.737125 |
| 5 | 0.05 | 0.655844 | 3.312346 | 3.397387 | 0.87234 | 0.66437 | 0.894737 | 0.705345 | 0.031514 | 0.169869 | 231.234566 | 239.738663 |
| 6 | 0.1 | 0.573165 | 2.613374 | 3.00538 | 0.688259 | 0.61621 | 0.791498 | 0.660778 | 0.130669 | 0.300538 | 161.337433 | 200.538048 |
| 7 | 0.150202 | 0.510723 | 2.465039 | 2.824781 | 0.649194 | 0.537898 | 0.743935 | 0.619707 | 0.123751 | 0.424289 | 146.50393 | 182.478127 |
| 8 | 0.2 | 0.461301 | 2.145504 | 2.65565 | 0.565041 | 0.486245 | 0.699393 | 0.586477 | 0.106841 | 0.53113 | 114.550408 | 165.56495 |
| 9 | 0.3 | 0.361432 | 1.70638 | 2.339226 | 0.449393 | 0.412026 | 0.616059 | 0.528327 | 0.170638 | 0.701768 | 70.637971 | 133.922624 |
| 10 | 0.4 | 0.268699 | 1.21445 | 2.058032 | 0.319838 | 0.308134 | 0.542004 | 0.473278 | 0.121445 | 0.823213 | 21.445042 | 105.803228 |




ModelMetricsBinomial: gbm
** Reported on validation data. **

MSE: 0.1443067351040791
RMSE: 0.3798772632102099
LogLoss: 0.4430866826526678
Mean Per-Class Error: 0.22190865446996244
AUC: 0.8440077665943763
AUCPR: 0.6585186508732533
Gini: 0.6880155331887525

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.29997889409404527: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 583 | 151 | 0.2057 | (151/734) |
| Yes | 70 | 224 | 0.2381 | (70/294) |
| Total | 653 | 375 | 0.215 | (221/1028) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.299979 | 0.669656 | 204 |
| max f2 | 0.171175 | 0.763771 | 296 |
| max f0point5 | 0.404115 | 0.651079 | 153 |
| max accuracy | 0.404115 | 0.799611 | 153 |
| max precision | 0.769998 | 1.0 | 0 |
| max recall | 0.057649 | 1.0 | 395 |
| max specificity | 0.769998 | 1.0 | 0 |
| max absolute_mcc | 0.299979 | 0.522116 | 204 |
| max min_per_class_accuracy | 0.292173 | 0.772109 | 210 |
| max mean_per_class_accuracy | 0.299979 | 0.778091 | 204 |



Gains/Lift Table: Avg response rate: 28.60 %, avg score: 26.76 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0107 | 0.725599 | 3.178726 | 3.178726 | 0.909091 | 0.738688 | 0.909091 | 0.738688 | 0.034014 | 0.034014 | 217.872604 | 217.872604 |
| 2 | 0.020428 | 0.699051 | 2.797279 | 2.997085 | 0.8 | 0.708696 | 0.857143 | 0.724406 | 0.027211 | 0.061224 | 179.727891 | 199.708455 |
| 3 | 0.030156 | 0.688819 | 2.797279 | 2.932631 | 0.8 | 0.693133 | 0.83871 | 0.714318 | 0.027211 | 0.088435 | 179.727891 | 193.263112 |
| 4 | 0.040856 | 0.670557 | 2.542981 | 2.83058 | 0.727273 | 0.680301 | 0.809524 | 0.705409 | 0.027211 | 0.115646 | 154.298083 | 183.057985 |
| 5 | 0.050584 | 0.650966 | 2.447619 | 2.756934 | 0.7 | 0.660167 | 0.788462 | 0.696709 | 0.02381 | 0.139456 | 144.761905 | 175.693354 |
| 6 | 0.100195 | 0.583702 | 2.67387 | 2.715805 | 0.764706 | 0.61542 | 0.776699 | 0.656459 | 0.132653 | 0.272109 | 167.386955 | 171.580477 |
| 7 | 0.150778 | 0.517761 | 2.084511 | 2.504016 | 0.596154 | 0.54808 | 0.716129 | 0.620099 | 0.105442 | 0.377551 | 108.451073 | 150.40158 |
| 8 | 0.200389 | 0.46614 | 2.056823 | 2.393303 | 0.588235 | 0.492865 | 0.684466 | 0.5886 | 0.102041 | 0.479592 | 105.682273 | 139.330295 |
| 9 | 0.300584 | 0.36945 | 1.697378 | 2.161328 | 0.485437 | 0.419485 | 0.618123 | 0.532228 | 0.170068 | 0.64966 | 69.737798 | 116.132796 |
| 10 | 0.399805 | 0.27515 | 1.439776 | 1.982257 | 0.411765 | 0.314627 | 0.56691 | 0.478225 | 0.142857 | 0.792517 | 43.977591 | 98.225665 |




ModelMetricsBinomial: gbm
** Reported on cross-validation data. **

MSE: 0.13663067781586877
RMSE: 0.36963587192785924
LogLoss: 0.4250213809173442
Mean Per-Class Error: 0.2323572519838566
AUC: 0.8451371986670156
AUCPR: 0.6570573000059055
Gini: 0.6902743973340313

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3116351076994076: 


| Act/Pred | No | Yes | Error | Rate |
|---|---|---|---|---|
| No | 2876 | 763 | 0.2097 | (763/3639) |
| Yes | 338 | 963 | 0.2598 | (338/1301) |
| Total | 3214 | 1726 | 0.2229 | (1101/4940) |



Maximum Metrics: Maximum metrics at their respective thresholds


| metric | threshold | value | idx |
|---|---|---|---|
| max f1 | 0.311635 | 0.636274 | 209 |
| max f2 | 0.176048 | 0.74642 | 289 |
| max f0point5 | 0.45958 | 0.634104 | 133 |
| max accuracy | 0.45958 | 0.806275 | 133 |
| max precision | 0.809756 | 1.0 | 0 |
| max recall | 0.036269 | 1.0 | 399 |
| max specificity | 0.809756 | 1.0 | 0 |
| max absolute_mcc | 0.359623 | 0.493489 | 185 |
| max min_per_class_accuracy | 0.29095 | 0.763259 | 220 |
| max mean_per_class_accuracy | 0.256687 | 0.767643 | 239 |



Gains/Lift Table: Avg response rate: 26.34 %, avg score: 26.30 %


| group | cumulative_data_fraction | lower_threshold | lift | cumulative_lift | response_rate | score | cumulative_response_rate | cumulative_score | capture_rate | cumulative_capture_rate | gain | cumulative_gain |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.010121 | 0.743731 | 3.34143 | 3.34143 | 0.88 | 0.769226 | 0.88 | 0.769226 | 0.03382 | 0.03382 | 234.142967 | 234.142967 |
| 2 | 0.020445 | 0.713486 | 3.350364 | 3.345941 | 0.882353 | 0.725987 | 0.881188 | 0.747392 | 0.034589 | 0.068409 | 235.036397 | 234.594105 |
| 3 | 0.030162 | 0.690779 | 3.322444 | 3.338372 | 0.875 | 0.701393 | 0.879195 | 0.732574 | 0.032283 | 0.100692 | 232.244427 | 233.837162 |
| 4 | 0.040081 | 0.669761 | 3.254639 | 3.31765 | 0.857143 | 0.679152 | 0.873737 | 0.719353 | 0.032283 | 0.132975 | 225.463929 | 231.764998 |
| 5 | 0.05 | 0.652407 | 3.254639 | 3.30515 | 0.857143 | 0.660002 | 0.870445 | 0.707579 | 0.032283 | 0.165257 | 225.463929 | 230.514988 |
| 6 | 0.1 | 0.576239 | 2.428901 | 2.867025 | 0.639676 | 0.614667 | 0.755061 | 0.661123 | 0.121445 | 0.286703 | 142.890085 | 186.702537 |
| 7 | 0.15 | 0.499489 | 2.2598 | 2.664617 | 0.595142 | 0.534855 | 0.701754 | 0.619034 | 0.11299 | 0.399693 | 125.980015 | 166.461696 |
| 8 | 0.2 | 0.454467 | 2.213682 | 2.551883 | 0.582996 | 0.475955 | 0.672065 | 0.583264 | 0.110684 | 0.510377 | 121.368178 | 155.188317 |
| 9 | 0.3 | 0.358512 | 1.683321 | 2.262362 | 0.44332 | 0.406523 | 0.595816 | 0.52435 | 0.168332 | 0.678709 | 68.332052 | 126.236229 |
| 10 | 0.4 | 0.273054 | 1.1299 | 1.979247 | 0.297571 | 0.310915 | 0.521255 | 0.470991 | 0.11299 | 0.791699 | 12.990008 | 97.924673 |




Cross-Validation Metrics Summary: 


,,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
0,accuracy,0.7723251,0.04325697,0.7918288,0.76138616,0.7074414,0.7761044,0.82486486
1,auc,0.84755987,0.026824925,0.86246127,0.85517263,0.8026424,0.8459208,0.8716023
2,aucpr,0.6609391,0.030466506,0.68222964,0.6704987,0.60706556,0.6718318,0.6730697
3,err,0.22767489,0.04325697,0.2081712,0.23861386,0.2925586,0.22389558,0.17513514
4,err_count,225.4,45.25815,214.0,241.0,287.0,223.0,162.0
5,f0point5,0.58638626,0.044929203,0.60385066,0.57180154,0.5187032,0.5978584,0.6397174
6,f1,0.6414569,0.029619418,0.65923566,0.64506626,0.59174967,0.6432,0.66803277
7,f2,0.70987344,0.021861475,0.7258065,0.7398649,0.68874174,0.69598335,0.69897085
8,lift_top_group,3.2938256,0.34199086,3.5133288,2.7511065,3.64684,3.2835164,3.2743363
9,logloss,0.42480248,0.027237814,0.42244083,0.41080758,0.469063,0.42530462,0.39639628



See the whole table with table.as_data_frame()
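The `mean` and `sd` columns of the cross-validation summary aggregate the five per-fold values in each row. The sketch below reproduces the AUC row from the fold values copied out of that table. It assumes `sd` is the sample standard deviation (n - 1 denominator), which is consistent with the reported 0.0268.

```python
import statistics

# Per-fold AUC values copied from the "auc" row of the
# cross-validation metrics summary (cv_1_valid ... cv_5_valid).
fold_auc = [0.86246127, 0.85517263, 0.8026424, 0.8459208, 0.8716023]

# Average over folds, corresponding to the "mean" column.
mean_auc = statistics.mean(fold_auc)
# Sample standard deviation (n - 1 denominator), assumed to match the "sd" column.
sd_auc = statistics.stdev(fold_auc)

print("mean AUC: {:.6f}, sd: {:.6f}".format(mean_auc, sd_auc))
# The spread comes mostly from fold 3, the weakest fold.
print("weakest fold:", fold_auc.index(min(fold_auc)) + 1)
```

A fold that sits far from the mean, like fold 3 here, is often worth inspecting before trusting a single train/test split.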

Scoring History: 


,,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error,validation_rmse,validation_logloss,validation_auc,validation_pr_auc,validation_lift,validation_classification_error
0,,2020-02-27 09:11:09,17.406 sec,0.0,0.440456,0.576542,0.5,0.0,1.0,0.73664,0.452452,0.599821,0.5,0.0,1.0,0.714008
1,,2020-02-27 09:11:09,17.412 sec,1.0,0.437835,0.570633,0.838648,0.357904,2.760435,0.229555,0.449872,0.593849,0.828201,0.381618,2.445174,0.259728
2,,2020-02-27 09:11:09,17.415 sec,2.0,0.435273,0.564942,0.843474,0.532402,3.184647,0.232186,0.447426,0.588266,0.829341,0.535725,2.678246,0.227626
3,,2020-02-27 09:11:09,17.419 sec,3.0,0.432826,0.559569,0.848372,0.537447,3.184647,0.225709,0.445125,0.583075,0.82961,0.535781,2.678246,0.231518
4,,2020-02-27 09:11:09,17.423 sec,4.0,0.430473,0.554462,0.85216,0.567776,3.274128,0.218623,0.443007,0.578349,0.829615,0.553184,2.646075,0.244163
5,,2020-02-27 09:11:09,17.428 sec,5.0,0.428212,0.549614,0.853636,0.570362,3.274128,0.217004,0.440815,0.573508,0.83179,0.554396,2.646075,0.251946
6,,2020-02-27 09:11:09,17.433 sec,6.0,0.425934,0.54476,0.853878,0.634854,3.458054,0.222065,0.43862,0.568693,0.833187,0.614582,2.646075,0.243191
7,,2020-02-27 09:11:09,17.438 sec,7.0,0.423747,0.540149,0.854421,0.635443,3.458054,0.208704,0.436511,0.564127,0.835124,0.620243,2.646075,0.222763
8,,2020-02-27 09:11:09,17.446 sec,8.0,0.421682,0.535808,0.854635,0.642294,3.458054,0.208704,0.434588,0.559959,0.835094,0.62846,2.680726,0.217899
9,,2020-02-27 09:11:09,17.453 sec,9.0,0.419724,0.53173,0.855391,0.643587,3.458054,0.20668,0.43277,0.556068,0.834244,0.627656,2.680726,0.216926



See the whole table with table.as_data_frame()

Variable Importances: 


,variable,relative_importance,scaled_importance,percentage
0,Contract,3019.020752,1.0,0.437553
1,tenure,783.966248,0.259676,0.113622
2,OnlineSecurity,655.65387,0.217174,0.095025
3,InternetService,545.52887,0.180697,0.079065
4,TotalCharges,464.130981,0.153736,0.067267
5,MonthlyCharges,456.012299,0.151046,0.066091
6,TechSupport,391.279266,0.129605,0.056709
7,PaymentMethod,235.543549,0.07802,0.034138
8,PaperlessBilling,108.492752,0.035936,0.015724
9,MultipleLines,67.674408,0.022416,0.009808
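In the variable importance table, `scaled_importance` is each variable's relative importance divided by the largest one (here `Contract`). `percentage` divides by the sum over all variables, so computing it requires the full table. The sketch below covers only the scaled column and uses the top five variables shown above.

```python
# Relative importances copied from the variable importance table
# (top five rows only; the full table lists more variables).
relative_importance = {
    "Contract": 3019.020752,
    "tenure": 783.966248,
    "OnlineSecurity": 655.65387,
    "InternetService": 545.52887,
    "TotalCharges": 464.130981,
}

# scaled_importance = relative importance / largest relative importance,
# so the most important variable always scores 1.0.
max_importance = max(relative_importance.values())
scaled_importance = {
    name: value / max_importance for name, value in relative_importance.items()
}

for name, scaled in scaled_importance.items():
    print("{:<16} {:.6f}".format(name, scaled))
```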




In [40]:
# get detailed info from cross-validation (models fit on folds of the training set)
xval_models = best_gbm.get_xval_models()  # list with one H2O model per fold
best_gbm.cross_validation_metrics_summary()  # per-fold metrics table

In [41]:
best_gbm.model_performance(train_df)


ModelMetricsBinomial: gbm
** Reported on test data. **

MSE: 0.12936928840434475
RMSE: 0.35967942449401347
LogLoss: 0.40579415312285455
Mean Per-Class Error: 0.21089491056724075
AUC: 0.8685273699242915
AUCPR: 0.7018662908739439
Gini: 0.7370547398485829

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3286812504133001: 


,,No,Yes,Error,Rate
0,No,2989.0,650.0,0.1786,(650.0/3639.0)
1,Yes,342.0,959.0,0.2629,(342.0/1301.0)
2,Total,3331.0,1609.0,0.2008,(992.0/4940.0)



Maximum Metrics: Maximum metrics at their respective thresholds


,metric,threshold,value,idx
0,max f1,0.328681,0.659107,205.0
1,max f2,0.183352,0.768689,296.0
2,max f0point5,0.48868,0.662613,122.0
3,max accuracy,0.447636,0.818623,144.0
4,max precision,0.778001,1.0,0.0
5,max recall,0.056283,1.0,398.0
6,max specificity,0.778001,1.0,0.0
7,max absolute_mcc,0.353109,0.525707,194.0
8,max min_per_class_accuracy,0.292733,0.783732,226.0
9,max mean_per_class_accuracy,0.278142,0.789105,235.0



Gains/Lift Table: Avg response rate: 26.34 %, avg score: 26.30 %


,,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
0,,1,0.010121,0.732607,3.645196,3.645196,0.96,0.751215,0.96,0.751215,0.036895,0.036895,264.5196,264.5196
1,,2,0.02004,0.711713,3.254639,3.45189,0.857143,0.723743,0.909091,0.737618,0.032283,0.069178,225.463929,245.189015
2,,3,0.030162,0.692228,3.34143,3.414823,0.88,0.701149,0.899329,0.72538,0.03382,0.102998,234.142967,241.482288
3,,4,0.040486,0.676076,3.424817,3.417371,0.901961,0.684574,0.9,0.714974,0.035357,0.138355,242.481651,241.737125
4,,5,0.05,0.655844,3.312346,3.397387,0.87234,0.66437,0.894737,0.705345,0.031514,0.169869,231.234566,239.738663
5,,6,0.1,0.573165,2.613374,3.00538,0.688259,0.61621,0.791498,0.660778,0.130669,0.300538,161.337433,200.538048
6,,7,0.150202,0.510723,2.465039,2.824781,0.649194,0.537898,0.743935,0.619707,0.123751,0.424289,146.50393,182.478127
7,,8,0.2,0.461301,2.145504,2.65565,0.565041,0.486245,0.699393,0.586477,0.106841,0.53113,114.550408,165.56495
8,,9,0.3,0.361432,1.70638,2.339226,0.449393,0.412026,0.616059,0.528327,0.170638,0.701768,70.637971,133.922624
9,,10,0.4,0.268699,1.21445,2.058032,0.319838,0.308134,0.542004,0.473278,0.121445,0.823213,21.445042,105.803228
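Each row of the gains/lift table compares a group of customers, ranked by predicted score, with the whole data set. Lift is the group's churn rate divided by the overall churn rate. Capture rate is the share of all churners that fall in the group. Gain is 100 * (lift - 1). The sketch below recomputes the first training-set group under two assumptions. First, the group holds 50 rows and 48 churners (0.010121 * 4940 rows at a 0.96 response rate). Second, the overall counts of 1301 churners out of 4940 rows come from the training confusion matrix above.

```python
# Assumed counts for the first (top-scored) group of the training gains/lift table.
group_rows, group_positives = 50, 48
# Overall training counts from the confusion matrix: 1301 "Yes" out of 4940 rows.
total_rows, total_positives = 4940, 1301

response_rate = group_positives / group_rows          # churn rate inside the group
avg_response_rate = total_positives / total_rows      # overall churn rate (about 26.34 %)
lift = response_rate / avg_response_rate              # how much richer in churners the group is
capture_rate = group_positives / total_positives      # share of all churners in this group
gain = 100 * (lift - 1)                               # lift expressed as % improvement

print("lift={:.6f} capture_rate={:.6f} gain={:.4f}".format(lift, capture_rate, gain))
```

The printed values line up with the `lift`, `capture_rate` and `gain` columns of group 1 (3.645196, 0.036895, 264.5196).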







In [43]:
best_gbm.model_performance(test_df)


ModelMetricsBinomial: gbm
** Reported on test data. **

MSE: 0.1443067351040791
RMSE: 0.3798772632102099
LogLoss: 0.4430866826526678
Mean Per-Class Error: 0.22190865446996244
AUC: 0.8440077665943763
AUCPR: 0.6585186508732533
Gini: 0.6880155331887525

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.29997889409404527: 


,,No,Yes,Error,Rate
0,No,583.0,151.0,0.2057,(151.0/734.0)
1,Yes,70.0,224.0,0.2381,(70.0/294.0)
2,Total,653.0,375.0,0.215,(221.0/1028.0)



Maximum Metrics: Maximum metrics at their respective thresholds


,metric,threshold,value,idx
0,max f1,0.299979,0.669656,204.0
1,max f2,0.171175,0.763771,296.0
2,max f0point5,0.404115,0.651079,153.0
3,max accuracy,0.404115,0.799611,153.0
4,max precision,0.769998,1.0,0.0
5,max recall,0.057649,1.0,395.0
6,max specificity,0.769998,1.0,0.0
7,max absolute_mcc,0.299979,0.522116,204.0
8,max min_per_class_accuracy,0.292173,0.772109,210.0
9,max mean_per_class_accuracy,0.299979,0.778091,204.0



Gains/Lift Table: Avg response rate: 28.60 %, avg score: 26.76 %


,,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
0,,1,0.0107,0.725599,3.178726,3.178726,0.909091,0.738688,0.909091,0.738688,0.034014,0.034014,217.872604,217.872604
1,,2,0.020428,0.699051,2.797279,2.997085,0.8,0.708696,0.857143,0.724406,0.027211,0.061224,179.727891,199.708455
2,,3,0.030156,0.688819,2.797279,2.932631,0.8,0.693133,0.83871,0.714318,0.027211,0.088435,179.727891,193.263112
3,,4,0.040856,0.670557,2.542981,2.83058,0.727273,0.680301,0.809524,0.705409,0.027211,0.115646,154.298083,183.057985
4,,5,0.050584,0.650966,2.447619,2.756934,0.7,0.660167,0.788462,0.696709,0.02381,0.139456,144.761905,175.693354
5,,6,0.100195,0.583702,2.67387,2.715805,0.764706,0.61542,0.776699,0.656459,0.132653,0.272109,167.386955,171.580477
6,,7,0.150778,0.517761,2.084511,2.504016,0.596154,0.54808,0.716129,0.620099,0.105442,0.377551,108.451073,150.40158
7,,8,0.200389,0.46614,2.056823,2.393303,0.588235,0.492865,0.684466,0.5886,0.102041,0.479592,105.682273,139.330295
8,,9,0.300584,0.36945,1.697378,2.161328,0.485437,0.419485,0.618123,0.532228,0.170068,0.64966,69.737798,116.132796
9,,10,0.399805,0.27515,1.439776,1.982257,0.411765,0.314627,0.56691,0.478225,0.142857,0.792517,43.977591,98.225665
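Most of the threshold-based metrics in these reports can be recomputed directly from the confusion matrix. The sketch below uses the test-set counts above (TN = 583, FP = 151, FN = 70, TP = 224, with "Yes" as the positive class). The computed F1 reproduces the `max f1` value 0.669656, and the mean per-class error reproduces 0.2219. Accuracy, F2 and F0.5 are evaluated here at the max-F1 threshold. They therefore come out lower than the `max accuracy`, `max f2` and `max f0point5` rows, each of which uses its own threshold. Gini is included as the rescaling 2 * AUC - 1.

```python
# Test-set confusion matrix at the max-F1 threshold (about 0.3); "Yes" = churn.
tn, fp = 583, 151   # actual "No":  predicted No / predicted Yes
fn, tp = 70, 224    # actual "Yes": predicted No / predicted Yes


def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)
# Mean per-class error averages the error rate of each actual class.
mean_per_class_error = (fp / (tn + fp) + fn / (fn + tp)) / 2

metrics = {
    "precision": precision,
    "recall": recall,
    "accuracy": accuracy,
    "f1": f_beta(precision, recall, 1.0),
    "f2": f_beta(precision, recall, 2.0),
    "f0point5": f_beta(precision, recall, 0.5),
    "mean_per_class_error": mean_per_class_error,
}
for name, value in metrics.items():
    print("{:<22} {:.6f}".format(name, value))

# Gini is a linear rescaling of AUC.
auc = 0.8440077665943763
print("gini", 2 * auc - 1)
```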







### Performance comparison

In [None]:
# Models compared on the test set:
# 1. random forest
# 2. gradient boosting model (default hyperparameters)
# 3. gradient boosting model tuned by grid search with cross-validation (best model selected by AUC)

In [None]:
# Evaluate each model on the held-out test set, computing
# model_performance() once per model rather than once per metric.
models = {
    "random forest": rf,
    "GBM": gbm,
    "GridSearchCV GBM": best_gbm,
}
for name, model in models.items():
    perf = model.model_performance(test_df)
    print("{} on test set".format(name))
    print("auc: {}".format(perf.auc()))
    print("aucpr: {}".format(perf.aucpr()))
    # accuracy() returns [[threshold, max accuracy]]; keep the value
    print("accuracy: {}".format(perf.accuracy()[0][1]))
    print(perf.confusion_matrix())

random forest on test set
auc: 0.8256733210995569


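# Conclusion
- On this churn data the GBM models outperform the random forest, with higher AUC and AUCPR on the test set (for example, a test AUC of 0.844 for the grid-search GBM versus 0.826 for the random forest).
- Tuning hyperparameters by grid search with cross-validation helps limit overfitting. For the tuned GBM, the training AUC (0.869) is only moderately higher than the cross-validation AUC (0.845) and the test AUC (0.844).
- Note that the grid-search GBM has a lower AUCPR than H2O's default GBM. One possible reason is that the grid models were ranked by AUC, not AUCPR. To address this, the grid models could be ranked by AUCPR instead. Alternatively, we can decide which error matters more for the business, false negatives (missed churners) or false positives (loyal customers flagged as churners), and choose the evaluation metric and decision threshold accordingly.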
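One way to act on the last point is to pick the decision threshold from the metric that matches the priority. F2 favours recall (fewer missed churners), and F0.5 favours precision (fewer false positives). The sketch below rests on three assumptions. First, the thresholds are copied from the cross-validation "Maximum Metrics" table of the grid-search GBM, so they are not tuned on the test set. Second, a probability equal to the threshold counts as churn. Third, the probabilities in `example_probs` are hypothetical illustration values, not model output.

```python
# Thresholds copied from the cross-validation "Maximum Metrics" table
# (max f2 / max f1 / max f0point5 rows).
THRESHOLDS = {
    "recall": 0.176048,     # max F2: catch more churners (fewer false negatives)
    "balanced": 0.311635,   # max F1
    "precision": 0.45958,   # max F0.5: fewer false positives
}


def classify(probabilities, priority):
    """Label each churn probability "Yes" or "No" using the threshold for `priority`."""
    threshold = THRESHOLDS[priority]
    return ["Yes" if p >= threshold else "No" for p in probabilities]


# Hypothetical churn probabilities, for illustration only.
example_probs = [0.12, 0.25, 0.40, 0.70]
for priority in THRESHOLDS:
    print(priority, classify(example_probs, priority))
```

Lowering the threshold flags more customers as likely churners. This trades extra false positives for fewer missed churners, which is the choice the conclusion asks us to justify.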