# Credit Card fraud detection - Using H2O AutoML

#### We explored using the popular H2O.ai platform, specifically the AutoML package 
#### This is the 1st of 2 notebooks on H2O
The package trains and evaluates a range of ML models on the dataset, including stacked ensembles, and ranks them on a leaderboard by performance.
It is available as a service on AWS Marketplace with many bells and whistles, but we chose to pip-install H2O on our AWS instance and run it in a standard Jupyter notebook.

This notebook shows the AutoML call with max_models=10, with oversampling enabled through H2O's built-in parameters.

However, as part of hyperparameter tuning, we ran AutoML both with and without oversampling of the imbalanced Class.

A second notebook shows the results and the model leaderboard with max_models=5 and unbalanced data.

The results of all tuning combinations are included in the paper write-up for our project.

One drawback of AutoML is that it is both time- and resource-intensive, since it trains multiple models as part of a single run.

Reference for notebook: H2O documentation at http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html#
http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/max_after_balance_size.html


### Importing and initiating H2O instance

In [1]:
import h2o
from h2o.automl import H2OAutoML

h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321. connected.


H2O cluster uptime:,1 hour 15 mins
H2O cluster timezone:,Etc/UTC
H2O data parsing timezone:,UTC
H2O cluster version:,3.22.0.2
H2O cluster version age:,27 days
H2O cluster name:,H2O_from_python_ubuntu_00mwx9
H2O cluster total nodes:,1
H2O cluster free memory:,12.47 Gb
H2O cluster total cores:,4
H2O cluster allowed cores:,4


### Importing and preparing the dataset

#### Loading the data

We used the credit card fraud dataset from [Kaggle](https://www.kaggle.com/dalpozz/creditcardfraud).
It contains data about credit card transactions from European credit card holders that occurred during a period of two days in Sep 2013, with 492 frauds out of 284,807 transactions. The frauds account for 0.172% of the total transactions.
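The class imbalance quoted above can be reproduced with a few lines of arithmetic. The sketch below uses only the two counts from the dataset description; the variable names are ours. It also shows why plain accuracy is a weak metric here: a model that always predicts "not fraud" would already be right for all but the fraud share of the rows.

```python
# Counts quoted in the dataset description above.
n_transactions = 284_807
n_frauds = 492

# Share of fraudulent transactions and the majority/minority ratio.
fraud_rate = n_frauds / n_transactions
legit_per_fraud = (n_transactions - n_frauds) / n_frauds

# Accuracy of a trivial model that labels every transaction as class 0.
always_zero_accuracy = 1 - fraud_rate

print(f"fraud rate: {fraud_rate:.3%}")
print(f"legitimate transactions per fraud: {legit_per_fraud:.0f}")
print(f"accuracy of always predicting 0: {always_zero_accuracy:.3%}")
```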

Kaggle describes the data as having 30 numerical features. Of these, 28 (V1 to V28) are principal components obtained with a PCA transformation; the original features are not released for privacy reasons.

The two features that haven't been changed are Time and Amount. Time contains the seconds elapsed between each transaction and the first transaction in the dataset.

The column 'Class' is the target label, with 1 representing a fraudulent transaction and 0 a normal one.

In [2]:
credit = h2o.import_file("creditcard.csv")

Parse progress: |█████████████████████████████████████████████████████████| 100%


In [3]:
# For binary classification, response should be a factor
credit['Class'] = credit['Class'].asfactor()

# Drop the Time column; it is not used as a predictor
credit = credit.drop(['Time'], axis=1)
credit.head()

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12,V13,V14,V15,V16,V17,V18,V19,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
-1.35981,-0.0727812,2.53635,1.37816,-0.338321,0.462388,0.239599,0.0986979,0.363787,0.0907942,-0.5516,-0.617801,-0.99139,-0.311169,1.46818,-0.470401,0.207971,0.0257906,0.403993,0.251412,-0.0183068,0.277838,-0.110474,0.0669281,0.128539,-0.189115,0.133558,-0.0210531,149.62,0
1.19186,0.266151,0.16648,0.448154,0.0600176,-0.0823608,-0.078803,0.0851017,-0.255425,-0.166974,1.61273,1.06524,0.489095,-0.143772,0.635558,0.463917,-0.114805,-0.183361,-0.145783,-0.0690831,-0.225775,-0.638672,0.101288,-0.339846,0.16717,0.125895,-0.0089831,0.0147242,2.69,0
-1.35835,-1.34016,1.77321,0.37978,-0.503198,1.8005,0.791461,0.247676,-1.51465,0.207643,0.624501,0.0660837,0.717293,-0.165946,2.34586,-2.89008,1.10997,-0.121359,-2.26186,0.52498,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.0553528,-0.0597518,378.66,0
-0.966272,-0.185226,1.79299,-0.863291,-0.0103089,1.2472,0.237609,0.377436,-1.38702,-0.0549519,-0.226487,0.178228,0.507757,-0.287924,-0.631418,-1.05965,-0.684093,1.96578,-1.23262,-0.208038,-0.1083,0.0052736,-0.190321,-1.17558,0.647376,-0.221929,0.0627228,0.0614576,123.5,0
-1.15823,0.877737,1.54872,0.403034,-0.407193,0.0959215,0.592941,-0.270533,0.817739,0.753074,-0.822843,0.538196,1.34585,-1.11967,0.175121,-0.451449,-0.237033,-0.0381948,0.803487,0.408542,-0.0094307,0.798278,-0.137458,0.141267,-0.20601,0.502292,0.219422,0.215153,69.99,0
-0.425966,0.960523,1.14111,-0.168252,0.420987,-0.0297276,0.476201,0.260314,-0.568671,-0.371407,1.34126,0.359894,-0.358091,-0.137134,0.517617,0.401726,-0.0581328,0.0686531,-0.0331938,0.0849677,-0.208254,-0.559825,-0.0263977,-0.371427,-0.232794,0.105915,0.253844,0.0810803,3.67,0
1.22966,0.141004,0.0453708,1.20261,0.191881,0.272708,-0.005159,0.0812129,0.46496,-0.0992543,-1.41691,-0.153826,-0.751063,0.167372,0.0501436,-0.443587,0.00282051,-0.611987,-0.045575,-0.219633,-0.167716,-0.27071,-0.154104,-0.780055,0.750137,-0.257237,0.0345074,0.00516777,4.99,0
-0.644269,1.41796,1.07438,-0.492199,0.948934,0.428118,1.12063,-3.80786,0.615375,1.24938,-0.619468,0.291474,1.75796,-1.32387,0.686133,-0.076127,-1.22213,-0.358222,0.324505,-0.156742,1.94347,-1.01545,0.0575035,-0.649709,-0.415267,-0.0516343,-1.20692,-1.08534,40.8,0
-0.894286,0.286157,-0.113192,-0.271526,2.6696,3.72182,0.370145,0.851084,-0.392048,-0.41043,-0.705117,-0.110452,-0.286254,0.0743554,-0.328783,-0.210077,-0.499768,0.118765,0.570328,0.0527357,-0.0734251,-0.268092,-0.204233,1.01159,0.373205,-0.384157,0.0117474,0.142404,93.2,0
-0.338262,1.11959,1.04437,-0.222187,0.499361,-0.246761,0.651583,0.0695386,-0.736727,-0.366846,1.01761,0.83639,1.00684,-0.443523,0.150219,0.739453,-0.54098,0.476677,0.451773,0.203711,-0.246914,-0.633753,-0.120794,-0.38505,-0.069733,0.0941988,0.246219,0.0830756,3.68,0




In [4]:
# Set the predictor names and the response column name.
# After dropping Time, columns[0:28] selects V1..V28; Amount (index 28) is not included.
predictors = credit.columns[0:28]
response = 'Class'

In [5]:
# split into train and validation sets
train, valid = credit.split_frame(ratios = [.7], seed = 1234)
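As far as we understand, `split_frame` assigns rows to the splits at random, so the 70/30 ratio is approximate and the split is not stratified by Class. A quick sanity check, sketched below, uses the row and fraud counts that appear in the confusion-matrix totals reported further down in this notebook; the variable names are ours.

```python
# Row and fraud counts taken from the confusion-matrix totals reported
# for the training and validation frames later in this notebook.
train_rows, train_frauds = 199_585, 341
valid_rows, valid_frauds = 85_222, 151

total_rows = train_rows + valid_rows
train_fraction = train_rows / total_rows

# Fraud share in each split; both should be close to the overall rate.
train_fraud_rate = train_frauds / train_rows
valid_fraud_rate = valid_frauds / valid_rows

print(f"train fraction: {train_fraction:.4f}")
print(f"train fraud rate: {train_fraud_rate:.4%}")
print(f"valid fraud rate: {valid_fraud_rate:.4%}")
```

The two splits add up to the full dataset, and the fraud rate is similar in both, so the validation frame is a reasonable stand-in for the overall class mix in this run.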

### Calling the AutoML package (using oversampling to balance Class)

In [6]:
# Run AutoML with max_models=10 here (we also tried 5 and 20);
# the run is limited to 1 hour max runtime by default
aml = H2OAutoML(max_models=10, seed=1234, balance_classes=True, max_after_balance_size=0.85)
aml.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

AutoML progress: |████████████████████████████████████████████████████████| 100%
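To get a feel for what `balance_classes = True` combined with `max_after_balance_size = 0.85` implies, here is a simplified sketch in plain Python. It is our reading of how a size cap interacts with class balancing, not H2O's internal code: `balanced_counts` is a hypothetical helper, and the class counts are the training-frame totals reported later in this notebook.

```python
def balanced_counts(class_counts, max_after_balance_size):
    """Simplified sketch of class balancing with a cap on the resulting size.

    Assumption (not H2O's actual implementation): every class is first
    sampled up to the majority-class count, and if the balanced data would
    exceed max_after_balance_size times the original row count, all
    sampling ratios are scaled down by the same factor.
    """
    total = sum(class_counts.values())
    majority = max(class_counts.values())

    # Step 1: per-class sampling ratio that would equalize the classes.
    ratios = {c: majority / n for c, n in class_counts.items()}
    balanced_total = sum(ratios[c] * n for c, n in class_counts.items())

    # Step 2: rescale all ratios if the balanced data exceeds the cap.
    cap = max_after_balance_size * total
    if balanced_total > cap:
        scale = cap / balanced_total
        ratios = {c: r * scale for c, r in ratios.items()}

    return {c: round(ratios[c] * n) for c, n in class_counts.items()}


# Training-frame class counts (0 = normal, 1 = fraud).
train_counts = {0: 199_244, 1: 341}
after = balanced_counts(train_counts, max_after_balance_size=0.85)
print(after)
```

Under these assumptions, a cap below 1.0 means the minority class is oversampled while the majority class is sampled down, so that the balanced training data stays below the original row count.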


### Model leaderboard shows XGBoost as the top-performing technique

In [7]:
# View the AutoML Leaderboard
lb = aml.leaderboard
lb.head(rows=lb.nrows)  # Print all rows instead of default (10 rows)


model_id,auc,logloss,mean_per_class_error,rmse,mse
XGBoost_1_AutoML_20181220_005816,0.981113,0.00249092,0.080723,0.0198301,0.000393234
GLM_grid_1_AutoML_20181220_005816_model_1,0.978821,0.00398823,0.0999025,0.0261906,0.000685946
XGBoost_2_AutoML_20181220_005816,0.972787,0.00289851,0.0895256,0.0205521,0.000422387
DRF_1_AutoML_20181220_005816,0.952372,0.0108481,0.0939194,0.0355181,0.00126154
XRT_1_AutoML_20181220_005816,0.947599,0.0113933,0.0968495,0.0359485,0.00129229
StackedEnsemble_BestOfFamily_AutoML_20181220_005816,0.947481,0.00314425,0.101216,0.0208257,0.000433709
StackedEnsemble_AllModels_AutoML_20181220_005816,0.946547,0.00316667,0.102682,0.0209589,0.000439276




### Details on performance metrics of the top model selected

In [8]:
# The leader model is stored here
aml.leader

Model Details
H2OXGBoostEstimator :  XGBoost
Model Key:  XGBoost_1_AutoML_20181220_005816


ModelMetricsBinomial: xgboost
** Reported on train data. **

MSE: 0.00018621979612176758
RMSE: 0.013646237434610596
LogLoss: 0.0009447178048844039
Mean Per-Class Error: 0.004578774041536837
AUC: 0.99980426010319
pr_auc: 0.94372678763981
Gini: 0.99960852020638
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.3811877965927124: 


,0.0,1.0,Error,Rate
0,199240.0,4.0,0.0,(4.0/199244.0)
1,35.0,306.0,0.1026,(35.0/341.0)
Total,199275.0,310.0,0.0002,(39.0/199585.0)


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.3811878,0.9400922,133.0
max f2,0.0482570,0.9222474,197.0
max f0point5,0.5384995,0.9715946,125.0
max accuracy,0.4396348,0.9998046,129.0
max precision,0.9994673,1.0,0.0
max recall,0.0011217,1.0,343.0
max specificity,0.9994673,1.0,0.0
max absolute_mcc,0.3811878,0.9410656,133.0
max min_per_class_accuracy,0.0063788,0.9941349,285.0


Gains/Lift Table: Avg response rate:  0.17 %, avg score:  0.17 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0100008,0.0030638,99.4060188,99.4060188,0.1698397,0.1573605,0.1698397,0.1573605,0.9941349,0.9941349,9840.6018782,9840.6018782
,2,0.0200015,0.0015514,0.2932331,49.8496259,0.0005010,0.0021304,0.0851703,0.0797455,0.0029326,0.9970674,-70.6766906,4884.9625938
,3,0.0300023,0.0010750,0.2932331,33.3308283,0.0005010,0.0012832,0.0569472,0.0535914,0.0029326,1.0,-70.6766906,3233.0828323
,4,0.0400030,0.0008181,0.0,24.9981212,0.0,0.0009366,0.0427104,0.0404277,0.0,1.0,-100.0,2399.8121242
,5,0.0500038,0.0006551,0.0,19.9984970,0.0,0.0007298,0.0341683,0.0324881,0.0,1.0,-100.0,1899.8496994
,6,0.1000025,0.0003154,0.0,9.9997495,0.0,0.0004474,0.0170850,0.0164685,0.0,1.0,-100.0,899.9749486
,7,0.1500013,0.0002048,0.0,6.6666110,0.0,0.0002530,0.0113902,0.0110635,0.0,1.0,-100.0,566.6610996
,8,0.2,0.0001501,0.0,5.0,0.0,0.0001746,0.0085427,0.0083414,0.0,1.0,-100.0,400.0
,9,0.3000025,0.0000979,0.0,3.3333055,0.0,0.0001199,0.0056951,0.0056009,0.0,1.0,-100.0,233.3305498




ModelMetricsBinomial: xgboost
** Reported on validation data. **

MSE: 0.0004335236941845021
RMSE: 0.020821231812371287
LogLoss: 0.002781479354784432
Mean Per-Class Error: 0.05722500901272887
AUC: 0.9773056335257477
pr_auc: 0.8308404984430413
Gini: 0.9546112670514955
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.5808205604553223: 


,0.0,1.0,Error,Rate
0,85065.0,6.0,0.0001,(6.0/85071.0)
1,32.0,119.0,0.2119,(32.0/151.0)
Total,85097.0,125.0,0.0004,(38.0/85222.0)
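The validation confusion matrix above can be turned into precision, recall and F1 by hand. The sketch below uses only the four cell counts from that table (rows are actual classes, columns are predicted classes); the variable names are ours.

```python
# Validation confusion matrix at the max-F1 threshold.
# Rows are actual classes, columns are predicted classes.
tn, fp = 85_065, 6    # actual 0: predicted 0, predicted 1
fn, tp = 32, 119      # actual 1: predicted 0, predicted 1

precision = tp / (tp + fp)   # share of flagged transactions that are fraud
recall = tp / (tp + fn)      # share of frauds that are flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.7f}")
```

The F1 value computed this way agrees with the `max f1` entry in the maximum-metrics table that follows, and `1 - recall` is the 0.2119 error rate shown for class 1.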


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.5808206,0.8623188,92.0
max f2,0.1881070,0.8333333,107.0
max f0point5,0.7167526,0.9149278,85.0
max accuracy,0.5808206,0.9995541,92.0
max precision,0.9992800,1.0,0.0
max recall,0.0000269,1.0,395.0
max specificity,0.9992800,1.0,0.0
max absolute_mcc,0.5808206,0.8659608,92.0
max min_per_class_accuracy,0.0004542,0.9271523,346.0


Gains/Lift Table: Avg response rate:  0.18 %, avg score:  0.17 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0100092,0.0031468,87.9989286,87.9989286,0.1559203,0.1540593,0.1559203,0.1540593,0.8807947,0.8807947,8699.8928596,8699.8928596
,2,0.0200066,0.0015552,1.9872680,45.0183216,0.0035211,0.0021507,0.0797654,0.0781495,0.0198675,0.9006623,98.7267979,4401.8321648
,3,0.0300275,0.0010557,0.6608713,30.2151710,0.0011710,0.0012673,0.0535365,0.0524921,0.0066225,0.9072848,-33.9128682,2921.5170972
,4,0.0400014,0.0007979,0.6639813,22.8468779,0.0011765,0.0009156,0.0404811,0.0396320,0.0066225,0.9139073,-33.6018699,2184.6877859
,5,0.0500106,0.0006411,0.0,18.2742859,0.0,0.0007112,0.0323792,0.0318424,0.0,0.9139073,-100.0,1727.4285927
,6,0.1000094,0.0003167,0.6622672,9.4693098,0.0011734,0.0004421,0.0167781,0.0161441,0.0331126,0.9470199,-33.7732802,846.9309768
,7,0.1500082,0.0002025,0.2649069,6.4014155,0.0004694,0.0002509,0.0113423,0.0108468,0.0132450,0.9602649,-73.5093121,540.1415470
,8,0.2000070,0.0001493,0.1324534,4.8342669,0.0002347,0.0001734,0.0085656,0.0081786,0.0066225,0.9668874,-86.7546560,383.4266909
,9,0.3000047,0.0000970,0.1324534,3.2670571,0.0002347,0.0001194,0.0057887,0.0054923,0.0132450,0.9801325,-86.7546560,226.7057053
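The lift and capture-rate columns in the validation gains/lift table can be related back to raw counts. The sketch below takes the first row of that table (the top 1% of transactions by score) and the validation totals from the confusion matrix; the variable names are ours.

```python
# Validation totals from the confusion matrix.
valid_rows, valid_frauds = 85_222, 151
overall_rate = valid_frauds / valid_rows

# First row of the validation gains/lift table (top ~1% by score).
top_group_response_rate = 0.1559203
top_group_capture_rate = 0.8807947

# Lift: how much denser fraud is in this group than in the data overall.
lift = top_group_response_rate / overall_rate

# Capture rate times total frauds gives the frauds found in this group.
frauds_in_top_group = top_group_capture_rate * valid_frauds

print(f"lift of top group: {lift:.2f}")
print(f"frauds in top group: {frauds_in_top_group:.1f} of {valid_frauds}")
```

In other words, reviewing only the top 1% of validation transactions by score would already surface most of the fraud cases in that frame.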




ModelMetricsBinomial: xgboost
** Reported on cross-validation data. **

MSE: 0.00039323390456606956
RMSE: 0.019830126186337532
LogLoss: 0.0024909247724580075
Mean Per-Class Error: 0.057368546654742
AUC: 0.9811133003574627
pr_auc: 0.8483044372178924
Gini: 0.9622266007149254
Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.16402749717235565: 


,0.0,1.0,Error,Rate
0,199213.0,31.0,0.0002,(31.0/199244.0)
1,55.0,286.0,0.1613,(55.0/341.0)
Total,199268.0,317.0,0.0004,(86.0/199585.0)
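The Gini value reported alongside the AUC is not an independent measurement: it is the standard rescaling `Gini = 2 * AUC - 1`. A short check with the cross-validation AUC from the output above:

```python
# Cross-validation AUC as reported in the metrics above.
cv_auc = 0.9811133003574627

# Gini coefficient as a linear rescaling of AUC to the range [-1, 1].
gini = 2 * cv_auc - 1

print(f"gini: {gini:.10f}")
```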


Maximum Metrics: Maximum metrics at their respective thresholds



metric,threshold,value,idx
max f1,0.1640275,0.8693009,161.0
max f2,0.1640275,0.8506841,161.0
max f0point5,0.6651369,0.9134948,123.0
max accuracy,0.1640275,0.9995691,161.0
max precision,0.9994099,1.0,0.0
max recall,0.0000482,1.0,393.0
max specificity,0.9994099,1.0,0.0
max absolute_mcc,0.1640275,0.8696660,161.0
max min_per_class_accuracy,0.0004630,0.9266862,364.0


Gains/Lift Table: Avg response rate:  0.17 %, avg score:  0.16 %



,group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain
,1,0.0100008,0.0030416,88.8496274,88.8496274,0.1518036,0.1488374,0.1518036,0.1488374,0.8885630,0.8885630,8784.9627407,8784.9627407
,2,0.0200015,0.0015702,1.4661655,45.1578964,0.0025050,0.0021461,0.0771543,0.0754918,0.0146628,0.9032258,46.6165469,4415.7896438
,3,0.0300023,0.0010534,0.2932331,30.2030087,0.0005010,0.0012760,0.0516032,0.0507532,0.0029326,0.9061584,-70.6766906,2920.3008657
,4,0.0400030,0.0007855,1.1729324,22.9454896,0.0020040,0.0009046,0.0392034,0.0382910,0.0117302,0.9178886,17.2932375,2194.5489586
,5,0.0500038,0.0006311,0.2932331,18.4150383,0.0005010,0.0007042,0.0314629,0.0307737,0.0029326,0.9208211,-70.6766906,1741.5038288
,6,0.1000025,0.0003142,0.3519150,9.3839291,0.0006013,0.0004377,0.0160329,0.0156064,0.0175953,0.9384164,-64.8085025,838.3929137
,7,0.1500013,0.0002086,0.2932625,6.3538081,0.0005011,0.0002538,0.0108558,0.0104890,0.0146628,0.9530792,-70.6737521,535.3808134
,8,0.2,0.0001562,0.1759575,4.8093842,0.0003006,0.0001797,0.0082171,0.0079118,0.0087977,0.9618768,-82.4042513,380.9384164
,9,0.3000025,0.0001040,0.1759487,3.2648799,0.0003006,0.0001267,0.0055782,0.0053167,0.0175953,0.9794721,-82.4051329,226.4879872



Cross-Validation Metrics Summary: 


,mean,sd,cv_1_valid,cv_2_valid,cv_3_valid,cv_4_valid,cv_5_valid
accuracy,0.9996042,0.0000685,0.9996242,0.9997745,0.9995992,0.9994990,0.999524
auc,0.9807777,0.0048477,0.9742219,0.9894251,0.9855295,0.9713875,0.9833243
err,0.0003958,0.0000685,0.0003758,0.0002255,0.0004008,0.0005010,0.0004760
err_count,15.8,2.734959,15.0,9.0,16.0,20.0,19.0
f0point5,0.9091428,0.0167388,0.9189189,0.9288538,0.9347181,0.8730159,0.8902077
f1,0.8757464,0.0244580,0.9006622,0.9126214,0.8873239,0.8148148,0.8633093
f2,0.8452890,0.0328319,0.8831169,0.8969466,0.844504,0.7638889,0.8379889
lift_top_group,88.22055,2.7679935,87.83878,92.260994,92.016464,81.49721,87.48932
logloss,0.0024909,0.0003606,0.0026775,0.0015771,0.0023397,0.0030343,0.0028261
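The `mean` column of the summary above is the plain average of the five per-fold values. The sketch below checks this for AUC and also reports the spread between the best and worst fold; the variable names are ours.

```python
from statistics import mean

# Per-fold AUC values from the cross-validation summary above.
fold_auc = [0.9742219, 0.9894251, 0.9855295, 0.9713875, 0.9833243]

mean_auc = mean(fold_auc)
auc_range = max(fold_auc) - min(fold_auc)

print(f"mean fold AUC: {mean_auc:.7f}")
print(f"best minus worst fold: {auc_range:.7f}")
```

Note that this fold average (about 0.9808) differs slightly from the cross-validation AUC of 0.9811 shown earlier and on the leaderboard. As we understand it, the latter is computed on the combined holdout predictions rather than by averaging the folds.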


Scoring History: 


,timestamp,duration,number_of_trees,training_rmse,training_logloss,training_auc,training_pr_auc,training_lift,training_classification_error,validation_rmse,validation_logloss,validation_auc,validation_pr_auc,validation_lift,validation_classification_error
,2018-12-20 01:34:48,10 min 30.296 sec,0.0,0.5,0.6931472,0.5,0.0,1.0,0.9982915,0.5,0.6931472,0.5,0.0,1.0,0.9982282
,2018-12-20 01:34:49,10 min 31.247 sec,5.0,0.3876166,0.4903885,0.9747050,0.8659121,84.9002809,0.0004008,0.3876616,0.4904597,0.9602148,0.8081362,85.2147895,0.0005280
,2018-12-20 01:34:49,10 min 31.495 sec,10.0,0.3011344,0.3582482,0.9812913,0.8722424,91.3057478,0.0003808,0.3012355,0.3583784,0.9656437,0.8076769,86.4728864,0.0005280
,2018-12-20 01:34:49,10 min 31.764 sec,15.0,0.2343247,0.2668622,0.9834820,0.8733213,90.7861358,0.0003958,0.2344763,0.2670238,0.9646699,0.8172461,86.6756364,0.0004928
,2018-12-20 01:34:50,10 min 32.056 sec,20.0,0.1826081,0.2013652,0.9835819,0.8758566,91.6442165,0.0003808,0.1828368,0.2015659,0.9648599,0.8183741,86.6756364,0.0004928
---,---,---,---,---,---,---,---,---,---,---,---,---,---,---,---
,2018-12-20 01:35:24,11 min 6.491 sec,215.0,0.0142013,0.0010762,0.9996239,0.9327175,99.4060188,0.0002104,0.0208645,0.0027736,0.9789639,0.8279657,87.9989286,0.0004576
,2018-12-20 01:35:26,11 min 8.014 sec,220.0,0.0140882,0.0010428,0.9997046,0.9355741,99.4060188,0.0002104,0.0208341,0.0027718,0.9786662,0.8294190,87.9989286,0.0004459
,2018-12-20 01:35:27,11 min 9.631 sec,225.0,0.0139476,0.0010087,0.9997438,0.9356930,99.4060188,0.0002054,0.0208269,0.0027793,0.9775159,0.8297141,87.9989286,0.0004459



See the whole table with table.as_data_frame()
Variable Importances: 


variable,relative_importance,scaled_importance,percentage
V4,341.0,1.0,0.0913964
V14,312.0,0.9149560,0.0836237
V7,205.0,0.6011730,0.0549451
V12,183.0,0.5366569,0.0490485
V17,179.0,0.5249267,0.0479764
---,---,---,---
V5,90.0,0.2639296,0.0241222
V9,85.0,0.2492669,0.0227821
V2,84.0,0.2463343,0.0225141



See the whole table with table.as_data_frame()




### Predictions

In [10]:
# To generate predictions on a data set (here the validation frame),
# call predict on the `H2OAutoML` object, which uses the leader model,
# or on the leader model object directly

preds = aml.predict(valid)

xgboost prediction progress: |████████████████████████████████████████████| 100%


In [11]:
# or:
preds = aml.leader.predict(valid)

xgboost prediction progress: |████████████████████████████████████████████| 100%
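For a binomial model, the frame returned by `predict` holds a predicted label plus the class probabilities (columns `predict`, `p0` and `p1`). Which transactions get flagged depends on the threshold applied to `p1`, and the metrics tables above list several candidate thresholds. The sketch below illustrates the effect in plain Python; `label_scores` is a hypothetical helper and the scores are made-up values, not model output.

```python
def label_scores(p1_scores, threshold):
    """Label a score as fraud (1) when it reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in p1_scores]


# Hypothetical fraud probabilities for four transactions (illustrative only).
example_p1 = [0.0004, 0.19, 0.58, 0.9993]

# Thresholds taken from the validation "Maximum Metrics" table.
max_f1_threshold = 0.5808206
max_f2_threshold = 0.1881070

labels_f1 = label_scores(example_p1, max_f1_threshold)
labels_f2 = label_scores(example_p1, max_f2_threshold)

print("max-F1 threshold:", labels_f1)
print("max-F2 threshold:", labels_f2)
```

The lower max-F2 threshold flags more transactions, trading precision for recall, which may be preferable when a missed fraud costs more than a false alarm.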
