# Lending Club Analysis Using AutoML

## Setup H2O Cluster

In [None]:
import h2o

In [None]:
h2o.init(max_mem_size = "6g")

## Import data and Manage Data Types

This exploration of H2O uses a version of the Lending Club loan data, which can be found on [Kaggle](https://www.kaggle.com/wendykan/lending-club-loan-data). The data set has 15 variables:

|     | Column Name | Description |
| --- | ----------- | ----------- |
|   1 | loan_amnt   | Requested loan amount (US dollars) |
|   2 | term        | Loan term length (months) |
|   3 | int_rate    | Recommended interest rate |
|   4 | emp_length  | Employment length (years) |
|   5 | home_ownership | Housing status |
|   6 | annual_inc  | Annual income (US dollars) |
|   7 | purpose     | Purpose for the loan |
|   8 | addr_state  | State of residence |
|   9 | dti         | Debt to income ratio |
|  10 | delinq_2yrs | Number of delinquencies in the past 2 years |
|  11 | revol_util  | Percent of revolving credit line utilized |
|  12 | total_acc   | Number of active accounts |
|  13 | bad_loan    | Bad loan indicator |
|  14 | longest_credit_length | Age of oldest active account |
|  15 | verification_status | Income verification status |

In [None]:
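# The data set is also available at:
# https://s3-us-west-2.amazonaws.com/h2o-tutorials/data/topics/data/automl/loan.csv
train = h2o.import_file("../../data/topics/automl/loan.csv")
# Convert the target to a factor so it is treated as a classification problem
train["bad_loan"] = train["bad_loan"].asfactor()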

In [None]:
train.describe()

## Train Models Using H2O's AutoML

In [None]:
# Set target and predictor variables
y = "bad_loan"
x = train.col_names
x.remove(y)
x.remove("int_rate")

# Use AutoML to train models
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_models = 6, exclude_algos = ['DeepLearning'])
aml.train(x = x, y = y, training_frame = train)

In [None]:
print(aml.leaderboard)

## Examine the Best Model

In [None]:
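# Row 2 is the third leaderboard entry; the top rows are often stacked ensembles, which typically do not report variable importance
best_model = h2o.get_model(aml.leaderboard[2,'model_id'])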
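
The hardcoded row index `2` only works if the leaderboard happens to have a suitable model in that position. As a minimal sketch of a more robust alternative, the helper below picks the first model ID that does not start with `StackedEnsemble`. The function name `first_non_ensemble` and the example IDs are hypothetical and used only for illustration; the sketch assumes that stacked ensemble model IDs begin with that prefix.

```python
def first_non_ensemble(model_ids):
    """Return the first model ID that is not a stacked ensemble, or None."""
    for model_id in model_ids:
        # Assumption: stacked ensemble IDs start with "StackedEnsemble"
        if not model_id.startswith("StackedEnsemble"):
            return model_id
    return None

# Hypothetical leaderboard order, for illustration only
example_ids = [
    "StackedEnsemble_AllModels_example",
    "StackedEnsemble_BestOfFamily_example",
    "GBM_example",
]
chosen = first_non_ensemble(example_ids)
print(chosen)
```

In this notebook, the list of IDs could probably be obtained with something like `aml.leaderboard["model_id"].as_data_frame()["model_id"].tolist()`, and the chosen ID passed to `h2o.get_model`.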

In [None]:
%matplotlib inline
best_model.varimp_plot()

## Shutdown H2O Cluster

In [None]:
h2o.cluster().shutdown()