# Gradient Boosted Trees and AutoML

Last updated: Jul 13th 2021

This Gradient Notebook is part of the project *Gradient Boosted Trees and AutoML* at https://github.com/gradient-ai/Classical-ML-Example .

Business and other problems not amenable to deep learning are often best solved by well-tuned gradient-boosted decision trees. Like deep learning, these methods can solve arbitrarily complex problems via nonlinear mappings, but they do so without the large training sets and compute-intensive processing that deep learning sometimes requires.

This project shows that such methods are supported on Gradient by demonstrating training of **gradient-boosted decision trees** (GBT) using the well-known open source machine learning (ML) library H2O.

We also show H2O's **automated machine learning** (AutoML) capability, which can search the space of model hyperparameters. This can both save the user the time required to do so manually, and produce better results by finding hyperparameter combinations that the user may miss. AutoML used in this way can surpass even expert human data scientists in some situations.

H2O's AutoML includes within it another well-known GBT library, **XGBoost**.

This project does not aim to show extensive model tuning, large datasets, or specific business problems. Instead it shows the **end-to-end** combination of data preparation, model training, and deployment to production of the H2O model that is enabled within Gradient. We therefore use the commonly used [Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/census+income) from the UCI ML repository.

In [None]:
# Copied from gbm_in_h2o.ipynb and modified
# Try on income dataset since know expected performance gini ~ 0.86
# http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html (also has example)
#
# Jun 14th 2019
# Jun 17th 2021: Run on Gradient
# Jun 21st 2021: Add save model as MOJO
# Jul 13th 2021: Polish text to make shareable

## Setup
This Notebook runs on the Gradient container `tensorflow/tensorflow:2.4.1-gpu-jupyter`, and requires the installation of H2O, and hence Java.

In [None]:
# Install H2O
!pip install h2o==3.32.1.3



In [None]:
# Install Java using https://pypi.org/project/install-jdk/
!pip install install-jdk



In [None]:
# This may show an error if jdk is already installed from a previous run of the notebook,
# but it is OK to proceed

import jdk

jdk.install('11', jre=True)

StopIteration: 

Add Java to the path so that H2O can find it.

In [None]:
import os
import subprocess

os.environ['PATH'] = "/root/.jre/jdk-11.0.11+9-jre/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
subprocess.run('echo $PATH', shell=True, check=True, stdout=subprocess.PIPE, universal_newlines=True)

CompletedProcess(args='echo $PATH', returncode=0, stdout='/root/.jre/jdk-11.0.11+9-jre/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin\n')
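
The cell above overwrites `PATH` with a fixed string, which depends on the exact JRE version directory. A more portable variant, sketched here, prepends the Java `bin` folder to whatever `PATH` already contains. The helper name `prepend_to_path` is illustrative and not part of the original notebook; the JRE directory is the one used above and may differ on other containers.

```python
import os


def prepend_to_path(current_path, new_dir):
    """Return a PATH string with new_dir first, without duplicating it."""
    parts = [p for p in current_path.split(os.pathsep) if p and p != new_dir]
    return os.pathsep.join([new_dir] + parts)


# JRE location used in this notebook; adjust it if jdk.install() reports another one.
java_bin = "/root/.jre/jdk-11.0.11+9-jre/bin"

# Demonstrate on a small stand-in PATH rather than the real environment.
example = prepend_to_path(os.pathsep.join(["/usr/local/bin", "/usr/bin"]), java_bin)
print(example)

# To apply it for real:
# os.environ["PATH"] = prepend_to_path(os.environ["PATH"], java_bin)
```

Prepending rather than replacing keeps any other entries the container already relies on.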

H2O runs as a server, so we start this up.

In [None]:
import h2o
from h2o.automl import H2OAutoML
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.11" 2021-04-20; OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9); OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
  Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpbpb46tm5
  JVM stdout: /tmp/tmpbpb46tm5/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmpbpb46tm5/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.


H2O_cluster_uptime:,02 secs
H2O_cluster_timezone:,Etc/GMT
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.32.1.3
H2O_cluster_version_age:,1 month and 24 days
H2O_cluster_name:,H2O_from_python_unknownUser_fku9br
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,6.750 Gb
H2O_cluster_total_cores:,8
H2O_cluster_allowed_cores:,8


## Prepare data
We load the slightly modified version of the income dataset supplied with the repo. This saves some data cleaning lines not relevant to this project such as removing the final empty line.

The original data is at the [UCI ML Repository](https://archive.ics.uci.edu/ml/datasets/census+income) .

H2O provides an `import_file` method that enables convenient import of a CSV file to a dataframe. This process is fine here because the data are small.

In [None]:
df = h2o.import_file(path = "income.csv")

Parse progress: |█████████████████████████████████████████████████████████| 100%


The data can be viewed. It consists of 14 columns of demographic information of mixed data type, and a binary ground-truth column `yearly-income`.

Our task is to build a binary supervised ML classification model to predict whether a person's income is low (`<=50K`) or high (`>50K`).

This has obvious potential business applications, such as deciding who to market cheap or expensive products to, but we will not explore those here.

In [None]:
df

age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,yearly-income
39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
37,Private,284582,Masters,14,Married-civ-spouse,Exec-managerial,Wife,White,Female,0,0,40,United-States,<=50K
49,Private,160187,9th,5,Married-spouse-absent,Other-service,Not-in-family,Black,Female,0,0,16,Jamaica,<=50K
52,Self-emp-not-inc,209642,HS-grad,9,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
31,Private,45781,Masters,14,Never-married,Prof-specialty,Not-in-family,White,Female,14084,0,50,United-States,>50K
42,Private,159449,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,5178,0,40,United-States,>50K




We can also summarize the dataframe with various statistics particularly useful for the exploratory data science that we are performing, using H2O's `summary()` method. Information includes min/max/spread, but also data type, number of zeros, and number of missing values.

In [None]:
df.summary()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,yearly-income
type,int,enum,int,enum,int,enum,enum,enum,enum,enum,int,int,int,enum,enum
mins,17.0,,12285.0,,1.0,,,,,,0.0,0.0,1.0,,
mean,38.58164675532078,,189778.36651208502,,10.0806793403151,,,,,,1077.6488437087312,87.303829734959,40.437455852092995,,
maxs,90.0,,1484705.0,,16.0,,,,,,99999.0,4356.0,99.0,,
sigma,13.640432553581341,,105549.97769702224,,2.5727203320673877,,,,,,7385.29208484034,402.96021864899967,12.347428681731843,,
zeros,0,,0,,0,,,,,,29849,31042,0,,
missing,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,39.0,State-gov,77516.0,Bachelors,13.0,Never-married,Adm-clerical,Not-in-family,White,Male,2174.0,0.0,40.0,United-States,<=50K
1,50.0,Self-emp-not-inc,83311.0,Bachelors,13.0,Married-civ-spouse,Exec-managerial,Husband,White,Male,0.0,0.0,13.0,United-States,<=50K
2,38.0,Private,215646.0,HS-grad,9.0,Divorced,Handlers-cleaners,Not-in-family,White,Male,0.0,0.0,40.0,United-States,<=50K


We separate the data feature columns (1-14) from the label in column 15 (yearly-income).

In [None]:
# Feature columns and label
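y = "yearly-income"
# Use every column except the label as a feature, selected by name rather than position
x = [col for col in df.columns if col != y]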
print(x)

['age', 'workclass', 'fnlwgt', 'education', 'education-num', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week', 'native-country']


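We then split the data into training, validation, and testing sets. The ratios of 0.6 and 0.2 give the training and validation fractions, and the remaining 0.2 goes to the testing set.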

In H2O, the datasets are put into their *hex* format, which improves performance.

In [None]:
# Split
train, valid, test = df.split_frame(
    ratios = [0.6,0.2],
    seed = 123456,
    destination_frames=['train.hex','valid.hex','test.hex']
)
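
As a rough check on what the split produces, the expected size of each set can be computed from the ratios. This is a minimal sketch: the helper `expected_split_sizes` is illustrative, and the row count of 32,561 is an assumption (verify it with `df.nrows`). H2O's `split_frame` splits probabilistically, so the actual counts will differ slightly from these expectations; for example, the outputs further down report 19,680 training rows and 6,385 testing rows.

```python
def expected_split_sizes(n_rows, ratios):
    """Expected row counts per split; the last split receives the remainder."""
    fractions = list(ratios) + [1.0 - sum(ratios)]
    return [n_rows * f for f in fractions]


n_rows = 32561  # assumed row count of income.csv; check df.nrows
sizes = expected_split_sizes(n_rows, [0.6, 0.2])

for name, size in zip(["train", "valid", "test"], sizes):
    print(f"{name}: about {size:.0f} rows")
```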

## Train the model using AutoML

Model training can then be performed using AutoML. Here we set the maximum number of models to search to be 20. The training takes a few minutes to run.

In [None]:
# Run AutoML
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)

AutoML progress: |████████████████████████████████████████████████████████| 100%


We see from the searched models that a variety of configurations have been tried, including:

 - Regular GBT (aka. GBM, gradient boosting machine)
 - XGBoost model with grid of hyperparameter values
 - A deep learning model
 - Random forest
 - Stacked ensembles of models (stacking = feed model output into next model input)

For full details of the models searched in AutoML, see [H2O's AutoML documentation](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html).

The table also shows various performance metrics for each model. Because only a training frame was passed to AutoML, these are cross-validation metrics computed on the training data (the values for the top model match the cross-validation report shown further down). The leaderboard is ordered by `auc`, the area under the ROC curve, which plots the true positive rate against the false positive rate. Other [metrics](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/performance-and-prediction.html?#classification) shown include logarithmic loss, area under the precision-recall curve, and mean squared error.
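
To make the `auc` metric concrete, here is a minimal pure-Python sketch of one standard way to compute it: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are made-up toy values, not data from this notebook, and this is not how H2O computes the metric internally. The Gini coefficient that H2O reports alongside AUC is related to it by `gini = 2 * auc - 1`.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels are 1 (positive) or 0 (negative); ties count as one half.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only)
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
gini = 2 * auc - 1
print(auc, gini)
```

The pairwise loop is quadratic in the number of rows, so it is only suitable for small illustrations.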

Gradient includes support for [tracking model metrics](https://docs.paperspace.com/gradient/data/metrics-overview), both in model experimentation and production.


In [None]:
lb = h2o.automl.get_leaderboard(aml, extra_columns = 'ALL')
lb.head(rows=lb.nrows)

model_id,auc,logloss,aucpr,mean_per_class_error,rmse,mse,training_time_ms,predict_time_per_row_ms,algo
StackedEnsemble_AllModels_AutoML_20210713_220658,0.927879,0.279676,0.828378,0.17474,0.298163,0.0889014,3856,0.051449,StackedEnsemble
StackedEnsemble_BestOfFamily_AutoML_20210713_220658,0.927704,0.279957,0.82827,0.175113,0.29824,0.0889473,2154,0.018812,StackedEnsemble
XGBoost_grid__1_AutoML_20210713_220658_model_4,0.926603,0.281722,0.825606,0.168536,0.299547,0.0897281,945,0.002455,XGBoost
GBM_1_AutoML_20210713_220658,0.925137,0.285528,0.823712,0.172024,0.30061,0.0903661,1662,0.019172,GBM
GBM_2_AutoML_20210713_220658,0.924914,0.286024,0.823076,0.173164,0.300807,0.0904848,1579,0.01554,GBM
XGBoost_3_AutoML_20210713_220658,0.924494,0.28605,0.820167,0.17028,0.301813,0.0910908,903,0.002192,XGBoost
GBM_3_AutoML_20210713_220658,0.924139,0.287323,0.821868,0.173311,0.301484,0.0908926,1577,0.013665,GBM
GBM_grid__1_AutoML_20210713_220658_model_1,0.922166,0.291503,0.816782,0.179959,0.303526,0.092128,1442,0.014716,GBM
XGBoost_grid__1_AutoML_20210713_220658_model_3,0.921608,0.291206,0.814513,0.172251,0.304343,0.0926247,841,0.00224,XGBoost
GBM_4_AutoML_20210713_220658,0.921552,0.292358,0.816714,0.167862,0.304239,0.0925616,1675,0.012231,GBM




The best model is the stacked ensemble of all models, and we can see its properties in more detail. These include further metrics on model performance, such as the F-score (the harmonic mean of precision and recall) and the confusion matrix of predicted versus ground-truth labels, which shows true and false positives and negatives. The information is shown first for the training data, and then for the cross-validation data.

In [None]:
aml.leader

Model Details
H2OStackedEnsembleEstimator :  Stacked Ensemble
Model Key:  StackedEnsemble_AllModels_AutoML_20210713_220658

No model summary for this model

ModelMetricsBinomialGLM: stackedensemble
** Reported on train data. **

MSE: 0.07353773014710141
RMSE: 0.27117841017879984
LogLoss: 0.23472354840397225
Null degrees of freedom: 10046
Residual degrees of freedom: 10038
Null deviance: 11134.613562297965
Residual deviance: 4716.534981629419
AIC: 4734.534981629419
AUC: 0.9527179829248879
AUCPR: 0.8818935027134936
Gini: 0.9054359658497757

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.442733533010321: 


,<=50K,>50K,Error,Rate
<=50K,7150.0,459.0,0.0603,(459.0/7609.0)
>50K,570.0,1868.0,0.2338,(570.0/2438.0)
Total,7720.0,2327.0,0.1024,(1029.0/10047.0)



Maximum Metrics: Maximum metrics at their respective thresholds


metric,threshold,value,idx
max f1,0.4427335,0.7840504,180.0
max f2,0.2235808,0.8474576,267.0
max f0point5,0.5803231,0.8179173,134.0
max accuracy,0.4885390,0.8979795,164.0
max precision,0.9985563,1.0,0.0
max recall,0.0093222,1.0,385.0
max specificity,0.9985563,1.0,0.0
max absolute_mcc,0.4427335,0.7173048,180.0
max min_per_class_accuracy,0.3107194,0.8761280,232.0



Gains/Lift Table: Avg response rate: 24.27 %, avg score: 24.29 %


group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100528,0.9979695,4.1210008,4.1210008,1.0,0.9985509,1.0,0.9985509,0.0414274,0.0414274,312.1000820,312.1000820,0.0414274
2,0.0200060,0.9961053,4.1210008,4.1210008,1.0,0.9971042,1.0,0.9978312,0.0410172,0.0824446,312.1000820,312.1000820,0.0824446
3,0.0300587,0.9939651,4.1210008,4.1210008,1.0,0.9952166,1.0,0.9969568,0.0414274,0.1238720,312.1000820,312.1000820,0.1238720
4,0.0400119,0.9887968,4.1210008,4.1210008,1.0,0.9918443,1.0,0.9956850,0.0410172,0.1648893,312.1000820,312.1000820,0.1648893
5,0.0500647,0.9749837,4.1210008,4.1210008,1.0,0.9821951,1.0,0.9929763,0.0414274,0.2063167,312.1000820,312.1000820,0.2063167
6,0.1000299,0.7909157,3.9157717,4.0184884,0.9501992,0.8883450,0.9751244,0.9407127,0.1956522,0.4019688,291.5771696,301.8488362,0.3986832
7,0.1499950,0.6568419,3.1605285,3.7326915,0.7669323,0.7237687,0.9057731,0.8684461,0.1579163,0.5598852,216.0528518,273.2691519,0.5412230
8,0.2000597,0.5173999,2.7200244,3.4792728,0.6600398,0.5875094,0.8442786,0.7981420,0.1361772,0.6960623,172.0024398,247.9272832,0.6549268
9,0.2999900,0.3204674,1.7075063,2.8890759,0.4143426,0.4090260,0.7010617,0.6685227,0.1706317,0.8666940,70.7506316,188.9075890,0.7482816




ModelMetricsBinomialGLM: stackedensemble
** Reported on cross-validation data. **

MSE: 0.08890141184971788
RMSE: 0.2981633979040987
LogLoss: 0.2796763420174989
Null degrees of freedom: 19679
Residual degrees of freedom: 19672
Null deviance: 21801.127810231956
Residual deviance: 11008.060821808762
AIC: 11024.060821808762
AUC: 0.9278789682208023
AUCPR: 0.8283780381517716
Gini: 0.8557579364416046

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.39722559433554266: 


,<=50K,>50K,Error,Rate
<=50K,13486.0,1423.0,0.0954,(1423.0/14909.0)
>50K,1212.0,3559.0,0.254,(1212.0/4771.0)
Total,14698.0,4982.0,0.1339,(2635.0/19680.0)



Maximum Metrics: Maximum metrics at their respective thresholds


metric,threshold,value,idx
max f1,0.3972256,0.7298267,198.0
max f2,0.1442125,0.8118176,300.0
max f0point5,0.6025775,0.7680444,126.0
max accuracy,0.4987898,0.8720020,161.0
max precision,0.9985042,1.0,0.0
max recall,0.0013010,1.0,397.0
max specificity,0.9985042,1.0,0.0
max absolute_mcc,0.4063935,0.6415411,195.0
max min_per_class_accuracy,0.2804126,0.8428003,241.0



Gains/Lift Table: Avg response rate: 24.24 %, avg score: 24.25 %


group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100102,0.9972772,4.1249214,4.1249214,1.0,0.9980797,1.0,0.9980797,0.0412911,0.0412911,312.4921400,312.4921400,0.0412911
2,0.0200203,0.9951675,4.1249214,4.1249214,1.0,0.9963080,1.0,0.9971938,0.0412911,0.0825823,312.4921400,312.4921400,0.0825823
3,0.0300305,0.9920232,4.1039827,4.1179418,0.9949239,0.9937343,0.9983080,0.9960406,0.0410815,0.1236638,310.3982713,311.7941838,0.1235967
4,0.0400407,0.9849227,4.0830440,4.1092174,0.9898477,0.9886592,0.9961929,0.9941953,0.0408719,0.1645357,308.3044026,310.9217385,0.1643345
5,0.05,0.9710766,4.1038759,4.1081534,0.9948980,0.9790359,0.9959350,0.9911757,0.0408719,0.2054077,310.3875883,310.8153427,0.2051394
6,0.1,0.7765685,3.5548103,3.8314819,0.8617886,0.8689297,0.9288618,0.9300527,0.1777405,0.3831482,255.4810312,283.1481870,0.3737579
7,0.15,0.6446937,2.9637393,3.5422343,0.7184959,0.7105667,0.8587398,0.8568907,0.1481870,0.5313351,196.3739258,254.2234332,0.5033655
8,0.2,0.5081031,2.3265563,3.2383148,0.5640244,0.5761540,0.7850610,0.7867065,0.1163278,0.6476630,132.6556278,223.8314819,0.5909187
9,0.3,0.3188533,1.6223014,2.6996437,0.3932927,0.4064082,0.6544715,0.6599404,0.1622301,0.8098931,62.2301404,169.9643681,0.6730630
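
The reported `max f1` value can be reproduced from the confusion matrix. The sketch below takes the counts from the cross-validation confusion matrix above, treating `>50K` as the positive class; the helper name `precision_recall_f1` is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts from the cross-validation confusion matrix, positive class = ">50K"
tp = 3559  # actual >50K, predicted >50K
fp = 1423  # actual <=50K, predicted >50K
fn = 1212  # actual >50K, predicted <=50K

precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.7f}")
```

The resulting F1 agrees with the `max f1` row of the cross-validation Maximum Metrics table to the precision shown there.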







## Model performance on testing set
The best indicator of a model's likely performance in production is its performance on unseen data. It is therefore common to hold out a portion of the data as a testing set, and to measure the model's predictions on that set against the ground truth.

We can do this here by running the model's predictions on the testing data (giving class probabilities), and analyzing its performance via the `model_performance()` method. This shows similar information to the `aml.leader` output above. We see that the model generalizes well to the test data: the test AUC of 0.934 is close to the cross-validation AUC of 0.928.

In [None]:
model = aml.leader
predictions = model.predict(test)

stackedensemble prediction progress: |████████████████████████████████████| 100%


In [None]:
predictions

predict,<=50K,>50K
<=50K,0.901984,0.0980157
>50K,0.558392,0.441608
>50K,0.00799326,0.992007
<=50K,0.99852,0.00147984
<=50K,0.985341,0.0146592
<=50K,0.925742,0.0742579
<=50K,0.675973,0.324027
<=50K,0.998648,0.00135163
<=50K,0.713482,0.286518
<=50K,0.995738,0.00426244
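
Note that the second row above is labeled `>50K` even though its `>50K` probability is below 0.5. This is consistent with the label being assigned by comparing the probability to a max-F1 threshold rather than to 0.5. The sketch below assumes the threshold in use is the cross-validation max-F1 threshold reported earlier (about 0.397); that is an inference from the outputs, not something the notebook states, and the helper `label_from_probability` is illustrative.

```python
def label_from_probability(p_high, threshold):
    """Assign the positive label when the >50K probability reaches the threshold."""
    return ">50K" if p_high >= threshold else "<=50K"


# Assumed threshold: the cross-validation max-F1 threshold from the model report
threshold = 0.39722559433554266

# >50K probabilities taken from rows 1, 2, 3, and 7 of the predictions above
p_high_values = [0.0980157, 0.441608, 0.992007, 0.324027]

predicted = [label_from_probability(p, threshold) for p in p_high_values]
print(predicted)
```

Under this assumption the labels match the `predict` column for those rows.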




In [None]:
model.model_performance(test)


ModelMetricsBinomialGLM: stackedensemble
** Reported on test data. **

MSE: 0.08459831945145556
RMSE: 0.2908579025081759
LogLoss: 0.26651807392826693
Null degrees of freedom: 6384
Residual degrees of freedom: 6376
Null deviance: 6997.8448091152995
Residual deviance: 3403.435804063969
AIC: 3421.435804063969
AUC: 0.9342385860762666
AUCPR: 0.8380052497961226
Gini: 0.8684771721525333

Confusion Matrix (Act/Pred) for max f1 @ threshold = 0.39413376060002026: 


,<=50K,>50K,Error,Rate
<=50K,4411.0,459.0,0.0943,(459.0/4870.0)
>50K,356.0,1159.0,0.235,(356.0/1515.0)
Total,4767.0,1618.0,0.1276,(815.0/6385.0)



Maximum Metrics: Maximum metrics at their respective thresholds


metric,threshold,value,idx
max f1,0.3941338,0.7398659,196.0
max f2,0.1609037,0.8140209,291.0
max f0point5,0.6428023,0.7788191,112.0
max accuracy,0.4966910,0.8815975,159.0
max precision,0.9985658,1.0,0.0
max recall,0.0063333,1.0,389.0
max specificity,0.9985658,1.0,0.0
max absolute_mcc,0.4816404,0.6609098,164.0
max min_per_class_accuracy,0.2859887,0.8496920,238.0



Gains/Lift Table: Avg response rate: 23.73 %, avg score: 24.06 %


group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100235,0.9976195,4.2145215,4.2145215,1.0,0.9983546,1.0,0.9983546,0.0422442,0.0422442,321.4521452,321.4521452,0.0422442
2,0.0200470,0.9954806,4.2145215,4.2145215,1.0,0.9966830,1.0,0.9975188,0.0422442,0.0844884,321.4521452,321.4521452,0.0844884
3,0.0300705,0.9923587,4.2145215,4.2145215,1.0,0.9942306,1.0,0.9964227,0.0422442,0.1267327,321.4521452,321.4521452,0.1267327
4,0.0400940,0.9851811,4.2145215,4.2145215,1.0,0.9894556,1.0,0.9946810,0.0422442,0.1689769,321.4521452,321.4521452,0.1689769
5,0.0501175,0.9644899,4.1486696,4.2013511,0.984375,0.9755751,0.996875,0.9908598,0.0415842,0.2105611,314.8669554,320.1351073,0.2103557
6,0.1000783,0.7782860,3.6992665,3.9507016,0.8777429,0.8677746,0.9374022,0.9294135,0.1848185,0.3953795,269.9266478,295.0701643,0.3871660
7,0.1500392,0.6404153,3.0651065,3.6558114,0.7272727,0.7094499,0.8674322,0.8561689,0.1531353,0.5485149,206.5106511,265.5811406,0.5224368
8,0.2,0.5050958,2.4441582,3.3531353,0.5799373,0.5741072,0.7956147,0.7857087,0.1221122,0.6706271,144.4158209,235.3135314,0.6170336
9,0.3000783,0.3120960,1.6027053,2.7693541,0.3802817,0.4044781,0.6570981,0.6585655,0.1603960,0.8310231,60.2705341,176.9354127,0.6961155
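
The `lift` column in the Gains/Lift table can be read as the response rate within a group divided by the overall response rate. A minimal sketch using the test-set numbers above (1,515 of 6,385 test rows are `>50K`); the helper name `lift` is illustrative.

```python
def lift(group_response_rate, overall_response_rate):
    """How many times more often the positive class occurs in a group than overall."""
    return group_response_rate / overall_response_rate


# Overall >50K rate in the test set, from the confusion matrix row totals
overall = 1515 / 6385

# Group 1 has response_rate 1.0; group 6 has response_rate 0.8777429
top_group_lift = lift(1.0, overall)
group6_lift = lift(0.8777429, overall)
print(f"{top_group_lift:.7f} {group6_lift:.5f}")
```

Both values agree with the table to within the rounding of the printed response rates.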







## Save the model for deployment

Finally, for a model to be put into production, it needs to be saved in a form that can be loaded later. H2O has several model formats, but the one [preferred for production](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html) is MOJO (Model Object, Optimized). Of H2O's formats, this one supports the most general functionality and data types.

The model is output as a .zip file, and with `get_genmodel_jar=True` its single Java dependency, `h2o-genmodel.jar`, is downloaded alongside it. Java knowledge is therefore required to proceed to production deployment, but the format allows significant flexibility in where the model can be deployed.

The location that we save the model to is the Gradient-provided storage corresponding to this notebook, at `/storage`.

In the command line section of this project (refer back to https://github.com/gradient-ai/Classical-ML-Example), we will deploy this model on Gradient as a REST endpoint, and send inference data to it.

In [None]:
modelfile = model.download_mojo(path="/storage", get_genmodel_jar=True)
print("Model saved to " + modelfile)

Model saved to /storage/StackedEnsemble_AllModels_AutoML_20210713_220658.zip
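
Since the saved MOJO is an ordinary zip archive, a quick sanity check is to list its entries with Python's standard `zipfile` module. This is a minimal sketch: the helper `list_mojo_entries` is illustrative, and the exact entries depend on the model type and H2O version (a `model.ini` descriptor is typically present, but treat that as an assumption to verify).

```python
import os
import zipfile


def list_mojo_entries(zip_path):
    """Return the names of all entries inside a MOJO zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Path printed by the cell above
mojo_path = "/storage/StackedEnsemble_AllModels_AutoML_20210713_220658.zip"

if os.path.exists(mojo_path):
    # Show only the first few entries; an ensemble may contain many files
    for name in list_mojo_entries(mojo_path)[:10]:
        print(name)
else:
    print("MOJO not found at", mojo_path)
```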


## Conclusions

We have shown how to:

 - Set up Java and H2O on Gradient
 - Load and prepare a small dataset (UCI Census Income)
 - Train gradient-boosted decision trees and other models using H2O's AutoML
 - Evaluate model performance on unseen testing data
 - Save the model so that it can be deployed to production

## Next Steps
To see the Workflow portion of this project, or to deploy the model using the command line, refer back to the project GitHub repo at https://github.com/gradient-ai/Classical-ML-Example .