Copyright © 2021, SAS Institute Inc., Cary, NC, USA.  All Rights Reserved.
SPDX-License-Identifier: Apache-2.0

# HMEQ Dataset : Build and Import Trained H2O.ai Models into SAS Model Manager

This notebook provides an example of how to build and train an H2O.ai model in Python and then import the model into SAS Model Manager, using the HMEQ data set. Lines of code that the user must modify, such as directory paths, are marked with the comment "_Changes needed by user_".

_**Note:** If you download only this notebook and not the rest of the repository, you must also download the hmeq.csv file from the data folder in the examples directory. This file is used when executing this notebook example._

Here are the steps shown in this notebook:

1. Import and review data and preprocess for model training.
2. Build, train, and assess an H2O.ai generalized linear estimator model.
3. Serialize the model into pickle or MOJO files.
4. Write the metadata JSON files needed for importing into SAS Model Manager.
5. Write a score code Python file for model scoring.
6. Zip the model, JSON, and score code files into an archive file.
7. Import the ZIP archive file to SAS Model Manager via the Session object and relevant function call.
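
The packaging step in the list above (zipping the model, JSON, and score code files) can be sketched with the standard library alone. This is only an illustration of the idea, not what `sasctl` does internally; the helper name `zip_model_folder` and the placeholder file contents are assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_model_folder(folder: Path, archive_name: str) -> Path:
    """Zip every regular file directly inside `folder` into one archive."""
    archive_path = folder / archive_name
    # Collect the files first so the archive never tries to include itself.
    files = [p for p in sorted(folder.iterdir()) if p.is_file() and p != archive_path]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            archive.write(file, arcname=file.name)
    return archive_path


# Demonstrate on a throwaway folder holding placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("inputVar.json", "outputVar.json", "score_model.py"):
        (folder / name).write_text("{}")
    archive_path = zip_model_folder(folder, "model.zip")
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = sorted(archive.namelist())

print(archived_names)
```

In the notebook itself this step is handled by `pzmm.ImportModel.import_model`, so a helper like this would only be needed when packaging files by hand.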

### Python Package Imports

In [1]:
# Standard Library
from pathlib import Path
import pprint as pp
import warnings

# Third Party
import h2o
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

# Application Specific
import sasctl.pzmm as pzmm
from sasctl import Session, publish_model

In [2]:
# Global Package Options
pd.options.mode.chained_assignment = None  # default="warn"
plt.rc("font", size=14)
# Ignore warnings from pandas about SWAT using a feature that will be deprecated soon
warnings.simplefilter(action="ignore", category=FutureWarning)
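
To see what the `simplefilter` call in the cell above does, the following sketch records warnings in a controlled block and shows that only the `FutureWarning` is suppressed. The function `emit_warnings` and its messages are made up for the demonstration.

```python
import warnings


def emit_warnings():
    """Raise one warning of each kind so the filter's effect is visible."""
    warnings.warn("this API will change", FutureWarning)
    warnings.warn("check your input", UserWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # Same call as in the notebook: filters added later take precedence.
    warnings.simplefilter(action="ignore", category=FutureWarning)
    emit_warnings()

caught_categories = [w.category.__name__ for w in caught]
print(caught_categories)  # ['UserWarning']
```

Because the filter matches on category, it hides every `FutureWarning`, not just the one coming from pandas.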

In [3]:
h2o.__version__

'3.40.0.3'

On SAS Viya, models created with H2O version 3.24 or earlier are supported only in the binary model format. Models created with H2O version 3.26 or later can use either the MOJO or the binary model format. For a binary model, the H2O version on the SAS Viya server must exactly match the version of H2O used to create the model.
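
The compatibility rule above can be written down as a small check. This is a minimal sketch under one assumption that the note does not state: any version below 3.26 is treated as binary-only. The function names are hypothetical and are not part of `h2o` or `sasctl`.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '3.40.0.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def allowed_formats(model_version: str) -> list:
    """Model formats the note above allows for a given H2O version."""
    # Assumption: everything below 3.26 is binary-only.
    if parse_major_minor(model_version) < (3, 26):
        return ["binary"]
    return ["binary", "mojo"]


def binary_is_compatible(model_version: str, server_version: str) -> bool:
    """A binary model requires the exact same H2O version on the server."""
    return model_version == server_version


print(allowed_formats("3.40.0.3"))
print(allowed_formats("3.24.0.5"))
print(binary_is_compatible("3.40.0.3", "3.40.0.2"))
```

Comparing `(major, minor)` tuples rather than raw strings avoids the trap where `"3.4"` sorts after `"3.26"` as text.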

In [4]:
h2o.init()

Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.13" 2021-10-19; OpenJDK Runtime Environment JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21); OpenJDK 64-Bit Server VM JBR-11.0.13.7-1751.21-jcef (build 11.0.13+7-b1751.21, mixed mode)
  Starting server from /Users/dalmoo/opt/anaconda3/envs/yeehaw/lib/python3.8/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /var/folders/vs/np2dp7cs1y7ggk5pl92q_rb40000gn/T/tmp0_92875_
  JVM stdout: /var/folders/vs/np2dp7cs1y7ggk5pl92q_rb40000gn/T/tmp0_92875_/h2o_dalmoo_started_from_python.out
  JVM stderr: /var/folders/vs/np2dp7cs1y7ggk5pl92q_rb40000gn/T/tmp0_92875_/h2o_dalmoo_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html


0,1
H2O_cluster_uptime:,02 secs
H2O_cluster_timezone:,America/Chicago
H2O_data_parsing_timezone:,UTC
H2O_cluster_version:,3.40.0.3
H2O_cluster_version_age:,6 months and 1 day
H2O_cluster_name:,H2O_from_python_dalmoo_6awy1u
H2O_cluster_total_nodes:,1
H2O_cluster_free_memory:,4 Gb
H2O_cluster_total_cores:,10
H2O_cluster_allowed_cores:,10


### Import and Review Data Set

In [5]:
hmeq_data = h2o.import_file("data/hmeq.csv", sep=",")
hmeq_data.shape

Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%


(5960, 13)

### Preprocess Data

In [6]:
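# Treat the target as categorical so that H2O fits a classification model
hmeq_data["BAD"] = hmeq_data["BAD"].asfactor()

# Split into roughly 60% training, 20% validation, and 20% test data
train, validation, test = hmeq_data.split_frame(ratios=[0.6, 0.2], seed=42)

# Use every column except the target as a predictor
y = "BAD"
x = list(hmeq_data.columns)
x.remove(y)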
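
The split sizes are approximate rather than exact. The confusion matrices in the training output further down total 3581 training rows and 1196 validation rows, which leaves 1183 of the 5960 rows for testing. As far as the H2O documentation describes it, `split_frame` assigns rows probabilistically. The sketch below imitates that idea in plain Python; it does not reproduce H2O's random number generator, so the exact counts will differ, and `probabilistic_split` is a hypothetical helper.

```python
import random


def probabilistic_split(n_rows, ratios, seed):
    """Assign each row index to a split by drawing a uniform random number."""
    rng = random.Random(seed)
    # Cumulative cut points, e.g. about [0.6, 0.8] for ratios [0.6, 0.2].
    cuts = []
    total = 0.0
    for ratio in ratios:
        total += ratio
        cuts.append(total)
    splits = [[] for _ in range(len(ratios) + 1)]
    for row in range(n_rows):
        draw = rng.random()
        # The first cut point above the draw picks the split; draws past
        # the last cut point fall into the final (remainder) split.
        target = next((i for i, cut in enumerate(cuts) if draw < cut), len(ratios))
        splits[target].append(row)
    return splits


train_rows, validation_rows, test_rows = probabilistic_split(5960, [0.6, 0.2], seed=42)
print(len(train_rows), len(validation_rows), len(test_rows))
```

Each row lands in exactly one split, so the three counts always sum to the number of rows even though the individual proportions only approximate the requested ratios.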

### Create, Train, and Assess Model

In [7]:
glm = H2OGeneralizedLinearEstimator(family="binomial", model_id="glmfit", lambda_search=True)
glm.train(x=x, y=y, training_frame=train, validation_frame=validation)

glm Model Build progress: |██████████████████████████████████████████████████████| (done) 100%


Unnamed: 0,family,link,regularization,lambda_search,number_of_predictors_total,number_of_active_predictors,number_of_iterations,training_frame
,binomial,logit,"Elastic Net (alpha = 0.5, lambda = 9.244E-4 )","nlambda = 100, lambda.max = 0.2455, lambda.min = 9.244E-4, lambda.1se = -1.0",18,17,93,py_3_sid_a269

Unnamed: 0,0,1,Error,Rate
0,2446.0,414.0,0.1448,(414.0/2860.0)
1,314.0,407.0,0.4355,(314.0/721.0)
Total,2760.0,821.0,0.2033,(728.0/3581.0)

metric,threshold,value,idx
max f1,0.2574172,0.5278859,207.0
max f2,0.1522967,0.6261121,279.0
max f0point5,0.3886058,0.5571256,144.0
max accuracy,0.5672592,0.8352416,90.0
max precision,0.9988092,1.0,0.0
max recall,0.0012377,1.0,399.0
max specificity,0.9988092,1.0,0.0
max absolute_mcc,0.2661852,0.4020977,203.0
max min_per_class_accuracy,0.1782796,0.7066434,258.0
max mean_per_class_accuracy,0.2143151,0.7195026,234.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100531,0.9466824,4.9667129,4.9667129,1.0,0.9805069,1.0,0.9805069,0.0499307,0.0499307,396.6712899,396.6712899,0.0499307
2,0.0201061,0.8523402,4.8287487,4.8977308,0.9722222,0.8987941,0.9861111,0.9396505,0.0485437,0.0984743,382.8748652,389.7730775,0.0981247
3,0.0301592,0.7479558,3.8629989,4.5528202,0.7777778,0.7984072,0.9166667,0.8925694,0.038835,0.1373093,286.2998921,355.2820157,0.1341624
4,0.0402122,0.691234,3.1731777,4.2079095,0.6388889,0.7152449,0.8472222,0.8482383,0.0319001,0.1692094,217.3177685,320.7909539,0.1615171
5,0.0502653,0.6267302,3.5870704,4.0837417,0.7222222,0.6558936,0.8222222,0.8097693,0.036061,0.2052705,258.7070427,308.3741717,0.1940816
6,0.1002513,0.4302463,2.441736,3.2650257,0.4916201,0.5199749,0.6573816,0.6652757,0.1220527,0.3273232,144.173595,226.502575,0.2843162
7,0.1502374,0.3459178,2.053278,2.8618606,0.4134078,0.3838564,0.5762082,0.5716436,0.1026352,0.4299584,105.3277958,186.1860592,0.3502381
8,0.2002234,0.2844205,1.720314,2.576872,0.3463687,0.311903,0.5188285,0.506799,0.0859917,0.5159501,72.0313965,157.6871964,0.3953207
9,0.3001955,0.2098441,1.331856,2.1622527,0.2681564,0.2431147,0.4353488,0.418986,0.1331484,0.6490985,33.1855973,116.2252685,0.4368607
10,0.4001676,0.1695731,0.887904,1.8438878,0.1787709,0.188252,0.3712491,0.3613428,0.0887656,0.7378641,-11.2096018,84.3887831,0.4228291

Unnamed: 0,0,1,Error,Rate
0,863.0,95.0,0.0992,(95.0/958.0)
1,107.0,131.0,0.4496,(107.0/238.0)
Total,970.0,226.0,0.1689,(202.0/1196.0)

metric,threshold,value,idx
max f1,0.3129933,0.5646552,155.0
max f2,0.1877463,0.6551476,231.0
max f0point5,0.4296125,0.6079404,107.0
max accuracy,0.4438485,0.8461538,105.0
max precision,0.9939857,1.0,0.0
max recall,0.0096581,1.0,398.0
max specificity,0.9939857,1.0,0.0
max absolute_mcc,0.3129933,0.4602072,155.0
max min_per_class_accuracy,0.1954052,0.7478992,225.0
max mean_per_class_accuracy,0.191454,0.7508026,228.0

group,cumulative_data_fraction,lower_threshold,lift,cumulative_lift,response_rate,score,cumulative_response_rate,cumulative_score,capture_rate,cumulative_capture_rate,gain,cumulative_gain,kolmogorov_smirnov
1,0.0100334,0.9422827,5.0252101,5.0252101,1.0,0.9698159,1.0,0.9698159,0.0504202,0.0504202,402.5210084,402.5210084,0.0504202
2,0.0200669,0.8859422,4.6064426,4.8158263,0.9166667,0.9133265,0.9583333,0.9415712,0.0462185,0.0966387,360.6442577,381.5826331,0.0955948
3,0.0301003,0.833388,3.7689076,4.4668534,0.75,0.8662506,0.8888889,0.9164643,0.0378151,0.1344538,276.8907563,346.6853408,0.1302784
4,0.0401338,0.7848582,3.7689076,4.2923669,0.75,0.815402,0.8541667,0.8911987,0.0378151,0.1722689,276.8907563,329.2366947,0.164962
5,0.0501672,0.7211959,2.512605,3.9364146,0.5,0.7505935,0.7833333,0.8630777,0.0252101,0.197479,151.2605042,293.6414566,0.1839091
6,0.1003344,0.4947058,3.2663866,3.6014006,0.65,0.5984199,0.7166667,0.7307488,0.1638655,0.3613445,226.6386555,260.140056,0.3258539
7,0.1505017,0.3581829,2.010084,3.0709617,0.4,0.4130805,0.6111111,0.6248593,0.1008403,0.4621849,101.0084034,207.0961718,0.389116
8,0.2006689,0.3002017,2.010084,2.8057423,0.4,0.3274207,0.5583333,0.5504997,0.1008403,0.5630252,101.0084034,180.5742297,0.452378
9,0.3001672,0.2236892,1.1824024,2.2676435,0.2352941,0.2577997,0.4512535,0.4534766,0.1176471,0.6806723,18.2402373,126.7643548,0.4750355
10,0.4005017,0.1761387,1.0469188,1.9618252,0.2083333,0.1963835,0.3903967,0.3890691,0.105042,0.7857143,4.6918768,96.1825231,0.4809126

Unnamed: 0,timestamp,duration,iteration,lambda,predictors,deviance_train,deviance_test,alpha,iterations,training_rmse,training_logloss,training_r2,training_auc,training_pr_auc,training_lift,training_classification_error,validation_rmse,validation_logloss,validation_r2,validation_auc,validation_pr_auc,validation_lift,validation_classification_error
,2023-10-05 18:17:37,0.000 sec,1,.25E0,1,1.0045100,0.9980509,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.061 sec,3,.22E0,2,0.9946355,0.9864127,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.101 sec,5,.2E0,2,0.9859035,0.9760418,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.130 sec,7,.19E0,2,0.9781697,0.9667699,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.164 sec,9,.17E0,3,0.9686784,0.9544753,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.201 sec,11,.15E0,3,0.9587676,0.9411532,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.220 sec,13,.14E0,3,0.9500953,0.9293709,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.268 sec,15,.13E0,3,0.9424603,0.9188844,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.295 sec,17,.12E0,5,0.9341431,0.9080921,0.5,,,,,,,,,,,,,,,
,2023-10-05 18:17:37,0.351 sec,19,.11E0,5,0.9236449,0.8955093,0.5,,,,,,,,,,,,,,,

variable,relative_importance,scaled_importance,percentage
DELINQ,0.7970053,1.0,0.1473004
JOB.Sales,0.7478484,0.938323,0.1382154
JOB.Office,0.5109144,0.6410426,0.0944259
JOB.Self,0.4571329,0.5735632,0.0844861
CLAGE,0.4484834,0.5627107,0.0828875
DEBTINC,0.4467476,0.5605328,0.0825667
DEROG,0.4263194,0.5349016,0.0787912
NINQ,0.3037309,0.3810902,0.0561347
VALUE,0.24126,0.3027081,0.044589
MORTDUE,0.2384487,0.2991808,0.0440695


In [8]:
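# Assess the model on the held-out test set. With no threshold given,
# accuracy() reports [[threshold, accuracy]] at the threshold that maximizes accuracy.
glm_performance = glm.model_performance(test)
print(glm_performance.accuracy())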

[[0.5513512979207219, 0.8486897717666948]]
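
The printed value is a nested list. Reading it as one `[threshold, accuracy]` pair matches the "max accuracy" rows in the training output, although that interpretation is an assumption about H2O's return format rather than something this notebook states. A minimal sketch of unpacking it, using the numbers printed above:

```python
# Output copied from the cell above.
accuracy_result = [[0.5513512979207219, 0.8486897717666948]]

# Unpack the single [threshold, accuracy] pair.
threshold, accuracy = accuracy_result[0]
classification_error = 1 - accuracy

print(f"threshold={threshold:.4f}")
print(f"accuracy={accuracy:.4f}")
print(f"classification error={classification_error:.4f}")
```

Under that reading, the model classifies about 85% of the test rows correctly when a probability of roughly 0.55 is used as the cutoff.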


### Register Models in SAS Model Manager

In [9]:
model_prefix = "glmfit"
binary_folder = Path.cwd() / "data/hmeqModels/H2OBinaryGLM/" # Changes needed by user
mojo_folder = Path.cwd() / "data/hmeqModels/H2OMOJOGLM/" # Changes needed by user
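
This notebook does not say whether `pickle_trained_model` creates missing output folders, so creating them up front is a cautious assumption. The sketch below uses a temporary directory in place of `Path.cwd()` so that it can run anywhere; `ensure_model_folders` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def ensure_model_folders(base: Path) -> dict:
    """Create the binary and MOJO output folders if they do not exist yet."""
    folders = {
        "binary": base / "data/hmeqModels/H2OBinaryGLM",
        "mojo": base / "data/hmeqModels/H2OMOJOGLM",
    }
    for folder in folders.values():
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
    return folders


with tempfile.TemporaryDirectory() as tmp:
    folders = ensure_model_folders(Path(tmp))
    created = {name: folder.is_dir() for name, folder in folders.items()}

print(created)
```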

In [10]:
# Save the model as an H2O binary model file
pzmm.PickleModel.pickle_trained_model(
    model_prefix=model_prefix + "_binary",
    trained_model=glm, 
    pickle_path=binary_folder, 
    is_h2o_model=True, 
    is_binary_model=True
)

In [11]:
# Save the model as an H2O MOJO model file
pzmm.PickleModel.pickle_trained_model(
    model_prefix=model_prefix + "_mojo",
    trained_model=glm,
    pickle_path=mojo_folder, 
    is_h2o_model=True
)

In [12]:
train_df = train.as_data_frame()
# Write input variable mapping to a json file
pzmm.JSONFiles.write_var_json(train_df[x], is_input=True, json_path=binary_folder)
pzmm.JSONFiles.write_var_json(train_df[x], is_input=True, json_path=mojo_folder)

# Set output variables and assign an event threshold, then write output variable mapping
output_var = pd.DataFrame(
    columns=["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"],
    data=[["A", 0.5]]
)
pzmm.JSONFiles.write_var_json(output_var, is_input=False, json_path=binary_folder)
pzmm.JSONFiles.write_var_json(output_var, is_input=False, json_path=mojo_folder)

# Write model properties to a json file
pzmm.JSONFiles.write_model_properties_json(
    model_name=model_prefix + "_binary",
    model_desc="Binary H2O model.",
    target_variable=y,
    target_values=["0", "1"],
    json_path=binary_folder,
    modeler="sasdemo"
)
pzmm.JSONFiles.write_model_properties_json(
    model_name=model_prefix + "_mojo",
    model_desc="MOJO H2O model.",
    target_variable=y,
    target_values=["0", "1"],
    json_path=mojo_folder,
    modeler="sasdemo"
)

# Write model metadata to a json file
pzmm.JSONFiles.write_file_metadata_json(model_prefix=model_prefix + "_binary", json_path=binary_folder)
pzmm.JSONFiles.write_file_metadata_json(model_prefix=model_prefix + "_mojo", json_path=mojo_folder, is_h2o_model=True)

inputVar.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OBinaryGLM/inputVar.json
inputVar.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OMOJOGLM/inputVar.json
outputVar.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OBinaryGLM/outputVar.json
outputVar.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OMOJOGLM/outputVar.json
ModelProperties.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OBinaryGLM/ModelProperties.json
ModelProperties.json was successfully written and saved to /Users/dalmoo/Documents/GitHub/python-sasctl/examples/data/hmeqModels/H2OMOJOGLM/ModelProperties.json
fileMetadata.json was successfully written and saved to /Users/dalmoo/Documents/
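
Before importing, it can be useful to confirm that each model folder contains the four JSON files named in the output above. The following is a minimal sketch; `missing_metadata_files` is a hypothetical helper, and the temporary folder with two placeholder files stands in for a real model folder.

```python
import tempfile
from pathlib import Path

# File names taken from the output messages above.
EXPECTED_FILES = (
    "inputVar.json",
    "outputVar.json",
    "ModelProperties.json",
    "fileMetadata.json",
)


def missing_metadata_files(folder: Path) -> list:
    """Return the expected JSON files that are not present in `folder`."""
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "inputVar.json").write_text("[]")
    (folder / "outputVar.json").write_text("[]")
    missing = missing_metadata_files(folder)

print(missing)  # ['ModelProperties.json', 'fileMetadata.json']
```

With the notebook's variables, the same check would be `missing_metadata_files(binary_folder)` and `missing_metadata_files(mojo_folder)`.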

In [13]:
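import getpass

# getpass hides typed input, so label each prompt to tell the two apart
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
host = "sas.demo.com"  # Changes needed by user
sess = Session(host, username, password, protocol="http")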

In [14]:
binary_model = pzmm.ImportModel.import_model(
    model_files=binary_folder, 
    model_prefix=model_prefix + "_binary",
    project="H2OModels", 
    input_data=train_df[x], 
    predict_method=[glm.predict, [list]], 
    binary_h2o_model=True, 
    score_metrics=["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"],
    missing_values=True,
    overwrite_model=True,
    model_file_name="glmfit_binary.pickle"
)
# Clear the score code generated for the binary model before importing the MOJO model
pzmm.ScoreCode.score_code = ""

mojo_model = pzmm.ImportModel.import_model(
    model_files=mojo_folder, 
    model_prefix=model_prefix + "_mojo",
    project="H2OModels", 
    input_data=train_df[x], 
    predict_method=[glm.predict, [list]], 
    mojo_model=True, 
    score_metrics=["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"],
    missing_values=True,
    overwrite_model=True,
    model_file_name="glmfit_mojo.mojo"
)

All model files were zipped to ~\examples\data\hmeqModels\H2OBinaryGLM.
Model was successfully imported into SAS Model Manager as glmfit_binary with the following UUID: 5929a748-f9b2-4285-b73a-45c40659b4b0.
Model score code was written successfully to ~\examples\data\hmeqModels\H2OBinaryGLM\score_glmfit_binary.py and uploaded to SAS Model Manager.
All model files were zipped to ~\examples\data\hmeqModels\H2OMOJOGLM.
Model was successfully imported into SAS Model Manager as glmfit_mojo with the following UUID: 4c5dd027-d442-4860-9e96-9c26060dc727.
Model score code was written successfully to ~\examples\data\hmeqModels\H2OMOJOGLM\score_glmfit_mojo.py and uploaded to SAS Model Manager.


### Run a Score Test in SAS Model Manager

In [15]:
# Publish the model to the SAS Microanalytic Score destination in SAS Model Manager
module = publish_model(mojo_model[0], "maslocal", name="HMEQMOJO_publish", replace=True)

In [16]:
# Instantiate an API call logger to visualize score calls in real time
sess.add_stderr_logger(level=20)

<StreamHandler stderr (INFO)>
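
The `level=20` argument corresponds to `logging.INFO` in Python's standard `logging` module, which is why the handler above reports `(INFO)`. The sketch below shows the effect of that level on a stand-alone handler; the logger name `level_demo` is made up for the example and is unrelated to `sasctl`.

```python
import io
import logging

# Send log output to an in-memory buffer instead of stderr.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(20)  # same numeric level as in the notebook

logger = logging.getLogger("level_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.debug("hidden: DEBUG is level 10, below the handler's threshold")
logger.info("shown")

level_name = logging.getLevelName(20)
captured = stream.getvalue().splitlines()
print(level_name)  # INFO
print(captured)    # ['shown']
```

Writing `level=logging.INFO` instead of `level=20` would say the same thing more readably.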

In [17]:
# Convert the first 10 rows of the H2O frame to a pandas DataFrame
X = train[0:10, :].as_data_frame()
result = []
# Step through the rows and collect each score from the SAS Microanalytic Score publish destination
for index, row in X.iterrows():
    result.append(module.score(row))

HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 POST http://demo.sas.com/microanalyticScore/modules/hmeqmojo_publish/steps/score
HTTP/1.1 201 ht
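
The loop above stops at the first row that fails to score. A more forgiving variant is sketched below under loud assumptions: `fake_score` is a hypothetical stand-in for `module.score`, and its return value is invented purely so that the example runs without a SAS Viya connection.

```python
def score_rows(rows, score):
    """Score each row, keeping successful results and failures separate."""
    results = []
    failures = []
    for index, row in enumerate(rows):
        try:
            results.append(score(row))
        except Exception as error:  # record the failure and keep going
            failures.append((index, str(error)))
    return results, failures


def fake_score(row):
    """Hypothetical stand-in for module.score; not the real service."""
    if row["DELINQ"] is None:
        raise ValueError("DELINQ is missing")
    return "scored"


rows = [{"DELINQ": 0.0}, {"DELINQ": None}, {"DELINQ": 2.0}]
results, failures = score_rows(rows, fake_score)
print(results)
print(failures)
```

Keeping the failing row indexes makes it possible to inspect or retry only those rows afterwards.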

In [18]:
# Scoring results
pp.pprint(result)

[(0.0, '0', 0.9470777188580571),
 (0.0, '0', 0.8697221847625589),
 (0.0, '0', 0.932352798571864),
 (0.0, '0', 0.9579361586939033),
 (0.0, '0', 0.8422587334478235),
 (0.0, '0', 0.9112367410859918),
 (0.0, '0', 0.7222737872462567),
 (0.0, '0', 0.8016194612045261),
 (0.0, '0', 0.9285316160726795),
 (0.0, '0', 0.9619797616826327)]
