# Viya 2020 Modeling in Python and SAS AutoML
**Notebook authored by Sophia Rowland (sophia.rowland@sas.com)**

In this simple example, we will build an automated machine learning model with SAS Data Science Pilot through the SWAT package, build a Python model pipeline with XGBoost, and push both models into SAS Model Manager using the sasctl and pzmm packages.
***
## Table of Contents
1. [Introduction](#Introduction)
1. [Gather Resources](#Gather-Resources)
1. [Data Science Pilot in Python](#Data-Science-Pilot-in-Python)
1. [Python XGBOOST Machine Learning Pipeline](#Python-XGBOOST-Machine-Learning-Pipeline)
1. [Register Models to SAS Model Manager](#Register-Models-to-SAS-Model-Manager)
1. [Accessing Models via API](#Accessing-Models-via-API)
1. [Conclusion](#Conclusion)
***
## Introduction 
In this notebook, we will connect to SAS Viya from Python using the SWAT package. Through SWAT, we will use the Data Science Pilot action set to build automated machine learning models with SAS analytics. Next, we will create a simple XGBoost model pipeline, which we will push into the SAS model repository service to demonstrate Viya's ability to manage different types of models! Finally, we will use the SASCTL and PZMM packages to generate model metadata and push both our SAS models and our native Python model into the model repository service. More information about SWAT, Data Science Pilot, SASCTL, and PZMM is available below.

### The Scripting Wrapper for Analytics Transfer Package
The [SAS SWAT (Scripting Wrapper for Analytics Transfer) package for Python](https://sassoftware.github.io/python-swat/index.html) is a Python interface to the SAS Cloud Analytic Services (CAS) engine (the centerpiece of the SAS Viya framework). SWAT lets Python programmers load and access data stored in Viya as CASTables and run CAS actions on that data. CAS action sets are akin to libraries, and CAS actions can be thought of as functions. Examples and syntax for using CAS actions in Python are readily available in the [documentation](https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=v_006&docsetId=casactml&docsetTarget=titlepage.htm&locale=en). SWAT accesses the CAS engine through our Open API framework. Notice in the image below that programmers can interface with CASTables using familiar pandas-like syntax. By combining best-of-breed SAS analytics in the cloud with Python and its large collection of open source packages, the SWAT package gives you the best of both worlds.

### The Data Science Pilot Action Set
The Data Science Pilot Action Set is included with SAS Visual Data Mining and Machine Learning (VDMML) and consists of actions that implement a policy-based, configurable, and scalable approach to automating data science workflows. These actions let you move from a data set to a deployable model in a flash! A deeper dive into Data Science Pilot can be found in [my blog series](https://blogs.sas.com/content/tag/data-science-pilot-explained/).
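To make "policy-based" a little more concrete, consider one policy used later in this notebook: `screenPolicy = {'missingPercentThreshold': 35}`. Assuming it means that variables with more than 35% missing values are screened out, a rough pandas analog looks like the sketch below. The function name and toy data are hypothetical, and Data Science Pilot's actual screening logic may consider more than the missing percentage.

```python
import numpy as np
import pandas as pd


def screen_missing(df: pd.DataFrame, threshold_pct: float = 35.0) -> list:
    """Return the columns whose percentage of missing values exceeds threshold_pct.

    A rough local analog of a missing-value screening rule, not the dsAutoMl implementation.
    """
    missing_pct = df.isna().mean() * 100  # percent of missing values per column
    return missing_pct[missing_pct > threshold_pct].index.tolist()


# Toy data (made up, not HMEQ): DEBTINC is 50% missing, YOJ is 25% missing.
toy = pd.DataFrame({
    "LOAN": [1100, 1300, 1500, 1700],
    "DEBTINC": [np.nan, 35.2, np.nan, 40.1],
    "YOJ": [10.5, np.nan, 3.0, 7.0],
})
print(screen_missing(toy, threshold_pct=35))  # ['DEBTINC']
```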

### The SASCTL and PZMM Packages
The SASCTL and PZMM packages are incredibly helpful for model metadata, management, and governance. SASCTL and PZMM let you easily put models built with CAS actions or in pure Python into our model repository and save important metadata with those models. You can also publish the models for easy deployment, monitor model performance, or access our workflow service. SASCTL and PZMM access the model management and model deployment microservices using our Open API framework.

Now that we have a better understanding of SAS Open Source packages, let's get started!
***
## Gather Resources 
First, we will import necessary packages.

In [1]:
# Packages for Python Basics
import sys
import numpy as np
import pandas as pd

# Packages for Building Model Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import xgboost as xgb

# Packages for the Environment
from pathlib import Path
import os
import urllib3
import getpass

# Functions for working with SAS and Python
from sasctl.services import model_repository as modelRepo 
from sasctl.tasks import register_model, publish_model, update_model_performance
from sasctl import pzmm as pzmm
from sasctl import Session
import swat

Now I will use the Scripting Wrapper for Analytics Transfer (SWAT) package to connect to our Cloud Analytic Services (CAS) server. To use this notebook yourself, make sure you can connect to a SAS Viya 2020 environment, then enter your credentials below! 

In [2]:
# For authentication
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
hostname = getpass.getpass("Hostname: ")
portnum = int(getpass.getpass("Portnumber: "))
# Path to the CA certificate file SWAT uses to verify the CAS server's SSL certificate
os.environ['CAS_CLIENT_SSL_CA_LIST'] = getpass.getpass("SSL Certificate: ")

# Build connection
conn = swat.CAS(hostname, portnum, username, password)

Username:  ······
Password:  ··············
Hostname:  ········································
Portnumber:  ·····
SSL Certificate:  ······················································································
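If you would rather not type your credentials on every run, one option is to read them from environment variables. The helper below is a sketch: the `VIYA_*` variable names are hypothetical, and it assumes `swat.CAS` accepts `hostname`, `port`, `username`, and `password` as keyword arguments, mirroring the positional call above.

```python
import os


def cas_connection_args(env=None, prefix="VIYA_"):
    """Build keyword arguments for swat.CAS from environment variables.

    The variable names (VIYA_HOST, VIYA_PORT, VIYA_USER, VIYA_PASSWORD) are
    hypothetical; rename them to suit your setup.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env[prefix + "HOST"],
        "port": int(env[prefix + "PORT"]),  # environment values are always strings
        "username": env[prefix + "USER"],
        "password": env[prefix + "PASSWORD"],
    }


# Usage (requires the swat package and a reachable CAS server):
# conn = swat.CAS(**cas_connection_args())
```

`CAS_CLIENT_SSL_CA_LIST` still needs to be set, as in the cell above.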


Next, we will load the CSV file holding our data into a pandas DataFrame and into a CAS table. To use this notebook yourself, edit the code to point to a copy of the HMEQ data set. You can also use a copy residing on your SAS Viya 2020 environment! 

In [3]:
hmeqdf = pd.read_csv("./data/hmeq.csv")
hmeqct = conn.read_csv("./data/hmeq.csv", casout=dict(name='hmeqct', replace=True))

NOTE: Cloud Analytic Services made the uploaded file available as table HMEQCT in caslib CASUSER(sorowl).
NOTE: The table HMEQCT has been created in caslib CASUSER(sorowl) from binary data uploaded to Cloud Analytic Services.


In [4]:
hmeqdf.head()

Unnamed: 0,BAD,LOAN,MORTDUE,VALUE,REASON,JOB,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC
0,0,26800,46236.0,62711.0,DebtCon,Office,17.0,0.0,0.0,175.075058,1.0,22.0,33.059934
1,0,26900,74982.0,126972.0,DebtCon,Office,0.0,0.0,0.0,315.818911,0.0,23.0,38.32599
2,0,26900,67144.0,92923.0,DebtCon,Other,16.0,0.0,0.0,89.112173,1.0,17.0,32.791478
3,0,26900,45763.0,73797.0,DebtCon,Other,23.0,,0.0,291.591682,1.0,29.0,39.370858
4,0,27000,144901.0,178093.0,DebtCon,ProfExe,7.0,0.0,0.0,331.113972,0.0,34.0,40.566552


In [5]:
hmeqct.head()

Unnamed: 0,BAD,LOAN,MORTDUE,VALUE,REASON,JOB,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC
0,0.0,26800.0,46236.0,62711.0,DebtCon,Office,17.0,0.0,0.0,175.075058,1.0,22.0,33.059934
1,0.0,26900.0,74982.0,126972.0,DebtCon,Office,0.0,0.0,0.0,315.818911,0.0,23.0,38.32599
2,0.0,26900.0,67144.0,92923.0,DebtCon,Other,16.0,0.0,0.0,89.112173,1.0,17.0,32.791478
3,0.0,26900.0,45763.0,73797.0,DebtCon,Other,23.0,,0.0,291.591682,1.0,29.0,39.370858
4,0.0,27000.0,144901.0,178093.0,DebtCon,ProfExe,7.0,0.0,0.0,331.113972,0.0,34.0,40.566552
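In the two `head()` outputs above, integer-valued columns such as `BAD` and `LOAN` come back from CAS as floats (`0.0`, `26800.0`), while pandas read them as integers. When you pull CAS results back into pandas, as the scoring cell later does for `BAD`, you may want to restore integer types. A small sketch with a hypothetical helper that refuses to cast when the cast would lose information:

```python
import pandas as pd


def restore_int_columns(df, columns):
    """Return a copy of df with the given float columns cast to int64.

    Raises ValueError if a column has missing or fractional values,
    since casting those to int would silently change the data.
    """
    out = df.copy()
    for col in columns:
        values = out[col]
        if values.isna().any() or not (values % 1 == 0).all():
            raise ValueError(f"{col} has missing or non-integer values")
        out[col] = values.astype("int64")
    return out


# Made-up values shaped like the CAS output above: integers arrive as floats.
fetched = pd.DataFrame({"BAD": [0.0, 0.0, 1.0], "LOAN": [26800.0, 26900.0, 27000.0]})
fixed = restore_int_columns(fetched, ["BAD", "LOAN"])
print(fixed["BAD"].tolist())  # [0, 0, 1]
```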


***
## Data Science Pilot in Python 

Now we will call the Data Science Pilot action set from Python to build the same model we could build by calling it from SAS code. This demonstrates that you can code in either Python or SAS and get the same results!

In [6]:
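# Load the Data Science Pilot action set and run automated machine learning on the HMEQ CAS table
conn.builtins.loadactionset('dataSciencePilot')
conn.dataSciencePilot.dsAutoMl(
    table                   = 'hmeqct',
    target                  = 'BAD',
    # Policies for data exploration, variable screening, feature selection, and feature transformation
    explorationPolicy       = {'cardinality': {'lowMediumCutoff': 40}},
    screenPolicy            = {'missingPercentThreshold': 35},
    selectionPolicy         = {'topk': 10},
    transformationPolicy    = {'entropy': True, 'iqv': True, 'kurtosis': True, 'outlier': True},
    # Candidate model types and the objective metric
    modelTypes              = ["decisionTree", "gradboost"],
    objective               = "AUC",
    sampleSize              = 10,
    topKPipelines           = 5,
    kFolds                  = 2,
    # Output tables for the transformations, features, and pipelines
    transformationOut       = {"name" : "TRANSFORMATION_OUT_PY", "replace" : True},
    featureOut              = {"name" : "FEATURE_OUT_PY", "replace" : True},
    pipelineOut             = {"name" : "PIPELINE_OUT_PY", "replace" : True},
    # Save analytic stores (astores) for the top pipeline, named with this prefix
    saveState               = {"modelNamePrefix" : "ASTORE_OUT_PY", "replace" : True, "topK" : 1}
)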

NOTE: Added action set 'dataSciencePilot'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Early stopping is activated; 'NTREE' will not be tuned.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Early stopping is activated; 'NTREE' will not be tuned.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Early stopping is activated; 'NTREE' will not be tuned.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Early stopping is activated; 'NTREE' will not be tuned.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Added action set 'decisionTree'.
NOTE: Added action set 'autotune'.
NOTE: Ea

Unnamed: 0,Descr,Value
0,Number of Tree Nodes,31.0
1,Max Number of Branches,2.0
2,Number of Levels,5.0
3,Number of Leaves,16.0
4,Number of Bins,100.0
5,Minimum Size of Leaves,6.0
6,Maximum Size of Leaves,2045.0
7,Number of Variables,10.0
8,Confidence Level for Pruning,0.25
9,Number of Observations Used,5960.0

Unnamed: 0,Descr,Value
0,Number of Observations Read,5960.0
1,Number of Observations Used,5960.0
2,Misclassification Error (%),11.845637584

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,1,0,P_BAD1
1,0,1,P_BAD0

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,,0,I_BAD

Unnamed: 0,Variable,Event,CutOff,TP,FP,FN,TN,Sensitivity,Specificity,KS,...,F_HALF,FPR,ACC,FDR,F1,C,Gini,Gamma,Tau,MISCEVENT
0,P_BAD0,0,0.00,4771.0,1189.0,0.0,0.0,1.000000,0.000000,0.0,...,0.833770,1.000000,0.800503,0.199497,0.889200,0.880554,0.761107,0.829785,0.243135,0.199497
1,P_BAD0,0,0.01,4771.0,1124.0,0.0,65.0,1.000000,0.054668,0.0,...,0.841417,0.945332,0.811409,0.190670,0.894618,0.880554,0.761107,0.829785,0.243135,0.188591
2,P_BAD0,0,0.02,4771.0,1124.0,0.0,65.0,1.000000,0.054668,0.0,...,0.841417,0.945332,0.811409,0.190670,0.894618,0.880554,0.761107,0.829785,0.243135,0.188591
3,P_BAD0,0,0.03,4771.0,1124.0,0.0,65.0,1.000000,0.054668,0.0,...,0.841417,0.945332,0.811409,0.190670,0.894618,0.880554,0.761107,0.829785,0.243135,0.188591
4,P_BAD0,0,0.04,4771.0,1124.0,0.0,65.0,1.000000,0.054668,0.0,...,0.841417,0.945332,0.811409,0.190670,0.894618,0.880554,0.761107,0.829785,0.243135,0.188591
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,P_BAD0,0,0.95,1806.0,44.0,2965.0,1145.0,0.378537,0.962994,0.0,...,0.741928,0.037006,0.495134,0.023784,0.545537,0.880554,0.761107,0.829785,0.243135,0.504866
96,P_BAD0,0,0.96,1806.0,44.0,2965.0,1145.0,0.378537,0.962994,0.0,...,0.741928,0.037006,0.495134,0.023784,0.545537,0.880554,0.761107,0.829785,0.243135,0.504866
97,P_BAD0,0,0.97,1806.0,44.0,2965.0,1145.0,0.378537,0.962994,0.0,...,0.741928,0.037006,0.495134,0.023784,0.545537,0.880554,0.761107,0.829785,0.243135,0.504866
98,P_BAD0,0,0.98,7.0,0.0,4764.0,1189.0,0.001467,1.000000,0.0,...,0.007293,0.000000,0.200671,0.000000,0.002930,0.880554,0.761107,0.829785,0.243135,0.799329

Unnamed: 0,NOBS,ASE,DIV,RASE,MCE,MCLL
0,5960.0,0.091043,5960.0,0.301733,0.118456,0.306519

Unnamed: 0,Parameter,Value
0,Model Type,Decision Tree
1,Tuner Objective Function,Area Under Curve
2,Search Method,GRID
3,Number of Grid Points,6
4,Maximum Tuning Time in Seconds,36000
5,Validation Type,Single Partition
6,Validation Partition Fraction,0.30
7,Log Level,0
8,Seed,70555066
9,Number of Parallel Evaluations,2

Unnamed: 0,Evaluation,MAXLEVEL,NBINS,CRIT,AreaUnderCurve,EvaluationTime
0,0,11,20,gainRatio,0.857149,1.530925
1,4,5,100,gain,0.86587,0.61755
2,2,10,100,gainRatio,0.854104,1.511851
3,5,10,100,gain,0.84696,0.735416
4,3,15,100,gainRatio,0.824081,1.005928
5,6,15,100,gain,0.806591,0.821093
6,1,5,100,gainRatio,0.794999,0.624389

Unnamed: 0,Iteration,Evaluations,Best_obj,Time_sec
0,0,1,0.857149,1.530925
1,1,7,0.86587,4.48161

Unnamed: 0,Evaluation,Iteration,MAXLEVEL,NBINS,CRIT,AreaUnderCurve,EvaluationTime
0,0,0,11,20,gainRatio,0.857149,1.530925
1,1,1,5,100,gainRatio,0.794999,0.624389
2,2,1,10,100,gainRatio,0.854104,1.511851
3,3,1,15,100,gainRatio,0.824081,1.005928
4,4,1,5,100,gain,0.86587,0.61755
5,5,1,10,100,gain,0.84696,0.735416
6,6,1,15,100,gain,0.806591,0.821093

Unnamed: 0,Parameter,Name,Value
0,Evaluation,Evaluation,4
1,Maximum Tree Levels,MAXLEVEL,5
2,Maximum Bins,NBINS,100
3,Criterion,CRIT,gain
4,Area Under Curve,Objective,0.8658701776

Unnamed: 0,Parameter,Value
0,Initial Configuration Objective Value,0.857149
1,Best Configuration Objective Value,0.86587
2,Worst Configuration Objective Value,0.794999
3,Initial Configuration Evaluation Time in Seconds,1.530925
4,Best Configuration Evaluation Time in Seconds,0.617336
5,Number of Improved Configurations,1.0
6,Number of Evaluated Configurations,7.0
7,Total Tuning Time in Seconds,5.401655
8,Parallel Tuning Speedup,1.352141

Unnamed: 0,Task,Time_sec,Time_percent
0,Model Training,4.207316,57.604467
1,Model Scoring,2.170447,29.71668
2,Total Objective Evaluations,6.380965,87.364983
3,Tuner,0.922837,12.635017
4,Total CPU Time,7.303802,100.0

Unnamed: 0,Hyperparameter,RelImportance
0,MAXLEVEL,1.0
1,CRIT,0.050953
2,NBINS,0.0

Unnamed: 0,Descr,Value
0,Number of Trees,150.0
1,Distribution,2.0
2,Learning Rate,0.1
3,Subsampling Rate,0.6
4,Number of Selected Variables (M),10.0
5,Number of Bins,77.0
6,Number of Variables,10.0
7,Max Number of Tree Nodes,123.0
8,Min Number of Tree Nodes,63.0
9,Max Number of Branches,2.0

Unnamed: 0,Progress,Metric
0,1.0,0.199497
1,2.0,0.199497
2,3.0,0.199497
3,4.0,0.196812
4,5.0,0.167450
...,...,...
145,146.0,0.001846
146,147.0,0.001846
147,148.0,0.001678
148,149.0,0.001846

Unnamed: 0,Descr,Value
0,Number of Observations Read,5960.0
1,Number of Observations Used,5960.0
2,Misclassification Error (%),0.1510067114

Unnamed: 0,TreeID,Trees,NLeaves,MCR,LogLoss,ASE,RASE,MAXAE
0,0.0,1.0,48.0,0.199497,0.455008,0.144200,0.379737,0.819707
1,1.0,2.0,101.0,0.199497,0.420078,0.130961,0.361886,0.833214
2,2.0,3.0,152.0,0.199497,0.394628,0.121028,0.347891,0.849363
3,3.0,4.0,202.0,0.196812,0.372234,0.112158,0.334900,0.860647
4,4.0,5.0,256.0,0.165604,0.352168,0.104402,0.323114,0.872457
...,...,...,...,...,...,...,...,...
145,145.0,146.0,7373.0,0.001846,0.035530,0.005181,0.071979,0.618662
146,146.0,147.0,7426.0,0.001846,0.034929,0.005028,0.070911,0.584889
147,147.0,148.0,7485.0,0.001678,0.034398,0.004909,0.070066,0.601992
148,148.0,149.0,7542.0,0.001846,0.033824,0.004784,0.069168,0.593865

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,1,0,P_BAD1
1,0,1,P_BAD0

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,,0,I_BAD

Unnamed: 0,Variable,Event,CutOff,TP,FP,FN,TN,Sensitivity,Specificity,KS,...,F_HALF,FPR,ACC,FDR,F1,C,Gini,Gamma,Tau,MISCEVENT
0,P_BAD0,0,0.00,4771.0,1189.0,0.0,0.0,1.000000,0.000000,0.0,...,0.833770,1.000000,0.800503,0.199497,0.889200,0.999994,0.999987,0.999988,0.319445,0.199497
1,P_BAD0,0,0.01,4771.0,934.0,0.0,255.0,1.000000,0.214466,0.0,...,0.864594,0.785534,0.843289,0.163716,0.910844,0.999994,0.999987,0.999988,0.319445,0.156711
2,P_BAD0,0,0.02,4771.0,797.0,0.0,392.0,1.000000,0.329689,0.0,...,0.882114,0.670311,0.866275,0.143139,0.922913,0.999994,0.999987,0.999988,0.319445,0.133725
3,P_BAD0,0,0.03,4771.0,684.0,0.0,505.0,1.000000,0.424727,0.0,...,0.897108,0.575273,0.885235,0.125390,0.933112,0.999994,0.999987,0.999988,0.319445,0.114765
4,P_BAD0,0,0.04,4771.0,604.0,0.0,585.0,1.000000,0.492010,0.0,...,0.908035,0.507990,0.898658,0.112372,0.940469,0.999994,0.999987,0.999988,0.319445,0.101342
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,P_BAD0,0,0.95,4329.0,0.0,442.0,1189.0,0.907357,1.000000,0.0,...,0.979988,0.000000,0.925839,0.000000,0.951429,0.999994,0.999987,0.999988,0.319445,0.074161
96,P_BAD0,0,0.96,4237.0,0.0,534.0,1189.0,0.888074,1.000000,0.0,...,0.975413,0.000000,0.910403,0.000000,0.940719,0.999994,0.999987,0.999988,0.319445,0.089597
97,P_BAD0,0,0.97,4081.0,0.0,690.0,1189.0,0.855376,1.000000,0.0,...,0.967291,0.000000,0.884228,0.000000,0.922052,0.999994,0.999987,0.999988,0.319445,0.115772
98,P_BAD0,0,0.98,3808.0,0.0,963.0,1189.0,0.798156,1.000000,0.0,...,0.951857,0.000000,0.838423,0.000000,0.887749,0.999994,0.999987,0.999988,0.319445,0.161577

Unnamed: 0,NOBS,ASE,DIV,RASE,MCE,MCLL
0,5960.0,0.004665,5960.0,0.068298,0.00151,0.033342

Unnamed: 0,Parameter,Value
0,Model Type,Gradient Boosting Tree
1,Tuner Objective Function,Area Under Curve
2,Search Method,GRID
3,Number of Grid Points,16
4,Maximum Tuning Time in Seconds,36000
5,Validation Type,Single Partition
6,Validation Partition Fraction,0.30
7,Log Level,0
8,Seed,70555071
9,Number of Parallel Evaluations,2

Unnamed: 0,Evaluation,M,LEARNINGRATE,SUBSAMPLERATE,LASSO,RIDGE,NBINS,MAXLEVEL,AreaUnderCurve,EvaluationTime
0,0,10,0.1,0.5,0.0,1.0,50,5,0.921546,9.55613
1,14,10,0.1,0.6,0.0,0.0,77,7,0.940871,22.204142
2,7,10,0.1,0.8,0.0,0.0,77,7,0.939292,9.113203
3,1,10,0.1,0.6,0.5,0.0,77,7,0.939042,26.665014
4,10,10,0.1,0.8,0.5,0.0,77,5,0.92672,4.915408
5,13,10,0.1,0.8,0.0,0.0,77,5,0.917736,3.837701
6,5,10,0.1,0.6,0.0,0.0,77,5,0.91651,3.316526
7,9,10,0.1,0.8,0.5,0.0,77,7,0.915537,3.806702
8,11,10,0.1,0.6,0.5,0.0,77,5,0.905599,4.272436
9,15,10,0.05,0.8,0.5,0.0,77,7,0.825386,1.119262

Unnamed: 0,Iteration,Evaluations,Best_obj,Time_sec
0,0,1,0.921546,9.55613
1,1,17,0.940871,62.697545

Unnamed: 0,Evaluation,Iteration,M,LEARNINGRATE,SUBSAMPLERATE,LASSO,RIDGE,NBINS,MAXLEVEL,AreaUnderCurve,EvaluationTime
0,0,0,10,0.1,0.5,0.0,1.0,50,5,0.921546,9.55613
1,1,1,10,0.1,0.6,0.5,0.0,77,7,0.939042,26.665014
2,2,1,10,0.05,0.6,0.0,0.0,77,5,0.818485,1.600817
3,3,1,10,0.05,0.6,0.5,0.0,77,7,0.810106,1.130838
4,4,1,10,0.05,0.8,0.5,0.0,77,5,0.818035,1.000732
5,5,1,10,0.1,0.6,0.0,0.0,77,5,0.91651,3.316526
6,6,1,10,0.05,0.8,0.0,0.0,77,7,0.816337,1.276982
7,7,1,10,0.1,0.8,0.0,0.0,77,7,0.939292,9.113203
8,8,1,10,0.05,0.8,0.0,0.0,77,5,0.813599,1.033013
9,9,1,10,0.1,0.8,0.5,0.0,77,7,0.915537,3.806702

Unnamed: 0,Parameter,Name,Value
0,Evaluation,Evaluation,14.0
1,Number of Variables to Try,M,10.0
2,Learning Rate,LEARNINGRATE,0.1
3,Sampling Rate,SUBSAMPLERATE,0.6
4,Lasso,LASSO,0.0
5,Ridge,RIDGE,0.0
6,Number of Bins,NBINS,77.0
7,Maximum Tree Levels,MAXLEVEL,7.0
8,Area Under Curve,Objective,0.9408711074

Unnamed: 0,Parameter,Value
0,Initial Configuration Objective Value,0.921546
1,Best Configuration Objective Value,0.940871
2,Worst Configuration Objective Value,0.810106
3,Initial Configuration Evaluation Time in Seconds,9.55613
4,Best Configuration Evaluation Time in Seconds,22.203997
5,Number of Improved Configurations,2.0
6,Number of Evaluated Configurations,17.0
7,Total Tuning Time in Seconds,96.296407
8,Parallel Tuning Speedup,1.352349

Unnamed: 0,Task,Time_sec,Time_percent
0,Model Training,122.700009,94.220552
1,Model Scoring,6.529068,5.01363
2,Total Objective Evaluations,129.236482,99.239868
3,Tuner,0.989892,0.760132
4,Total CPU Time,130.226375,100.0

Unnamed: 0,Hyperparameter,RelImportance
0,LEARNINGRATE,1.0
1,LASSO,0.006289
2,SUBSAMPLERATE,0.002352
3,MAXLEVEL,0.000761
4,M,0.0
5,RIDGE,0.0
6,NBINS,0.0

Unnamed: 0,Descr,Value
0,Number of Trees,150.0
1,Distribution,2.0
2,Learning Rate,0.1
3,Subsampling Rate,0.5
4,Number of Selected Variables (M),14.0
5,Number of Bins,50.0
6,Number of Variables,14.0
7,Max Number of Tree Nodes,31.0
8,Min Number of Tree Nodes,17.0
9,Max Number of Branches,2.0

Unnamed: 0,Progress,Metric
0,1.0,0.199497
1,2.0,0.199497
2,3.0,0.199497
3,4.0,0.199497
4,5.0,0.193456
...,...,...
145,146.0,0.049329
146,147.0,0.048993
147,148.0,0.048154
148,149.0,0.047315

Unnamed: 0,Descr,Value
0,Number of Observations Read,5960.0
1,Number of Observations Used,5960.0
2,Misclassification Error (%),4.6476510067

Unnamed: 0,TreeID,Trees,NLeaves,MCR,LogLoss,ASE,RASE,MAXAE
0,0.0,1.0,14.0,0.199497,0.458854,0.145714,0.381725,0.815309
1,1.0,2.0,28.0,0.199497,0.428499,0.134314,0.366489,0.831229
2,2.0,3.0,43.0,0.199497,0.405006,0.125130,0.353737,0.843562
3,3.0,4.0,58.0,0.199497,0.385870,0.117518,0.342808,0.854327
4,4.0,5.0,72.0,0.193456,0.370783,0.111549,0.333990,0.863793
...,...,...,...,...,...,...,...,...
145,145.0,146.0,2183.0,0.048993,0.134989,0.036668,0.191488,0.971637
146,146.0,147.0,2192.0,0.048658,0.134792,0.036561,0.191210,0.971385
147,147.0,148.0,2206.0,0.047819,0.134387,0.036389,0.190760,0.970637
148,148.0,149.0,2222.0,0.047148,0.133839,0.036210,0.190289,0.970746

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,1,0,P_BAD1
1,0,1,P_BAD0

Unnamed: 0,LEVNAME,LEVINDEX,VARNAME
0,,0,I_BAD

Unnamed: 0,Variable,Event,CutOff,TP,FP,FN,TN,Sensitivity,Specificity,KS,...,F_HALF,FPR,ACC,FDR,F1,C,Gini,Gamma,Tau,MISCEVENT
0,P_BAD0,0,0.00,4771.0,1189.0,0.0,0.0,1.000000,0.000000,0.0,...,0.833770,1.000000,0.800503,0.199497,0.889200,0.986788,0.973576,0.974871,0.311008,0.199497
1,P_BAD0,0,0.01,4771.0,1107.0,0.0,82.0,1.000000,0.068966,0.0,...,0.843440,0.931034,0.814262,0.188329,0.896047,0.986788,0.973576,0.974871,0.311008,0.185738
2,P_BAD0,0,0.02,4771.0,1018.0,0.0,171.0,1.000000,0.143818,0.0,...,0.854191,0.856182,0.829195,0.175851,0.903598,0.986788,0.973576,0.974871,0.311008,0.170805
3,P_BAD0,0,0.03,4771.0,949.0,0.0,240.0,1.000000,0.201850,0.0,...,0.862717,0.798150,0.840772,0.165909,0.909542,0.986788,0.973576,0.974871,0.311008,0.159228
4,P_BAD0,0,0.04,4771.0,897.0,0.0,292.0,1.000000,0.245585,0.0,...,0.869256,0.754415,0.849497,0.158257,0.914072,0.986788,0.973576,0.974871,0.311008,0.150503
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,P_BAD0,0,0.95,3386.0,5.0,1385.0,1184.0,0.709704,0.995795,0.0,...,0.923371,0.004205,0.766779,0.001474,0.829699,0.986788,0.973576,0.974871,0.311008,0.233221
96,P_BAD0,0,0.96,3113.0,2.0,1658.0,1187.0,0.652484,0.998318,0.0,...,0.903314,0.001682,0.721477,0.000642,0.789500,0.986788,0.973576,0.974871,0.311008,0.278523
97,P_BAD0,0,0.97,2704.0,1.0,2067.0,1188.0,0.566757,0.999159,0.0,...,0.867167,0.000841,0.653020,0.000370,0.723381,0.986788,0.973576,0.974871,0.311008,0.346980
98,P_BAD0,0,0.98,2133.0,0.0,2638.0,1189.0,0.447076,1.000000,0.0,...,0.801699,0.000000,0.557383,0.000000,0.617903,0.986788,0.973576,0.974871,0.311008,0.442617

Unnamed: 0,NOBS,ASE,DIV,RASE,MCE,MCLL
0,5960.0,0.036071,5960.0,0.189923,0.046477,0.133422

Unnamed: 0,Parameter,Value
0,Model Type,Gradient Boosting Tree
1,Tuner Objective Function,Area Under Curve
2,Search Method,GRID
3,Number of Grid Points,16
4,Maximum Tuning Time in Seconds,36000
5,Validation Type,Single Partition
6,Validation Partition Fraction,0.30
7,Log Level,0
8,Seed,70564939
9,Number of Parallel Evaluations,4

Unnamed: 0,Evaluation,M,LEARNINGRATE,SUBSAMPLERATE,LASSO,RIDGE,NBINS,MAXLEVEL,AreaUnderCurve,EvaluationTime
0,0,14,0.1,0.5,0.0,1.0,50,5,0.951389,12.500329
1,1,14,0.1,0.6,0.0,0.0,77,7,0.950255,9.45954
2,9,14,0.1,0.8,0.0,0.0,77,7,0.947687,8.861211
3,14,14,0.1,0.6,0.0,0.0,77,5,0.946082,9.209892
4,16,14,0.1,0.6,0.5,0.0,77,5,0.944753,5.38908
5,15,14,0.1,0.8,0.5,0.0,77,7,0.940222,4.89393
6,8,14,0.1,0.8,0.0,0.0,77,5,0.940188,4.610089
7,11,14,0.1,0.8,0.5,0.0,77,5,0.939386,4.158722
8,6,14,0.1,0.6,0.5,0.0,77,7,0.928318,3.705621
9,4,14,0.05,0.6,0.5,0.0,77,5,0.850381,1.891152

Unnamed: 0,Iteration,Evaluations,Best_obj,Time_sec
0,0,1,0.951389,12.500329
1,1,17,0.951389,31.170086

Unnamed: 0,Evaluation,Iteration,M,LEARNINGRATE,SUBSAMPLERATE,LASSO,RIDGE,NBINS,MAXLEVEL,AreaUnderCurve,EvaluationTime
0,0,0,14,0.1,0.5,0.0,1.0,50,5,0.951389,12.500329
1,1,1,14,0.1,0.6,0.0,0.0,77,7,0.950255,9.45954
2,2,1,14,0.05,0.8,0.5,0.0,77,5,0.847657,2.817777
3,3,1,14,0.05,0.8,0.5,0.0,77,7,0.835042,3.067492
4,4,1,14,0.05,0.6,0.5,0.0,77,5,0.850381,1.891152
5,5,1,14,0.05,0.8,0.0,0.0,77,7,0.840787,1.414821
6,6,1,14,0.1,0.6,0.5,0.0,77,7,0.928318,3.705621
7,7,1,14,0.05,0.8,0.0,0.0,77,5,0.842524,1.548925
8,8,1,14,0.1,0.8,0.0,0.0,77,5,0.940188,4.610089
9,9,1,14,0.1,0.8,0.0,0.0,77,7,0.947687,8.861211

Unnamed: 0,Parameter,Name,Value
0,Evaluation,Evaluation,0.0
1,Number of Variables to Try,M,14.0
2,Learning Rate,LEARNINGRATE,0.1
3,Sampling Rate,SUBSAMPLERATE,0.5
4,Lasso,LASSO,0.0
5,Ridge,RIDGE,1.0
6,Number of Bins,NBINS,50.0
7,Maximum Tree Levels,MAXLEVEL,5.0
8,Area Under Curve,Objective,0.9513885219

Unnamed: 0,Parameter,Value
0,Initial Configuration Objective Value,0.951389
1,Best Configuration Objective Value,0.951389
2,Worst Configuration Objective Value,0.830543
3,Initial Configuration Evaluation Time in Seconds,12.500329
4,Best Configuration Evaluation Time in Seconds,12.500329
5,Number of Improved Configurations,0.0
6,Number of Evaluated Configurations,17.0
7,Total Tuning Time in Seconds,50.814675
8,Parallel Tuning Speedup,1.87207

Unnamed: 0,Task,Time_sec,Time_percent
0,Model Training,87.928319,92.430994
1,Model Scoring,5.403537,5.680244
2,Total Objective Evaluations,93.340051,98.119852
3,Tuner,1.788558,1.880148
4,Total CPU Time,95.128609,100.0

Unnamed: 0,CAS_Library,Name,Rows,Columns
0,CASUSER(SOROWL),ASTORE_OUT_PY_gradBoost_1,1,2

Unnamed: 0,Hyperparameter,RelImportance
0,LEARNINGRATE,1.0
1,MAXLEVEL,0.015843
2,LASSO,0.009224
3,SUBSAMPLERATE,0.003932
4,M,0.0
5,RIDGE,0.0
6,NBINS,0.0

Unnamed: 0,casLib,Name,Rows,Columns,casTable
0,CASUSER(sorowl),PIPELINE_OUT_PY,5,35,"CASTable('PIPELINE_OUT_PY', caslib='CASUSER(so..."
1,CASUSER(sorowl),TRANSFORMATION_OUT_PY,32,21,"CASTable('TRANSFORMATION_OUT_PY', caslib='CASU..."
2,CASUSER(sorowl),FEATURE_OUT_PY,59,15,"CASTable('FEATURE_OUT_PY', caslib='CASUSER(sor..."
3,CASUSER(sorowl),ASTORE_OUT_PY_fm_,1,2,"CASTable('ASTORE_OUT_PY_fm_', caslib='CASUSER(..."
4,CASUSER(sorowl),ASTORE_OUT_PY_gradBoost_1,1,2,"CASTable('ASTORE_OUT_PY_gradBoost_1', caslib='..."


Great! Now let's use our feature creation model and our gradient boosting model to score our data.

In [7]:
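# Load the astore action set and reference the two saved astores
conn.loadactionset('astore')
model_astore = conn.CASTable("ASTORE_OUT_PY_gradBoost_1")
feature_astore = conn.CASTable("ASTORE_OUT_PY_fm_")
# Step 1: apply the feature creation astore to the raw data
conn.score(
    table='hmeqct',
    copyvars='BAD',
    out=dict(name='feat_scored', replace=True),
    rstore='ASTORE_OUT_PY_fm_')
# Step 2: score the engineered features with the gradient boosting astore
conn.score(
    table='feat_scored',
    copyvars='BAD',
    out=dict(name='hmeq_scored', replace=True),
    rstore='ASTORE_OUT_PY_gradBoost_1')
# Pull the actual target and the predicted event probability back into pandas
hmeq_scored = conn.CASTable('hmeq_scored')
hmeq_scored = hmeq_scored[['BAD', 'P_BAD1']].to_frame()
hmeq_scored['BAD'] = hmeq_scored['BAD'].astype(int)
# Keep actual values first and predicted probabilities second, with distinct column names
hmeq_scored = hmeq_scored.rename(columns={'BAD': '0', 'P_BAD1': '1'})
hmeq_scored = pd.DataFrame(hmeq_scored)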

NOTE: Added action set 'astore'.
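The cell above chains two astores: the feature creation astore (`ASTORE_OUT_PY_fm_`) writes the engineered features to `feat_scored`, and the gradient boosting astore then scores that table. Below is a loose scikit-learn analog of the same two-stage idea on toy data. `StandardScaler` and `LogisticRegression` are stand-ins chosen for brevity; they are not what Data Science Pilot builds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Stage 1: a feature-transformation model (stand-in for the feature astore)
features = StandardScaler().fit(X)
# Stage 2: a predictive model trained on the transformed features (stand-in for the model astore)
model = LogisticRegression().fit(features.transform(X), y)

# Scoring chains the stages: transform first, then predict
two_stage = model.predict_proba(features.transform(X))[:, 1]

# The same chain expressed as a single scikit-learn Pipeline
pipe = Pipeline([("features", StandardScaler()), ("model", LogisticRegression())]).fit(X, y)
one_stage = pipe.predict_proba(X)[:, 1]

print(np.allclose(two_stage, one_stage))  # True
```

Running the stages in order reproduces the combined pipeline's probabilities, which is why the feature astore has to be applied before the model astore.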


*** 
## Python XGBOOST Machine Learning Pipeline
Next, we will create a pipeline that takes the input variables, imputes missing values in the categorical variables, one-hot encodes the categorical variables, and trains an XGBoost model.

In [8]:
# Separate the target from the inputs
y = hmeqdf.pop('BAD').values
# Find and note the categorical (object dtype) columns
kinds = np.array([dt.kind for dt in hmeqdf.dtypes])
all_columns = hmeqdf.columns.values
is_cat = kinds == 'O'
cat_cols = all_columns[is_cat]
# Impute missing categorical values with the constant 'MISSING'
cat_si_step = ('si', SimpleImputer(strategy='constant', fill_value='MISSING'))
# One-hot encode categorical variables, ignoring categories not seen during training
# (scikit-learn 1.2+ renamed the sparse argument to sparse_output)
cat_ohe_step = ('ohe', OneHotEncoder(sparse=False, handle_unknown='ignore'))
# Create the ML pipeline with imputation, one-hot encoding, and XGBoost
cat_steps = [cat_si_step, cat_ohe_step]
cat_pipe = Pipeline(cat_steps)
cat_transformers = [('cat', cat_pipe, cat_cols)]
# Only cat_cols are transformed; by default ColumnTransformer drops all other columns
ct = ColumnTransformer(transformers=cat_transformers)
ml_pipe = Pipeline([('transform', ct), ('xgb', xgb.XGBClassifier())])
ml_pipe.fit(hmeqdf, y)

Pipeline(steps=[('transform',
                 ColumnTransformer(transformers=[('cat',
                                                  Pipeline(steps=[('si',
                                                                   SimpleImputer(fill_value='MISSING',
                                                                                 strategy='constant')),
                                                                  ('ohe',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  array(['REASON', 'JOB'], dtype=object))])),
                ('xgb',
                 XGBClassifier(base_score=0.5, booster='gbtree',
                               colsample_bylevel=1, colsample_bynode=1,
                               colsampl..., gpu_id=-1,
                               importance_ty
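The categorical branch of the pipeline does two things worth seeing in isolation: `SimpleImputer` turns a missing value into the literal category `'MISSING'`, and `handle_unknown='ignore'` encodes a category never seen during training as a row of zeros instead of raising an error. The toy sketch below uses made-up values in the style of `REASON`, and the `dense` helper is hypothetical; it leaves the `sparse`/`sparse_output` argument at its default so it runs on older and newer scikit-learn alike.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column with one missing value (values are made up)
train = pd.DataFrame({"REASON": pd.Series(["DebtCon", np.nan, "HomeImp", "DebtCon"], dtype=object)})

cat_pipe = Pipeline([
    # Replace missing values with the literal category 'MISSING'
    ("si", SimpleImputer(strategy="constant", fill_value="MISSING")),
    # Encode categories never seen in training as all zeros instead of raising
    ("ohe", OneHotEncoder(handle_unknown="ignore")),
])
cat_pipe.fit(train)


def dense(matrix):
    """OneHotEncoder returns a sparse matrix by default; convert it to a dense array."""
    return matrix.toarray() if hasattr(matrix, "toarray") else np.asarray(matrix)


missing_row = pd.DataFrame({"REASON": pd.Series([np.nan], dtype=object)})
unseen_row = pd.DataFrame({"REASON": pd.Series(["Other"], dtype=object)})
print(dense(cat_pipe.transform(missing_row)))  # [[0. 0. 1.]]
print(dense(cat_pipe.transform(unseen_row)))   # [[0. 0. 0.]]
```

Recent scikit-learn versions may also emit a warning when they meet an unknown category; the encoded row is still all zeros.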

This is a simple example, but let’s examine our model’s accuracy on the training data as a sanity check.

In [9]:
ml_pipe.score(hmeqdf, y)

0.8013422818791947

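We now have a working pipeline. Its training accuracy of about 0.80 is only slightly above the share of non-defaulting loans in the data (4,771 of 5,960, or about 0.80). That is expected here: the ColumnTransformer lists only the categorical columns and, by default, drops every other column, so the model sees only REASON and JOB. Now let's create some metadata around our models and push them into Model Manager.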
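If you want the numeric inputs to reach the model as well, one option is `remainder='passthrough'`, which forwards the unlisted columns unchanged; XGBoost treats NaN as missing by default, so those columns can go in without imputation. The toy comparison below (made-up values, not the HMEQ data) only shows how the transformed output changes width; this notebook did not measure the effect on model quality.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy frame with one categorical and two numeric columns (values are made up)
df = pd.DataFrame({
    "JOB": ["Office", "Other", "ProfExe"],
    "LOAN": [26800, 26900, 27000],
    "DEBTINC": [33.1, np.nan, 40.6],
})

encoder = OneHotEncoder(handle_unknown="ignore")

# Default remainder='drop': only the three one-hot JOB columns survive
dropped = ColumnTransformer([("cat", encoder, ["JOB"])]).fit_transform(df)

# remainder='passthrough': LOAN and DEBTINC are appended unchanged (NaN included)
kept = ColumnTransformer([("cat", encoder, ["JOB"])], remainder="passthrough").fit_transform(df)

print(dropped.shape, kept.shape)  # (3, 3) (3, 5)
```

In the notebook's pipeline, that would mean building `ColumnTransformer(transformers=cat_transformers, remainder='passthrough')`.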
***
## Register Models to SAS Model Manager
Our next-to-last step in this notebook is to register the models in SAS Model Manager. We are using the sasctl package to make this process easier. To use sasctl, we first need to start a session with our Viya credentials. Next, we call the `register_model` function with our model, a name for the model, the input data set, and a name for the project.
### SWAT Model Management
Below is a quick function I wrote to generate performance metadata for our model built using SWAT and to push the model into SAS Model Manager.

In [10]:
def mm_swat(project_name, model_name, model_astore, feature_astore, feature_name, scored_data, target):
    # Relies on the global hostname, username, password, and conn defined earlier in this notebook
    ##################################################
    # CREATE OUTPUT INFORMATION

    # Output folder name
    output_folder = 'Outputs'

    # Create the folder (and any missing parents) if it does not exist yet
    output_path = Path.cwd() / output_folder / model_name
    output_path.mkdir(parents=True, exist_ok=True)

    ##################################################
    # CONNECT TO SAS VIYA
    host_session = 'https://' + hostname + '/'

    # Connect using a session; verify_ssl=False skips certificate verification,
    # and urllib3.disable_warnings() in the next cell silences the resulting warnings
    sess = Session(host_session, username, password, verify_ssl=False, protocol="http")

    ##################################################
    # COMPILE METADATA

    # Create JSON Files Object
    JSONFiles = pzmm.JSONFiles()

    # Write Fit Statistics JSON
    JSONFiles.calculateFitStat(trainData=scored_data, jPath=output_path)

    # Write ROC and Lift Statistics JSON
    JSONFiles.generateROCLiftStat(target, 1, conn, trainData=scored_data, jPath=output_path)

    # Collect the generated JSON files to attach to the model
    model_content = [output_path / f for f in ('dmcas_fitstat.json', 'dmcas_lift.json', 'dmcas_roc.json')]
    model_content = {f.name: f for f in model_content}

    # Push the feature astore and the model astore into Model Manager
    register_model(feature_astore, feature_name, project=project_name, force=True, version='latest')
    register_model(model_astore, model_name, project=project_name, force=True, version='latest', files=model_content)
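The `calculateFitStat` call above writes `dmcas_fitstat.json`, and the `dsAutoMl` output earlier reports similar measures (ASE, RASE, MCE, MCLL). To show what those abbreviations mean for a binary target, here is a hand-rolled NumPy sketch. It is not pzmm's code; pzmm's exact definitions, cutoff handling, and additional statistics may differ.

```python
import numpy as np


def fit_stats(actual, p_event, cutoff=0.5):
    """Compute a few common binary-classification fit statistics (illustrative sketch)."""
    y = np.asarray(actual, dtype=float)
    p = np.clip(np.asarray(p_event, dtype=float), 1e-15, 1 - 1e-15)  # avoid log(0)
    ase = np.mean((y - p) ** 2)                      # average squared error
    mce = np.mean((p >= cutoff).astype(float) != y)  # misclassification rate at the cutoff
    # Log loss: negative mean log of the probability assigned to the observed class
    mcll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return {"ASE": ase, "RASE": np.sqrt(ase), "MCE": mce, "MCLL": mcll}


stats = fit_stats([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1])
print(round(stats["ASE"], 3), stats["MCE"])  # 0.105 0.25
```

As a quick consistency check, RASE is the square root of ASE, which matches the earlier SWAT output (0.301733 is the square root of 0.091043).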

In [11]:
urllib3.disable_warnings()
# mm_swat(project_name, model_name, model_astore, feature_astore, feature_name, scored_data, target)
mm_swat('HMEQ Loan Default', 'SWAT_DSP_GRADBOOST', model_astore, feature_astore, 'SWAT_DSP_FEATS', hmeq_scored, "BAD")

NOTE: Added action set 'percentile'.
NOTE: Cloud Analytic Services made the uploaded file available as table SCOREDVALUES in caslib CASUSER(sorowl).
NOTE: The table SCOREDVALUES has been created in caslib CASUSER(sorowl) from binary data uploaded to Cloud Analytic Services.
NOTE: Added action set 'astore'.
NOTE: Cloud Analytic Services saved the file _26E50B4B870F46938BFDDD9A2.sashdat in caslib ModelStore.
NOTE: Added action set 'astore'.
NOTE: Cloud Analytic Services saved the file _55D32694629F4D339BD70180B.sashdat in caslib ModelStore.
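`generateROCLiftStat` builds the ROC and lift statistics from the scored data; the log above shows it loading the percentile action set and uploading the scored values as `SCOREDVALUES`. To illustrate what the ROC part contains, here is a small NumPy sketch that computes sensitivity and specificity over a grid of cutoffs, similar in spirit to the CutOff, Sensitivity, and Specificity columns in the assessment tables printed by `dsAutoMl` earlier. The function is hypothetical and is not pzmm's implementation.

```python
import numpy as np


def roc_points(actual, p_event, cutoffs=None):
    """Return (cutoff, sensitivity, specificity) tuples for a binary target.

    An observation is predicted as the event when its probability is >= cutoff.
    """
    y = np.asarray(actual).astype(bool)
    p = np.asarray(p_event, dtype=float)
    if cutoffs is None:
        cutoffs = np.round(np.arange(0.0, 1.0, 0.01), 2)  # 0.00, 0.01, ..., 0.99
    rows = []
    for c in cutoffs:
        pred = p >= c
        tp = np.sum(pred & y)
        fn = np.sum(~pred & y)
        tn = np.sum(~pred & ~y)
        fp = np.sum(pred & ~y)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        rows.append((float(c), float(sensitivity), float(specificity)))
    return rows


points = roc_points([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.1], cutoffs=[0.0, 0.3, 0.5, 0.95])
print(points)  # [(0.0, 1.0, 0.0), (0.3, 1.0, 1.0), (0.5, 0.5, 1.0), (0.95, 0.0, 1.0)]
```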


### Python Model Management 
The function below will also generate the performance metadata and push the Python model into SAS Model Manager as well as generate score code for our Micro Analytic Service (MAS). (Note on the generated score code: it’s best to think of this as a starting point. Additional edits may be needed before publishing the Python model).
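The `writeVarJSON` calls in the function below record one entry per variable. Assuming each entry carries a name, a measurement level, a type, and a length, the sketch below approximates that logic in plain pandas so you can preview the metadata before writing it. The function name and the classification rules (numeric columns as 8-byte interval decimals, everything else as nominal strings sized to the longest value) are assumptions, not pzmm's exact behavior.

```python
import json

import numpy as np
import pandas as pd


def describe_variables(df: pd.DataFrame) -> list:
    """Approximate per-variable metadata (name, level, type, length) for a DataFrame."""
    records = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            records.append({"name": name, "level": "interval", "type": "decimal", "length": 8})
        else:
            lengths = col.dropna().astype(str).str.len()
            length = int(lengths.max()) if len(lengths) else 1  # guard against all-missing columns
            records.append({"name": name, "level": "nominal", "type": "string", "length": length})
    return records


sample = pd.DataFrame({"LOAN": [26800, 26900], "REASON": ["DebtCon", np.nan]})
print(json.dumps(describe_variables(sample), indent=2))
```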

In [12]:
def mm_python(project_name, model_name, model, target, input_df, actual_df, desc, model_type):      
    ##################################################
    # CREATE OUTPUT INFORMATION  

    # Output folder name
    output_folder = 'Outputs'

    # Create Folder
    output_path = Path.cwd() / output_folder / model_name
    if not os.path.exists(output_path):
        os.makedirs(output_path)
    
    ##################################################
    # CONNECT TO SAS VIYA
    host_session = 'https://' + hostname + '/'

    # Connect using a session 
    sess=Session(host_session, username, password, verify_ssl=False, protocol="http")

    ##################################################
    # COMPILE METADATA

    # Create JSON Files Object
    JSONFiles = pzmm.JSONFiles()

    # Write Input Variable JSON 
    JSONFiles.writeVarJSON(input_df, isInput=True, jPath=output_path)

    # Mock up Output Variables
    yCategory = actual_df.astype('str') # Ensuring dataframe length matches
    output_df = pd.DataFrame(columns=['EM_EVENTPROBABILITY', 'EM_CLASSIFICATION']) # Ensuring column names are what SAS expects
    output_df['EM_CLASSIFICATION'] = yCategory # Ensuring data type is nominal 
    output_df['EM_EVENTPROBABILITY'] = 0.5 # Ensuring data type is decimal  

    # Write Output Variable JSON
    JSONFiles.writeVarJSON(output_df, isInput=False, jPath=output_path)

    # Write File Metadata JSON
    JSONFiles.writeFileMetadataJSON(model_name, jPath=output_path)

    # Write Model Properties JSON
    JSONFiles.writeModelPropertiesJSON(modelName=model_name, 
        modelDesc=desc,
        targetVariable=target,
        modelType=model_type,
        modelPredictors=input_df.columns.array,
        targetEvent='1',
        numTargetCategories=2,
        eventProbVar='EM_EVENTPROBABILITY',
        jPath=output_path,
        modeler=username)

    # Get predictions
    trainProba = model.predict_proba(input_df)

    # Creating Assessment Data 
    trainData = pd.concat([pd.DataFrame(actual_df).reset_index(drop=True), pd.Series(data=trainProba[:,1])], axis=1)

    # Write Fit Statistics JSON
    JSONFiles.calculateFitStat(trainData=trainData, jPath=output_path)

    # Write ROC and Lift Statistics JSON
    JSONFiles.generateROCLiftStat(target, 1, conn, trainData=trainData, jPath=output_path)

    ##################################################
    # PICKLE MODEL 
    # Call the method on a PickleModel instance instead of passing IPython's `_` as self
    pzmm.PickleModel().pickleTrainedModel(model, model_name, output_path)

    ##################################################
    # WRITE SCORE CODE 

    # Generate Score Code Object
    ScoreCode = pzmm.ScoreCode()

    # Write Score Code
    ScoreCode.writeScoreCode(input_df, actual_df, model_name, '{}.predict_proba({})', model_name + '.pickle', pyPath=output_path)

    ##################################################
    # ZIP FOLDER
    zipIOFile = pzmm.ZipModel.zipFiles(fileDir=output_path, modelPrefix=model_name)

    ##################################################
    # PUSH ZIP FOLDER INTO MODEL MANAGER
    with sess: 
        modelRepo.import_model_from_zip(model_name, project_name, zipIOFile)
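The ZIP FOLDER step bundles every file written to `output_path` into a single archive, which `import_model_from_zip` then uploads. The sketch below shows one way to build such an in-memory archive with the standard `zipfile` module. It is not pzmm's code. Storing files at the archive root, without the folder prefix, is an assumption made for the illustration, and `zip_folder_in_memory` is a hypothetical name.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def zip_folder_in_memory(folder):
    """Zip every file directly inside `folder` into an in-memory BytesIO buffer."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                # Store each file at the archive root, without the folder prefix
                zf.write(path, arcname=path.name)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


# Usage: zip a temporary folder holding two metadata-style files
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "score.py").write_text("# score code\n")
    Path(tmp, "dmcas_fitstat.json").write_text("{}")
    archive = zip_folder_in_memory(tmp)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()  # ['dmcas_fitstat.json', 'score.py']
```

Because the buffer lives in memory, nothing extra is written to disk before the upload.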

In [13]:
urllib3.disable_warnings()

# mm_python(project_name, model_name, model, target, input_df, actual_df, desc, model_type)
mm_python('HMEQ Loan Default', 'HMEQ_XGBOOST', ml_pipe , 'BAD', hmeqdf, y, 'Generated in Jupyter Notebook', 'XGBOOST')

NOTE: Added action set 'percentile'.
NOTE: Cloud Analytic Services made the uploaded file available as table SCOREDVALUES in caslib CASUSER(sorowl).
NOTE: The table SCOREDVALUES has been created in caslib CASUSER(sorowl) from binary data uploaded to Cloud Analytic Services.


*** 
## Accessing Models via API
Now we can head over to SAS Viya to manage our models through the visual interface. But you don't have to use the visual interface if you don't want to! Most of the model management capabilities are available through SASCTL, SAS's open source Python package, and anything the package doesn't cover can be done through the SAS Viya REST APIs!

In this section, we will use the SAS Viya APIs to look at one of our deployed models. To use this notebook on your own, make sure you have deployed one of your models, either through the visual interface or programmatically. To get started, we need to generate an authorization token.
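The token request sends the credentials as a form-encoded body. The sketch below only builds the request and does not send it. It shows why URL-encoding matters when a password contains characters such as `&` or `=`, which would otherwise split the body into extra fields. The host name and credentials are placeholders, and `build_token_request` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode


def build_token_request(hostname, username, password, scheme="http"):
    """Build (url, body, headers) for an OAuth2 password-grant token request."""
    url = f"{scheme}://{hostname}/SASLogon/oauth/token"
    # urlencode percent-escapes '&', '=', '%' and similar characters in the values
    body = urlencode({"grant_type": "password", "username": username, "password": password})
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return url, body, headers


# Usage: compare an encoded body with naive string formatting
url, body, headers = build_token_request("viya.example.com", "sasdemo", "p&ss=word")
naive_body = "grant_type=password&username=%s&password=%s" % ("sasdemo", "p&ss=word")

print(parse_qs(body)["password"])        # ['p&ss=word']
print(parse_qs(naive_body)["password"])  # ['p']: the '&' split the password
```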

In [14]:
from requests import request
import urllib3
urllib3.disable_warnings()

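# Get token
host = 'http://' + hostname
url = host + '/SASLogon/oauth/token'
# Passing a dict lets requests URL-encode the credentials (safe for special characters)
r = request('POST', url,
            data={'grant_type': 'password', 'username': username, 'password': password},
            headers={
                'Accept': 'application/json',
                'Content-Type': 'application/x-www-form-urlencoded'
            },
            auth=('sas.ec', ''),
            verify=False)
r.raise_for_status()  # fail loudly on bad credentials instead of a KeyError below
token = r.json()['access_token']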

Let's pull the specific information for our deployed module.

In [15]:
deployed_name = input("Deployed Name:")

Deployed Name: hmeq_champ


In [16]:
headers = {'Authorization': 'Bearer ' + token}

url = host + '/microanalyticScore/modules/' + deployed_name + '/steps'

r = request('GET', url, params={}, headers = headers, verify=False)

r.json()

{'links': [{'method': 'GET',
   'rel': 'collection',
   'href': '/microanalyticScore/modules/hmeq_champ/steps',
   'uri': '/microanalyticScore/modules/hmeq_champ/steps',
   'type': 'application/vnd.sas.collection'},
  {'method': 'GET',
   'rel': 'self',
   'href': '/microanalyticScore/modules/hmeq_champ/steps?start=0&limit=20',
   'uri': '/microanalyticScore/modules/hmeq_champ/steps?start=0&limit=20',
   'type': 'application/vnd.sas.collection'},
  {'method': 'POST',
   'rel': '/microanalyticScore/modules/hmeq_champ/steps',
   'href': '/microanalyticScore/modules/hmeq_champ/steps',
   'uri': '/microanalyticScore/modules/hmeq_champ/steps',
   'type': 'application/vnd.sas.microanalytic.module.definition',
   'responseType': 'application/vnd.sas.microanalytic.module'}],
 'name': 'steps',
 'accept': 'application/vnd.sas.microanalytic.module.step',
 'count': 3,
 'items': [{'links': [{'method': 'GET',
     'rel': 'up',
     'href': '/microanalyticScore/modules/hmeq_champ/steps',
     'uri': 
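The steps response is a SAS collection. Navigation and paging live in its `links` array (note the `start=0&limit=20` query on the `self` link), and the steps themselves are listed under `items`. Instead of hard-coding URLs, a client can look links up by their `rel` value. `find_link` below is a hypothetical helper, and the sample dict is a trimmed copy of the visible part of the response above.

```python
from urllib.parse import parse_qs, urlsplit


def find_link(resource, rel):
    """Return the first link whose 'rel' matches, or None if there is none."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link
    return None


# Trimmed copy of the 'links' section from the steps response
steps = {
    "links": [
        {"method": "GET", "rel": "collection",
         "href": "/microanalyticScore/modules/hmeq_champ/steps"},
        {"method": "GET", "rel": "self",
         "href": "/microanalyticScore/modules/hmeq_champ/steps?start=0&limit=20"},
    ],
    "name": "steps",
    "count": 3,
}

self_link = find_link(steps, "self")
paging = parse_qs(urlsplit(self_link["href"]).query)
# A single page is enough when the collection count does not exceed the page limit
fits_one_page = steps["count"] <= int(paging["limit"][0])
```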

To close out, let's use that information to score some data.
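The scoring request body is a JSON object whose `inputs` array holds one `name`/`value` pair per model variable. Writing that JSON by hand as one long string is easy to get wrong. A less error-prone option is to build it from a Python dict with `json.dumps`. The sketch below uses the same input values as the request in the next cell. `build_mas_payload` is a hypothetical helper name.

```python
import json


def build_mas_payload(values):
    """Turn a {variable: value} dict into the MAS step-input JSON string."""
    return json.dumps({"inputs": [{"name": name, "value": value}
                                  for name, value in values.items()]})


example_payload = build_mas_payload({
    "clage": 12, "clno": 0, "debtinc": 25, "delinq": 1, "derog": 1,
    "job": "Other", "loan": 8000, "mortdue": 80000, "ninq": 0,
    "reason": "HomeImp", "value": 100000, "yoj": 4,
})
```

Dicts preserve insertion order, so the variables appear in the payload in the order they are listed.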

In [17]:
# Prepare payload
payload = '{"inputs":[{"name":"clage", "value": 12}, {"name":"clno", "value": 0}, {"name":"debtinc", "value": 25}, {"name":"delinq", "value": 1}, {"name":"derog", "value": 1},{"name":"job", "value": "Other"}, {"name":"loan", "value": 8000}, {"name":"mortdue", "value": 80000}, {"name":"ninq", "value": 0}, {"name":"reason", "value": "HomeImp"}, {"name":"value", "value": 100000}, {"name":"yoj", "value": 4}]}'
    
# Send request
headers = {'Content-Type': 'application/vnd.sas.microanalytic.module.step.input+json',
           'Authorization': 'Bearer ' + token}

url = host + '/microanalyticScore/modules/' + deployed_name + '/steps/score'

r = request('POST', url, data=payload, headers=headers, verify=False)
r.json()

{'links': [],
 'version': 2,
 'moduleId': 'hmeq_champ',
 'stepId': 'score',
 'executionState': 'completed',
 'outputs': [{'name': 'EM_CLASSIFICATION', 'value': '           1'},
  {'name': 'EM_EVENTPROBABILITY', 'value': 0.8357853808167168},
  {'name': 'EM_PROBABILITY', 'value': 0.8357853808167168},
  {'name': 'I_BAD', 'value': '           1                    '},
  {'name': 'P_BAD0', 'value': 0.16421461918328317},
  {'name': 'P_BAD1', 'value': 0.8357853808167168},
  {'name': '_P_', 'value': None},
  {'name': '_WARN_', 'value': '    '}]}

In the JSON response above, we can pull out our model’s classification as well as its probability of default. By using APIs and JSON, we can now easily embed analytical models into other applications to aid smart decision making. We were also able to get this response back in under a second. This rapid response allows us to quickly return analytical results to the systems that depend on them. Analytics can be made more accessible one API at a time!
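To use the response in another application, it helps to flatten the `outputs` list into a dict. Also note that character outputs such as `EM_CLASSIFICATION` come back padded with spaces and need to be trimmed. The sketch below does both, using an abbreviated copy of the response shown above. `outputs_to_dict` is a hypothetical helper.

```python
def outputs_to_dict(response):
    """Map MAS 'outputs' to {name: value}, trimming the padding on string values."""
    result = {}
    for item in response.get("outputs", []):
        value = item["value"]
        result[item["name"]] = value.strip() if isinstance(value, str) else value
    return result


# Abbreviated copy of the scoring response
response = {
    "moduleId": "hmeq_champ",
    "executionState": "completed",
    "outputs": [
        {"name": "EM_CLASSIFICATION", "value": "           1"},
        {"name": "EM_EVENTPROBABILITY", "value": 0.8357853808167168},
        {"name": "_P_", "value": None},
        {"name": "_WARN_", "value": "    "},
    ],
}

scores = outputs_to_dict(response)
is_default = scores["EM_CLASSIFICATION"] == "1"  # True once the padding is stripped
```

Comparing against the raw value `'           1'` would fail, which is why the padding is stripped before any string comparison.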
***
## Conclusion
We have gone over a very simple modeling example using both Python and SAS analytics. We pushed both models into the SAS model repository and accessed a deployed model through the API.

In [18]:
conn.close()

***