# Registering Open Source Models to SAS Viya from SAS Viya Workbench

If you have ever registered an open source model into SAS Viya from your own machine, you'll find the process here identical. Nothing changes from a code perspective: as long as SAS Viya Workbench can reach your SAS Viya server, you can register the model the same way you always have.

The simplest way to check connectivity is to send a request to SASLogon and confirm that you get a 200 response. If you do, you're good. If you don't, work with your administrator to make sure SAS Viya Workbench can reach your SAS Viya server.

In this notebook, we'll run through an example where we:
* Build an XGBoost model on HMEQ
* Write all the files necessary for SAS Model Manager using [sasctl](https://sassoftware.github.io/python-sasctl/api/sasctl.html) and [pzmm](https://sassoftware.github.io/python-sasctl/api/sasctl.pzmm.html)
* Register the model to SAS Model Manager

Expected directory structure:
 * Data: `/workspaces/myfolder/data`
 * Model: `/workspaces/myfolder/models`

In [None]:
import requests

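host = 'https://my-viya-server.com'  # Replace with your SAS Viya server URL
# verify=False skips TLS certificate verification; remove it if your server's certificate is trusted
resp = requests.get(f'{host}/SASLogon', verify=False, timeout=30)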

if resp.status_code == 200:
    print('Status: 200. You can successfully communicate with the server.')
else:
    print("Received a non-200 status code:", resp.status_code)

In [1]:
import pandas as pd
import xgboost as xgb
import getpass
from sasctl import pzmm
from sasctl import Session
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

### Load Data

[Download hmeq here](https://support.sas.com/documentation/onlinedoc/viya/examples.htm)

In [2]:
df_hmeq = pd.read_csv('/workspaces/myfolder/data/hmeq.csv')
df_hmeq = pd.get_dummies(df_hmeq, drop_first=True, dtype='int')

df_hmeq

Unnamed: 0,BAD,LOAN,MORTDUE,VALUE,YOJ,DEROG,DELINQ,CLAGE,NINQ,CLNO,DEBTINC,REASON_HomeImp,JOB_Office,JOB_Other,JOB_ProfExe,JOB_Sales,JOB_Self
0,1,1100,25860.0,39025.0,10.5,0.0,0.0,94.366667,1.0,9.0,,1,0,1,0,0,0
1,1,1300,70053.0,68400.0,7.0,0.0,2.0,121.833333,0.0,14.0,,1,0,1,0,0,0
2,1,1500,13500.0,16700.0,4.0,0.0,0.0,149.466667,1.0,10.0,,1,0,1,0,0,0
3,1,1500,,,,,,,,,,0,0,0,0,0,0
4,0,1700,97800.0,112000.0,3.0,0.0,0.0,93.333333,0.0,14.0,,1,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5955,0,88900,57264.0,90185.0,16.0,0.0,0.0,221.808718,0.0,16.0,36.112347,0,0,1,0,0,0
5956,0,89000,54576.0,92937.0,16.0,0.0,0.0,208.692070,0.0,15.0,35.859971,0,0,1,0,0,0
5957,0,89200,54045.0,92924.0,15.0,0.0,0.0,212.279697,0.0,15.0,35.556590,0,0,1,0,0,0
5958,0,89800,50370.0,91861.0,14.0,0.0,0.0,213.892709,0.0,16.0,34.340882,0,0,1,0,0,0


### Train/Validation/Test Split

__y__: BAD (1/0)

__x__: All other variables in HMEQ

In [3]:
X = df_hmeq.drop('BAD', axis=1)
y = df_hmeq['BAD']
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42)
X_valid, X_test, y_valid, y_test = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)
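
The two chained splits above produce a 60/30/10 partition: the first call holds out 40% of the rows, and the second sends a quarter of that holdout (10% of the total) to the test set. A minimal arithmetic sketch, using only the `test_size` values from the cell above (the variable names are illustrative):

```python
from fractions import Fraction

first_test_size = Fraction(4, 10)   # test_size=0.4 in the first split
second_test_size = Fraction(1, 4)   # test_size=0.25 in the second split

train_share = 1 - first_test_size                # rows kept for training
test_share = first_test_size * second_test_size  # quarter of the holdout
valid_share = first_test_size - test_share       # remainder of the holdout

print(train_share, valid_share, test_share)  # 3/5 3/10 1/10
```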

### Train XGBoost model

Fit on the training data and evaluate on the validation set to find the point where the model starts overfitting.

In [4]:
xgb_eval = xgb.XGBClassifier(
    objective="binary:logistic", 
    random_state=42, 
    early_stopping_rounds=5, 
    n_estimators=1000
)

xgb_eval.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])

[0]	validation_0-logloss:0.40697
[1]	validation_0-logloss:0.35823
[2]	validation_0-logloss:0.32542
[3]	validation_0-logloss:0.30659
[4]	validation_0-logloss:0.29494
[5]	validation_0-logloss:0.28545
[6]	validation_0-logloss:0.27571
[7]	validation_0-logloss:0.27044
[8]	validation_0-logloss:0.26432
[9]	validation_0-logloss:0.25727
[10]	validation_0-logloss:0.25240
[11]	validation_0-logloss:0.24832
[12]	validation_0-logloss:0.24631
[13]	validation_0-logloss:0.24189
[14]	validation_0-logloss:0.23631
[15]	validation_0-logloss:0.23514
[16]	validation_0-logloss:0.23290
[17]	validation_0-logloss:0.23238
[18]	validation_0-logloss:0.23130
[19]	validation_0-logloss:0.23017
[20]	validation_0-logloss:0.22913
[21]	validation_0-logloss:0.22707
[22]	validation_0-logloss:0.22441
[23]	validation_0-logloss:0.22248
[24]	validation_0-logloss:0.22008
[25]	validation_0-logloss:0.21877
[26]	validation_0-logloss:0.21851
[27]	validation_0-logloss:0.21868
[28]	validation_0-logloss:0.21835
[29]	validation_0-loglos
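
The validation log loss printed above is what `early_stopping_rounds=5` monitors: training halts once five consecutive rounds pass without a new best score. The sketch below is a simplified, plain-Python model of that rule, not XGBoost's implementation; the function name and the loss values are hypothetical.

```python
def best_iteration_with_patience(losses, patience):
    """Return (best_index, stop_index) for a sequence of validation losses.

    Training stops once `patience` consecutive rounds fail to improve on
    the best loss seen so far; otherwise it runs to the end of `losses`.
    """
    best_index = 0
    for i, loss in enumerate(losses):
        if loss < losses[best_index]:
            best_index = i  # new best score, reset the patience window
        elif i - best_index >= patience:
            return best_index, i  # patience exhausted, stop here
    return best_index, len(losses) - 1


# Hypothetical loss curve (not the values from the log above).
example_losses = [0.40, 0.35, 0.30, 0.31, 0.32, 0.33]
best, stopped = best_iteration_with_patience(example_losses, patience=3)
print(best, stopped)  # 2 5
```

Because the best index is zero-based, the number of boosting rounds worth keeping is `best + 1`.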

### Fit final XGBoost model

Using the best iteration found by early stopping in the initial model, retrain on the combined training and validation data for that many boosting rounds. The test set then serves as the final, unbiased performance check.

In [5]:
# Fit on train + validation data, using the number of rounds up to the best iteration
xgb_model = xgb.XGBClassifier(
    objective="binary:logistic", 
    random_state=42, 
    n_estimators=xgb_eval.get_booster().best_iteration + 1
)

xgb_model.fit(pd.concat([X_train, X_valid]), pd.concat([y_train, y_valid]))

y_pred = xgb_model.predict(X_test)

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('\n')
print('Classification Report')
print(classification_report(y_test, y_pred))

Confusion Matrix
[[468  12]
 [ 33  83]]


Classification Report
              precision    recall  f1-score   support

           0       0.93      0.97      0.95       480
           1       0.87      0.72      0.79       116

    accuracy                           0.92       596
   macro avg       0.90      0.85      0.87       596
weighted avg       0.92      0.92      0.92       596
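
The class-1 row of the classification report can be reproduced by hand from the confusion matrix, which makes it easier to see where the model loses recall. A short worked example using the four counts printed above (rows are actual labels, columns are predicted labels):

```python
# Counts taken from the confusion matrix output above.
tn, fp = 468, 12   # actual 0: predicted 0, predicted 1
fn, tp = 33, 83    # actual 1: predicted 0, predicted 1

precision = tp / (tp + fp)   # share of predicted defaults that were real
recall = tp / (tp + fn)      # share of real defaults that were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"{precision:.2f} {recall:.2f} {f1:.2f} {accuracy:.2f}")  # 0.87 0.72 0.79 0.92
```

The 33 false negatives are what hold recall for `BAD = 1` at 0.72, even though overall accuracy is 0.92.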



### Create necessary files for SAS Model Manager

The code below will use the Python Zip Model Management (pzmm) module to do the following:
* Generate the necessary files SAS Model Manager needs
* Create scoring code
* Load the model into SAS Model Manager

For more details on registering open source models to SAS Model Manager, see:
* [Open Source Models in the SAS Viya Platform](https://medium.com/@cktaylor364/open-source-models-in-the-sas-viya-platform-fd87c880ccab)
* [sasctl pzmm GitHub examples](https://github.com/sassoftware/python-sasctl/tree/master/src/sasctl/pzmm)

### Properties for importing into SAS Model Manager

In [6]:
prefix = 'XGBoost' # Model name
model_desc = "XGBoost model for hmeq" # Model description
project = "HMEQ Models" # Name of project
modeler = input('Enter modeler username')
model   = xgb_model # Model instance
data    = df_hmeq   # Data for model
inputs  = X.columns # Input columns
target  = 'BAD'     # Target variable
target_values = ["0", "1"] # Target values: 0/1 for HMEQ
target_cols   = ["EM_CLASSIFICATION", "EM_EVENTPROBABILITY"] # Model output variables
model_path    = '/workspaces/myfolder/models' # Path to model files

In [None]:
# Write pickle file
pzmm.PickleModel.pickle_trained_model(model_prefix=prefix, trained_model=model, pickle_path=model_path)

# Write inputs SAS Model Manager expects
pzmm.JSONFiles.write_var_json(input_data=data[inputs], is_input=True, json_path=model_path)

# Write outputs SAS Model Manager expects
output_var = pd.DataFrame(columns=target_cols, data=[["A", 0.5]])
pzmm.JSONFiles.write_var_json(output_var, is_input=False, json_path=model_path)

# Write metadata so that SAS Model Manager knows what each file is
pzmm.JSONFiles.write_file_metadata_json(model_prefix=prefix, json_path=model_path)

# Write model properties
pzmm.JSONFiles.write_model_properties_json(
    model_name=prefix,      
    target_variable=target,      # Target variable to make predictions about (BAD in this case)
    target_values=target_values, # Possible values for the target variable (1 or 0 for binary classification of BAD)
    json_path=model_path,        # Where are all the JSON files?
    model_desc=model_desc,       # Describe the model
    model_algorithm="Ensemble",  # What kind of algorithm is it?
    modeler=modeler # Who made the model?
)

Model XGBoost was successfully pickled and saved to /workspaces/myfolder/models/XGBoost.pickle.
inputVar.json was successfully written and saved to /workspaces/myfolder/models/inputVar.json
outputVar.json was successfully written and saved to /workspaces/myfolder/models/outputVar.json
fileMetadata.json was successfully written and saved to /workspaces/myfolder/models/fileMetadata.json
ModelProperties.json was successfully written and saved to /workspaces/myfolder/models/ModelProperties.json
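
Before importing, it can be worth confirming that the files reported above actually landed in the model directory. The helper below is a hypothetical convenience function, not part of sasctl; it checks only the five file names shown in the output above and assumes the pickle file is named after the model prefix.

```python
from pathlib import Path


def missing_model_files(model_dir, prefix):
    """Return the expected file names that are not present in model_dir."""
    expected = [
        f"{prefix}.pickle",
        "inputVar.json",
        "outputVar.json",
        "fileMetadata.json",
        "ModelProperties.json",
    ]
    return [name for name in expected if not (Path(model_dir) / name).is_file()]
```

Calling `missing_model_files(model_path, prefix)` returns an empty list when all five files are present.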


### Start a SAS Viya Session

Before you can register the model, you must connect to your server. Depending on your server setup, you can authenticate in several ways:
- Username/password
- Client ID and secret
- OAuth token

Your administrator can help you figure out the most appropriate authentication method.

For more information, see: 
[sasctl.Session](https://sassoftware.github.io/python-sasctl/api/sasctl.session.html)

In [None]:
sess = Session(
    'https://my-viya-server.com',
    username=input('Enter username'),
    password=getpass.getpass('Enter password'),
    protocol='https', 
    verify_ssl=False
)

### Register the model to SAS Model Manager

`pzmm.ImportModel.import_model()` will do all the heavy lifting:
- Log onto the server
- Create a new project if one has not been created yet
- Zip up all the files
- Import the model and all necessary files into SAS Model Manager

In [None]:
pzmm.ImportModel.import_model(
    model_files    = model_path,    # Where are the model files?
    model_prefix   = prefix,        # What is the model name?
    project        = project,       # What is the project name?
    input_data     = X,             # What does example input data look like?
    predict_method = [xgb_model.predict_proba, [int, int]], # What is the predict method and what does it return?
    overwrite_model= True,          # Overwrite the model if it already exists?
    score_metrics  = target_cols,   # What are the output variables?
    target_values  = target_values, # What are the expected values of the target variable?
    target_index   = 1,             # What is the index of the target value in target_values?
    model_file_name= prefix + ".pickle", # What is the name of the serialized model file?
    missing_values = True           # Does the data include missing values?
)