Copyright © 2020, SAS Institute Inc., Cary, NC, USA.  All Rights Reserved.
SPDX-License-Identifier: Apache-2.0

# Build and Import a Trained Model into SAS Open Model Manager or SAS Model Manager

This notebook shows how to build and train a Python model and then import it into SAS Open Model Manager or SAS Model Manager. Lines of code that you must modify, such as directory paths, are marked with the comment "_Changes required by user._".

_**Note:** If you download only this notebook and not the rest of the repository, you must also download the hmeq.csv, hmeqPrediction.csv, and dmcas_fitstat.csv files from the [/samples/Python_Models/DTree_sklearn_PyPickleModel/Data](../samples/Python_Models/DTree_sklearn_PyPickleModel/Data) directory. These files are used when executing this notebook example._

Here are the steps:

1. Build and train a model.
2. Serialize the model into a pickle file in the model directory.
3. Write the JSON files that are associated with the trained model. Also, write the statistics JSON files using one of the following options:
   - (a) Generate Fit Statistics from user-defined input.
   - (b) Calculate Fit Statistics, ROC curve, and Lift information from data.
4. Write a score code Python file for model scoring in SAS Open Model Manager or SAS Model Manager.
5. Zip the pickle, JSON, and score code files into an archive file.
6. Import the ZIP archive file to SAS Open Model Manager or SAS Model Manager via an API call.

### Step 1: Build and Train a Model

In [None]:
from pathlib import Path
import pandas as pd

import sklearn.tree as tree
from sklearn.model_selection import train_test_split

In [None]:
dataFolder = Path.cwd() / '../samples/Python_Models/DTree_sklearn_PyPickleModel/Data/' # Changes required by user.
zipFolder = Path.cwd() / '../samples/Python_Models/DTree_sklearn_PyPickleModel/Model/' # Changes required by user.
modelPrefix = 'hmeqClassTree'

In [None]:
yName = 'BAD'
catName = ['JOB', 'REASON']
intName = ['CLAGE', 'CLNO', 'DEBTINC', 'DELINQ', 'DEROG', 'NINQ', 'YOJ']

inputData = pd.read_csv((Path(dataFolder) / 'hmeq.csv'), sep=',',
                        usecols=[yName]+catName+intName)

In [None]:
useColumn = [yName]
useColumn.extend(catName + intName)
inputData = inputData[useColumn].dropna()

xTrain, xTest, yTrain, yTest = train_test_split(inputData, inputData[yName],
                                                test_size=0.2, random_state=42)
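
As a quick check of what the split above does, the following sketch runs the same `train_test_split` call on a ten-row toy frame (the values are placeholders, not HMEQ data). With `test_size=0.2`, two of the ten rows go to the test partition, and the feature and target splits keep matching indexes.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the HMEQ data; placeholder values only.
toy = pd.DataFrame({'BAD': [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
                    'CLAGE': range(10)})

# Same call pattern as above: 20% of the rows go to the test partition.
xTr, xTe, yTr, yTe = train_test_split(toy, toy['BAD'],
                                      test_size=0.2, random_state=42)

print(len(xTr), len(xTe))           # 8 2
# Features and target are shuffled together, so their indexes line up.
print(xTr.index.equals(yTr.index))  # True
```

Note that the notebook passes all of `inputData`, including `BAD`, as the feature frame; the training cell that follows selects only `catName + intName`, so the target column never reaches the model.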

In [None]:
model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=5,
                                    min_samples_split=20,
                                    min_samples_leaf=10,
                                    random_state=42)
print(model)

In [None]:
x = pd.get_dummies(xTrain[catName].astype('category'))
x = x.join(xTrain[intName])
y = yTrain.astype('category')
trainedModel = model.fit(x, y)
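
The tree is not trained on the raw `JOB` and `REASON` strings: `pd.get_dummies` expands each categorical level into its own indicator column. A minimal sketch on three illustrative rows shows the resulting column names.

```python
import pandas as pd

# Illustrative rows only; JOB and REASON are the notebook's categorical predictors.
raw = pd.DataFrame({'JOB': ['Mgr', 'Office', 'Mgr'],
                    'REASON': ['HomeImp', 'DebtCon', 'DebtCon'],
                    'YOJ': [10.0, 7.0, 3.0]})

# One indicator column per category level, then the interval column is joined back.
encoded = pd.get_dummies(raw[['JOB', 'REASON']].astype('category'))
encoded = encoded.join(raw[['YOJ']])
print(list(encoded.columns))
# ['JOB_Mgr', 'JOB_Office', 'REASON_DebtCon', 'REASON_HomeImp', 'YOJ']
```

Because the model is fit on these expanded columns, any scoring code has to rebuild the same columns in the same order. A level that never appears in the training data gets no column at all.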

In [None]:
yCategory = y.cat.categories
outputVar = pd.DataFrame(columns=['EM_EVENTPROBABILITY', 'EM_CLASSIFICATION'])
outputVar['EM_CLASSIFICATION'] = yCategory.astype('str')
outputVar['EM_EVENTPROBABILITY'] = 0.5
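
The two output variables declared above correspond to standard scikit-learn outputs. The sketch below uses illustrative data with a single predictor. It assumes the event level is the second target category (as `targetEvent=yCategory[1]` in Step 3 implies), takes `EM_EVENTPROBABILITY` from the matching `predict_proba` column, and takes `EM_CLASSIFICATION` from `predict`. It illustrates the mapping only and is not the score code that pzmm generates.

```python
import pandas as pd
import sklearn.tree as tree

# Illustrative training data: one interval predictor and a 0/1 target.
X = pd.DataFrame({'DEBTINC': [20.0, 25.0, 30.0, 45.0, 50.0, 55.0]})
y = pd.Series([0, 0, 0, 1, 1, 1]).astype('category')

clf = tree.DecisionTreeClassifier(max_depth=1, random_state=42).fit(X, y)

# predict_proba columns follow clf.classes_; the event is assumed to be
# the second category, mirroring targetEvent=yCategory[1].
eventIndex = list(clf.classes_).index(y.cat.categories[1])

newRows = pd.DataFrame({'DEBTINC': [22.0, 52.0]})
scored = pd.DataFrame({
    'EM_EVENTPROBABILITY': clf.predict_proba(newRows)[:, eventIndex],
    'EM_CLASSIFICATION': clf.predict(newRows).astype(str),
})
print(scored)
```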

### Step 2: Serialize a Model Into a Pickle File

In [None]:
import pzmm

In [None]:
pzmm.PickleModel.pickleTrainedModel(trainedModel, modelPrefix, zipFolder)
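
This step stores the fitted estimator in Python's pickle format. The round trip below is a minimal sketch that uses the standard-library `pickle` module as a stand-in for `pickleTrainedModel`. The `hmeqClassTree.pickle` file name assumes the `<modelPrefix>.pickle` naming that Step 4 passes to `writeScoreCode`.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
import sklearn.tree as tree

# Tiny illustrative model.
X = pd.DataFrame({'YOJ': [1.0, 2.0, 8.0, 9.0]})
y = [0, 0, 1, 1]
fitted = tree.DecisionTreeClassifier(random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    # Assumed naming: <modelPrefix>.pickle.
    picklePath = Path(tmp) / 'hmeqClassTree.pickle'
    with open(picklePath, 'wb') as f:
        pickle.dump(fitted, f)
    with open(picklePath, 'rb') as f:
        restored = pickle.load(f)

# The restored estimator reproduces the original predictions.
same = (restored.predict(X) == fitted.predict(X)).all()
print(same)  # True
```

Unpickling requires a compatible scikit-learn version in the scoring environment, because scikit-learn does not guarantee that pickles load across versions.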

### Step 3: Write JSON Model Files

In [None]:
JSONFiles = pzmm.JSONFiles()
JSONFiles.writeVarJSON(inputData[catName+intName], isInput=True, jPath=zipFolder)

JSONFiles.writeVarJSON(outputVar, isInput=False, jPath=zipFolder)

modelName = 'Home Equity Loan Classification Tree'
JSONFiles.writeModelPropertiesJSON(modelName=modelName,
                                   modelDesc='',
                                   targetVariable=yName,
                                   modelType='tree',
                                   modelPredictors=(catName + intName),
                                   targetEvent=yCategory[1].astype('str'),
                                   numTargetCategories=len(yCategory),
                                   eventProbVar='EM_EVENTPROBABILITY',
                                   jPath=zipFolder,
                                   modeler='sasdemo')

JSONFiles.writeFileMetadataJSON(modelPrefix, jPath=zipFolder)
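
`writeVarJSON` builds variable metadata from the columns of the DataFrame it receives. The sketch below is a hypothetical illustration of that idea only. The `describeVariables` helper and its `name`/`type`/`level` vocabulary are assumptions and do not reproduce the exact schema that pzmm writes.

```python
import json

import pandas as pd


def describeVariables(df):
    """Hypothetical sketch: derive per-column metadata from pandas dtypes."""
    described = []
    for name, dtype in df.dtypes.items():
        isNumeric = pd.api.types.is_numeric_dtype(dtype)
        described.append({
            'name': name,
            # Assumed vocabulary: numeric columns are interval decimals,
            # everything else is a nominal string.
            'type': 'decimal' if isNumeric else 'string',
            'level': 'interval' if isNumeric else 'nominal',
        })
    return described


sample = pd.DataFrame({'JOB': ['Mgr', 'Office'], 'CLAGE': [94.4, 121.8]})
print(json.dumps(describeVariables(sample), indent=2))
```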

In [None]:
# (a) Writes Fit Statistics to dmcas_fitstat.json file from user-defined input.
# This cell can be skipped if calculating statistics automatically from data.
fitStatTuples = [('GAMMA', 1.65412, 'TRAIN'),
                 ('NObs', 176, 'TEST'),
                 ('MCLL', .196882, 'VALIDATE')]
csvPath = dataFolder / 'dmcas_fitstat.csv' # Changes required by user.
JSONFiles = pzmm.JSONFiles()
JSONFiles.writeBaseFitStat(csvPath=csvPath,
                           jPath=zipFolder,
                           userInput=True,
                           tupleList=fitStatTuples)
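
Each user-defined tuple reads as (statistic name, value, data partition). A hypothetical helper (`groupByPartition`, not part of pzmm) that groups the tuples by partition and rejects unknown partition names might look like this:

```python
from collections import defaultdict

# (statistic name, value, data partition), as in the cell above.
fitStatTuples = [('GAMMA', 1.65412, 'TRAIN'),
                 ('NObs', 176, 'TEST'),
                 ('MCLL', .196882, 'VALIDATE')]


def groupByPartition(tuples):
    """Hypothetical helper: group statistics by partition, checking role names."""
    validRoles = {'TRAIN', 'TEST', 'VALIDATE'}
    grouped = defaultdict(dict)
    for statName, value, role in tuples:
        if role not in validRoles:
            raise ValueError(f'Unknown partition {role!r} for {statName}')
        grouped[role][statName] = value
    return dict(grouped)


print(groupByPartition(fitStatTuples))
# {'TRAIN': {'GAMMA': 1.65412}, 'TEST': {'NObs': 176}, 'VALIDATE': {'MCLL': 0.196882}}
```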

In [None]:
# (b) Calculates Fit Statistics, ROC curve and Lift information from data to create the relevant JSON files.
# This cell can be skipped if statistics were defined by the user.
targetName = 'BAD'
targetValue = 1
csvPath = dataFolder / 'hmeqPrediction.csv'
df = pd.read_csv(csvPath)
yTrainActual = df.yActual.to_list()
yTrainPredict = df.yPredict.to_list()
data = [(None, None),
        (yTrainActual, yTrainPredict),
        (None, None)]
JSONFiles = pzmm.JSONFiles()
JSONFiles.calculateFitStat(data, zipFolder)
JSONFiles.generateROCStat(data, targetName, zipFolder)
JSONFiles.generateLiftStat(data, targetName, targetValue, zipFolder)
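
Option (b) derives statistics from actual and predicted values. The sketch below uses scikit-learn's `roc_curve` and `roc_auc_score` plus NumPy as a stand-in for pzmm's calculations, on illustrative data. It assumes that the predicted values are event probabilities, a 0.5 classification cutoff, and one common definition of cumulative lift (event rate among the top-scored rows divided by the overall event rate). pzmm's exact definitions may differ.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative actual targets and predicted event probabilities.
yActual = np.array([0, 0, 1, 1, 0, 1, 0, 1])
yProb = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])

# Misclassification rate at an assumed 0.5 cutoff.
misclass = np.mean((yProb >= 0.5).astype(int) != yActual)

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(yActual, yProb)
auc = roc_auc_score(yActual, yProb)

# Cumulative lift: sort by descending probability, then compare the running
# event rate with the overall event rate.
order = np.argsort(-yProb)
cumEvents = np.cumsum(yActual[order])
depth = np.arange(1, len(yActual) + 1)
cumLift = (cumEvents / depth) / yActual.mean()

print(misclass, auc)  # 0.25 0.875
print(cumLift)
```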

### Step 4: Generate Score Code

In [None]:
ScoreCode = pzmm.ScoreCode()

df = pd.read_csv(dataFolder / 'hmeq.csv') # Changes required by user.
inputDF = df.drop(['BAD', 'LOAN', 'MORTDUE', 'VALUE'], axis=1)
targetDF = df['BAD']
ScoreCode.writeScoreCode(inputDF, targetDF, modelPrefix,
                         '{}.predict({})', modelPrefix + '.pickle',
                         pyPath=zipFolder)
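
The score code has to reproduce the training-time encoding before it calls the model. The following is a hypothetical, simplified score function (`scoreModel`, `trainColumns`, and their arguments are illustrative; this is not the file that `writeScoreCode` generates). It unpickles a model, one-hot encodes a single input row, reindexes the row to the training columns, and returns the event probability and predicted class, assuming the event is class 1. It also shows how a `'{}.predict({})'` template expands with `str.format`. The names pzmm actually substitutes are not shown here.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
import sklearn.tree as tree

# Train a tiny illustrative model on dummy-encoded inputs, as in Step 1.
train = pd.DataFrame({'JOB': ['Mgr', 'Office', 'Mgr', 'Office'],
                      'YOJ': [2.0, 9.0, 3.0, 8.0]})
target = pd.Series([1, 0, 1, 0])
trainX = pd.get_dummies(train[['JOB']].astype('category')).join(train[['YOJ']])
trainColumns = list(trainX.columns)  # column layout the model expects
fitted = tree.DecisionTreeClassifier(random_state=42).fit(trainX, target)

# A predict template is an ordinary format string filled with two names.
print('{}.predict({})'.format('model', 'inputArray'))  # model.predict(inputArray)


def scoreModel(picklePath, JOB, YOJ):
    """Hypothetical score function: unpickle, encode one row, return outputs."""
    with open(picklePath, 'rb') as f:
        model = pickle.load(f)
    row = pd.DataFrame({'JOB': [JOB], 'YOJ': [YOJ]})
    encoded = pd.get_dummies(row[['JOB']]).join(row[['YOJ']])
    # Match the training columns exactly; levels absent from this row become 0.
    encoded = encoded.reindex(columns=trainColumns, fill_value=0)
    eventProb = model.predict_proba(encoded)[0, 1]  # column 1 = class 1
    return float(eventProb), str(model.predict(encoded)[0])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'hmeqClassTree.pickle'
    with open(path, 'wb') as f:
        pickle.dump(fitted, f)
    print(scoreModel(path, 'Mgr', 2.5))  # (1.0, '1')
```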

### Step 5: Zip Model and Relevant Files

In [None]:
pzmm.ZipModel.zipFiles(fileDir=zipFolder, modelPrefix=modelPrefix)
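
Conceptually, this step bundles the files in the model directory into a single archive. The sketch below is a hypothetical standard-library stand-in (`zipModelFiles` is not pzmm's function). It assumes the archive is named `<modelPrefix>.zip` inside the same directory, as the `zPath` in Step 6 suggests. The file names are placeholders.

```python
import tempfile
import zipfile
from pathlib import Path


def zipModelFiles(fileDir, modelPrefix):
    """Hypothetical stand-in: archive every file in fileDir into <modelPrefix>.zip."""
    fileDir = Path(fileDir)
    zipPath = fileDir / (modelPrefix + '.zip')
    # Collect the file list first, skipping any earlier archive, so the
    # archive never includes itself.
    members = [p for p in sorted(fileDir.iterdir()) if p.is_file() and p != zipPath]
    with zipfile.ZipFile(zipPath, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for p in members:
            zf.write(p, arcname=p.name)  # store flat, without directory prefixes
    return zipPath


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder files standing in for the pickle, JSON, and score code files.
    for name in ['hmeqClassTree.pickle', 'inputVar.json', 'hmeqClassTreeScore.py']:
        (Path(tmp) / name).write_text('placeholder')
    archive = zipModelFiles(tmp, 'hmeqClassTree')
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
    print(names)
```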

### Step 6: Import Model into SAS Open Model Manager

In [None]:
host = 'myserver.com' # Changes required by user.
username = 'myusername' # Changes required by user.
password = 'mypassword' # Changes required by user.
ModelImport = pzmm.ModelImport(host)

In [None]:
zPath = Path(zipFolder) / (modelPrefix + '.zip')
projectID = '00000a00-0000-0000-0aaa-0000000aa0a0' # Changes required by user.
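try:
    # Pass credentials if username and password are defined above.
    ModelImport.importModel(modelPrefix, projectID=projectID,
                            zPath=zPath, username=username, password=password)
except NameError:
    # Otherwise, import without explicit credentials.
    ModelImport.importModel(modelPrefix, projectID=projectID, zPath=zPath)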
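
The `try`/`except NameError` fallback above depends on whether `username` and `password` were defined. An alternative sketch keeps credentials out of the notebook by reading them from environment variables. The `SAS_USERNAME` and `SAS_PASSWORD` variable names and the `buildImportKwargs` helper are assumptions.

```python
import os


def buildImportKwargs(projectID, zPath, env=None):
    """Hypothetical helper: include credentials only when both are available."""
    env = os.environ if env is None else env
    kwargs = {'projectID': projectID, 'zPath': zPath}
    username = env.get('SAS_USERNAME')  # assumed environment variable names
    password = env.get('SAS_PASSWORD')
    if username and password:
        kwargs.update(username=username, password=password)
    return kwargs


print(buildImportKwargs('00000a00-0000-0000-0aaa-0000000aa0a0',
                        'hmeqClassTree.zip', env={}))
```

The resulting dictionary could then be passed as `ModelImport.importModel(modelPrefix, **kwargs)`, which uses the same keyword arguments as the cell above.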