Copyright © 2023, SAS Institute Inc., Cary, NC, USA.  All Rights Reserved.
SPDX-License-Identifier: Apache-2.0

# USA Housing Data Set: Build and Import a Trained Regression Model into SAS Model Manager

This notebook shows how to build and train a simple Python linear regression model on the USA Housing data set and then import the model into SAS Model Manager (on either SAS Viya 3.5 or SAS Viya 4). Lines of code that must be modified by the user, such as directory paths or the host server, are marked with the comment "_Changes required by user._"

_**Note:** If you download only this notebook and not the rest of the repository, you must also download the USA_Housing.csv file from the data folder in the examples directory. This file is used when executing this notebook example._

Here are the steps shown in this notebook:

1. Import, review, and preprocess data for model training.
2. Build, train, and assess a scikit-learn linear regression model.
3. Serialize the model into a pickle file.
4. Write the metadata JSON files needed for importing into SAS Model Manager.
5. Write a score code Python file for model scoring.
6. Zip the pickle, JSON, and score code files into an archive file.
7. Import the ZIP archive file into SAS Model Manager by using the Session object and the pzmm.ImportModel.import_model function.
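The archiving step amounts to collecting every model file into a single ZIP file. In this notebook `pzmm.ImportModel.import_model` performs it for you; the standard-library sketch below only illustrates the idea. The helper `zip_model_files` and the flat archive layout (all files at the archive root) are hypothetical, and the placeholder files simply reuse the file names this notebook produces.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_model_files(model_dir: Path, archive_name: str) -> Path:
    """Hypothetical helper: zip every file in model_dir into model_dir/<archive_name>.zip.

    Files are stored at the archive root; this layout is an assumption for illustration.
    """
    archive_path = model_dir / f"{archive_name}.zip"
    # Collect the files first and skip the archive itself so it is never added to itself
    files = sorted(p for p in model_dir.iterdir() if p.is_file() and p != archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file_path in files:
            zf.write(file_path, arcname=file_path.name)
    return archive_path


# Usage with placeholder files standing in for the real model artifacts
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    for name in ["LinearRegression.pickle", "inputVar.json", "outputVar.json",
                 "ModelProperties.json", "fileMetadata.json", "score_LinearRegression.py"]:
        (model_dir / name).write_text("placeholder")
    archive = zip_model_files(model_dir, "LinearRegression")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())

print(archived_names)
```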

### Python Package Imports

In [1]:
# Standard Library
from pathlib import Path
import warnings

# Third Party
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Application specific
import sasctl.pzmm as pzmm
from sasctl import Session
from sasctl.services import model_repository as modelRepo

In [2]:
# Global Package Options
pd.options.mode.chained_assignment = None  # default="warn"
plt.rc("font", size=14)
# Ignore FutureWarning messages from pandas about SWAT using a feature that will be deprecated soon
warnings.simplefilter(action="ignore", category=FutureWarning)

### Import and Review Data Set

In [3]:
housing_data = pd.read_csv("data/USA_Housing.csv", sep=",")
housing_data.shape

(5000, 7)

In [4]:
housing_data = housing_data.drop(["Address"], axis=1)
housing_data.head()

| | Avg_Area_Income | Avg_Area_House_Age | Avg_Area_Number_of_Rooms | Avg_Area_Number_of_Bedrooms | Area_Population | Price |
|---|---|---|---|---|---|---|
| 0 | 79545.45857 | 5.682861 | 7.009188 | 4.09 | 23086.8005 | 1059034.0 |
| 1 | 79248.64245 | 6.0029 | 6.730821 | 3.09 | 40173.07217 | 1505891.0 |
| 2 | 61287.06718 | 5.86589 | 8.512727 | 5.13 | 36882.1594 | 1058988.0 |
| 3 | 63345.24005 | 7.188236 | 5.586729 | 3.26 | 34310.24283 | 1260617.0 |
| 4 | 59982.19723 | 5.040555 | 7.839388 | 4.23 | 26354.10947 | 630943.5 |


In [5]:
housing_data.columns

Index(['Avg_Area_Income', 'Avg_Area_House_Age', 'Avg_Area_Number_of_Rooms',
       'Avg_Area_Number_of_Bedrooms', 'Area_Population', 'Price'],
      dtype='object')

### Preprocess Data

In [6]:
# Input (predictor) columns
predictor_columns = ["Avg_Area_Income", "Avg_Area_House_Age", "Avg_Area_Number_of_Rooms",
                     "Avg_Area_Number_of_Bedrooms", "Area_Population"]

# Target
target_column = "Price"
x = housing_data[predictor_columns]
y = housing_data[target_column]

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

# For missing values, impute each column's training-set mean in both splits so that
# no information from the test split leaks into preprocessing
train_means = x_train.mean()
x_train = x_train.fillna(train_means)
x_test = x_test.fillna(train_means)
print(x_test.shape)
print(x_train.shape)

(1500, 5)
(3500, 5)
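Mean imputation only matters if the predictors actually contain missing values, and the fill values should come from the training split alone. The sketch below uses a small made-up frame rather than the housing data: it counts missing values per column with `DataFrame.isna().sum()`, learns the column means on the training split, and applies those same means to both splits.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for x_train / x_test (illustrative values, not the housing data)
train = pd.DataFrame({"income": [50_000.0, 70_000.0, np.nan], "rooms": [5.0, np.nan, 7.0]})
test = pd.DataFrame({"income": [np.nan, 60_000.0], "rooms": [6.5, np.nan]})

# Count missing values per column before imputing
print(train.isna().sum())
print(test.isna().sum())

# Learn the fill values from the training split only (NaNs are skipped: income 60000.0, rooms 6.0)
train_means = train.mean()
train_filled = train.fillna(train_means)
test_filled = test.fillna(train_means)  # test gaps get training means, not test means

print(test_filled)
```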


### Create, Train, and Assess Model

In [7]:
# Linear regression training (the normalize argument was removed in scikit-learn 1.2)
lrm = LinearRegression()
lrm.fit(x_train, y_train)

LinearRegression()
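The `normalize` argument of `LinearRegression` was deprecated in scikit-learn 1.0 and removed in 1.2. For an ordinary least squares fit with an intercept, centering and rescaling the predictors changes the coefficients but not the predictions, so fitting with or without that normalization yields the same R². The numpy sketch below checks this invariance on random data; it solves least squares directly with `np.linalg.lstsq` instead of calling scikit-learn, and the data, sizes, and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 200, 5

# Random predictors on very different scales, plus a noisy linear target
X = rng.normal(size=(n_rows, n_features)) * np.array([1e4, 1.0, 1.0, 1.0, 1e4])
y = X @ rng.normal(size=n_features) + rng.normal(scale=0.1, size=n_rows)


def ols_predict(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Fit OLS with an intercept column via least squares and return in-sample predictions."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


# Subtract each column's mean and divide by its L2 norm (the transformation normalize=True applied)
centered = X - X.mean(axis=0)
X_scaled = centered / np.linalg.norm(centered, axis=0)

pred_raw = ols_predict(X, y)
pred_scaled = ols_predict(X_scaled, y)
print(np.max(np.abs(pred_raw - pred_scaled)))  # floating-point noise relative to the scale of y
```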

In [8]:
# Test predictions and R-squared (coefficient of determination) on the test split
from sklearn import metrics
lrm_predict = lrm.predict(x_test)
r2 = metrics.r2_score(y_test, lrm_predict)
print(f"Linear Regression Model R-squared = {np.round(r2 * 100, 2)}%")

Linear Regression Model R-squared = 91.47%
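The score above is the coefficient of determination R², expressed as a percentage, rather than a classification accuracy. R² = 1 - SS_res / SS_tot compares the model's squared errors with the squared errors of always predicting the mean of the observed values. A minimal pure-Python version, run on a small hand-made example rather than the housing data:

```python
def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
score = r2_score(y_true, y_pred)
print(f"R^2 = {round(score * 100, 2)}%")  # R^2 = 94.86%
```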


### Serialize the Model and Write the Files for SAS Model Manager

In [9]:
# Model name within SAS Model Manager
model_prefix = "LinearRegression"
# Directory location for the model files
zip_folder = Path.cwd() / "data/USAHousingModels/LinearRegression" # Changes required by user
# Output variables expected in SAS Model Manager
score_metrics = ["EM_PREDICTION"]

pzmm.PickleModel.pickle_trained_model(
    model_prefix=model_prefix,
    trained_model=lrm,
    pickle_path=zip_folder
)

Model LinearRegression was successfully pickled and saved to ~\data\USAHousingModels\LinearRegression\LinearRegression.pickle.
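`pickle_trained_model` writes the fitted estimator to `<model_prefix>.pickle`. The standard-library sketch below shows the same serialize/deserialize round trip with `pickle.dump` and `pickle.load`; a plain dict of made-up coefficients stands in for the scikit-learn object, and `predict_row` is a hypothetical helper. Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a fitted linear model: intercept plus one coefficient per predictor
model = {"intercept": 10.0, "coef": [2.0, -1.0, 0.5]}


def predict_row(fitted: dict, row: list[float]) -> float:
    """Hypothetical helper: linear prediction for a single row of predictor values."""
    return fitted["intercept"] + sum(c * x for c, x in zip(fitted["coef"], row))


with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "LinearRegression.pickle"
    # Serialize: write the object to disk in binary mode
    with open(pickle_path, "wb") as f:
        pickle.dump(model, f)
    # Deserialize: read it back (only for files you trust)
    with open(pickle_path, "rb") as f:
        restored = pickle.load(f)

print(predict_row(restored, [1.0, 2.0, 4.0]))  # 10 + 2 - 2 + 2 = 12.0
```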


In [10]:
def write_json_files(data, predict, target, path, prefix):
    # Write input variable mapping to a json file
    pzmm.JSONFiles.write_var_json(input_data=data[predict], is_input=True, json_path=path)

    # Set the output variables (from the global score_metrics list) and write the output variable mapping.
    # The example value 0.5 only indicates the expected output type (float); a regression model has no event threshold.
    output_var = pd.DataFrame(columns=score_metrics, data=[[0.5]])
    pzmm.JSONFiles.write_var_json(output_var, is_input=False, json_path=path)

    # Write model properties to a json file
    pzmm.JSONFiles.write_model_properties_json(
        model_name=prefix, 
        target_variable=target, # Target variable to make predictions about 
        json_path=path, 
        model_desc=f"Description for the {prefix} model.",
        model_algorithm="",
        modeler="sasdemo",
    )
    
    # Write model metadata to a json file so that SAS Model Manager can properly identify all model files
    pzmm.JSONFiles.write_file_metadata_json(model_prefix=prefix, json_path=path)

write_json_files(housing_data, predictor_columns, target_column, zip_folder, model_prefix)

inputVar.json was successfully written and saved to ~\data\USAHousingModels\LinearRegression\inputVar.json
outputVar.json was successfully written and saved to ~\data\USAHousingModels\LinearRegression\outputVar.json
ModelProperties.json was successfully written and saved to ~\data\USAHousingModels\LinearRegression\ModelProperties.json
fileMetadata.json was successfully written and saved to ~\data\USAHousingModels\LinearRegression\fileMetadata.json
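The variable-mapping files tell SAS Model Manager the name and type of each input and output variable. The sketch below builds a comparable list of records from pandas dtypes and writes it with `json.dumps`. The helper `describe_variables`, the field names (`name`, `level`, `type`, `length`), the interval/nominal mapping, and the made-up `Region` column are all assumptions for illustration; check a generated inputVar.json for the exact schema sasctl writes.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def describe_variables(df: pd.DataFrame) -> list[dict]:
    """Hypothetical helper: one metadata record per column (the schema is an assumption)."""
    records = []
    for column in df.columns:
        is_numeric = pd.api.types.is_numeric_dtype(df[column])
        records.append({
            "name": column,
            "level": "interval" if is_numeric else "nominal",
            "type": "decimal" if is_numeric else "string",
            # Assumed lengths: 8 bytes for numbers, the longest value for strings
            "length": 8 if is_numeric else int(df[column].astype(str).str.len().max()),
        })
    return records


# One real column name from this data set plus a made-up string column
sample = pd.DataFrame({"Avg_Area_Income": [79545.45857, 79248.64245], "Region": ["East", "West"]})
variables = describe_variables(sample)

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "inputVar.json"
    json_path.write_text(json.dumps(variables, indent=4))
    reloaded = json.loads(json_path.read_text())

print(reloaded)
```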


In [11]:
import getpass
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
host = "demo.sas.com" # Changes required by user
sess = Session(host, username, password, protocol="http") # For TLS-enabled servers, change protocol value to "https"

In [12]:
model_response = pzmm.ImportModel.import_model(
    model_files=zip_folder, # Where are the model files?
    model_prefix=model_prefix, # What is the model name?
    project="RegressionModelExample", # What is the project name?
    input_data=x, # What does example input data look like?
    predict_method=[lrm.predict, [float]], # What is the predict method and what does it return?
    score_metrics=score_metrics, # What are the output variables?
    overwrite_model=True, # Overwrite the model if it already exists?
    model_file_name=model_prefix + ".pickle", # How was the model file serialized?
    missing_values=True # Does the data include missing values?
)

Model score code was written successfully to ~\data\USAHousingModels\LinearRegression\score_LinearRegression.py and uploaded to SAS Model Manager.
All model files were zipped to ~\data\USAHousingModels\LinearRegression.




A new project named RegressionModelExample was created.
Model was successfully imported into SAS Model Manager as LinearRegression with the following UUID: b9936acb-6668-4b41-906f-171954f74d30.
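`import_model` generated the score code file (score_LinearRegression.py) from the `predict_method` and `score_metrics` arguments, and its exact contents depend on the sasctl version. As a rough illustration only, the sketch below shows the general shape of a per-row scoring function: load the pickled model once, take one argument per input variable, and return the prediction under the `EM_PREDICTION` output name. The names `load_model` and `score_row`, the dict-based model with stored training means, and the mean imputation inside the score function are hypothetical, not a copy of the generated file.

```python
import functools
import math
import pickle
import tempfile
from pathlib import Path


@functools.lru_cache(maxsize=None)
def load_model(pickle_path: str) -> dict:
    """Read and unpickle the model once per path; later calls reuse the cached object."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def score_row(pickle_path, Avg_Area_Income, Avg_Area_House_Age, Avg_Area_Number_of_Rooms,
              Avg_Area_Number_of_Bedrooms, Area_Population):
    """Hypothetical score function: return EM_PREDICTION for one row of inputs."""
    model = load_model(str(pickle_path))
    row = [Avg_Area_Income, Avg_Area_House_Age, Avg_Area_Number_of_Rooms,
           Avg_Area_Number_of_Bedrooms, Area_Population]
    # Replace missing inputs (None or NaN) with the stored training means
    row = [mean if value is None or (isinstance(value, float) and math.isnan(value)) else value
           for value, mean in zip(row, model["means"])]
    EM_PREDICTION = model["intercept"] + sum(c * x for c, x in zip(model["coef"], row))
    return EM_PREDICTION


# Hypothetical model: zero intercept, unit coefficients, and made-up training means
toy_model = {"intercept": 0.0, "coef": [1.0] * 5, "means": [10.0, 20.0, 30.0, 40.0, 50.0]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "LinearRegression.pickle"
    with open(path, "wb") as f:
        pickle.dump(toy_model, f)
    result = score_row(path, 1.0, 2.0, None, 4.0, float("nan"))

print(result)  # 1 + 2 + 30 (imputed) + 4 + 50 (imputed) = 87.0
```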
