### Regression on California Housing Dataset

I have **split** the initial demo into two notebooks:
* a first one, where I train the model (using XGBoost) and save it in JSON format
* this notebook, where the trained model, serialized as JSON, is saved to the Model Catalog and then deployed as a REST service

It requires ADS version >= 2.5.9.

In [1]:
import pandas as pd
import numpy as np

# the dataset used for the example
from sklearn.datasets import fetch_california_housing

from sklearn.model_selection import train_test_split

# the GBM used
import xgboost as xgb

import os
from ads import set_auth
import ads

# needed for the new ADS model deployment features
from ads.model.framework.xgboost_model import XGBoostModel
from ads.common.model_metadata import UseCaseType
from ads.catalog.model import ModelCatalog

In [2]:
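# check we have the minimum ADS version (2.5.9, April 2022), comparing parsed versions rather than strings
from packaging.version import Version
assert Version(ads.__version__) >= Version("2.5.9")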
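Comparing `ads.__version__` as a plain string can give the wrong answer once a version component reaches two digits, because strings are compared character by character. A minimal stdlib-only sketch of the pitfall and of a numeric comparison, assuming simple dotted numeric versions (the helpers `version_tuple` and `meets_minimum` are hypothetical, not part of ADS):

```python
# Plain string comparison orders versions character by character,
# so a later release such as "2.10.0" compares as *smaller* than "2.5.9".
print("2.10.0" >= "2.5.9")  # False


def version_tuple(version: str) -> tuple:
    """Turn a simple dotted version like '2.5.9' into (2, 5, 9).

    Assumes purely numeric components; pre-release tags such as
    '2.6.0rc1' are not handled (packaging.version.Version handles them).
    """
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("2.10.0", "2.5.9"))  # True
print(meets_minimum("2.5.8", "2.5.9"))   # False
```

In practice `packaging.version.Version` is the more robust choice, since it also understands pre-release and post-release suffixes.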

In this initial part, the dataset is loaded and a test set is created, to be used to test the model after deployment.

The XGBoost model is loaded from a JSON file produced by the **train_california_housing_simplified** notebook.

### Load the dataset

In [3]:
# load the dataset
housing = fetch_california_housing(as_frame=True)

orig_df = housing.frame

In [4]:
orig_df.head()

|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


### Some data preparation

In [5]:
# To keep things simple, I use all columns as features except the target (MedHouseVal), Latitude and Longitude

TARGET = "MedHouseVal"
all_cols = list(orig_df.columns)
cols_to_drop = ['Latitude', 'Longitude']

cat_cols = ['HouseAge']

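# note: FEATURES is sorted alphabetically, so its order differs from the DataFrame's column order
FEATURES = sorted(set(all_cols) - {TARGET} - set(cols_to_drop))

# indices of the categorical columns in FEATURES (needed for LightGBM; not used in this XGBoost notebook)
cat_columns_idxs = [i for i, col in enumerate(FEATURES) if col in cat_cols]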

FEATURES

['AveBedrms', 'AveOccup', 'AveRooms', 'HouseAge', 'MedInc', 'Population']
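Because `X` is built as `used_df[FEATURES].values`, the model expects its columns in this sorted order, not in the DataFrame's original order, and any client calling the deployed service has to send them the same way. A stdlib-only sketch of the selection, with the column names copied from the `head()` output above:

```python
# Column names as shown in the orig_df.head() output above
all_cols = ["MedInc", "HouseAge", "AveRooms", "AveBedrms", "Population",
            "AveOccup", "Latitude", "Longitude", "MedHouseVal"]
TARGET = "MedHouseVal"
cols_to_drop = ["Latitude", "Longitude"]
cat_cols = ["HouseAge"]

# Set difference removes the target and the dropped columns; sorted() then
# imposes an alphabetical order, which differs from the DataFrame's own order.
FEATURES = sorted(set(all_cols) - {TARGET} - set(cols_to_drop))

# Position of each categorical column inside the sorted feature list
cat_columns_idxs = [i for i, col in enumerate(FEATURES) if col in cat_cols]

print(FEATURES)
print(cat_columns_idxs)  # [3]
```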

In [6]:
# the only special handling needed: there is one categorical column, HouseAge

# we encode the categorical column as integers starting from zero;
# here this is easy: the minimum value is 1, so we only need to subtract 1
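Subtracting the minimum gives consecutive codes starting at zero only if every integer between the minimum and the maximum actually occurs. A small pure-Python sketch contrasting that shift with a dense encoding (the helpers `shift_to_zero` and `dense_codes` are hypothetical, not part of the original notebook):

```python
def shift_to_zero(values):
    """Subtract the minimum, as done for HouseAge (minimum 1, so subtract 1).

    Codes are consecutive (0..k) only if every integer between min and max occurs.
    """
    lo = min(values)
    return [int(v - lo) for v in values]


def dense_codes(values):
    """Map each distinct value to its rank among the sorted distinct values,
    so codes are always 0..n_distinct-1, even when some values are missing."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


ages = [1.0, 2.0, 52.0, 2.0]
print(shift_to_zero(ages))  # [0, 1, 51, 1]
print(dense_codes(ages))    # [0, 1, 2, 1]
```

In pandas, the dense variant corresponds to `Series.astype("category").cat.codes`; the simple shift used in the notebook has the advantage that the code can be computed from a single value at prediction time without storing a mapping.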

In [7]:
# make a copy before any changes
used_df = orig_df.copy()

used_df['HouseAge'] = used_df['HouseAge'] - 1.

used_df['HouseAge'] = used_df['HouseAge'].astype(int)
used_df['HouseAge'] = used_df['HouseAge'].astype("category")

In [8]:
# let's make a simple train/test split
X = used_df[FEATURES].values
y = used_df[TARGET].values

TEST_SIZE = 0.2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=1234)
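The split shuffles the row indices with a fixed seed, so it is repeatable, and holds out 20% of the rows. A stdlib-only sketch of that idea; the function name is hypothetical and it will not select the same rows as scikit-learn's `train_test_split`, even with the same seed:

```python
import math
import random


def simple_train_test_split(n_samples, test_size=0.2, seed=1234):
    """Return (train_idx, test_idx) for a shuffled, reproducible split.

    Illustrative only: it does not reproduce scikit-learn's shuffling,
    so the same seed does NOT give the same rows as train_test_split.
    """
    n_test = math.ceil(test_size * n_samples)  # round the test share up
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # local RNG with a fixed seed: repeatable
    return indices[n_test:], indices[:n_test]


# The California Housing dataset has 20,640 rows
train_idx, test_idx = simple_train_test_split(20640, test_size=0.2)
print(len(train_idx), len(test_idx))  # 16512 4128
```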

### Load the model saved in JSON format

In [9]:
MODEL_FILE_NAME = 'housing.json'

model = xgb.Booster()

model.load_model(MODEL_FILE_NAME)

### Prepare for Model Catalog

In [10]:
PATH_ARTEFACT = "./model-files"

# create the artifact directory if it does not exist yet
os.makedirs(PATH_ARTEFACT, exist_ok=True)

In [11]:
set_auth(auth='resource_principal')

xgb_model = XGBoostModel(estimator=model, artifact_dir=PATH_ARTEFACT)

In [12]:
# 1. prepare
xgb_model.prepare(
    inference_conda_env="generalml_p37_cpu_v1",
    training_conda_env="generalml_p37_cpu_v1",
    use_case_type=UseCaseType.REGRESSION,
    X_sample=X_test,
    y_sample=y_test,
    force_overwrite=True
)

In [13]:
# note: prepare() serializes the XGBoost model as a JSON file (model.json) in model-files
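After `prepare()`, the artifact directory should contain the serialized model plus the generated files. A stdlib-only sketch of a quick check; the expected file names are taken from the `introspect()` output in this notebook, and `missing_artifacts` is a hypothetical helper, not an ADS function:

```python
import os

# File names taken from the introspect() output in this notebook
EXPECTED_ARTIFACTS = {
    "model.json", "score.py", "runtime.yaml",
    "input_schema.json", "output_schema.json", "test_json_output.json",
}


def missing_artifacts(artifact_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that are not present in artifact_dir."""
    present = set(os.listdir(artifact_dir)) if os.path.isdir(artifact_dir) else set()
    return sorted(expected - present)


# An empty list means every expected file is in place
print(missing_artifacts("./model-files"))
```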

In [14]:
# 2. verify
print(xgb_model.verify(X_test[:10]))

# compare with expected values
print()
print(f"Expected: {y_test[:10]}")

Start loading model.json from model directory /home/datascience/data-science-bp/model-files ...
Model is successfully loaded.
{'prediction': [4.162364959716797, 3.1165318489074707, 3.8856496810913086, 1.3761677742004395, 3.2547011375427246, 1.7698906660079956, 1.7593719959259033, 3.1433591842651367, 0.922572135925293, 0.7925826907157898]}

Expected: [5.    2.939 4.125 1.576 3.041 1.    2.187 2.581 0.714 0.838]
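Comparing predictions with expected values by eye is hard; two standard regression metrics summarize the gap. A stdlib-only sketch computing mean absolute error and root mean squared error on the ten rows printed above (values copied from the output):

```python
import math

# Values copied from the verify() output and y_test[:10] above
predicted = [4.162364959716797, 3.1165318489074707, 3.8856496810913086,
             1.3761677742004395, 3.2547011375427246, 1.7698906660079956,
             1.7593719959259033, 3.1433591842651367, 0.922572135925293,
             0.7925826907157898]
expected = [5.0, 2.939, 4.125, 1.576, 3.041, 1.0, 2.187, 2.581, 0.714, 0.838]

errors = [p - e for p, e in zip(predicted, expected)]
mae = sum(abs(err) for err in errors) / len(errors)               # mean absolute error
rmse = math.sqrt(sum(err * err for err in errors) / len(errors))  # root mean squared error

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

On these ten rows this gives an MAE of about 0.37 and an RMSE of about 0.45, in units of the target (hundreds of thousands of dollars). Ten rows are far too few to judge the model; the same computation on the full `X_test` gives a more meaningful picture.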


In [15]:
# we can check the list of steps
xgb_model.summary_status()

| Step | Status | Details | Actions Needed |
|---|---|---|---|
| initiate | Done | Initiated the model | |
| prepare() | Done | Generated runtime.yaml | |
| prepare() | Done | Generated score.py | |
| prepare() | Done | Serialized model | |
| prepare() | Done | Populated metadata(Custom, Taxonomy and Provenance) | |
| verify() | Done | Local tested .predict from score.py | |
| save() | Available | Conducted Introspect Test | |
| save() | Available | Uploaded artifact to model catalog | |
| deploy() | Not Available | Deployed the model | |
| predict() | Not Available | Called deployment predict endpoint | |


In [16]:
# 3. introspect to do some checks
xgb_model.introspect()

['model.json', 'output_schema.json', 'input_schema.json', 'runtime.yaml', 'test_json_output.json', 'score.py']


|   | Test key | Test name | Result | Message |
|---|---|---|---|---|
| 0 | runtime_env_path | Check that field MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is set | Passed | |
| 1 | runtime_env_python | Check that field MODEL_DEPLOYMENT.INFERENCE_PYTHON_VERSION is set to a value of 3.6 or higher | Passed | |
| 2 | runtime_path_exist | Check that the file path in MODEL_DEPLOYMENT.INFERENCE_ENV_PATH is correct. | Passed | |
| 3 | runtime_version | Check that field MODEL_ARTIFACT_VERSION is set to 3.0 | Passed | |
| 4 | runtime_yaml | Check that the file "runtime.yaml" exists and is in the top level directory of the artifact directory | Passed | |
| 5 | score_load_model | Check that load_model() is defined | Passed | |
| 6 | score_predict | Check that predict() is defined | Passed | |
| 7 | score_predict_arg | Check that all other arguments in predict() are optional and have default values | Passed | |
| 8 | score_predict_data | Check that the only required argument for predict() is named "data" | Passed | |
| 9 | score_py | Check that the file "score.py" exists and is in the top level directory of the artifact directory | Passed | |
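The `score_predict_data` and `score_predict_arg` checks concern the signature of `predict()` in `score.py`. A sketch of how such a check can be written with Python's `inspect` module; `check_predict_signature` is a hypothetical re-implementation for illustration, not the code ADS uses for introspection:

```python
import inspect


def check_predict_signature(predict_fn):
    """Hypothetical version of two checks listed above (not ADS code):
    the only required argument is named 'data', and all others have defaults."""
    params = inspect.signature(predict_fn).parameters.values()
    required = [
        p.name for p in params
        if p.default is inspect.Parameter.empty
        and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    ]
    return required == ["data"]


# A predict() shaped the way score.py is expected to define it
def predict(data, model=None):
    return {"prediction": []}


# A predict() that would fail: wrong name and an extra required argument
def bad_predict(x, model):
    return {"prediction": []}


print(check_predict_signature(predict))      # True
print(check_predict_signature(bad_predict))  # False
```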


In [17]:
# all introspection checks passed, so we can proceed

### Save the Model to the Model Catalog

In [18]:
# 4. after making any needed changes to score.py, save the model to the Model Catalog
model_id = xgb_model.save(display_name="cal_housing_new2", description="new way of model deployment")

Start loading model.json from model directory /home/datascience/data-science-bp/model-files ...
Model is successfully loaded.
['model.json', 'output_schema.json', 'input_schema.json', 'runtime.yaml', 'test_json_output.json', 'score.py']


artifact:/tmp/saved_model_898864aa-3a41-4f6f-9d80-e9511b6c2c73.zip


In [19]:
# check the status again
xgb_model.summary_status()

| Step | Status | Details | Actions Needed |
|---|---|---|---|
| initiate | Done | Initiated the model | |
| prepare() | Done | Generated runtime.yaml | |
| prepare() | Done | Generated score.py | |
| prepare() | Done | Serialized model | |
| prepare() | Done | Populated metadata(Custom, Taxonomy and Provenance) | |
| verify() | Done | Local tested .predict from score.py | |
| save() | Done | Conducted Introspect Test | |
| save() | Done | Uploaded artifact to model catalog | |
| deploy() | Available | Deployed the model | |
| predict() | Not Available | Called deployment predict endpoint | |


### Deploy the model as a REST service

In [20]:
set_auth(auth='resource_principal')

# you need to specify the instance shape, the log group OCID, etc. (this is easier from the UI)
xgb_model_deployment = xgb_model.deploy(display_name="cal_xgb_deploy1",
                                        deployment_instance_count=1,
                                        deployment_instance_shape='VM.Standard2.2',
                                        deployment_bandwidth_mbps=10,
                                        # attaches logging to the REST service; the log group OCID can be copied from the UI
                                        deployment_log_group_id="ocid1.loggroup.oc1.eu-frankfurt-1.amaaaaaangencdya63i3qhao4bjx754lb3m2jpekev5oc55p5ebjvykbtgya"
                                       )

### Test the deployed REST service

In [25]:
xgb_model.predict(X_test[:10])

{'prediction': [4.162364959716797,
  3.1165318489074707,
  3.8856496810913086,
  1.3761677742004395,
  3.2547011375427246,
  1.7698906660079956,
  1.7593719959259033,
  3.1433591842651367,
  0.922572135925293,
  0.7925826907157898]}
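A quick way to confirm that the deployed service behaves like the local artifact is to compare its output with the `verify()` predictions from earlier; here they are identical. A minimal sketch with values copied from the two outputs (the helper `same_predictions` and its tolerance are assumptions, chosen to allow for float round-tripping through JSON):

```python
import math

# Predictions from the local verify() call and from the deployed endpoint (predict()),
# copied from the outputs in this notebook
local = [4.162364959716797, 3.1165318489074707, 3.8856496810913086,
         1.3761677742004395, 3.2547011375427246, 1.7698906660079956,
         1.7593719959259033, 3.1433591842651367, 0.922572135925293,
         0.7925826907157898]
remote = [4.162364959716797, 3.1165318489074707, 3.8856496810913086,
          1.3761677742004395, 3.2547011375427246, 1.7698906660079956,
          1.7593719959259033, 3.1433591842651367, 0.922572135925293,
          0.7925826907157898]


def same_predictions(a, b, rel_tol=1e-6):
    """True if both lists have the same length and every pair matches within rel_tol."""
    return len(a) == len(b) and all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


print(same_predictions(local, remote))  # True
```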

### Clean up the deployed REST service and the model in the Model Catalog

Careful: the next cell permanently deletes the model deployment and removes the model from the Model Catalog. Are you sure?

In [27]:
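# delete the model deployment and wait until the operation completes
xgb_model.delete_deployment(wait_for_completion=True)

# delete the model from the Model Catalog (the compartment OCID comes from the notebook session environment)
ModelCatalog(compartment_id=os.environ['NB_SESSION_COMPARTMENT_OCID']).delete_model(model_id)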

True