# Machine Learning Model Serving

I now have a working Neural Network classifier. 

The next step is to save this model to the [model catalog](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/manage-models.htm).


In [15]:
import ads
import json
import logging
import oci
import os
import random
import shutil
import string
import tempfile
import uuid
import warnings


from ads.catalog.model import ModelCatalog
from ads.common.model import ADSModel
from ads.dataset.factory import DatasetFactory
from oci.data_science import models
from ads.model.deployment import ModelDeployer, ModelDeploymentProperties
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)
warnings.filterwarnings('ignore')
ads.set_documentation_mode(False)
logging.getLogger('ads').setLevel(level=logging.ERROR)
logging.getLogger('ADS').setLevel(level=logging.ERROR)
logging.getLogger('ODSC-ModelDeployment').setLevel(level=logging.ERROR)

In [3]:
import pickle

model_filepath='./models/{}.pkl'.format('neural_network_classifier')

# Load the pickled model from disk
print('Loading model ...\n    MODEL: {}'.format(model_filepath))
with open(model_filepath, 'rb') as f:
    loaded_model = pickle.load(f)
# model = joblib.load(model_filepath)


Loading model ...
    MODEL: ./models/neural_network_classifier.pkl


## Saving the Model in the Data Science Model Catalog 

To save the model in the catalog we are going to use the `ads` library and its `prepare_generic_model()` function, which is probably the easiest way to save a model to the catalog. The first step is to create a temporary local directory to hold the model artifact files:

In [9]:
import warnings
warnings.filterwarnings('ignore')
import ads

# Using resource principal to authenticate when using the model catalog:
ads.set_auth(auth='resource_principal')
compartment_id = os.environ['NB_SESSION_COMPARTMENT_OCID']
project_id = os.environ['PROJECT_OCID']

In [10]:
from ads.common.model_artifact import ModelArtifact
from ads.common.model_export_util import prepare_generic_model
import os

# Replace with your own path:
path_to_rf_artifact = "./ads"
os.makedirs(path_to_rf_artifact, exist_ok=True)
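
The cell above uses a fixed `./ads` directory. If a uniquely named scratch directory is preferred, one possible alternative is the standard library's `tempfile` module. This is only a sketch: the `model_artifact_` prefix and the `artifact_dir` name are illustrative choices, not something ADS requires.

```python
import os
import shutil
import tempfile

# Create a uniquely named scratch directory for the artifact files.
# The "model_artifact_" prefix is an arbitrary, illustrative choice.
artifact_dir = tempfile.mkdtemp(prefix="model_artifact_")
created = os.path.isdir(artifact_dir)
print(created)

# ... write func.py, score.py, requirements.txt and the model file here ...

# Remove the directory once the artifact has been saved to the catalog.
shutil.rmtree(artifact_dir)
```

The trade-off is that a temporary path changes on every run, so it has to be kept in a variable rather than hard-coded in later cells.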

In [13]:
# Build the model and convert it to an ADSModel object
model_ads = ADSModel.from_estimator(loaded_model)

In [16]:
artifact = prepare_generic_model(path_to_rf_artifact, force_overwrite=True, data_science_env=True)


In [17]:
from joblib import dump

dump(loaded_model, os.path.join(path_to_rf_artifact, "nnc.joblib"))

['./ads/nnc.joblib']

Now that we have a serialized model object in our artifact directory, the next step is to modify `func.py`, which defines the handler function (`handler()`) that Oracle Functions calls on each invocation.

### Adding Loggers to handler()

In the cell below I wrote a new version of `func.py`. Executing this cell will overwrite the template ADS provides as part of the model artifact. 

You should note a couple of differences with respect to the template. First, I import the Python `logging` library and define two loggers: `model-prediction` and `model-input-features`. I use them to capture the model predictions and the model input features for each call made to the Function, which is what I need to monitor how the prediction and feature distributions change over time. Those log entries are captured and stored in the [Logging service](https://docs.cloud.oracle.com/en-us/iaas/Content/Logging/Concepts/loggingoverview.htm#loggingoverview). Second, I added some data transformations in `handler()`. You could achieve a similar outcome by adding those transformations to the body of `predict()` in `score.py`.
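
To see why two named loggers are convenient, here is a small standalone sketch of the same logging format. It writes to an in-memory buffer instead of the default stderr stream so the result can be inspected directly, and the feature name `grid` and the values are made up for illustration.

```python
import io
import logging

# Send log records to an in-memory buffer so the example is self-contained;
# with logging.basicConfig() the records would go to stderr by default.
buffer = io.StringIO()
stream_handler = logging.StreamHandler(buffer)
stream_handler.setFormatter(
    logging.Formatter("%(name)s - %(levelname)s - %(message)s")
)

logger_pred = logging.getLogger("model-prediction")
logger_input = logging.getLogger("model-input-features")
for logger in (logger_pred, logger_input):
    logger.setLevel(logging.INFO)
    logger.addHandler(stream_handler)
    logger.propagate = False  # avoid duplicate lines from the root logger

# Hypothetical feature vector and prediction, for illustration only.
logger_input.info({"grid": [1, 2]})
logger_pred.info({"prediction": [1, 0]})

log_lines = buffer.getvalue().splitlines()
print(log_lines)
```

Because every line starts with the logger name, the input-feature entries and the prediction entries can later be separated with a simple prefix filter.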

In [18]:
%%writefile {path_to_rf_artifact}/func.py

import io
import json

from fdk import response
import sys
sys.path.append('/function')
import score
import pandas as pd
model = score.load_model()

# Importing and configuring logging: 
import logging
logging.basicConfig(format='%(name)s - %(levelname)s - %(message)s', level=logging.INFO)

# configuring logging: 
# For model predictions: 
logger_pred = logging.getLogger('model-prediction')
logger_pred.setLevel(logging.INFO)
# For the input feature vector: 
logger_input = logging.getLogger('model-input-features')
logger_input.setLevel(logging.INFO)

def handler(ctx, data: io.BytesIO=None):
    try:
        # The request body has the form {"input": "<JSON-encoded feature table>"}
        raw_input = json.loads(data.getvalue())['input']
        logger_input.info(raw_input)
        input_df = pd.DataFrame.from_dict(json.loads(raw_input))
        prediction = score.predict(input_df, model)
        logger_pred.info(prediction)
    except Exception as ex:
        # Log the failure and report it to the caller; without this assignment
        # the response below would raise a NameError on `prediction`.
        logger_pred.error("prediction fail {}".format(str(ex)))
        prediction = {"error": str(ex)}

    return response.Response(
        ctx, response_data=json.dumps("predictions: {}".format(prediction)),
        headers={"Content-Type": "application/json"}
    )

Writing ./ads/func.py
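
The handler decodes JSON twice: once for the request body and once for the value stored under `input`. A client therefore has to send the feature table as a JSON string nested inside a JSON object. The sketch below builds such a payload and replays the two decoding steps using only the standard library; the column names and values are hypothetical.

```python
import io
import json

# Hypothetical two-row feature table, column-oriented like DataFrame.to_dict().
features = {"grid": {"0": 1, "1": 5}, "driver_points": {"0": 25.0, "1": 10.0}}

# The handler decodes JSON twice, so the inner table is itself a JSON string.
body = json.dumps({"input": json.dumps(features)}).encode("utf-8")

# Mimic what handler() does with the incoming bytes.
data = io.BytesIO(body)
outer = json.loads(data.getvalue())["input"]   # still a str at this point
decoded = json.loads(outer)                    # now a dict of columns
print(type(outer).__name__, decoded == features)
```

In the function itself, the decoded dictionary is what gets passed to `pd.DataFrame.from_dict()`.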


Next I modify the `requirements.txt` file. ADS generates a template for `requirements.txt` that provides a best guess at the dependencies necessary to build the Oracle Function and run the model. In this case, I modified the template and pinned `scikit-learn` to version 0.23.2:

In [19]:
%%writefile {path_to_rf_artifact}/requirements.txt

cloudpickle==1.6
pandas==1.1.0
numpy==1.18.5
fdk==0.1.18
scikit-learn==0.23.2

Writing ./ads/requirements.txt
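
Since the pinned versions should match the environment the model was trained in, it can be useful to compare them with what is installed in the notebook session. The helpers below (`parse_pins` and `report_mismatches`) are hypothetical names for a minimal sketch based on the standard library's `importlib.metadata`; they are not part of ADS.

```python
from importlib import metadata

requirements_text = """
cloudpickle==1.6
pandas==1.1.0
numpy==1.18.5
fdk==0.1.18
scikit-learn==0.23.2
"""

def parse_pins(text):
    """Return {package: version} for every 'name==version' line."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins

def report_mismatches(pins):
    """Map each package whose installed version differs to (wanted, installed).

    The comparison is a plain string match, so '1.6' and '1.6.0' count as
    different; a missing package is reported with installed=None.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches

pins = parse_pins(requirements_text)
print(report_mismatches(pins))
```

An empty dictionary means every pin matches the current environment exactly.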


I am done with the Oracle Functions part. The last thing I need to do is to modify the inference script `score.py`, which loads the model into memory and calls the `predict()` method of the model object.

By default, ADS generates this file assuming that you are using `cloudpickle` to read the serialized model object. In our case, we are using `joblib`, so I modified `score.py` to make use of `joblib`. I left the definition of `predict()` intact.

In [27]:
%%writefile {path_to_rf_artifact}/score.py

import json
import os
from joblib import load

"""
   Inference script. This script is used for prediction by scoring server when schema is known.
"""


def load_model():
    """
    Loads model from the serialized format

    Returns
    -------
    model:  a model instance on which predict API can be invoked
    """
    model_dir = os.path.dirname(os.path.realpath(__file__))
    contents = os.listdir(model_dir)
    model_file_name = "nnc.joblib"
    # The model was serialized with `joblib`, so it is loaded with `joblib.load`
    # rather than the `cloudpickle` loader used in the ADS template.
    if model_file_name in contents:
        with open(os.path.join(model_dir, model_file_name), "rb") as file:
            model = load(file)
    else:
        raise Exception('{0} is not found in model directory {1}'.format(model_file_name, model_dir))
    
    return model


def predict(data, model=load_model()) -> dict:
    """
    Returns prediction given the model and data to predict

    Parameters
    ----------
    model: Model instance returned by load_model API
    data: Data format as expected by the predict API of the core estimator. E.g. for scikit-learn models it could be a numpy array, a list of lists, or a pandas DataFrame

    Returns
    -------
    predictions: Output from scoring server
        Format: { 'prediction': output from `model.predict` method }

    """
    assert model is not None, "Model is not loaded"
    # X = pd.read_json(io.StringIO(data)) if isinstance(data, str) else pd.DataFrame.from_dict(data)
    return { 'prediction': model.predict(data).tolist() }

Overwriting ./ads/score.py
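
One property of the `predict(data, model=load_model())` signature is worth keeping in mind: Python evaluates default argument values once, when the `def` statement runs. Importing `score` therefore already loads the model, and the explicit `score.load_model()` call in `func.py` loads it a second time. The sketch below demonstrates the behaviour with a stand-in loader that records its calls instead of reading a file.

```python
load_calls = []

def load_model():
    """Stand-in loader that records each call instead of reading a file."""
    load_calls.append("loaded")
    return "dummy-model"

# The default value is computed here, when `def` runs, not on each call.
def predict(data, model=load_model()):
    return {"prediction": [model for _ in data]}

calls_after_definition = len(load_calls)   # the loader has already run once
predict([1, 2])
predict([3])
calls_after_predictions = len(load_calls)  # no further loads happened
print(calls_after_definition, calls_after_predictions)
```

This is harmless for a small model, but for a large one it may be preferable to load once and always pass the model explicitly.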


### Testing the Model Artifact before Saving to the Model Catalog 

It is always a good idea to test your model artifact in your notebook session before saving it to the catalog, especially if your Oracle Function depends on it. That is exactly what I am doing next.

I first modify my Python path and insert the path where the `score.py` module is located. I then import `score` and call the `predict()` function defined in `score.py`. I load the dataset, rebuild the scaled features, and compare the output of `predict()` to a reference array; in the cells below that reference is `y_test`. If the `load_model()` and `predict()` functions are doing the right thing, the two arrays should be equal.

In [24]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
data = pd.read_csv('./data/f1_df_final.csv')

In [25]:
df = data.copy()
df.podium = df.podium.map(lambda x: 1 if x == 1 else 0)

train = df[df.season < 2019]
X_train = train.drop(['driver', 'podium'], axis = 1)
y_train = train.podium

scaler = StandardScaler()
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X_train.columns)


In [30]:
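# Note: X_test and y_test are overwritten on every iteration,
# so only the last 2019 round returned by unique() is kept after the loop.
for circuit in df[df.season == 2019]['round'].unique():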

    test = df[(df.season == 2019) & (df['round'] == circuit)]
    X_test = test.drop(['driver', 'podium'], axis = 1)
    y_test = test.podium

    #scaling
    X_test = pd.DataFrame(scaler.transform(X_test), columns = X_test.columns)

In [31]:
# add the path of score.py: 
import sys 
import numpy as np 
sys.path.insert(0, path_to_rf_artifact)

from score import load_model, predict

# Load the model into memory
nn_model = load_model()
# Make predictions on the scaled test features:
predictions_test = predict(X_test, nn_model)

# Compare the output of predict() to the y_test array built above.
print(f"The two arrays are equal: {np.array_equal(predictions_test['prediction'], y_test)}")

The two arrays are equal: True
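
The path manipulation used in this test can be tried in isolation. The sketch below writes a tiny stand-in module to a scratch directory, puts that directory first on `sys.path`, imports the module, and compares its output to an expected list. The module name `score_demo` is hypothetical and chosen so that it does not clash with the real `score.py`.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny stand-in module into a scratch directory.
# `score_demo` is a hypothetical name chosen to avoid clashing with score.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "score_demo.py"), "w") as f:
    f.write("def predict(data):\n    return {'prediction': [x * 2 for x in data]}\n")

# Putting the directory first on sys.path makes the module importable.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
score_demo = importlib.import_module("score_demo")

expected = [2, 4, 6]
result = score_demo.predict([1, 2, 3])
print(result["prediction"] == expected)
```

Inserting at position 0 means this directory wins over any other module of the same name that is already on the path, which is why a distinct module name is used here.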


## Saving the Model Artifact to the Data Science Model Catalog

In [34]:
# Store the model in the Model Catalog
mc_model = artifact.save(project_id=project_id, compartment_id=compartment_id,
                               display_name="Neural Network Classifier",
                               description="A F1 Neural Network Classifier",
                               ignore_pending_changes=True)
shutil.rmtree(path_to_rf_artifact)
model_id = mc_model.id
print(f"Model OCID: {model_id}")


artifact:/tmp/saved_model_70f2159c-cd1a-4d9e-a6c9-e115e6ec3048.zip
Model OCID: ocid1.datasciencemodel.oc1.eu-frankfurt-1.amaaaaaaht5jzvaatec52bw4bj3lo3e72rltrymccqohbag6ixrd5ldckysq
