In [None]:
# Upgrade Oracle ADS to pick up latest features and maintain compatibility with Oracle Cloud Infrastructure.

!pip install -U oracle-ads

<font color=gray>Oracle Data Science service sample notebook.

Copyright (c) 2022 Oracle, Inc.  All rights reserved.
Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Train, Register, and Deploy a Generic Model</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Overview

The notebook demonstrates how to train, test, save, and deploy an instance of the `GenericModel` class.

The `GenericModel` class in Accelerated Data Science (ADS) allows you to rapidly get a model into production. The `.prepare()` method creates the model artifacts that are needed to deploy a functioning model without you having to configure it or write code. You can still customize the generated `score.py` file as needed. The model can be subsequently verified, saved, and deployed.

Compatible conda pack: [General Machine Learning](https://docs.oracle.com/en-us/iaas/data-science/using/conda-gml-fam.htm) for CPU on Python 3.8 (version 1.0)

### Prerequisites

This notebook requires authorization to work with the OCI Data Science Service. Details can be found [here](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html#). For the purposes of this notebook, what is important to know is that resource principals are used in the absence of API key authentication.

---

## Contents

* <a href='#intro'>Introduction</a>
* <a href='#create'>Create a Generic Model</a>
* <a href='#serialize'>Generic Framework Serialization</a>
    * <a href='#serialize_genericmodel'>Create a GenericModel</a>
    * <a href='#serialize_prepare'>Prepare</a>
    * <a href='#serialize_verify'>Verify</a>
    * <a href='#serialize_save'>Save</a>
    * <a href='#serialize_deploy'>Deploy</a>
    * <a href='#serialize_predict'>Predict</a>
* <a href='#clean_up'>Clean Up</a>
* <a href='#ref'>References</a>    

---

Datasets are provided as a convenience. Datasets are considered third-party content and are not considered materials under your agreement with Oracle.
      


In [None]:
import ads
import logging
import os
import random
import tempfile
import warnings

from ads.common.model_metadata import UseCaseType
from ads.model.generic_model import GenericModel
from numpy import array
from shutil import rmtree

logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.ERROR)
warnings.filterwarnings("ignore")

<a id='intro'></a>
# Introduction

In this notebook, you will create a custom model class called `Square`. It has one method, `.predict()`, that returns the predicted value. It is designed to demonstrate how to create your own custom model class and use the `GenericModel` class.

The `.prepare()` method will store the model as a pickle file. It will also generate a generic `score.py` file that will load the pickle file and call the `predict()` method.
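
To make the idea concrete, the following is a minimal sketch of what a pickle-based artifact and a `score.py`-style loader amount to. It is not the file that `.prepare()` generates. The class name `ToySquare`, the file name `model.pkl`, and the bodies of `load_model()` and `predict()` are assumptions made for illustration, and only the standard library `pickle` module is used. The `{"prediction": ...}` wrapper mirrors the key that is read from the deployed model's response later in this notebook.

```python
import os
import pickle
import tempfile


class ToySquare:
    """Stand-in model for this sketch: squares each input value."""

    def predict(self, x):
        return [value * value for value in x]


def load_model(model_dir, model_file_name="model.pkl"):
    """Read the pickled model back from the artifact directory."""
    with open(os.path.join(model_dir, model_file_name), "rb") as handle:
        return pickle.load(handle)


def predict(data, model):
    """Call the model and wrap its output in a dictionary."""
    return {"prediction": model.predict(data)}


# A throwaway directory plays the role of the artifact directory.
with tempfile.TemporaryDirectory() as sketch_dir:
    with open(os.path.join(sketch_dir, "model.pkl"), "wb") as handle:
        pickle.dump(ToySquare(), handle)

    restored = load_model(sketch_dir)
    result = predict([1, 2, 3], restored)

print(result)
```

Running the sketch prints `{'prediction': [1, 4, 9]}`, which shows that the restored object behaves like the original model.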

### Authenticate

Authentication to the OCI Data Science service is required. Here we default to resource principals.

In [None]:
ads.set_auth(auth="resource_principal")

<a id='create'></a>
# Create a Generic Model

The next cell creates a class and instantiates it such that you have a custom model. This toy model takes a collection of values and returns the square of the values. Since this is a parametric model with no parameters to learn, there is no need to train it.

In [None]:
class Square:
    def predict(self, x):
        import numpy as np

        x_array = np.array(x)
        return np.ndarray.tolist(x_array * x_array)


model = Square()

The next cell samples random values. You then use the `.predict()` method to make predictions on the dataset.

In [None]:
random.seed(42)
X = random.sample(range(0, 100), 10)
model.predict(X)

<a id='serialize'></a>
# Generic Framework Serialization

<a id='serialize_genericmodel'></a>
## Create a GenericModel

The next cell creates a model artifact directory. This directory is used to store the artifacts that are needed to deploy the model. It also creates the `GenericModel` object.

In [None]:
artifact_dir = tempfile.mkdtemp()
print(f"Model artifact directory: {artifact_dir}")
generic_model = GenericModel(estimator=model, artifact_dir=artifact_dir)

The `.summary_status()` method shows the progress toward deploying the model.

In [None]:
generic_model.summary_status()

<a id='serialize_prepare'></a>
## Prepare

The prepare step is performed by the `.prepare()` method of the `GenericModel` class. It creates a number of customized files that are used to run the model once it is deployed. These include:

* `input_schema.json`: A JSON file that defines the nature of the features of the `X_sample` data.
* `output_schema.json`: A JSON file that defines the nature of the dependent variable in the `y_sample` data.
* `runtime.yaml`: This file contains information that is needed to set up the runtime environment on the deployment server.
* `score.py`: This script contains the `load_model()` and `predict()` functions. The `load_model()` function understands the format the model file was saved in, and loads it into memory. The `predict()` function is used to make inferences in a deployed model.

The `.prepare()` method accepts the `model_file_name` parameter to define the name of the model file. By default, the model is stored in a pickle file. The `as_onnx` parameter provides an alternate way to save it in the ONNX format.

To create the model artifacts, you use the `.prepare()` method. In the next cell:

* The `conda_env` variable defines the slug of the conda environment that was used to train the model.

Note that you can pass a slug only for a service conda environment. For a custom conda environment, you have to pass the full path along with the `inference_python_version`.
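
As a hedged illustration of that difference, the sketch below builds the conda-related keyword arguments for both cases. The helper name `conda_kwargs` and the Object Storage path are hypothetical and exist only for this example. The check for an `oci://` prefix is likewise an assumption about how a custom environment path looks. `inference_conda_env` and `inference_python_version` are the `.prepare()` parameters discussed above.

```python
def conda_kwargs(conda_env, python_version=None):
    """Build the conda-related keyword arguments for .prepare().

    A service conda environment is identified by its slug alone. A custom
    environment is identified by its full path, and then the Python version
    has to be stated explicitly.
    """
    kwargs = {"inference_conda_env": conda_env}
    # Assumption for this sketch: a custom environment path starts with oci://
    if conda_env.startswith("oci://"):
        if python_version is None:
            raise ValueError("python_version is required for a custom conda path")
        kwargs["inference_python_version"] = python_version
    return kwargs


# Service conda environment: the slug is enough.
service_kwargs = conda_kwargs("dataexpl_p37_cpu_v3")

# Custom conda environment: hypothetical path, shown only for illustration.
custom_kwargs = conda_kwargs(
    "oci://my-bucket@my-namespace/conda_environments/my_env",
    python_version="3.8",
)

print(service_kwargs)
print(custom_kwargs)
# The result could then be unpacked into the call, for example:
# generic_model.prepare(**custom_kwargs)
```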

In [None]:
conda_env = "dataexpl_p37_cpu_v3"

generic_model.prepare(
    inference_conda_env=conda_env,
    training_conda_env=conda_env,
    use_case_type=UseCaseType.MULTINOMIAL_CLASSIFICATION,
    X_sample=X,
    y_sample=array(X) ** 2,
)

The next cell uses the `.summary_status()` method to show you that the prepare step finished, and what tasks were completed:

In [None]:
generic_model.summary_status()

The `.prepare()` method has created the following files. You can modify them to fit your specific needs.

In [None]:
os.listdir(artifact_dir)

Once the artifacts have been created, there are a number of attributes in the `GenericModel` object that provide metadata about the model. The `.runtime_info` attribute details the model deployment settings and model provenance data.

In [None]:
generic_model.runtime_info

The `.schema_input` attribute provides metadata on the features that were used to train the model. You can use this information to determine what data must be provided to make model inferences. Each feature in the model has a section that defines the dtype, feature type, name, and whether it is required. The metadata also includes the summary statistics associated with the feature type.

In [None]:
generic_model.schema_input

The `.metadata_custom` attribute provides custom metadata that contains information on the category of the metadata, description, key, and value:

In [None]:
generic_model.metadata_custom

The `.metadata_provenance` attribute contains information about the code and training data that was used to create the model. This information is most useful when a Git repository is being used to manage the code for training the model. This is considered a best practice because it allows you to do things like reproduce a model, perform forensics on the model, and so on.

In [None]:
generic_model.metadata_provenance

The `.metadata_taxonomy` is a key-value store that has information about the classification or taxonomy of the model. This can include information such as the model framework, use case type, hyperparameters, and more.

In [None]:
generic_model.metadata_taxonomy

<a id='serialize_verify'></a>
## Verify

If you modify the `score.py` file that is part of the model artifacts, then you should verify it. The verify step allows you to test those changes without having to deploy the model. The `.verify()` method takes a set of test parameters and performs the prediction by calling the `predict` function in `score.py`. It also runs the `load_model` function.

**Note**: Make sure that the data passed to `.verify()` is JSON serializable, because data serialization and deserialization are not supported for the `GenericModel` class. Other frameworks, such as `SklearnModel`, support data serialization and deserialization for certain data types such as a pandas `DataFrame`, so you can pass a pandas `DataFrame` directly to a `SklearnModel`.
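
One way to check a payload before calling `.verify()` is sketched below. The helper `is_json_serializable` is a hypothetical name introduced for this example and is not part of ADS. It relies only on the standard library `json` module.

```python
import json


def is_json_serializable(obj):
    """Return True if the standard JSON encoder accepts the object."""
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        return False
    return True


# A plain list of numbers, like the X used in this notebook, is accepted.
print(is_json_serializable([1, 4, 9]))

# A set is rejected by the encoder. A NumPy array is rejected in the same
# way until it is converted to a list, for example with .tolist().
print(is_json_serializable({1, 4, 9}))
```

The first call prints `True` and the second prints `False`.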

In [None]:
generic_model.verify(X)

Call the `.summary_status()` method again to show that the verify step has been completed:

In [None]:
generic_model.summary_status()

<a id='serialize_save'></a>
## Save

Once you are satisfied with the performance of the model and have verified that the `score.py` file is working, you save the model to the model catalog using the `.save()` method on the model instance. This step uses the authentication that was configured earlier in this notebook. The result is the model OCID, which you can view in the UI.

In [None]:
model_id = generic_model.save(display_name="Demo GenericModel model")

<a id='serialize_deploy'></a>
## Deploy

When the model is in the model catalog, you can use the `.deploy()` method of a `GenericModel` object to deploy the model. This method allows you to specify the attributes of the deployment such as the display name, description, instance type and count, the maximum bandwidth, and logging groups. The next cell deploys the model with the default settings, except for the custom display name. The `.deploy()` method returns a `ModelDeployment` object.
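
For a deployment that overrides more than the display name, the settings could be collected as in the sketch below. The helper `build_deploy_kwargs` is hypothetical, and every value is a placeholder. The keyword names reflect the ADS documentation for `.deploy()` as understood at the time of writing, so treat them as assumptions and check them against the installed version.

```python
def build_deploy_kwargs(**settings):
    """Keep only the settings that were actually provided (not None)."""
    return {key: value for key, value in settings.items() if value is not None}


# Placeholder values for illustration only.
deploy_kwargs = build_deploy_kwargs(
    display_name="Demo GenericModel deployment",
    description="Deployment of the toy Square model",
    deployment_instance_shape="VM.Standard2.1",
    deployment_instance_count=1,
    deployment_bandwidth_mbps=10,
    deployment_log_group_id=None,  # left unset, so it is dropped
)

print(deploy_kwargs)
# The settings could then be unpacked into the call, for example:
# deploy = generic_model.deploy(**deploy_kwargs)
```

Dropping unset values lets `.deploy()` fall back to its own defaults for anything you do not specify.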

In [None]:
deploy = generic_model.deploy(display_name="Demo GenericModel deployment")

After deployment, the `.summary_status()` method shows that the model is `ACTIVE` and the `predict()` method is available.

In [None]:
generic_model.summary_status()

<a id='serialize_predict'></a>
## Predict

In the <a href='#create'>Create a Generic Model</a> section, you used the `model.predict()` method, where `model` is an instance of the custom `Square` class. This performed inference using the local model. Now that the `GenericModel` model has been deployed, you can do the same thing using similar syntax with the `.predict()` method on a `GenericModel`.

After the deployment is active, you can call the `.predict()` method on the `GenericModel` object to send a request to the deployed endpoint.

In [None]:
generic_model.predict(X)["prediction"]

<a id='clean_up'></a>
# Clean Up

This notebook created a model deployment and a model. This section deletes those resources. 

The model deployment must be deleted before the model can be deleted. You use the `.delete_deployment()` method on the `GenericModel` object to do this.

In [None]:
delete = generic_model.delete_deployment(wait_for_completion=True)

After the model deployment has been deleted, the `.summary_status()` method shows that the deployment has been deleted and that the `predict()` method is not available:

In [None]:
generic_model.summary_status()

Use the `.delete()` method to delete the model:

In [None]:
generic_model.delete()

The next cell removes the model artifacts that were stored on your local drive:

In [None]:
rmtree(artifact_dir)

<a id='ref'></a>
# References
- [ADS Library Documentation](https://accelerated-data-science.readthedocs.io/en/latest/index.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)
- [Understanding Conda Environments](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)
- [Use Resource Manager to Configure Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/orm-configure-tenancy.htm)
- [`runtime.yaml`](https://docs.content.oci.oracleiaas.com/en-us/iaas/data-science/using/model_runtime_yaml.htm#model_runtime_yaml)
- [`score.py`](https://docs.content.oci.oracleiaas.com/en-us/iaas/data-science/using/model_score_py.htm#model_score_py)
- [Model artifact](https://docs.content.oci.oracleiaas.com/en-us/iaas/data-science/using/models_saving_catalog.htm#create-models)
- [ONNX API Summary](http://onnx.ai/sklearn-onnx/api_summary.html)