Oracle Data Science service sample notebook.

Copyright (c) 2021, 2022 Oracle, Inc. All rights reserved. Licensed under the [Universal Permissive License v 1.0](https://oss.oracle.com/licenses/upl).

---

# <font color="red">Model Deployment Using Jobs</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color="teal"> Oracle Cloud Infrastructure Data Science Service</font></p>

---

# Overview:

This notebook demonstrates an end-to-end workflow of building a machine learning model in a job, deploying the model and then making a prediction from it. A job allows you to use on-demand infrastructure that will spin up, run some tasks, capture outputs, and clean up. The `ads.jobs` module in the Accelerated Data Science (ADS) SDK allows you to create and run jobs using the Oracle Cloud Infrastructure (OCI) Data Science service.

The focus of this notebook is to demonstrate how to train a model using a job, deploy the model, and perform a prediction on the model. The notebook first defines the script that runs in the job. The script trains a toy decision tree model on the `iris` dataset and stores it in the model catalog. You then use the `ads.jobs` module to create a job that runs this script. Next, you use the `ads.model.deployment` module to deploy the model and perform a prediction. In addition, the notebook shows how to programmatically create a log group and a log to capture the job's logs.

Developed on Compatible conda pack: [General Machine Learning](https://docs.oracle.com/en-us/iaas/data-science/using/conda-gml-fam.htm) for CPU on Python 3.7 (version 1.0)

--- 

## Contents:

- <a href="#intro">Introduction</a>
    - <a href="#intro_config">Configuration</a>
- <a href="#script">Job Script</a>
- <a href="#train">Train the Model</a>
- <a href="#deploy">Deploy the Model</a>
- <a href="#inference">Inference</a>
- <a href="#clean_up">Clean Up</a>
- <a href='#ref'>References</a>

---


Datasets are provided as a convenience.  Datasets are considered third-party content and are not considered materials 
under your agreement with Oracle.

You can access the `iris` dataset license [here](https://github.com/scikit-learn/scikit-learn/blob/master/COPYING).  

---


In [None]:
import ads
import logging
import oci
import os
import random
import string
import tempfile

from ads.catalog.model import ModelCatalog
from ads.common.oci_logging import OCILogGroup, OCILog
from ads.jobs import Job, infrastructure, PythonRuntime
from ads.model.deployment.common.utils import State
from ads.model.deployment.model_deployer import ModelDeployer
from ads.model.deployment.model_deployment_properties import ModelDeploymentProperties
from sklearn import datasets

ads.set_auth("resource_principal")
logging.getLogger("ocifs").setLevel(level=logging.ERROR)

<a id="intro"></a>
# Introduction

In the prototyping stages of model building, you often train smaller models on a subset of the data. This allows for fast iteration, and a notebook is an ideal place to do that work. However, as you learn more about the data and which model classes and hyperparameters work best, you generally want to train larger models on the entire dataset. Building complex models on large datasets can pose various challenges when you only use a notebook session. Building the model can be time-consuming, and you don't want to slow down your other work while the model builds. Generally, the virtual machine (VM) for the notebook is relatively small, because the trade-off between computational power and cost favors smaller VM shapes. Further, you often want to build multiple models simultaneously, which can significantly slow down the model building process because all the models run on the same hardware.

The solution to these challenges is to offload the model building to another computational resource, and the Oracle Data Science Jobs service is well suited to this. A notebook session can launch a job to build the model. This allows you to choose a VM shape that matches the computational and memory requirements of the model. It also offloads the computation from the notebook session that you are working in, so you do not have to wait around for the model to build. By using multiple jobs, you can train many models at the same time and improve your workflow.

<a id="intro_config"></a>
## Configuration

Jobs and model deployments have a number of configuration values. This notebook uses reasonable default values that allow it to run on common tenancy configurations. To configure the job, you must provide the OCID of a subnet. Update the value in the next cell by replacing `<subnet_id>` with your subnet OCID. It should look something like this:

```python
SUBNET_ID = "ocid1.subnet.oc1.iad.aaaaaaaa6fsov7mavp7nrh7t4yc..."
```
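The later cells only run when `SUBNET_ID` differs from the placeholder, so a mistyped OCID would only surface once the job is created. A quick shape check can catch that earlier. The sketch below is a hypothetical helper, not part of ADS. It assumes the general OCID layout `ocid1.<resource-type>.<realm>.[region].<unique-id>` and only inspects the string, so it cannot tell whether the subnet exists or is reachable.

```python
def looks_like_subnet_ocid(value: str) -> bool:
    """Hypothetical helper: True if value has the general shape of a subnet OCID."""
    if value == "<subnet_id>":
        return False  # the placeholder was never replaced
    parts = value.split(".")
    # Expected fields: ocid1 . subnet . realm . region . unique-id
    # (a truncated value ending in "..." leaves an empty last field and is rejected)
    return len(parts) >= 5 and parts[0] == "ocid1" and parts[1] == "subnet" and parts[-1] != ""


print(looks_like_subnet_ocid("<subnet_id>"))  # False
print(looks_like_subnet_ocid("ocid1.subnet.oc1.iad.aaaaaaaaexample"))  # True
```

For example, calling `looks_like_subnet_ocid(SUBNET_ID)` returns `False` until the placeholder has been replaced with a complete subnet OCID.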

In [None]:
SUBNET_ID = "<subnet_id>"

One of the advantages of using a job to build the model is that you can specify a VM shape with the computing power and memory that you need. Once the model is built, the job's VM terminates, so it does not waste valuable resources. The next cell specifies that the job will use a `VM.Standard2.1` VM shape, and then lists the supported shapes.

In [None]:
# Define the VM shape for the job
vm_shape = "VM.Standard2.1"

[s.name for s in infrastructure.DataScienceJob.instance_shapes()]

This notebook creates a number of resources: a log, a log group, a job, a model, and a model deployment. A common, unique name is used to identify all of these resources. The following cell creates that name and prints it.

In [None]:
# Create a unique ID that is used for the name of the resources that will be created
resource_name = "model_deployment_jobs_" + "".join(
    random.choices(string.ascii_letters + string.digits, k=4)
)
print(f"Unique ID used in all the resources: {resource_name}")
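The suffix is four characters drawn from 62 symbols (uppercase and lowercase letters plus digits). Assuming each suffix is drawn independently and uniformly, the birthday-problem arithmetic below estimates how likely it is that two runs of this notebook pick the same name. The `collision_probability` helper is illustrative only.

```python
import string

alphabet = string.ascii_letters + string.digits  # 52 letters + 10 digits = 62 symbols
suffix_len = 4
space = len(alphabet) ** suffix_len  # number of distinct 4-character suffixes


def collision_probability(n_names: int, space: int) -> float:
    """Exact birthday-problem probability that at least two of n_names random suffixes match."""
    p_all_unique = 1.0
    for i in range(n_names):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


print(space)  # 14776336
print(f"{collision_probability(100, space):.6f}")
```

Even after 100 runs, the chance of any two names clashing is well under one in a thousand, which is why a short random suffix is adequate for a sample notebook.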

<a id="script"></a>
# Job Script

A Python script is needed to train the model and save it in the model catalog. The next cell writes this script so that it can be used when the job is run. The script uses the popular [`iris`](https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html) dataset to train a multiclass decision tree classifier using the `DecisionTreeClassifier` class. This model is then converted to an `ADSModel` object.

The model artifacts are created using the `.prepare()` method. Since the model will be deployed, the `data_science_env` and `inference_conda_env` parameters must be specified. The model deployment uses a conda environment so that the required libraries are installed. The OCI Data Science service provides a set of conda environments, called Data Science Environments. Alternatively, you or third parties can publish conda environments. If the `data_science_env` parameter is set to `True`, the environment is provided by the Data Science service; otherwise, it is a published environment. The `inference_conda_env` parameter is the URI from which the model deployment service obtains the conda environment to install on the model deployment instances.

The `.prepare()` method returns an object that represents the model artifact. Its `.save()` method stores the model artifact in the model catalog, and the model deployment service then deploys the model from that catalog entry.
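The `inference_conda_env` value used in the job script is an Object Storage URI of the form `oci://<bucket>@<namespace>/<object prefix>`. The hypothetical helper below splits such a URI into its parts, which can help when checking which bucket and namespace a conda pack is read from. It is a plain string parser written for this illustration; it does not contact Object Storage or confirm that the pack exists.

```python
from typing import NamedTuple


class OciUri(NamedTuple):
    bucket: str
    namespace: str
    prefix: str


def parse_oci_uri(uri: str) -> OciUri:
    """Hypothetical helper: split an oci://bucket@namespace/prefix URI into its parts."""
    scheme, sep, rest = uri.partition("://")
    if scheme != "oci" or not sep:
        raise ValueError(f"not an oci:// URI: {uri!r}")
    authority, _, prefix = rest.partition("/")
    bucket, at, namespace = authority.partition("@")
    if not at or not bucket or not namespace:
        raise ValueError(f"expected bucket@namespace in {uri!r}")
    return OciUri(bucket, namespace, prefix)


uri = "oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General Machine Learning for CPUs/1.0/mlcpuv1"
print(parse_oci_uri(uri))
```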

In [None]:
job_script = tempfile.NamedTemporaryFile(suffix=".py", delete=True)

with open(job_script.name, mode="w") as f:
    f.write(
        f"""
import pandas as pd

from ads import set_auth
from ads.catalog.model import ModelCatalog
from ads.common.model import ADSModel
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from tempfile import mkdtemp

set_auth("resource_principal")

# Train the model
iris = datasets.load_iris()
X=iris.data
y=iris.target
clf = DecisionTreeClassifier().fit(X=X, y=y)
model = ADSModel.from_estimator(clf)

# Prepare the model artifacts
model_artifact = model.prepare(
    target_dir=mkdtemp(), 
    X_sample=pd.DataFrame(X), 
    y_sample=pd.Series(y), 
    force_overwrite=True, 
    data_science_env=True, 
    inference_conda_env="oci://service-conda-packs@id19sfcrra6z/service_pack/cpu/General Machine Learning for CPUs/1.0/mlcpuv1") 

# Save the model to the model catalog
mc_model = model_artifact.save(
    project_id="{os.environ['PROJECT_OCID']}", 
    compartment_id="{os.environ['NB_SESSION_COMPARTMENT_OCID']}", 
    display_name="{resource_name}", 
    description="Model produced in the model_deployment_using_jobs.ipynb notebook")

"""
    )
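The job script is generated with an f-string, so placeholders such as `{os.environ['PROJECT_OCID']}` and `{resource_name}` are evaluated in the notebook session when the file is written. The job therefore receives plain string literals and does not need those environment variables itself. The minimal sketch below, using made-up stand-in values, shows that rendering step and the one pitfall it brings: any literal brace in the generated script (a dictionary, for example) must be doubled as `{{` and `}}`.

```python
import os
import tempfile

# Hypothetical stand-ins for values the notebook reads from os.environ
project_id = "ocid1.datascienceproject.oc1..example"
resource_name = "model_deployment_jobs_Ab12"

# Rendered here, at write time: the file contains literal strings, not placeholders.
# Doubled braces {{ }} become single braces in the generated script.
script_text = f"""
PROJECT_ID = "{project_id}"
CONFIG = {{"display_name": "{resource_name}"}}
print(CONFIG)
"""

with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
    f.write(script_text)
    path = f.name

with open(path) as f:
    written = f.read()
os.remove(path)  # clean up the temporary script

print(written)
```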

<a id="train"></a>
# Train the Model

The Oracle Data Science Jobs service is used to run the <a href="#script">Job Script</a>. The following cell sets up the required resources to do this and executes the job.

Jobs can capture log messages from the script along with other events. Therefore, the next cell first creates a log group and a log within it. It then creates a `Job` object and sets up the infrastructure needed to run the job. This includes the compartment, project, and subnet to use, as well as details about the instance, such as the VM shape and the block storage size. It also attaches the logging destination.

While the `.with_infrastructure()` method defines the infrastructure that will be used, the `.with_runtime()` method defines the runtime environment. The `PythonRuntime` class specifies that the job runs a Python script. Its `.with_script()` method sets the path to the <a href="#script">Job Script</a> that is executed, and `.with_service_conda()` selects the service conda environment to run it in; here, that is the same conda environment as the notebook session.
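The `.with_*()` calls in the job cell can be chained because each method returns the object it was called on. The toy class below is a hypothetical illustration of that fluent builder pattern. It is not the ADS `DataScienceJob` implementation, and its field names and validation rule are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class InfraSpec:
    """Toy fluent configuration object; NOT the ADS DataScienceJob class."""

    settings: Dict[str, Any] = field(default_factory=dict)

    def _set(self, key: str, value: Any) -> "InfraSpec":
        self.settings[key] = value
        return self  # returning self is what allows .with_a(...).with_b(...) chaining

    def with_shape_name(self, name: str) -> "InfraSpec":
        return self._set("shape_name", name)

    def with_block_storage_size(self, size_gb: int) -> "InfraSpec":
        if size_gb <= 0:
            raise ValueError("block storage size must be positive")
        return self._set("block_storage_size", size_gb)


spec = InfraSpec().with_block_storage_size(100).with_shape_name("VM.Standard2.1")
print(spec.settings)  # {'block_storage_size': 100, 'shape_name': 'VM.Standard2.1'}
```

Because every setter returns the same object, the whole configuration reads as one expression, and validation can happen as each value is set.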

The `.create()` method is used to create the job and the `.run()` method will asynchronously run the job. Generally, this is the behavior that you want so that you can continue to work in your notebook. However, for this notebook, we need the job to complete before creating the model deployment. Thus, the `.watch()` method is called. This will block the notebook and print logging messages to the screen so that you can see the progress of the job.

Running the job can take a significant amount of time because resources such as an instance and its network resources must be provisioned, and the environment must be configured.
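Conceptually, `.run()` returns as soon as the job run has been submitted, and `.watch()` turns that into a blocking wait. The sketch below shows the generic poll-until-done pattern with a timeout. It is not how ADS implements `.watch()` (which also streams log messages); the `get_state` callable is a stand-in for querying a job run, and the set of terminal lifecycle states is an assumption to check against the OCI documentation.

```python
import time
from typing import Callable

# Assumed terminal lifecycle states for a job run
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED"}


def wait_until_terminal(
    get_state: Callable[[], str], interval_s: float = 30.0, timeout_s: float = 3600.0
) -> str:
    """Poll get_state() until it returns a terminal state, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state} after {timeout_s} s")
        time.sleep(interval_s)


# Hypothetical status sequence standing in for a real job run
states = iter(["ACCEPTED", "IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
result = wait_until_terminal(lambda: next(states), interval_s=0.0)
print(result)  # SUCCEEDED
```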

In [None]:
if SUBNET_ID != "<subnet_id>":

    # Create a log and log group
    log_group = OCILogGroup(display_name=resource_name).create()
    log = log_group.create_log(resource_name)

    # Create and run the job
    job = Job(name=resource_name)
    job.with_infrastructure(
        infrastructure.DataScienceJob()
        .with_block_storage_size(100)
        .with_compartment_id(os.environ["NB_SESSION_COMPARTMENT_OCID"])
        .with_subnet_id(SUBNET_ID)
        .with_project_id(os.environ["PROJECT_OCID"])
        .with_shape_name(vm_shape)
        .with_log_id(log.id)
        .with_log_group_id(log_group.id)
    )

    job.with_runtime(
        PythonRuntime()
        .with_script(job_script.name)
        .with_service_conda(os.path.split(os.environ["CONDA_PREFIX"])[1])
    )

    job.create()
    job_run = job.run()
    job_run.watch()

<a id="deploy"></a>
# Deploy the Model

The Oracle Data Science Model Deployment service is used to deploy the model that the job created. Since the <a href="#script">Job Script</a> created the model, this notebook must look up the OCID of the model to deploy. The `ModelCatalog` class is used to connect to the model catalog in the compartment where the model was stored. The code then iterates over the models in that compartment until it finds the one whose display name matches the unique resource name, and takes its OCID.
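If no model has a matching display name, there is nothing to deploy, so it helps to fail with a clear message. Display names are not necessarily unique, which is one reason this notebook adds a random suffix. The sketch below shows the lookup as a function that raises an explicit error when nothing matches. `ModelSummary` is a hypothetical stand-in for the records returned by `ModelCatalog.list_models()`; only the `id` and `display_name` attributes that the notebook uses are modeled.

```python
from typing import Iterable, NamedTuple


class ModelSummary(NamedTuple):
    """Hypothetical stand-in for a model catalog listing entry."""

    id: str
    display_name: str


def find_model_id(models: Iterable[ModelSummary], display_name: str) -> str:
    """Return the OCID of the first model whose display name matches, or raise LookupError."""
    for model in models:
        if model.display_name == display_name:
            return model.id
    raise LookupError(f"no model named {display_name!r} in the catalog listing")


catalog = [
    ModelSummary("ocid1.datasciencemodel.oc1..aaa", "some_other_model"),
    ModelSummary("ocid1.datasciencemodel.oc1..bbb", "model_deployment_jobs_Ab12"),
]
print(find_model_id(catalog, "model_deployment_jobs_Ab12"))  # ocid1.datasciencemodel.oc1..bbb
```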

The Model Deployment service is configured in a similar way to the Jobs service. You provide the compartment, project, display name, VM shape, instance count, and logging information. The `.deploy()` method deploys the model. Normally, you would make this an asynchronous call so that you do not need to wait for the deployment. As with the job run, deployment can take several minutes because the service provisions and configures the infrastructure. In this case, however, the `wait_for_completion` parameter is set to `True` because the rest of the notebook assumes that the model has been deployed.

In [None]:
if SUBNET_ID != "<subnet_id>":

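    # Find the model OCID in the model catalog. The job script saved the model
    # in the notebook session's compartment, so search that compartment.
    mc = ModelCatalog(compartment_id=os.environ["NB_SESSION_COMPARTMENT_OCID"])
    model_id = None
    for model in mc.list_models():
        if model.display_name == resource_name:
            model_id = model.id
            break
    if model_id is None:
        raise ValueError(f"No model named {resource_name} was found in the model catalog")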

    # Deploy the model
    properties = (
        ModelDeploymentProperties(model_id)
        .with_prop("compartment_id", os.environ["PROJECT_COMPARTMENT_OCID"])
        .with_prop("project_id", os.environ["PROJECT_OCID"])
        .with_prop("display_name", resource_name)
        .with_instance_configuration(
            config={
                "INSTANCE_SHAPE": vm_shape,
                "INSTANCE_COUNT": 1,
                "bandwidth_mbps": 10,
            }
        )
        .with_access_log(log_group.id, log.id)
        .with_predict_log(log_group.id, log.id)
        .build()
    )
    deployment = ModelDeployer().deploy(properties, wait_for_completion=True)

<a id="inference"></a>
# Inference

The ultimate goal of deploying a model is to make inferences from it. In the following cell, the `.predict()` method of the `ModelDeployment` class is used to predict the classes. Within a notebook session, inference is best done with the [ADS `model.deployment`](https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/model_deployment/model_deployment.html) module. Outside a notebook session, there are several other options, such as the OCI command-line interface, the OCI Python SDK, and the OCI Java SDK.
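The next cell passes `iris.data.tolist()` rather than the NumPy array itself. Assuming the prediction request body is sent as JSON, a nested list of floats serializes directly, whereas `json.dumps` raises a `TypeError` on a NumPy `ndarray`. The sketch below builds such a payload from the first three rows of the iris features using only the standard library; it does not call the model deployment endpoint.

```python
import json

# First three rows of the iris feature matrix:
# sepal length, sepal width, petal length, petal width (cm)
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]

payload = json.dumps(rows)
print(payload)  # [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2]]
assert json.loads(payload) == rows  # round-trips without loss
```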

In [None]:
if SUBNET_ID != "<subnet_id>":
    iris = datasets.load_iris()
    print(deployment.predict(iris.data.tolist()))

<a id="clean_up"></a>
# Clean Up

This notebook created a number of resources. This section will remove them from your tenancy. It can take several minutes to execute.
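The cleanup cell deletes resources in an order that respects their dependencies. Under the assumptions that a model cannot be deleted while a model deployment serves it, and that a log group cannot be deleted while it still contains a log, the hypothetical dependency map below encodes those relationships, and a depth-first topological sort produces a safe deletion order. The map and function are illustrations only; they do not call any OCI API, and the function assumes the map has no cycles.

```python
from typing import Dict, List

# Hypothetical dependency map: each resource lists the resources it uses.
DEPENDS_ON: Dict[str, List[str]] = {
    "model_deployment": ["model", "log"],
    "job": ["log"],
    "model": [],
    "log": ["log_group"],
    "log_group": [],
}


def teardown_order(depends_on: Dict[str, List[str]]) -> List[str]:
    """Order resources so each is deleted before anything it depends on (assumes no cycles)."""
    creation_order: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        for dep in depends_on[name]:
            visit(dep)
        creation_order.append(name)  # post-order: dependencies are appended first

    for name in depends_on:
        visit(name)
    return list(reversed(creation_order))  # dependents first, dependencies last


print(teardown_order(DEPENDS_ON))
```

Any order that this function could return, including the one used in the cleanup cell (job, deployment, model, log, log group), satisfies the same constraints.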

In [None]:
# remove the `job_script` file from the system
job_script.close()

if SUBNET_ID != "<subnet_id>":
    # remove job
    job = Job.from_datascience_job(job.id)
    job.delete()

    # Remove the model deployment.
    deployment = ModelDeployer().get_model_deployment(deployment.model_deployment_id)
    deployment.delete(wait_for_completion=True)

    # Remove the model from the model catalog
    mc.delete_model(model=model_id)

    # Delete the log group and logs
    OCILog(log_group_id=log_group.id, id=log.id).delete()
    _ = OCILogGroup(id=log_group.id).delete()

<a id='ref'></a>
# References

- [ADS Library Documentation](https://docs.cloud.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)