@notebook{run_notebook-jobs.ipynb,
    title: Run a notebook with Jobs,
    summary: Run a notebook with OCI data science jobs,
    developed on: pytorch110_p37_cpu_v1,
    keywords: Run Notebook, Jobs,
    license: Universal Permissive License v 1.0
}

In [None]:
# Upgrade Oracle ADS to pick up latest features and maintain compatibility with Oracle Cloud Infrastructure.

!pip install -U oracle-ads

<font color=gray>Oracle Data Science service sample notebook.

Copyright (c) 2022 Oracle, Inc.  All rights reserved.
Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Run a Notebook with Jobs</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Overview:

Oracle Cloud Infrastructure (OCI) [Data Science jobs](https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm) enable you to define and run a repeatable machine learning task on a fully managed infrastructure. This notebook shows you how to use the [Accelerated Data Science (ADS) SDK](https://accelerated-data-science.readthedocs.io/en/latest/) to run a Jupyter notebook using OCI data science jobs and download the outputs.

Developed on [General Machine Learning](https://docs.oracle.com/iaas/data-science/using/conda-gml-fam.htm) for CPU on Python 3.8 (version 1.0)

### Prerequisites

* This notebook requires internet egress to download the notebook and sample dataset.
* This notebook requires authorization to work with the OCI Data Science service. See the [ADS authentication documentation](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/authentication.html#) for details. This notebook uses resource principals for authentication.
* This notebook requires access to OCI Object Storage. See the [common policies documentation](https://docs.oracle.com/en-us/iaas/Content/Identity/policiescommon/commonpolicies.htm#write-objects-to-buckets) for how to set up a policy for managing Object Storage resources.

---

Datasets are provided as a convenience. Datasets are considered third-party content and are not considered materials under your agreement with Oracle.

---


### Introduction

We can debug a notebook with a small dataset in an OCI Data Science notebook session using a CPU shape, then scale up to process a larger dataset with OCI Data Science jobs using a GPU shape. In this example, we use OCI Data Science jobs to run the example notebook [XGBoost with RAPIDS](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/master/notebook_examples/xgboost-with-rapids.ipynb), which shows how a GPU can speed up training.

In [None]:
import ads
from ads.jobs import Job, DataScienceJob, NotebookRuntime

## Authentication

Authentication to the OCI Data Science service is required. Here we configure ADS to use resource principals for authentication.

In [None]:
ads.set_auth(auth="resource_principal")
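Resource principals are only available when the code runs inside an OCI resource such as a notebook session. As a minimal sketch, the auth mode could be chosen at runtime instead of being hard-coded. This assumes that OCI sets the `OCI_RESOURCE_PRINCIPAL_VERSION` environment variable where resource principals are available, and that API key authentication with a local OCI config file is the desired fallback. `pick_auth_mode` is a hypothetical helper, not part of ADS.

```python
import os


def pick_auth_mode(environ=None):
    """Return "resource_principal" inside an OCI resource, else "api_key".

    Assumption: OCI sets OCI_RESOURCE_PRINCIPAL_VERSION where resource
    principals are available (for example, in a notebook session).
    """
    env = os.environ if environ is None else environ
    if env.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    return "api_key"


auth_mode = pick_auth_mode()
# Pass the result to ADS, for example: ads.set_auth(auth=auth_mode)
```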

## Configure Notebook Runtime

The ADS `NotebookRuntime` class provides APIs to configure the job to run a notebook. The `path` of the notebook can be a local file path or a URI, including an OCI Object Storage URI (`oci://bucket@namespace/path/to/notebook`). Here we use the raw GitHub HTTP URL. Optionally, you can specify the `encoding` of the notebook (`utf-8` is used by default). We run this notebook in the NVIDIA RAPIDS 21.10 for GPU on Python 3.7 (`rapids2110_p37_gpu_v1`) conda environment provided by the OCI Data Science service.

In [None]:
notebook_runtime = (
    NotebookRuntime()
    .with_notebook(
        path="https://github.com/oracle-samples/oci-data-science-ai-samples/raw/master/notebook_examples/xgboost-with-rapids.ipynb",
        encoding="utf-8"
    )
    .with_service_conda("rapids2110_p37_gpu_v1")
)
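The runtime above points at the `raw` form of the GitHub URL, while the link in the introduction is the `blob` page that renders the notebook in the browser. As a small sketch, a browser URL could be rewritten to its raw counterpart as follows. `github_blob_to_raw` is a hypothetical helper, and it assumes the usual `/<owner>/<repo>/blob/<ref>/<path>` layout of github.com URLs.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url):
    """Rewrite a github.com "blob" page URL to the matching "raw" file URL."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path layout: /<owner>/<repo>/blob/<ref>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {url}")
    segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))


raw_url = github_blob_to_raw(
    "https://github.com/oracle-samples/oci-data-science-ai-samples/"
    "blob/master/notebook_examples/xgboost-with-rapids.ipynb"
)
```

The resulting `raw_url` is the same string passed as `path` to `with_notebook` above.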

The `NotebookRuntime` also provides an option to save the notebook and its outputs to Object Storage once the job finishes. You can specify the output location using the `with_output` method.

In [None]:
# Update the OUTPUT_URI to your object storage location to save the notebook outputs.
OUTPUT_URI = "" # oci://bucket@namespace/path/to/dir/

if OUTPUT_URI:
    notebook_runtime.with_output(OUTPUT_URI)
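A malformed `OUTPUT_URI` is easy to miss until the job has already run. The following sketch checks that a value follows the `oci://bucket@namespace/path` pattern shown in the comment above and splits it into its parts. `parse_oci_uri` is a hypothetical helper written for illustration only; it is not an ADS function and does not contact Object Storage.

```python
from urllib.parse import urlsplit


def parse_oci_uri(uri):
    """Split oci://bucket@namespace/path into (bucket, namespace, path)."""
    parts = urlsplit(uri)
    bucket, sep, namespace = parts.netloc.partition("@")
    if parts.scheme != "oci" or not sep or not bucket or not namespace:
        raise ValueError(f"Expected oci://bucket@namespace/path, got: {uri!r}")
    # Drop the leading slash so the path reads as an object name prefix.
    return bucket, namespace, parts.path.lstrip("/")


bucket, namespace, prefix = parse_oci_uri("oci://bucket@namespace/path/to/dir/")
```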

## Define, Create, Run and Watch the Job

Here we define the job infrastructure to use a GPU shape and configure logging.

In [None]:
# Set the LOG_GROUP_ID and LOG_ID to the ones in your tenancy.
LOG_GROUP_ID = "ocid1.loggroup.oc1.iad.xxxxx"
LOG_ID = "ocid1.log.oc1.iad.xxxxx"


job = (
    Job()
    .with_infrastructure(
        DataScienceJob()
        .with_shape_name("VM.GPU2.1")
        .with_log_group_id(LOG_GROUP_ID)
        .with_log_id(LOG_ID)
        # The following infrastructure configurations are optional
        # if you are in an OCI data science notebook session.
        # The configurations of the notebook session will be used as defaults
        # .with_compartment_id("<compartment_ocid>")
        # .with_project_id("<project_ocid>")
    )
    .with_runtime(notebook_runtime)
)
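Before creating the job, it can help to confirm that the placeholder values for `LOG_GROUP_ID` and `LOG_ID` were replaced. The sketch below is a hypothetical sanity check, not an ADS feature. It assumes only what the sample values above show: the OCID starts with `ocid1.<resource type>.` and the placeholder ends in a run of `x` characters. It does not verify that the resource exists in your tenancy.

```python
def check_ocid(value, resource_type):
    """Raise ValueError if value is not a filled-in OCID of the given type."""
    prefix = f"ocid1.{resource_type}."
    if not value.startswith(prefix):
        raise ValueError(f"Expected an OCID starting with {prefix!r}: {value!r}")
    # The last dot-separated field is the unique ID; "xxxxx" is a placeholder.
    unique_id = value.rsplit(".", 1)[-1]
    if not unique_id or set(unique_id) <= {"x"}:
        raise ValueError(f"OCID still looks like a placeholder: {value!r}")
    return value
```

For example, `check_ocid(LOG_GROUP_ID, "loggroup")` and `check_ocid(LOG_ID, "log")` raise a `ValueError` while the sample placeholders are still in place.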

Create and run the job

In [None]:
run = job.create().run()

Watch the job to stream outputs

In [None]:
run.watch()

## Job Outputs

Once the job finishes, you can download the outputs to a local directory. The job outputs always contain the notebook with all cell outputs. If your notebook creates files under its working directory (using relative paths), they are also included in the outputs. The following code downloads the outputs to `/home/datascience/outputs`.

In [None]:
# Download the outputs to /home/datascience/outputs
run.download("/home/datascience/outputs")
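After the download completes, a quick listing shows which files came back. This is a minimal sketch using only the Python standard library. `list_outputs` is a hypothetical helper, and the file names it returns depend entirely on what the notebook wrote during the job.

```python
from pathlib import Path


def list_outputs(output_dir):
    """Return sorted relative paths of all files below output_dir."""
    root = Path(output_dir)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling `list_outputs("/home/datascience/outputs")` returns the executed notebook along with any files the notebook created under its working directory.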

<a id='ref'></a>
# References
- [ADS Library Documentation](https://accelerated-data-science.readthedocs.io/en/latest/index.html)
- [ADS Documentation: Data Science Jobs](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/jobs/index.html)
- [ADS Documentation: Run a notebook on Data Science Jobs](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/jobs/run_notebook.html)
- [Data Science YouTube Videos](https://www.youtube.com/playlist?list=PLKCk3OyNwIzv6CWMhvqSB_8MLJIZdO80L)
- [OCI Data Science Documentation](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm)
- [Oracle Data & AI Blog](https://blogs.oracle.com/datascience/)
- [Understanding Conda Environments](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)
- [Use Resource Manager to Configure Your Tenancy for Data Science](https://docs.cloud.oracle.com/en-us/iaas/data-science/using/orm-configure-tenancy.htm)