<font color=gray>ADS Sample Notebook.

Copyright (c) 2021 Oracle, Inc. All rights reserved. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.
</font>

***
# <font color=red>Getting Started with Oracle Cloud Infrastructure Data Science</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Service Team </font></p>

***

## Service Overview

Welcome to Oracle Cloud Infrastructure Data Science service.

The Oracle Cloud Infrastructure Data Science service is a fully managed platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure (OCI).

The Data Science service:

* Provides data scientists with a collaborative, project-driven workspace.
* Enables self-service access to infrastructure for data science workloads.
* Includes Python-centric tools, libraries, and packages developed by the open-source community and the [Oracle Accelerated Data Science Library](https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html), which supports the end-to-end lifecycle of predictive models including:
    * Data acquisition, profiling, preparation, and visualization.
    * Feature engineering.
    * Model training.
    * Model evaluation, explanation, and interpretation.
    * Model storage using the [Model Catalog](https://docs.cloud.oracle.com/iaas/data-science/using/manage-models.htm). 
    * [Model deployment](https://docs.oracle.com/iaas/data-science/using/model-dep-about.htm).
* Integrates with the rest of the OCI stack, including [Functions](https://docs.cloud.oracle.com/iaas/Content/Functions/Concepts/functionsoverview.htm), [Data Flow](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow.htm), [Autonomous Data Warehouse](https://docs.cloud.oracle.com/iaas/Content/Database/Concepts/adboverview.htm), [Streaming](https://docs.cloud.oracle.com/iaas/Content/Streaming/Concepts/streamingoverview.htm), [Vault](https://docs.cloud.oracle.com/iaas/Content/KeyManagement/Concepts/keyoverview.htm), [Logging](https://docs.cloud.oracle.com/iaas/Content/Logging/Concepts/loggingoverview.htm#loggingoverview), and [Object Storage](https://docs.cloud.oracle.com/iaas/Content/Object/Concepts/objectstorageoverview.htm).
* Helps data scientists concentrate on methodology and domain expertise to deliver more models to production.

For more details, see the [Data Science documentation](https://docs.cloud.oracle.com/iaas/data-science/using/data-science.htm).

---

## Overview

The PyTorch for CPUs [conda environment](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments) is designed for machine learning activities with a focus on applications in computer vision and natural language processing. It provides high-level features for tensor computing and deep neural networks. This environment also includes acceleration support on Intel CPUs through `daal4py`, a library that speeds up `scikit-learn` algorithms by using the Intel(R) oneAPI Data Analytics Library. Use `ads-lite` to speed up your data science workflow with tools that automate common tasks.

---

**Important:**

Placeholder text for required values is surrounded by angle brackets, which must be removed when you add the indicated content. For example, when adding a database name, `database_name = "<database_name>"` becomes `database_name = "production"`.
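As a small illustration of the placeholder convention above, the following sketch checks whether a value is still an unfilled `<placeholder>`. The helper name `is_unfilled` and the regular expression are assumptions made for this example and are not part of ADS or the OCI SDK.

```python
import re

# Matches a value that is still an unfilled placeholder such as "<database_name>".
# Hypothetical helper for illustration only; not part of ADS or the OCI SDK.
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def is_unfilled(value: str) -> bool:
    """Return True if the value still looks like '<placeholder>'."""
    return bool(PLACEHOLDER.match(value.strip()))


database_name = "<database_name>"  # template value, not yet replaced
print(is_unfilled(database_name))
print(is_unfilled("production"))
```

A check like this could be run at the top of a notebook to catch template values that were never replaced.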

---

## Prerequisites:
- Experience with a specific topic: Novice
- Professional experience: None

---

## Objectives:

- <a href='#authentication'>Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session</a>
    - <a href='#resource_principals'>Authentication with Resource Principals</a>
        - <a href='#resource_principals_ads'>Resource Principals Authentication using the ADS SDK</a>
        - <a href='#resource_principals_oci'>Resource Principals Authentication using the OCI SDK</a>
        - <a href='#resource_principals_cli'>Resource Principals Authentication using the OCI CLI</a> 
    - <a href='#api_keys'>Authentication with API Keys</a>
- <a href='#conda'>PyTorch for CPU Conda Environment</a>
    - <a href='#conda_overview'>Overview</a>
    - <a href='#conda_libraries'>Principal Conda Libraries</a>
    - <a href='#conda_configuration'>Configuration</a>
- <a href='#ref'>References</a> 

---

In [None]:
import daal4py.sklearn
import logging
import pandas
import warnings

from ads import set_auth
from ads import set_documentation_mode
from oci.auth.signers import get_resource_principals_signer
from oci.data_science import DataScienceClient
from os import popen

set_documentation_mode(False)
warnings.filterwarnings('ignore')
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

<a id='authentication'></a>
# Understanding Authentication to Oracle Cloud Infrastructure Resources from a Notebook Session

When you work within a notebook session, you operate as the `datascience` user. This user does not have an OCI Identity and Access Management (IAM) identity, so it cannot access the OCI API. To access OCI resources, including Data Science projects, models, and any other OCI service resources from the notebook environment, you must configure either resource principals or API keys. For most applications, we recommend the resource principals approach.

<a id='resource_principals'></a>
## Authentication with Resource Principals

Data Science enables easy and secure authentication using the notebook session's resource principal to access other OCI resources. This notebook demonstrates how to use your notebook session's resource principal.

Before you use this notebook, your tenancy administrator must write policies that grant the resource principal permission to access other OCI resources. See [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/iaas/data-science/using/configure-tenancy.htm).

You can use either the `ads` library or the `oci` library to configure the notebook to use resource principals. While both libraries provide the required authentication, the `ads` library is specifically designed for easy operation within a Data Science notebook session.

To avoid the library dependencies, you can use the `oci` command from the command line.

For more details on using resource principals in the Data Science service, see the [ADS Configuration Guide](https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#) and [Authenticating to the OCI APIs from a Notebook Session](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb).

<a id='resource_principals_ads'></a>
### Resource Principals Authentication using the ADS SDK

The `set_auth()` method sets the proper authentication mechanism for ADS. ADS uses the `oci` SDK to access resources like the model catalog or Object Storage.

Within a notebook session, you configure the use of a resource principal for the ADS SDK by running this in a notebook cell:



In [None]:
set_auth(auth='resource_principal') 

<a id='resource_principals_oci'></a>
### Resource Principals Authentication using the OCI SDK

Within your notebook session, the `oci` library can use the resource principal. The next cell demonstrates how to make a basic connection using the default settings:

In [None]:
resource_principal = get_resource_principals_signer() 
dsc = DataScienceClient(config={}, signer=resource_principal)

<a id='resource_principals_cli'></a>
### Resource Principals Authentication using the OCI CLI

Within a notebook session, the OCI CLI can authenticate with the resource principal when you pass the `--auth=resource_principal` flag. For example:

In [None]:
cmd = "oci data-science project get --project-id=$PROJECT_OCID --auth=resource_principal 2>&1"
print(popen(cmd).read())

If the resource principal is correctly configured, output similar to the following is displayed:

```
{
"data": {
"compartment-id": "ocid1.compartment.oc1..aaaaaaaafl3avkal72rrwuy4m5rumpwh7r4axejjwq5hvwjy4h4uoyi7kzyq",
"created-by": "ocid1.user.oc1..aaaaaaaabfrlcbiyvjmjvgh3ns6trdyoewxytqywwta3yqmy3ah3fa3uw76q",
"defined-tags": {},
"description": "my favorite demo project\n",
"display-name": "jr-demo-project",
"freeform-tags": {},
"id": "ocid1.datascienceproject.oc1.iad.aaaaaaaappvg4tp5kmbkurcyeghxaqmaknw3s5yh2oxcvfrvjeaadinsng6q",
"lifecycle-state": "ACTIVE",
"time-created": "2019-11-14T22:29:06.870000+00:00"
},
"etag": "b4d66fb733748f3454206d5de6b9acb3634edc804b2ad1997bd69dc676035a89"
}
```
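Because the CLI returns JSON, the response can be parsed in Python rather than only printed. The following is a minimal sketch, assuming the output has the shape shown above; the helper name `summarize_project` and the choice of fields are illustrative assumptions, not part of the OCI CLI or SDK.

```python
import json


def summarize_project(cli_output: str) -> dict:
    """Extract a few fields from `oci data-science project get` JSON output.

    Hypothetical helper for illustration. Raises ValueError (json.JSONDecodeError)
    if the text is not valid JSON, for example when the command printed an error.
    """
    data = json.loads(cli_output)["data"]
    return {
        "name": data["display-name"],
        "state": data["lifecycle-state"],
        "id": data["id"],
    }


# Trimmed copy of the sample response shown above.
sample = """
{
  "data": {
    "display-name": "jr-demo-project",
    "id": "ocid1.datascienceproject.oc1.iad.aaaaaaaappvg4tp5kmbkurcyeghxaqmaknw3s5yh2oxcvfrvjeaadinsng6q",
    "lifecycle-state": "ACTIVE"
  }
}
"""
print(summarize_project(sample))
```

Note that the command above redirects standard error into standard output with `2>&1`, so a failed call yields non-JSON text and the parse raises an exception instead of returning a summary.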

<a id='api_keys'></a>
## Authentication with API Keys

If you do not explicitly configure resource principals, API keys are used by default. If your use case requires API keys, see the instructions in the `api_keys.ipynb` example notebook.
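The choice between the two mechanisms can also be made explicit in code. The sketch below is one possible approach, not the documented ADS behavior: it assumes the environment variable `OCI_RESOURCE_PRINCIPAL_VERSION` is set when a resource principal is available, and the helper name `choose_auth` is hypothetical.

```python
import os


def choose_auth(env=None) -> str:
    """Pick an auth mode string suitable for `ads.set_auth(auth=...)`.

    Assumption: OCI_RESOURCE_PRINCIPAL_VERSION is set in the environment when a
    resource principal is available. Otherwise fall back to API keys.
    """
    env = os.environ if env is None else env
    if env.get("OCI_RESOURCE_PRINCIPAL_VERSION"):
        return "resource_principal"
    return "api_key"


print(choose_auth())

# In a notebook session, the result could then be passed to ADS:
# set_auth(auth=choose_auth())
```

Passing the environment as an argument keeps the function easy to test without changing the real process environment.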

<a id='conda'></a>
# PyTorch for CPU Conda Environment

<a id='conda_overview'></a>
## Overview

The PyTorch for CPUs [conda environment](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments) is designed for machine learning activities with a focus on applications in computer vision and natural language processing. It provides high-level features for tensor computing and deep neural networks. This environment also includes acceleration support on Intel CPUs through `daal4py`, a library that speeds up `scikit-learn` algorithms by using the Intel(R) oneAPI Data Analytics Library. Use `ads-lite` to speed up your data science workflow with tools that automate common tasks.

You can access notebook examples for this conda environment in JupyterLab from the **Launcher** tab by clicking **Notebook Examples**. Then you can select one of the notebook examples that are available for all of the conda environments installed in your notebook session.

For a description of each notebook example, see [Overview of the Notebook Examples](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#overview_of_the_notebook_examples). The notebook examples for the PyTorch for CPUs environment emphasize the use of the Oracle [ADS](https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html) libraries for a variety of machine learning use cases.
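The `daal4py` acceleration described in this overview is typically enabled by patching `scikit-learn` before any estimators are imported. The following is a minimal sketch, assuming the installed `daal4py` version provides `daal4py.sklearn.patch_sklearn`; the wrapper name `enable_sklearn_acceleration` is hypothetical, and the function falls back gracefully when `daal4py` is not installed.

```python
def enable_sklearn_acceleration() -> bool:
    """Try to patch scikit-learn with daal4py's accelerated implementations.

    Returns True if the patch was applied, or False if daal4py is unavailable.
    Hypothetical wrapper for illustration only.
    """
    try:
        from daal4py.sklearn import patch_sklearn
    except ImportError:
        return False
    patch_sklearn()  # replaces supported scikit-learn algorithms in place
    return True


accelerated = enable_sklearn_acceleration()
print("daal4py acceleration enabled:", accelerated)

# Import scikit-learn estimators only after patching, for example:
# from sklearn.cluster import KMeans
```

Only algorithms that `daal4py` supports are affected; everything else continues to use the stock `scikit-learn` implementation.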

<a id='conda_libraries'></a>
## Principal Conda Libraries

- `ads-lite`
- `category-encoders`
- `daal4py`
- `pandas`
- `scikit-learn`

<a id='conda_configuration'></a>
## Configuration

No additional configuration is needed to use this conda environment.

<a id='ref'></a>
# References

* [Understanding and Using Conda Environments](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#conda_understand_environments)
* [Intel's oneAPI Data Analytics Library](https://intelpython.github.io/daal4py/)
* [ADS Configuration Guide](https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/user_guide/configuration/configuration.html#)
* [Authenticating to the Oracle Cloud Infrastructure APIs from a Notebook Session](https://docs.cloud.oracle.com/iaas/data-science/using/use-notebook-sessions.htm#topic_kxj_znw_pkb)
* [Manually Configuring Your Tenancy for Data Science](https://docs.cloud.oracle.com/iaas/data-science/using/configure-tenancy.htm)
* [Data Science service guide](https://docs.cloud.oracle.com/iaas/data-science/using/data-science.htm)
* [Our Data Science & AI Blog](https://blogs.oracle.com/datascience/)