# ARM Showcase Notebook

ARM shapes are now supported in the OCI Data Science service. This notebook showcases what you can do with an ARM-based conda pack. For the most part, the underlying platform makes no difference to your code, but the notebook prints the machine architecture at the start so that you can confirm where it is running.

## Upgrade Accelerated Data Science SDK - `oracle-ads`

The Oracle Accelerated Data Science (ADS) SDK is maintained by the Oracle Cloud Infrastructure Data Science service team. It speeds up common data science activities by providing tools that automate or simplify routine tasks, and it offers a data-scientist-friendly Pythonic interface to Oracle Cloud Infrastructure (OCI) services, most notably OCI Data Science, Data Flow, Object Storage, and the Autonomous Database. ADS gives you an interface to manage the lifecycle of machine learning models, from data acquisition to model evaluation, interpretation, and model deployment.

Before you begin with a conda environment, upgrade the `oracle-ads` library. [![PyPI](https://img.shields.io/pypi/v/oracle-ads.svg)](https://pypi.org/project/oracle-ads/)  [![Python](https://img.shields.io/pypi/pyversions/oracle-ads.svg?style=plastic)](https://pypi.org/project/oracle-ads/)


You can check your version of `oracle-ads` and the machine architecture by running:

In [1]:
# Show the ADS version and platform

import ads
print(ads.__version__)

import os
print(os.uname().machine)

2.8.10
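
The architecture check in the cell above can also be made explicit. The following is a minimal sketch that uses only the standard library; the helper name `is_arm` and the set of accepted machine strings are illustrative assumptions, not part of ADS.

```python
import platform

# Machine strings commonly reported on 64-bit ARM systems
# (assumption: "aarch64" on Linux, "arm64" on some other platforms).
ARM_MACHINES = {"aarch64", "arm64"}


def is_arm(machine=None):
    """Return True if the given (or current) machine string looks like ARM."""
    name = (machine or platform.machine()).lower()
    return name in ARM_MACHINES or name.startswith("arm")


print(f"machine={platform.machine()} is_arm={is_arm()}")
```

On an x86 shape the same code simply reports `is_arm=False`, so it is safe to leave in a notebook that runs on both platforms.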


In [2]:
# Uncomment this code and set the correct proxy URLs if you need a proxy for internet access
# import os
# os.environ['http_proxy']="http://myproxy"
# os.environ['https_proxy']="http://myproxy"

# Use os.environ['no_proxy'] to list hosts whose traffic should bypass the proxy
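
If you do need a proxy, the three variables can be set together. This is a small sketch, assuming the placeholder URL `http://myproxy` from the cell above; the helper `configure_proxy` and the bypass host names are hypothetical. It writes into a plain dictionary here so that running it does not change your real environment; pass `os.environ` to apply the settings.

```python
import os


def configure_proxy(proxy_url, bypass_hosts=(), environ=None):
    """Set proxy variables in `environ` and return the values that were set."""
    target = os.environ if environ is None else environ
    target["http_proxy"] = proxy_url
    target["https_proxy"] = proxy_url
    if bypass_hosts:
        # no_proxy is a comma-separated list of hosts reached directly.
        target["no_proxy"] = ",".join(bypass_hosts)
    keys = ("http_proxy", "https_proxy", "no_proxy")
    return {key: target[key] for key in keys if key in target}


# Demonstration against a scratch dictionary instead of the real environment.
settings = configure_proxy("http://myproxy", ["localhost", "127.0.0.1"], environ={})
print(settings)
```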

In [3]:
# To upgrade ADS, uncomment the next line and run this cell
#! pip install -q oracle-ads --upgrade

## Authentication
To interact with OCI services, you need to authenticate with one of the following mechanisms:

### 1. Resource Principal

Resource principal authentication works by authorizing the notebook instance that you are using to read and manage OCI service resources such as Object Storage, Data Science Jobs, Data Science Models, and Data Science Model Deployments. See these references:
    
- To set up policies for managing Data Science service resources, see [here](https://docs.oracle.com/en-us/iaas/data-science/using/policies.htm)
- To set up policies for managing Object Storage service resources, see [here](https://docs.oracle.com/en-us/iaas/Content/Identity/policiescommon/commonpolicies.htm#write-objects-to-buckets)
    
    
Other useful resources:

- https://docs.oracle.com/en-us/iaas/Content/Identity/Concepts/commonpolicies.htm
- https://docs.oracle.com/en-us/iaas/Content/Identity/Concepts/policygetstarted.htm#Getting_Started_with_Policies

Once the policies are set up, configure `oracle-ads` to use resource principal as follows:


```python
ads.set_auth('resource_principal')
```

### 2. API Key

To set up an API key, refer to:

- https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm
- https://docs.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm


Once you have set up the config file and the keys, you can configure `oracle-ads` to use API keys:

```python
ads.set_auth('api_key')
```
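
If the same notebook has to run both inside OCI and on a laptop, the choice between the two mechanisms can be made at run time. The sketch below is one possible approach, not an ADS feature: it assumes that the environment variable `OCI_RESOURCE_PRINCIPAL_VERSION` is present wherever a resource principal is available, and the helper name `choose_auth_mode` is hypothetical.

```python
import os


def choose_auth_mode(environ=None):
    """Pick an auth mode string accepted by ads.set_auth().

    Assumption: OCI_RESOURCE_PRINCIPAL_VERSION is set when a resource
    principal is available; otherwise fall back to API key authentication.
    """
    env = os.environ if environ is None else environ
    if "OCI_RESOURCE_PRINCIPAL_VERSION" in env:
        return "resource_principal"
    return "api_key"


auth_mode = choose_auth_mode()
print(auth_mode)

# In the notebook, pass the result to ADS:
# import ads
# ads.set_auth(auth_mode)
```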

## Working with Data on Object Storage

In [4]:
import ads
import pandas as pd

ads.set_auth("resource_principal")

In [None]:
bucket_name = "hosted-ds-datasets"
namespace = "bigdatadatasciencelarge"


file_name = "titanic/titanic.csv"
df = pd.read_csv(
    f"oci://{bucket_name}@{namespace}/{file_name}",
    storage_options=ads.common.auth.default_signer(),
)
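
The path passed to `pd.read_csv` above follows the pattern `oci://<bucket>@<namespace>/<object name>`. As an illustration of that pattern only, here is a small standard-library sketch that builds and splits such URIs; the helper names `build_oci_uri` and `split_oci_uri` are hypothetical and not part of ADS.

```python
from urllib.parse import urlparse


def build_oci_uri(bucket, namespace, object_name):
    """Assemble an oci:// URI from its three parts."""
    return f"oci://{bucket}@{namespace}/{object_name.lstrip('/')}"


def split_oci_uri(uri):
    """Split an oci:// URI into (bucket, namespace, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "oci" or "@" not in parsed.netloc:
        raise ValueError(f"not an oci://bucket@namespace/... URI: {uri!r}")
    bucket, namespace = parsed.netloc.split("@", 1)
    return bucket, namespace, parsed.path.lstrip("/")


uri = build_oci_uri("hosted-ds-datasets", "bigdatadatasciencelarge", "titanic/titanic.csv")
print(uri)
print(split_oci_uri(uri))
```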

In [None]:
df.head()

## Prediction (out of sample)

In [None]:
%matplotlib inline

In [None]:
import numpy as np
import matplotlib.pyplot as plt

import statsmodels.api as sm

plt.rc("figure", figsize=(16, 8))
plt.rc("font", size=14)

### Artificial data

In [None]:
nsample = 50
sig = 0.25
x1 = np.linspace(0, 20, nsample)
X = np.column_stack((x1, np.sin(x1), (x1 - 5) ** 2))
X = sm.add_constant(X)
beta = [5.0, 0.5, 0.5, -0.02]
y_true = np.dot(X, beta)
y = y_true + sig * np.random.normal(size=nsample)
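
Written out, the cell above generates data from the following model. The coefficient order matches `beta` because `sm.add_constant` puts the constant column first; the symbol $\varepsilon_i$ for the standard normal noise is notation introduced here.

$$
y_i = 5 + 0.5\,x_{1,i} + 0.5\,\sin(x_{1,i}) - 0.02\,(x_{1,i} - 5)^2 + 0.25\,\varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, 1), \quad i = 1, \dots, 50
$$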

### Estimation 

In [None]:
olsmod = sm.OLS(y, X)
olsres = olsmod.fit()
print(olsres.summary())
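
For intuition, the fit above solves an ordinary least-squares problem. The sketch below solves the same kind of problem with `numpy.linalg.lstsq` on freshly generated data; it is not how statsmodels is implemented, and the names `design_matrix`, `x_demo`, `beta_exact`, and `beta_hat` are chosen here so they do not overwrite the notebook's variables.

```python
import numpy as np


def design_matrix(x):
    """Columns: constant, x, sin(x), (x - 5)**2, as in the artificial data."""
    x = np.asarray(x, dtype=float)
    return np.column_stack((np.ones_like(x), x, np.sin(x), (x - 5) ** 2))


beta_true = np.array([5.0, 0.5, 0.5, -0.02])
x_demo = np.linspace(0, 20, 50)
X_demo = design_matrix(x_demo)

# Without noise, least squares recovers the true coefficients.
beta_exact, *_ = np.linalg.lstsq(X_demo, X_demo @ beta_true, rcond=None)

# With noise, the estimates are close to, but not equal to, the true values.
rng = np.random.default_rng(0)
y_demo = X_demo @ beta_true + 0.25 * rng.normal(size=x_demo.size)
beta_hat, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(beta_exact)
print(beta_hat)
```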

### In-sample prediction

In [None]:
ypred = olsres.predict(X)
print(ypred)

### Create a new sample of explanatory variables Xnew, predict and plot

In [None]:
x1n = np.linspace(20.5, 25, 10)
Xnew = np.column_stack((x1n, np.sin(x1n), (x1n - 5) ** 2))
Xnew = sm.add_constant(Xnew)
ynewpred = olsres.predict(Xnew)  # predict out of sample
print(ynewpred)

### Plot comparison

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(x1, y, "o", label="Data")
ax.plot(x1, y_true, "b-", label="True")
ax.plot(np.hstack((x1, x1n)), np.hstack((ypred, ynewpred)), "r", label="OLS prediction")
ax.legend(loc="best")

### Predicting with Formulas

Using formulas can make both estimation and prediction a lot easier.

In [None]:
from statsmodels.formula.api import ols

data = {"x1": x1, "y": y}

res = ols("y ~ x1 + np.sin(x1) + I((x1-5)**2)", data=data).fit()

We use `I()` to apply the identity transform, i.e., the expression inside it is evaluated as ordinary Python arithmetic, so `**2` squares the term instead of being interpreted as a formula operator.

In [None]:
res.params

Now we only have to pass the single variable `x1`, and the transformed right-hand-side variables are computed automatically.

In [None]:
res.predict(exog=dict(x1=x1n))
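
As a rough picture of what the formula interface saves you from doing by hand, the sketch below rebuilds the right-hand-side terms for a single `x1` value and combines them with a set of coefficients. It only illustrates the idea and is not how statsmodels implements formulas. The coefficients are the true values from the artificial data section, used as a stand-in for the fitted `res.params`.

```python
import math

# True coefficients from the artificial-data section, in the order
# constant, x1, sin(x1), (x1 - 5)**2. A fitted model would use res.params.
coefficients = [5.0, 0.5, 0.5, -0.02]


def transform(x1):
    """Expand one x1 value into the model's right-hand-side terms."""
    return [1.0, x1, math.sin(x1), (x1 - 5) ** 2]


def predict(x1, params=coefficients):
    """Linear prediction: sum of coefficient * term."""
    return sum(p * term for p, term in zip(params, transform(x1)))


for value in (20.5, 22.0, 25.0):
    print(value, predict(value))
```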

## Pandas

Manipulate a pandas DataFrame.

In [None]:
import pandas as pd

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Define the column names
col_names = [
    "sepal_length_in_cm",
    "sepal_width_in_cm",
    "petal_length_in_cm",
    "petal_width_in_cm",
    "class",
]

# Read data from URL
iris_data = pd.read_csv(url, names=col_names)

iris_data.head() 

In [None]:
iris_data.loc[4, :].values.flatten().tolist()

In [None]:
assert iris_data.loc[3, :].values.flatten().tolist() == [4.6, 3.1, 1.5, 0.2, 'Iris-setosa']
assert iris_data.loc[4, :].values.flatten().tolist() == [5.0, 3.6, 1.4, 0.2, 'Iris-setosa']

In [None]:
iris_data.drop(iris_data.index[3], inplace=True)

In [None]:
iris_data.head()

In [None]:
iris_data.to_csv("altered_data.csv")

In [None]:
new_data = pd.read_csv("altered_data.csv")
new_data.drop(['Unnamed: 0'], axis=1, inplace=True)
new_data.head()


In [None]:
assert new_data.loc[3, :].values.flatten().tolist() == [5.0, 3.6, 1.4, 0.2, 'Iris-setosa']
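
The last assertion passes because the CSV round trip renumbers the rows: after the drop, label 3 is missing from the original frame, but the re-read frame gets a fresh index starting at 0. The sketch below shows the same effect on a small made-up frame and uses `index=False` when writing, which avoids the extra `Unnamed: 0` column. The column name `value` and the in-memory buffer are illustrative choices.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Dropping by position removes the row labelled 3; the labels keep a gap.
df_demo = df_demo.drop(df_demo.index[3])
print(list(df_demo.index))

# Write without the index, then read back from the in-memory buffer.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The restored frame has a fresh index and no extra column.
print(list(restored.index))
print(list(restored.columns))
```

With `index=False`, the `drop(['Unnamed: 0'], ...)` step used above is no longer needed.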

### Working with other sources

Learn how to work with other sources [here](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/loading_data/connect.html)

## References

* [Oracle Accelerated Data Science SDK Guide](https://accelerated-data-science.readthedocs.io/en/latest/)
* [Oracle Accelerated Data Science Source Code](https://github.com/oracle/accelerated-data-science)
* [Notebook Examples](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/master/notebook_examples)
* [Conda Environments](https://docs.oracle.com/en-us/iaas/data-science/using/conda_understand_environments.htm)
* [Publish Conda Environments](https://docs.oracle.com/en-us/iaas/data-science/using/conda_publishs_object.htm)