### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# A timeout keeps the check from hanging when there is no route to the internet.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>

# <font color=red>Using Data Science Jobs to ingest feature values into the OCI feature store</font>
<p style="margin-left:10%; margin-right:10%;">by the <font color=teal> Oracle Cloud Infrastructure Data Science Team </font></p>

***

# Introduction 

Data Science Jobs let you run custom tasks outside of a notebook session. Compute is provisioned on demand, so you pay only for the compute you use, and it terminates automatically when the task completes. Jobs suit tasks such as data preparation, model training, hyperparameter tuning, and batch inference. You can use the Logging service to capture output messages. In this notebook, we use the Accelerated Data Science SDK (ADS) to define a Data Science Job that creates the design-time entities of the OCI feature store, which can later be used to ingest feature data.

For more information on using ADS for jobs, you can go to our [documentation](https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/user_guide/jobs/index.html).

In [10]:
import ads
from ads.jobs import DataScienceJob, Job, ScriptRuntime

# Authenticate with the notebook session's resource principal.
ads.set_auth('resource_principal')

## Infrastructure

Data Science Job infrastructure is defined by a `DataScienceJob` instance.  
<span style="color:red">Important:  </span>If you want to use logging for the job, fill in the `log_group_id` and `log_id` in the cell below.  You need to have set up the policies for the logging service.  For more information about setting up logs for a job, you can go to our [documentation](https://docs.oracle.com/en-us/iaas/data-science/using/log-about.htm#jobs_about__job-logs).

In [11]:
infrastructure = (
    DataScienceJob()
    .with_shape_name("VM.Standard2.24")
    .with_block_storage_size(50)
    .with_log_group_id("<log_group_id>")  # replace with your log group OCID
    .with_log_id("<log_id>")  # replace with your log OCID
)
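
A mistyped log OCID only surfaces once the job is running, so it can help to sanity-check the values before building the infrastructure. The following is a minimal sketch and not part of ADS: `looks_like_ocid` is a hypothetical helper, and the `ocid1.<resource type>.` layout it checks is an assumption inferred from the OCIDs shown in this notebook's output.

```python
def looks_like_ocid(value: str, resource_type: str) -> bool:
    """Return True if value has the shape 'ocid1.<resource_type>.<...>.<unique id>'.

    Hypothetical helper: this is a shallow format check only. It does not
    verify that the resource exists or that you are authorized to use it.
    """
    parts = value.split(".")
    # Assumed layout: ocid1.<type>.<realm>.<region>.<unique id>
    return (
        len(parts) >= 5
        and parts[0] == "ocid1"
        and parts[1] == resource_type
        and parts[-1] != ""
    )


# Placeholder values; replace with your own OCIDs.
log_group_id = "<log_group_id>"
log_id = "<log_id>"

for name, value, kind in [
    ("log_group_id", log_group_id, "loggroup"),
    ("log_id", log_id, "log"),
]:
    if not looks_like_ocid(value, kind):
        print(f"{name} does not look like an ocid1.{kind} OCID: {value!r}")
```

With the placeholders left unchanged, both checks print a warning, which is a cheap reminder to fill them in before calling `create()`.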

## Job Runtime

`ScriptRuntime` allows you to run Python, Bash, and Java scripts from a single source file (.zip or .tar.gz) or code directory. You can configure a Data Science Conda Environment for running your code.

In [12]:
runtime = (
    ScriptRuntime()
    .with_source("./feature_Store_ingestion.py")
    .with_service_conda("fspyspark32_p38_cpu_v3")
)
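
A quick local check can catch a missing or misnamed source before the job is created. This is a minimal sketch using only the standard library; `describe_source` is a hypothetical helper, and its categories simply mirror the source types listed above.

```python
from pathlib import Path


def describe_source(path: str) -> str:
    """Classify a local job source as 'directory', 'archive', or 'file'.

    Hypothetical helper; raises FileNotFoundError if the path does not exist.
    """
    source = Path(path)
    if not source.exists():
        raise FileNotFoundError(f"Job source not found: {source}")
    if source.is_dir():
        return "directory"
    # .zip and .tar.gz are the archive types named in the text above.
    if source.name.endswith((".zip", ".tar.gz")):
        return "archive"
    return "file"


try:
    print(describe_source("./feature_Store_ingestion.py"))
except FileNotFoundError as err:
    print(err)
```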

## Define Job

With the runtime and infrastructure defined, you can create a job definition and give it a name.

In [13]:
import time

# Suffix the job name with the current epoch time so each run gets a distinct name.
epoch_time = int(time.time())
job_name = f'fs_ingestion_{epoch_time}'
print(job_name)
job = Job(name=job_name).with_infrastructure(infrastructure).with_runtime(runtime)

fs_ingestion_1700223484
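
The epoch suffix keeps job names distinct between runs. Below is a small sketch of the same naming scheme as a reusable function; `make_job_name` and its parameters are hypothetical names introduced here for illustration.

```python
import time
from typing import Optional


def make_job_name(prefix: str, epoch_time: Optional[int] = None) -> str:
    """Build '<prefix>_<epoch seconds>'; uses the current time when none is given."""
    if epoch_time is None:
        epoch_time = int(time.time())
    return f"{prefix}_{epoch_time}"


# Reproduces the name printed above for that timestamp.
print(make_job_name("fs_ingestion", 1700223484))  # fs_ingestion_1700223484
```

Passing the timestamp explicitly makes the name reproducible, which is convenient when you want to look up an earlier job by name.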


## Create and Run Job

You can call the `create()` method of a job instance to create a job. After the job is created, you can call the `run()` method to create and start a job run. The `run()` method returns a `DataScienceJobRun`. You can monitor the job run output by calling the `watch()` method of the `DataScienceJobRun` instance.

In [14]:
job.create()


kind: job
spec:
  id: ocid1.datasciencejob.oc1.iad.amaaaaaabiudgxya5nplohzlhdqajednplisbunaptpgxrdep7r2uqgcgv7q
  infrastructure:
    kind: infrastructure
    spec:
      blockStorageSize: 50
      compartmentId: ocid1.tenancy.oc1..aaaaaaaa462hfhplpx652b32ix62xrdijppq2c7okwcqjlgrbknhgtj2kofa
      displayName: fs_ingestion_1700223484
      jobInfrastructureType: ME_STANDALONE
      jobType: DEFAULT
      logGroupId: ocid1.loggroup.oc1.iad.amaaaaaabiudgxyavwdj2fgi66ezw7gxxlnmxrks5pcm44775pvgwcuwqy7a
      logId: ocid1.log.oc1.iad.amaaaaaabiudgxyai23mvcdneaplmdmrius3ozthkwjpfuqrbl5hdrvwcgaq
      projectId: ocid1.datascienceproject.oc1.iad.amaaaaaabiudgxyak5mvbwf54ca3fqrmh4rl6kqv5mo5qamksz4tkc4diixq
      shapeName: VM.Standard2.24
    type: dataScienceJob
  name: fs_ingestion_1700223484
  runtime:
    kind: runtime
    spec:
      conda:
        slug: fspyspark32_p38_cpu_v1
        type: service
      scriptPathURI: ./feature_Store_ingestion.py
    type: script

In [15]:
job_run = job.run()

In [16]:
job_run.watch()

Job OCID: ocid1.datasciencejob.oc1.iad.amaaaaaabiudgxya5nplohzlhdqajednplisbunaptpgxrdep7r2uqgcgv7q
Job Run OCID: ocid1.datasciencejobrun.oc1.iad.amaaaaaabiudgxyaw75lkemmpip57lbmtwvf22tpp276bfycpxn337zoqqyq
2023-11-17 12:18:14 - Job Run ACCEPTED
2023-11-17 12:18:30 - Job Run ACCEPTED, Infrastructure provisioning.
2023-11-17 12:19:33 - Job Run ACCEPTED, Infrastructure provisioned.
2023-11-17 12:20:04 - Job Run ACCEPTED, Job run bootstrap starting.
2023-11-17 12:22:30 - Job Run ACCEPTED, Job run bootstrap complete. Artifact execution starting.
2023-11-17 12:22:45 - Job Run IN_PROGRESS, Job run artifact execution in progress.
ERROR:ads.common.oci_mixin:Failed to synchronize the properties of <class 'ads.common.oci_logging.OCILog'> due to service error:
{'target_service': 'logging_management', 'status': 404, 'code': 'NotAuthorizedOrNotFound', 'opc-request-id': '0DD9899611EF4A289396EF122B09B37C/4B7B1C538F24C730AE45A20217670AEA/5418E31F02D6F1EBEB1DD3196FEBD761', 'message': 'Authorization fai

ERROR - Exception
Traceback (most recent call last):
  File "/home/datascience/conda/fspyspark32_p38_cpu_v1/lib/python3.8/site-packages/ads/common/oci_logging.py", line 376, in _search_logs
    response = self.search_client.search_logs(
  File "/home/datascience/conda/fspyspark32_p38_cpu_v1/lib/python3.8/site-packages/oci/loggingsearch/log_search_client.py", line 197, in search_logs
    return retry_strategy.make_retrying_call(
  File "/home/datascience/conda/fspyspark32_p38_cpu_v1/lib/python3.8/site-packages/oci/retry/retry.py", line 308, in make_retrying_call
    response = func_ref(*func_args, **func_kwargs)
  File "/home/datascience/conda/fspyspark32_p38_cpu_v1/lib/python3.8/site-packages/oci/base_client.py", line 522, in call_api
    return self.request(request, allow_control_chars, operation_name, api_reference_link)
  File "/home/datascience/conda/fspyspark32_p38_cpu_v1/lib/python3.8/site-packages/circuitbreaker.py", line 159, in wrapper
    return call(function, *args, **kwargs