### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests

# A timeout keeps the check from hanging when there is no route to the internet.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation </font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>
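
Outside a notebook session some of these variables may be unset, and `os.environ[...]` then raises `KeyError`. The following is a minimal sketch, assuming a hypothetical helper named `read_session_env` (not part of ADS or the OCI SDK), that reads them with `.get` and reports which ones are missing:

```python
import os

# Variable names taken from the list above.
SESSION_VARS = [
    "NB_SESSION_COMPARTMENT_OCID",
    "PROJECT_OCID",
    "USER_OCID",
    "TENANCY_OCID",
    "NB_REGION",
]


def read_session_env(names=SESSION_VARS, environ=None):
    """Return (values, missing) for the given environment variable names.

    Hypothetical helper for illustration only.
    """
    env = os.environ if environ is None else environ
    values = {name: env.get(name) for name in names}
    # Treat unset and empty variables alike.
    missing = [name for name, value in values.items() if not value]
    return values, missing


values, missing = read_session_env()
for name, value in values.items():
    print(f"{name} = {value}")
if missing:
    print("Not set:", ", ".join(missing))
```

Passing a plain dictionary as `environ` makes the helper easy to exercise without touching the real environment.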

In [None]:
# Upgrade Oracle ADS to pick up the latest features and stay compatible with Oracle Cloud Infrastructure.
!pip install -U oracle-ads

In [2]:
# Import the ADS library and authenticate
import ads
ads.set_auth("resource_principal")

In [3]:
import os
compartment_id = os.environ.get("NB_SESSION_COMPARTMENT_OCID")
logs_bucket_uri = "oci://bucket-logs@id3kyspkytmr"  # Bucket for Data Flow session logs
archive_uri = "oci://bucket-library@id3kyspkytmr/archive3.zip"  # Put the archive you want to upload here
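
Both URIs above follow the `oci://<bucket>@<namespace>/<path>` layout. As a rough sketch, the hypothetical helpers `oci_uri` and `parse_oci_uri` below (illustrative names, not ADS functions) build and split such strings so the bucket and namespace are not retyped by hand:

```python
def oci_uri(bucket: str, namespace: str, path: str = "") -> str:
    """Build an oci://<bucket>@<namespace>/<path> URI (hypothetical helper)."""
    if not bucket or not namespace:
        raise ValueError("bucket and namespace are required")
    if "@" in bucket or "/" in bucket:
        raise ValueError("bucket name must not contain '@' or '/'")
    uri = f"oci://{bucket}@{namespace}"
    return f"{uri}/{path.lstrip('/')}" if path else uri


def parse_oci_uri(uri: str):
    """Split an oci:// URI into (bucket, namespace, path)."""
    prefix = "oci://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an oci:// URI: {uri!r}")
    location, _, path = uri[len(prefix):].partition("/")
    bucket, sep, namespace = location.partition("@")
    if not (bucket and sep and namespace):
        raise ValueError(f"expected <bucket>@<namespace> in {uri!r}")
    return bucket, namespace, path


# Rebuild the two URIs used in this notebook.
namespace = "id3kyspkytmr"
print(oci_uri("bucket-logs", namespace))
print(oci_uri("bucket-library", namespace, "archive3.zip"))
print(parse_oci_uri("oci://bucket-library@id3kyspkytmr/archive3.zip"))
```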

In [4]:
import json

def prepare_command(command: dict) -> str:
    """Serialize a command dictionary to JSON and wrap it in single quotes."""
    return f"'{json.dumps(command)}'"
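
The single quotes keep the JSON payload together as one argument, but the helper does not escape a single quote that appears inside a value. The sketch below illustrates both points. It uses `shlex` only as a stand-in for shell-style tokenizing (how the magic itself parses its arguments is an assumption here), and `prepare_command_checked` is a hypothetical variant, not part of any library:

```python
import json
import shlex


def prepare_command(command: dict) -> str:
    """Same helper as above: JSON wrapped in single quotes."""
    return f"'{json.dumps(command)}'"


def prepare_command_checked(command: dict) -> str:
    """Hypothetical variant that refuses payloads containing a single quote."""
    payload = json.dumps(command)
    if "'" in payload:
        raise ValueError("payload contains a single quote and cannot be wrapped safely")
    return f"'{payload}'"


sample = {"displayName": "demo app", "numExecutors": 2}
quoted = prepare_command_checked(sample)

# With shell-style tokenizing the quoted payload stays a single token,
# and that token parses back to the original dictionary.
tokens = shlex.split(quoted)
print(len(tokens))
print(json.loads(tokens[0]) == sample)
```

If a display name or configuration value must contain a single quote, the payload needs a different quoting scheme than the one used in this notebook.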

In [5]:
%load_ext dataflow.magics

In [6]:
command = prepare_command(
    {
        "compartmentId": compartment_id,
        "displayName": "App_DataFlowStudio",
        "language": "PYTHON",
        "sparkVersion": "3.2.1",
        "numExecutors": 2,
        "archiveUri": archive_uri,
        "driverShape": "VM.Standard.E4.Flex",
        "executorShape": "VM.Standard.E4.Flex",
        "configuration": {
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.shuffleTracking.enabled": "true",
            "spark.dynamicAllocation.minExecutors": "1",
            "spark.dynamicAllocation.maxExecutors": "2",
            "spark.dynamicAllocation.executorIdleTimeout": "60",
            "spark.dynamicAllocation.schedulerBacklogTimeout": "60",
            "spark.dataflow.dynamicAllocation.quotaPolicy": "min",
            "spark.jars.packages": "com.oracle.oci.sdk:oci-java-sdk-addons-sasl:2.20.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.1"
        },
        "driverShapeConfig": {"ocpus": 1, "memoryInGBs": 8},
        "executorShapeConfig": {"ocpus": 1, "memoryInGBs": 8},
        "logsBucketUri": logs_bucket_uri,
        "type": "SESSION",
        # "poolId": "ocid1.dataflowpool.oc1.iad.anuwcljttsbrckqalrhm4oxeatoxzdmwoo36wpsqpyb2mhz5kunzkojynd7a",
    }
)
%create_session -l python -c $command

Setting up the Cluster..


Cluster is ready..
Starting Spark application..


Session ID,Kind,State,Current session
ocid1.dataflowapplication.oc1.iad.anuwcljttsbrckqasmgrtupiovpqikgyqy3l4mlqje7i3clxs45knavsxswa,pyspark,IN_PROGRESS,Dataflow Run


SparkSession available as 'spark'.
SparkContext available as 'sc'.
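
The dynamic allocation values in the payload above are passed as strings, so a typo or an inverted range only surfaces once the session is requested. As a hedged sketch, a hypothetical pre-flight check (`check_dynamic_allocation`, not part of ADS or the Data Flow magics) could be run on the `configuration` dictionary before calling `%create_session`. It only verifies that the executor counts are integers and that the minimum does not exceed the maximum:

```python
def check_dynamic_allocation(configuration: dict) -> list:
    """Return a list of problems found in Spark dynamic allocation settings.

    Hypothetical helper for illustration only.
    """
    prefix = "spark.dynamicAllocation."
    enabled = str(configuration.get(prefix + "enabled", "false")).lower()
    if enabled != "true":
        return []  # Dynamic allocation is off, nothing to check.

    raw_min = configuration.get(prefix + "minExecutors", "0")
    raw_max = configuration.get(prefix + "maxExecutors")
    try:
        minimum = int(raw_min)
        maximum = None if raw_max is None else int(raw_max)
    except (TypeError, ValueError):
        return ["minExecutors and maxExecutors must be integers"]

    problems = []
    if minimum < 0:
        problems.append("minExecutors must not be negative")
    if maximum is not None and minimum > maximum:
        problems.append(f"minExecutors ({minimum}) exceeds maxExecutors ({maximum})")
    return problems


# Values copied from the session payload in this notebook.
configuration = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "2",
}
print(check_dynamic_allocation(configuration))
```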


In [7]:
%use_session -s ocid1.dataflowapplication.oc1.iad.anuwcljttsbrckqasmgrtupiovpqikgyqy3l4mlqje7i3clxs45knavsxswa

Using Active Session .. ocid1.dataflowrun.oc1.iad.anuwcljttsbrckqaonas4xuldm72kxiqgqta5pjllnmccnkdi2sb6ehsrhha


Cluster is ready..
Starting Spark application..


Session ID,Kind,State,Current session
ocid1.dataflowapplication.oc1.iad.anuwcljttsbrckqasmgrtupiovpqikgyqy3l4mlqje7i3clxs45knavsxswa,pyspark,IN_PROGRESS,Dataflow Run


SparkSession available as 'spark'.
SparkContext available as 'sc'.
