### JOBS for Data Drift Detection

Create, run and monitor an OCI Data Science JOB for Data Drift Detection from a notebook session.

In this notebook we show how to create, run and monitor a JOB that compares a reference dataset with a new dataset to check whether there is Data Drift.
The code is packed in a tar.gz file saved in Object Storage.
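The drift-detection code itself lives in the archive and is not shown in this notebook. The watch output reports, per column, a `p_value`, a `threshold` of 0.01 and a column `Type`, which is consistent with a per-column two-sample statistical test. As a minimal sketch, assuming a two-sample Kolmogorov-Smirnov test on continuous columns (an assumption, not necessarily what `test_drift_analysis.py` does), the comparison could look like this. The functions `ks_statistic`, `ks_p_value` and `detect_drift` are hypothetical:

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(ref: Sequence[float], new: Sequence[float]) -> float:
    """Two-sample KS statistic D: largest gap between the two empirical CDFs."""
    a, b = sorted(ref), sorted(new)
    d = 0.0
    # Both ECDFs are step functions, so the gap only changes at sample points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d: float, n: int, m: int) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov distribution."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # series converges slowly here and the p-value is ~1
        return 1.0
    total = 0.0
    for j in range(1, 101):
        total += 2 * (-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
    return min(max(total, 0.0), 1.0)


def detect_drift(
    reference: Dict[str, List[float]],
    current: Dict[str, List[float]],
    threshold: float = 0.01,  # same threshold value as in the job's report
) -> List[dict]:
    """Return one record per continuous column whose p-value is below threshold."""
    report = []
    for column, ref_values in reference.items():
        new_values = current[column]
        d = ks_statistic(ref_values, new_values)
        p = ks_p_value(d, len(ref_values), len(new_values))
        if p < threshold:
            report.append({"Column": column, "Type": "continuous",
                           "p_value": p, "threshold": threshold})
    return report
```

A small p-value means the two samples are unlikely to come from the same distribution, so the column is flagged as drifted; categorical columns would need a different test (for example chi-square), which this sketch does not cover.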

In [18]:
import os
import ads

from ads.jobs import DataScienceJob
from ads.jobs import ScriptRuntime
from ads.jobs import Job

from ads import set_auth

In [19]:
print(ads.__version__)

2.5.4


In [20]:
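# OCIDs of the compartment and project of the current notebook session
compartment_id = os.environ['NB_SESSION_COMPARTMENT_OCID']
project_id = os.environ['PROJECT_OCID']

# authenticate with the resource principal of the notebook session
set_auth(auth='resource_principal')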
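
`compartment_id` and `project_id` are read from environment variables provided by the notebook session, so `os.environ[...]` raises a bare `KeyError` if the code runs elsewhere. A minimal sketch of a hypothetical helper, `require_env` (not part of ADS), that fails with a clearer message:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; this cell expects to run inside an "
            "OCI Data Science notebook session."
        )
    return value
```

For example, `compartment_id = require_env('NB_SESSION_COMPARTMENT_OCID')` behaves like the original line inside a notebook session and gives a readable error outside it.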

In [21]:
# 1. Specify the requested infrastructure:
# VM shape and logging (log group and log).
# The network (subnet) is taken from the notebook session.
infrastructure = (
    DataScienceJob()
    .with_shape_name("VM.Standard2.4")
    .with_log_group_id("ocid1.loggroup.oc1.eu-frankfurt-1.amaaaaaangencdyazs4l4rzrzsarlej6mqlwlbz6bmnx4adwdlssveam2jaa")
    .with_log_id("ocid1.log.oc1.eu-frankfurt-1.amaaaaaangencdya47httqmxyiew5tkxa6l7gekev2ljpasixuhmp2fa3v5q")
)

In [22]:
#
# All the Python code is packed in drift.tar.gz, saved in an Object Storage bucket
# URL: oci://drift_input@frqap2zhtzbe/drift.tar.gz
#

# specify the runtime: source archive, service conda environment and entrypoint script
runtime = (
    ScriptRuntime()
    .with_source("oci://drift_input@frqap2zhtzbe/drift.tar.gz")
    .with_service_conda("generalml_p37_cpu_v1")
    .with_environment_variable(JOB_RUN_ENTRYPOINT="test_drift_analysis.py")
)
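
The runtime above assumes `drift.tar.gz` already exists in the bucket. A minimal sketch of how such an archive could be built with Python's standard `tarfile` module, assuming all the job's source files sit in one local directory and that the entrypoint (`test_drift_analysis.py`) is expected at the top level of the archive, as the `JOB_RUN_ENTRYPOINT` value suggests. The function name `build_job_archive` and the directory layout are hypothetical:

```python
import tarfile
from pathlib import Path
from typing import List


def build_job_archive(src_dir: str, archive_path: str) -> List[str]:
    """Pack every file under src_dir into a gzip-compressed tarball.

    Files are stored with paths relative to src_dir, so a file such as
    test_drift_analysis.py ends up at the top level of the archive.
    archive_path should be outside src_dir, or the archive would include itself.
    """
    src = Path(src_dir)
    members = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(src).as_posix()
                tar.add(str(path), arcname=arcname)
                members.append(arcname)
    return members
```

For example, `build_job_archive("drift_src", "drift.tar.gz")` (with `drift_src` a hypothetical local folder) produces an archive that could then be uploaded to the `drift_input` bucket, for instance with `oci os object put --bucket-name drift_input --file drift.tar.gz`. Storing relative paths keeps the local directory name out of the archive, so the entrypoint path stays stable.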

In [23]:
# specify the JOB
job = (
    Job(name="job_data_drift2")
    .with_infrastructure(infrastructure)
    .with_runtime(runtime)
)

In [24]:
# create the JOB
job.create()

kind: job
spec:
  id: ocid1.datasciencejob.oc1.eu-frankfurt-1.amaaaaaangencdyah5gdzd2jjmdolnnd7bvcifnq6mi6rfirddwohvva2xta
  infrastructure:
    kind: infrastructure
    spec:
      blockStorageSize: 500
      compartmentId: ocid1.compartment.oc1..aaaaaaaag2cpni5qj6li5ny6ehuahhepbpveopobooayqfeudqygdtfe6h3a
      displayName: job_data_drift2
      jobInfrastructureType: STANDALONE
      jobType: DEFAULT
      logGroupId: ocid1.loggroup.oc1.eu-frankfurt-1.amaaaaaangencdyazs4l4rzrzsarlej6mqlwlbz6bmnx4adwdlssveam2jaa
      logId: ocid1.log.oc1.eu-frankfurt-1.amaaaaaangencdya47httqmxyiew5tkxa6l7gekev2ljpasixuhmp2fa3v5q
      projectId: ocid1.datascienceproject.oc1.eu-frankfurt-1.amaaaaaangencdyatmhyp2gmw3hll77lhrup6alcojr56n2iixtt56m35wxa
      shapeName: VM.Standard2.4
      subnetId: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaaijgqblnhpqle2zorl75qli23wre5eboqjtystagdgun4qwdxj4aq
    type: dataScienceJob
  name: job_data_drift2
  runtime:
    kind: runtime
    spec:
      conda:
        slug:

In [25]:
# start a run of the JOB
job_run = job.run()

In [26]:
# watch and stream the job run outputs
job_run.watch()

2022-06-21 19:22:39 - Job Run ACCEPTED, Infrastructure provisioning.
2022-06-21 19:23:35 - Job Run ACCEPTED, Infrastructure provisioned.
2022-06-21 19:23:44 - Job Run ACCEPTED, Job run bootstrap starting.
2022-06-21 19:26:43 - Job Run ACCEPTED, Job run bootstrap complete. Artifact execution starting.
2022-06-21 19:26:55 - Job Run IN_PROGRESS, Job run artifact execution in progress.
2022-06-21 19:26:46 - Matplotlib is building the font cache; this may take a moment.
2022-06-21 19:26:51 - Read dataset to compare and analyze...
2022-06-21 19:26:51 - 
2022-06-21 19:26:51 - *** Report on evidences of Data Drift identified ***
2022-06-21 19:26:51 - 
2022-06-21 19:26:52 - p_value: 0.0
2022-06-21 19:26:52 - Identified drift in column: Age
2022-06-21 19:26:52 - 
2022-06-21 19:26:52 - Identified drift in column: MonthlyIncome
2022-06-21 19:26:52 - 
2022-06-21 19:26:52 - p_value: 0.0
2022-06-21 19:26:52 - [{'Column': 'Age', 'Type': 'continuous', 'p_value': 0.0, 'threshold': 0.01, 'stats': '[37.81

compartmentId: ocid1.compartment.oc1..aaaaaaaag2cpni5qj6li5ny6ehuahhepbpveopobooayqfeudqygdtfe6h3a
createdBy: ocid1.datasciencenotebooksession.oc1.eu-frankfurt-1.amaaaaaangencdyapcfjgjuueoo2t5qb5b3zidssg37mocoeqqeqbxthtnva
definedTags:
  default-tags:
    CreatedBy: ocid1.datasciencenotebooksession.oc1.eu-frankfurt-1.amaaaaaangencdyapcfjgjuueoo2t5qb5b3zidssg37mocoeqqeqbxthtnva
displayName: job_data_drift2-run-20220621-1922
id: ocid1.datasciencejobrun.oc1.eu-frankfurt-1.amaaaaaangencdyatg3tn6mtypr2qsoly5gm7amdg7u7qrh2ovasf62vhlyq
jobConfigurationOverrideDetails:
  jobType: DEFAULT
jobId: ocid1.datasciencejob.oc1.eu-frankfurt-1.amaaaaaangencdyah5gdzd2jjmdolnnd7bvcifnq6mi6rfirddwohvva2xta
jobInfrastructureConfigurationDetails:
  blockStorageSizeInGBs: 500
  jobInfrastructureType: STANDALONE
  shapeName: VM.Standard2.4
  subnetId: ocid1.subnet.oc1.eu-frankfurt-1.aaaaaaaaijgqblnhpqle2zorl75qli23wre5eboqjtystagdgun4qwdxj4aq
lifecycleDetails: Job run artifact execution in progress.
lifecycleS