## Jobs Management Examples

#### You can use the API to create, start, stop, and schedule jobs, among other operations

#### This functionality is useful for CML users who have ETL and database scoring use cases, or who simply want to automate projects

#### Please make sure to select a legacy engine when you launch a session, as some of the API methods do not work with ML Runtimes yet.

#### You can select a legacy engine by going to "Project Settings" -> "Runtime/Engine" -> "Legacy Engine". The default Docker image will be fine.

In [2]:
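import cmlapi
import os

# Point the client at the workspace; PROJECT_HOST and API_KEY are read from environment variables.
config = cmlapi.Configuration()
config.host = os.environ["PROJECT_HOST"]
client = cmlapi.ApiClient(config)
# Send the API key as a bearer token with every request.
client.set_default_header("authorization", "Bearer " + os.environ["API_KEY"])
api_instance = cmlapi.CMLServiceApi(client)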
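
The setup cell above fails with a bare `KeyError` if either environment variable is missing. As a minimal sketch, the hypothetical helpers `require_env` and `read_client_settings` below (not part of `cmlapi`) check both variables up front and produce a clearer message. They only prepare the two values; building the client still happens as in the cell above.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "the API client cannot be configured without it."
        )
    return value


def read_client_settings():
    """Collect the two values the setup cell needs (hypothetical helper)."""
    return {
        "host": require_env("PROJECT_HOST"),
        "authorization": "Bearer " + require_env("API_KEY"),
    }
```

The returned `host` would be assigned to `config.host`, and `authorization` passed to `client.set_default_header("authorization", ...)`.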

#### You can create a new job as shown below. The schedule value follows the cron format. For example, to run the job every Monday at 1 PM UTC, the cron expression is 0 13 * * 1, passed to the API as a string.
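
To make the cron format concrete, here is a small sketch that labels the five fields of a standard cron expression. The helper `describe_cron` is hypothetical and only splits the string; it does not validate ranges or handle any extensions CML may support.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each cron field name to its value in `expression`."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# Minute 0 of hour 13, any day of the month, any month, weekday 1 (Monday).
print(describe_cron("0 13 * * 1"))
# {'minute': '0', 'hour': '13', 'day_of_month': '*', 'month': '*', 'day_of_week': '1'}
```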

#### Note that the project ID appears twice: once inside the request dictionary and once as the second argument

In [3]:
api_instance.create_job({"project_id":"qqbn-sekc-qwme-4kyg",
                          "name":"my_scheduled_job",
                          "schedule":"0 13 * * 1",
                          "memory":4,
                          "cpu":2,
                          "script":"data_ingest_job.py", 
                          "kernel":"python3"}, "qqbn-sekc-qwme-4kyg")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 15, 1, 603239, tzinfo=tzlocal()),
 'creator': {'email': 'pauldefusco@cloudera.com',
             'id': '1',
             'name': 'Paul de Fusco',
             'username': 'pauldefusco'},
 'engine_image_id': '15',
 'english_schedule': '0 13 * * 1',
 'environment': '',
 'id': 'ennq-7dum-18ab-a5ow',
 'kernel': 'python3',
 'memory': 4.0,
 'name': 'my_scheduled_job',
 'nvidia_gpu': '0',
 'parent_id': '',
 'paused': False,
 'schedule': '0 13 * * 1',
 'script': 'data_ingest_job.py',
 'share_token': '',
 'timeout': '0',
 'timeout_kill': False,
 'timezone': 'America/Los_Angeles',
 'type': 'cron',
 'updated_at': datetime.datetime(2021, 7, 12, 20, 15, 1, 603284, tzinfo=tzlocal())}
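
Because the project ID has to be supplied twice, a thin wrapper can help avoid mismatches between the two copies. The sketch below assumes only the `create_job(body, project_id)` call shape used in the cell above; the function name `create_job_in_project` and its default values are hypothetical.

```python
def create_job_in_project(api, project_id, name, script, kernel="python3",
                          cpu=2, memory=4, schedule=None, parent_job_id=None):
    """Build the request dictionary and pass the project ID in both places."""
    body = {
        "project_id": project_id,
        "name": name,
        "script": script,
        "kernel": kernel,
        "cpu": cpu,
        "memory": memory,
    }
    # Optional keys are only sent when provided, mirroring the cells in this notebook.
    if schedule is not None:
        body["schedule"] = schedule
    if parent_job_id is not None:
        body["parent_job_id"] = parent_job_id
    return api.create_job(body, project_id)
```

With the client from the setup cell, the scheduled job above could be created as `create_job_in_project(api_instance, "qqbn-sekc-qwme-4kyg", "my_scheduled_job", "data_ingest_job.py", schedule="0 13 * * 1")`.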

#### Alternatively, you can create a job without a schedule and then trigger its execution with a second call.

In [4]:
api_instance.create_job({"project_id":"qqbn-sekc-qwme-4kyg",
                          "name":"my_unscheduled_job",
                          #"schedule":"0 13 * * 1",
                          "memory":4,
                          "cpu":2,
                          "script":"data_ingest_job.py", 
                          "kernel":"python3"}, "qqbn-sekc-qwme-4kyg")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 15, 1, 782930, tzinfo=tzlocal()),
 'creator': {'email': 'pauldefusco@cloudera.com',
             'id': '1',
             'name': 'Paul de Fusco',
             'username': 'pauldefusco'},
 'engine_image_id': '15',
 'english_schedule': '',
 'environment': '',
 'id': '562m-sk6u-s1av-9t89',
 'kernel': 'python3',
 'memory': 4.0,
 'name': 'my_unscheduled_job',
 'nvidia_gpu': '0',
 'parent_id': '',
 'paused': False,
 'schedule': '',
 'script': 'data_ingest_job.py',
 'share_token': '',
 'timeout': '0',
 'timeout_kill': False,
 'timezone': 'America/Los_Angeles',
 'type': 'manual',
 'updated_at': datetime.datetime(2021, 7, 12, 20, 15, 1, 782966, tzinfo=tzlocal())}

In [9]:
project_identifier="qqbn-sekc-qwme-4kyg"
name="my_unscheduled_job"

#### Note that the project ID after the dictionary is in turn followed by the job ID

In [7]:
api_instance.create_job_run({"project_id":"qqbn-sekc-qwme-4kyg",
                              "name":"my_unscheduled_job",
                              "script":"data_ingest_job.py",
                              "kernel":"python3"}, "qqbn-sekc-qwme-4kyg", "562m-sk6u-s1av-9t89")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 16, 11, 695305, tzinfo=tzlocal()),
 'environment': 'null',
 'finished_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'id': '0645larphwti9o39',
 'job_id': '562m-sk6u-s1av-9t89',
 'kernel': 'python3',
 'memory': 4.0,
 'nvidia_gpu': 0,
 'project_id': 'qqbn-sekc-qwme-4kyg',
 'running_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'scheduling_at': datetime.datetime(2021, 7, 12, 20, 16, 11, 694997, tzinfo=tzlocal()),
 'starting_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'status': 'ENGINE_SCHEDULING'}
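
The two-step pattern above (create the job, then start a run) can be folded into one helper. This is a sketch under two assumptions: the calls keep the argument order shown in the cells above, and the object returned by `create_job` exposes the job ID as an `id` attribute, as generated API clients usually do. The name `create_and_run` is hypothetical.

```python
def create_and_run(api, project_id, job_body):
    """Create an unscheduled job, then immediately start one run of it."""
    # Make sure the dictionary carries the same project ID as the positional argument.
    body = dict(job_body, project_id=project_id)
    job = api.create_job(body, project_id)

    # The run request repeats the fields used in the create_job_run cell above.
    run_body = {
        "project_id": project_id,
        "name": body["name"],
        "script": body["script"],
        "kernel": body["kernel"],
    }
    # Assumption: the created job exposes its identifier as `.id`.
    run = api.create_job_run(run_body, project_id, job.id)
    return job, run
```

This avoids copying the job ID by hand from the first response into the second call.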

#### Jobs can be set up to start based on whether other jobs succeed

#### To do so, you can create a job and declare its parent. The child will launch as soon as the parent completes.

In [8]:
api_instance.create_job({"project_id":"qqbn-sekc-qwme-4kyg",
                          "name":"parent_job",
                          #"schedule":"0 13 * * 1",
                          "memory":4,
                          "cpu":2,
                          "script":"data_ingest_job.py", 
                          "kernel":"python3"}, "qqbn-sekc-qwme-4kyg")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 19, 57, 659860, tzinfo=tzlocal()),
 'creator': {'email': 'pauldefusco@cloudera.com',
             'id': '1',
             'name': 'Paul de Fusco',
             'username': 'pauldefusco'},
 'engine_image_id': '15',
 'english_schedule': '',
 'environment': '',
 'id': 'xxu8-5xhs-vgin-jn0o',
 'kernel': 'python3',
 'memory': 4.0,
 'name': 'parent_job',
 'nvidia_gpu': '0',
 'parent_id': '',
 'paused': False,
 'schedule': '',
 'script': 'data_ingest_job.py',
 'share_token': '',
 'timeout': '0',
 'timeout_kill': False,
 'timezone': 'America/Los_Angeles',
 'type': 'manual',
 'updated_at': datetime.datetime(2021, 7, 12, 20, 19, 57, 659915, tzinfo=tzlocal())}

In [12]:
api_instance.create_job({"project_id":"qqbn-sekc-qwme-4kyg",
                          "name":"child_job",
                          "parent_job_id":"xxu8-5xhs-vgin-jn0o",
                          "memory":4,
                          "cpu":2,
                          "script":"data_ingest_job.py", 
                          "kernel":"python3"}, "qqbn-sekc-qwme-4kyg")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 24, 33, 169006, tzinfo=tzlocal()),
 'creator': {'email': 'pauldefusco@cloudera.com',
             'id': '1',
             'name': 'Paul de Fusco',
             'username': 'pauldefusco'},
 'engine_image_id': '15',
 'english_schedule': '',
 'environment': '',
 'id': 'n7tl-5ihi-pvku-6svh',
 'kernel': 'python3',
 'memory': 4.0,
 'name': 'child_job',
 'nvidia_gpu': '0',
 'parent_id': 'xxu8-5xhs-vgin-jn0o',
 'paused': False,
 'schedule': '',
 'script': 'data_ingest_job.py',
 'share_token': '',
 'timeout': '0',
 'timeout_kill': False,
 'timezone': 'America/Los_Angeles',
 'type': 'dependent',
 'updated_at': datetime.datetime(2021, 7, 12, 20, 24, 33, 169029, tzinfo=tzlocal())}
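
The parent and child cells above generalize to a longer pipeline. The sketch below creates jobs in order and sets each job's `parent_job_id` to the ID of the job created just before it. It assumes the `create_job(body, project_id)` call shape used above and that the returned object exposes the job ID as an `id` attribute; the name `create_job_chain` is hypothetical.

```python
def create_job_chain(api, project_id, job_specs):
    """Create jobs so that each one depends on the job created before it."""
    created = []
    parent_id = None
    for spec in job_specs:
        body = dict(spec, project_id=project_id)
        # The first job has no parent; every later job points at its predecessor.
        if parent_id is not None:
            body["parent_job_id"] = parent_id
        job = api.create_job(body, project_id)
        created.append(job)
        parent_id = job.id  # assumption: the job ID is available as `.id`
    return created
```

Each entry in `job_specs` would be a dictionary like the ones in the cells above, minus `project_id` and `parent_job_id`. Only the first job then needs to be started by hand.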

In [13]:
api_instance.create_job_run({"project_id":"qqbn-sekc-qwme-4kyg",
                              "name":"parent_job",
                              "script":"data_ingest_job.py",
                              "kernel":"python3"}, "qqbn-sekc-qwme-4kyg", "xxu8-5xhs-vgin-jn0o")

{'arguments': '',
 'cpu': 2.0,
 'created_at': datetime.datetime(2021, 7, 12, 20, 24, 41, 864607, tzinfo=tzlocal()),
 'environment': 'null',
 'finished_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'id': 'oh72tzmkh6i5duun',
 'job_id': 'xxu8-5xhs-vgin-jn0o',
 'kernel': 'python3',
 'memory': 4.0,
 'nvidia_gpu': 0,
 'project_id': 'qqbn-sekc-qwme-4kyg',
 'running_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'scheduling_at': datetime.datetime(2021, 7, 12, 20, 24, 41, 864585, tzinfo=tzlocal()),
 'starting_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzlocal()),
 'status': 'ENGINE_SCHEDULING'}
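
In the run responses above, `finished_at`, `running_at`, and `starting_at` all show year 1 while the status is `ENGINE_SCHEDULING`, which suggests these are placeholders for timestamps that have not been set yet. Under that assumption, a run can be polled until `finished_at` holds a real value. In this sketch, `fetch_run` is a zero-argument callable you supply that returns the current state of the run (for example, a wrapper around whichever client call looks up a job run in your version of the API); the names `is_unset` and `wait_until_finished` are hypothetical.

```python
import time


def is_unset(timestamp):
    """True for a missing timestamp or the year-1 placeholder seen above."""
    return timestamp is None or timestamp.year == 1


def wait_until_finished(fetch_run, poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Call `fetch_run` repeatedly until the run reports a real `finished_at`."""
    for _ in range(max_polls):
        run = fetch_run()
        # Assumption: the run object exposes `finished_at` as an attribute.
        if not is_unset(run.finished_at):
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"run still not finished after {max_polls} polls")
```

The `sleep` parameter exists so the wait can be replaced in tests. Inspect the `status` field of the returned run to decide whether it succeeded.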