# CI / CD with CML API V2

##### Cloudera Machine Learning exposes a REST API that you can use to perform operations related to projects, jobs, and runs. You can use API commands to integrate CML with third-party workflow tools or to control CML from the command line.

##### This notebook demonstrates how to create and run four chained CML jobs with the CML API V2.

For the public documentation, see the [CML API v2 reference](https://docs.cloudera.com/machine-learning/cloud/api/topics/ml-api-v2.html).

For a fuller tour of the API's capabilities, see the example notebook in [Cloudera's public GitHub repository](https://github.com/cloudera/CML_AMP_APIv2).
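Because the API is plain REST, you can also call it without the Python client, for example from a CI system. The sketch below only builds the request with the standard library. It assumes the endpoint path `/api/v2/projects/{project_id}/jobs` and Bearer-token authentication with an API v2 key, so confirm both against the public documentation before relying on it. `build_list_jobs_request` and `parse_jobs` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_list_jobs_request(domain: str, project_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request listing a project's jobs.

    Assumption: the REST path and the Bearer auth scheme mirror the API v2 docs.
    """
    url = f"https://{domain}/api/v2/projects/{project_id}/jobs"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        method="GET",
    )


def parse_jobs(body: bytes) -> List[Dict[str, Any]]:
    """Extract the job list from a JSON response body.

    The "jobs" key mirrors the `jobs` attribute the Python client returns.
    """
    return json.loads(body)["jobs"]
```

To send the request, pass it to `urllib.request.urlopen` and feed the response body to `parse_jobs`.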

#### To use the Python API in your own code, first install the Python API client and point it to your cluster.

In [25]:
### Install the API client

import os
cluster = os.getenv("CDSW_DOMAIN")

# On a TLS-enabled cluster (the cluster URL starts with 'https'), install the client with:
!pip3 install https://{cluster}/api/v2/python.tar.gz

# If your cluster is not TLS-enabled (the cluster URL starts with 'http'),
# comment out the line above and use this command instead:
# !pip3 install http://{cluster}/api/v2/python.tar.gz

In [26]:
from cmlapi.utils import Cursor
import cmlapi
import string
import random
import json

try:
    client = cmlapi.default_client()
except ValueError:
    print("Could not create a client. If this code is not being run in a CML session, please include the keyword arguments \"url\" and \"cml_api_key\".")

# Generate a random six-letter ID, for example to make resource names unique.
session_id = "".join([random.choice(string.ascii_lowercase) for _ in range(6)])
session_id

'nusdau'

##### The API covers more than jobs. For example, you can list every runtime available in the CML Workspace:

In [27]:
# cursor also supports search_filter
# cursor = Cursor(client.list_runtimes, 
#                 search_filter = json.dumps({"image_identifier":"jupyter"}))
cursor = Cursor(client.list_runtimes)
runtimes = cursor.items()
for rt in runtimes:
    print(rt.image_identifier)

docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.6-cuda:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.6-standard:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.7-cuda:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.7-rapids:2021.04.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.7-standard:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.8-cuda:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.8-rapids:2021.04.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.8-standard:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.9-cuda:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-jupyterlab-python3.9-standard:2021.09.1-b5
docker.repository.cloudera.com/cdsw/ml-runtime-workbench-python3.6-cuda:2021.09.1-b5
docker.repository.cloudera.com/cdsw
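The job cells below hard-code a runtime identifier. As an alternative, you could select one from the list above. This is a minimal sketch. The naming pattern `ml-runtime-<editor>-python<X.Y>-<edition>:<version>` is inferred from the printed identifiers rather than taken from a specification, and `pick_runtime` is a hypothetical helper.

```python
import re
from typing import Iterable, Optional

# Assumed naming pattern, inferred from the identifiers printed above:
# <registry>/ml-runtime-<editor>-python<X.Y>-<edition>:<version>
RUNTIME_PATTERN = re.compile(
    r"ml-runtime-(?P<editor>[a-z]+)-python(?P<python>\d+\.\d+)-(?P<edition>[a-z]+):(?P<version>\S+)$"
)


def pick_runtime(identifiers: Iterable[str], editor: str, python: str, edition: str) -> Optional[str]:
    """Return the matching identifier with the highest version tag, or None."""
    matches = []
    for ident in identifiers:
        m = RUNTIME_PATTERN.search(ident)
        if m and (m["editor"], m["python"], m["edition"]) == (editor, python, edition):
            matches.append((m["version"], ident))
    # Plain string comparison orders tags shaped like "2021.09.1-b5" correctly,
    # but this is an assumption, not a general version-ordering rule.
    return max(matches)[1] if matches else None
```

In the notebook you might call `pick_runtime((rt.image_identifier for rt in Cursor(client.list_runtimes).items()), "workbench", "3.7", "standard")` and pass the result as `runtime_identifier`.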

In [28]:
from pprint import pprint
from cmlapi.rest import ApiException

try:
    # List the available runtime addons, optionally filtered, sorted, and paginated.
    api_response = client.list_runtime_addons(page_size=500)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling CMLServiceApi->list_runtime_addons: %s\n" % e)


{'next_page_token': '',
 'runtime_addons': [{'component': 'HadoopCLI',
                     'display_name': 'Hadoop CLI 3.1.1 - CDP 7.2.8 - HOTFIX-2',
                     'identifier': 'hadoop-cli-311-728-hf2',
                     'status': 'AVAILABLE'},
                    {'component': 'HadoopCLI',
                     'display_name': 'Hadoop CLI - CDP 7.2.11 - HOTFIX-2',
                     'identifier': 'hadoop-cli-7.2.11-hf2',
                     'status': 'AVAILABLE'},
                    {'component': 'Spark',
                     'display_name': 'Spark 2.4.7 - CDP 7.2.11 - CDE 1.13 - '
                                     'HOTFIX-2',
                     'identifier': 'spark247-13-hf2',
                     'status': 'AVAILABLE'},
                    {'component': 'Spark',
                     'display_name': 'Spark 2.4.7 - CDP 7.2.10 - CDE 1.11 - '
                                     'HOTFIX-1',
                     'identifier': 'spark247-hf1',
                     'stat
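Before you pass `runtime_addon_identifiers` to a job request, you could check that each addon is listed as `AVAILABLE`. The helper below works on the dictionary form of the response. It assumes `api_response.to_dict()` exists, as it usually does on swagger-generated model objects. `available_addons` is a hypothetical name.

```python
from typing import Any, Dict, List


def available_addons(response: Dict[str, Any], component: str) -> List[str]:
    """Return identifiers of AVAILABLE runtime addons for one component, e.g. "Spark"."""
    return [
        addon["identifier"]
        for addon in response.get("runtime_addons", [])
        if addon.get("component") == component and addon.get("status") == "AVAILABLE"
    ]
```

For example, `"spark311-13-hf1" in available_addons(api_response.to_dict(), "Spark")` tells you whether the addon used by the jobs below is offered in this workspace.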

##### Similarly, you can list the jobs in the current CML Project:

In [29]:
### GET ALL PREVIOUS JOBS FROM PROJECT ###
    
project_id = os.environ["CDSW_PROJECT_ID"]

joblists = client.list_jobs(project_id = project_id)
print(f'Fetched {len(joblists.jobs)} jobs from the project')

Fetched 0 jobs from the project
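`list_jobs` returns a single page of results. The `next_page_token` field in the addon listing above shows how paging is signaled, with an empty token marking the last page. The generic loop below collects items across pages. Wiring it to the client assumes the list methods accept `page_size` and `page_token` keyword arguments, which you should verify for your CML version. The `Cursor` helper used earlier may already cover this case. `iterate_pages` is a hypothetical helper.

```python
from typing import Any, Callable, Iterator, List, Tuple


def iterate_pages(fetch_page: Callable[[str], Tuple[List[Any], str]]) -> Iterator[Any]:
    """Yield every item across pages.

    fetch_page(token) must return (items, next_token); an empty next_token
    marks the last page.
    """
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


# Possible wiring to the client (assumed keyword arguments, untested):
# def fetch_jobs(token):
#     resp = client.list_jobs(project_id, page_size=100, page_token=token)
#     return resp.jobs, resp.next_page_token
# all_jobs = list(iterate_pages(fetch_jobs))
```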


In [30]:
project_id

'xmht-69bj-ku2o-05a3'

## You can build MLOps pipelines by creating and running CML Jobs with the API from this notebook.

##### First, we create a CML Job to train the model. We originally trained this model in ProtoType.py; in a real-world scenario, you may instead have created your baseline model with MLflow, which is also available in CML under the Experiments tab.

In [31]:
### CREATE A JOB TO RETRAIN THE MODEL ###
    
# Create the first job in the chain. Three dependent jobs will follow it, so we call this one the "great-grandparent" job. The "runtime_identifier" parameter is required if this project uses ML Runtimes.
greatgrandparent_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "TrainModelJob",
    script = "cml_jobs/TrainModelJob.py",
    cpu = 4.0,
    memory = 12.0,
    runtime_identifier = "docker.repository.cloudera.com/cdsw/ml-runtime-workbench-python3.7-standard:2021.09.1-b5", 
    runtime_addon_identifiers = ["spark311-13-hf1"]
)

In [32]:
# Create this job within the project specified by the project_id parameter.
greatgrandparent_job = client.create_job(greatgrandparent_job_body, project_id)

##### A second CML Job, which pushes the model to a REST endpoint, can be made dependent on the first one so that it runs only after the first job succeeds.

In [33]:
### CREATE A JOB TO PUSH THE MODEL TO A REST ENDPOINT ###

# Create a dependent job by specifying the parent job's ID in the parent_job_id field.
grandparent_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "PushModelJob",
    script = "cml_jobs/PushModelJob.py",
    kernel = "python3",
    cpu = 2,
    memory = 4,
    runtime_identifier = "docker.repository.cloudera.com/cdsw/ml-runtime-workbench-python3.7-standard:2021.09.1-b5",
    runtime_addon_identifiers = ["spark311-13-hf1"],
    parent_job_id = greatgrandparent_job.id
)
grandparent_job = client.create_job(grandparent_job_body, project_id)

##### A third CML Job warms up the model by running a simulation. It depends on the second job.

In [34]:
### CREATE A JOB TO WARM UP THE MODEL ###

# Create a dependent job by specifying the parent job's ID in the parent_job_id field.
parent_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "Simulation",
    script = "cml_jobs/Simulation.py",
    kernel = "python3",
    cpu = 2,
    memory = 4,
    runtime_identifier = "docker.repository.cloudera.com/cdsw/ml-runtime-workbench-python3.7-standard:2021.09.1-b5",
    runtime_addon_identifiers = ["spark311-13-hf1"],
    parent_job_id = grandparent_job.id
)
parent_job = client.create_job(parent_job_body, project_id)

##### Finally, a fourth job performs model inference. It depends on the third job.

In [35]:
### CREATE A JOB TO DO INFERENCE ON THE MODEL ###

# Create a job that depends on the job from the previous cell. This completes the dependency chain greatgrandparent_job -> grandparent_job -> parent_job -> child_job: each job triggers only if the job before it runs and succeeds.
child_job_body = cmlapi.CreateJobRequest(
    project_id = project_id,
    name = "InferenceJob",
    script = "cml_jobs/InferenceJob.py",
    kernel = "python3",
    cpu = 4,
    memory = 12,  
    runtime_identifier = "docker.repository.cloudera.com/cdsw/ml-runtime-workbench-python3.7-standard:2021.09.1-b5",
    runtime_addon_identifiers = ["spark311-13-hf1"],
    parent_job_id = parent_job.id
)
child_job = client.create_job(child_job_body, project_id)
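The four job-creation cells above repeat one pattern: each new job names the previous job as its parent. A minimal sketch of that pattern as a loop follows. `create_job_chain` is a hypothetical helper. The commented wiring to `cmlapi.CreateJobRequest` assumes that passing `parent_job_id=None` for the first job is accepted.

```python
from typing import Callable, Dict, List, Optional


def create_job_chain(specs: List[Dict], create: Callable[[Dict, Optional[str]], str]) -> List[str]:
    """Create jobs in order, making each job the child of the previous one.

    create(spec, parent_id) must create one job and return its ID.
    """
    job_ids: List[str] = []
    parent_id: Optional[str] = None
    for spec in specs:
        parent_id = create(spec, parent_id)  # the new job becomes the next parent
        job_ids.append(parent_id)
    return job_ids


# Possible wiring to the client (untested assumption):
# def create(spec, parent_id):
#     body = cmlapi.CreateJobRequest(project_id=project_id, parent_job_id=parent_id, **spec)
#     return client.create_job(body, project_id).id
```

Each `spec` would hold the per-job fields used above (`name`, `script`, `cpu`, `memory`, `runtime_identifier`, `runtime_addon_identifiers`), so adding a stage to the pipeline becomes a one-line change.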

##### Note that although we have created the jobs, we have not run them yet.

##### If you navigate to the CML Project landing page and open the Jobs tab, you will see the new jobs listed in the Jobs section.

![CML Jobs tab listing the newly created jobs](images/cml-jobs-created.png)

##### Next, we use the API to run the first job. When it succeeds, the second job runs, then the third, and finally the fourth.

In [None]:
# Create a job run for the specified job.
# If the job has dependent jobs, they run after it succeeds.
# Here the great-grandparent job runs first, followed by the grandparent, parent, and child jobs, provided each run succeeds.
jobrun_body = cmlapi.CreateJobRunRequest(project_id, greatgrandparent_job.id)
job_run = client.create_job_run(jobrun_body, project_id, greatgrandparent_job.id)
run_id = job_run.id
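The comments in the cell above describe the trigger rule: a dependent job runs only after its parent's run succeeds. The toy model below makes that rule concrete for a linear chain like this one. It does not call CML. `runs_triggered` is a hypothetical helper that only simulates which jobs would run.

```python
from typing import Dict, List, Optional


def runs_triggered(parents: Dict[str, Optional[str]], succeeds: Dict[str, bool], root: str) -> List[str]:
    """Return, in order, the jobs that run when `root` is started.

    Assumes a linear chain: every job has at most one dependent job.
    """
    children = {parent: child for child, parent in parents.items() if parent is not None}
    ran: List[str] = []
    job: Optional[str] = root
    while job is not None:
        ran.append(job)
        if not succeeds[job]:
            break  # a failed run does not trigger its dependent job
        job = children.get(job)
    return ran


parents = {
    "TrainModelJob": None,
    "PushModelJob": "TrainModelJob",
    "Simulation": "PushModelJob",
    "InferenceJob": "Simulation",
}
```

With every job succeeding, all four jobs run in chain order; if `PushModelJob` fails, `Simulation` and `InferenceJob` never start.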

##### Return to the Jobs tab on the CML Project landing page. The jobs are now running.

![CML Jobs tab showing the jobs running](images/cml-jobs-running.png)
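Instead of watching the Jobs tab, a CI/CD pipeline would usually poll the run status. The sketch below is generic and accepts any zero-argument status function. The status names in `TERMINAL_STATUSES` and the commented call to `client.get_job_run(project_id, job_id, run_id)` are assumptions, so check them against the API reference for your CML version. Note also that `run_id` refers only to the first job's run, because each dependent job gets its own run. `wait_for_run` is a hypothetical helper.

```python
import time
from typing import Callable

# Assumed terminal status values; verify them against your workspace.
TERMINAL_STATUSES = {"ENGINE_SUCCEEDED", "ENGINE_FAILED", "ENGINE_STOPPED", "ENGINE_TIMEDOUT"}


def wait_for_run(get_status: Callable[[], str], poll_seconds: float = 10.0,
                 timeout_seconds: float = 3600.0) -> str:
    """Poll get_status() until it returns a terminal status, then return that status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Possible wiring to the client (assumed method and attribute names):
# final = wait_for_run(lambda: client.get_job_run(project_id, greatgrandparent_job.id, run_id).status)
```

A CI step could then fail the build whenever the returned status is anything other than a success.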

##### To learn more about a CML Job, click it. Its detail page has four tabs that let you explore the job dependency graph (DAG), review the run history, and, if a run failed, find where in your code the error occurred.

![CML Job dependency graph](images/cml-job-dependencies.png)

![CML Job run history](images/cml-job-history.png)

![Troubleshooting a failed CML Job run](images/cml-job-troubleshoot.png)