# Invoking Jobs: Examples
This notebook provides an example of invoking Jobs that are defined in the project. 

The product documentation at https://cloud.ibm.com/apidocs/watson-data-api-cpd has more information about the API (**Watson Data API**) used in this notebook. Jobs can also be invoked with the *cpdctl* command-line tool, which we demonstrate in another notebook.

APIs are typically used for *automation* and *orchestration*. We implemented code in a notebook that's a part of a Watson Studio project for ease of demonstration. Sample Python code can also be saved as a Python script and executed from an external environment (such as a CI/CD platform). 

Since we use the terms automation and orchestration, let's define them. **Automation** is invoking a process programmatically without human interaction, usually based on a schedule or a trigger. **Orchestration** is combining multiple steps into a single process.

Any type of asset that can be configured to run as a *Job* in a *Watson Studio project* can be used with this API. The current version of Cloud Pak for Data supports jobs for:
- Notebooks
- Python scripts
- R scripts
- Modeler flows
- Refinery flows

There are several use cases for automation and orchestration of jobs. Here are a few examples:
- Automate and/or orchestrate data preparation (implemented in scripts, flows, or notebooks) based on a schedule or an external trigger
- Automate and/or orchestrate model retraining (implemented in scripts, flows, or notebooks) based on a schedule or an external trigger
- Automate and/or orchestrate testing and deployment of data science assets into *Deployment Spaces*. 

In this notebook we will show a simple example which you can expand to fit your use cases. 

*Note: This notebook has been written and tested in Cloud Pak for Data Hybrid Cloud.*

## Step 1: Manually create a Job
While it's possible to create a job with an API, we think that in most scenarios jobs will be created and tested manually, and then used with automation/orchestration.

Create a Job for any asset that you want to test, for example, the *Predict_Customer_Churn* notebook that's included in this project. This notebook creates a model and saves it in the project. When you run the notebook as a batch job, it performs the same steps as in interactive mode: it builds and saves a model.

Complete the following steps:

1. Open the notebook in *Edit* mode and save a version (look for the *Versions* icon in the top-right toolbar). Jobs require versioning of notebooks and flows.
2. Create a job
3. Test the job. In addition to the successful run, you should see a model created under the Models section of the project. 

## Step 2: Invoke the Job
To construct the API call, we will need to get the following information:
1. *Authorization token*: this token is required for all calls to the Cloud Pak for Data API
2. *Project id*: needed as a parameter for the job invocation REST request
3. *Asset id*: needed as a parameter for the job invocation REST request

In [None]:
# If you're running in a Watson Studio project, the token is available as an environment variable
# import os
# token = os.environ['USER_ACCESS_TOKEN']

#In this notebook we will demonstrate retrieving a token via API, which will be required for code running outside of Cloud Pak for Data

In [None]:
# Define variables that need to be changed or reused

# TO DO: change to the hostname (and port, if defined) of your cluster

# If using a market cluster in North America (in TEC), the value should be 'https://ibm-nginx-svc.cpdmkt.svc' (this value is the same for ALL clusters)
# For all other clusters, use the CPD URL that ends with .io, for example, 'https://cpdmkt-cpd-cpdmkt.apps.cpd.12-181-164-84.nip.io'
cpd_hostname = "***"

# TO DO: change to userid and password that exists in the CPD cluster. These credentials will be used to generate a token
username = "***"
password = "***"
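
When this code runs outside the notebook (for example, on a CI/CD platform), hardcoded credentials are usually replaced with values injected by the environment. The following is a minimal sketch of that idea. The function `load_cpd_settings` and the variable names `CPD_HOSTNAME`, `CPD_USERNAME`, and `CPD_PASSWORD` are hypothetical and are not defined by Cloud Pak for Data.

```python
import os


def load_cpd_settings(env=os.environ):
    """Read connection settings from environment variables.

    The variable names below are hypothetical; use whatever names
    your CI/CD platform or secret store provides.
    """
    required = ("CPD_HOSTNAME", "CPD_USERNAME", "CPD_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CPD_HOSTNAME"], env["CPD_USERNAME"], env["CPD_PASSWORD"]


# Demo with an in-memory mapping instead of the real environment
demo_env = {
    "CPD_HOSTNAME": "https://cpd.example.com",
    "CPD_USERNAME": "demo_user",
    "CPD_PASSWORD": "demo_password",
}
cpd_hostname, username, password = load_cpd_settings(demo_env)
print(cpd_hostname, username)
```

Calling `load_cpd_settings()` without arguments reads from the real process environment and fails early with a clear message if a value is missing.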

In [None]:
import requests
import json

headers = {
    'Content-Type': 'application/json',
}

# Build the JSON body with json.dumps so that quotes in the credentials are escaped correctly
data = json.dumps({"username": username, "password": password})

# Construct the request URL
requestURL = cpd_hostname + "/icp4d-api/v1/authorize"

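# verify=False skips TLS certificate verification. This is acceptable for a demo cluster
# with a self-signed certificate, but should not be used in production
response = requests.post(requestURL, headers=headers, data=data, verify=False)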

responseContent = response.content
token = json.loads(responseContent)['token']

# Print token just for a demo - remove in production
print(token)
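
The pieces of the authorization request can also be assembled in one small function, which makes them easier to reuse from a script. This is only a sketch: the function name `build_authorize_request` and the host `https://cpd.example.com` are hypothetical, while the endpoint path is the one used in the cell above.

```python
import json


def build_authorize_request(cpd_hostname, username, password):
    """Return (url, headers, body) for the /icp4d-api/v1/authorize call."""
    url = cpd_hostname.rstrip("/") + "/icp4d-api/v1/authorize"
    headers = {"Content-Type": "application/json"}
    # json.dumps escapes quotes and backslashes that may appear in credentials
    body = json.dumps({"username": username, "password": password})
    return url, headers, body


# Demo with placeholder values; the password deliberately contains a quote
auth_url, auth_headers, auth_body = build_authorize_request(
    "https://cpd.example.com/", "demo_user", 'pa"ss'
)
print(auth_url)
print(auth_body)
```

The returned values can be passed to `requests.post(auth_url, headers=auth_headers, data=auth_body)` in the same way as in the cell above.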

In [None]:
#Next, we will get the project id. We can use the watson-studio-lib library to perform this task. 

# Import the lib
from ibm_watson_studio_lib import access_project_or_space
wslib = access_project_or_space()

# Get project id
projectID = wslib.here.get_ID()
print(projectID)

In [None]:
# This function is useful for looking up the value for the "Job" asset type, which we will use in the next cell
wslib.assets.list_asset_types()

In [None]:
# Get the Job id
wslib.assets.list_assets("job")
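
If the job name is known, the ID can be picked out of the listing in code rather than by eye. The sketch below works on a plain list of dictionaries and assumes each entry has `name` and `asset_id` keys; check the actual output of the cell above before relying on those key names. The helper `find_asset_id` and the sample data are hypothetical.

```python
def find_asset_id(assets, name):
    """Return the asset_id of the single asset whose name matches exactly.

    Assumes each asset is a dict with "name" and "asset_id" keys.
    """
    matches = [asset["asset_id"] for asset in assets if asset.get("name") == name]
    if len(matches) != 1:
        raise LookupError(f"Expected exactly one asset named {name!r}, found {len(matches)}")
    return matches[0]


# Made-up sample data standing in for the real listing
sample_assets = [
    {"name": "Churn notebook job", "asset_id": "job-id-1"},
    {"name": "Data prep job", "asset_id": "job-id-2"},
]
print(find_asset_id(sample_assets, "Data prep job"))
```

Raising an error when zero or several assets match avoids silently invoking the wrong job.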

In [None]:
# Manually look up the asset_id for the Notebook Job that you created and save it in a variable. It will be used to construct the REST request URL.
# Make sure to get the ID for the Notebook Job, not Notebook Job Run. 
jobID = "47181527-1052-4b5e-9a1c-edc70fc335ed"

In [None]:
headers = {
     'Authorization': 'Bearer ' + token,
     'accept': 'application/json',
     'Content-Type': 'application/json'
}

# This JSON format will work even if the Job doesn't have parameters (like the sample notebook Job we configured in Step 1)

dataDict = {
   "job_run": {
        "configuration": {
            "env_variables": [
                "variable1=test1",
                "variable2=test2"
            ]
        }
    }
}

data = json.dumps(dataDict)
print(headers)
print(data)
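
The `env_variables` list uses `name=value` strings, so it can be generated from an ordinary dictionary. The following sketch shows one way to do that; the function name `build_job_run_payload` is hypothetical, and the payload shape is copied from the cell above.

```python
import json


def build_job_run_payload(env_variables=None):
    """Serialize a job run request body from a dict of environment variables."""
    pairs = [f"{key}={value}" for key, value in (env_variables or {}).items()]
    return json.dumps({"job_run": {"configuration": {"env_variables": pairs}}})


payload = build_job_run_payload({"variable1": "test1", "variable2": "test2"})
print(payload)
```

Calling `build_job_run_payload()` without arguments produces a body with an empty `env_variables` list.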

In [None]:
#Construct the URL for invoking the job. We are using this REST endpoint: https://cloud.ibm.com/apidocs/watson-data-api-cpd#job-runs-create
url =  cpd_hostname + "/v2/jobs/" + jobID + "/runs?project_id=" + projectID
print(url)
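
Both URLs used in this notebook (one for starting a run, one for checking a run) share the same structure, so a single helper can build them. This is a sketch under the assumption that the path layout matches the cells in this notebook; the function name `job_runs_url` and the sample IDs are hypothetical.

```python
from urllib.parse import quote, urlencode


def job_runs_url(cpd_hostname, job_id, project_id, run_id=None):
    """Build the URL for creating a job run, or for reading one if run_id is given."""
    path = "/v2/jobs/" + quote(job_id, safe="") + "/runs"
    if run_id is not None:
        path += "/" + quote(run_id, safe="")
    # urlencode escapes the query parameter value
    return cpd_hostname.rstrip("/") + path + "?" + urlencode({"project_id": project_id})


create_url = job_runs_url("https://cpd.example.com", "job-1", "proj-1")
status_url = job_runs_url("https://cpd.example.com", "job-1", "proj-1", run_id="run-1")
print(create_url)
print(status_url)
```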

In [None]:
response = requests.post(url, headers=headers, data=data, verify=False)

responseContent = response.content
print(responseContent)

In [None]:
# If we want to check the job status, we need to get the run ID, which is called asset_id
runID = json.loads(responseContent)['metadata']['asset_id']
print(runID)
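
If the request fails (for example, because the token expired or the job ID is wrong), the response body will not contain `metadata`, and the lookup above raises a `KeyError` that hides the real cause. A minimal sketch of a more defensive version follows. The function name `extract_run_id` and the sample body are hypothetical; in the notebook, the inputs would be `response.status_code` and `response.content`.

```python
import json


def extract_run_id(status_code, body):
    """Return the run ID from a job run response, or raise if the call failed."""
    if not 200 <= status_code < 300:
        raise RuntimeError(f"Job run request failed with HTTP {status_code}: {body!r}")
    return json.loads(body)["metadata"]["asset_id"]


# Demo with a made-up response body
sample_body = b'{"metadata": {"asset_id": "run-123"}}'
print(extract_run_id(201, sample_body))
```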

In [None]:
url = cpd_hostname + "/v2/jobs/" + jobID + "/runs/" + runID + "?project_id=" + projectID
print(url)

In [None]:
response = requests.get(url, headers=headers, verify=False)
responseContent = response.content
print(responseContent)

In [None]:
# Job Status is reported in variable "state"
jobStatus = json.loads(responseContent)['entity']['job_run']['state']
print(jobStatus)

<span style="color:red">Important Note: Check the Jobs tab of your project. You should now see a running job</span>

In [None]:
# The status lookup can also run in a loop. This is useful when you need to invoke a second job after the first one completes
import time

while jobStatus in ("Starting", "Running"):
    # Wait for 30 seconds before checking the status again
    time.sleep(30)
    response = requests.get(url, headers=headers, verify=False)
    responseContent = response.content
    jobStatus = json.loads(responseContent)['entity']['job_run']['state']
    print(jobStatus)
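
A loop like this runs forever if the job never leaves the `Running` state, which is a problem for unattended automation. The sketch below adds a timeout and separates the polling logic from the HTTP call so that it can be tested without a cluster. The function `wait_for_job`, its parameters, and the canned state `Completed` in the demo are assumptions for illustration; the two active states are the ones checked in the loop above.

```python
import time

# States that the loop in this notebook treats as "still in progress"
ACTIVE_STATES = {"Starting", "Running"}


def wait_for_job(fetch_state, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Poll fetch_state() until the job leaves the active states or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = fetch_state()
        if state not in ACTIVE_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {state!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Demo with canned states instead of real REST calls
canned_states = iter(["Starting", "Running", "Completed"])
final_state = wait_for_job(lambda: next(canned_states), poll_seconds=0)
print(final_state)
```

In the notebook, `fetch_state` would wrap the `requests.get` call and return the `state` field from the response.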

In [None]:
# Here you can add the call to the 2nd step in your orchestration workflow
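
One possible shape for such a workflow is a function that runs jobs one after another and stops at the first job that does not succeed. The following is only a sketch: `run_pipeline` and `fake_run_job` are hypothetical, and the success state name `Completed` is an assumption that should be checked against the states your jobs actually report.

```python
def run_pipeline(job_ids, run_job, success_state="Completed"):
    """Run jobs in order and stop after the first one that does not succeed.

    run_job is a callable that starts a job, waits for it to finish,
    and returns its final state.
    """
    results = []
    for job_id in job_ids:
        state = run_job(job_id)
        results.append((job_id, state))
        if state != success_state:
            break
    return results


# Demo with a fake runner: the second job fails, so the third never starts
fake_outcomes = {"prepare-data": "Completed", "train-model": "Failed", "deploy": "Completed"}


def fake_run_job(job_id):
    return fake_outcomes[job_id]


pipeline_results = run_pipeline(["prepare-data", "train-model", "deploy"], fake_run_job)
print(pipeline_results)
```

In a real workflow, `run_job` would combine the job invocation request and the status polling shown earlier in this notebook.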

**Written by: Elena Lowery, April 2022**