## Jobs API

The Jobs API lets you create a job from an existing Python or Spark program, execute it, and access its application logs.

## Scope

* Define a job configuration
* Create a new job
* Execute the job
* Download the logs
* Update job configuration

In [17]:
import hopsworks

## Connect to the cluster

In [18]:
# Connect to your cluster. Use this form when running inside Jupyter or inside a job on the cluster.
connection = hopsworks.connection()

Connected. Call `.close()` to terminate connection gracefully.


In [19]:
# Uncomment when connecting to the cluster from an external environment.
# connection = hopsworks.connection(project='my_project', host='my_instance', port=443, api_key_value='apikey')
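
When connecting from an external environment, it is usually preferable not to hard-code the API key in the notebook. The following is a minimal sketch that collects the keyword arguments shown above from environment variables. The variable names (`HOPSWORKS_PROJECT`, `HOPSWORKS_HOST`, `HOPSWORKS_PORT`, `HOPSWORKS_API_KEY`) and the helper `connection_kwargs_from_env` are assumptions made for illustration; they are not defined by the library.

```python
import os


def connection_kwargs_from_env(env=None):
    """Build keyword arguments for hopsworks.connection() from environment variables.

    The variable names used here are hypothetical; pick whatever fits your setup.
    """
    env = os.environ if env is None else env
    required = ("HOPSWORKS_PROJECT", "HOPSWORKS_HOST", "HOPSWORKS_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "project": env["HOPSWORKS_PROJECT"],
        "host": env["HOPSWORKS_HOST"],
        "port": int(env.get("HOPSWORKS_PORT", "443")),  # default port taken from the example above
        "api_key_value": env["HOPSWORKS_API_KEY"],
    }


# Intended use (needs a reachable cluster, so it is left commented out):
# connection = hopsworks.connection(**connection_kwargs_from_env())
```

Passing a plain dict as `env` makes the helper easy to try out without touching the real environment.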

## Get the project

In [20]:
# Get the project object. When run inside your Hopsworks cluster, this returns the current project.
project = connection.get_project()

In [21]:
# Uncomment to get a specific project
# project = connection.get_project('my_project')

## Upload the SparkPi program to the cluster

In [22]:
dataset_api = project.get_dataset_api()

In [23]:
# Download a file to work with
!wget https://repo.hops.works/dev/robin/spark-examples.jar

--2022-04-12 13:23:19--  https://repo.hops.works/dev/robin/spark-examples.jar
Resolving repo.hops.works (repo.hops.works)... 144.91.119.112
Connecting to repo.hops.works (repo.hops.works)|144.91.119.112|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1527170 (1.5M) [application/java-archive]
Saving to: ‘spark-examples.jar.3’


2022-04-12 13:23:19 (39.1 MB/s) - ‘spark-examples.jar.3’ saved [1527170/1527170]



In [24]:
uploaded_app_file = dataset_api.upload("spark-examples.jar", "Resources", overwrite=True)

In [25]:
uploaded_app_file

'Resources/spark-examples.jar'

## Get job configuration for a Spark Job

In [26]:
jobs_api = project.get_jobs_api()

In [27]:
spark_config = jobs_api.get_configuration("SPARK")

In [28]:
spark_config

{'type': 'sparkJobConfiguration',
 'amQueue': 'default',
 'amMemory': 2048,
 'amVCores': 1,
 'spark.executor.instances': 1,
 'spark.executor.cores': 1,
 'spark.executor.memory': 4096,
 'spark.executor.gpus': 0,
 'spark.tensorflow.num.ps': 1,
 'spark.dynamicAllocation.enabled': True,
 'spark.dynamicAllocation.minExecutors': 1,
 'spark.dynamicAllocation.maxExecutors': 2,
 'spark.dynamicAllocation.initialExecutors': 1,
 'spark.blacklist.enabled': False}
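
The defaults above bound how much memory a job can request. As a rough sketch, the upper bound is the application master memory plus the executor memory multiplied by the maximum number of executors (when dynamic allocation is enabled). The helper `max_requested_memory_mb` is hypothetical, assumes the memory values are expressed in MB, and ignores any per-container overhead the cluster may add.

```python
def max_requested_memory_mb(config):
    """Rough upper bound (in MB) on the memory requested by a Spark job configuration."""
    if config.get("spark.dynamicAllocation.enabled"):
        executors = config["spark.dynamicAllocation.maxExecutors"]
    else:
        executors = config["spark.executor.instances"]
    return config["amMemory"] + executors * config["spark.executor.memory"]


# Subset of the default configuration printed above
default_config = {
    "amMemory": 2048,
    "spark.executor.instances": 1,
    "spark.executor.memory": 4096,
    "spark.dynamicAllocation.enabled": True,
    "spark.dynamicAllocation.maxExecutors": 2,
}

print(max_requested_memory_mb(default_config))  # 2048 + 2 * 4096 = 10240
```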

In [29]:
# Override configuration properties
spark_config['appPath'] = uploaded_app_file
spark_config['mainClass'] = 'org.apache.spark.examples.SparkPi'
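
Before creating the job, it can help to check that the overridden configuration contains the fields this example relies on. The sketch below assumes that `appPath` and `mainClass` are the fields to check for a JAR-based Spark job; that is an assumption drawn from this example, not a documented rule. `missing_spark_fields` is a hypothetical helper.

```python
def missing_spark_fields(config, required=("appPath", "mainClass")):
    """Return the required keys that are absent or empty in a job configuration dict."""
    return [key for key in required if not config.get(key)]


config = {"type": "sparkJobConfiguration", "amMemory": 2048}
print(missing_spark_fields(config))  # ['appPath', 'mainClass']

config["appPath"] = "Resources/spark-examples.jar"
config["mainClass"] = "org.apache.spark.examples.SparkPi"
print(missing_spark_fields(config))  # []
```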

## Create a job

In [30]:
JOB_NAME = "my_job"

In [31]:
jobs_api.exists(JOB_NAME)

False

In [32]:
# Create the job and get a reference to it
my_job = jobs_api.create_job(JOB_NAME, spark_config)

In [33]:
jobs_api.exists(JOB_NAME)

True
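
The `exists` check and `create_job` call can be combined so that re-running the notebook does not try to create the same job twice. This is a minimal sketch: `get_or_create_job` is a hypothetical helper, and it assumes the jobs API object also offers a `get_job(name)` method, which is not shown elsewhere in this notebook.

```python
def get_or_create_job(jobs_api, name, config):
    """Return the existing job called `name`, or create it from `config` if absent.

    `jobs_api` is expected to provide exists(), get_job() and create_job(),
    as the object returned by project.get_jobs_api() is assumed to.
    """
    if jobs_api.exists(name):
        return jobs_api.get_job(name)
    return jobs_api.create_job(name, config)


# Intended use inside the notebook (needs a cluster, so it is left commented out):
# my_job = get_or_create_job(jobs_api, JOB_NAME, spark_config)
```

Note that when the job already exists, `config` is ignored; updating an existing job is covered in the configuration section further down.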

## Execute the job

In [34]:
# Execute the job, optionally passing arguments.
# If await_termination is set to True, the call blocks until the execution has finished.
my_execution = my_job.run(args="100", await_termination=True)

2022-04-12 13:23:21,240 INFO: Waiting for execution to finish. Current state: INITIALIZING. Final status: UNDEFINED
2022-04-12 13:23:27,467 INFO: Waiting for execution to finish. Current state: ACCEPTED. Final status: UNDEFINED
2022-04-12 13:23:46,149 INFO: Waiting for execution to finish. Current state: RUNNING. Final status: UNDEFINED
2022-04-12 13:24:08,049 INFO: Waiting for execution to finish. Current state: AGGREGATING_LOGS. Final status: SUCCEEDED
2022-04-12 13:24:11,123 INFO: Waiting for execution to finish. Current state: FINISHED. Final status: SUCCEEDED
2022-04-12 13:24:14,126 INFO: Waiting for log aggregation to finish.
2022-04-12 13:24:17,411 INFO: Execution finished successfully.
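
The timestamps in this output are enough to work out how long the execution took end to end. The sketch below parses the first and last lines shown above; `parse_timestamp` is a hypothetical helper and assumes every line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp.

```python
from datetime import datetime

# First and last lines of the output above
LOG_LINES = [
    "2022-04-12 13:23:21,240 INFO: Waiting for execution to finish. "
    "Current state: INITIALIZING. Final status: UNDEFINED",
    "2022-04-12 13:24:17,411 INFO: Execution finished successfully.",
]


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


elapsed = parse_timestamp(LOG_LINES[-1]) - parse_timestamp(LOG_LINES[0])
print(elapsed.total_seconds())  # 56.171
```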


In [35]:
# Boolean indicating whether the execution ran successfully
my_execution.success

True

## Download the logs for the execution

In [36]:
# Download the logs and return the paths where they are stored on disk
out, err = my_execution.download_logs()

In [37]:
# Read the stdout log; the context manager closes the file afterwards
with open(out, "r") as f_out:
    print(f_out.read())

Container: container_1649425494064_0008_01_000001 on hopsworks0.logicalclocks.com_9000_1649769846113
Log Type: prelaunch.out
Log Length: 100
Log Contents: 
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 33
Log Contents: 
Pi is roughly 3.1423403142340316

Log Type: stdout.txt
Log Length: 0
Container: container_1649425494064_0008_01_000003 on hopsworks0.logicalclocks.com_9000_1649769846113
Log Type: prelaunch.out
Log Length: 100
Log Contents: 
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 0
Log Type: stdout.txt
Log Length: 0
Container: container_1649425494064_0008_01_000002 on hopsworks0.logicalclocks.com_9000_1649769846113
Log Type: prelaunch.out
Log Length: 100
Log Contents: 
Setting up env variables
Setting up job resources
Copying debugging information
Launching container

Log Type: stdout
Log Length: 0
Log Type: stdout
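
The aggregated log interleaves several containers and log types, so it can be convenient to split it into sections before searching for the program's result. This is a minimal sketch, assuming the `Container:` / `Log Type:` / `Log Contents:` layout printed above; `parse_aggregated_log` is a hypothetical helper, and the sample is an excerpt of the output shown here.

```python
def parse_aggregated_log(text):
    """Split aggregated log text into {(container_id, log_type): contents}."""
    sections = {}
    container = log_type = None
    collecting = False
    for line in text.splitlines():
        if line.startswith("Container: "):
            container = line.split()[1]
            log_type, collecting = None, False
        elif line.startswith("Log Type: "):
            log_type = line[len("Log Type: "):].strip()
            sections[(container, log_type)] = []
            collecting = False
        elif line.startswith("Log Contents:"):
            collecting = True  # following lines belong to the current section
        elif collecting and log_type is not None:
            sections[(container, log_type)].append(line)
    return {key: "\n".join(lines).strip() for key, lines in sections.items()}


sample = "\n".join([
    "Container: container_1649425494064_0008_01_000001 on hopsworks0.logicalclocks.com_9000_1649769846113",
    "Log Type: prelaunch.out",
    "Log Length: 100",
    "Log Contents:",
    "Setting up env variables",
    "Setting up job resources",
    "Copying debugging information",
    "Launching container",
    "",
    "Log Type: stdout",
    "Log Length: 33",
    "Log Contents:",
    "Pi is roughly 3.1423403142340316",
])

logs = parse_aggregated_log(sample)
print(logs[("container_1649425494064_0008_01_000001", "stdout")])
# Pi is roughly 3.1423403142340316
```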

In [38]:
# Read the stderr log; the context manager closes the file afterwards
with open(err, "r") as f_err:
    print(f_err.read())

Container: container_1649425494064_0008_01_000001 on hopsworks0.logicalclocks.com_9000_1649769846113
Log Type: prelaunch.err
Log Length: 0
Container: container_1649425494064_0008_01_000001 on hopsworks0.logicalclocks.com_9000_1649769846113
Log Type: stderr
Log Length: 87392
Log Contents: 
2022-04-12 13:23:29,630 INFO demo_ml_meb10000,my_job,43,application_1649425494064_0008 SignalUtils: Registering signal handler for TERM
2022-04-12 13:23:29,632 INFO demo_ml_meb10000,my_job,43,application_1649425494064_0008 SignalUtils: Registering signal handler for HUP
2022-04-12 13:23:29,633 INFO demo_ml_meb10000,my_job,43,application_1649425494064_0008 SignalUtils: Registering signal handler for INT
2022-04-12 13:23:31,131 INFO demo_ml_meb10000,my_job,43,application_1649425494064_0008 SecurityManager: Changing view acls to: yarnapp,demo_ml_meb10000__meb10000
2022-04-12 13:23:31,131 INFO demo_ml_meb10000,my_job,43,application_1649425494064_0008 SecurityManager: Changing modify acls to: yarnapp,demo_

In [39]:
# Get all the executions for the job
my_job.get_executions()

[Execution('SUCCEEDED', 'FINISHED', '2022-04-12T13:23:21Z', '100')]

## Update the configuration

In [40]:
my_job.config

{'type': 'sparkJobConfiguration',
 'appName': 'my_job',
 'amQueue': 'default',
 'amMemory': 2048,
 'amVCores': 1,
 'jobType': 'SPARK',
 'appPath': 'hdfs:///Projects/demo_ml_meb10000/Resources/spark-examples.jar',
 'mainClass': 'org.apache.spark.examples.SparkPi',
 'spark.executor.instances': 1,
 'spark.executor.cores': 1,
 'spark.executor.memory': 4096,
 'spark.executor.gpus': 0,
 'spark.tensorflow.num.ps': 1,
 'spark.dynamicAllocation.enabled': True,
 'spark.dynamicAllocation.minExecutors': 1,
 'spark.dynamicAllocation.maxExecutors': 2,
 'spark.dynamicAllocation.initialExecutors': 1,
 'spark.blacklist.enabled': False}

In [41]:
config = my_job.config
config['amMemory'] = 2000 # Update the Spark driver memory to 2000MB
my_job.config = config

my_job = my_job.save() # Save in the backend

In [42]:
my_job.config

{'type': 'sparkJobConfiguration',
 'appName': 'my_job',
 'amQueue': 'default',
 'amMemory': 2000,
 'amVCores': 1,
 'jobType': 'SPARK',
 'appPath': 'hdfs:///Projects/demo_ml_meb10000/Resources/spark-examples.jar',
 'mainClass': 'org.apache.spark.examples.SparkPi',
 'spark.executor.instances': 1,
 'spark.executor.cores': 1,
 'spark.executor.memory': 4096,
 'spark.executor.gpus': 0,
 'spark.tensorflow.num.ps': 1,
 'spark.dynamicAllocation.enabled': True,
 'spark.dynamicAllocation.minExecutors': 1,
 'spark.dynamicAllocation.maxExecutors': 2,
 'spark.dynamicAllocation.initialExecutors': 1,
 'spark.blacklist.enabled': False}
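
Comparing the two printed configurations by eye is error-prone, so a small diff makes the effect of the update explicit. The sketch below works on plain dictionaries; `config_diff` is a hypothetical helper, and the two dictionaries are excerpts of the configurations shown before and after `save()`.

```python
def config_diff(before, after):
    """Return {key: (old, new)} for keys whose values differ between two config dicts."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


before = {"amQueue": "default", "amMemory": 2048, "amVCores": 1}
after = {"amQueue": "default", "amMemory": 2000, "amVCores": 1}

print(config_diff(before, after))  # {'amMemory': (2048, 2000)}
```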

In [43]:
my_job.delete()