Workers

Gerald Manipon edited this page May 18, 2018 · 8 revisions

What are HySDS jobs?

HySDS jobs are essentially Celery tasks. More specifically, they are Celery tasks that encapsulate the execution of an executable packaged in a Docker image. The Celery task callable (hysds.job_worker.run_job) is responsible for setup, execution, and teardown of the job's work environment. Specifically, it ensures that:

  • there is enough free space on the root work directory (threshold defaults to 10% free)
    • if there isn't, it cleans out old work directories until the threshold is met
  • the job has a unique work directory to execute in
  • job state is propagated to mozart
  • job metrics are propagated to metrics
  • pre-processing steps are executed
    • default built-in pre-processing step is hysds.utils.localize_urls which downloads input data
  • docker parameters such as volume mounts and UID/GID are set according to job specifications (job-spec)
  • executable is run via docker
  • post-processing steps are executed
    • default built-in post-processing step is hysds.utils.publish_datasets which searches for and publishes HySDS datasets generated by the executable
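
The free-space check and the unique work directory from the list above can be sketched as follows. This is only an illustration under stated assumptions, not the code in hysds.job_worker: the function names (free_fraction, clean_old_work_dirs, make_work_dir), the "oldest modification time first" cleanup order, and the "job-" prefix are all hypothetical. Only the 10% default threshold comes from the text.

```python
import os
import shutil
import tempfile


def free_fraction(path):
    """Return the fraction of free space on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def clean_old_work_dirs(root, threshold=0.10, free_fn=free_fraction):
    """Delete the oldest subdirectories of `root` until `threshold` is met.

    `free_fn` is injectable so the logic can be exercised without a full disk.
    Returns the list of directories that were removed.
    """
    removed = []
    # Assumption: "old" means oldest modification time first.
    candidates = sorted(
        (e.path for e in os.scandir(root) if e.is_dir(follow_symlinks=False)),
        key=os.path.getmtime,
    )
    for path in candidates:
        if free_fn(root) >= threshold:
            break  # enough free space, stop deleting
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed


def make_work_dir(root, prefix="job-"):
    """Create and return a unique work directory under `root`."""
    return tempfile.mkdtemp(prefix=prefix, dir=root)
```

A worker would call clean_old_work_dirs on the root work directory first and then make_work_dir to get the directory the job executes in.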

How do you define a HySDS job?

You define a HySDS job by defining a job-spec and a hysds-io. See Job and HySDS IO Specifications. For a step-by-step example, see Hello World.

What are HySDS Workers?

Workers are Celery workers that run tasks. Since HySDS jobs are Celery tasks, workers run jobs as well.

Each job is invoked from a unique working directory on the worker node.

Worker Events

See http://celery.readthedocs.org/en/latest/userguide/monitoring.html#worker-events

worker-online

signature: worker-online(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)

The worker has connected to the broker and is online.

  • hostname: Hostname of the worker.
  • timestamp: Event timestamp.
  • freq: Heartbeat frequency in seconds (float).
  • sw_ident: Name of worker software (e.g. py-celery).
  • sw_ver: Software version (e.g. 2.2.0).
  • sw_sys: Operating System (e.g. Linux, Windows, Darwin).

worker-heartbeat

signature: worker-heartbeat(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys,active,processed)

Sent every minute. If the worker has not sent a heartbeat in 2 minutes, it is considered to be offline.

  • hostname: Hostname of the worker.
  • timestamp: Event timestamp.
  • freq: Heartbeat frequency in seconds (float).
  • sw_ident: Name of worker software (e.g. py-celery).
  • sw_ver: Software version (e.g. 2.2.0).
  • sw_sys: Operating System (e.g. Linux, Windows, Darwin).
  • active: Number of currently executing tasks.
  • processed: Total number of tasks processed by this worker.
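
The 2-minute offline rule can be sketched with a small tracker that records the last event timestamp per hostname. This is a minimal illustration, not how mozart implements it: the WorkerTracker class is hypothetical, and it assumes each event arrives as a dict with a "type" key alongside the hostname and timestamp fields listed above.

```python
import time

HEARTBEAT_TIMEOUT = 120.0  # seconds; the 2 minutes stated in the text


class WorkerTracker:
    """Track worker liveness from worker-online/heartbeat/offline events."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT):
        self.timeout = timeout
        self.last_seen = {}  # hostname -> timestamp of the last event

    def on_event(self, event):
        # Assumption: the event is a dict carrying "type", "hostname"
        # and "timestamp" keys.
        kind = event["type"]
        host = event["hostname"]
        if kind in ("worker-online", "worker-heartbeat"):
            self.last_seen[host] = event["timestamp"]
        elif kind == "worker-offline":
            self.last_seen.pop(host, None)

    def is_online(self, hostname, now=None):
        """Return True if the worker was heard from within the timeout."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(hostname)
        return seen is not None and (now - seen) <= self.timeout
```

Passing now explicitly keeps the check deterministic; in normal use it defaults to the current time.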

worker-offline

signature: worker-offline(hostname,timestamp,freq,sw_ident,sw_ver,sw_sys)

The worker has disconnected from the broker.

Celery Worker Naming Convention

The worker name matters because it is parsed to populate the facets shown in mozart's faceted search.

Transport

Job events are shipped out to mozart via redis, serialized with msgpack.

msgpack

It's fast, small, and has first-class language support. See http://msgpack.org/

PGE handling

Work dir scrubbers

POSIX signal handling for verdi worker

Verdi has Python handlers for capturing kill signals sent to the Celery worker. Verdi then emits them as events to mozart via redis.

Supported POSIX signal handling and event emitting from verdi:

  • 1 SIGHUP: Hangup
  • 2 SIGINT: Terminal interrupt signal.
  • 3 SIGQUIT: Terminal quit signal.
  • 6 SIGABRT: Process abort signal
  • 9 SIGKILL: Kill (cannot be caught or ignored).
  • 15 SIGTERM: Termination signal.
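
A handler of this kind can be sketched with Python's standard signal module. This is an illustration only, not verdi's actual handler: make_handler, install_handlers, and the event fields are hypothetical, and emit stands in for whatever ships the event to mozart. SIGKILL is left out of the sketch because a process cannot install a handler for it.

```python
import signal
import time

# Catchable signals from the list above. SIGKILL is omitted: the operating
# system never delivers it to a user-level handler.
SIGNAL_NAMES = ["SIGHUP", "SIGINT", "SIGQUIT", "SIGABRT", "SIGTERM"]


def make_handler(emit):
    """Build a signal handler that passes an event dict to `emit`."""
    def handler(signum, frame):
        emit({
            "signal": signal.Signals(signum).name,
            "signum": int(signum),
            "timestamp": time.time(),
        })
    return handler


def install_handlers(emit):
    """Register the handler for every listed signal this platform defines.

    Must be called from the main thread. Returns the names installed.
    """
    handler = make_handler(emit)
    installed = []
    for name in SIGNAL_NAMES:
        sig = getattr(signal, name, None)  # e.g. SIGHUP is absent on Windows
        if sig is not None:
            signal.signal(sig, handler)
            installed.append(name)
    return installed
```

For example, install_handlers(events.append) with events being a plain list would collect one dict per received signal; a real worker would replace the list with a function that publishes the event.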

Localize and Publish Data Products

Run in stand-alone test mode

Create the ./work directory and run the following command:

HYSDS_DATASETS_CFG=~/verdi/ops/hysds/configs/datasets/datasets.json HYSDS_WORKER_CFG=job_worker.json ~/verdi/ops/hysds/scripts/run_job.py test_job.json