
Requests, Actions, Tasks & Jobs

swamikevala edited this page Jan 21, 2022 · 1 revision

Requests

A request is an instruction to perform an action. A request can be submitted by a user (user request) or it can be generated internally by the system (system request). In general, for any user request which specifies artifacts, files or volumes (for example ingest, restore, initialize respectively), the system will internally generate a system request for each item. Sometimes a request will reference a previous request - for example, the synchronous action cancel must reference a request to be cancelled.
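The fan-out described above can be pictured with a small model. The sketch below is illustrative only: the `Request` record, its fields and the `RequestFanOut` helper are hypothetical names, not Dwara's actual classes. It takes just two behaviours from the text. A user request that specifies several items (generally) yields one system request per item, and a request such as cancel carries a reference to an earlier request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: who originated a request.
enum RequestType { USER, SYSTEM }

// Hypothetical request shape. parentId links a system request to the user request it was
// generated from; referencedId points at an earlier request (e.g. the request a cancel targets).
record Request(int id, RequestType type, String action, String item,
               Integer parentId, Integer referencedId) {}

class RequestFanOut {
    private int nextId = 1;

    // Records a user request and generates one system request per item it specifies.
    List<Request> submit(String action, List<String> items) {
        Request user = new Request(nextId++, RequestType.USER, action, null, null, null);
        List<Request> all = new ArrayList<>();
        all.add(user);
        for (String item : items) {
            all.add(new Request(nextId++, RequestType.SYSTEM, action, item, user.id(), null));
        }
        return all;
    }

    // The synchronous cancel action must reference the request being cancelled.
    Request cancel(int targetRequestId) {
        return new Request(nextId++, RequestType.USER, "cancel", null, null, targetRequestId);
    }
}
```

For example, an ingest of two artifacts produces three records: the user request and two system requests whose `parentId` points back to it.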

Actions and Tasks

Actions are the operations that Dwara can perform. An action is of one of the following types: sync, task or complex (comprising one or more flows, see below). The main actions are ingest, restore and process. Tasks are asynchronous operations and are divided into two categories: storage tasks and processing tasks.

Synchronous Actions

The synchronous actions are list, scan, rename_staged, rename, hold, release, cancel, abort, delete and diagnostics.

Storage Tasks

Storage tasks are asynchronous actions that involve writing, verifying and restoring data to and from storage volumes. They also include administrative tasks such as initializing and finalizing physical storage volumes. Storage tasks are write, restore, verify, initialize, finalize, import, map_tapedrives, rewrite, and migrate.
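The action categories can be summarised as a lookup. The name lists below are copied from this page (complex actions from the Complex Actions section). The `ActionType` enum and the `ActionCatalog` helper are hypothetical. Treating every unlisted name as a processing task is an assumption, made because processing tasks are open-ended extensions.

```java
import java.util.Set;

// Hypothetical classification of Dwara action names; the names come from this page.
enum ActionType { SYNC, STORAGE_TASK, PROCESSING_TASK, COMPLEX }

class ActionCatalog {
    static final Set<String> SYNC = Set.of(
        "list", "scan", "rename_staged", "rename", "hold", "release",
        "cancel", "abort", "delete", "diagnostics");
    static final Set<String> STORAGE = Set.of(
        "write", "restore", "verify", "initialize", "finalize", "import",
        "map_tapedrives", "rewrite", "migrate");
    static final Set<String> COMPLEX = Set.of("ingest", "process", "restore_process");

    // Assumption: any name not listed as sync, storage or complex is a processing task.
    static ActionType typeOf(String name) {
        if (SYNC.contains(name)) return ActionType.SYNC;
        if (STORAGE.contains(name)) return ActionType.STORAGE_TASK;
        if (COMPLEX.contains(name)) return ActionType.COMPLEX;
        return ActionType.PROCESSING_TASK;
    }

    // Sync actions complete immediately; all other types result in queued jobs.
    static boolean isAsynchronous(String name) {
        return typeOf(name) != ActionType.SYNC;
    }
}
```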

Processing Tasks

Processing tasks are asynchronous actions that process certain types of files. Processing tasks are generally not part of Dwara's core framework; they are added as extensions to support the requirements of a particular implementation. However, Dwara does have several core processing tasks that support its core operations: checksum-gen, checksum-verify, file-copy, file-delete and file-ignore. Processing tasks operate on logical units of one or more related files defined by a filetype. The keyword _all_ is a special filetype that denotes all files. Processing tasks can create new artifacts. Such tasks must define an output filetype as well as an output artifactclass suffix. The artifactclass that will contain the new artifact is obtained by concatenating the input artifactclass name with the suffix. The task must generate file(s) that match the output filetype configuration, otherwise it fails.
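A sketch of filetype matching, including the special _all_ filetype. It assumes, hypothetically, that a filetype is just a name plus a set of file extensions and that files are selected one by one; Dwara's real filetype configuration (and its grouping of related files into logical units) may differ. The output check reads "must generate file(s) that match" as "at least one generated file must match", which is also an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical filetype: a name plus the file extensions it covers.
record Filetype(String name, Set<String> extensions) {
    static final String ALL = "_all_";

    // "_all_" is the special filetype that denotes every file.
    boolean matches(String fileName) {
        if (ALL.equals(name)) return true;
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }
}

class FiletypeSelector {
    // Files a processing task would operate on.
    static List<String> select(Filetype type, List<String> files) {
        return files.stream().filter(type::matches).toList();
    }

    // Assumption: the task fails unless at least one generated file matches the output filetype.
    static void checkOutputs(Filetype outputType, List<String> generated) {
        if (select(outputType, generated).isEmpty()) {
            throw new IllegalStateException(
                "no generated file matches output filetype " + outputType.name());
        }
    }
}
```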

If the output artifactclass suffix is NULL, the processing task does not create a new artifact. If the suffix is '' (empty string), the output artifactclass is the same as the input artifactclass. In this case we also assume that the input and output artifacts are the same, so the sequence code extraction/assignment logic is skipped (see sequences).
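The suffix rules can be written as a small function. This is an illustrative sketch: `OutputArtifactclass` and its methods are hypothetical names, Java `null` stands in for the NULL suffix, and the artifactclass names in the test are made up.

```java
import java.util.Optional;

class OutputArtifactclass {
    // Returns the artifactclass that receives the task's output artifact,
    // or empty when the task creates no new artifact.
    static Optional<String> outputArtifactclass(String inputArtifactclass, String suffix) {
        if (suffix == null) {
            // NULL suffix: the processing task does not create a new artifact.
            return Optional.empty();
        }
        // An empty suffix yields the input artifactclass itself; any other suffix is
        // appended to the input artifactclass name.
        return Optional.of(inputArtifactclass + suffix);
    }

    // Empty suffix: input and output artifacts are treated as the same artifact,
    // so sequence code extraction/assignment is skipped.
    static boolean sameArtifact(String suffix) {
        return suffix != null && suffix.isEmpty();
    }
}
```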

A new processing task can be defined by registering a Java class that implements the IProcessingTask interface, and adding a corresponding entry in the processingtask table. Note that the processingtask table does not contain entries for core processing tasks.
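The registration pattern can be illustrated with a stand-in interface. Dwara's actual `IProcessingTask` signature is not given on this page, so `ProcessingTaskSketch`, its `process` method, the example `ProxyGenTask`, and the in-memory registry are all hypothetical. The map only mimics the role of the processingtask table, and core tasks such as checksum-gen are deliberately not registered in it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Dwara's IProcessingTask; the real method signature differs.
interface ProcessingTaskSketch {
    // Processes one input file and returns the names of any files it generated.
    List<String> process(String inputFile) throws Exception;
}

// Hypothetical extension task: derives a proxy file name from the input file name.
class ProxyGenTask implements ProcessingTaskSketch {
    @Override
    public List<String> process(String inputFile) {
        return List.of(inputFile + ".proxy.mp4");
    }
}

class ProcessingTaskRegistry {
    // Mimics the processingtask table: task name -> implementation (extensions only).
    private final Map<String, ProcessingTaskSketch> tasks = new HashMap<>();

    void register(String name, ProcessingTaskSketch task) {
        if (tasks.putIfAbsent(name, task) != null) {
            throw new IllegalArgumentException("task already registered: " + name);
        }
    }

    ProcessingTaskSketch lookup(String name) {
        ProcessingTaskSketch task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("unknown processing task: " + name);
        }
        return task;
    }
}
```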

Complex Actions

A complex action chains together multiple tasks and is defined by one or more flows. The complex actions are ingest, process and restore_process. Flows are defined by flow elements, each of which specifies a task or (sub)flow, its dependencies, and optional task configuration options. A flow cannot contain a flow element that references itself.
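A sketch of two checks a flow engine might perform. The `FlowElement` record, the map of flows, and the example flows in the test are hypothetical. `referencesItself` enforces the self-reference rule. It also rejects indirect references through nested subflows, which is a stricter reading than the text states. `executionOrder` uses Kahn's topological sort so that each element runs after its dependencies.

```java
import java.util.*;

// Hypothetical flow element: runs either a task or a (sub)flow once its dependencies finish.
record FlowElement(String id, String task, String subflow, List<String> dependsOn) {}

class FlowValidator {
    // True if the named flow references itself directly or through nested subflows.
    static boolean referencesItself(String name, Map<String, List<FlowElement>> flows) {
        return reaches(name, name, flows, new HashSet<>());
    }

    private static boolean reaches(String target, String current,
                                   Map<String, List<FlowElement>> flows, Set<String> seen) {
        for (FlowElement e : flows.getOrDefault(current, List.of())) {
            String sub = e.subflow();
            if (sub == null) continue;
            if (sub.equals(target)) return true;
            if (seen.add(sub) && reaches(target, sub, flows, seen)) return true;
        }
        return false;
    }

    // Orders one flow's elements so each runs after its dependencies (Kahn's algorithm).
    static List<String> executionOrder(List<FlowElement> elements) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (FlowElement e : elements) {
            pending.put(e.id(), e.dependsOn().size());
            for (String dep : e.dependsOn()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.id());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((id, n) -> { if (n == 0) ready.add(id); });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.poll();
            order.add(id);
            for (String next : dependents.getOrDefault(id, List.of())) {
                if (pending.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != elements.size()) {
            throw new IllegalStateException("flow has a dependency cycle or unknown dependency");
        }
        return order;
    }
}
```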

Jobs

A job is an instance of a task. Storage task jobs operate on a single artifact, file, or storage volume, while processing task jobs operate on all applicable files. When the system receives a request for an asynchronous action, it generates and queues jobs, which are executed only when the necessary system resources are available.
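A minimal sketch of the queueing behaviour: jobs wait in FIFO order and are started only while a resource slot is free. Modelling "the necessary system resources" as a single pool of interchangeable slots (for example tape drives) is a simplifying assumption, and `JobQueueSketch` is a hypothetical class; Dwara's scheduler is likely more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical scheduler: one pool of interchangeable resource slots, FIFO job queue.
class JobQueueSketch {
    private final Deque<String> queued = new ArrayDeque<>();
    private final List<String> inProgress = new ArrayList<>();
    private int freeSlots;

    JobQueueSketch(int slots) { this.freeSlots = slots; }

    // An asynchronous request only enqueues jobs; nothing runs yet.
    void enqueue(String jobId) { queued.addLast(jobId); }

    // Starts queued jobs in FIFO order while resources are free; returns the started job ids.
    List<String> dispatch() {
        List<String> started = new ArrayList<>();
        while (freeSlots > 0 && !queued.isEmpty()) {
            String job = queued.pollFirst();
            freeSlots--;
            inProgress.add(job);
            started.add(job);
        }
        return started;
    }

    // Called when a job finishes, freeing its resource for the next queued job.
    void finish(String jobId) {
        if (inProgress.remove(jobId)) freeSlots++;
    }

    int queuedCount() { return queued.size(); }
}
```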

Status

Jobs and requests have a status that indicates their state at a point in time. The following job statuses are defined:

| Status | Meaning for a job |
|---|---|
| queued | Job has not yet started and is in the queue |
| in_progress | Job is running |
| completed | Job has completed successfully |
| completed_failures | Processing job: at least one file was processed successfully and at least one file failed (failures are added to the failure table). The job becomes a candidate for a rerun. Volume import: at least one artifact on the volume was imported successfully and at least one artifact failed (import does not use jobs). This status can also be used for synchronous actions that act on multiple items (e.g. staged_rename). |
| on_hold | Job is on hold (like queued, but it must be released before it will run) |
| cancelled | A queued or failed job has been cancelled |
| aborted | A running job has been aborted |
| failed | Job has failed |
| marked_completed | Job had failed or ended with completed_failures, but has been manually marked completed so that it is not rerun. Example: if there are corrupt files that cannot be processed, we mark the file(s) "bad"; jobs with bad files get this status. |
| marked_failed | Job completed according to the system, but a problem with it was found later, so it has been manually set to marked_failed. A fix is needed and the job must be rerun. Setting a job to marked_failed automatically sets all its dependent jobs to marked_failed. |
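Two rules from the table lend themselves to a short sketch: deriving a processing job's final status from its per-file outcomes, and cascading marked_failed to dependent jobs. The enum constants mirror the status names above. `JobStatusRules`, its methods and the dependency map are hypothetical. Mapping "all files failed" to failed and "no failures (including no files)" to completed are assumptions the table does not spell out.

```java
import java.util.*;

// Job statuses named in the table above.
enum JobStatus {
    QUEUED, IN_PROGRESS, COMPLETED, COMPLETED_FAILURES, ON_HOLD,
    CANCELLED, ABORTED, FAILED, MARKED_COMPLETED, MARKED_FAILED
}

class JobStatusRules {
    // Final status of a processing job from its file outcomes.
    // Assumptions: all files failed -> FAILED; no failures (or no files) -> COMPLETED.
    static JobStatus processingOutcome(int succeededFiles, int failedFiles) {
        if (failedFiles == 0) return JobStatus.COMPLETED;
        if (succeededFiles == 0) return JobStatus.FAILED;
        return JobStatus.COMPLETED_FAILURES; // candidate for a rerun
    }

    // Sets a job to MARKED_FAILED and cascades to every job that depends on it, directly or
    // transitively. `dependents` maps a job id to the ids of jobs that depend on it.
    static void markFailed(String jobId, Map<String, List<String>> dependents,
                           Map<String, JobStatus> statuses) {
        Set<String> visited = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(jobId);
        while (!todo.isEmpty()) {
            String id = todo.poll();
            if (!visited.add(id)) continue; // guards against revisiting shared dependents
            statuses.put(id, JobStatus.MARKED_FAILED);
            todo.addAll(dependents.getOrDefault(id, List.of()));
        }
    }
}
```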

Many statuses are used for both jobs and requests. Because a system request consists of multiple jobs, the job status definitions are adapted so that each status has a similar meaning in both contexts.

The status of a system request is determined by checking whether it has at least one job with a given status, testing the statuses in the following priority order:

  1. in_progress
  2. queued
  3. on_hold
  4. completed_failures
  5. failed
  6. marked_completed
  7. completed

Note: does cancelled make sense anywhere here?

Question: where should marked_failed come in this list?

In other words, to determine the status of a system request, we first check whether it has any in_progress jobs. If it does, its status is in_progress. If not, we check whether it has any queued jobs; if it does, its status is queued, and so on down the list.

The status of a user request is determined with the same logic, applied one level up: a user request relates to its system requests as a system request relates to its jobs.
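The priority rule can be sketched as one function applied twice: once over a system request's job statuses, and once over a user request's system request statuses. `RequestStatusRollup` is a hypothetical name and statuses are plain strings for brevity. Returning empty when none of the listed statuses is present (for example, when every child is cancelled or marked_failed) is a placeholder for the open questions noted above.

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;

class RequestStatusRollup {
    // Priority order from this page: the first status found among the children wins.
    static final List<String> PRIORITY = List.of(
        "in_progress", "queued", "on_hold", "completed_failures",
        "failed", "marked_completed", "completed");

    // Derives a parent's status (system or user request) from its children's statuses.
    // Empty when no child has a listed status, since cancelled and marked_failed
    // are not yet placed in the priority order.
    static Optional<String> rollUp(Collection<String> childStatuses) {
        for (String status : PRIORITY) {
            if (childStatuses.contains(status)) return Optional.of(status);
        }
        return Optional.empty();
    }
}
```

For example, a system request whose jobs are completed, failed and completed rolls up to failed. A user request with one failed and one queued system request rolls up to queued.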

Marked Completed - Example Catdv db not supporting filenames with emoji characters