BigJob Architecture

melrom edited this page Sep 12, 2012 · 12 revisions

SAGA BigJob comprises three components: (i) the BigJob Manager (bigjob.bigjob_manager), which provides the pilot-job abstraction and orchestrates and schedules BigJobs, thereby managing both bigjob objects and their sub-jobs; (ii) the BigJob Agent (bigjob.bigjob_agent), which represents the pilot job and thus acts as the application-level resource manager running on the respective resource; and (iii) the advert service, which is used for communication between the BigJob Manager and the BigJob Agent.

Before running regular jobs (so-called sub-jobs), an application must initialize a bigjob object. The BigJob Manager then queues a pilot job, which runs a BigJob Agent on the respective resource. A specified number of resources is requested for this agent. Subsequently, sub-jobs can be submitted through the BigJob Manager, using the jobID of the BigJob as a reference. Based on the specified jobID, the BigJob Manager ensures that each sub-job is launched on the correct resource with the right number of processes. Communication between the BigJob Agent and the BigJob Manager is carried out through the SAGA advert service, a central key/value store. For each new sub-job, the BigJob Manager creates an advert entry. The agent periodically polls for new entries. If a new sub-job is found and resources are available, it is dispatched; otherwise, it is queued.
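The manager/agent exchange described above can be illustrated with a small, self-contained simulation. This is only a sketch and not the BigJob API: the class names (AdvertStore, Manager, Agent), the key layout, the state strings and the FIFO dispatch policy are all assumptions made for illustration, and a plain in-memory dict stands in for the SAGA advert service.

```python
from collections import deque


class AdvertStore:
    """Hypothetical stand-in for the central key/value store (a plain dict)."""

    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value

    def get(self, key):
        return self.entries[key]

    def keys_with_prefix(self, prefix):
        # Dicts preserve insertion order, so entries come back oldest first.
        return [k for k in self.entries if k.startswith(prefix)]


class Manager:
    """Creates one store entry per submitted sub-job (assumed key layout)."""

    def __init__(self, store):
        self.store = store
        self.counter = 0

    def submit_subjob(self, pilot_id, processes):
        self.counter += 1
        key = f"{pilot_id}/subjob-{self.counter}"
        self.store.put(key, {"processes": processes, "state": "New"})
        return key


class Agent:
    """Polls the store for its own pilot ID; dispatches or queues sub-jobs."""

    def __init__(self, store, pilot_id, total_processes):
        self.store = store
        self.pilot_id = pilot_id
        self.free = total_processes  # processes not yet assigned
        self.waiting = deque()       # sub-jobs seen but not yet running

    def poll(self):
        # Step 1: pick up entries that have not been seen before.
        for key in self.store.keys_with_prefix(self.pilot_id + "/"):
            job = self.store.get(key)
            if job["state"] == "New":
                job["state"] = "Queued"
                self.waiting.append(key)
        # Step 2: dispatch in FIFO order while enough processes are free.
        while self.waiting:
            job = self.store.get(self.waiting[0])
            if job["processes"] > self.free:
                break  # head of the queue does not fit yet; keep it queued
            self.waiting.popleft()
            self.free -= job["processes"]
            job["state"] = "Running"

    def finish(self, key):
        # Release the processes of a completed sub-job.
        job = self.store.get(key)
        self.free += job["processes"]
        job["state"] = "Done"


store = AdvertStore()
manager = Manager(store)
agent = Agent(store, "pilot-1", total_processes=4)

first = manager.submit_subjob("pilot-1", processes=3)
second = manager.submit_subjob("pilot-1", processes=2)

agent.poll()         # first fits (3 of 4); second must wait
agent.finish(first)  # frees 3 processes
agent.poll()         # second is dispatched now
```

In this sketch the agent calls poll() explicitly; a real agent would presumably run the same step in a timed loop. Keying entries by the pilot's ID mirrors how the jobID is used as the reference when sub-jobs are submitted.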

Further Reading

Below are a few publications that demonstrate the capabilities of BigJob on computing infrastructures and explain the BigJob architecture in more depth.