A Batsim simulation consists of two processes:
- Batsim itself, in charge of simulating what happens on the platform
- A Decision Process (or more simply scheduler), in charge of making decisions
The two processes communicate via a socket with the protocol explained in the present document. The protocol is synchronous and follows a simple request-reply pattern. Whenever an event which may require making a decision occurs in the simulation, the following steps occur:
- Batsim suspends the simulation
- Batsim sends a request to the scheduler (telling it what happened on the platform)
- Batsim waits for a reply from the scheduler
- Batsim receives the reply
- Batsim resumes the simulation, applying the decisions which have been made
ZeroMQ is used in both processes (Batsim uses a ZMQ REQ socket, the scheduler a ZMQ REP one).
The behavior of this protocol depends on the configuration:
- If Redis is enabled, job metadata is stored in a Redis server and not sent through the protocol. In this case, the protocol is only used for synchronization purposes. More information about Redis conventions is described there.
- Batsim may or may not forward job profile information to the scheduler when jobs are submitted (see JOB_SUBMITTED documentation)
- Dynamic job submissions can be enabled or disabled. Many parameters of job submissions can be adjusted; please refer to the Dynamic submission of jobs documentation for more details.
A message is a JSON object that looks like this:
{
"now": 1024.24,
"events": [
{
"timestamp": 1000,
"type": "EXECUTE_JOB",
"data": {
"job_id": "workload!job_1234",
"alloc": "1 2 4-8"
}
},
{
"timestamp": 1012,
"type": "EXECUTE_JOB",
"data": {
"job_id": "workload!job_1235",
"alloc": "12-100"
}
}
]
}
The now field defines the current simulation time.
- If the message comes from Batsim, it means that the scheduler cannot make decisions before now (time travel simulation is not supported at the moment).
- If the message comes from the scheduler, it tells Batsim that the scheduler finished making its decisions at timestamp now. It is used by Batsim to know when the scheduler will be available for making new decisions.
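Seen from the scheduler, the two meanings of now can be sketched as a small function that turns one request into one reply. This is only an illustration under stated assumptions: make_reply, decide and decision_duration are hypothetical names that do not come from Batsim, and the socket transport is left out so that the example needs nothing but the standard library.

```python
import json


def make_reply(request_text, decide, decision_duration=0.0):
    """Build the scheduler's reply to one Batsim request (hypothetical helper).

    `decide` is a caller-supplied function taking the received events and the
    request time, and returning the list of events to send back.
    """
    request = json.loads(request_text)
    # Batsim's `now` is the earliest time at which decisions can be made.
    earliest = request["now"]
    events = decide(request["events"], earliest)
    # The reply's `now` tells Batsim when the scheduler finished deciding.
    reply_now = earliest + decision_duration
    return json.dumps({"now": reply_now, "events": events})
```

With a decide function that returns an empty list, the reply is a message with an empty events list whose now is the request time plus the assumed decision duration.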
Constraints on the message format are defined here:
- the message timestamp now MUST be greater than or equal to every event timestamp
- event timestamps MUST be in ascending order: event[i].timestamp <= event[i+1].timestamp
- mandatory fields:
  - now (type: float)
  - events (type: array, can be empty); each event contains:
    - timestamp (type: float)
    - type (type: string as defined below)
    - data (type: dict, can be empty)
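The constraints above can be checked mechanically. The following is a minimal sketch, not a validator shipped with Batsim: validate_message is a hypothetical name, and JSON integers such as 1000 are accepted wherever a float is expected, which is an assumption based on the example message.

```python
import json

NUMBER = (int, float)  # JSON integers are accepted as timestamps (assumption)


def validate_message(text):
    """Return a list of constraint violations (empty if the message is valid)."""
    msg = json.loads(text)
    if not isinstance(msg, dict):
        return ["message is not a JSON object"]
    errors = []
    if not isinstance(msg.get("now"), NUMBER):
        errors.append("missing or non-numeric 'now'")
    if not isinstance(msg.get("events"), list):
        errors.append("missing or non-array 'events'")
    if errors:
        return errors
    previous = None
    for i, event in enumerate(msg["events"]):
        if not isinstance(event, dict):
            errors.append(f"event {i}: not a JSON object")
            continue
        # Mandatory fields of every event.
        for field, kind in (("timestamp", NUMBER), ("type", str), ("data", dict)):
            if not isinstance(event.get(field), kind):
                errors.append(f"event {i}: missing or invalid '{field}'")
        ts = event.get("timestamp")
        if not isinstance(ts, NUMBER):
            continue
        # `now` must be greater than or equal to every event timestamp.
        if ts > msg["now"]:
            errors.append(f"event {i}: timestamp {ts} is after now {msg['now']}")
        # Event timestamps must be in ascending order.
        if previous is not None and ts < previous:
            errors.append(f"event {i}: timestamp {ts} is before the previous event")
        previous = ts
    return errors
```

Returning a list of violations rather than raising on the first one makes it easier to report every problem of a malformed message at once.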
These events can be sent from Batsim to the scheduler, or in the opposite direction.
BATSIM <---> DECISION
The simplest message (NOP) stands either for "nothing happened" if sent by Batsim, or "do nothing" if sent by the scheduler. It means that the events list is empty: "events": []
- data: N/A
- full message example:
{
"now": 1024.24,
"events": []
}
These events are sent by Batsim to the scheduler.
BATSIM ---> DECISION
Sent at the beginning of the simulation. Once it has been sent, and if Redis is enabled, meta-information can be read from Redis.
Batsim configuration is sent through the config object (in data). Any custom information can be added into the Batsim configuration, which gives a generic way to give meta-information from Batsim to any scheduler at runtime.
- data:
- nb_resources: the number of resources
- allow_time_sharing: whether time sharing is enabled or not
- config: the Batsim configuration
- resources_data: information about the resources
- id: unique resource number
- name: resource name
- state: resource state in {sleeping, idle, computing, switching_on, switching_off}
- properties: the properties specified in the SimGrid platform for the corresponding host
- example:
{
"timestamp": 0.0,
"type": "SIMULATION_BEGINS",
"data": {
"allow_time_sharing": false,
"nb_resources": 1,
"config": {},
"resources_data": [
{
"id": 0,
"name": "host0",
"state": "idle",
"properties": {}
}
]
}
}
Sent when Batsim thinks that the simulation is over. It means that all the jobs (either coming from Batsim workloads/workflows inputs, or dynamically submitted ones) have been submitted and executed (or rejected). The scheduler should answer a NOP to this message then close its socket and terminate.
- data: empty
- example:
{
"timestamp": 100.0,
"type": "SIMULATION_ENDS",
"data": {}
}
The content of this message depends on the Batsim configuration.
This event means that one job has been submitted within Batsim.
It is sent whenever a job coming from Batsim inputs (workloads and workflows)
has been submitted.
If dynamic job submissions are enabled (the configuration contains {"job_submission": {"from_scheduler": {"enabled": true}}}), this message is sent as a reply to a SUBMIT_JOB message if and only if dynamic job submission acknowledgements are enabled ({"job_submission": {"from_scheduler": {"acknowledge": true}}}).
The job_id field is always sent and contains a unique job identifier. If Redis is enabled ({"redis": {"enabled": true}}), job_id is the only forwarded field. Otherwise (if Redis is disabled), a JSON description of the job is forwarded in the job field.
A JSON description of the job profile is sent if and only if profiles forwarding is enabled ({"job_submission": {"forward_profiles": true}}).
- data: a job id and optional information depending on the configuration
- example without redis and without forwarded profiles:
{
"timestamp": 10.0,
"type": "JOB_SUBMITTED",
"data": {
"job_id": "dyn!my_new_job",
"job": {
"profile": "delay_10s",
"res": 1,
"id": "my_new_job",
"walltime": 12.0
}
}
}
- example without redis and with forwarded profiles:
{
"timestamp": 10.0,
"type": "JOB_SUBMITTED",
"data": {
"job_id": "dyn!my_new_job",
"job": {
"profile": "delay_10s",
"res": 1,
"id": "my_new_job",
"walltime": 12.0
},
"profile":{
"type": "delay",
"delay": 10
}
}
}
- example with redis:
{
"timestamp": 10.0,
"type": "JOB_SUBMITTED",
"data": {"job_id": "w0!1"}
}
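Since the content of this event varies with the configuration, a scheduler can read it defensively. The sketch below is one possible way to do so; read_job_submitted is a hypothetical helper name, and the event is assumed to be already decoded into a Python dict.

```python
def read_job_submitted(event):
    """Split a JOB_SUBMITTED event into (job_id, job, profile)."""
    if event["type"] != "JOB_SUBMITTED":
        raise ValueError(f"unexpected event type: {event['type']}")
    data = event["data"]
    job_id = data["job_id"]        # always sent
    job = data.get("job")          # None when Redis is enabled
    profile = data.get("profile")  # None unless profiles are forwarded
    return job_id, job, profile
```

When job is None, the scheduler is expected to fetch the job description from Redis instead, as described above.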
A job has completed its execution. It acknowledges that the actions coming from a previous EXECUTE_JOB message have been done (successfully or not, depending on whether the job completed without reaching timeout).
- data:
- job_id: the job unique identifier
- status: either SUCCESS or TIMEOUT (DEPRECATED)
- job_state: the job state. Possible values: "NOT_SUBMITTED", "SUBMITTED", "RUNNING", "COMPLETED_SUCCESSFULLY", "COMPLETED_FAILED", "COMPLETED_WALLTIME_REACHED", "COMPLETED_KILLED", "REJECTED"
- kill_reason: the kill reason (if any)
- example:
{
"timestamp": 10.000000,
"type": "JOB_COMPLETED",
"data": {
"job_id": "2cf8ca!10",
"status": "TIMEOUT",
"job_state": "COMPLETED_KILLED",
"kill_reason": "Walltime reached"
}
}
Some jobs have been killed. It acknowledges that the actions coming from a previous KILL_JOB message have been done. The job_ids jobs correspond to those requested in the previous KILL_JOB message.
The job_progress map is also given for all the jobs, and the tasks inside them, that have been killed. Each key is a job_id and each value contains a progress value in ]0, 1[ (with 0 for not started and 1 for a complete task); the profile name is also given for convenience. For a sequential job, the progress map contains the 0-based index of the inner task that was running at the time it was killed (current_task_index in the example below) and the details of this progress in the current_task field. Note that sequential jobs can be nested.
Please note that this message does not necessarily mean that all the jobs have been killed. It means that all the jobs have completed. Some of the jobs might have completed ordinarily before the kill. In this case, JOB_COMPLETED events corresponding to these jobs should be received before the JOB_KILLED event.
- data: A list of job ids
- example:
{
"timestamp": 10.0,
"type": "JOB_KILLED",
"data": {
"job_ids": [
"w0!1",
"w0!2"
]
}
}
- data: A list of job ids + progress
- example:
{
"timestamp": 10.0,
"type": "JOB_KILLED",
"data": {
"job_ids": [
"w0!1",
"w0!2"
],
"job_progress": {
"w0!1": {
"profile": "my_simple_profile",
"progress": 0.52
},
"w0!2": {
"profile": "my_sequential_profile",
"current_task_index": 3,
"current_task": {
"profile": "my_simple_profile",
"progress": 0.52
}
},
"w0!3": {
"profile": "my_composed_profile",
"current_task_index": 2,
"current_task": {
"profile": "my_sequential_profile",
"current_task_index": 3,
"current_task": {
"profile": "my_simple_profile",
"progress": 0.52
}
}
}
}
}
}
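Because sequential jobs can be nested, reading a job_progress entry means following the current_task fields down to the innermost task. The function below is a small sketch of that walk; leaf_progress is a hypothetical name, and the entry is assumed to have the shape shown in the example above.

```python
def leaf_progress(entry):
    """Follow nested `current_task` entries down to the innermost task.

    Returns (path, profile, progress), where `path` lists the 0-based
    `current_task_index` values met on the way down.
    """
    path = []
    while "current_task" in entry:
        path.append(entry["current_task_index"])
        entry = entry["current_task"]
    return path, entry["profile"], entry["progress"]
```

For the w0!3 entry of the example, the path is [2, 3] and the innermost task is my_simple_profile with a progress of 0.52.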
The state of some resources has changed. It acknowledges that the actions coming from a previous SET_RESOURCE_STATE message have been done.
- data: an interval set of resource ids and the new state
- example:
{
"timestamp": 10.0,
"type": "RESOURCE_STATE_CHANGED",
"data": {"resources": "1 2 3-5", "state": "42"}
}
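Interval sets such as "1 2 3-5" appear in several events (resources here, alloc in EXECUTE_JOB). The sketch below expands one into a list of resource ids. It assumes that items are separated by spaces and that ranges are inclusive, which is inferred from the examples of this document rather than from a formal grammar; parse_interval_set is a hypothetical name.

```python
def parse_interval_set(text):
    """Expand an interval set such as "1 2 3-5" into a sorted list of ids."""
    ids = set()
    for part in text.split():
        if "-" in part:
            # A range "a-b" is assumed to include both bounds.
            first, last = part.split("-")
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)
```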
This is a reply to a QUERY_REQUEST message. The message content depends on whether Redis is enabled in the Batsim configuration. If {"redis": {"enabled": true}}, the reply will go in Redis and only the key will be given. Otherwise, the response will be put directly in the message.
- data: See QUERY_REQUEST documentation
- example:
{
"timestamp": 10.0,
"type": "QUERY_REPLY",
"data": {"redis_keys": "/my/key/path0" }
}
or
{
"timestamp": 10.0,
"type": "QUERY_REPLY",
"data": {"consumed_energy": "12500" }
}
This message is a response to the CALL_ME_LATER message.
- data: empty
- example:
{
"timestamp": 25.5,
"type": "REQUESTED_CALL",
"data": {}
}
These events are sent by the scheduler to Batsim.
BATSIM <--- DECISION
This is a query sent to Batsim to get information about the simulation state (or whatever you want to know...). The supported requests are:
- "consumed_energy", with no argument, which asks Batsim about the total consumed energy (from time 0 to now) in Joules. Works only in energy mode.
- data: a dictionary of requests.
- example:
{
"timestamp": 10.0,
"type": "QUERY_REQUEST",
"data": {
"requests": {"consumed_energy": {}}
}
}
Rejects a job that has already been submitted. The rejected job will not appear in the final jobs trace.
- data: A job id
- example:
{
"timestamp": 10.0,
"type": "REJECT_JOB",
"data": { "job_id": "w12!45" }
}
Execute a job on a given set of resources. An optional mapping can be added to tell Batsim how to map executors to resources: where the executors will be placed inside the allocation (resource numbers are shifted to 0). It only works for SMPI for now.
The following example overrides the default round robin mapping to put the first two ranks (0 and 1) on the first allocated machine (0, which stands for resource id 2), and the last two ranks (2 and 3) on the second machine (1, which stands for resource id 3).
- data: A job id, an allocation and a mapping (optional)
- example:
{
"timestamp": 10.0,
"type": "EXECUTE_JOB",
"data": {
"job_id": "w12!45",
"alloc": "2-3",
"mapping": {"0": "0", "1": "0", "2": "1", "3": "1"}
}
}
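The shift described above (mapping values are positions inside the allocation, not resource ids) can be made explicit with a few lines of code. This is a sketch only: resolve_mapping is a hypothetical name, and the allocation is assumed to have already been expanded into an ordered list of resource ids.

```python
def resolve_mapping(allocated_ids, mapping):
    """Translate an EXECUTE_JOB mapping into {executor rank: resource id}."""
    placement = {}
    for rank, position in mapping.items():
        # Mapping values are positions inside the allocation, counted from 0.
        placement[int(rank)] = allocated_ids[int(position)]
    return placement
```

With the allocation [2, 3] and the mapping of the example, ranks 0 and 1 land on resource 2 and ranks 2 and 3 on resource 3.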
Asks Batsim to call the scheduler later on, at a given timestamp.
- data: the future timestamp (float)
- example:
{
"timestamp": 10.0,
"type": "CALL_ME_LATER",
"data": {"timestamp": 25.5}
}
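A scheduler may want to build this event through a helper that refuses timestamps that are not in the future. The sketch below does that; call_me_later is a hypothetical name, and the strict check against now is a choice of this example, not a rule stated by the protocol.

```python
def call_me_later(now, target):
    """Build a CALL_ME_LATER event asking Batsim to call back at `target`."""
    if target <= now:
        # Choice of this sketch: only accept timestamps strictly after `now`.
        raise ValueError(f"target {target} is not after now {now}")
    return {
        "timestamp": now,
        "type": "CALL_ME_LATER",
        "data": {"timestamp": target},
    }
```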
Kills some jobs (almost instantaneously).
- data: A list of job ids
- example:
{
"timestamp": 10.0,
"type": "KILL_JOB",
"data": {"job_ids": ["w0!1", "w0!2"]}
}
Submits a job (from the scheduler). Job submissions from the scheduler must be enabled in the configuration ({"job_submission": {"from_scheduler": {"enabled": true}}}). The submission is acknowledged by default, but acknowledgements can be disabled in the configuration ({"job_submission": {"from_scheduler": {"acknowledge": false}}}).
- data: A job id (job id duplication is forbidden), classical job and profile information (optional).
- example with redis: the job description, and the profile description if it is unknown to Batsim yet, must have been pushed into redis by the scheduler before sending this message
{
"timestamp": 10.0,
"type": "SUBMIT_JOB",
"data": {
"job_id": "w12!45"
}
}
- example without redis: the whole job description goes through the protocol.
{
"timestamp": 10.0,
"type": "SUBMIT_JOB",
"data": {
"job_id": "dyn!my_new_job",
"job":{
"profile": "delay_10s",
"res": 1,
"id": "my_new_job",
"walltime": 12.0
},
"profile":{
"type": "delay",
"delay": 10
}
}
}
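When Redis is disabled, the event above can be assembled from a job description and an optional profile. The following sketch shows one way to do it. submit_job_event is a hypothetical name, and building the job id as workload, "!" and the job's id is an assumption inferred from the examples of this document ("dyn!my_new_job" for a job whose id is "my_new_job").

```python
def submit_job_event(now, workload, job, profile=None):
    """Build a SUBMIT_JOB event carrying the whole job description."""
    # Assumed convention, inferred from the examples: "<workload>!<job id>".
    job_id = f"{workload}!{job['id']}"
    data = {"job_id": job_id, "job": job}
    if profile is not None:
        # The profile is optional: it may already be known to Batsim.
        data["profile"] = profile
    return {"timestamp": now, "type": "SUBMIT_JOB", "data": data}
```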
Submits a profile (from the scheduler). Job submissions from the scheduler must be enabled in the configuration ({"job_submission": {"from_scheduler": {"enabled": true}}}).
- data: A workload name, profile name, and the data of the profile.
- with redis: Instead of using this message, the profiles should be pushed to redis directly by the scheduler.
- example without redis: the whole profile description goes through the protocol.
{
"timestamp": 10.0,
"type": "SUBMIT_PROFILE",
"data": {
"workload_name": "dyn_wl1",
"profile_name": "delay_10s",
"profile": {
"type": "delay",
"delay": 10
}
}
}
Sets some resources into a state.
- data: an interval set of resource ids, and the new state
- example:
{
"timestamp": 10.0,
"type": "SET_RESOURCE_STATE",
"data": {"resources": "1 2 3-5", "state": "42"}
}
The scheduler notifies Batsim of something. For example, the submission_finished notification tells Batsim that job submissions from the scheduler are over, which allows Batsim to stop the simulation. This message must be sent if "scheduler_submission": {"enabled": false} is configured. See Configuration documentation for more details.
If the scheduler realizes that it committed the mistake of notifying submission_finished prematurely, the continue_submission notification can be sent to make the scheduler able to submit dynamic jobs again.
- data: the notification type (submission_finished or continue_submission)
- example:
{
"timestamp": 42.0,
"type": "NOTIFY",
"data": { "type": "submission_finished" }
}
Changes the state of a job, which may be helpful to implement schedulers with dynamic complex jobs.
{
"timestamp": 42.0,
"type": "CHANGE_JOB_STATE",
"data": {
"job_id": "w12!45",
"job_state": "COMPLETED_KILLED",
"kill_reason": "Sub-jobs were killed."
}
}
This section shows how to do some operations with the protocol.
Depending on the configuration, job information might be transmitted either through the protocol or through Redis.
Jobs are in most cases given as Batsim inputs, which are submitted within Batsim (the scheduler knows about them via JOB_SUBMITTED events).
However, jobs can also be submitted from the scheduler throughout the simulation. For this purpose:
- dynamic job submissions must be enabled in the Batsim configuration
- the scheduler must tell Batsim when it has finished submitting dynamic jobs (via a NOTIFY event). Otherwise, Batsim will wait for new simulation events forever, causing either a SimGrid deadlock or an infinite loop at the end of the simulation.
- the scheduler must make sure that Batsim has enough information to avoid SimGrid deadlocks during the simulation. If at some simulation time all Batsim workloads/workflows inputs have been executed and nothing is happening on the platform, this might lead to a SimGrid deadlock. If the scheduler knows that it will submit a dynamic job in the future, it should ask Batsim to call it at this timestamp via a CALL_ME_LATER event.
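The last two requirements can be combined in one routine that the scheduler runs each time it is called. The sketch below is a hypothetical illustration, not the Batsched implementation: plan_submissions and pending are assumed names, and pending is assumed to be a list of (submission time, SUBMIT_JOB data) pairs sorted by time.

```python
def plan_submissions(now, pending):
    """Return the events to send at `now` and the submissions still pending."""
    due = [item for item in pending if item[0] <= now]
    remaining = [item for item in pending if item[0] > now]
    events = [
        {"timestamp": now, "type": "SUBMIT_JOB", "data": data}
        for _, data in due
    ]
    if remaining:
        # Ask to be called back for the next submission, so that Batsim
        # always has a future event to wait for.
        events.append({
            "timestamp": now,
            "type": "CALL_ME_LATER",
            "data": {"timestamp": remaining[0][0]},
        })
    elif due:
        # The last dynamic job has just been submitted: tell Batsim that
        # submissions from the scheduler are over.
        events.append({
            "timestamp": now,
            "type": "NOTIFY",
            "data": {"type": "submission_finished"},
        })
    return events, remaining
```

The notification is only emitted in the call that submits the last pending job, so later calls with an empty list do not repeat it.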
The protocol behavior of dynamic submissions is customizable in the Batsim configuration:
- Batsim might or might not send acknowledgements when jobs have been submitted.
- Metainformation is sent via Redis if Redis is enabled, or directly via the protocol otherwise.
A simple scheduling algorithm using dynamic job submissions can be found in Batsched. This implementation should work whether or not Redis is enabled and whether or not dynamic job submissions are acknowledged.
The following two figures outline how submissions should be done (depending on whether Redis is enabled).