Batch scheduler simulator: Focus on realism, facilitate comparison

mesurajpandey/batsim

Batsim

Batsim is a Batch scheduler simulator.
A batch scheduler -- AKA Resources and Jobs Management System (RJMS) -- is a system that manages resources in large-scale computing centers, notably by scheduling and placing jobs, and by setting up energy policies.
Batsim is open source and distributed under the LGPL-3.0 license. See copyright for more details.

Batsim overview figure

Batsim simulates the behavior of a computing center.
It is designed so that any event-based scheduling algorithm can be plugged into it.
It thus makes it possible to compare decision algorithms from both the production and academic worlds.

Quick links

  • Please read our contribution guidelines if you want to contribute to Batsim.
  • The changelog summarizes information about the project evolution.
  • The tutorials show how to use Batsim and how it works:
    • The usage tutorial explains how to execute a Batsim simulation and how to set up a development Docker environment.
    • The time tutorial explains how time is managed in a Batsim simulation, shows essential protocol communications, and gives an overview of how Batsim works internally.
  • The protocol documentation defines the protocol used between Batsim and the scheduling algorithms.

Run batsim example

Important note: It is highly recommended to use Batsim with the provided container, as Batsim depends on up-to-date packages (like Boost) that may not be easily available in your distribution yet.

To simply test Batsim, you can run it directly through Docker. First, run Batsim in your container with a simple workload:

# launch a batsim container
docker run -ti --name batsim oarteam/batsim bash

# inside the container
cd /root/batsim
redis-server &
batsim -p platforms/small_platform.xml \
  -w workload_profiles/test_workload_profile.json

Then, in another terminal, execute the scheduler:

# run another bash in the same container
docker exec -ti batsim bash

# inside the container
cd /root/batsim
python schedulers/pybatsim/launcher.py fillerSched

External References

Build and code status

Status badges: master, upstream_sg (recent SimGrid), codacy

Batsim uses GitLab CI as its continuous integration system.
Build status of the different commits can be found there.
More information about our CI setup can be found there.

Development environment

If you need to change the code of Batsim, you can use the Docker environment oarteam/batsim_ci and a Docker volume to make your version of the Batsim code available inside the container.

# launch a batsim container
docker run -ti -v /home/myuser/mybatrepo:/root/batsim --name batsim_dev oarteam/batsim_ci bash

# inside the container
cd /root/batsim

# First step: generate a Makefile via CMake in a clean build directory
rm -rf build
mkdir build
cd build
cmake ..

# Second step: run make
make -j $(nproc)
make install
make test

With this setup, you can use your own development tools outside the container to hack on the Batsim code, and use the container only to build and test your code.

Visualisation

Batsim output files can be visualised using external tools:

  • Evalys can be used to visualise Gantt charts from the Batsim job.csv files and SWF files
  • Vite can be used to visualise the Pajé traces

Tools

Also, some tools can be found in the tools directory:

  • scripts to do conversions between SWF and Batsim formats
  • scripts to set up experiments with Batsim (more details here)

Write your own scheduler (or adapt an existing one)

Schedulers must follow a text-based protocol to communicate with Batsim.
More details about the protocol can be found in the protocol description.

You may also base your work on existing Batsim-compatible schedulers:

Installation

Important note: It is highly recommended to use the method described in the Development environment section.

Batsim uses Kameleon to build controlled environments. These environments allow us to generate Docker containers, which are used by our CI to test whether Batsim can be built correctly and whether some integration tests pass.

Thus, the most up-to-date information about how to build Batsim dependencies and Batsim itself can be found in our Kameleon recipes:

However, some information is also written below for the sake of simplicity, but please note it might be outdated.

Dependencies

Batsim dependencies are listed below:

  • SimGrid. The dev version is recommended (203ec9f99 for example). To use SMPI jobs, use commit 587483ebe of mpoquet's fork. To use energy, please consider using the Batsim upstream_sg branch and SimGrid commit e96681fb8.
  • RapidJSON (1.0.2 or greater)
  • Boost 1.62 or greater (system, filesystem, regex, locale)
  • C++11 compiler
  • Redox (and its dependencies: hiredis and libev)

Building Batsim

Batsim can be built via CMake. An example script is given below:

# First step: generate a Makefile via CMake
mkdir build
cd build
cmake ..  # optionally append -DCMAKE_INSTALL_PREFIX=/usr to choose the install prefix

# Second step: run make
make -j $(nproc)
sudo make install

Batsim Use Cases

Simulating with Batsim involves at least two processes:

  • Batsim itself
  • A decision process (or simply a scheduler)

This section shows Batsim command-line usage and some examples of how to run simple experiments with Batsim.
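
As a rough sketch of the two-process setup, the Python helpers below assemble both command lines and start them in the same order as the Run batsim example. The function names (batsim_command, scheduler_command, run_simulation) are hypothetical and not part of Batsim. The flags are taken from the batsim --help output, and the sketch assumes that -w is repeated once per workload file, that a Redis server is already running, that the default socket is used by both processes, and that the commands are launched from the Batsim repository root.

```python
import subprocess


def batsim_command(platform, workloads, export_prefix="out", energy=False):
    """Build a Batsim argv list from options listed in `batsim --help`."""
    cmd = ["batsim", "-p", platform]
    for workload in workloads:
        # Assumption: -w is given once per workload file.
        cmd += ["-w", workload]
    cmd += ["-e", export_prefix]
    if energy:
        cmd.append("-E")  # enables the SimGrid energy plugin
    return cmd


def scheduler_command(algorithm="fillerSched"):
    """Build the pybatsim launcher command used in the README example."""
    return ["python", "schedulers/pybatsim/launcher.py", algorithm]


def run_simulation(batsim_cmd, sched_cmd):
    """Start Batsim first, then the scheduler, and wait for both to exit."""
    batsim = subprocess.Popen(batsim_cmd)
    # Assumption: a real script may need to wait here until Batsim is
    # ready to accept the scheduler's connection.
    scheduler = subprocess.Popen(sched_cmd)
    return batsim.wait(), scheduler.wait()


if __name__ == "__main__":
    # Only print the commands; call run_simulation() inside the container.
    print(" ".join(batsim_command(
        "platforms/small_platform.xml",
        ["workload_profiles/test_workload_profile.json"])))
    print(" ".join(scheduler_command()))
```

Building the commands as lists rather than shell strings avoids quoting issues when file names contain spaces, and keeps the launch step separate so the commands can be inspected before anything is started.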

Batsim Usage

Batsim usage can be shown by calling the Batsim program with the --help option. It should display something like this:

batsim --help
A tool to simulate (via SimGrid) the behaviour of scheduling algorithms.

Usage:
  batsim -p <platform_file> [-w <workload_file>...]
                            [-W <workflow_file>...]
                            [--WS (<cut_workflow_file> <start_time>)...]
                            [options]
  batsim --help

Input options:
  -p --platform <platform_file>     The SimGrid platform to simulate.
  -w --workload <workload_file>     The workload JSON files to simulate.
  -W --workflow <workflow_file>     The workflow XML files to simulate.
  --WS --workflow-start (<cut_workflow_file> <start_time>)... The workflow XML
                                    files to simulate, with the time at which
                                    they should be started.

Most common options:
  -m, --master-host <name>          The name of the host in <platform_file>
                                    which will be used as the RJMS management
                                    host (thus NOT used to compute jobs)
                                    [default: master_host].
  -E --energy                       Enables the SimGrid energy plugin and
                                    outputs energy-related files.

Execution context options:
  -s, --socket <socket_file>        The Unix Domain Socket filename
                                    [default: /tmp/bat_socket].
  --redis-hostname <redis_host>     The Redis server hostname
                                    [default: 127.0.0.1]
  --redis-port <redis_port>         The Redis server port [default: 6379].

Output options:
  -e, --export <prefix>             The export filename prefix used to generate
                                    simulation output [default: out].
  --enable-sg-process-tracing       Enables SimGrid process tracing
  --disable-schedule-tracing        Disables the Pajé schedule outputting.
  --disable-machine-state-tracing   Disables the machine state outputting.


Platform size limit options:
  --mmax <nb>                       Limits the number of machines to <nb>.
                                    0 means no limit [default: 0].
  --mmax-workload                   If set, limits the number of machines to
                                    the 'nb_res' field of the input workloads.
                                    If several workloads are used, the maximum
                                    of these fields is kept.
Verbosity options:
  -v, --verbosity <verbosity_level> Sets the Batsim verbosity level. Available
                                    values: quiet, network-only, information,
                                    debug [default: information].
  -q, --quiet                       Shortcut for --verbosity quiet

Workflow options:
  --workflow-jobs-limit <job_limit> Limits the number of possible concurrent
                                    jobs for workflows. 0 means no limit
                                    [default: 0].
  --ignore-beyond-last-workflow     Ignores workload jobs that occur after all
                                    workflows have completed.

Other options:
  --allow-time-sharing              Allows time sharing: One resource may
                                    compute several jobs at the same time.
  --batexec                         If set, the jobs in the workloads are
                                    computed one by one, one after the other,
                                    without scheduler nor Redis.
  --pfs-host <pfs_host>             The name of the host, in <platform_file>,
                                    which will be the parallel filesystem target
                                    as data sink/source [default: pfs_host].
  -h --help                         Shows this help.
  --version                         Shows Batsim version.
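
The --mmax-workload rule described in the help text can be illustrated with a few lines of Python. This is only a sketch of the documented behaviour, not Batsim's own code: the helper name mmax_from_workloads is hypothetical, and it assumes that nb_res is a top-level integer field of each workload JSON file.

```python
import json


def mmax_from_workloads(workload_paths):
    """Return the machine limit implied by --mmax-workload.

    Reads the 'nb_res' field of each workload file and keeps the maximum,
    as described in the help text. Assumption: 'nb_res' is a top-level key.
    """
    if not workload_paths:
        raise ValueError("at least one workload file is required")
    nb_res_values = []
    for path in workload_paths:
        with open(path) as workload_file:
            workload = json.load(workload_file)
        nb_res_values.append(int(workload["nb_res"]))
    return max(nb_res_values)
```

For example, with two workloads whose nb_res fields are 4 and 32, the helper returns 32, which is the limit the help text says would be kept.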

Executing complete experiments

If you want to run more complex scenarios, taking a look at our experiment tools may save you some time!
