Commit

docs: fix #2698 (#2714)
### Description

"Just" a documentation update:

- deleted obsolete `--cluster` references in the tutorial section
- rephrased MPI section

Added references to the SLURM Executor Plugin and snakemake-profiles in both
changed paragraphs.

@johanneskoester I am not particularly good with words today, but I hope it's ok-ish.

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
cmeesters and johanneskoester committed Feb 24, 2024
1 parent 58a6a13 commit 508080b
Showing 2 changed files with 8 additions and 98 deletions.
9 changes: 6 additions & 3 deletions docs/snakefiles/rules.rst
@@ -2694,7 +2694,10 @@ Template rendering rules are always executed locally, without submission to clus…
MPI support
-----------

Highly parallel programs may use MPI (the `message passing interface <https://en.wikipedia.org/wiki/Message_Passing_Interface>`_) to enable a program to span work across an individual compute node's boundary.
To actually use an HPC cluster with Snakemake, an `executor plugin is provided for the SLURM batch system <https://github.com/snakemake/snakemake-executor-plugin-slurm>`_. You can find its documentation `here <https://github.com/snakemake/snakemake-executor-plugin-slurm/blob/main/docs/further.md>`_.
Users of other batch systems are encouraged to `provide further plugins <https://snakemake.github.io/snakemake-plugin-catalog/#contributing>`_ and/or to share their Snakemake configuration via the `Snakemake profiles <https://github.com/Snakemake-Profiles>`_ project.
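
For instance, with that plugin installed (e.g. via pip), a workflow can be handed over to SLURM as sketched below; the plugin name is taken from the link above, while the job limit is only illustrative:

.. code-block:: console

    $ pip install snakemake-executor-plugin-slurm
    $ snakemake --executor slurm --jobs 100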

The command to run the MPI program (in the example below we assume there is a program ``calc-pi-mpi``) has to be specified via the ``mpi`` resource, e.g.:

.. code-block:: python
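
    # A sketch of such a rule; the output/log file names and resource values are
    # only illustrative, while the "mpi" resource holds the MPI starter command:
    rule calc_pi:
        output:
            "pi.calc",
        log:
            "logs/calc_pi.log",
        resources:
            tasks=10,
            mpi="mpirun",
        shell:
            "{resources.mpi} -n {resources.tasks} calc-pi-mpi 10 > {output} 2> {log}"
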
@@ -2725,14 +2728,14 @@ Thereby, additional parameters may be passed to the MPI-starter, e.g.:
    shell:
        "{resources.mpi} -n {resources.tasks} calc-pi-mpi 10 > {output} 2> {log}"

As any other resource, the ``mpi`` resource can be overwritten via the command line, e.g. in order to adapt to a specific platform (see :ref:`snakefiles-resources`). For instance,
users of the SLURM executor plugin can use ``srun`` as the MPI-starter:

.. code-block:: console

    $ snakemake --set-resources calc_pi:mpi="srun --hint nomultithread" ...

Note that in the case of distributed, remote execution (cluster or cloud), MPI support might not be available.
So far, explicit MPI support is implemented in the `slurm plugin <https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html>`_.

.. _snakefiles_continuous_input:

97 changes: 2 additions & 95 deletions docs/tutorial/additional_features.rst
@@ -197,103 +197,10 @@ will automatically download the requested version of the wrapper.
Furthermore, in combination with ``--software-deployment-method conda`` (see :ref:`tutorial-conda`),
the required software will be automatically deployed before execution.
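
For instance, such a workflow could be executed with automatic conda deployment like this (a sketch; the core count is arbitrary):

.. code:: console

    $ snakemake --software-deployment-method conda --cores 4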

Cluster or cloud execution
::::::::::::::::::::::::::

By default, Snakemake executes jobs on the local machine it is invoked on.
Alternatively, it can execute jobs in **distributed environments, e.g., compute clusters or batch systems**.
If the nodes share a common file system, Snakemake supports three alternative execution modes.

In cluster environments, compute jobs are usually submitted as shell scripts via commands like ``qsub``.
Snakemake provides a **generic mode** to execute on such clusters.
By invoking Snakemake with

.. code:: console

    $ snakemake --cluster qsub --jobs 100

each job will be compiled into a shell script that is submitted with the given command (here ``qsub``).
The ``--jobs`` flag limits the number of concurrently submitted jobs to 100.
This basic mode assumes that the submission command returns immediately after submitting the job.
Some clusters allow running the submission command in **synchronous mode**, such that it waits until the job has been executed.
In such cases, we can invoke e.g.

.. code:: console

    $ snakemake --cluster-sync "qsub -sync yes" --jobs 100

The specified submission command can also be **decorated with additional parameters taken from the submitted job**.
For example, the number of threads used by a job can be accessed in braces, similarly to the formatting of shell commands, e.g.

.. code:: console

    $ snakemake --cluster "qsub -pe threaded {threads}" --jobs 100

Alternatively, Snakemake can use the Distributed Resource Management Application API (DRMAA_).
This API provides a common interface to control various resource management systems.
The **DRMAA support** can be activated by invoking Snakemake as follows:

.. code:: console

    $ snakemake --drmaa --jobs 100

If available, **DRMAA is preferable to the generic cluster modes** because it provides better control and error handling.
To support additional cluster-specific parametrization, a Snakefile can be complemented by a workflow-specific profile (see :ref:`profiles`).
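
For example, such a profile can then be activated on the command line (a sketch; ``myprofile`` is a placeholder for a directory containing a ``config.yaml``):

.. code:: console

    $ snakemake --profile myprofile --jobs 100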

Using --cluster-status
::::::::::::::::::::::

Sometimes you need a specific way to determine whether a cluster job completed successfully, failed, or is still running.
Error detection with ``--cluster`` can be improved for edge cases such as timeouts and jobs exceeding memory that are silently terminated by the queueing system.
This can be achieved with the ``--cluster-status`` option. The value of this option should be an executable script that takes a job id as the first argument and prints exactly one of ``running``, ``success`` or ``failed`` to stdout.
Importantly, the job id Snakemake passes on is captured from the stdout of the cluster submission tool. This string will often include more than the job id, but Snakemake does not modify it and passes it to the status script unchanged.
If Snakemake has received more than the job id, there are three potential solutions to consider: parse the string within the status script and extract the job id, wrap the submission tool to intercept its stdout and return just the job id, or, ideally, use an option of the cluster to return only the job id upon submission and instruct Snakemake to use that option. For SGE this would look like ``snakemake --cluster "qsub -terse"``.

The following (simplified) script detects the job status on a given SLURM cluster (SLURM >= 14.03.0rc1 is required for ``--parsable``).

.. code:: python

    #!/usr/bin/env python
    import subprocess
    import sys

    jobid = sys.argv[1]

    output = str(
        subprocess.check_output(
            "sacct -j %s --format State --noheader | head -1 | awk '{print $1}'" % jobid,
            shell=True,
        ).strip()
    )

    running_status = ["PENDING", "CONFIGURING", "COMPLETING", "RUNNING", "SUSPENDED"]
    if "COMPLETED" in output:
        print("success")
    elif any(r in output for r in running_status):
        print("running")
    else:
        print("failed")

To use this script, call Snakemake similarly to the example below, where ``status.py`` is the script above.

.. code:: console

    $ snakemake all --jobs 100 --cluster "sbatch --cpus-per-task=1 --parsable" --cluster-status ./status.py

Using --cluster-cancel
::::::::::::::::::::::

When Snakemake is terminated by pressing ``Ctrl-C``, it will cancel all currently running jobs when using ``--drmaa``.
You can get the same behaviour with ``--cluster`` by adding ``--cluster-cancel`` and passing a command to use for canceling jobs by their jobid (e.g., ``scancel`` for SLURM or ``qdel`` for SGE).
Most job schedulers can be passed multiple jobids and you can use ``--cluster-cancel-nargs`` to limit the number of arguments (default is 1000 which is reasonable for most schedulers).
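
For example, for a SLURM setup this could look as follows (a sketch; the submission options are illustrative):

.. code:: console

    $ snakemake all --jobs 100 --cluster "sbatch --parsable" --cluster-cancel scancel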

Using --cluster-sidecar
:::::::::::::::::::::::

In certain situations, it is necessary not to call cluster commands directly, but instead to have a "sidecar" process, e.g., one providing a REST API.
One example is SLURM, where regular calls to ``scontrol show job JOBID`` or ``sacct -j JOBID`` put a high load on the controller.
Rather, it is better to use the ``squeue`` command with the ``-i/--iterate`` option.

When using ``--cluster``, you can use ``--cluster-sidecar`` to pass in a command that starts a sidecar server.
The command should print one line to stdout and then block and accept connections.
The line will subsequently be available in the calls to ``--cluster``, ``--cluster-status``, and ``--cluster-cancel`` in the environment variable ``SNAKEMAKE_CLUSTER_SIDECAR_VARS``.
In the case of a REST server, you can use this to return the port that the server is listening on and credentials.
When the Snakemake process terminates, the sidecar process will be terminated as well.
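
As an illustration, a minimal sidecar could look like the following sketch (not the real SLURM sidecar; it merely serves the current directory and prints placeholder variables):

.. code:: python

    #!/usr/bin/env python
    import http.server
    import secrets

    # Bind to an ephemeral port and create a throwaway access token.
    token = secrets.token_hex(8)
    server = http.server.HTTPServer(("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)

    # This single line ends up in SNAKEMAKE_CLUSTER_SIDECAR_VARS for the
    # --cluster, --cluster-status and --cluster-cancel invocations.
    print(f"port={server.server_port} token={token}", flush=True)

    # Block and accept connections until Snakemake terminates the sidecar.
    server.serve_forever()
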
Executing jobs on a cluster or in the cloud is supported by so-called executor plugins, which are distributed and documented via the `Snakemake plugin catalog <https://snakemake.github.io/snakemake-plugin-catalog/>`_.
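
For example, installing a plugin from the catalog and selecting it via ``--executor`` could look like this (a sketch; the Kubernetes plugin is just one option, and additional configuration such as storage settings is omitted):

.. code:: console

    $ pip install snakemake-executor-plugin-kubernetes
    $ snakemake --executor kubernetes --jobs 50 ...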

Constraining wildcards
::::::::::::::::::::::
