fix: Fix technical bugs in resource-scope documentation #1784

Merged 2 commits on Jul 28, 2022
2 changes: 1 addition & 1 deletion docs/executing/grouping.rst
@@ -11,7 +11,7 @@ When executing locally, group definitions are ignored.
Groups can be defined along with the workflow definition via the ``group`` keyword, see :ref:`snakefiles-grouping`.
This way, queueing and execution time can be saved, in particular by attaching short-running downstream jobs to long-running upstream jobs.

Snakemake will request resources for groups by summing across jobs that can be run in parallel, and taking the max of jobs run in series.
From Snakemake 7.11 on, Snakemake will request resources for groups by summing across jobs that can be run in parallel, and taking the max of jobs run in series.
The only exception is ``runtime``, where the max will be taken over parallel jobs, and the sum over series.
If resource constraints are provided (via ``--resources`` or ``--cores``), parallel job layers that exceed the constraints will be stacked in series.
For example, if 6 instances of ``somerule`` are being run, each instance requires ``1000MB`` of memory and ``30 min`` runtime, and only ``3000MB`` are available, Snakemake will request ``3000MB`` and ``60 min`` runtime, enough to run 3 instances of ``somerule``, then another 3 instances of ``somerule`` in series.
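As an illustrative sketch (the rule name, file paths, group name, and shell command here are hypothetical, not taken from the documentation), the scenario above could arise from a Snakefile like the following:

.. code-block:: python

    rule somerule:
        input:
            "data/{sample}.txt"
        output:
            "results/{sample}.txt"
        group:
            "mygroup"
        resources:
            mem_mb=1000,   # each instance needs 1000 MB
            runtime=30     # and 30 minutes of walltime
        shell:
            "process {input} > {output}"

With 6 samples, ``--group-components mygroup=6`` (so that all 6 instances are merged into a single group job), and ``--resources mem_mb=3000``, the group job would be requested with ``3000MB`` and ``60 min``, as described above.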
11 changes: 7 additions & 4 deletions docs/snakefiles/rules.rst
@@ -401,10 +401,10 @@ All of these resources have specific meanings understood by snakemake and are tr
* The ``tmpdir`` resource automatically leads to setting the TMPDIR variable for shell commands, scripts, wrappers and notebooks.

* The ``runtime`` resource indicates how much time a job needs to run, and has a special meaning for cluster and cloud compute jobs.
See :ref:`the section below <resources_remote_execution>` for more information
See :ref:`the section below<resources_remote_execution>` for more information

* ``disk_mb`` and ``mem_mb`` are both locally scoped by default, a fact important for cluster and compute execution.
:ref:`See below <resources_remote_execution>` for more info.
:ref:`See below<resources_remote_execution>` for more info.
``mem_mb`` also has special meaning for some execution modes (e.g., when using :ref:`Kubernetes <kubernetes>`).

Because of these special meanings, the above names should always be used instead of possible synonyms (e.g. ``tmp``, ``mem``, ``time``, ``temp``, etc).
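As a minimal sketch of how these standard resources are declared (the rule name, file paths, values, and command are made up for illustration), a rule could look like:

.. code-block:: python

    rule sort_lines:
        input:
            "data/{sample}.txt"
        output:
            "sorted/{sample}.txt"
        resources:
            mem_mb=4000,     # memory in MB; locally scoped by default
            disk_mb=8000,    # disk in MB; locally scoped by default
            runtime=60,      # walltime in minutes; used by cluster/cloud executors
            tmpdir="/tmp"    # exported as TMPDIR for the shell command
        shell:
            "sort -S {resources.mem_mb}M -T {resources.tmpdir} {input} > {output}"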
@@ -435,6 +435,7 @@ The CLI parameter takes priority.
Modification in the Snakefile uses the following syntax:

.. code-block:: python

resource_scopes:
gpus="local",
foo="local",
@@ -443,12 +444,14 @@
Here, we set both ``gpus`` and ``foo`` as local resources, and we changed ``disk_mb`` from its default to be a ``global`` resource.
These options could be overridden at the command line using:

.. code-block:: bash
snakemake --set-resource-scopes gpus=global disk_mb=local
.. code-block:: console

$ snakemake --set-resource-scopes gpus=global disk_mb=local
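For context, a rule that consumes the ``gpus`` resource being scoped here declares it like any other resource (a hypothetical sketch; the rule, file names, and script are made up):

.. code-block:: python

    rule train_model:
        input:
            "data/train.csv"
        output:
            "results/model.pt"
        resources:
            gpus=1   # consumed from the "gpus" resource pool
        shell:
            "python train.py {input} {output}"

Whether those ``gpus=1`` requests are then constrained globally across the whole workflow or locally per job depends on the scope chosen above.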

Resources and Group Jobs
~~~~~~~~~~~~~~~~~~~~~~~~

New in Snakemake 7.11.
When submitting :ref:`group jobs <job_grouping>` to the cluster, Snakemake calculates how many resources to request by first determining which component jobs can be run in parallel and which must be run in series.
For most resources, such as ``mem_mb`` or ``threads``, a sum is taken across each parallel layer.
The layer requiring the most of a given resource (i.e. the ``max()`` across layers) determines the final amount requested.
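For instance (a hypothetical sketch; the rule names, files, and commands are made up), consider a group containing a long-running job and a short downstream job:

.. code-block:: python

    rule align:
        input:
            "reads/{sample}.fq"
        output:
            "aligned/{sample}.bam"
        group:
            "persample"
        resources:
            mem_mb=8000,
            runtime=180
        shell:
            "bwa mem ref.fa {input} | samtools sort -o {output} -"

    rule index:
        input:
            "aligned/{sample}.bam"
        output:
            "aligned/{sample}.bam.bai"
        group:
            "persample"
        resources:
            mem_mb=1000,
            runtime=10
        shell:
            "samtools index {input}"

Because ``index`` can only start after ``align`` finishes, the two jobs form two layers run in series: under the behavior described above, the group job would request ``mem_mb=8000`` (the max across layers) and ``runtime=190`` (the sum across the series).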