46 changes: 46 additions & 0 deletions docs/advanced.rst
@@ -397,3 +397,49 @@ Another way, which is quite useful if you want to generate lots of different tes

Combining parameterized tests and test class hierarchies can offer you a very flexible way for generating multiple related tests at once keeping at the same time the maintenance cost low.
We use this technique extensively in our tests.


Flexible Regression Tests
-------------------------

.. versionadded:: 2.15

ReFrame can automatically set the number of tasks of a particular test, if its :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` attribute is set to ``0``.
In ReFrame's terminology, such tests are called `flexible`.
By default, ReFrame will spawn such a test on all the idle nodes of the current system partition, but this behavior can be adjusted from the command-line.
Flexible tests are very useful for diagnostic tests, e.g., tests for checking the health of a whole set of nodes.
In this example, we demonstrate this feature through a simple test that runs ``hostname``.
The test will verify that all the nodes print the expected host name:

.. literalinclude:: ../tutorial/advanced/advanced_example9.py

The first thing to notice in this test is that :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` is set to ``0``.
This is a requirement for flexible tests:

.. literalinclude:: ../tutorial/advanced/advanced_example9.py
   :lines: 13
   :dedent: 8

The sanity function of this test simply counts the host names and verifies that they are as many as expected:

.. literalinclude:: ../tutorial/advanced/advanced_example9.py
   :lines: 15-18
   :dedent: 8
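
The counting logic of the sanity function can be sketched in plain Python, outside ReFrame.
This is only an illustration: the helper names ``count_hostnames`` and ``hostnames_match`` are hypothetical, and the check here is evaluated eagerly, whereas the ReFrame sanity functions used in the test build a deferred expression.

```python
import re


def count_hostnames(stdout_text, pattern=r'nid\d+'):
    """Count every host name in the output that matches `pattern`."""
    return len(re.findall(pattern, stdout_text))


def hostnames_match(stdout_text, expected_tasks):
    """Return True if the output holds exactly `expected_tasks` host names."""
    return count_hostnames(stdout_text) == expected_tasks


# Made-up output of `hostname` from a three-task run (one task per node)
sample_stdout = "nid00001\nnid00002\nnid00003\n"
print(hostnames_match(sample_stdout, 3))
```

With one task per node, the number of matched host names equals the number of tasks, which is why the test sets ``num_tasks_per_node`` to ``1``.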

Notice, however, that the sanity check does not verify against :attr:`num_tasks`, but against a custom attribute, ``num_tasks_assigned``.
This happens for two reasons:

a. At the time the sanity check expression is created, :attr:`num_tasks` is ``0``.
   So the actual number of tasks assigned must be a deferred expression as well.
b. When ReFrame determines the number of tasks of the test, it does not set the :attr:`num_tasks` attribute of the :class:`RegressionTest`.
   It only sets the corresponding attribute of the associated job instance.

Here is how the new deferred attribute is defined:

.. literalinclude:: ../tutorial/advanced/advanced_example9.py
   :lines: 22-25
   :dedent: 4
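
The two reasons above can be illustrated with a minimal sketch that does not use ReFrame at all.
``FakeJob`` and ``FakeTest`` are hypothetical stand-ins, and a plain ``lambda`` plays the role of the deferred expression; this is not how ReFrame implements deferral, only a way to see why an eagerly captured value would be wrong.

```python
class FakeJob:
    """Hypothetical stand-in for the scheduler job; not ReFrame's class."""

    def __init__(self):
        self.num_tasks = None   # filled in later, when tasks are assigned


class FakeTest:
    """Hypothetical stand-in for a flexible regression test."""

    def __init__(self):
        self.num_tasks = 0      # marks the test as flexible
        self.job = FakeJob()
        # Captured eagerly: frozen at the value seen during construction
        self.eager_expected = self.num_tasks
        # Deferred: the job attribute is read only when the check runs
        self.deferred_expected = lambda: self.job.num_tasks


test = FakeTest()
test.job.num_tasks = 4          # the framework assigns tasks to the job only

print(test.eager_expected)      # still the construction-time value
print(test.deferred_expected()) # the value assigned to the job
```

The test's own ``num_tasks`` stays at ``0`` throughout, so only a value read from the job at evaluation time reflects the real allocation.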


The behavior of the flexible task allocation is controlled by the ``--flex-alloc-tasks`` command line option.
See the corresponding `section <running.html#controlling-the-flexible-task-allocation>`__ for more information.
34 changes: 33 additions & 1 deletion docs/running.rst
@@ -460,11 +460,12 @@ They are summarized below:
In this example, Slurm's policy is that later definitions of options override previous ones.
So, in this case, you would override the standard output for all the submitted jobs!

* ``--flex-alloc-tasks {all|idle|NUM}``: Automatically determine the number of tasks allocated for each test.
* ``--force-local``: Force the local execution of the selected tests.
No jobs will be submitted.
* ``--skip-sanity-check``: Skip sanity checking phase.
* ``--skip-performance-check``: Skip performance verification phase.
* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check <reframe.core.pipeline.RegressionTest.strick_check>` attribute to :class:`False` (see `"Reference Guide" <reference.html>`__) in order to just let their performance recorded but not yield an error.
* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check <reframe.core.pipeline.RegressionTest.strict_check>` attribute to :class:`False` (see `"Reference Guide" <running.html#controlling-the-execution-of-regression-tests>`__) so that their performance is recorded without yielding an error.
This option overrides this behavior and forces all tests to be strict.
* ``--skip-system-check``: Skip the system check and run the selected tests even if they do not support the current system.
This option is sometimes useful when you need to quickly verify if a regression test supports a new system.
@@ -998,3 +999,34 @@ If you now try to run a test that loads the module `cudatoolkit`, the following
* Failing phase: setup
* Reason: caught framework exception: module cyclic dependency: cudatoolkit->foo->bar->foobar->cudatoolkit
------------------------------------------------------------------------------

Controlling the Flexible Task Allocation
----------------------------------------

.. versionadded:: 2.15

ReFrame can automatically set the number of tasks of a particular test, if its :attr:`num_tasks <reframe.core.pipeline.RegressionTest.num_tasks>` attribute is set to ``0``.
By default, ReFrame will spawn such a test on all the idle nodes of the current system partition.
This behavior can be adjusted using the ``--flex-alloc-tasks`` command line option.
This option accepts three values:

1. ``idle``: (default) In this case, ReFrame will set the number of tasks to the number of idle nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node <reframe.core.pipeline.RegressionTest.num_tasks_per_node>` attribute of the particular test.
2. ``all``: In this case, ReFrame will set the number of tasks to the number of all the nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node <reframe.core.pipeline.RegressionTest.num_tasks_per_node>` attribute of the particular test.

3. Any positive integer: In this case, ReFrame will set the number of tasks to the given value.

The flexible allocation of number of tasks takes into account any additional logical constraint imposed by the command line options affecting the job allocation, such as ``--partition``, ``--reservation``, ``--nodelist``, ``--exclude-nodes`` and ``--job-option`` (if the scheduler option passed to the latter imposes a restriction).
Notice that ReFrame will issue an error if the resulting number of nodes is zero.
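
The rules above can be summarized in a short sketch.
The function ``resolve_flex_tasks`` and its parameters are hypothetical and are not part of ReFrame's API; the node counts are assumed to have been computed already, after applying any constraints from the other command line options.

```python
def resolve_flex_tasks(option, num_idle_nodes, num_all_nodes,
                       num_tasks_per_node=1):
    """Return the number of tasks for a flexible test.

    Hypothetical helper for illustration only; `option` is the value
    passed to --flex-alloc-tasks ('idle', 'all' or a positive integer).
    """
    if option == 'idle':
        num_nodes = num_idle_nodes
    elif option == 'all':
        num_nodes = num_all_nodes
    else:
        # Any other value must be a positive integer number of tasks
        num_tasks = int(option)
        if num_tasks <= 0:
            raise ValueError('number of tasks must be a positive integer')

        return num_tasks

    if num_nodes == 0:
        # Mirrors the documented error when no nodes satisfy the criteria
        raise ValueError('no nodes available for flexible allocation')

    return num_nodes * num_tasks_per_node


print(resolve_flex_tasks('idle', 3, 10, num_tasks_per_node=2))
print(resolve_flex_tasks('all', 3, 10, num_tasks_per_node=2))
print(resolve_flex_tasks('8', 3, 10))
```

In this sketch, ``num_tasks_per_node`` only matters for ``idle`` and ``all``; an explicit number is used as the task count directly, as described in the third item of the list.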

For example, using the following options would run a flexible test on all the nodes of reservation ``foo`` except the nodes ``n0[1-5]``:

.. code-block:: bash

   --flex-alloc-tasks=all --reservation=foo --exclude-nodes=n0[1-5]


.. note::
   Flexible task allocation is supported only for the Slurm scheduler backend.

.. warning::
   Test cases resulting from flexible ReFrame tests may not be run using the asynchronous execution policy, because the nodes satisfying the required criteria will be allocated for the first test case, causing all subsequent ones to fail.
20 changes: 9 additions & 11 deletions reframe/core/pipeline.py
@@ -255,22 +255,20 @@ class RegressionTest:

#: Number of tasks required by this test.
#:
#: If the number of tasks is set to ``0``, ReFrame will try to use all
#: the available nodes of a reservation. A reservation *must* be specified
#: through the `--reservation` command-line option, otherwise the
#: regression test will fail during submission. ReFrame will try to run the
#: test on all the nodes of the reservation that satisfy the selection
#: criteria of the current
#: `virtual partition <configure.html#partition-configuration>`__
#: (i.e., constraints and/or partitions).
#: If the number of tasks is set to ``0``, ReFrame will try to flexibly
#: allocate the number of tasks, based on the command line option
#: ``--flex-alloc-tasks``.
#:
#: :type: integral
#: :default: ``1``
#:
#: .. note::
#: .. versionchanged:: 2.9
#: Added support for running the test using all the nodes of the
#: specified reservation if the number of tasks is set to ``0``.
#: .. versionchanged:: 2.15
#: Added support for flexible allocation of the number of tasks
#: according to the ``--flex-alloc-tasks`` command line option
#: (see `Controlling the Flexible Task Allocation
#: <running.html#controlling-the-flexible-task-allocation>`__)
#: if the number of tasks is set to ``0``.
num_tasks = fields.TypedField('num_tasks', int)

#: Number of tasks per node required by this test.
8 changes: 8 additions & 0 deletions reframe/core/schedulers/__init__.py
@@ -141,6 +141,14 @@ def workdir(self):

    @property
    def num_tasks(self):
        """The number of tasks assigned to this job.

        This attribute is useful in a flexible regression test for determining
        the actual number of tasks that ReFrame assigned to the test.

        For more information on flexible task allocation, please refer to the
        `tutorial <advanced.html#flexible-regression-tests>`__.
        """
        return self._num_tasks

@property
25 changes: 25 additions & 0 deletions tutorial/advanced/advanced_example9.py
@@ -0,0 +1,25 @@
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class HostnameCheck(rfm.RunOnlyRegressionTest):
    def __init__(self):
        super().__init__()
        self.valid_systems = ['daint:gpu', 'daint:mc']
        self.valid_prog_environs = ['PrgEnv-cray']
        self.executable = 'hostname'
        self.sourcesdir = None
        self.num_tasks = 0
        self.num_tasks_per_node = 1
        self.sanity_patterns = sn.assert_eq(
            self.num_tasks_assigned,
            sn.count(sn.findall(r'nid\d+', self.stdout))
        )
        self.maintainers = ['you-can-type-your-email-here']
        self.tags = {'tutorial'}

    @property
    @sn.sanity_function
    def num_tasks_assigned(self):
        return self.job.num_tasks