From 7693c2bf2b69ea343ff4cd2187293c2812a8021f Mon Sep 17 00:00:00 2001 From: Theofilos Manitaras Date: Wed, 7 Nov 2018 12:04:51 +0100 Subject: [PATCH 1/3] Document the '--flex-alloc-tasks' feature * Add a tutorial example. --- docs/advanced.rst | 26 ++++++++++++++++++++++++++ docs/running.rst | 20 +++++++++++++++++++- reframe/core/pipeline.py | 20 +++++++++----------- tutorial/advanced/advanced_example9.py | 25 +++++++++++++++++++++++++ 4 files changed, 79 insertions(+), 12 deletions(-) create mode 100644 tutorial/advanced/advanced_example9.py diff --git a/docs/advanced.rst b/docs/advanced.rst index f49b95ead8..7703e2c866 100644 --- a/docs/advanced.rst +++ b/docs/advanced.rst @@ -397,3 +397,29 @@ Another way, which is quite useful if you want to generate lots of different tes Combining parameterized tests and test class hierarchies can offer you a very flexible way for generating multiple related tests at once keeping at the same time the maintenance cost low. We use this technique extensively in our tests. + +Flexible allocation of number of tasks +-------------------------------------- + +.. versionadded:: 2.15 + +ReFrame can flexibly allocate the number of tasks of a test based on a number of criteria. +The following regression test is used to demonstrate this feature: + +.. literalinclude:: ../tutorial/advanced/advanced_example9.py + +In order to instruct ReFrame to perform a flexible allocation of the number of tasks :attr:`num_tasks ` has to be set to ``0``: + +.. literalinclude:: ../tutorial/advanced/advanced_example9.py + :lines: 13 + :dedent: 8 + +In the above regression test the ``hostname`` command is executed once on each allocated node, since ``self.num_tasks_per_node`` equals to 1. +Thus, the output will consist of separate lines one for each node containing its name. +ReFrame is going to calculate the number of tasks based on the ``--flex-alloc-tasks`` command line option (see `Flexible task allocation `__). 
+To check that everything worked as expected, we test that the number of nodenames included in the output equals the number of allocated tasks. +Since the number of nodes is decided by ReFrame based on the given criteria as well as the status of the system on which it is running, a new sanity function has to be defined which returns the correct number of tasks assigned as follows: + +.. literalinclude:: ../tutorial/advanced/advanced_example9.py + :lines: 22-25 + :dedent: 4 diff --git a/docs/running.rst b/docs/running.rst index 3cf610192c..b28ba0a0ed 100644 --- a/docs/running.rst +++ b/docs/running.rst @@ -460,11 +460,12 @@ They are summarized below: In this example, Slurm's policy is that later definitions of options override previous ones. So, in this case, way you would override the standard output for all the submitted jobs! +* ``--flex-alloc-tasks {all|idle|NUM}``: Automatically determine the number of tasks of flexible tests, i.e., tests that set ``num_tasks`` to ``0``. * ``--force-local``: Force the local execution of the selected tests. No jobs will be submitted. * ``--skip-sanity-check``: Skip sanity checking phase. * ``--skip-performance-check``: Skip performance verification phase. -* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check ` attribute to :class:`False` (see `"Reference Guide" `__) in order to just let their performance recorded but not yield an error. +* ``--strict``: Force strict performance checking. Some tests may set their :attr:`strict_check ` attribute to :class:`False` (see `"Reference Guide" `__) in order to just have their performance recorded without yielding an error. This option overrides this behavior and forces all tests to be strict. * ``--skip-system-check``: Skips the system check and run the selected tests even if they do not support the current system. This option is sometimes useful when you need to quickly verify if a regression test supports a new system.
@@ -998,3 +999,20 @@ If you now try to run a test that loads the module `cudatoolkit`, the following * Failing phase: setup * Reason: caught framework exception: module cyclic dependency: cudatoolkit->foo->bar->foobar->cudatoolkit ------------------------------------------------------------------------------ + +Flexible task allocation +------------------------ + +.. versionadded:: 2.15 + +ReFrame can automatically determine the number of tasks used for a particular test. +In order to instruct ReFrame that the number of tasks is going to be determined at runtime, :attr:`num_tasks ` has to be set to ``0``. +This feature is used in conjuction with the ``--flex-alloc-tasks`` command line option. +Therefore, if ``--flex-alloc-tasks`` is set to ``idle``, ReFrame is going to determine the number of tasks based on the idle nodes matching the ``access`` options of the logical partition (see `"Partition Configuration" `__) as defined in the ``site_configuration`` dictionary. +For ``--flex-alloc-tasks`` set to ``all``, ReFrame will determine the number of tasks based on the idle nodes respecting the command line options for ``partition``, ``reservation``, ``nodelist``, ``exclude-nodes`` and their ``job-option`` equivalents (see `"Controlling the Execution of Regression Tests" `__). +For both ``idle`` and ``all`` the final number of tasks will be the product of the nodes satisfying the required criteria and :attr:`num_tasks_per_node `. +Finally, by specifying a positive integer ReFrame is going to use that number as the number of tasks without doing any additional check on the nodes. + +.. note:: + Flexible allocation of tasks is currently supported only for the Slurm job scheduler. + This feature should not be combined with the async execution policy since the nodes satisfying the required criteria will be allocated by the first regression test, while the subsequent tests are going to fail. 
diff --git a/reframe/core/pipeline.py b/reframe/core/pipeline.py index 9cb8440a62..34f12cda8a 100644 --- a/reframe/core/pipeline.py +++ b/reframe/core/pipeline.py @@ -255,22 +255,20 @@ class RegressionTest: #: Number of tasks required by this test. #: - #: If the number of tasks is set to ``0``, ReFrame will try to use all - #: the available nodes of a reservation. A reservation *must* be specified - #: through the `--reservation` command-line option, otherwise the - #: regression test will fail during submission. ReFrame will try to run the - #: test on all the nodes of the reservation that satisfy the selection - #: criteria of the current - #: `virtual partition `__ - #: (i.e., constraints and/or partitions). + #: If the number of tasks is set to ``0``, ReFrame will try to flexibly + #: allocate the number of tasks, based on the command line option + #: ``--flex-alloc-tasks``. #: #: :type: integral #: :default: ``1`` #: #: .. note:: - #: .. versionchanged:: 2.9 - #: Added support for running the test using all the nodes of the - #: specified reservation if the number of tasks is set to ``0``. + #: .. versionchanged:: 2.15 + #: Added support for flexible allocation of the number of tasks + #: according to the ``--flex-alloc-tasks`` command line option + #: (see `Flexible task allocation + #: `__) + #: if the number of tasks is set to ``0``. num_tasks = fields.TypedField('num_tasks', int) #: Number of tasks per node required by this test. 
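The tutorial test added next in this patch compares ``num_tasks_assigned`` (which reads ``self.job.num_tasks``) against the number of ``nid\d+`` matches in the output, so both sides must be evaluated late. The following plain-Python sketch shows why a value captured eagerly while ``num_tasks`` is still ``0`` goes stale. ``Deferred``, ``Job`` and ``HostnameCheckSketch`` are hypothetical stand-ins for ReFrame's deferred expressions (built in the real test with ``@sn.sanity_function``) and its job object.

```python
import re


class Deferred:
    """Minimal stand-in for a deferred expression: wraps a zero-arg callable."""

    def __init__(self, fn):
        self._fn = fn

    def evaluate(self):
        return self._fn()


class Job:
    # Hypothetical job object; the flexible allocation sets num_tasks later.
    def __init__(self):
        self.num_tasks = 0


class HostnameCheckSketch:
    def __init__(self):
        self.num_tasks = 0           # flexible test: size decided at runtime
        self.num_tasks_per_node = 1
        self.job = Job()
        self.stdout = ''
        # Eager: captures the job's task count now, i.e. 0.
        self.eager_expected = self.job.num_tasks
        # Deferred: reads job.num_tasks and stdout only when evaluated.
        self.sanity = Deferred(
            lambda: self.job.num_tasks ==
            len(re.findall(r'nid\d+', self.stdout)))


test = HostnameCheckSketch()
test.job.num_tasks = 3                        # decided by the allocation
test.stdout = 'nid00001\nnid00002\nnid00003\n'
print(test.eager_expected)                    # 0
print(test.sanity.evaluate())                 # True
```

The eager value stays ``0`` even after the job is sized, while the deferred check sees the updated job attribute, which is the same reason the tutorial reads ``self.job.num_tasks`` instead of the test's own ``num_tasks``.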
diff --git a/tutorial/advanced/advanced_example9.py b/tutorial/advanced/advanced_example9.py
new file mode 100644
index 0000000000..b620c76fee
--- /dev/null
+++ b/tutorial/advanced/advanced_example9.py
@@ -0,0 +1,25 @@
+import reframe as rfm
+import reframe.utility.sanity as sn
+
+
+@rfm.simple_test
+class HostnameCheck(rfm.RunOnlyRegressionTest):
+    def __init__(self):
+        super().__init__()
+        self.valid_systems = ['daint:gpu', 'daint:mc']
+        self.valid_prog_environs = ['PrgEnv-cray']
+        self.executable = 'hostname'
+        self.sourcesdir = None
+        self.num_tasks = 0
+        self.num_tasks_per_node = 1
+        self.sanity_patterns = sn.assert_eq(
+            self.num_tasks_assigned,
+            sn.count(sn.findall(r'nid\d+', self.stdout))
+        )
+        self.maintainers = ['you-can-type-your-email-here']
+        self.tags = {'tutorial'}
+
+    @property
+    @sn.sanity_function
+    def num_tasks_assigned(self):
+        return self.job.num_tasks

From 63eade0105d0ee0ce461fb8aedc900553480bfa0 Mon Sep 17 00:00:00 2001
From: Vasileios Karakasis
Date: Fri, 16 Nov 2018 21:21:41 -0600
Subject: [PATCH 2/3] Fine tune the docs for flexible tests

---
 docs/advanced.rst                   | 40 +++++++++++++++++++++--------
 docs/running.rst                    | 35 +++++++++++++++++--------
 reframe/core/schedulers/__init__.py |  8 ++++++
 3 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/docs/advanced.rst b/docs/advanced.rst
index 7703e2c866..b99d10601f 100644
--- a/docs/advanced.rst
+++ b/docs/advanced.rst
@@ -398,28 +398,48 @@ Another way, which is quite useful if you want to generate lots of different tes
 Combining parameterized tests and test class hierarchies can offer you a very flexible way for generating multiple related tests at once keeping at the same time the maintenance cost low.
 We use this technique extensively in our tests.
 
-Flexible allocation of number of tasks
----------------------------------------
+
+Flexible Regression Tests
+-------------------------
 
 .. 
versionadded:: 2.15 -ReFrame can flexibly allocate the number of tasks of a test based on a number of criteria. -The following regression test is used to demonstrate this feature: +ReFrame can automatically set the number of tasks of a particular test if its :attr:`num_tasks ` attribute is set to ``0``. +In ReFrame's terminology, such tests are called *flexible*. +By default, ReFrame will spawn such a test on all the idle nodes of the current system partition, but this behavior can be adjusted from the command line. +Flexible tests are very useful for diagnostic tests, e.g., tests for checking the health of a whole set of nodes. +In this example, we demonstrate this feature through a simple test that runs ``hostname``. +The test verifies that all the allocated nodes print their host name: .. literalinclude:: ../tutorial/advanced/advanced_example9.py -In order to instruct ReFrame to perform a flexible allocation of the number of tasks :attr:`num_tasks ` has to be set to ``0``: +The first thing to notice in this test is that :attr:`num_tasks ` is set to ``0``. +This is a requirement for flexible tests: .. literalinclude:: ../tutorial/advanced/advanced_example9.py :lines: 13 :dedent: 8 -In the above regression test the ``hostname`` command is executed once on each allocated node, since ``self.num_tasks_per_node`` equals to 1. -Thus, the output will consist of separate lines one for each node containing its name. -ReFrame is going to calculate the number of tasks based on the ``--flex-alloc-tasks`` command line option (see `Flexible task allocation `__). -To check that everything worked as expected, we test that the number of nodenames included in the output equals the number of allocated tasks. 
-Since the number of nodes is decided by ReFrame based on the given criteria as well as the status of the system on which it is running, a new sanity function has to be defined which returns the correct number of tasks assigned as follows: +The sanity function of this test simply counts the host names and verifies that their number matches the number of assigned tasks: + +.. literalinclude:: ../tutorial/advanced/advanced_example9.py + :lines: 15-18 + :dedent: 8 + +Notice, however, that the sanity check does not use :attr:`num_tasks` for verification, but rather a custom attribute, ``num_tasks_assigned``. +This happens for two reasons: + + a. At the time the sanity check expression is created, :attr:`num_tasks` is ``0``. + So the actual number of tasks assigned must be a deferred expression as well. + b. When ReFrame determines the number of tasks of the test, it does not set the :attr:`num_tasks` attribute of the :class:`RegressionTest`. + It only sets the corresponding attribute of the associated job instance. + +Here is how the new deferred attribute is defined: .. literalinclude:: ../tutorial/advanced/advanced_example9.py :lines: 22-25 :dedent: 4 + + +The behavior of the flexible task allocation is controlled by the ``--flex-alloc-tasks`` command line option. +See the corresponding `section `__ for more information. diff --git a/docs/running.rst b/docs/running.rst index b28ba0a0ed..f439841547 100644 --- a/docs/running.rst +++ b/docs/running.rst @@ -1000,19 +1000,32 @@ If you now try to run a test that loads the module `cudatoolkit`, the following * Reason: caught framework exception: module cyclic dependency: cudatoolkit->foo->bar->foobar->cudatoolkit ------------------------------------------------------------------------------ -Flexible task allocation ------------------------ +Controlling the Flexible Task Allocation +---------------------------------------- .. 
versionadded:: 2.15 -ReFrame can automatically determine the number of tasks used for a particular test. -In order to instruct ReFrame that the number of tasks is going to be determined at runtime, :attr:`num_tasks ` has to be set to ``0``. -This feature is used in conjuction with the ``--flex-alloc-tasks`` command line option. -Therefore, if ``--flex-alloc-tasks`` is set to ``idle``, ReFrame is going to determine the number of tasks based on the idle nodes matching the ``access`` options of the logical partition (see `"Partition Configuration" `__) as defined in the ``site_configuration`` dictionary. -For ``--flex-alloc-tasks`` set to ``all``, ReFrame will determine the number of tasks based on the idle nodes respecting the command line options for ``partition``, ``reservation``, ``nodelist``, ``exclude-nodes`` and their ``job-option`` equivalents (see `"Controlling the Execution of Regression Tests" `__). -For both ``idle`` and ``all`` the final number of tasks will be the product of the nodes satisfying the required criteria and :attr:`num_tasks_per_node `. -Finally, by specifying a positive integer ReFrame is going to use that number as the number of tasks without doing any additional check on the nodes. +ReFrame can automatically set the number of tasks of a particular test, if its :attr:`num_tasks ` attribute is set to ``0``. +By default, ReFrame will spawn such a test on all the idle nodes of the current system partition. +This behavior can be adjusted using the ``--flex-alloc-tasks`` command line option. +This option accepts three values: + + 1. ``idle``: (default) In this case, ReFrame will set the number of tasks to the number of idle nodes of the current logical partition. + 2. ``all``: In this case, ReFrame will set the number of tasks to the number of all the nodes of the current logical partition. + 3. Any positive integer: In this case, ReFrame will set the number of tasks to the given value. 
+ +The flexible allocation of number of tasks takes into account any additional logical constraint imposed by the command line options affecting the job allocation, such as ``--partition``, ``--reservation``, ``--nodelist``, ``--exclude-nodes`` and ``--job-option`` (if the scheduler option passed to the latter imposes a restriction). +Notice that ReFrame will issue an error if the resulting number of nodes is zero. + +For example, using the following options would run a flexible test on all the nodes of reservation ``foo`` except the nodes ``n0[1-5]``: + +.. code-block:: bash + + --flex-alloc-tasks=all --exclude-nodes=n0[1-5] + .. note:: - Flexible allocation of tasks is currently supported only for the Slurm job scheduler. - This feature should not be combined with the async execution policy since the nodes satisfying the required criteria will be allocated by the first regression test, while the subsequent tests are going to fail. + Flexible task allocation is supported only for the Slurm scheduler backend. + +.. warning:: + Test cases resulting from flexible ReFrame tests may not be run using the asynchronous execution policy, because the nodes satisfying the required criteria will be allocated for the first test case, causing all subsequent ones to fail. diff --git a/reframe/core/schedulers/__init__.py b/reframe/core/schedulers/__init__.py index a7cb1a7fdd..b3e796aea6 100644 --- a/reframe/core/schedulers/__init__.py +++ b/reframe/core/schedulers/__init__.py @@ -141,6 +141,14 @@ def workdir(self): @property def num_tasks(self): + """The number of tasks assigned to this job. + + This attribute is useful in a flexible regression test for determining + the actual number of tasks that ReFrame assigned to the test. + + For more information on flexible task allocation, please refer to the + `tutorial `__. 
+ """ return self._num_tasks @property From 23695ef46513ca531fb64f7e34f4a85419dc68ef Mon Sep 17 00:00:00 2001 From: Theofilos Manitaras Date: Mon, 19 Nov 2018 08:14:26 +0100 Subject: [PATCH 3/3] Minor fixes --- docs/running.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/running.rst b/docs/running.rst index f439841547..436cd1e811 100644 --- a/docs/running.rst +++ b/docs/running.rst @@ -1010,8 +1010,9 @@ By default, ReFrame will spawn such a test on all the idle nodes of the current This behavior can be adjusted using the ``--flex-alloc-tasks`` command line option. This option accepts three values: - 1. ``idle``: (default) In this case, ReFrame will set the number of tasks to the number of idle nodes of the current logical partition. - 2. ``all``: In this case, ReFrame will set the number of tasks to the number of all the nodes of the current logical partition. + 1. ``idle``: (default) In this case, ReFrame will set the number of tasks to the number of idle nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node ` attribute of the particular test. + 2. ``all``: In this case, ReFrame will set the number of tasks to the number of all the nodes of the current logical partition multiplied by the :attr:`num_tasks_per_node ` attribute of the particular test. + 3. Any positive integer: In this case, ReFrame will set the number of tasks to the given value. The flexible allocation of number of tasks takes into account any additional logical constraint imposed by the command line options affecting the job allocation, such as ``--partition``, ``--reservation``, ``--nodelist``, ``--exclude-nodes`` and ``--job-option`` (if the scheduler option passed to the latter imposes a restriction). @@ -1021,7 +1022,7 @@ For example, using the following options would run a flexible test on all the no .. 
code-block:: bash - --flex-alloc-tasks=all --exclude-nodes=n0[1-5] + --flex-alloc-tasks=all --reservation=foo --exclude-nodes=n0[1-5] .. note::
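As a worked example of the command line above (``--flex-alloc-tasks=all --reservation=foo --exclude-nodes=n0[1-5]``), the following sketch computes the resulting node set and task count. It is hypothetical throughout: the contents of reservation ``foo`` (``n01`` to ``n08``) are invented for illustration, and ``expand_simple_range`` handles only a single ``prefix[a-b]`` group, not the full Slurm host-list syntax.

```python
import re
from typing import List


def expand_simple_range(expr: str) -> List[str]:
    """Expand e.g. 'n0[1-5]' to ['n01', 'n02', 'n03', 'n04', 'n05'].

    Simplified, hypothetical helper: supports one 'prefix[a-b]' group only,
    zero-padding the numbers to the width of the lower bound.
    """
    m = re.fullmatch(r'(.*)\[(\d+)-(\d+)\]', expr)
    if m is None:
        return [expr]  # a plain node name
    prefix, lo, hi = m.groups()
    width = len(lo)
    return [f'{prefix}{i:0{width}d}' for i in range(int(lo), int(hi) + 1)]


# Hypothetical contents of reservation 'foo': n01 .. n08
reservation_foo = [f'n{i:02d}' for i in range(1, 9)]
excluded = set(expand_simple_range('n0[1-5]'))   # n01 .. n05

# With 'all', every remaining node counts, whether idle or not.
candidates = [n for n in reservation_foo if n not in excluded]
num_tasks_per_node = 1
num_tasks = len(candidates) * num_tasks_per_node
if num_tasks == 0:
    raise RuntimeError('no nodes left after applying the constraints')

print(candidates, num_tasks)   # ['n06', 'n07', 'n08'] 3
```

The exclusion is applied before counting, so the constraint options shrink the node set first and only then is it multiplied by ``num_tasks_per_node``.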