From 0c48dd9a2e5bd8e504af46c480534759e3b60925 Mon Sep 17 00:00:00 2001
From: Eirini Koutsaniti
Date: Fri, 3 Mar 2023 11:49:16 +0100
Subject: [PATCH 1/3] Add more documentation about pipeline_timeout

---
 docs/manpage.rst  | 15 +++++++++++++++
 docs/pipeline.rst | 16 ++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/docs/manpage.rst b/docs/manpage.rst
index 13df9fa0d1..4f3c015a47 100644
--- a/docs/manpage.rst
+++ b/docs/manpage.rst
@@ -1413,6 +1413,21 @@ Whenever an environment variable is associated with a configuration option, its
       ================================== ==================


+.. envvar:: RFM_PIPELINE_TIMEOUT
+
+   Timeout in seconds for advancing the pipeline in the asynchronous execution policy.
+
+   .. table::
+      :align: left
+
+      ================================== ==================
+      Associated command line option     N/A
+      Associated configuration parameter `~config.general.purge_environment`
+      ================================== ==================
+
+   .. versionadded:: 3.10.0
+
+
 .. envvar:: RFM_PREFIX

    General directory prefix for ReFrame-generated directories.
diff --git a/docs/pipeline.rst b/docs/pipeline.rst
index f0a51e2915..6c98258a8a 100644
--- a/docs/pipeline.rst
+++ b/docs/pipeline.rst
@@ -197,6 +197,22 @@ To control the concurrency of the ReFrame execution context, users should set th
    Execution contexts were formalized.


+---------------------------------------------------------
+Raising the throughput of jobs in the asynchronous policy
+---------------------------------------------------------
+
+.. versionadded:: 3.10.0
+
+ReFrame's asynchronous execution policy will cycle through the tests and in every iteration it will try to advance as many as possible in a given time.
+This time is controlled by the :attr:`~config.general.pipeline_timeout` configuration option or the :envvar:`RFM_PIPELINE_TIMEOUT` environment variable.
+If this timeout value is exceeded and at least one test has progressed, ReFrame will stop processing new tests and it will try to further advance tests that have already started.
+The default value of the timeout is 10 seconds in order to give priority to tests that have already started and have a more interactive output.
+
+There are cases when some tests take too long to proceed (e.g., due to copying of large files) and as a result they are blocking more tests from starting their pipeline.
+This could lead to a sequential run of the tests and increase the time of the total run significantly.
+In these cases, you can try setting the timeout to a higher value, like 60 seconds.
+
+
 Timing the Test Pipeline
 ------------------------


From 9c8624c3a09735c97b789987bdf1df3f0cd21071 Mon Sep 17 00:00:00 2001
From: Vasileios Karakasis
Date: Sat, 11 Mar 2023 20:10:10 +0100
Subject: [PATCH 2/3] Apply suggestions from code review

---
 docs/pipeline.rst | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/pipeline.rst b/docs/pipeline.rst
index 6c98258a8a..ec38b1f0bb 100644
--- a/docs/pipeline.rst
+++ b/docs/pipeline.rst
@@ -198,19 +198,21 @@ To control the concurrency of the ReFrame execution context, users should set th


 ---------------------------------------------------------
-Raising the throughput of jobs in the asynchronous policy
+Tweaking the throughput and interactivity of test jobs in the asynchronous execution policy
 ---------------------------------------------------------

 .. versionadded:: 3.10.0

-ReFrame's asynchronous execution policy will cycle through the tests and in every iteration it will try to advance as many as possible in a given time.
-This time is controlled by the :attr:`~config.general.pipeline_timeout` configuration option or the :envvar:`RFM_PIPELINE_TIMEOUT` environment variable.
-If this timeout value is exceeded and at least one test has progressed, ReFrame will stop processing new tests and it will try to further advance tests that have already started.
-The default value of the timeout is 10 seconds in order to give priority to tests that have already started and have a more interactive output.
+ReFrame's asynchronous execution policy will iteratively cycle through all the in-flight tests and will try to advance the state (see state diagram above) of as many as possible within a given time slot.
+The duration of this time slot is controlled by the :attr:`~config.general.pipeline_timeout` configuration option or the :envvar:`RFM_PIPELINE_TIMEOUT` environment variable.
+If this timeout expires and at least one test has progressed, ReFrame will stop processing new tests in this time slot.
+In the next time slot, it will try to further advance tests that have already started and if there is enough time left, it will also start new tests.
+Essentially, a small timeout value gives preference to tests that have already started, thus pushing them quicker down their pipeline, whereas higher values give preference to overall test throughput, as more tests will be running concurrently.
+The default timeout is 10 seconds in order to balance interactivity and overall throughput.

 There are cases when some tests take too long to proceed (e.g., due to copying of large files) and as a result they are blocking more tests from starting their pipeline.
 This could lead to a sequential run of the tests and increase the time of the total run significantly.
-In these cases, you can try setting the timeout to a higher value, like 60 seconds.
+In these cases, a higher timeout value will help to increase the test concurrency and therefore the overall throughput.


 Timing the Test Pipeline

From 37837ba4f34cd64b2f1be8999c115c8106d7fb50 Mon Sep 17 00:00:00 2001
From: Vasileios Karakasis
Date: Sat, 11 Mar 2023 22:33:47 +0100
Subject: [PATCH 3/3] Address remaining PR comments

---
 docs/config_reference.rst |  1 +
 docs/manpage.rst          |  4 +++-
 docs/pipeline.rst         | 11 +++++------
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/docs/config_reference.rst b/docs/config_reference.rst
index 03e21008b8..c708ab484b 100644
--- a/docs/config_reference.rst
+++ b/docs/config_reference.rst
@@ -1511,6 +1511,7 @@ General Configuration

    ReFrame's asynchronous execution policy will try to advance as many tests as possible in their pipeline, but some tests may take too long to proceed (e.g., due to copying of large files) blocking the advancement of previously started tests.
    If this timeout value is exceeded and at least one test has progressed, ReFrame will stop processing new tests and it will try to further advance tests that have already started.
+   See :ref:`pipeline-timeout` for more guidance on how to set this.

    :required: No
    :default: ``10``
diff --git a/docs/manpage.rst b/docs/manpage.rst
index 4f3c015a47..84c232d682 100644
--- a/docs/manpage.rst
+++ b/docs/manpage.rst
@@ -1416,13 +1416,15 @@ Whenever an environment variable is associated with a configuration option, its
 .. envvar:: RFM_PIPELINE_TIMEOUT

    Timeout in seconds for advancing the pipeline in the asynchronous execution policy.
+   See :ref:`pipeline-timeout` for more guidance on how to set this.
+

    .. table::
       :align: left

       ================================== ==================
       Associated command line option     N/A
-      Associated configuration parameter `~config.general.purge_environment`
+      Associated configuration parameter :attr:`~config.general.pipeline_timeout`
       ================================== ==================

    .. versionadded:: 3.10.0
diff --git a/docs/pipeline.rst b/docs/pipeline.rst
index ec38b1f0bb..b6efb3edef 100644
--- a/docs/pipeline.rst
+++ b/docs/pipeline.rst
@@ -197,21 +197,20 @@ To control the concurrency of the ReFrame execution context, users should set th
    Execution contexts were formalized.


----------------------------------------------------------
-Tweaking the throughput and interactivity of test jobs in the asynchronous execution policy
----------------------------------------------------------
+.. _pipeline-timeout:

-.. versionadded:: 3.10.0
+-------------------------------------------------------------------------------------------
+Tweaking the throughput and interactivity of test jobs in the asynchronous execution policy
+-------------------------------------------------------------------------------------------

 ReFrame's asynchronous execution policy will iteratively cycle through all the in-flight tests and will try to advance the state (see state diagram above) of as many as possible within a given time slot.
 The duration of this time slot is controlled by the :attr:`~config.general.pipeline_timeout` configuration option or the :envvar:`RFM_PIPELINE_TIMEOUT` environment variable.
-If this timeout expires and at least one test has progressed, ReFrame will stop processing new tests in this time slot.
+If this timeout expires and at least one test has progressed, ReFrame will stop processing new tests in this time slot.
 In the next time slot, it will try to further advance tests that have already started and if there is enough time left, it will also start new tests.
 Essentially, a small timeout value gives preference to tests that have already started, thus pushing them quicker down their pipeline, whereas higher values give preference to overall test throughput, as more tests will be running concurrently.
 The default timeout is 10 seconds in order to balance interactivity and overall throughput.

 There are cases when some tests take too long to proceed (e.g., due to copying of large files) and as a result they are blocking more tests from starting their pipeline.
-This could lead to a sequential run of the tests and increase the time of the total run significantly.
 In these cases, a higher timeout value will help to increase the test concurrency and therefore the overall throughput.
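
For readers of this series, the documented knob could be exercised roughly as follows. This is a minimal sketch and not part of the patches: it assumes a ReFrame settings file written as a Python dictionary with a ``general`` section, and the value ``60`` is only the illustrative figure mentioned in the first patch, not a recommendation.

```python
# Hypothetical settings fragment (assumption: Python-style ReFrame settings file).
# Only the 'pipeline_timeout' key comes from the documentation in this series;
# the value 60 is an example for sites where slow stages block new tests.
site_configuration = {
    'general': [
        {
            # Duration in seconds of the time slot in which the asynchronous
            # execution policy tries to advance tests (documented default: 10).
            'pipeline_timeout': 60
        }
    ]
}
```

According to the documentation added here, the same setting can also be supplied through the ``RFM_PIPELINE_TIMEOUT`` environment variable, which has no associated command line option.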