[litmus] Add support for speedcheck parameter for -mode presi #869

relokin · 2024-06-03T13:15:54Z

This change adds support for the speedcheck parameter for -mode presi. This was already supported in -mode std. The user can provide the parameter "+sc", which makes the test exit as soon as the post-condition is observed.

maranget · 2024-06-12T14:14:43Z

Hi @relokin, using one global stop_now flag may lead to deadlock. Commit b72a2f3 is an attempt to avoid such deadlocks by stopping all instances before stopping the experiment.

relokin · 2024-06-25T22:46:09Z

Hey @maranget, indeed I missed this. I tried to avoid other deadlocks, but not this one.

I had a look at your patch and I think it's a much better way to achieve what I was trying to do. Do you want to open a new PR with your own patch, or should I cherry-pick it into this pull request?

maranget · 2024-06-26T05:41:01Z

Hi @relokin, cherry picking looks like the most adequate technique. Your opinion?

relokin · 2024-06-26T08:57:17Z

Just to check with you that my understanding is correct.

The high level desire was to exit litmus7 as soon as there is an execution where the post-condition is satisfied. With this PR, if a specific execution satisfies the post-condition, then other instances will continue for a little longer.

For example, let's say we execute 2 instances of MP using 4 cores. The 1st instance observes an execution which satisfies the post-condition and exits immediately. The 2nd instance will continue executing until the end of the for loop (executing `size` iterations in total), or until it itself encounters an execution which satisfies the post-condition.

However, at the end of the execution, we know that for at least one instance, its last execution satisfied the post-condition. And in the case of presi, we know that for at least one set of cores (2 in the case of MP), the last time they executed the test, it satisfied the post-condition.

Does that make sense?

maranget · 2024-06-26T11:25:17Z

> The high level desire was to exit litmus7 as soon as there is an execution where the post-condition is satisfied. With this PR, if a specific execution satisfies the post-condition, then other instances will continue for a little longer.

Hi @relokin, we agree on the high-level desire. If I am not mistaken, this is what the synchronisation code of commit 0b45377 does. That is, all instances will exit as soon as possible if one instance discovers that the post-condition is satisfied.

Every test thread executes, `nruns` times (function `choose`), a sequence of `size` tests (function `choose_params`).

As soon as one of the instances discovers that the post-condition is satisfied, it sets the global flag `stop_now` to true. Moreover, all the threads of any instance synchronise as follows: thread number zero copies the global flag into an instance-level flag, and all the threads of the instance synchronise on an instance-level barrier before they read the instance-level flag. If this flag is set, all threads exit the loop by returning from the `choose_params` function. As a consequence, all threads of all instances exit their loops as soon as possible and return into the loop of size `nruns` in the function `choose`. There, they all synchronise on a global barrier before reading the global flag, and all exit if they see it set.

I am not sure the scheme above is deadlock-free. It looks important that all the threads of a given instance act consistently, hence the idea of having them synchronise before reading the instance-level flag.
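The two-level stopping scheme described above can be sketched in C with POSIX threads. This is a minimal sketch under stated assumptions, not the code litmus7 generates: the sizes (`NINST`, `NTHR`, `NRUNS`, `SIZE`), the functions `test_loop`, `thread_body`, `post_condition_holds` and `run_two_level`, and the per-thread counters are all hypothetical; the post-condition is simulated to hold exactly once (instance 1, run 2, test 37); and C11 atomics stand in for whatever the generated code uses for the flags. On the deadlock question, the sketch adds two barriers that the description does not mention: one at the start of each test (a real test already synchronises its threads there), so thread zero cannot overwrite the instance flag before its peers have read it, and a second global barrier after reading the global flag, so no thread can set `stop_now` during the next run while others are still reading it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define NINST 2   /* number of test instances */
#define NTHR  2   /* threads per instance (2 for MP) */
#define NRUNS 4   /* outer loop, in the role of choose */
#define SIZE  100 /* inner loop, in the role of choose_params */

static atomic_int stop_now;              /* global stop flag */
static pthread_barrier_t global_barrier; /* shared by all NINST * NTHR threads */

typedef struct {
  atomic_int stop_now;       /* instance-level copy of the global flag */
  pthread_barrier_t barrier; /* shared by the NTHR threads of the instance */
} instance_t;

static instance_t inst[NINST];
static int iters_done[NINST][NTHR], runs_done[NINST][NTHR];

/* Simulated post-condition: holds once, in instance 1, run 2, test 37. */
static int post_condition_holds(int i, int run, int k) {
  return i == 1 && run == 2 && k == 37;
}

/* Inner loop over SIZE tests, in the role of choose_params. */
static void test_loop(int i, int tid, int run) {
  instance_t *in = &inst[i];
  for (int k = 0; k < SIZE; k++) {
    /* Stands in for the start-of-test synchronisation of a real test; it also
       keeps thread 0 from overwriting in->stop_now before its peers read it. */
    pthread_barrier_wait(&in->barrier);
    iters_done[i][tid]++;
    if (tid == 0) {
      if (post_condition_holds(i, run, k))
        atomic_store(&stop_now, 1);
      /* Thread 0 copies the global flag into the instance flag. */
      atomic_store(&in->stop_now, atomic_load(&stop_now));
    }
    pthread_barrier_wait(&in->barrier); /* instance-level barrier */
    if (atomic_load(&in->stop_now))     /* all instance threads agree */
      return;
  }
}

/* Outer loop over NRUNS runs, in the role of choose. */
static void *thread_body(void *p) {
  int i = ((int *)p)[0], tid = ((int *)p)[1];
  for (int run = 0; run < NRUNS; run++) {
    test_loop(i, tid, run);
    runs_done[i][tid]++;
    pthread_barrier_wait(&global_barrier); /* global barrier */
    int stop = atomic_load(&stop_now);
    /* Added in this sketch: no thread may set stop_now for the next run
       until every thread has read it for this run. */
    pthread_barrier_wait(&global_barrier);
    if (stop)
      break;
  }
  return NULL;
}

/* Runs the experiment; returns the final value of the global flag. */
int run_two_level(void) {
  pthread_t th[NINST][NTHR];
  int args[NINST][NTHR][2];
  int rc = pthread_barrier_init(&global_barrier, NULL, NINST * NTHR);
  assert(rc == 0);
  atomic_store(&stop_now, 0);
  for (int i = 0; i < NINST; i++) {
    atomic_store(&inst[i].stop_now, 0);
    rc = pthread_barrier_init(&inst[i].barrier, NULL, NTHR);
    assert(rc == 0);
    for (int t = 0; t < NTHR; t++)
      iters_done[i][t] = runs_done[i][t] = 0;
  }
  for (int i = 0; i < NINST; i++)
    for (int t = 0; t < NTHR; t++) {
      args[i][t][0] = i;
      args[i][t][1] = t;
      rc = pthread_create(&th[i][t], NULL, thread_body, args[i][t]);
      assert(rc == 0);
    }
  for (int i = 0; i < NINST; i++)
    for (int t = 0; t < NTHR; t++)
      pthread_join(th[i][t], NULL);
  for (int i = 0; i < NINST; i++)
    pthread_barrier_destroy(&inst[i].barrier);
  pthread_barrier_destroy(&global_barrier);
  (void)rc;
  return atomic_load(&stop_now);
}
```

With these parameters, every thread completes three runs; both threads of instance 1 execute exactly 2 × 100 + 38 = 238 tests, while the two threads of instance 0 stop together at some point during run 2. The sketch needs a platform with POSIX barriers (e.g. Linux, compiled with `-pthread`).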

maranget · 2024-06-26T11:53:04Z

A simpler scheme, which would not stop threads as promptly as the previous one but would spare the instance-level synchronisation, is as follows: the `choose_param` function (loop on size) simply records the occurrence of the stop condition locally, returning 1 if the stop condition occurred in some loop iteration and 0 otherwise.

In `choose`, if the returned value is 1, set the global `stop_now` flag to 1. Then synchronise on a global barrier before reading the global flag, and exit when it is set.
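For comparison, the simpler scheme can be sketched the same way. Again, this is only a sketch under assumptions: `NINST`, `NTHR`, `NRUNS`, `SIZE`, `test_loop`, `thread_body`, `post_condition_holds`, `run_simple` and the counters are hypothetical, the post-condition is simulated to hold once (instance 1, run 2, test 37), and C11 atomics are used for the flag. The inner loop never exits early; it only reports whether the stop condition occurred, and the caller sets the global flag before the global barrier. As in any barrier-then-read pattern, the sketch adds a second global barrier after reading the flag, so that no thread can set `stop_now` for the next run while another is still reading it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define NINST 2   /* number of test instances */
#define NTHR  2   /* threads per instance */
#define NRUNS 4   /* outer loop, in the role of choose */
#define SIZE  100 /* inner loop, in the role of choose_param */

static atomic_int stop_now;              /* global stop flag */
static pthread_barrier_t global_barrier; /* shared by all NINST * NTHR threads */
static int iters_done[NINST][NTHR], runs_done[NINST][NTHR];

/* Simulated post-condition: holds once, in instance 1, run 2, test 37. */
static int post_condition_holds(int i, int run, int k) {
  return i == 1 && run == 2 && k == 37;
}

/* Inner loop: no early exit, it only records locally whether the stop
   condition occurred. Returns 1 if it did, 0 otherwise. */
static int test_loop(int i, int tid, int run) {
  int seen = 0;
  for (int k = 0; k < SIZE; k++) {
    iters_done[i][tid]++;
    if (tid == 0 && post_condition_holds(i, run, k))
      seen = 1;
  }
  return seen;
}

/* Outer loop: set the global flag if needed, then barrier, read, barrier. */
static void *thread_body(void *p) {
  int i = ((int *)p)[0], tid = ((int *)p)[1];
  for (int run = 0; run < NRUNS; run++) {
    if (test_loop(i, tid, run))
      atomic_store(&stop_now, 1);
    runs_done[i][tid]++;
    pthread_barrier_wait(&global_barrier); /* global barrier */
    int stop = atomic_load(&stop_now);
    pthread_barrier_wait(&global_barrier); /* added in this sketch */
    if (stop)
      break;
  }
  return NULL;
}

/* Runs the experiment; returns the final value of the global flag. */
int run_simple(void) {
  pthread_t th[NINST][NTHR];
  int args[NINST][NTHR][2];
  int rc = pthread_barrier_init(&global_barrier, NULL, NINST * NTHR);
  assert(rc == 0);
  atomic_store(&stop_now, 0);
  for (int i = 0; i < NINST; i++)
    for (int t = 0; t < NTHR; t++) {
      iters_done[i][t] = runs_done[i][t] = 0;
      args[i][t][0] = i;
      args[i][t][1] = t;
      rc = pthread_create(&th[i][t], NULL, thread_body, args[i][t]);
      assert(rc == 0);
    }
  for (int i = 0; i < NINST; i++)
    for (int t = 0; t < NTHR; t++)
      pthread_join(th[i][t], NULL);
  pthread_barrier_destroy(&global_barrier);
  (void)rc;
  return atomic_load(&stop_now);
}
```

Here every thread runs the full 3 × 100 = 300 tests before the experiment stops, because the run in which the condition occurs is always completed; this is the sense in which the scheme stops threads less promptly than the two-level one.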

relokin · 2024-07-09T18:26:48Z

> A simpler scheme, which would not stop threads as promptly as the previous one but would spare the instance-level synchronisation, is as follows: the `choose_param` function (loop on size) simply records the occurrence of the stop condition locally, returning 1 if the stop condition occurred in some loop iteration and 0 otherwise.

Thanks @maranget! The motivation for this change (and I am aware that this might not be the same for speedcheck in -mode std) was to make it easier to identify the execution that satisfied the post-condition, so it's quite important that we exit as soon as possible. I would rather not change it, unless, of course, there is something wrong with the current approach.

maranget · 2024-07-10T08:11:06Z

Hi @relokin. I guess we agree to merge this PR as it is, or do you see additional improvements?

maranget · 2024-07-10T08:12:35Z

If we agree on merging, would you please rebase on master?

This change adds support for the speedcheck parameter for -mode presi. This was already supported in -mode std. The user can provide the parameter "+sc", which makes the test exit as soon as the post-condition is observed. To avoid deadlocks, the change introduces a two-level procedure to stop the experiment. Any instance that reaches the stop condition sets the global `stop_now` flag. After each test, thread number zero of each instance copies this global flag into an instance-specific `stop_now` flag. Then the instance threads synchronise on an instance-level barrier before reading the instance flag and interrupting the test loop if the flag is set. After returning from the test loop, _all_ threads synchronise on a global barrier before reading the global flag and interrupting the experiment if the flag is set.

relokin · 2024-07-10T09:02:28Z

I am happy for this to be merged. Thanks @maranget!

maranget · 2024-07-10T11:05:31Z

Merged, thanks @relokin

relokin force-pushed the speedcheck-presi branch from 4f3b2fa to 0b45377 June 26, 2024 08:44

relokin force-pushed the speedcheck-presi branch from 0b45377 to de79e25 July 10, 2024 09:01

maranget merged commit 010efa8 into herd:master Jul 10, 2024
3 checks passed

relokin deleted the speedcheck-presi branch July 10, 2024 11:07
