omp.library-only: Fix incorrect addition of master group offset to group id #814
The fiber-based collective execution engine in `omp.library-only` is built around the idea of having a master fiber that probes whether a kernel contains barriers by attempting to execute the work group as a loop across the work items, which the compiler might then auto-vectorize. When a barrier is encountered, we need to additionally spawn one fiber per work item, such that all work items run inside their own dedicated fiber. This is much less efficient than the case without fibers, as it prevents vectorization across work items and introduces a lot of stack context switches. However, it is required because it guarantees correct barrier semantics: if work items are just iterations of a loop, they cannot all reach the barrier at the same time.
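To illustrate why a plain loop over work items cannot honor a barrier, here is a minimal sketch. It is not the `omp.library-only` implementation: the function names and the `local_mem` buffer are made up for this example, and the barrier-respecting variant is modeled by splitting the kernel into two loops rather than by using fibers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: each work item writes its id to local memory,
// hits a barrier, then reads the value written by its right neighbor.

// Work items executed as iterations of a single loop. Item i reads its
// neighbor's slot before the neighbor has run, so it sees the initial -1.
std::vector<int> run_as_single_loop(std::size_t n) {
  assert(n > 0);
  std::vector<int> local_mem(n, -1), out(n);
  for (std::size_t i = 0; i < n; ++i) {
    local_mem[i] = static_cast<int>(i);
    // <- the barrier would be here, but a loop iteration cannot wait
    out[i] = local_mem[(i + 1) % n];
  }
  return out;
}

// Barrier semantics modeled by phase splitting (an assumption made for
// this sketch): every item finishes the pre-barrier part before any
// item starts the post-barrier part.
std::vector<int> run_with_barrier(std::size_t n) {
  assert(n > 0);
  std::vector<int> local_mem(n, -1), out(n);
  for (std::size_t i = 0; i < n; ++i)
    local_mem[i] = static_cast<int>(i);
  for (std::size_t i = 0; i < n; ++i)
    out[i] = local_mem[(i + 1) % n];
  return out;
}
```

With four work items, the single-loop variant returns stale values for every item except the last, while the barrier-respecting variant returns each neighbor's id. Fibers achieve the second behavior for arbitrary kernels by letting every work item suspend at the barrier.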
Now, it can happen that only some work groups contain barriers while others do not. To support this use case, the newly spawned fibers in general have to skip some work groups until they have reached the position where the master fiber is currently waiting for them to catch up. The position of the master fiber in the work group iteration space where the barrier was encountered is called the master group offset.
Previously, the master group offset was added to the group id that was passed into kernels processed by the newly spawned fibers. I don't understand why this should be necessary. In fact, I believe this to be incorrect: if the master group offset is non-zero, i.e. the first work groups do not contain barriers, it causes work items to be launched with the wrong group id assigned. This in turn can lead to a mismatch between the number of barriers that the master fiber encounters and the number of barriers that the other fibers encounter. Because all fibers must be able to reach all barriers, this will in general result in a deadlock.
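The effect on group ids can be sketched with a small model. This is a simplified illustration under assumptions, not the actual code touched by this PR: the function `group_ids_seen_by_fiber` and its `add_offset` flag are hypothetical, and the model assumes that a spawned fiber walks the same group iteration space as the master fiber and skips every group before the master group offset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a newly spawned fiber catching up with the
// master fiber. Returns the group ids that would be passed to the kernel.
std::vector<std::size_t> group_ids_seen_by_fiber(
    std::size_t num_groups, std::size_t master_group_offset,
    bool add_offset) {
  assert(master_group_offset <= num_groups);
  std::vector<std::size_t> ids;
  for (std::size_t g = 0; g < num_groups; ++g) {
    // Groups before the offset were already completed by the master
    // fiber without encountering a barrier, so the fiber skips them.
    if (g < master_group_offset)
      continue;
    // add_offset == true models the old behavior described above;
    // add_offset == false models the behavior after this change.
    ids.push_back(add_offset ? g + master_group_offset : g);
  }
  return ids;
}
```

In this model, with 4 groups and a master group offset of 2, the old behavior hands the kernel group ids 4 and 5, which are outside the valid range, while the master fiber runs groups 2 and 3. If the presence of a barrier depends on the group id, the two sides can then disagree on how many barriers they will reach.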
This PR removes the addition of the master group offset to the group id, thereby fixing the potential deadlocks.
Fixes #809