Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

omp.library-only: Fix incorrect addition of master group offset to group id #814

Merged
merged 1 commit into from
Aug 24, 2022

Conversation

illuhad
Copy link
Collaborator

@illuhad illuhad commented Aug 23, 2022

The fiber-based collective execution engine in omp.library-only is built around the idea of having a master fiber that probes whether a kernel contains barriers by attempting to execute the work group as a loop across the work items, which the compiler might then auto-vectorize.
When a barrier is encountered, we need to additionally spawn one fiber per work item, such that all work items run inside their own dedicated fiber. This is much less efficient than the case without fibers as it prevents vectorization across work items and introduces a lot of stack context switches. However it is required because it guarantees correct barrier semantics, since if work items are just iterations of a loop, they cannot all reach the barrier at the same time.
Now, it can happen that only some work groups contain barriers while others do not. To support this use case, the newly spawned fibers in general have to skip some work groups until they have reached the position where the master fiber is currently waiting for them to catch up. The position of the master fiber in the work group iteration space where the barrier was encountered is called master group offset.

Previously the master group offset was added to the group id that was passed into kernels processed by the newly spawned fibers. I don't understand why this should be necessary. In fact, I believe this to be incorrect because I believe that it causes work items being launched that have the wrong group id assigned if the master group offset is non-zero, i.e. the first work group do not contain barriers. This in turn can lead to a mismatch between the number of barriers that the master fiber encounters, and the number of barriers that the other fibers encounter. Because all fibers must be able to reach all barriers, this will in general result in a deadlock.

This PR removes the addition of the master group offset to the group id, thereby fixing the potential deadlocks.

Fixes #809

…oup id, which can result in deadlocks due to non-collective barriers
@illuhad illuhad merged commit e8c875a into develop Aug 24, 2022
@illuhad illuhad deleted the hotfix/fix-ndrange-fibers-incorrect-offset branch August 24, 2022 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incomplete Launch Groups Creates group_barrier deadlock.
1 participant