Fixed deadlock in sgmv_shrink kernel caused by imbalanced segments #156

Merged
tgaddair merged 1 commit into main from fix-sgmv-deadlock
Jan 4, 2024

tgaddair
Contributor

@tgaddair tgaddair commented Jan 4, 2024

Fixes #149.

The crux of the issue is that each grid block executes a variable number of steps, determined by the size of its segment (s_end - s_start), and calls grid.sync() once per step. Because grid.sync() is a grid-wide barrier, it only returns once every block in the grid has reached it. If one block runs more steps than another, the blocks call grid.sync() a different number of times: the block with the larger segment waits at a barrier that the finished blocks never reach, and the kernel deadlocks.

The fix computes the maximum number of steps from the largest segment. At the end of the kernel, each block then makes extra grid.sync() calls, one for each step by which it falls short of that maximum, so every block calls grid.sync() the same number of times.
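
To make the failure mode and the fix concrete, here is a minimal host-side sketch in Python. It is not the kernel code from this PR. Each simulated grid block is a thread and `threading.Barrier.wait()` stands in for grid.sync(). The names `run_blocks`, `CHUNK`, and `pad_syncs` are hypothetical, as is the assumption that a block processes its segment in fixed-size chunks, one per step. A timeout is used so that the imbalanced case reports a broken barrier instead of hanging the way the GPU kernel would.

```python
import threading

CHUNK = 4  # hypothetical number of rows a block handles per step


def run_blocks(segments, pad_syncs=True, timeout=0.5):
    """Simulate grid blocks that hit a grid-wide barrier once per step.

    segments: non-empty list of (s_start, s_end) pairs, one per block.
    Returns (barrier calls completed per block, whether the barrier broke).
    """
    num_blocks = len(segments)
    barrier = threading.Barrier(num_blocks)  # stand-in for grid.sync()
    # Steps per block depend on segment size, rounded up to whole chunks.
    steps = [(end - start + CHUNK - 1) // CHUNK for start, end in segments]
    max_steps = max(steps)  # taken from the largest segment
    sync_counts = [0] * num_blocks

    def block(block_id):
        try:
            for _ in range(steps[block_id]):
                # ... one chunk of real work would happen here ...
                barrier.wait(timeout=timeout)
                sync_counts[block_id] += 1
            if pad_syncs:
                # The fix: make up the difference so every block reaches
                # the barrier exactly max_steps times.
                for _ in range(max_steps - steps[block_id]):
                    barrier.wait(timeout=timeout)
                    sync_counts[block_id] += 1
        except threading.BrokenBarrierError:
            # A wait timed out because another block already exited.
            # On the GPU there is no timeout, so this would be a hang.
            pass

    threads = [threading.Thread(target=block, args=(i,)) for i in range(num_blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sync_counts, barrier.broken
```

With two imbalanced segments, `run_blocks([(0, 4), (4, 20)], pad_syncs=False)` returns `([1, 1], True)`: the second block stalls on its second barrier call after the first block has exited. With `pad_syncs=True` the same input returns `([4, 4], False)`, since both blocks reach the barrier four times.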

@tgaddair tgaddair mentioned this pull request Jan 4, 2024
4 tasks
@tgaddair tgaddair merged commit a5f49b2 into main Jan 4, 2024
1 check passed
@tgaddair tgaddair deleted the fix-sgmv-deadlock branch January 4, 2024 05:36
@tgaddair tgaddair restored the fix-sgmv-deadlock branch January 4, 2024 05:38
@tgaddair tgaddair deleted the fix-sgmv-deadlock branch January 11, 2024 00:30
Successfully merging this pull request may close these issues.

Lorax Hanging in production