Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug (Doc) sessionization example has sync() inside if block #8874

Open
shaunc opened this issue Apr 3, 2023 · 1 comment
Open

Bug (Doc) sessionization example has sync() inside if block #8874

shaunc opened this issue Apr 3, 2023 · 1 comment
Labels
CUDA CUDA related issue/PR doc

Comments

@shaunc
Copy link
Contributor

shaunc commented Apr 3, 2023

https://numba.readthedocs.io/en/stable/cuda/examples.html#dividing-click-data-into-sessions

Has example containing:

# Determine session labels
if is_sess_boundary:
    # This thread marks the start of a session
    results[gid] = gid
    # Make sure all session boundaries are written
    # before populating the session id
    grid = cuda.cg.this_grid()
    grid.sync()

    # more code here ....

However, this will only sync for some threads. Presumably the example could be changed to:

# Determine session labels
if is_sess_boundary:
    # This thread marks the start of a session
    results[gid] = gid

# Make sure all session boundaries are written
# before populating the session id
grid = cuda.cg.this_grid()
grid.sync()

if is_sess_boundary:
    # more code here ....
@gmarkall gmarkall added doc CUDA CUDA related issue/PR labels Apr 3, 2023
@gmarkall
Copy link
Member

gmarkall commented Apr 3, 2023

Many thanks for the report - I believe you are correct. The CUDA Programming Guide section on Group Collectives says:

These operations require participation of all threads in the specified group in order to complete the operation. All threads in the group need to pass the same values for corresponding arguments to each collective call, unless different values are explicitly allowed in the argument description. Otherwise the behavior of the call is undefined.

I think the present implementation of grid group sync in Numba happens to work with this example, because it uses a particular mechanism for synchronizing the grid, but it's an implementation detail and would fail if we switched to using the more efficient method used in newer (11.0 onwards) CUDA toolkits.

I think your suggested fix looks correct, and it should be picked up by CI if it is not, because the code is a doctest - would you like to make a PR for the fix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CUDA CUDA related issue/PR doc
Projects
None yet
Development

No branches or pull requests

2 participants