
GPU User Experience Improvements #1283
Merged: 44 commits merged into master from gpu-ux on Jun 29, 2023

Conversation

@tbennun (Collaborator) commented Jun 23, 2023

  • Post-SDFG error checking and informative guidance (fixes #1277: GPU debugging guidance)
  • More information in syncdebug prints (e.g., actual grid/block sizes)
  • Tell users where invalid/failing SDFGs were saved
  • Always error-check (without synchronization) unless configured otherwise
  • Validation error when reading host-inaccessible memory (e.g., GPU global memory) outside an accelerator context
  • __launch_bounds__ support in map nodes: -1 disables it, 0 (the default) sets it if the block size is constant, and any other number sets it explicitly
  • Argument errors (fixes #1263: DaCe does not warn on excess arguments to dace.program)
  • Validate nested SDFG parent pointers
  • Warnings/errors on conflicting behavior between gpu_block_size, configuration, and internal maps (fixes #1271: Fewer-dimension GPU maps with higher-dimension GPU blocks should fall back)
    • Warning when multiple internal thread-block maps with different sizes exist, reporting the final block size used
    • Configurable (with defaults) maximum block size in x and maximum total block size, erroring when surpassed and recommending to tile the block with another map
    • Reduce the default_block_size dimensionality to the kernel dimensionality and warn whenever that happens
    • Error when both gpu_block_size and thread-block maps are given
    • The warning for using the default block size now guides the user towards gpu_block_size or thread-block sub-maps (see the sketch after this list)
  • Fix CUDA 12+ CUB includes
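
To make the gpu_block_size guidance above concrete, here is a minimal sketch (assuming a CUDA-capable DaCe setup) that pins the thread-block size of a GPU map explicitly instead of relying on the default block size; the example program and the chosen sizes are illustrative only, not taken from this PR.

```python
import numpy as np
import dace
from dace.sdfg import nodes

N = dace.symbol('N')

@dace.program
def scale(A: dace.float64[N], B: dace.float64[N]):
    for i in dace.map[0:N]:
        B[i] = 2.0 * A[i]

sdfg = scale.to_sdfg()
sdfg.apply_gpu_transformations()  # schedules the map as GPU_Device

# Set the block size explicitly; relying on the default block size would,
# per this PR, emit a warning pointing at gpu_block_size or thread-block
# sub-maps instead.
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, nodes.MapEntry) and node.map.schedule == dace.ScheduleType.GPU_Device:
        node.map.gpu_block_size = [128, 1, 1]

A = np.random.rand(1024)
B = np.zeros(1024)
sdfg(A=A, B=B, N=1024)  # requires a GPU at runtime
```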

@tbennun self-assigned this Jun 23, 2023
@mcopik (Contributor) commented Jun 25, 2023

If I understand correctly, the current proposal is to check for errors after execution. It might be worth considering failing at the first CUDA error rather than only checking at the end; that would make debugging much easier and faster. We could make it an optional feature if there is any concern that checking the return value of every CUDA call adds an unnecessary performance tax.

Happy to help with this feature by improving and merging my existing solution.

@tbennun (Collaborator, Author) commented Jun 26, 2023

We don’t want the overhead by default; that’s what syncdebug is for. It does what you proposed and more (and even more after this PR).
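
For reference, a minimal sketch of opting into syncdebug for a single run; the configuration key path (compiler → cuda → syncdebug) and the toy program are assumptions for illustration, and the mode trades performance for per-call synchronization and error checks.

```python
import numpy as np
import dace

@dace.program
def add_one(x: dace.float64[256]):
    x += 1

# Assumption: 'compiler', 'cuda', 'syncdebug' is the relevant configuration key.
# When enabled, the generated code synchronizes and checks for errors after every
# kernel launch and CUDA call, failing early and precisely instead of at the end.
dace.Config.set('compiler', 'cuda', 'syncdebug', value=True)

sdfg = add_one.to_sdfg()
sdfg.apply_gpu_transformations()

a = np.ones(256)
sdfg(x=a)  # requires a GPU at runtime
```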

@tbennun added the no-ci label Jun 26, 2023
@tbennun removed the no-ci label Jun 26, 2023
@tbennun (Collaborator, Author) commented Jun 27, 2023

@mcopik I followed your suggestion in the latest commit.

@tbennun marked this pull request as ready for review June 29, 2023 03:37
@tbennun requested a review from phschaad June 29, 2023 05:02
@phschaad (Collaborator) left a comment

LGTM, minor comments to consider.

Review comments (both resolved, outdated): dace/transformation/passes/fusion_inline.py and dace/runtime/include/dace/cuda/stream.cuh
@tbennun enabled auto-merge June 29, 2023 06:55
@tbennun merged commit 81b3e4e into master Jun 29, 2023 (9 checks passed)
@tbennun deleted the gpu-ux branch June 29, 2023 10:09