Simplify cuda_calc_xblock_count / cuda_calc_block_count via if constexpr#5783
Closed
q10 wants to merge 1 commit into
Closed
Simplify cuda_calc_xblock_count / cuda_calc_block_count via if constexpr#5783q10 wants to merge 1 commit into
q10 wants to merge 1 commit into
Conversation
Summary:
The current `cuda_block_count.h` defines `cuda_calc_xblock_count`
as **four SFINAE overloads** (signed/unsigned x signed/unsigned for
the two integer parameters) plus a `cuda_calc_xblock_count_base`
helper -- five functions in total -- purely to suppress "pointless
comparison against zero" compiler warnings on unsigned integer types.
The header itself documents this rationale at lines 28-32 of the
pre-diff file:
"This system prevents 'pointless comparison against zero'
warnings from the compiler for unsigned types (simpler ways
of suppressing this warning didn't work) while maintaining the
various warnings."
The "simpler ways didn't work" comment dates from before
`if constexpr` was widely available. With C++17/C++20 the entire
five-function tower collapses to **one** function template using
`if constexpr` to gate the signed-only `>= 0` checks at compile
time. The unused branch is discarded entirely so no warning is
emitted on unsigned types.
`cuda_calc_block_count` (the y/z-dim wrapper that adds the 65535
cap) is similarly trimmed to a 4-line template that delegates to
`cuda_calc_xblock_count`.
Net effect:
- Five functions reduced to two.
- File length: ~155 lines -> ~85 lines (~45% reduction).
- Public API unchanged: same names, same return types, same
observable behaviour. TORCH_CHECK messages match the originals
verbatim.
- Behaviour-preserving: every existing caller across fbgemm_gpu
continues to compile and produces the same `uint32_t` result.
This is a prep diff for an upcoming change that introduces a
`determine_grid_blocks` helper (with a `BlockCapPolicy` enum) on
top of these primitives. Folding the SFINAE tower now keeps that
follow-up diff's helper signature minimal.
Reviewed By: spcyppt
Differential Revision: D106262731
Contributor
|
@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106262731. |
Contributor
|
This pull request has been merged in 15f7c24. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
The current
cuda_block_count.hdefinescuda_calc_xblock_countas four SFINAE overloads (signed/unsigned x signed/unsigned for
the two integer parameters) plus a
cuda_calc_xblock_count_basehelper -- five functions in total -- purely to suppress "pointless
comparison against zero" compiler warnings on unsigned integer types.
The header itself documents this rationale at lines 28-32 of the
pre-diff file:
The "simpler ways didn't work" comment dates from before
if constexprwas widely available. With C++17/C++20 the entirefive-function tower collapses to one function template using
if constexprto gate the signed-only>= 0checks at compiletime. The unused branch is discarded entirely so no warning is
emitted on unsigned types.
cuda_calc_block_count(the y/z-dim wrapper that adds the 65535cap) is similarly trimmed to a 4-line template that delegates to
cuda_calc_xblock_count.Net effect:
observable behaviour. TORCH_CHECK messages match the originals
verbatim.
continues to compile and produces the same
uint32_tresult.This is a prep diff for an upcoming change that introduces a
determine_grid_blockshelper (with aBlockCapPolicyenum) ontop of these primitives. Folding the SFINAE tower now keeps that
follow-up diff's helper signature minimal.
Reviewed By: spcyppt
Differential Revision: D106262731