Simplify cuda_calc_xblock_count / cuda_calc_block_count via if constexpr #5783

Closed
q10 wants to merge 1 commit into pytorch:main from q10:export-D106262731

@q10 q10 commented May 26, 2026

Summary:
The current `cuda_block_count.h` defines `cuda_calc_xblock_count`
as **four SFINAE overloads** (signed/unsigned x signed/unsigned for
the two integer parameters) plus a `cuda_calc_xblock_count_base`
helper -- five functions in total -- purely to suppress "pointless
comparison against zero" compiler warnings on unsigned integer types.
The header itself documents this rationale at lines 28-32 of the
pre-diff file:

    "This system prevents 'pointless comparison against zero'
     warnings from the compiler for unsigned types (simpler ways
     of suppressing this warning didn't work) while maintaining the
     various warnings."

The "simpler ways didn't work" comment dates from before
`if constexpr` was widely available. With C++17 and later, the entire
five-function tower collapses to **one** function template that uses
`if constexpr` to gate the signed-only `>= 0` checks at compile
time. Because the condition depends on the template parameters, the
discarded branch is never instantiated, so no warning is emitted on
unsigned types.
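
For illustration, the single-template shape described above might look roughly like the following. This is a minimal standalone sketch, not the code from this diff: the name `xblock_count_sketch`, the error messages, the zero-thread check, and the 2^31 - 1 cap are assumptions, and `std::invalid_argument` stands in for `TORCH_CHECK` so the example builds without PyTorch.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical sketch, not the FBGEMM source. std::invalid_argument stands in
// for TORCH_CHECK, and the messages below are illustrative only.
template <typename Integer1, typename Integer2>
std::uint32_t xblock_count_sketch(Integer1 num_items, Integer2 threads_per_block) {
  static_assert(
      std::is_integral_v<Integer1> && std::is_integral_v<Integer2>,
      "both arguments must be integer types");

  // Signed-only checks. For unsigned types these branches are discarded and
  // never instantiated, so the compiler never sees an `unsigned < 0` test.
  if constexpr (std::is_signed_v<Integer1>) {
    if (num_items < 0) {
      throw std::invalid_argument("num_items must be non-negative");
    }
  }
  if constexpr (std::is_signed_v<Integer2>) {
    if (threads_per_block < 0) {
      throw std::invalid_argument("threads_per_block must be non-negative");
    }
  }
  // Assumed extra guard so the division below is well defined.
  if (threads_per_block == 0) {
    throw std::invalid_argument("threads_per_block must be non-zero");
  }

  const auto n = static_cast<std::uint64_t>(num_items);
  const auto t = static_cast<std::uint64_t>(threads_per_block);
  // Overflow-safe ceiling division, equivalent to (n + t - 1) / t.
  const std::uint64_t blocks = n / t + (n % t != 0 ? 1 : 0);
  // Assumed cap: the CUDA x grid dimension limit of 2^31 - 1 blocks.
  constexpr std::uint64_t kMaxXBlocks = 2147483647ULL;
  return static_cast<std::uint32_t>(std::min(blocks, kMaxXBlocks));
}
```

A call such as `xblock_count_sketch(10, 4)` takes both signed branches, while `xblock_count_sketch(std::uint64_t{8}, 4u)` compiles with neither, which is the property the four SFINAE overloads were emulating.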

`cuda_calc_block_count` (the y/z-dim wrapper that adds the 65535
cap) is similarly trimmed to a 4-line template that delegates to
`cuda_calc_xblock_count`.
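
A delegating wrapper of this kind could be sketched as below. The names `xblock_count_stub` and `block_count_sketch` are hypothetical, and the stub is a simplified stand-in for the x-dimension primitive with no argument checks; only the 65535 cap and the delegation pattern come from the description above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the x-dimension primitive: plain ceiling division
// with no argument checks, just enough for the wrapper below to compile.
template <typename Integer1, typename Integer2>
std::uint32_t xblock_count_stub(Integer1 num_items, Integer2 threads_per_block) {
  const auto n = static_cast<std::uint64_t>(num_items);
  const auto t = static_cast<std::uint64_t>(threads_per_block);
  const std::uint64_t blocks = n / t + (n % t != 0 ? 1 : 0);
  // Assumed x-dimension cap of 2^31 - 1 blocks.
  return static_cast<std::uint32_t>(
      std::min<std::uint64_t>(blocks, 2147483647ULL));
}

// Sketch of the y/z-dimension wrapper: delegate, then clamp to 65535.
template <typename Integer1, typename Integer2>
std::uint32_t block_count_sketch(Integer1 num_items, Integer2 threads_per_block) {
  constexpr std::uint32_t kMaxYZBlocks = 65535;
  return std::min(xblock_count_stub(num_items, threads_per_block), kMaxYZBlocks);
}
```

Keeping the cap in the wrapper means the validation logic lives in one place, so the y/z variant differs from the x variant only by the clamp.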

Net effect:
- Five functions reduced to two.
- File length: ~155 lines -> ~85 lines (~45% reduction).
- Public API unchanged: same names, same return types, same
  observable behaviour. TORCH_CHECK messages match the originals
  verbatim.
- Behaviour-preserving: every existing caller across fbgemm_gpu
  continues to compile and produces the same `uint32_t` result.

This is a prep diff for an upcoming change that introduces a
`determine_grid_blocks` helper (with a `BlockCapPolicy` enum) on
top of these primitives. Folding the SFINAE tower now keeps that
follow-up diff's helper signature minimal.

Reviewed By: spcyppt

Differential Revision: D106262731
@meta-cla meta-cla Bot added the cla signed label May 26, 2026

meta-codesync Bot commented May 26, 2026

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106262731.


meta-codesync Bot commented May 26, 2026

This pull request has been merged in 15f7c24.
