Simplify cuda_calc_xblock_count / cuda_calc_block_count via if constexpr by q10 · Pull Request #5783 · pytorch/FBGEMM

q10 · 2026-05-26T19:29:43Z

Summary:
The current cuda_block_count.h defines cuda_calc_xblock_count
as four SFINAE overloads (signed/unsigned x signed/unsigned for
the two integer parameters) plus a cuda_calc_xblock_count_base
helper -- five functions in total -- purely to suppress "pointless
comparison against zero" compiler warnings on unsigned integer types.
The header itself documents this rationale at lines 28-32 of the
pre-diff file:

"This system prevents 'pointless comparison against zero'
 warnings from the compiler for unsigned types (simpler ways
 of suppressing this warning didn't work) while maintaining the
 various warnings."
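
For readers unfamiliar with the pattern being removed, the overload tower looks roughly like the sketch below. This is an illustration of the SFINAE idiom only, not the pre-diff source: the names `old_style_xblock_count`, `old_style_base` and `sketch_check` (a stand-in for TORCH_CHECK), the check messages, and the extra non-zero check are all assumptions, and the grid-size cap is omitted for brevity.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for TORCH_CHECK so the sketch builds without PyTorch.
inline void sketch_check(bool ok, const char* msg) {
  if (!ok) {
    throw std::invalid_argument(msg);
  }
}

// Shared helper: overflow-safe ceil(num_items / threads_per_block).
template <typename I1, typename I2>
uint32_t old_style_base(I1 num_items, I2 threads_per_block) {
  sketch_check(threads_per_block != 0, "threads_per_block must be non-zero");
  const auto items = static_cast<uint64_t>(num_items);
  const auto threads = static_cast<uint64_t>(threads_per_block);
  return static_cast<uint32_t>(items / threads + (items % threads != 0 ? 1 : 0));
}

// signed x signed: both arguments get a >= 0 check.
template <typename I1, typename I2,
          std::enable_if_t<std::is_signed_v<I1> && std::is_signed_v<I2>, int> = 0>
uint32_t old_style_xblock_count(I1 num_items, I2 threads_per_block) {
  sketch_check(num_items >= 0, "num_items must be non-negative");
  sketch_check(threads_per_block >= 0, "threads_per_block must be non-negative");
  return old_style_base(num_items, threads_per_block);
}

// signed x unsigned: only num_items can be negative.
template <typename I1, typename I2,
          std::enable_if_t<std::is_signed_v<I1> && std::is_unsigned_v<I2>, int> = 0>
uint32_t old_style_xblock_count(I1 num_items, I2 threads_per_block) {
  sketch_check(num_items >= 0, "num_items must be non-negative");
  return old_style_base(num_items, threads_per_block);
}

// unsigned x signed: only threads_per_block can be negative.
template <typename I1, typename I2,
          std::enable_if_t<std::is_unsigned_v<I1> && std::is_signed_v<I2>, int> = 0>
uint32_t old_style_xblock_count(I1 num_items, I2 threads_per_block) {
  sketch_check(threads_per_block >= 0, "threads_per_block must be non-negative");
  return old_style_base(num_items, threads_per_block);
}

// unsigned x unsigned: no sign checks at all, so no "pointless comparison" warning.
template <typename I1, typename I2,
          std::enable_if_t<std::is_unsigned_v<I1> && std::is_unsigned_v<I2>, int> = 0>
uint32_t old_style_xblock_count(I1 num_items, I2 threads_per_block) {
  return old_style_base(num_items, threads_per_block);
}
```

Each overload exists only so that a `>= 0` comparison is never written against an unsigned type, which is why the count grows to four overloads plus one helper for two parameters.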

The "simpler ways didn't work" comment dates from before
if constexpr was widely available. With C++17/C++20 the entire
five-function tower collapses to one function template using
if constexpr to gate the signed-only >= 0 checks at compile
time. The unused branch is discarded entirely so no warning is
emitted on unsigned types.
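
A minimal sketch of the collapsed form, assuming C++17, is shown below. It is not the merged code: `sketch_xblock_count` and `sketch_check` (a stand-in for TORCH_CHECK) are hypothetical names, the messages and the non-zero check are assumptions, and the cap uses the 2^31-1 x-dimension grid limit that CUDA documents for current compute capabilities.

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for TORCH_CHECK so the sketch builds without PyTorch.
inline void sketch_check(bool ok, const char* msg) {
  if (!ok) {
    throw std::invalid_argument(msg);
  }
}

template <typename Integer1, typename Integer2>
uint32_t sketch_xblock_count(Integer1 num_items, Integer2 threads_per_block) {
  static_assert(
      std::is_integral_v<Integer1> && std::is_integral_v<Integer2>,
      "both arguments must be integer types");

  // The >= 0 comparisons are only instantiated for signed types. For unsigned
  // types the branch is discarded, so the compiler never sees the comparison.
  if constexpr (std::is_signed_v<Integer1>) {
    sketch_check(num_items >= 0, "num_items must be non-negative");
  }
  if constexpr (std::is_signed_v<Integer2>) {
    sketch_check(threads_per_block >= 0, "threads_per_block must be non-negative");
  }
  // Assumption of this sketch: guard against division by zero.
  sketch_check(threads_per_block != 0, "threads_per_block must be non-zero");

  // Maximum grid size in the x dimension: 2^31 - 1.
  constexpr uint64_t kMaxXBlocks = 2147483647ULL;
  const auto items = static_cast<uint64_t>(num_items);
  const auto threads = static_cast<uint64_t>(threads_per_block);
  // Overflow-safe variant of (items + threads - 1) / threads.
  const uint64_t blocks = items / threads + (items % threads != 0 ? 1 : 0);
  return static_cast<uint32_t>(std::min<uint64_t>(blocks, kMaxXBlocks));
}
```

With this sketch, `sketch_xblock_count(10, 4)` and `sketch_xblock_count(10u, 4u)` both return 3, and `sketch_xblock_count(-1, 32)` throws from the signed-only branch.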

cuda_calc_block_count (the y/z-dim wrapper that adds the 65535
cap) is similarly trimmed to a 4-line template that delegates to
cuda_calc_xblock_count.
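
The delegation can be sketched as follows. The names `sketch_xblocks` and `sketch_block_count` are hypothetical, and the x-dimension helper here is a simplified stand-in with input validation left out (it assumes non-negative `num_items` and positive `threads_per_block`); only the 65535 cap comes from the description above.

```cpp
#include <cstdint>

// Simplified stand-in for cuda_calc_xblock_count: overflow-safe ceil division
// capped at the x-dimension grid limit of 2^31 - 1. Validation is omitted.
template <typename Integer1, typename Integer2>
constexpr uint32_t sketch_xblocks(Integer1 num_items, Integer2 threads_per_block) {
  constexpr uint64_t kMaxXBlocks = 2147483647ULL;
  const auto items = static_cast<uint64_t>(num_items);
  const auto threads = static_cast<uint64_t>(threads_per_block);
  const uint64_t blocks = items / threads + (items % threads != 0 ? 1 : 0);
  return static_cast<uint32_t>(blocks < kMaxXBlocks ? blocks : kMaxXBlocks);
}

// y/z-dimension wrapper: same computation, then clamp to 65535.
template <typename Integer1, typename Integer2>
constexpr uint32_t sketch_block_count(Integer1 num_items, Integer2 threads_per_block) {
  constexpr uint32_t kMaxYZBlocks = 65535;
  const uint32_t blocks = sketch_xblocks(num_items, threads_per_block);
  return blocks < kMaxYZBlocks ? blocks : kMaxYZBlocks;
}
```

For example, `sketch_block_count(100, 32)` returns 4, while `sketch_block_count(1000000, 4)` would need 250000 blocks and is clamped to 65535.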

Net effect:

- Five functions reduced to two.
- File length: ~155 lines -> ~85 lines (~45% reduction).
- Public API unchanged: same names, same return types, same
  observable behaviour. TORCH_CHECK messages match the originals
  verbatim.
- Behaviour-preserving: every existing caller across fbgemm_gpu
  continues to compile and produces the same uint32_t result.

This is a prep diff for an upcoming change that introduces a
determine_grid_blocks helper (with a BlockCapPolicy enum) on
top of these primitives. Folding the SFINAE tower now keeps that
follow-up diff's helper signature minimal.

Reviewed By: spcyppt

Differential Revision: D106262731

meta-codesync · 2026-05-26T19:29:52Z

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106262731.

meta-codesync · 2026-05-26T20:26:12Z

This pull request has been merged in 15f7c24.

meta-cla Bot added the cla signed label May 26, 2026

meta-codesync Bot added fb-exported meta-exported labels May 26, 2026

meta-codesync Bot closed this in 15f7c24 May 26, 2026

facebook-github-tools Bot added the Merged label May 26, 2026

q10 wants to merge 1 commit into pytorch:main from q10:export-D106262731
