Refactor bounds_check_indices offset checks to condition-first (Phase 1) (#5682) by gchalump · Pull Request #5682 · pytorch/FBGEMM

gchalump · 2026-04-23T17:43:48Z

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/2624

Phase 1 of the bounds_check_indices race-condition cleanup stack.

This is a pure refactor — no behavior change. The per-b_t offset validation in both v1 and v2 CUDA kernels is rewritten from mode-first to condition-first.

Before (mode-first):

if (FATAL) { asserts }
else if (WARNING) { if (bad) { warn + adjust } }
else if (IGNORE) { adjust unconditionally }

After (condition-first):

if (bad) {
  if (FATAL) { asserts }
  else { if (WARNING) { warn } adjust }
}

Why: the mode-first form had IGNORE call adjust_offset_kernel on every (b, t) pair, even when offsets were valid — wasted writes plus needless contention on the offsets buffer. Condition-first runs adjustment only when bounds are actually violated.
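The difference in write traffic can be illustrated with a small host-side model. This is only a sketch, not the kernel code: it runs sequentially on the CPU, and `is_bad`, `adjust`, `Stats`, and `run` are hypothetical names. It assumes the validity condition is `start < 0 || start > end || end > num_indices` and that the adjustment clamps both offsets and always writes them back. A counter stands in for the device asserts, and another for the warning output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Mode { FATAL, WARNING, IGNORE };

struct Stats { int writes = 0; int warnings = 0; int fatals = 0; };
struct Result { Stats stats; std::vector<int64_t> offsets; };

// Assumed validity condition for one (start, end) offset pair.
bool is_bad(int64_t start, int64_t end, int64_t num_indices) {
  return start < 0 || start > end || end > num_indices;
}

// Stand-in for the offset adjustment: clamp into range and count one write-back.
void adjust(int64_t& start, int64_t& end, int64_t num_indices, Stats& s) {
  start = std::max<int64_t>(0, std::min(start, num_indices));
  end = std::max(start, std::min(end, num_indices));
  ++s.writes;
}

// Mode-first: branch on the mode, then (for some modes) on the condition.
void check_mode_first(Mode m, int64_t& start, int64_t& end, int64_t n, Stats& s) {
  if (m == Mode::FATAL) {
    if (is_bad(start, end, n)) ++s.fatals;  // stands in for the asserts
  } else if (m == Mode::WARNING) {
    if (is_bad(start, end, n)) { ++s.warnings; adjust(start, end, n, s); }
  } else {  // IGNORE: adjusts even when the pair is valid
    adjust(start, end, n, s);
  }
}

// Condition-first: nothing happens unless the pair is actually bad.
void check_condition_first(Mode m, int64_t& start, int64_t& end, int64_t n, Stats& s) {
  if (!is_bad(start, end, n)) return;
  if (m == Mode::FATAL) { ++s.fatals; return; }
  if (m == Mode::WARNING) ++s.warnings;
  adjust(start, end, n, s);
}

// Walk consecutive offset pairs sequentially and apply one of the two forms.
Result run(bool condition_first, Mode m, std::vector<int64_t> offsets, int64_t n) {
  Stats s;
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    int64_t start = offsets[i], end = offsets[i + 1];
    if (condition_first) check_condition_first(m, start, end, n, s);
    else check_mode_first(m, start, end, n, s);
    offsets[i] = start;
    offsets[i + 1] = end;
  }
  return {s, offsets};
}
```

With offsets `{0, 2, 5, 9, 7}` and `num_indices = 8`, two of the four pairs are bad. In IGNORE mode the mode-first form performs four write-backs and the condition-first form performs two, and both end with the same offsets `{0, 2, 5, 8, 8}`.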

Stack:

(this) Phase 1 — Condition-first refactor
Phase 1.5 — Gate offset correction on lane 0 + shfl_sync broadcast
Phase 2 — Make bad offsets unconditionally fatal

Reviewed By: q10

Differential Revision: D101718260

meta-codesync · 2026-04-23T17:43:56Z

@gchalump has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101718260.

Summary: This diff bundles the two pieces needed to benchmark the bounds_check_indices stack across architectures, so a single diff number can be forwarded to collaborators (NVIDIA A100/H100/B200, AMD MI300/MI350) for cross-arch perf testing before any of the stack lands.

# Part 1: --bounds-check-version flag

Adds a `--bounds-check-version` CLI flag to the `bounds-check-indices` subcommand of `fbcode/deeplearning/fbgemm/fbgemm_gpu/bench/tbe/tbe_utils_benchmark.py`, plumbed through to `torch.ops.fbgemm.bounds_check_indices` as the `bounds_check_version` kwarg (which already exists in the C++ op signature). Without this, v2 was only reachable by either flipping the `BOUNDS_CHECK_INDICES_V2` JK gate (requires JK perms; not available to OSS/external runners) or manually editing the bench file inline (fragile, doesn't survive `sl goto`). Default `1` preserves existing bench behavior (no behavior change for any current caller).

# Part 2: portable bench tooling

Three bash scripts in `fbcode/ai_codesign/nonprod/gchalump/scripts/fbgemm/`:

1. **run_bounds_check_indices_bench.sh**: inner runner. Sweeps {modes} × {versions} × {trials} on the currently-checked-out commit. Captures `system.json` (GPU/driver/CUDA/host/checkout). Mirrors the arch dispatch (`-c h100`/`-c mi350`/`-c b200`) and `--gpu N` (sets both `CUDA_VISIBLE_DEVICES` and `HIP_VISIBLE_DEVICES`) conventions from supadchaya's existing TBE bench scripts. Auto-falls-back to v1-only on checkouts that predate Part 1.
2. **run_bounds_check_indices_on_stack.sh**: outer driver. Loops `sl goto` across {Baseline, Phase 0.5, 1, 1.5, 2, 3}, runs the inner per phase, restores original checkout via EXIT trap. Forwards all bench flags. Pre-pulls all diffs to avoid mid-sweep network calls. Auto-runs the analyzer at the end.
3. **analyze_bounds_check_results.sh**: log parser + reporter (pure bash). Emits `results.csv` and `summary.md` (median per cell with Δ T vs Baseline, per kernel version). Uploads both to pastry by default.

# Why folded into one diff

Originally Part 1 was the bottom of the bounds_check stack and Part 2 was a parallel branch off master (D102682427, now abandoned). Folded into one diff so collaborators only need to remember one diff number to reproduce the full benchmark.

Differential Revision: D102675932
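The "median per cell with Δ vs Baseline" reduction in the analyzer description can be sketched as follows. The actual analyzer is described as pure bash; this C++ version is only an illustration of the arithmetic. The function names are hypothetical, and reporting the delta as a relative percentage is an assumption, since the summary does not say which form the delta takes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Median of the trial timings for one cell (one mode/version/phase combination).
// Takes the vector by value so the caller's data is not reordered.
// Returns 0.0 for an empty input.
double median(std::vector<double> xs) {
  if (xs.empty()) return 0.0;
  std::sort(xs.begin(), xs.end());
  const std::size_t mid = xs.size() / 2;
  return xs.size() % 2 ? xs[mid] : (xs[mid - 1] + xs[mid]) / 2.0;
}

// Relative change of a phase's median against the baseline median, in percent.
// Negative means the phase is faster than the baseline.
double delta_percent(const std::vector<double>& baseline,
                     const std::vector<double>& phase) {
  const double b = median(baseline);
  if (b == 0.0) return 0.0;  // avoid dividing by zero on missing data
  return (median(phase) - b) / b * 100.0;
}
```

Using the median rather than the mean keeps a single slow trial, for example a warm-up run, from dominating a cell.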

meta-codesync · 2026-05-01T00:51:16Z

This pull request has been merged in b6f2db1.

meta-cla Bot added the cla signed label Apr 23, 2026

meta-codesync Bot added fb-exported meta-exported labels Apr 23, 2026

gchalump added 2 commits April 28, 2026 16:24

meta-codesync Bot changed the title ~~Refactor bounds_check_indices offset checks to condition-first (Phase 1)~~ Refactor bounds_check_indices offset checks to condition-first (Phase 1) (#5682) Apr 28, 2026

gchalump force-pushed the export-D101718260 branch from 56a9d90 to e53328b Compare April 28, 2026 23:24

meta-codesync Bot closed this in b6f2db1 May 1, 2026

facebook-github-tools Bot added the Merged label May 1, 2026

gchalump added category:improvement contributor:Meta feature:tbessd labels May 14, 2026

gchalump wants to merge 2 commits into pytorch:main from gchalump:export-D101718260
