[Inductor] Add 0 initialization to Triton masked loads #127311

alexbaden · 2024-05-28T16:51:14Z

For a masked `tl.load` operation, the Triton language specifies that masked-out values (i.e. where the mask evaluates to false) are undefined in the output of the load. Triton provides an optional `other` parameter which, when included, supplies an explicit value for those masked-out elements. If the output of a masked load without `other` is used in a conditional, unexpected behavior can occur.
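To make the semantics concrete, here is a minimal plain-Python model of a masked load. It is not Triton code: `masked_load` is a hypothetical helper, and the random value only stands in for "undefined" (a real backend may return anything there, including 0).

```python
import random


def masked_load(memory, offsets, mask, other=None):
    """Toy model of a masked load (hypothetical helper, not a Triton API).

    Lanes whose mask is False never touch memory. They receive `other`
    if it was given; otherwise an arbitrary value models "undefined".
    """
    out = []
    for off, keep in zip(offsets, mask):
        if keep:
            out.append(memory[off])
        elif other is not None:
            out.append(other)
        else:
            # Assumption: any value may appear here, so pick one at random.
            out.append(random.randint(-1000, 1000))
    return out


memory = [10, 20, 30]
offsets = list(range(4))              # the last offset is out of bounds
mask = [off < len(memory) for off in offsets]

safe = masked_load(memory, offsets, mask, other=0)   # [10, 20, 30, 0]
risky = masked_load(memory, offsets, mask)           # last lane is arbitrary
```

A later conditional such as `value != 0` is deterministic on `safe` but can go either way on the last lane of `risky`, which is the situation described above.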

Despite the language specification, all Triton backends currently used by PyTorch Inductor (NVIDIA, AMD, and Intel) 0-initialize masked loads if `other` is not present. (We recently changed the Intel backend behavior to match NVIDIA and AMD because that's what our users expect, even if we are not following the Triton spec to a T.) This PR attempts to "future-proof" Inductor against new backends, or changes in the current ones, by adding an explicit `other` wherever a later conditional depends on the `tl.load` output. We did not see any performance change from 0-initializing in the Intel XPU backend, but one could imagine compiler optimizations that remove paths depending on undefined values.

I also removed an exception to the `other` behavior for boolean loads, which was put in place for a Triton bug that should now be fixed. I added `other` to the getting started documentation as a hint that masked loads require explicit initialization, even though I don't expect `undef` values to cause the example code to fail if the underlying output is not 0-initialized. Finally, I added `other` to the `make_load` function in `select_algorithm.py`, though I wasn't able to determine whether that function is actually called.
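As a rough illustration of the idea (not the actual Inductor codegen), the sketch below emits the source text of a `tl.load` call and appends `other` only when the loaded value feeds a later conditional. The helper name `emit_load`, its parameters, and the pointer and mask expressions are all assumptions made for this example.

```python
def emit_load(ptr_expr, mask_expr=None, feeds_conditional=False, other="0.0"):
    """Return source text for a tl.load call.

    Hypothetical helper for illustration only; Inductor's real code
    generation is structured differently.
    """
    if mask_expr is None:
        # Unmasked loads have no undefined lanes, so `other` is not needed.
        return f"tl.load({ptr_expr})"
    if feeds_conditional:
        # A later conditional reads this value: pin masked-out lanes.
        return f"tl.load({ptr_expr}, {mask_expr}, other={other})"
    return f"tl.load({ptr_expr}, {mask_expr})"


line = emit_load("in_ptr0 + (x0)", "xmask", feeds_conditional=True)
# line == "tl.load(in_ptr0 + (x0), xmask, other=0.0)"
```

Restricting `other` to loads that feed conditionals keeps the generated kernels unchanged wherever undefined lanes cannot affect the result.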

Fixes #126535

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-05-28T16:51:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127311

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Upgrade MacOS runner to 14

✅ You can merge normally! (2 Unrelated Failures)

As of commit 11e6287 with merge base cbb79a2:

UNSTABLE - The following jobs failed, but the failures were likely due to flakiness present on trunk, and the jobs have been marked as unstable:

inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf, 1, 1, linux.24xl.spr-metal, unstable) (gh) (#126993)
Process completed with exit code 1.
pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 4, 5, linux.g5.4xlarge.nvidia.gpu, unstable) (gh)
inductor/test_efficient_conv_bn_eval.py::EfficientConvBNEvalCudaTests::test_basic_cuda

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jansel · 2024-05-29T21:07:26Z

@pytorchbot merge

pytorchmergebot · 2024-05-29T21:09:33Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-05-29T22:15:39Z

Merge failed

Reason: 1 job has failed: trunk / macos-13-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Details for Dev Infra team

Raised by workflow job

jansel · 2024-05-30T03:56:21Z

@pytorchbot merge -i

pytorchmergebot · 2024-05-30T03:58:09Z

Merge started

Your change will be merged while ignoring the following 2 checks: pull / linux-focal-cuda12.4-py3.10-gcc9-sm86 / test (default, 4, 5, linux.g5.4xlarge.nvidia.gpu, unstable), inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf, 1, 1, linux.24xl.spr-metal, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Pull Request resolved: #127311
Approved by: https://github.com/jansel

Add 0 initialization to masked loads

17ca693

pytorch-bot bot added the module: inductor label May 28, 2024

pytorchbot added the open source label May 28, 2024

link fixup

11e6287

soulitzer requested a review from jansel May 29, 2024 15:18

soulitzer added the triaged label May 29, 2024

jansel added release notes: inductor ciflow/inductor labels May 29, 2024

jansel approved these changes May 29, 2024


pytorch-bot bot added the ciflow/trunk label May 29, 2024

pytorchmergebot added the merging label May 29, 2024

pytorchmergebot removed the merging label May 29, 2024

pytorchmergebot added the merging label May 30, 2024

pytorchmergebot added the Merged label May 30, 2024

pytorchmergebot closed this in 5d316c8 May 30, 2024

pytorchmergebot removed the merging label May 30, 2024
