@blaine-rister
Contributor

@blaine-rister blaine-rister commented Oct 23, 2025

Problem

Inductor implicitly upcasts certain rank-0 kernel arguments from float16 to float32. Currently, this happens only for tensors on the "cpu" device, which appears to be related to float16 support in CPU Triton. However, it can also affect the behavior of GPU kernels when a model contains tensors from multiple devices. Upcasting may be undesirable on some platforms, so users can typically disable it with the config.triton.codegen_upcast_to_fp32 flag. However, the rank-0 kernel argument codepath did not respect this flag.

Through an improbable series of events, float32 upcasting caused an internal model to fail compilation on MTIA. (Internal reviewers see T242444110.)

Fix

If config.triton.codegen_upcast_to_fp32 evaluates to False, cast the kernel argument back to its original dtype.
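To make the intended behavior concrete, here is a minimal pure-Python sketch of the decision. It is not the actual Inductor code: the names Rank0Arg, load_expr and TRITON_TYPES are hypothetical, and the emitted string only approximates what a Triton kernel line might look like. It assumes the rank-0 argument has already been upcast to float32 by the time it reaches the kernel.

```python
from dataclasses import dataclass

# Hypothetical mapping from dtype names to Triton type strings (illustration only).
TRITON_TYPES = {"float16": "tl.float16", "float32": "tl.float32"}


@dataclass(frozen=True)
class Rank0Arg:
    """A rank-0 kernel argument, assumed to arrive in the kernel as float32."""

    name: str
    dtype: str  # the dtype the tensor had before the implicit upcast


def load_expr(arg: Rank0Arg, codegen_upcast_to_fp32: bool) -> str:
    """Return the expression a kernel would use to read the argument.

    With the flag set, the float32 value is used as-is. With the flag
    cleared, the value is cast back to the argument's original dtype.
    """
    if codegen_upcast_to_fp32 or arg.dtype == "float32":
        return arg.name
    return f"{arg.name}.to({TRITON_TYPES[arg.dtype]})"


arg = Rank0Arg(name="in_ptr1", dtype="float16")
print(load_expr(arg, codegen_upcast_to_fp32=True))   # in_ptr1
print(load_expr(arg, codegen_upcast_to_fp32=False))  # in_ptr1.to(tl.float16)
```

The point of the sketch is only that the downcast is emitted exactly when the flag is False and the original dtype is not already float32.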

Test plan

Added a new CI test that checks that the downcast is emitted if and only if the config flag is False. The test mixes GPU and CPU tensors to generate a GPU kernel with the implicit float32 upcast and the explicit float16 downcast.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@pytorch-bot

pytorch-bot bot commented Oct 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166118

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5365419 with merge base b4fd471:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@blaine-rister blaine-rister added the "topic: not user facing" label Oct 23, 2025
@blaine-rister blaine-rister requested review from eellison, jansel and kundaMwiza and removed request for jansel October 23, 2025 22:11
@blaine-rister blaine-rister marked this pull request as ready for review October 23, 2025 23:40
@blaine-rister blaine-rister changed the title from "avoid upcast for 0d cpu tensor" to "[Inductor] Restore dtype for rank-0 CPU tensors" Oct 23, 2025
@blaine-rister blaine-rister changed the title from "[Inductor] Restore dtype for rank-0 CPU tensors" to "[Inductor] Restore original dtype for rank-0 CPU tensors" Oct 23, 2025
@blaine-rister
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the "ciflow/trunk" label Oct 24, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Details for Dev Infra team: raised by workflow job.

@blaine-rister
Contributor Author

@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)
