[Inductor] Restore original dtype for rank-0 CPU tensors #166118

blaine-rister · 2025-10-23T07:59:59Z

Problem

Inductor implicitly upcasts certain rank-0 kernel arguments from float16 to float32. Currently, this happens only for tensors on the "cpu" device, which appears to be related to float16 support in CPU Triton. It can nonetheless affect GPU kernels when a model contains tensors from multiple devices. Since upcasting may be undesirable on some platforms, users can typically disable it with the config.triton.codegen_upcast_to_fp32 flag. However, the rank-0 kernel argument codepath did not respect this flag.
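The interaction described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Inductor's implementation: dtypes are plain strings, type promotion is reduced to "pick the wider float", and `promote` and `rank0_arg_dtype_before_fix` are hypothetical helpers. It shows how a float16 GPU computation can end up in float32 when a rank-0 float16 CPU argument is upcast even though the flag is off.

```python
# Toy model of the pre-fix behavior. The helpers below are hypothetical
# and only illustrate the dtype flow, not Inductor's real codegen.

_WIDTH = {"float16": 16, "float32": 32}


def promote(a: str, b: str) -> str:
    """Simplified type promotion: return the wider floating-point dtype."""
    return a if _WIDTH[a] >= _WIDTH[b] else b


def rank0_arg_dtype_before_fix(dtype: str, device: str, upcast_to_fp32: bool) -> str:
    """Dtype a rank-0 kernel argument has inside the kernel, before the fix.

    `upcast_to_fp32` stands in for config.triton.codegen_upcast_to_fp32.
    It is deliberately unused: this codepath ignored the flag.
    """
    if device == "cpu" and dtype == "float16":
        return "float32"
    return dtype


# A float16 GPU tensor combined with a rank-0 float16 CPU tensor,
# with upcasting disabled by the user.
scalar = rank0_arg_dtype_before_fix("float16", "cpu", upcast_to_fp32=False)
print(promote("float16", scalar))  # float32
```

Even with the flag set to False, the computation is promoted to float32 in this model, which mirrors the mismatch the PR describes.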

Through an improbable series of events, float32 upcasting caused an internal model to fail compilation on MTIA. (Internal reviewers see T242444110.)

Fix

If config.triton.codegen_upcast_to_fp32 evaluates to False, cast the upcast kernel argument back to its original dtype.
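A hypothetical toy model of the fixed behavior: the implicit upcast is kept, but when the flag is off the argument is cast back to its original dtype. Dtypes are plain strings and `rank0_arg_dtype_after_fix` and `promote` are illustrative names assumed here, not functions in Inductor.

```python
# Toy model of the post-fix behavior; not Inductor's real codegen.

_WIDTH = {"float16": 16, "float32": 32}


def promote(a: str, b: str) -> str:
    """Simplified type promotion: return the wider floating-point dtype."""
    return a if _WIDTH[a] >= _WIDTH[b] else b


def rank0_arg_dtype_after_fix(dtype: str, device: str, upcast_to_fp32: bool) -> str:
    """Dtype a rank-0 kernel argument has inside the kernel, after the fix.

    `upcast_to_fp32` stands in for config.triton.codegen_upcast_to_fp32.
    """
    result = dtype
    if device == "cpu" and dtype == "float16":
        result = "float32"  # the implicit upcast still happens
        if not upcast_to_fp32:
            result = dtype  # the fix: restore the original dtype
    return result


for flag in (True, False):
    scalar = rank0_arg_dtype_after_fix("float16", "cpu", upcast_to_fp32=flag)
    print(flag, promote("float16", scalar))
# True float32
# False float16
```

Keeping the upcast and adding a cast back, rather than skipping the upcast, matches the PR's description of an implicit float32 upcast followed by an explicit float16 downcast.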

Test plan

Added a new CI test that checks for the downcast if and only if config.triton.codegen_upcast_to_fp32 is False. The test mixes GPU and CPU tensors to generate a GPU kernel with the implicit float32 upcast and the explicit float16 downcast.
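The "if and only if" structure of that check can be sketched against a stand-in code generator. `fake_rank0_codegen` and the Triton-style lines it emits are hypothetical and purely illustrative; the actual CI test works on a kernel compiled from a model that mixes GPU and CPU tensors.

```python
def fake_rank0_codegen(name: str, upcast_to_fp32: bool) -> str:
    """Stand-in for codegen of a rank-0 float16 CPU argument.

    Emits illustrative Triton-style lines; this is not real Inductor output.
    """
    lines = [f"{name} = {name}.to(tl.float32)"]  # implicit upcast
    if not upcast_to_fp32:
        lines.append(f"{name} = {name}.to(tl.float16)")  # explicit downcast
    return "\n".join(lines)


def check_downcast_iff_flag_false() -> None:
    """Assert the downcast appears exactly when upcasting is disabled."""
    for flag in (True, False):
        source = fake_rank0_codegen("arg0", upcast_to_fp32=flag)
        has_downcast = ".to(tl.float16)" in source
        assert has_downcast == (not flag), (flag, source)


check_downcast_iff_flag_false()
print("ok")
```

Checking both flag values in one loop covers both directions of the "iff": no downcast when upcasting is enabled, and a downcast when it is disabled.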

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

pytorch-bot · 2025-10-23T08:00:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166118

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 5365419 with merge base b4fd471:

NEW FAILURE - The following job has failed:

trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)
RuntimeError: doctests 1/1 failed!

This comment was automatically generated by Dr. CI and updates every 15 minutes.

blaine-rister · 2025-10-24T16:34:12Z

@pytorchbot merge -i

pytorchmergebot · 2025-10-24T16:37:49Z

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot · 2025-10-24T17:31:00Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)


blaine-rister · 2025-10-24T19:51:07Z

@pytorchbot merge -i

pytorchmergebot · 2025-10-24T19:53:44Z

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


Commit ae7362e: avoid upcast for 0d cpu tensor

pytorch-bot added the ciflow/inductor and module: inductor labels Oct 23, 2025

Commit 5365419: comment

blaine-rister added the topic: not user facing label Oct 23, 2025

blaine-rister requested review from eellison, jansel and kundaMwiza and removed request for jansel October 23, 2025 22:11

blaine-rister marked this pull request as ready for review October 23, 2025 23:40

blaine-rister changed the title from "avoid upcast for 0d cpu tensor" to "[Inductor] Restore dtype for rank-0 CPU tensors" Oct 23, 2025

blaine-rister changed the title from "[Inductor] Restore dtype for rank-0 CPU tensors" to "[Inductor] Restore original dtype for rank-0 CPU tensors" Oct 23, 2025

jfix71 approved these changes Oct 24, 2025

jansel approved these changes Oct 24, 2025

kundaMwiza approved these changes Oct 24, 2025

pytorch-bot added the ciflow/trunk label Oct 24, 2025

pytorchmergebot added the merging label Oct 24, 2025

pytorchmergebot removed the merging label Oct 24, 2025

pytorchmergebot added the merging label Oct 24, 2025

pytorchmergebot closed this in 0442125 Oct 24, 2025

pytorchmergebot added Merged and removed merging labels Oct 24, 2025
