Gate division bitwise numerics under a flag #165566

PaulZhang12 · 2025-10-15T17:41:39Z

Stack from ghstack (oldest at bottom):

-> Gate division bitwise numerics under a flag #165566

#164144 ensures that division under compile is bitwise equivalent to eager. However, kernel performance regressed in #164301.

On B200:
With standard Triton `/`: 6511 GB/s
With Triton `div_rn`: 4692 GB/s
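To put the two bandwidth figures in relative terms, here is a small sketch of the arithmetic. The helper name `relative_slowdown` is made up for illustration and is not part of PyTorch or Triton; only the two GB/s values come from the measurements above.

```python
def relative_slowdown(baseline_gbps: float, candidate_gbps: float) -> float:
    """Fraction of the baseline bandwidth lost by the candidate kernel."""
    return 1.0 - candidate_gbps / baseline_gbps


# B200 numbers quoted in this PR description.
standard_div = 6511.0  # GB/s with standard Triton `/`
div_rn = 4692.0        # GB/s with Triton `div_rn`

slowdown = relative_slowdown(standard_div, div_rn)
speedup = standard_div / div_rn

print(f"div_rn loses {slowdown:.1%} of the bandwidth")  # about 27.9%
print(f"standard / is {speedup:.2f}x faster")           # about 1.39x
```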

The generated PTX needs further investigation to explain such a large slowdown. For now, bitwise-equivalent division is enabled only under `TORCHINDUCTOR_EMULATE_DIVISION_ROUNDING`, similar to `emulate_precision_cast`.
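A minimal sketch of what gating the division lowering behind the environment variable could look like. This is not the code in this PR: `emulate_division_rounding` and `codegen_truediv` are hypothetical helpers, and treating only the value "1" as enabled is an assumption. The environment variable name and the use of `div_rn` come from the description.

```python
import os

# Name taken from the PR description.
ENV_VAR = "TORCHINDUCTOR_EMULATE_DIVISION_ROUNDING"


def emulate_division_rounding() -> bool:
    # Assumption: the flag counts as enabled only when set to "1".
    return os.environ.get(ENV_VAR, "0") == "1"


def codegen_truediv(x: str, y: str) -> str:
    """Hypothetical helper: return a Triton source expression for x / y."""
    if emulate_division_rounding():
        # Round-to-nearest division, intended to match eager bit for bit.
        return f"tl.div_rn({x}, {y})"
    # Default: standard Triton division, the faster path in the B200 numbers.
    return f"({x} / {y})"


if __name__ == "__main__":
    print(codegen_truediv("tmp0", "tmp1"))
```

Under these assumptions, the bitwise-equivalent path stays opt-in, so the default path keeps the faster standard division.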

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

[ghstack-poisoned]

pytorch-bot · 2025-10-15T17:41:42Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165566

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3821222 with merge base f58f301:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: c28b5fc Pull Request resolved: #165566

ghstack-source-id: 992ee00 Pull Request resolved: #165566

ghstack-source-id: f6dc638 Pull Request resolved: #165566

eellison

Too bad this is slow. Is it crazy to change PyTorch eager to use approx?

PaulZhang12 · 2025-10-15T20:54:31Z

@pytorchbot merge

pytorchmergebot · 2025-10-15T20:56:30Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Gate division bitwise numerics under a flag

7d68c4f

[ghstack-poisoned]

pytorch-bot bot added the ciflow/inductor and module: inductor labels Oct 15, 2025

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Gate division bitwise numerics under a flag

6e44203

ghstack-source-id: c28b5fc Pull Request resolved: #165566

Update on "Gate division bitwise numerics under a flag"

cdede6d

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov coconutruben [ghstack-poisoned]

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Gate division bitwise numerics under a flag

98e959d

ghstack-source-id: 992ee00 Pull Request resolved: #165566

PaulZhang12 added the topic: not user facing label (topic category) Oct 15, 2025

PaulZhang12 requested review from eellison and ngimel October 15, 2025 18:07

PaulZhang12 added a commit that referenced this pull request Oct 15, 2025

Gate division bitwise numerics under a flag

9e1245c

ghstack-source-id: f6dc638 Pull Request resolved: #165566

ngimel approved these changes Oct 15, 2025

eellison approved these changes Oct 15, 2025

pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Oct 15, 2025

pytorchmergebot added the merging label Oct 15, 2025

pytorchmergebot added the Merged label Oct 15, 2025

pytorchmergebot closed this in 901bbcb Oct 15, 2025

pytorchmergebot removed the merging label Oct 15, 2025

jeffdaily mentioned this pull request Oct 16, 2025

DISABLED test_truediv_base_not_bitwise_equivalent (__main__.CudaReproTests) #165671

Open
