
Conversation

lakshayg (Collaborator) commented Sep 27, 2025

This commit simplifies the precision lookup and setting logic by reducing the number of branches and using a custom hash function. Fixes #161822. The issue described in #163709 still persists; this is meant as a short-term fix.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela
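A minimal sketch of the approach the description outlines: a pair-keyed table with a custom hash, so a single hash lookup replaces a cascade of branches. The key and value types here are assumptions for illustration, not the actual declarations in aten/src/ATen/Context.h.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical pair key standing in for ATen's (backend, operator) identifiers.
struct KeyHash {
  size_t operator()(const std::pair<uint32_t, uint32_t>& k) const {
    // Pack both halves into one 64-bit value and hash it once.
    const uint64_t packed = (static_cast<uint64_t>(k.first) << 32) | k.second;
    return std::hash<uint64_t>{}(packed);
  }
};

// One hash lookup instead of nested if/else chains over backend and op names.
using PrecisionMap =
    std::unordered_map<std::pair<uint32_t, uint32_t>, std::string, KeyHash>;
```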

pytorch-bot bot commented Sep 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164044

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 100120d with merge base 2a7c486:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lakshayg (Collaborator Author) commented:

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing (topic category) label Sep 27, 2025
@lakshayg lakshayg force-pushed the fp_precision_settting_perf_fix branch 2 times, most recently from b315aff to 3bf567d on September 28, 2025 16:18
@eqy eqy requested a review from ngimel September 29, 2025 16:04
eqy (Collaborator) left a comment:

Thanks! What's the estimated performance difference between the old float32Precision and the new version? IIRC this was the main issue, as all matmuls (even ones not affected by the TF32 setting) would call it.

@jerryzh168 jerryzh168 added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Sep 29, 2025
lakshayg (Collaborator Author) commented Sep 29, 2025

> What's the estimated performance difference between the old float32Precision and new version?

@eqy I used the benchmarking script you shared in #161822 and ran it 100 times each for this PR and the base branch. Here's what the run-time distribution looks like; the distribution has simply shifted left by ~0.5 ticks.

It's roughly a 5% speedup, but I think it's better to think of it as a constant 0.5-tick improvement, since it doesn't depend on the sizes of the matrices involved.

(image: run-time distribution histogram for this PR vs. the base branch)

@pytorch-bot pytorch-bot bot added the ciflow/inductor, module: cpu (CPU specific problem (e.g., perf, algorithm)), and module: dynamo labels Sep 30, 2025
@lakshayg lakshayg force-pushed the fp_precision_settting_perf_fix branch from e6ebdf3 to e658958 on September 30, 2025 19:17
@lakshayg lakshayg self-assigned this Oct 1, 2025
@lakshayg lakshayg moved this to In Progress in PyTorch + CUDA Oct 1, 2025
eqy (Collaborator) commented Oct 1, 2025

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Oct 1, 2025
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator):

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

cyyever (Collaborator) commented Oct 2, 2025

@pytorchmergebot merge

@github-project-automation github-project-automation bot moved this from In Progress to Done in PyTorch + CUDA Oct 2, 2025
(Inline review comment on the std::unordered_map declaration in Context.h)

swolchok (Contributor) commented Oct 2, 2025

this can be improved further by using nested arrays as in #164387

lakshayg (Collaborator Author) replied:

Yeah, I considered doing that but decided not to, since the hashtable approach is immune to someone accidentally changing the enum order or forgetting to update the mapping when they add or remove backends.

I also think that this design needs to be revisited (#163709) and wanted to avoid potentially throw-away work.

It's a valid comment though. Feel free to take it up in a separate PR.
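For context, a minimal sketch of the nested-array alternative suggested above, with hypothetical enums standing in for the real backend/operator identifiers (the actual types in #164387 may differ):

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical enums; the trailing COUNT member sizes the arrays.
enum class Backend { CUDA, MKLDNN, COUNT };
enum class Op { MATMUL, CONV, RNN, COUNT };

// Nested arrays make the lookup pure indexing with no hashing at all.
// The trade-off raised in the reply above: entries are positional, so
// reordering an enum or adding a backend silently shifts the table.
using PrecisionTable =
    std::array<std::array<std::string, static_cast<std::size_t>(Op::COUNT)>,
               static_cast<std::size_t>(Backend::COUNT)>;

std::string lookup(const PrecisionTable& t, Backend b, Op o) {
  return t[static_cast<std::size_t>(b)][static_cast<std::size_t>(o)];
}
```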

yangw-dev (Contributor) commented:

@pytorchbot revert -c ghfirst -m "broke internal build In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1:
xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow]
502 | return std::hash<size_t>{}((k1 << 32) | k2);"

pytorch-bot bot commented Oct 2, 2025

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string

Try `@pytorchbot --help` for more info.

yangw-dev (Contributor) commented Oct 2, 2025

@pytorchbot revert -c ghfirst -m "broke internal build In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1: xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow] 502 | return std::hash<size_t>{}((k1 << 32) | k2);"

pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Oct 2, 2025
This reverts commit 723ba21.

Reverted #164044 on behalf of https://github.com/yangw-dev due to a broken internal build: In file included from xplat/caffe2/aten/src/ATen/DeviceAccelerator.cpp:1: xplat/caffe2/aten/src/ATen/Context.h:502:38: error: shift count >= width of type [-Werror,-Wshift-count-overflow] 502 | return std::hash<size_t>{}((k1 << 32) | k2);
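To spell out the revert reason: the flagged line shifts a size_t left by 32 bits. On the failing internal build size_t is 32 bits wide, so the shift count equals the type's width and the expression is undefined behavior, which Clang rejects under -Werror. One portable way to combine two keys without a wide shift is a boost-style hash_combine; this is an illustrative sketch, not necessarily the fix that landed in 100120d:

```cpp
#include <cstddef>
#include <functional>

// Combines two keys without shifting by the full word width, so it is
// well-defined whether size_t is 32 or 64 bits wide.
std::size_t combine_keys(std::size_t k1, std::size_t k2) {
  std::size_t seed = std::hash<std::size_t>{}(k1);
  // boost::hash_combine's mixing step: a golden-ratio constant plus small,
  // always-in-range shifts of 6 and 2.
  seed ^= std::hash<std::size_t>{}(k2) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  return seed;
}
```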
pytorchmergebot (Collaborator):

@lakshayg your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels Oct 2, 2025
This commit simplifies the precision lookup and setting logic by reducing the number of branches and using a custom hash function. The buggy implementation could return "bf16" for the "cuda" backend, which is an unsupported combination.
@lakshayg lakshayg force-pushed the fp_precision_settting_perf_fix branch from e658958 to 100120d on October 2, 2025 21:43
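As a gloss on the bug described in the updated commit message, here is a hedged sketch of the kind of guard that keeps an unsupported backend/precision pair from escaping the lookup. The function name and plain-string interface are illustrative assumptions, not the actual ATen API:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical guard: the "cuda" backend does not support "bf16" as a
// float32 precision, so a lookup result must never hand that pair back.
void validate_precision(const std::string& backend,
                        const std::string& precision) {
  if (backend == "cuda" && precision == "bf16") {
    throw std::invalid_argument(
        "bf16 is not a supported float32 precision for the cuda backend");
  }
}
```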
ngimel (Collaborator) commented Oct 3, 2025

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

cyyever pushed a commit to cyyever/pytorch that referenced this pull request Oct 4, 2025
This commit simplifies the precision lookup and setting logic
by reducing the number of branches and using a custom hash
function. Fixes pytorch#161822. The issue described in pytorch#163709 still
persists. This is meant as a short term fix.

Pull Request resolved: pytorch#164044
Approved by: https://github.com/ngimel, https://github.com/eqy
Labels
ci-no-td (Do not run TD on this PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: cpu (CPU specific problem (e.g., perf, algorithm))
module: dynamo
open source
Reverted
topic: not user facing (topic category)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[CUDA][cuBLAS] #125888 introduces measurable CPU overhead for matmuls
9 participants