
Conversation


@ngimel ngimel commented Oct 6, 2025

Summary

  • add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options
  • extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls
  • update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations
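
The summary above describes a pair of flags (reduced-precision reduction, split-K) being collapsed into a single enum. A minimal sketch of that mapping, assuming the three states implied by the diff (the member names mirror `at::CuBLASReductionOption` from the quoted C++ snippet; the Python helpers themselves are illustrative, not the PR's actual API):

```python
# Illustrative mapping between an (allow_reduced_precision, allow_splitk)
# pair and a three-state reduction option. Member names mirror the
# at::CuBLASReductionOption values visible in the diff; the helpers are
# hypothetical sketches, not PyTorch APIs.
from enum import Enum


class CuBLASReductionOption(Enum):
    ALLOW_REDUCED_PRECISION_WITH_SPLIT_K = 0
    DISALLOW_REDUCED_PRECISION_ALLOW_SPLIT_K = 1
    DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K = 2


def to_option(allow_reduced_precision, allow_splitk):
    # In this sketch, allowing reduced precision implies split-K is also
    # permitted, so (True, False) is not representable as a distinct state.
    if allow_reduced_precision:
        return CuBLASReductionOption.ALLOW_REDUCED_PRECISION_WITH_SPLIT_K
    if allow_splitk:
        return CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_ALLOW_SPLIT_K
    return CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K


def from_option(option):
    # Inverse mapping, following the getter logic quoted later in the
    # thread: allow_splitk is False only in the fully-disallowed state.
    allow_reduced_precision = (
        option is CuBLASReductionOption.ALLOW_REDUCED_PRECISION_WITH_SPLIT_K
    )
    allow_splitk = (
        option is not CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K
    )
    return allow_reduced_precision, allow_splitk
```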

Testing

  • python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v (fails: ModuleNotFoundError: No module named 'psutil')

https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela


pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164766

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit cf5223c with merge base 4a6abba:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ngimel ngimel marked this pull request as draft October 6, 2025 19:17
@ngimel ngimel changed the title from "Add split-K control to cuBLAS reduced-precision settings" to "[WIP] Add split-K control to cuBLAS reduced-precision settings" Oct 6, 2025
@ngimel ngimel added the topic: not user facing label Oct 6, 2025
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch 5 times, most recently from 3f16d22 to 1f30115 on October 7, 2025 02:50
@ngimel ngimel changed the title from "[WIP] Add split-K control to cuBLAS reduced-precision settings" to "Add split-K control to cuBLAS reduced-precision settings" Oct 7, 2025
@ngimel ngimel marked this pull request as ready for review October 7, 2025 04:50
@ngimel ngimel added the ciflow/trunk label Oct 7, 2025
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 1f30115 to 52dbd40 on October 7, 2025 05:44
option == at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK;
bool allow_splitk = option !=
at::CuBLASReductionOption::DisallowReducedPrecisionDisallowSplitK;
return Py_BuildValue("(pp)", allow_reduced_precision, allow_splitk);
Collaborator

maybe it wants something like Py_BuildValue("(OO)", allow_reduced_precision ? Py_True : Py_False, allow_splitk ? Py_True : Py_False);

@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 52dbd40 to 5f0d23d on October 7, 2025 18:33
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 5f0d23d to cf5223c on October 7, 2025 18:42
Contributor

@malfet malfet left a comment

LGTM, though I guess @albanD would be a bit unhappy that it does not follow the accelerator-generic abstraction

Collaborator

@albanD albanD left a comment

That sounds good to me. This is quite specific for now, so it's OK to have a one-off API, and we can extend it later as needed.


if isinstance(value, bool):
return value, True
if isinstance(value, (list, tuple)):
Collaborator

nit: the doc says tuple-only

)

if isinstance(value, bool):
return value, True
Collaborator

This should query the current split_k value and not force override it to True?

Collaborator Author

If we want to preserve current behavior, we should force-override it to True

Collaborator

I see, ok

@ngimel
Collaborator Author

ngimel commented Oct 8, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.



5 participants