
Conversation


@ngimel ngimel commented Oct 6, 2025

Summary

  • add a CuBLASReductionOption enum so the CUDA context can track reduced-precision and split-K options
  • extend the Python bindings, backend helpers, and docs to accept an optional allow_splitk argument for fp16/bf16 matmul controls
  • update cuBLAS/cuBLASLt call sites plus dynamo guards and tests to respect the new combinations
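
The summary above describes a pair of flags (reduced-precision reduction, split-K) being collapsed into a single enum. A minimal sketch of that mapping, assuming the three states implied by the diff (the member names mirror `at::CuBLASReductionOption` from the quoted C++ snippet; the Python helpers themselves are illustrative, not the PR's actual API):

```python
# Illustrative mapping between an (allow_reduced_precision, allow_splitk)
# pair and a three-state reduction option. Member names mirror the
# at::CuBLASReductionOption values visible in the diff; the helpers are
# hypothetical sketches, not PyTorch APIs.
from enum import Enum


class CuBLASReductionOption(Enum):
    ALLOW_REDUCED_PRECISION_WITH_SPLIT_K = 0
    DISALLOW_REDUCED_PRECISION_ALLOW_SPLIT_K = 1
    DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K = 2


def to_option(allow_reduced_precision, allow_splitk):
    # In this sketch, allowing reduced precision implies split-K is also
    # permitted, so (True, False) is not representable as a distinct state.
    if allow_reduced_precision:
        return CuBLASReductionOption.ALLOW_REDUCED_PRECISION_WITH_SPLIT_K
    if allow_splitk:
        return CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_ALLOW_SPLIT_K
    return CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K


def from_option(option):
    # Inverse mapping, following the getter logic quoted later in the
    # thread: allow_splitk is False only in the fully-disallowed state.
    allow_reduced_precision = (
        option is CuBLASReductionOption.ALLOW_REDUCED_PRECISION_WITH_SPLIT_K
    )
    allow_splitk = (
        option is not CuBLASReductionOption.DISALLOW_REDUCED_PRECISION_DISALLOW_SPLIT_K
    )
    return allow_reduced_precision, allow_splitk
```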

Testing

  • python test/test_cuda.py TestCuda.test_cublas_allow_fp16_reduced_precision_reduction_get_set -v (fails: ModuleNotFoundError: No module named 'psutil')

https://chatgpt.com/codex/tasks/task_e_68e404623178832f8a3e1d34e1e175da

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela


pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164766

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit cf5223c with merge base 4a6abba:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@ngimel ngimel marked this pull request as draft October 6, 2025 19:17
@ngimel ngimel changed the title from "Add split-K control to cuBLAS reduced-precision settings" to "[WIP] Add split-K control to cuBLAS reduced-precision settings" Oct 6, 2025
@ngimel ngimel added the topic: not user facing label Oct 6, 2025
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch 5 times, most recently from 3f16d22 to 1f30115 on October 7, 2025 02:50
@ngimel ngimel changed the title from "[WIP] Add split-K control to cuBLAS reduced-precision settings" to "Add split-K control to cuBLAS reduced-precision settings" Oct 7, 2025
@ngimel ngimel marked this pull request as ready for review October 7, 2025 04:50
@ngimel ngimel added the ciflow/trunk label Oct 7, 2025
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 1f30115 to 52dbd40 on October 7, 2025 05:44
option == at::CuBLASReductionOption::AllowReducedPrecisionWithSplitK;
bool allow_splitk = option !=
at::CuBLASReductionOption::DisallowReducedPrecisionDisallowSplitK;
return Py_BuildValue("(pp)", allow_reduced_precision, allow_splitk);
Collaborator

maybe it wants something like Py_BuildValue("(OO)", allow_reduced_precision ? Py_True : Py_False, allow_splitk ? Py_True : Py_False);

@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 52dbd40 to 5f0d23d on October 7, 2025 18:33
@ngimel ngimel force-pushed the codex/enhance-cuda.matmul-with-allow_splitk-argument branch from 5f0d23d to cf5223c on October 7, 2025 18:42
Contributor

@malfet malfet left a comment

LGTM, though I guess @albanD would be a bit unhappy that it does not follow the accelerator-generic abstraction

Collaborator

@albanD albanD left a comment

That sounds good to me. This is quite specific for now, so it's OK to have a one-off API, and we can extend it later as needed.


if isinstance(value, bool):
return value, True
if isinstance(value, (list, tuple)):
Collaborator

nit: the doc says tuple-only

)

if isinstance(value, bool):
return value, True
Collaborator

This should query the current split_k value and not force override it to True?

Collaborator Author

If we want to preserve current behavior, we should force-override it to True

Collaborator

I see, ok

@ngimel
Collaborator Author

ngimel commented Oct 8, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.



5 participants