[cutlass backend] forward fix of standalone runner for fbcode #147158
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147158
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit aeb6482 with merge base 302f56a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@desertfire actually it would be good if we could just remove this: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/codecache.py#L2844
…ode" Also only doing mixed mm for A100. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ode" Also only doing mixed mm for A100. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
```python
# error while loading shared libraries: IX}: invalid mode for dlopen(): Invalid argument
platform_path = sysconfig.get_config_var("LIBDIR")
link_str = " ".join(
    [f"-L{platform_path}", "-Xlinker", f"-rpath={platform_path}"]
```
Hmm, does this only affect the standalone runner for the unittests? What about the cases where we compile a model with cutlass backend in fbcode?
@chenyang78 my guess is that we are building an executable this time, whereas all the previous times we were building a .so file...
@unittest.skipIf(not SM80OrLater, "need sm_80") | ||
@unittest.skipIf(not SM80OrLater or SM90OrLater, "need sm_8x exactly") |
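To make the guard change explicit: the old decorator only skipped pre-Ampere GPUs, while the new one also skips SM90 and newer, so the test runs on sm_8x only. A small sketch of the equivalent condition, with the capability flags re-derived locally rather than imported from the test utilities:

```python
# Sketch only: SM80OrLater / SM90OrLater stand-ins derived from the current
# device, to show what "need sm_8x exactly" means as a skip condition.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
else:
    major, minor = 0, 0

SM80OrLater = (major, minor) >= (8, 0)
SM90OrLater = (major, minor) >= (9, 0)

skip_old = not SM80OrLater                  # old guard: skip only pre-sm_80
skip_new = not SM80OrLater or SM90OrLater   # new guard: also skip sm_90+
print(f"sm_{major}{minor}: old skip={skip_old}, new skip={skip_new}")
```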
What's the reason for disabling the test for SM90 and higher? I tested on an H100 machine, and it worked fine.
@alexsamardzic good question, not 100% sure.
I think I broke it in #146877. Forward fix prepping in #147185.
EDIT: on the other hand, I saw that the mixed mm test is for A100 only. Is that true?
I believe it should work on Hopper too, as CUTLASS 3.x kernels are also put in the mix when auto-tuning for mixed MM, see here. As mentioned above, it worked when I tested on H100, i.e. it generated a standalone runner with a CUTLASS 3.x kernel instantiated. Alternatively, we could use regular MM for SM90 and above; the point of this test is just to have some kind of C++ program, generated with the standalone runner, that compiles and runs without errors.
For Ampere, auto-tuning is implemented only for mixed MM and 2:4 sparse MM (this can be tracked by searching for CUTLASS2xGemmTemplate.add_cutlass_gemm_choices() calls in torch/_inductor/kernel/mm.py), which is why I put mixed MM in this test. You asked previously about the rationale for supporting Ampere at all: the main motivation for supporting these two operations was that they were not supported by Triton. I think 2:4 sparse MM is still not supported, and if I remember correctly mixed MM was not supported back at the time; also, CUTLASS would oftentimes be picked by the auto-tuner over Triton, i.e. the CUTLASS mixed MM kernel would oftentimes be faster.
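As a point of reference, a hedged sketch of what a mixed MM routed through the CUTLASS backend roughly looks like under max-autotune; the shapes, dtypes, and config values here are assumptions for illustration and may differ from the actual test in the suite:

```python
# Illustrative sketch of a mixed MM (fp16 activations x int8 weights) under
# max-autotune with CUTLASS as the GEMM backend. Not the actual unittest;
# shapes and settings are placeholders.
import torch
import torch._inductor.config as inductor_config

inductor_config.max_autotune_gemm_backends = "CUTLASS"

def mixed_mm(a, b):
    # The int8 operand is upcast to the activation dtype inside the graph,
    # which is what makes this a "mixed" MM for the autotuner.
    return torch.mm(a, b.to(a.dtype))

a = torch.randn(256, 512, dtype=torch.float16, device="cuda")
b = torch.randint(-8, 8, (512, 128), dtype=torch.int8, device="cuda")

compiled = torch.compile(mixed_mm, mode="max-autotune")
out = compiled(a, b)
```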
Beware that CUTLASS2xGemmTemplate does not support bias for 2:4 sparse MM; instead it has metadata tensor handling hacked in (this tensor contains information about the sparsity pattern of the 2:4 sparse weight matrix). I think it should not affect your changes, but it's something to keep in mind.
In general, this whole CUTLASS "back-end" for auto-tuning is rather fragile, and hard to extend, in its current state. Since CUTLASS has in the meantime provided solid support for generating C++ kernels from a plain Python description of the intended operation (see here), I was planning to try to completely rewrite this stuff, but at the moment I don't know if or when that could happen. Still, if that path is approached, either by myself or someone else, it's very important that we keep the tests as complete as possible.
re: mixed mm test on H100: when I run it on H100 with TORCH_LOGS="+inductor" and force-disable the cache, I see "No suitable Cutlass GEMM configs found, fallbacks used". That is without #146877. I also suspect autotune_fallback_to_aten=False doesn't work for mixed mm.
EDIT: I think it's related to the alignment. If things are divisible by 16, they can work.
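For anyone trying to reproduce this, a sketch of the kind of run being described: inductor logging turned on and codecache reuse disabled so autotuning actually re-runs. The exact knobs used in the original repro are an assumption here.

```python
# Sketch of a repro setup for the "No suitable Cutlass GEMM configs found"
# message: set these before importing torch so the logging/cache settings
# take effect, then run the mixed MM sketch above and inspect the logs.
# Per the alignment observation, shapes divisible by 16 are more likely to
# produce eligible CUTLASS configs.
import os

os.environ["TORCH_LOGS"] = "+inductor"                   # verbose inductor logs
os.environ["TORCHINDUCTOR_FORCE_DISABLE_CACHES"] = "1"   # force re-autotuning

import torch  # noqa: E402  (imported after env vars so they are picked up)
```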
re "In general, this whole CUTLASS "back-end" for auto-tuning is rather fragile, and hard to extend, in its current state. " I agree. Moving to python would be nice. I think this particular test can remain in A100 just for repro purpose.
> I also suspect autotune_fallback_to_aten=False doesn't work for mixed mm.

Mixed MM has rather convoluted logic for selecting which auto-tuning back-ends to use, see here. I don't know why it's this way, but apparently it goes against some of the config settings.
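To make the config interaction concrete, a tiny sketch of the two settings being discussed; whether mixed MM actually honors them is exactly the open question above:

```python
# The two knobs referenced in this thread. The concern is that mixed MM's
# backend-selection logic may ignore them in some cases, so setting them is
# not a guarantee that only CUTLASS choices get autotuned.
import torch._inductor.config as inductor_config

inductor_config.max_autotune_gemm_backends = "CUTLASS"
inductor_config.autotune_fallback_to_aten = False
```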
…ode" Also only doing mixed mm for A100. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ode" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ode" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
@alexsamardzic I updated it; now it should run on both A100 and H100. A few small action items:
…ode" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ode" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ode" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 0 checks.
…147409) Summary: I think the path is not needed anymore. It was added in #126408, but it has been a while since then. See if CI complains. Differential Revision: D69573185 See also #147158 Pull Request resolved: #147409 Approved by: https://github.com/chenyang78
Pull Request resolved: #147158 Approved by: https://github.com/chenyang78
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov