[Inductor] Add triton.autotune support for user defined triton kernels with complex grids #112290
Conversation
…s with complex grids [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112290
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 07165d9 with merge base f5088d2.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…s with complex grids ghstack-source-id: 8849bba31b2748464dc491fa72e98d12f8319c01 Pull Request resolved: #112290
…iton kernels with complex grids" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 aakhundov ColinPeppler [ghstack-poisoned]
```python
def grid_fn(meta):
    configs = kernel.configs
    assert len(grid) == len(configs)
    for i, (grid_val, config) in enumerate(zip(grid, configs)):
        guards = [
            f"meta['{name}'] == {val}" for name, val in config.kwargs.items()
        ]
        guards = " and ".join(guards)
        if eval(guards):
            return grid[i]
```
Configs aren't actually guaranteed to be unique, since they can differ in num_stages/num_warps (but not kwargs). This might be fine though, since I think the grids will be the same for duplicates.
Calling eval() is very expensive in Python, and this function is O(num-configs). This could be implemented more efficiently using a dict.
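The dict-based approach suggested here might look something like the following sketch. This is illustrative only, not the PR's actual code: `FakeConfig` is a stand-in for `triton.Config`, and `make_grid_fn`, `keys`, and the example values are all hypothetical names.

```python
# Sketch of the dict suggestion: key a precomputed table by each config's
# kwarg values, so per-launch dispatch is one tuple build plus one dict
# lookup instead of eval()-ing a guard string per config.

class FakeConfig:  # stand-in for triton.Config, for illustration only
    def __init__(self, kwargs, num_stages=3, num_warps=8):
        self.kwargs = kwargs
        self.num_stages = num_stages
        self.num_warps = num_warps

def make_grid_fn(configs, grids, keys):
    # keys: the kwarg names shared by every config, e.g. ("XBLOCK",)
    precomputed = {
        tuple(cfg.kwargs[k] for k in keys): g
        for cfg, g in zip(configs, grids)
    }

    def grid_fn(meta):
        return precomputed[tuple(meta[k] for k in keys)]

    return grid_fn

configs = [FakeConfig({"XBLOCK": 128}), FakeConfig({"XBLOCK": 64})]
grids = [(8, 1, 1), (16, 1, 1)]
grid_fn = make_grid_fn(configs, grids, ("XBLOCK",))
```

Duplicate configs that differ only in num_stages/num_warps would collapse onto one dict entry, which is harmless as long as their grids agree.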
This might be fine though, since I think the grids will be the same for duplicates
grid computation doesn't use num_warps and num_stages so duplicates are fine, just more branches.
torch/_inductor/codegen/wrapper.py
Outdated
```python
grid_wrapper.writeline(f"def grid_wrapper_for_{kernel_name}(meta):")
assert len(grid) == len(configs)
with grid_wrapper.indent():
    for i, (grid_val, conf) in enumerate(zip(grid, configs)):
        guards = [
            f"meta['{name}'] == {val}" for name, val in conf.kwargs.items()
        ]
        guards = " and ".join(guards)
        grid_wrapper.writeline(f"if {guards}: return grid({grid[i]})(meta)")
```
Could we share code with the implementation above? I think we can preprocess this into a dict, so the body is more like:
```python
def grid(meta):
    return precomputed_grids[(meta["XBLOCK"], meta["YBLOCK"])]
```
One edge case I see here is
```python
@triton.autotune(
    configs=[
        triton.Config(
            {"BLOCK_SIZE_X": 128, "BLOCK_SIZE_Y": 128}, num_stages=3, num_warps=8
        ),
        triton.Config(
            {"BLOCK_SIZE_X": 64}, num_stages=3, num_warps=8
        ),
    ],
    key=[],
)
```
where configs are not of equal size in terms of their BLOCK_SIZE dimensions. Is this something I should disallow?
I think we can just disallow this.
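Disallowing that edge case could be as simple as checking that every config names the same kwargs up front. A hypothetical sketch, not the PR's actual code (`FakeConfig` stands in for `triton.Config`, and `check_configs_same_kwargs` is an invented name):

```python
# Sketch of the "just disallow this" validation: every triton.Config in the
# autotune list must define the same set of kwarg names, so the generated
# grid dispatch can guard on a fixed set of meta keys.

class FakeConfig:  # stand-in for triton.Config, for illustration only
    def __init__(self, kwargs, num_stages=3, num_warps=8):
        self.kwargs = kwargs
        self.num_stages = num_stages
        self.num_warps = num_warps

def check_configs_same_kwargs(configs):
    key_sets = {frozenset(cfg.kwargs) for cfg in configs}
    if len(key_sets) > 1:
        raise ValueError(
            "All triton.Config objects passed to triton.autotune must "
            f"define the same kwargs; got differing key sets: {key_sets}"
        )
```

Configs that differ only in values (or in num_stages/num_warps) pass the check; only mismatched kwarg names are rejected.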
torch/_inductor/triton_heuristics.py
Outdated
```python
# when passed as a single tuple, calling convention order is not
# followed, so we need to reverse to match calling convention order
numels = numels[0][::-1]
```
I don't understand why this change is needed; I thought we weren't using this function for user-defined triton kernels?
cached_autotune uses the launcher function, which requires the grid to be a 3-tuple since it unpacks it.
pytorch/torch/_inductor/triton_heuristics.py
Lines 316 to 319 in c14c4ef
```python
if callable(grid):
    grid_0, grid_1, grid_2 = grid(grid_meta)
else:
    grid_0, grid_1, grid_2 = grid
```
I was using this function to convert a 1-tuple or 2-tuple into a 3-tuple. I can either write my own function to do this, or is there a more pythonic way to do this conversion?
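One idiomatic way to do the padding, shown here as a sketch (the name `pad_grid` is invented, not from the PR), is to extend the tuple with trailing 1s:

```python
# Pad a 1- or 2-element grid out to the 3-tuple the launcher unpacks,
# using tuple concatenation rather than branching on length.
def pad_grid(grid):
    grid = tuple(grid)
    assert 1 <= len(grid) <= 3, grid
    return grid + (1,) * (3 - len(grid))
```

For example, `pad_grid((8,))` yields `(8, 1, 1)` and `pad_grid((4, 2))` yields `(4, 2, 1)`, while a full 3-tuple passes through unchanged.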
Since we are generating the grid function, can't we just make it return a 3-tuple?
Explained offline to @jansel that using a dict imposes unnecessary restrictions. Updated the code to only have a single exec in eager mode, and to share code between eager and inductor.
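The "single exec" idea can be sketched as follows: build the guard chain as source once, exec it once to compile a real function, and then call that function per launch with no further eval/exec. This is a rough reconstruction for illustration, not the PR's actual code; `FakeConfig` and `compile_grid_fn` are invented names.

```python
# Sketch: compile the per-config guard chain into one function up front,
# so launches pay a plain function call instead of eval() per config.

class FakeConfig:  # stand-in for triton.Config, for illustration only
    def __init__(self, kwargs, num_stages=3, num_warps=8):
        self.kwargs = kwargs
        self.num_stages = num_stages
        self.num_warps = num_warps

def compile_grid_fn(configs, grids):
    lines = ["def grid_fn(meta):"]
    for cfg, g in zip(configs, grids):
        guards = " and ".join(
            f"meta[{name!r}] == {val!r}" for name, val in cfg.kwargs.items()
        ) or "True"  # a config with no kwargs matches unconditionally
        lines.append(f"    if {guards}: return {g!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # single exec, at build time
    return namespace["grid_fn"]
```

Unlike the dict approach, this keeps one branch per config, so configs that overlap in kwargs but differ elsewhere impose no extra restrictions, just more branches.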
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: #112292 Approved by: https://github.com/jansel ghstack dependencies: #112290
…s with complex grids (pytorch#112290) Pull Request resolved: pytorch#112290 Approved by: https://github.com/jansel
…112292) Pull Request resolved: pytorch#112292 Approved by: https://github.com/jansel ghstack dependencies: pytorch#112290
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler