Fix redundant kernel generations #102104
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102104
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit dc1e187. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Some dynamic shape tests failed because they count the number of generated kernels, which this PR changes. I'll fix them in the next commit.
torch/_inductor/codegen/common.py (outdated)
def seed_offset(self, name, value):
    if "load_seed_offset" in self.sizevars.values():
        name = "%s%d" % (
Can you use f-string formatting here, i.e. f"{name}{expr}"?
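For illustration, the suggested change would look roughly like this (the right-hand side of the %-format call is truncated in the diff above, so its arguments are assumed here):

# Before: printf-style formatting (arguments assumed for illustration)
name = "%s%d" % (name, expr)
# After: the equivalent f-string the reviewer suggests
name = f"{name}{expr}"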
Fixed by e6268cf.
torch/_inductor/codegen/common.py (outdated)
self.inplace_buffers[output_name] = buf

def seed_offset(self, name, value):
    if "load_seed_offset" in self.sizevars.values():
Do we have any test cases for multiple load_seed_offset args?
Also, this wouldn't work if you change name at the call site.
I changed the hardcoded string to the name in e6268cf. A simple example with multiple load_seed_offset values could be the following:
import torch

def fn():
    random_tensor1 = torch.randint(10, [1024], device="cuda")
    random_tensor2 = torch.randint(11, [1024], device="cuda")
    random_tensor3 = torch.randint(10, [1024], device="cuda")
    tensor4 = random_tensor1 + random_tensor2 + random_tensor3
    return tensor4
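As a sketch of how one might inspect the kernels Inductor generates for this example (TORCH_COMPILE_DEBUG is an existing Inductor debug switch; the usage pattern below is assumed, not part of this PR):

import torch

# Run with the environment variable TORCH_COMPILE_DEBUG=1 set so that the
# generated output_code.py is dumped to disk, where the Triton kernels can
# be inspected and counted.
compiled_fn = torch.compile(fn)
result = compiled_fn()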
I'm thinking there is another hardcoded string in the divisible_by_16 check; it would be better to replace that too. Do you know where I should put this string as a global constant?
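A minimal sketch of the kind of shared constant being discussed (the name and placement below are hypothetical, not necessarily what the PR ended up doing):

# Hypothetical module-level constant, e.g. in torch/_inductor/codegen/common.py,
# that both seed_offset and the divisible_by_16 check could reference instead
# of each hardcoding the "load_seed_offset" string literal.
LOAD_SEED_OFFSET_PREFIX = "load_seed_offset"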
Yeah, I understand that examples with multiple load_seed_offset values exist; I'm asking whether we have a test for them.
I've only tested several models and haven't found multiple load_seed_offset cases so far.
Do you mean adding a unit test?
Yes, please add a unit test for this if it doesn't exist.
Added a unit test in dc1e187.
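A minimal sketch of what such a test can look like, assuming the run_and_get_code helper from torch._inductor.utils that the Inductor test suite uses elsewhere; the actual test added in dc1e187 may differ:

import torch
from torch._inductor.utils import run_and_get_code

def test_multiple_load_seed_offset_kernels():
    def fn():
        a = torch.randint(10, [1024], device="cuda")
        b = torch.randint(11, [1024], device="cuda")
        c = torch.randint(10, [1024], device="cuda")
        return a + b + c

    # run_and_get_code returns the result plus the generated source strings.
    _, (code,) = run_and_get_code(torch.compile(fn))
    # With the seed offset passed in as a kernel argument, kernels that
    # previously differed only in a hardcoded offset constant are shared,
    # so fewer @triton.jit definitions should appear in the output code.
    assert code.count("@triton.jit") < 3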
torch/_inductor/codegen/common.py (outdated)
def seed_offset(self, name, value):
    if name in self.sizevars.values():
        name = f"{name}{sum(1 for value in self.sizevars.values() if value.startswith('load_seed_offset'))}"
You still have load_seed_offset hardcoded here.
Fixed by b84276f.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, the first few of them are: trunk / win-vs2019-cpu-py3 / build. Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, the first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "test failure unrelated"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Issue description
PR #100064 introduced a new RNG operation process. However, it causes every randint to load a separate random seed by default. TorchInductor generates a buffer to store all the necessary random seeds and places the offsets as constant values in the subsequent compute buffers. In the ir_pre_fusion output generated by TorchInductor, some buffers differ only by one line: the load of the random seed at the corresponding offset. The codegen then generates Triton kernels following the same rule, so in output_code.py some Triton kernels differ only by that one line, meaning redundant kernels are generated.
Solution
This PR captures the seed offset and adds it to the existing self.sizevars structure. It generates variable names as placeholders, allowing the code wrapper to pass the offset as an argument to the kernels. I've also modified the divisible_by_16 check to exclude this argument. This PR reduces the number of generated kernels from 50 to 17 for the BertForMaskedLM forward pass.
In tests on my own environment, the compilation time of attention_is_all_you_need_pytorch dropped from 94s to 66s. The speedup remains largely unchanged, at 1.37x.
The following is a comparison for a simple example.
Before:
After:
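In sketch form, the difference is roughly the following (kernel and variable names are illustrative, not the actual generated code):

# Before: two Triton kernels that differ only in the hardcoded offset used
# to load from the shared seed buffer.
#   triton_0: tmp0 = tl.load(in_ptr0 + 0)
#   triton_1: tmp0 = tl.load(in_ptr0 + 1)
#
# After: a single kernel that receives the offset as an argument, so both
# call sites can reuse it.
#   triton_0(in_ptr0, ..., load_seed_offset):
#       tmp0 = tl.load(in_ptr0 + load_seed_offset)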
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10