Conversation

shunting314 (Contributor) commented Apr 8, 2023

Stack from ghstack (oldest at bottom):

This PR addresses the review comments at #97203 (comment). It is sent as a separate PR since that makes it easier to test and to verify there is no perf impact.

Tests:

  1. `python test/inductor/test_torchinductor.py`
  2. Run `python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --dashboard --only hf_Bert --disable-cudagraphs --training` before and after the change to make sure the perf change is neutral (a sketch for dumping the generated kernels locally follows below).
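
Not part of the test plan, but for poking at this locally: a minimal sketch for dumping the Triton kernels that inductor generates. It assumes a CUDA machine; `TORCH_COMPILE_DEBUG` is inductor's debug-artifact switch, and `rowwise_softmax` is just an illustrative workload, not from this PR:

```
import os

# Have inductor write debug artifacts, including the generated Triton
# kernels, under ./torch_compile_debug/ (set before the first compile).
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch


def rowwise_softmax(x):
    # Softmax over a short last dim (768 <= RBLOCK) typically lowers to a
    # persistent reduction kernel like the one shown below.
    return torch.softmax(x, dim=-1)


compiled = torch.compile(rowwise_softmax)
compiled(torch.randn(4096, 768, device="cuda", dtype=torch.float16))
```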

Now a persistent reduction kernel in hf_Bert looks like:

```
@persistent_reduction(
    size_hints=[4096, 1024],
    reduction_hint=ReductionHint.INNER,
    filename=__file__,
    meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp16', 3: '*i64', 4: '*fp16', 5: '*i64', 6: '*fp16', 7: '*fp16', 8: '*fp16', 9: '*fp16', 10: 'i32', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': ['in_out_ptr0'], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]}
)
@triton.jit
def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr):
    xnumel = 4096
    rnumel = 768
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
```
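
An aside for readers skimming the thread: the detail of interest above is that `RBLOCK` is a `tl.constexpr` defined in the kernel body rather than an entry in the kernel's parameter list. Below is a minimal, self-contained sketch of the same persistent-reduction pattern using a plain row sum (`persistent_row_sum` and its indexing are illustrative, not taken from the generated kernel):

```
import torch
import triton
import triton.language as tl


@triton.jit
def persistent_row_sum(in_ptr, out_ptr, xnumel, rnumel, XBLOCK: tl.constexpr):
    # As in the generated kernel above, RBLOCK is a constexpr baked into
    # the kernel body rather than passed through the parameter list.
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
    rindex = tl.arange(0, RBLOCK)[None, :]
    xmask = xindex < xnumel
    rmask = rindex < rnumel
    # "Persistent": the whole reduction axis fits in one RBLOCK tile
    # (rnumel <= RBLOCK), so each row is reduced with a single load
    # instead of a loop over r-blocks.
    vals = tl.load(in_ptr + xindex * rnumel + rindex, xmask & rmask, other=0.0)
    tl.store(out_ptr + xindex, tl.sum(vals, axis=1)[:, None], xmask)


x = torch.randn(4096, 768, device="cuda")
out = torch.empty(4096, 1, device="cuda")
grid = lambda meta: (triton.cdiv(4096, meta["XBLOCK"]),)
persistent_row_sum[grid](x, out, 4096, 768, XBLOCK=8)
assert torch.allclose(out, x.sum(dim=1, keepdim=True), atol=1e-3)
```

Since `RBLOCK` is a compile-time constant either way, hoisting it out of the parameter list should not change the compiled machine code, which is why the benchmark run above is expected to be perf-neutral.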

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

pytorch-bot (bot) commented Apr 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98653

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2a03229:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 added a commit that referenced this pull request Apr 8, 2023
shunting314 added the `topic: not user facing` label Apr 8, 2023
shunting314 (Contributor, Author) commented:

@pytorchbot merge

pytorch-bot bot added the `ciflow/trunk` label Apr 8, 2023
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

skotapati pushed a commit to kulinseth/pytorch that referenced this pull request Apr 10, 2023
Pull Request resolved: pytorch#98653
Approved by: https://github.com/jansel
@facebook-github-bot facebook-github-bot deleted the gh/shunting314/45/head branch June 8, 2023 18:45
