[inductor] remove RBLOCK from persistent reduction kernel's parameter list #98653
Conversation
Dr. CI: ✅ No failures as of commit 2a03229. See artifacts and rendered test results at hud.pytorch.org/pr/98653. Note: links to docs will display an error until the docs builds have completed.
This PR resolves the comments in #97203 (comment). Sending it as a separate PR since that makes it easier to test and to confirm there is no perf impact.

Tests:
1. `python test/inductor/test_torchinductor.py`
2. Run `python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --dashboard --only hf_Bert --disable-cudagraphs --training` before and after the change to make sure the perf change is neutral.

Now a persistent reduction kernel in hf_Bert looks like:
```
@persistent_reduction(
    size_hints=[4096, 1024],
    reduction_hint=ReductionHint.INNER,
    filename=__file__,
    meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp16', 3: '*i64', 4: '*fp16', 5: '*i64', 6: '*fp16', 7: '*fp16', 8: '*fp16', 9: '*fp16', 10: 'i32', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': ['in_out_ptr0'], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]}
)
@triton.jit
def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr):
    xnumel = 4096
    rnumel = 768
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
```
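For context, here is a minimal self-contained sketch of the same pattern, assuming a simple row-sum reduction (the kernel name, tensors, and launch code below are illustrative, not from this PR or from Inductor's codegen). It shows RBLOCK living inside the kernel body as a `tl.constexpr` rather than in the parameter list:

```python
# Minimal illustrative sketch (assumed example, not Inductor output): a
# persistent row-sum reduction where RBLOCK is a compile-time constant
# defined inside the kernel body instead of a kernel parameter.
import torch
import triton
import triton.language as tl


@triton.jit
def row_sum_persistent(in_ptr, out_ptr, xnumel, rnumel, XBLOCK: tl.constexpr):
    # RBLOCK no longer appears in the parameter list above; it is fixed here.
    # It must be >= rnumel (here 1024 >= 768) so one block covers a full row.
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]  # rows handled by this program
    rindex = tl.arange(0, RBLOCK)[None, :]            # whole reduction dim at once
    xmask = xindex < xnumel
    rmask = rindex < rnumel
    # Persistent reduction: each row is loaded once, no loop over r.
    vals = tl.load(in_ptr + xindex * rnumel + rindex, xmask & rmask, other=0.0)
    row_sum = tl.sum(vals, axis=1)[:, None]
    tl.store(out_ptr + xindex, row_sum, xmask)


if __name__ == "__main__" and torch.cuda.is_available():
    x = torch.randn(4096, 768, device="cuda")
    out = torch.empty(4096, device="cuda")
    XBLOCK = 1
    grid = (triton.cdiv(x.shape[0], XBLOCK),)
    row_sum_persistent[grid](x, out, x.shape[0], x.shape[1], XBLOCK=XBLOCK)
    torch.testing.assert_close(out, x.sum(dim=1), rtol=1e-3, atol=1e-3)
```

Since a persistent reduction handles the entire reduction dimension in a single block (in the generated kernel above, RBLOCK=1024 is just rnumel=768 rounded up to the next power of two), there is nothing for the autotuner to vary along the r dimension, so baking RBLOCK into the body shortens the launch signature without losing any tuning flexibility.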
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#98653. Approved by: https://github.com/jansel
Stack from ghstack (oldest at bottom):
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire