Conversation

shunting314 (Contributor) commented Apr 8, 2023

Stack from ghstack (oldest at bottom):

This PR addresses the review comments at #97203 (comment). It is sent as a separate PR since that makes it easier to test and to verify there is no perf impact.

Tests:

  1. `python test/inductor/test_torchinductor.py`
  2. Run `python benchmarks/dynamo/torchbench.py --backend inductor --amp --performance --dashboard --only hf_Bert --disable-cudagraphs --training` before and after the change to make sure the perf change is neutral (a sketch for dumping the generated kernels locally follows below).
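
Not part of the test plan, but for poking at this locally: a minimal sketch for dumping the Triton kernels that inductor generates. It assumes a CUDA machine; `TORCH_COMPILE_DEBUG` is inductor's debug-artifact switch, and `rowwise_softmax` is just an illustrative workload, not from this PR:

```
import os

# Have inductor write debug artifacts, including the generated Triton
# kernels, under ./torch_compile_debug/ (set before the first compile).
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch


def rowwise_softmax(x):
    # Softmax over a short last dim (768 <= RBLOCK) typically lowers to a
    # persistent reduction kernel like the one shown below.
    return torch.softmax(x, dim=-1)


compiled = torch.compile(rowwise_softmax)
compiled(torch.randn(4096, 768, device="cuda", dtype=torch.float16))
```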

Now a persistent reduction kernel in hf_Bert looks like:

```
@persistent_reduction(
    size_hints=[4096, 1024],
    reduction_hint=ReductionHint.INNER,
    filename=__file__,
    meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp16', 3: '*i64', 4: '*fp16', 5: '*i64', 6: '*fp16', 7: '*fp16', 8: '*fp16', 9: '*fp16', 10: 'i32', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': ['in_out_ptr0'], 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]}
)
@triton.jit
def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr):
    xnumel = 4096
    rnumel = 768
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
```
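
An aside for readers skimming the thread: the detail of interest above is that `RBLOCK` is a `tl.constexpr` defined in the kernel body rather than an entry in the kernel's parameter list. Below is a minimal, self-contained sketch of the same persistent-reduction pattern using a plain row sum (`persistent_row_sum` and its indexing are illustrative, not taken from the generated kernel):

```
import torch
import triton
import triton.language as tl


@triton.jit
def persistent_row_sum(in_ptr, out_ptr, xnumel, rnumel, XBLOCK: tl.constexpr):
    # As in the generated kernel above, RBLOCK is a constexpr baked into
    # the kernel body rather than passed through the parameter list.
    RBLOCK: tl.constexpr = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:, None]
    rindex = tl.arange(0, RBLOCK)[None, :]
    xmask = xindex < xnumel
    rmask = rindex < rnumel
    # "Persistent": the whole reduction axis fits in one RBLOCK tile
    # (rnumel <= RBLOCK), so each row is reduced with a single load
    # instead of a loop over r-blocks.
    vals = tl.load(in_ptr + xindex * rnumel + rindex, xmask & rmask, other=0.0)
    tl.store(out_ptr + xindex, tl.sum(vals, axis=1)[:, None], xmask)


x = torch.randn(4096, 768, device="cuda")
out = torch.empty(4096, 1, device="cuda")
grid = lambda meta: (triton.cdiv(4096, meta["XBLOCK"]),)
persistent_row_sum[grid](x, out, 4096, 768, XBLOCK=8)
assert torch.allclose(out, x.sum(dim=1, keepdim=True), atol=1e-3)
```

Since `RBLOCK` is a compile-time constant either way, hoisting it out of the parameter list should not change the compiled machine code, which is why the benchmark run above is expected to be perf-neutral.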

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

pytorch-bot (bot) commented Apr 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98653

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2a03229:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 added a commit that referenced this pull request Apr 8, 2023
shunting314 added the `topic: not user facing` label Apr 8, 2023
shunting314 (Contributor, Author) commented:

@pytorchbot merge

pytorch-bot bot added the `ciflow/trunk` label Apr 8, 2023
pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

skotapati pushed a commit to kulinseth/pytorch that referenced this pull request Apr 10, 2023
Pull Request resolved: pytorch#98653
Approved by: https://github.com/jansel
@facebook-github-bot facebook-github-bot deleted the gh/shunting314/45/head branch June 8, 2023 18:45
