Lift inductor lowerings for jagged <-> padded dense kernels #125968

jbschlosser · 2024-05-10T21:56:49Z

Stack from ghstack (oldest at bottom):

This PR lifts internal lowerings written for FBGEMM kernels that do jagged <-> padded dense conversions. In particular, this PR provides lowerings and meta registrations for the following ATen ops:

- `_jagged_to_padded_dense_forward()`
- `_padded_dense_to_jagged_forward()`
  - NB: if `total_L` is not provided, the output shape is data-dependent. An unbacked SymInt is used for this case.
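For readers unfamiliar with the two conversions, the following is a minimal pure-Python sketch of their semantics, not the ATen kernels or the Inductor lowerings themselves. The function names, the list-of-rows layout, and the single jagged dimension are assumptions made for illustration. It also shows why the output length is data-dependent when `total_L` is omitted: it has to be read from the last entry of `offsets`.

```python
from typing import List, Optional, Sequence

Row = List[float]


def jagged_to_padded_dense(
    values: Sequence[Row],
    offsets: Sequence[int],
    max_length: int,
    padding_value: float = 0.0,
) -> List[List[Row]]:
    """Hypothetical reference: (total_L, D) rows -> (B, max_length, D) padded batches."""
    feature_dim = len(values[0]) if values else 0
    padded = []
    for b in range(len(offsets) - 1):
        start, end = offsets[b], offsets[b + 1]
        # Sequences longer than max_length are truncated in this sketch.
        rows = [list(r) for r in values[start:min(end, start + max_length)]]
        # Shorter sequences are filled up with rows of padding_value.
        rows += [[padding_value] * feature_dim for _ in range(max_length - len(rows))]
        padded.append(rows)
    return padded


def padded_dense_to_jagged(
    dense: Sequence[Sequence[Row]],
    offsets: Sequence[int],
    total_L: Optional[int] = None,
) -> List[Row]:
    """Hypothetical reference: (B, N, D) padded batches -> (total_L, D) rows."""
    if total_L is None:
        # Without total_L the number of output rows is only known from the
        # contents of offsets, i.e. the output shape is data-dependent.
        total_L = offsets[-1]
    # Assumption of this sketch: a provided total_L matches offsets[-1].
    assert total_L == offsets[-1]
    values: List[Row] = []
    for b in range(len(offsets) - 1):
        length = offsets[b + 1] - offsets[b]
        values.extend(list(r) for r in dense[b][:length])
    return values


# Three sequences of lengths 2, 1 and 3 with one feature each.
example_values = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
example_offsets = [0, 2, 3, 6]
example_padded = jagged_to_padded_dense(example_values, example_offsets, max_length=3)
example_roundtrip = padded_dense_to_jagged(example_padded, example_offsets)
```

In a compiled graph, passing `total_L` lets the output shape be computed from the inputs' metadata alone, which is why the description singles out the case where it is missing.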

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

[ghstack-poisoned]

pytorch-bot · 2024-05-10T21:56:53Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125968

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 67aab82 with merge base 15ab636:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor-periodic / cuda12.1-py3.10-gcc9-sm86-periodic-dynamo-benchmarks / test (dynamic_aot_eager_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
yolov3

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jbschlosser · 2024-06-12T16:10:44Z

@pytorchbot merge

pytorchmergebot · 2024-06-12T16:12:39Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-06-12T16:17:57Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-cuda12.1-py3.10-gcc9-no-ops / build

Details for Dev Infra team

Raised by workflow job

jbschlosser · 2024-06-12T16:53:10Z

@pytorchbot merge -i

pytorchmergebot · 2024-06-12T16:55:01Z

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / linux-focal-cuda12.1-py3.10-gcc9-no-ops / build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Summary: Add a `jagged_sum` reduction operator for padded nested tensors, based on the PyTorch `sum` operator, to TritonBench.

This diff uses the PyTorch function [`torch.ops.aten._jagged_to_padded_dense_forward`](https://www.internalfb.com/code/fbsource/[92c2a067ab04e3eebc999254fed4ae2fbea6def3]/fbcode/deeplearning/fbgemm/fbgemm_gpu/fb/inductor_lowerings/elementwise_ops.py?lines=26), hosted at this [GitHub pull request](pytorch/pytorch#125968), to pad each 2-dimensional tensor in a nested tensor of shape `(B, *, M)` to a common length `N`, then reduce the padded `(B, N, M)` tensor across the `N` dimension (`dim == 1`) to a `(B, M)` output tensor.

The accuracy of the padded implementation is measured against the unpadded baseline implementation via the `accuracy` TritonBench metric.

Reviewed By: davidberard98

Differential Revision: D58423489
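The pad-then-reduce idea in this summary can be sketched in plain Python as follows. This is an illustrative stand-in, not the TritonBench operator or the ATen op: the function names and the `values`/`offsets` list layout are hypothetical, and the padding is done by hand rather than through `torch.ops.aten._jagged_to_padded_dense_forward`.

```python
from typing import List, Sequence

Row = List[float]


def jagged_sum_baseline(values: Sequence[Row], offsets: Sequence[int]) -> List[Row]:
    """Unpadded baseline: sum each variable-length sequence directly -> (B, M)."""
    M = len(values[0])  # assumes at least one row
    out = []
    for b in range(len(offsets) - 1):
        acc = [0.0] * M
        for row in values[offsets[b]:offsets[b + 1]]:
            for m, x in enumerate(row):
                acc[m] += x
        out.append(acc)
    return out


def jagged_sum_padded(values: Sequence[Row], offsets: Sequence[int]) -> List[Row]:
    """Pad every sequence to the longest length N with zeros, then reduce dim 1."""
    M = len(values[0])  # assumes at least one row
    B = len(offsets) - 1
    N = max(offsets[b + 1] - offsets[b] for b in range(B))
    padded = []  # conceptually a dense (B, N, M) tensor
    for b in range(B):
        rows = [list(r) for r in values[offsets[b]:offsets[b + 1]]]
        rows += [[0.0] * M for _ in range(N - len(rows))]
        padded.append(rows)
    # Sum over the N dimension; zero padding does not change the result.
    return [[sum(rows[n][m] for n in range(N)) for m in range(M)] for rows in padded]


# Two sequences of lengths 1 and 3, with M = 2 features per row.
demo_values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
demo_offsets = [0, 1, 4]
demo_baseline = jagged_sum_baseline(demo_values, demo_offsets)
demo_padded = jagged_sum_padded(demo_values, demo_offsets)
```

Because zero is the identity for addition, the padded and unpadded results should agree, which is what an accuracy comparison between the two implementations checks.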

Lift inductor lowerings for jagged <-> padded dense kernels

3d84a6f

[ghstack-poisoned]

jbschlosser mentioned this pull request May 10, 2024

Short-term fix to preserve NJT metadata cache in torch.compile #122836

Closed

jbschlosser mentioned this pull request May 10, 2024

Lift jagged -> padded dense forward / backward kernels from fbgemm_gpu #125946

Closed

pytorch-bot bot added ciflow/inductor module: dynamo module: inductor labels May 10, 2024

This was referenced May 10, 2024

NJT <-> padded dense conversions #125947

Draft

(WIP) to_padded_tensor() triton kernel for NJT #121947

Draft

jbschlosser added the topic: not user facing topic category label May 14, 2024

jbschlosser mentioned this pull request May 14, 2024

Traceable wrapper subclass support for deferred runtime asserts #126198

Closed

jbschlosser added 3 commits May 14, 2024 17:21

jbschlosser mentioned this pull request May 17, 2024

Use return_and_correct_aliasing() for NJT + compatible storage setting #126552

Open

jbschlosser added 12 commits May 17, 2024 14:15

jbschlosser mentioned this pull request May 23, 2024

Naive CPU kernels for jagged <-> padded dense conversions #127007

Closed

jbschlosser added 3 commits June 11, 2024 12:00

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 12, 2024

pytorchmergebot added the merging label Jun 12, 2024

pytorchmergebot removed the merging label Jun 12, 2024

pytorchmergebot added the merging label Jun 12, 2024

pytorchmergebot added the Merged label Jun 12, 2024

pytorchmergebot closed this in bb3cf8a Jun 12, 2024

pytorchmergebot removed the merging label Jun 12, 2024

clee2000 mentioned this pull request Jun 13, 2024

Forward fix lint #128587

Closed

pytorchmergebot pushed a commit that referenced this pull request Jun 13, 2024

Forward fix lint (#128587)

518c9e6

merge at will After #125968 and #127693 landrace Pull Request resolved: #128587 Approved by: https://github.com/huydhn

TharinduRusira pushed a commit to TharinduRusira/pytorch that referenced this pull request Jun 14, 2024

Forward fix lint (pytorch#128587)

b7c7d08

merge at will After pytorch#125968 and pytorch#127693 landrace Pull Request resolved: pytorch#128587 Approved by: https://github.com/huydhn

ignaciobartol pushed a commit to ignaciobartol/pytorch that referenced this pull request Jun 14, 2024

Forward fix lint (pytorch#128587)

9ae392e

merge at will After pytorch#125968 and pytorch#127693 landrace Pull Request resolved: pytorch#128587 Approved by: https://github.com/huydhn

jananisriram mentioned this pull request Jun 14, 2024

Add jagged_sum operator for padded nested tensors to TritonBench pytorch/benchmark#2305

Closed

github-actions bot deleted the gh/jbschlosser/141/head branch July 13, 2024 01:58
