
Conversation

@peterbell10 (Collaborator) commented Nov 3, 2022

Stack from ghstack (oldest at bottom):

Ref #93757

The use of `as_strided` does require in-memory manipulations; however, this
lowering allows those memory ops to be fused with any preceding calculations.
For example:

```
def f(a, b):
    return torch.as_strided_scatter(
        a * 8 + 10,
        b * 2 - 4,
        size=(a.numel() // 2,),
        stride=(2,))
```

Before this PR, the example compiles to two kernels plus a call to `aten.as_strided_scatter`;
with this PR, it compiles to just two kernels and no additional operator calls.

In theory I think this could be a decomposition, but in practice I saw the
`output_view.copy_(src)` being optimized out in some cases when this was
implemented as a decomposition.
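For reference, the decomposition form referred to above would look roughly like the sketch below (an illustration of the eager semantics only, not the lowering added in this PR): clone the input, take an `as_strided` view of the clone, and copy `src` into that view. The `copy_` here is the `output_view.copy_(src)` step that was being optimized out.

```
import torch

# Illustrative reference semantics of as_strided_scatter (a sketch, not the
# code in this PR): clone the input, view the clone with the requested
# size/stride/offset, and copy src into that view.
def as_strided_scatter_ref(input, src, size, stride, storage_offset=None):
    out = input.clone()
    output_view = out.as_strided(size, stride, storage_offset)
    output_view.copy_(src)
    return out
```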

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx

@pytorch-bot bot commented Nov 3, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88379

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure, 1 Pending as of commit 02a819e.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

peterbell10 added a commit that referenced this pull request Nov 3, 2022
Ref pytorch/torchdynamo#327

ghstack-source-id: 735c3c3
Pull Request resolved: #88379
@lezcano (Collaborator) commented Nov 3, 2022

Note that it'd still be valuable to implement this as a decomposition for other backends to use.

@lezcano lezcano requested a review from jansel November 3, 2022 10:38
@peterbell10 peterbell10 marked this pull request as ready for review November 3, 2022 11:15
@jansel (Contributor) left a comment

Actually, can you add a test for this?

@peterbell10 (Collaborator, Author)
@jansel PTAL, I've added a test.
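For context, a correctness check for this lowering could be shaped roughly like the sketch below; the test name and exact setup are illustrative assumptions, not necessarily the test added in this PR.

```
import torch
import torch._dynamo

def f(a, b):
    return torch.as_strided_scatter(
        a * 8 + 10,
        b * 2 - 4,
        size=(a.numel() // 2,),
        stride=(2,))

# Hypothetical test: compile f with inductor and compare against eager.
def test_as_strided_scatter_lowering():
    a = torch.randn(16)
    b = torch.randn(8)
    expected = f(a, b)
    compiled_f = torch._dynamo.optimize("inductor")(f)
    actual = compiled_f(a, b)
    torch.testing.assert_close(actual, expected)
```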

@peterbell10 (Collaborator, Author)
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Nov 6, 2022
@pytorchmergebot (Collaborator)
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@ngimel (Collaborator) commented Nov 7, 2022

`output_view.copy_(src)` being optimized away is worrying. @peterbell10, do you have a repro? cc @bdhirsh

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Nov 7, 2022
Ref pytorch/torchdynamo#327

ghstack-source-id: 65b98bf
Pull Request resolved: pytorch#88379
@peterbell10 (Collaborator, Author) commented Nov 7, 2022

On the next PR in the stack (edit: specifically at commit hash deca398), if I enable the decomposition in `torch/_inductor/decomposition.py`, then this reproducer fails for me:

```
import torch
from torch.fx.experimental.proxy_tensor import make_fx
from torch._inductor.compile_fx import compile_fx

x = torch.rand((1,), device="cuda", requires_grad=True)
y = torch.rand((1,), device="cuda", requires_grad=True)

def f(a, b):
    return torch.as_strided_scatter(
        a,
        b,
        size=(1,),
        stride=(1,)),

torch._inductor.config.debug = True
args = [x, y]

decomposed = make_fx(f, tracing_mode="fake")(*args)
compiled_decomposed = compile_fx(decomposed, args)
expect = f(*args)
actual = compiled_decomposed(*args)
print(f'Expected {expect}\nActual: {actual}')
```

The `requires_grad=True` on the input is important.

@bdhirsh (Contributor) commented Nov 8, 2022

Thanks for the repro @peterbell10

I'm going to add asserts in AOT Autograd soon to check that there are no mutable ops in the graph right before we run DCE, since DCE removing those mutations is what causes the problems here.

For the actual fix, I think what we should probably do is use functionalization to "functionalize" every decomposition right before we run it, during proxy tensor tracing. That way we don't need to worry about / establish invariants on whether decomps are allowed to call mutations. I'm gonna try to POC this soon.

There's probably some question as to whether proxy tensor tracing should functionalize decomps by default. Maybe we need another feature flag in make_fx? (lol)
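A rough sketch of that idea, under the assumption that the public functionalization transform is a fair stand-in for what proxy tensor tracing would do internally (so an illustration, not the actual fix):

```
import torch
from functorch import functionalize  # torch.func.functionalize in newer releases

# Hypothetical wrapper: run a decomposition under functionalization so that
# in-place ops like copy_ are replaced with functional equivalents and can't
# be dropped by DCE later.
def run_functionalized_decomp(decomp, *args, **kwargs):
    return functionalize(decomp, remove="mutations_and_views")(*args, **kwargs)

# Example: a decomposition-style function that mutates a view of its output.
def as_strided_scatter_decomp(input, src, size, stride, storage_offset=None):
    out = input.clone()
    out.as_strided(size, stride, storage_offset).copy_(src)
    return out

x = torch.zeros(4)
src = torch.ones(2)
print(run_functionalized_decomp(as_strided_scatter_decomp, x, src, (2,), (2,)))
```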

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Ref pytorch/torchdynamo#327

Pull Request resolved: pytorch#88379
Approved by: https://github.com/jansel
@facebook-github-bot facebook-github-bot deleted the gh/peterbell10/450/head branch June 8, 2023 18:22