Skip to content

Conversation

laithsakka
Copy link
Contributor

@laithsakka laithsakka commented Oct 2, 2024

Stack from ghstack (oldest at bottom):

during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Copy link

pytorch-bot bot commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137149

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ddba772 with merge base 839d356 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

laithsakka added a commit that referenced this pull request Oct 2, 2024
…ctionalize_v2"

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 2, 2024
@laithsakka laithsakka changed the title avoid generating as_strided for alaising views in auto_functionalize_v2 Avoid generating as_strided for alaising views in auto_functionalize_v2 Oct 2, 2024
@laithsakka laithsakka added the topic: not user facing topic category label Oct 2, 2024
…ctionalize_v2"


title, see unit tests

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 2, 2024
@laithsakka laithsakka requested a review from zou3519 October 2, 2024 17:05
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we just pass the base.  (we can potentially call alias instead to be persisitance with weried case that check id inside custom op) not sure if we shall do that.

Those checks should not be inside auto_functionalized_dense to avoid generating guards in any case. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 2, 2024
@laithsakka laithsakka requested a review from oulgen October 2, 2024 17:33
@zou3519
Copy link
Contributor

zou3519 commented Oct 2, 2024

during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we just pass the base. (we can potentially call alias instead to be persisitance with weried case that check id inside custom op) not sure if we shall do that.

Can the Tensor ever be used later on? If so then we need to call alias to preserve the semantics (that the tensor is a view instead of the original tensor)

@laithsakka
Copy link
Contributor Author

during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we just pass the base. (we can potentially call alias instead to be persisitance with weried case that check id inside custom op) not sure if we shall do that.

Can the Tensor ever be used later on? If so then we need to call alias to preserve the semantics (that the tensor is a view instead of the original tensor)

inside the custom op?

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's beef up the testing with dynamic shapes. Also, we shouldn't delete the as_strided calls, they should be turned into alias calls.

@laithsakka
Copy link
Contributor Author

so we do add guards when we fail the check
for example:

@torch.compile(fullgraph=True, dynamic=True, backend="inductor")
def f(a):
   b = a[0]
   foo(b)
+- LAMBDA_GUARD: Ne(L['a'].size()[0], L['a'].size()[2])  # (_higher_order_ops/auto_functionalize.py:81 in write_single_view)

we do not add new guards for this

@torch.compile(fullgraph=True, dynamic=True, backend="inductor")
def f(a):
   b = torch.ops.aten.alias (a)
   foo(b)

because the two side of the symbol we are comparing are the same symbols.

however what is the likely hood of actually failing the guards in next iterations, is your concern about the guard checking
over head that we can avoid if we do this statically (capturing the alias call). or is your concern with potential recompilation.

@laithsakka
Copy link
Contributor Author

laithsakka commented Oct 4, 2024

I see the point that "this guards is not needed in the case above, because we always know that it will is true, and we can write the opt in a way that does not generate the guard".

but someone can counter that with " well if its always the same then we dont have to worry about recompilation so unless we are worried about guard checking time we shall be ok"

I do not have strong opinion about landing this just sharing my thoughts, lmk what do you think

@zou3519
Copy link
Contributor

zou3519 commented Oct 4, 2024

The as_strided -> alias() change sounds good then, because it does not add new guards. Can you add additional tests to show that it doesn't invoke a recompile when the shape changes? Then we can land this PR.

I'm just concerned about unnecessary recompiles.

For the slice change: I'm confused, why is it generating a guard that looks like Ne(L['a'].size()[0], L['a'].size()[2]) ?

@laithsakka
Copy link
Contributor Author

laithsakka commented Oct 4, 2024

as_strided -> alias()

oh Ne(L['a'].size()[0], L['a'].size()[2]) is added by the alias change not by the slice change. I did not check anything on the other slice change yet.
basically we fail the check

     base.size() == tensor.size() 

and this cause the above failure.

…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we just pass the base.  (we can potentially call alias instead to be persisitance with weried case that check id inside custom op) not sure if we shall do that.

Those checks should not be inside auto_functionalized_dense to avoid generating guards in any case. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 7, 2024
@laithsakka
Copy link
Contributor Author

added recompilation tests, not sure if that covers your intention of :

we should also have a test where we create a tensor with unbacked symints, and then we take an alias of that, and then we mutate it.

also update the diff to generate alias, (added a test but it fail seems that soemwhere down the road inductor remove the alias ) see issue #137434

…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we just pass the base.  (we can potentially call alias instead to be persisitance with weried case that check id inside custom op) not sure if we shall do that.

Those checks should not be inside auto_functionalized_dense to avoid generating guards in any case. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 7, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 7, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 8, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 8, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 8, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 8, 2024
alias_default_1: "f32[s0][1]cpu" = torch.ops.aten.alias.default(arg1_1)
foo_default = torch.ops.mylib.foo.default(alias_default, alias_default_1); \
alias_default = alias_default_1 = foo_default = None
copy_: "f32[s0][1]cpu" = torch.ops.aten.copy_.default(arg1_1, arg1_1); copy_ = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check... does inductor remove the copy_ during lowering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

output code

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    s0 = arg0_1
    s1 = arg1_1
    assert_size_stride(arg2_1, (s0, s1), (s1, 1))
    # Topologically Sorted Source Nodes: [], Original ATen: []
    torch.ops.mylib.foo.default(arg2_1, arg2_1)
    return (arg2_1, arg2_1, )

Comment on lines +985 to +986
def f(x):
a = torch.ops.aten.alias.default(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this something like...

def f(x):
    a = torch.ops.aten.alias.default(x)
    b = x.clone()
    c = b.nonzero().float()
    d = c.alias()
    torch.ops.mylib.foo(a, d)
    return a, d

d is a Tensor with unbacked symints in the shape

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will add a test for that i here and in the recompile test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added they pass

# that id(x) != id(base)
@torch._inductor.config.patch(enable_auto_functionalized_v2=True)
@unittest.skip(
reason="This test fails because something else in inductor optimize out the alias. issue #137434"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm

Comment on lines +1175 to +1178
def func(x):
a = torch.ops.aten.alias.default(x)
torch.ops.mylib.not_eq(a, x)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you return a? and then assert that the input and output of func have different id identities.

Inductor doesn't match the tensor identity of intermediates of the function, but it should better match the identity of inputs/outputs of the compiled function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will add another test that does that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added that , that one pass.

Copy link
Contributor

@zou3519 zou3519 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM, but please see my suggestions for the test cases

…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 9, 2024
…ctionalize_v2"


during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance. 

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
laithsakka added a commit that referenced this pull request Oct 9, 2024
@laithsakka
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 9, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

jackzhxng pushed a commit that referenced this pull request Oct 16, 2024
…v2 (#137149)

during auto_functionalize_v2 if we encounter a view such that size() stride() and storage_offset() matches the base
we create a view that is regenerated by calling aten.alias instead of as_strided for better performance.

Pull Request resolved: #137149
Approved by: https://github.com/zou3519
@github-actions github-actions bot deleted the gh/laithsakka/76/head branch November 10, 2024 02:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants