Avoid generating as_strided for aliasing views in auto_functionalize_v2 #137149
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137149
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit ddba772 with merge base 839d356. This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ctionalize_v2" During auto_functionalize_v2, if we encounter a view whose size(), stride(), and storage_offset() match the base, we just pass the base. (We could potentially call alias instead, to be consistent with the weird case where a custom op checks id(); not sure if we should do that.) Those checks should not be inside auto_functionalized_dense, to avoid generating guards in any case.
Can the Tensor ever be used later on? If so, then we need to call alias to preserve the semantics (that the tensor is a view instead of the original tensor).
Inside the custom op?
Let's beef up the testing with dynamic shapes. Also, we shouldn't delete the as_strided calls; they should be turned into alias calls.
So we do add guards when we fail the check?
We do not add new guards for this, because the two sides of the comparison are the same symbols. However, what is the likelihood of actually failing the guards in the next iterations? Is your concern about the guard checking?
I see the point that "this guard is not needed in the case above, because we always know that it will be true, and we can write the optimization in a way that does not generate the guard". But someone could counter with "well, if it's always the same, then we don't have to worry about recompilation, so unless we are worried about guard-checking time we should be OK". I do not have a strong opinion about landing this, just sharing my thoughts; let me know what you think.
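The point in this thread — that comparing a symbol against itself is statically true, so no new guard needs to be installed — can be sketched with a toy stand-in (hypothetical `Sym` class and helper; the real mechanism lives in torch's ShapeEnv/SymInt machinery):

```python
class Sym:
    """Toy symbolic size (hypothetical; not torch.SymInt)."""
    def __init__(self, name):
        self.name = name

guards = []

def statically_true_or_guard(lhs, rhs):
    # Same symbol object on both sides: the equality holds for every
    # concrete value, so it is statically true and needs no guard.
    if lhs is rhs:
        return True
    # Distinct symbols: to assume equality soundly we must install a
    # guard, which can fail on a later call and force a recompile.
    guards.append((lhs.name, rhs.name))
    return True

s0 = Sym("s0")
s1 = Sym("s1")
statically_true_or_guard(s0, s0)  # no guard recorded
statically_true_or_guard(s0, s1)  # guard ("s0", "s1") recorded
print(guards)  # [('s0', 's1')]
```

This is why the view-vs-base metadata check adds no guards: the view's sizes and strides are literally the same symbols as the base's.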
The as_strided -> alias() change sounds good then, because it does not add new guards. Can you add additional tests to show that it doesn't invoke a recompile when the shape changes? Then we can land this PR. I'm just concerned about unnecessary recompiles. For the slice change: I'm confused, why is it generating a guard that looks like Ne(L['a'].size()[0], L['a'].size()[2])?
Oh, Ne(L['a'].size()[0], L['a'].size()[2]) is added by the alias change, not by the slice change. I did not check anything on the other slice change yet,
and this causes the above failure.
Added recompilation tests; not sure if that covers your intention.
Also updated the diff to generate alias. (I added a test, but it fails; it seems that somewhere down the road Inductor removes the alias.) See issue #137434.
…ctionalize_v2" During auto_functionalize_v2, if we encounter a view whose size(), stride(), and storage_offset() match the base, we create a view that is regenerated by calling aten.alias instead of as_strided, for better performance.
alias_default_1: "f32[s0][1]cpu" = torch.ops.aten.alias.default(arg1_1)
foo_default = torch.ops.mylib.foo.default(alias_default, alias_default_1); \
alias_default = alias_default_1 = foo_default = None
copy_: "f32[s0][1]cpu" = torch.ops.aten.copy_.default(arg1_1, arg1_1); copy_ = None
To check... does inductor remove the copy_ during lowering?
Output code:
def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    s0 = arg0_1
    s1 = arg1_1
    assert_size_stride(arg2_1, (s0, s1), (s1, 1))
    # Topologically Sorted Source Nodes: [], Original ATen: []
    torch.ops.mylib.foo.default(arg2_1, arg2_1)
    return (arg2_1, arg2_1, )
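The output above shows the self-copy (`copy_(arg1_1, arg1_1)`) gone after lowering. As a toy illustration of why that elimination is safe (a hypothetical peephole over a list of nodes, not Inductor's actual pass), a `copy_` whose destination and source are the same buffer is a no-op and can be dropped:

```python
def eliminate_self_copies(nodes):
    """Drop copy_ nodes that copy a buffer onto itself (toy sketch)."""
    return [
        (op, args) for (op, args) in nodes
        # copy_(dst, src) with dst is src writes back the same bytes: dead.
        if not (op == "aten.copy_" and args[0] == args[1])
    ]

graph = [
    ("mylib.foo", ("arg2_1", "arg2_1")),
    ("aten.copy_", ("arg1_1", "arg1_1")),  # self-copy: eliminated
    ("aten.copy_", ("arg1_1", "buf0")),    # real write-back: kept
]
print(eliminate_self_copies(graph))
```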
def f(x):
    a = torch.ops.aten.alias.default(x)
can you make this something like...
def f(x):
    a = torch.ops.aten.alias.default(x)
    b = x.clone()
    c = b.nonzero().float()
    d = c.alias()
    torch.ops.mylib.foo(a, d)
    return a, d

d is a Tensor with unbacked symints in the shape.
I will add a test for that here and in the recompile test.
Added; they pass.
# that id(x) != id(base)
@torch._inductor.config.patch(enable_auto_functionalized_v2=True)
@unittest.skip(
    reason="This test fails because something else in Inductor optimizes out the alias. Issue #137434"
)
hmmm
def func(x):
    a = torch.ops.aten.alias.default(x)
    torch.ops.mylib.not_eq(a, x)
Can you return a? And then assert that the input and output of func have different identities (id). Inductor doesn't match the tensor identity of intermediates of the function, but it had better match the identity of inputs/outputs of the compiled function.
I will add another test that does that.
Added that; that one passes.
Code LGTM, but please see my suggestions for the test cases.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…v2 (#137149) During auto_functionalize_v2, if we encounter a view whose size(), stride(), and storage_offset() match the base, we create a view that is regenerated by calling aten.alias instead of as_strided, for better performance. Pull Request resolved: #137149. Approved by: https://github.com/zou3519
Stack from ghstack (oldest at bottom):
During auto_functionalize_v2, if we encounter a view whose size(), stride(), and storage_offset() match the base,
we create a view that is regenerated by calling aten.alias instead of as_strided, for better performance.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang
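As a rough sketch of the decision this PR describes (a pure-Python stand-in with hypothetical names — the real check runs on tensor metadata inside auto_functionalize_v2): when the view's size, stride, and storage offset all match its base, the view covers the whole base, so it can be regenerated with a cheap `aten.alias` instead of `aten.as_strided`.

```python
from dataclasses import dataclass

@dataclass
class TensorMeta:
    """Minimal stand-in for the metadata being compared (hypothetical)."""
    size: tuple
    stride: tuple
    storage_offset: int

def regen_view_op(base: TensorMeta, view: TensorMeta) -> str:
    # All three metadata fields match: the view aliases the entire base,
    # so aten.alias is enough to regenerate it.
    if (view.size == base.size
            and view.stride == base.stride
            and view.storage_offset == base.storage_offset):
        return "aten.alias"
    # Otherwise the view selects a strided sub-region: rebuild it exactly.
    return "aten.as_strided"

base = TensorMeta(size=(4, 3), stride=(3, 1), storage_offset=0)
full_view = TensorMeta(size=(4, 3), stride=(3, 1), storage_offset=0)
slice_view = TensorMeta(size=(2, 3), stride=(3, 1), storage_offset=3)

print(regen_view_op(base, full_view))   # aten.alias
print(regen_view_op(base, slice_view))  # aten.as_strided
```

Because the equalities compare the same symbolic sizes on both sides, this check introduces no new guards under dynamic shapes.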