Implement CSE for dynamo guards. #98488
In summary, it implements the CSE in 3 steps:
Returns: |
torch/_dynamo/guards.py
# 'v' will be defined in the 'preface' list (output argument to
# 'NodeTransformer')
class PyExprCSEPass:
    IGNORED_NODE_TYPES = (
How exactly did you decide to put nodes in the ignored types or not?
It was done in an ad-hoc way. Basically, I went through the list of node types and decided whether each one was interesting for CSE purposes. Now that I think about it, maybe I should have created a list of ALLOWED_NODE_TYPES instead.
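To illustrate the allow-list idea from the reply above, here is a minimal sketch. It is not the PR's `PyExprCSEPass`; the tuple contents and the helper name `cse_candidates` are assumptions made for illustration, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast

# Hypothetical allow-list: only these node kinds count as CSE candidates.
ALLOWED_NODE_TYPES = (ast.Attribute, ast.Subscript, ast.Call, ast.BinOp)


def cse_candidates(expr: str) -> list:
    """Return the source text of every allow-listed subexpression in expr."""
    tree = ast.parse(expr, mode="eval")
    return [
        ast.unparse(node)  # built in from Python 3.9
        for node in ast.walk(tree)
        if isinstance(node, ALLOWED_NODE_TYPES)
    ]
```

With an allow-list, any node type added to the `ast` module in a later Python version is skipped by default, whereas an ignore-list would silently start treating it as a candidate.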
Perf run kicked off on https://github.com/pytorch/pytorch/actions/runs/4638385442
Need to preserve expression ordering
So, I think I can make this PR better (I think I can finish the implementation tomorrow) by:
@ezyang
We have prior art for this already, in terms of treating shape_env code separately from other added guard code.
By using the original string? That seems brittle. According to the perf run your change doesn't materially change compile time, so I think the quadratic term doesn't matter. Leave it as is, with a comment saying that it's quadratic.
Sure, though IMO it's not a big deal.
As Voz says, this is the minimum you will need to do, but I don't think it is enough. Imagine something like:
Naive CSE will move x[0] access before we have confirmed that x is a list.
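The example originally posted with this comment is not preserved here. As a hedged illustration of the hazard, the two hypothetical guard functions below (names and checks are made up for this sketch) show what happens when a repeated `x[0]` is hoisted above the type check that protects it.

```python
def guard_ordered(x):
    # Original order: the type check returns early, so x[0] is never
    # evaluated for a non-list input.
    if not isinstance(x, list):
        return False
    if not (x[0] > 0 and x[0] < 10):
        return False
    return True


def guard_naive_cse(x):
    # Naive CSE hoists the repeated x[0] to the top of the function,
    # ahead of the check that makes the access safe.
    _var0 = x[0]  # raises TypeError when x is not subscriptable
    if not isinstance(x, list):
        return False
    if not (_var0 > 0 and _var0 < 10):
        return False
    return True
```

Calling `guard_ordered(5)` returns `False`, while `guard_naive_cse(5)` raises `TypeError`, which is why the hoisted assignment has to stay behind the guards it depends on.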
I think a useful thing to know is what the performance difference between:
vs
is. If there is not much difference, then the easiest thing to do is to stop compiling guards as a big list of
Not sure what you mean by that. By compile time, do you mean after we introduce JIT to the guards?
Find below the benchmark (torchbench) results for the 2 versions @ezyang mentioned. In summary:
Raw Results
After looking at the results, I guess it's best to go with the if-chain version. Let me know your thoughts. (FYI: I will be off until May 1.)
Alright, let's go with the if-chain.
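The two versions that were benchmarked are not reproduced on this page. As a rough sketch only, assuming they were (a) a single expression joining every check with `and` and (b) a chain of `if not ...: return False` statements, the two shapes could be generated like this. The builder names are hypothetical and this is not dynamo's guard builder.

```python
def build_and_guard(exprs):
    # Assumed version (a): one lambda whose body joins all checks with `and`.
    src = "lambda L: " + " and ".join(f"({e})" for e in exprs)
    return eval(src)


def build_if_chain_guard(exprs):
    # Assumed version (b): one `if not (...): return False` per check.
    lines = ["def guard(L):"]
    for e in exprs:
        lines.append(f"    if not ({e}):")
        lines.append("        return False")
    lines.append("    return True")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["guard"]
```

Both forms evaluate the checks in order and stop at the first failure. The statement form leaves room to place an assignment between two checks, which is what CSE needs when a shared subexpression is only safe to evaluate after an earlier guard has passed.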
If need be, I can help finish this PR and land it under your attribution.
This PR does that. I did some simple local testing, and the perf seems to be identical.
Yukio, as discussed, will be out until the first of next month. If you want this in now, I think it's fine for you to finish this one up and land it as a PR by Yukio and you.
@ezyang @anijain2305
@pytorchbot merge
Merge failed. Reason: This PR needs a label.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
This regressed MacOS x86 tests, and in general, string comparison feels like a very fragile thing to do. The guard that gets generated there is:

    def ___make_guard_fn():
        def guard(L):
            if not (x[0].a < x[1].a * (3 - x[2].a)):
                return False
            if not (a.b.c[0].d.e + a.b.c[1].d.e * a.b.c[2].d.e > 0):
                return False
            if not (f(m.n[0], '0').x.y.z * f(m.n[0], '1').x.y.z * f(m.n[0], '2').x.y.z < 512):
                return False
            if not (self.g(a, b).k + (1 - self.g(a, b).k) <= m[0].a + self.g(a, b).k):
                return False
            return True
        return guard

which is identical to the one the test is trying to match, but without any intermediate vars.
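For contrast, here is a minimal sketch of how intermediate variables can be introduced for repeated subexpressions. It is not the PR's `PyExprCSEPass`: the names `CANDIDATES`, `_Hoist`, and `cse` are hypothetical, it handles a single expression, and it relies on `ast.unparse`, which only exists from Python 3.9 (hence the `astunparse` question on 3.8).

```python
import ast
from collections import Counter

# Hypothetical set of node kinds worth hoisting into a temporary.
CANDIDATES = (ast.Attribute, ast.Subscript, ast.Call)


class _Hoist(ast.NodeTransformer):
    """Replace each repeated candidate subexpression with a temp name."""

    def __init__(self, counts):
        self.counts = counts
        self.names = {}  # source text -> temp name, in first-use order

    def visit(self, node):
        if isinstance(node, CANDIDATES):
            key = ast.unparse(node)
            if self.counts[key] > 1:
                # Top-down: the largest repeated expression wins, and its
                # inner parts stay inside the hoisted assignment.
                name = self.names.setdefault(key, f"_var{len(self.names)}")
                return ast.copy_location(ast.Name(id=name, ctx=ast.Load()), node)
        return self.generic_visit(node)


def cse(expr):
    """Return (preface assignments, rewritten expression) for one expression."""
    body = ast.parse(expr, mode="eval").body
    counts = Counter(
        ast.unparse(n) for n in ast.walk(body) if isinstance(n, CANDIDATES)
    )
    hoist = _Hoist(counts)
    new_body = hoist.visit(body)
    preface = [f"{name} = {src}" for src, name in hoist.names.items()]
    return preface, ast.unparse(new_body)
```

For example, `cse("x[0].a * x[0].a > y.b")` yields one preface line, `_var0 = x[0].a`, and the rewritten check `_var0 * _var0 > y.b`. Because repeats are detected by comparing unparsed source text, the pass cannot run at all when no unparser is available, in which case the guards stay in the un-hoisted form.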
Ah, right. Good catch. Thanks, @malfet.
@ysiraichi Hmm, looks like the way code is written right now, it would not really work without astunparse, should we just add it as mandatory dependency on python-3.8? |
How so? To me, it looks like it would work without |
@ysiraichi do you see where this call is guarded by pytorch/torch/_dynamo/guards.py Line 735 in 556bb69
See following run:
The object instantiation is guarded. pytorch/torch/_dynamo/guards.py Line 1024 in 556bb69
OK, so I'll just guard this test then...
If `astunparse` is not installed, the following guard will be generated in `test_guard_function_builder_with_cse`:
```python
def ___make_guard_fn():
    def guard(L):
        if not (x[0].a < x[1].a * (3 - x[2].a)):
            return False
        if not (a.b.c[0].d.e + a.b.c[1].d.e * a.b.c[2].d.e > 0):
            return False
        if not (f(m.n[0], '0').x.y.z * f(m.n[0], '1').x.y.z * f(m.n[0], '2').x.y.z < 512):
            return False
        if not (self.g(a, b).k + (1 - self.g(a, b).k) <= m[0].a + self.g(a, b).k):
            return False
        return True
    return guard
```
Though, I have to say, hardcoding string comparison is pretty weird.

Also, skip `test_guards_cse_pass_[single|multiple]` if AST unparsing is missing.

Fixes failure in a test introduced by #98488

Pull Request resolved: #101805
Approved by: https://github.com/atalman, https://github.com/ysiraichi
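A test can be skipped when no AST unparser is available along the following lines. This is a sketch under assumptions, not the check used in PyTorch's test suite: `has_ast_unparse` and `GuardCSETests` are hypothetical names, and the only facts relied on are that `ast.unparse` is built in from Python 3.9 and that `astunparse` is a separate package providing `astunparse.unparse`.

```python
import ast
import importlib.util
import sys
import unittest


def has_ast_unparse() -> bool:
    # Python 3.9+ ships ast.unparse; older versions need the astunparse package.
    if sys.version_info >= (3, 9):
        return True
    return importlib.util.find_spec("astunparse") is not None


class GuardCSETests(unittest.TestCase):
    @unittest.skipUnless(has_ast_unparse(), "AST unparsing is not available")
    def test_unparse_roundtrip(self):
        unparse = getattr(ast, "unparse", None)
        if unparse is None:
            import astunparse  # only reached on Python < 3.9

            unparse = astunparse.unparse
        node = ast.parse("x[0].a", mode="eval").body
        self.assertEqual(unparse(node).strip(), "x[0].a")
```

Skipping at the decorator level keeps the reason visible in the test report, so a missing dependency shows up as a skip rather than as a string-comparison failure.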
This PR extracted the CSE part of the code in #89707. Pull Request resolved: #98488 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305
Summary: pytorch#98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 fbshipit-source-id: 60ca99dd075e03ba458ebc4e3250ab0f9ebfb9d7
Summary: Pull Request resolved: pytorch#102120 pytorch#98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 fbshipit-source-id: 259f7c2c16bca111bc81ab6be04b0ce19ba47af9
Summary: #98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 Pull Request resolved: #102120 Approved by: https://github.com/malfet
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire