Implement CSE for dynamo guards. #98488

ysiraichi · 2023-04-06T08:05:54Z

This PR extracted the CSE part of the code in #89707.

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

pytorch-bot · 2023-04-06T08:05:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98488

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 55b9e6c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ysiraichi · 2023-04-06T08:12:01Z

In summary, it implements the CSE in 3 steps:

Parse each guard expression into an AST
Traverse each parsed AST, counting the number of occurrences at each level
- Holds a map from the stringified node (using ast.unparse function) to a counter
Traverse each parsed AST, replacing the nodes that occur more than a threshold (currently, 2) by a new variable

Returns:
- the assignments of each new variable to its corresponding expression
- the new guard expressions

torch/_dynamo/guards.py

ezyang · 2023-04-07T12:56:26Z

torch/_dynamo/guards.py

+#        'v' will be defined in the 'preface' list (output argument to
+#        'NodeTransformer')
+class PyExprCSEPass:
+    IGNORED_NODE_TYPES = (


How exactly did you decide to put nodes in the ignored types or not?

It was done in an ad-hoc way. Basically, I went through the list of node types and decided whether they were interesting for CSE purposes. Now that I think about it, maybe I should have created the a list of ALLOWED_NODE_TYPES.

torch/_dynamo/guards.py

ezyang · 2023-04-07T13:05:26Z

Perf run kicked off on https://github.com/pytorch/pytorch/actions/runs/4638385442

ezyang · 2023-04-07T23:36:12Z

This is a very nice improvement, worth 2-4%

torch/_dynamo/guards.py

ezyang

Need to preserve expression ordering

torch/_dynamo/guards.py

ysiraichi · 2023-04-08T15:34:40Z

So, I think I can make this PR better (I think I can finish the implementation tomorrow) by:

We can make it work without ast.unparse (eliminating the quadratic time) by using the column information from parsing
Use ALLOW_NODE_TYPES (the inverse of what we have now) so as to filter the nodes interesting to the CSE, namely: Subscript, Attribute, and Call (for getattr calls)
Divide the guard function into non-ShapeEnv guards and ShapeEnv guards (which will free us from the try-catch)

@ezyang
What do you think?

voznesenskym · 2023-04-08T18:56:13Z

Divide the guard function into non-ShapeEnv guards and ShapeEnv guards (which will free us from the try-catch)

We have prior art for this already, in terms of treating shape_env code separately from other added guard code.

ezyang · 2023-04-08T23:00:42Z

We can make it work without ast.unparse (eliminating the quadratic time) by using the column information from parsing

By using the original string? That's seems brittle. According to the perf run your change doesn't materially change compile time, so I think the quadratic term doesn't matter, leave it as is (with a comment saying that it's quadratic.)

Use ALLOW_NODE_TYPES (the inverse of what we have now) so as to filter the nodes interesting to the CSE, namely: Subscript, Attribute, and Call (for getattr calls)

Sure, though IMO it's not a big deal.

Divide the guard function into non-ShapeEnv guards and ShapeEnv guards (which will free us from the try-catch)

As Voz says, this is the minimum you will need to do, but I don't think it is enough. Imagine something like:

isinstance(x, list)
len(x) == 2
isinstance(x[0], list)
len(x[0]) == 2
isinstance(x[0][0], int)
isinstance(x[0][1], int)
isinstance(x[1], int)

Naive CSE will move x[0] access before we have confirmed that x is a list.

ezyang · 2023-04-09T00:35:45Z

I think a useful thing to know is what the performance difference between:

return a and b and c

vs

if not a:
  return False
if not b:
  return False
if not c:
  return False
return True

is. If there is not much difference, then the easiest thing to do is to stop compiling guards as a big list of and statements and then you can CSE at the first point a subexpression is used, without requiring to reorder things.

ysiraichi · 2023-04-09T06:03:24Z

According to the perf run your change doesn't materially change compile time, so I think the quadratic term doesn't matter

Not sure what you mean by that. By compile time, do you mean after we introduce JIT to the guards?

ysiraichi · 2023-04-09T10:24:28Z

Find below the benchmark (torchbench) results for the 2 versions @ezyang mentioned. In summary:

They don't seem to be significantly different
There was 1 benchmark that got >1.1x speedup: resnext50_32x4d
There were 2 benchmarks that got <0.9x speedup: dlrm (0.88x) and hf_T5_large (0.81x)
There were 3 benchmarks that got <0.98x speedup: hf_T5, hf_T5_base, and mobilenet_v2
baseline refers to the commit 1d7133c542f (February 15)
if-chain refers to the second version (literally, a chain of ifs)

Raw Results

Name	Latency (baseline)	Latency (if-chain)	Speedup
BERT_pytorch	38.2049	38.1404	1.0017
Background_Matting	148.4077	148.1106	1.0020
LearningToPaint	42.1799	42.1211	1.0014
Super_SloMo	319.2586	319.4182	0.9995
alexnet	34.7081	34.6411	1.0019
dcgan	25.2886	25.2632	1.0010
densenet121	190.7768	192.6175	0.9904
dlrm	2.0278	2.2835	0.8880
drq	0.7249	0.7233	1.0022
fastNLP_Bert	17.2611	17.1888	1.0042
functorch_dp_cifar10	20.6523	20.6971	0.9978
functorch_maml_omniglot	0.6388	0.6464	0.9882
hf_Albert	15.0872	15.0448	1.0028
hf_Bart	18.9939	18.8683	1.0067
hf_Bert	17.6168	17.9100	0.9836
hf_DistilBert	8.0332	8.0316	1.0002
hf_GPT2	37.0473	37.0166	1.0008
hf_GPT2_large	199.5633	199.7220	0.9992
hf_Reformer	15.5395	15.6129	0.9953
hf_T5	90.9850	98.2737	0.9258
hf_T5_base	305.4032	319.5407	0.9558
hf_T5_large	127.2385	156.5839	0.8126
lennard_jones	0.2202	0.2198	1.0018
maml	114.7211	116.2960	0.9865
maml_omniglot	0.8444	0.8205	1.0291
mnasnet1_0	19.5482	19.5491	1.0000
mobilenet_v2	12.3285	12.9194	0.9543
mobilenet_v3_large	18.2497	18.3144	0.9965
nvidia_deeprecommender	11.4864	11.5554	0.9940
opacus_cifar10	20.7520	20.9510	0.9905
pyhpc_equation_of_state	6.9967	6.9972	0.9999
pyhpc_isoneutral_mixing	27.9351	27.8914	1.0016
pyhpc_turbulent_kinetic_energy	22.3412	21.3129	1.0482
pytorch_CycleGAN_and_pix2pix	14.6733	14.6800	0.9995
pytorch_stargan	51.5640	51.6061	0.9992
pytorch_unet	113.0233	113.1834	0.9986
resnet152	172.9542	172.9537	1.0000
resnet18	7.1986	7.1692	1.0041
resnet50	66.2987	66.3271	0.9996
resnext50_32x4d	62.5773	55.4107	1.1293
shufflenet_v2_x1_0	33.0737	32.9855	1.0027
soft_actor_critic	0.2517	0.2538	0.9917
speech_transformer	21.2798	21.2074	1.0034
squeezenet1_1	6.3501	6.3379	1.0019
tacotron2	1095.2057	1095.4023	0.9998
timm_efficientnet	55.2628	55.8831	0.9889
timm_regnet	212.7725	213.2575	0.9977
timm_resnest	43.5706	43.5622	1.0002
timm_vision_transformer	10.7047	10.7046	1.0000
timm_vision_transformer_large	378.2258	377.8992	1.0009
timm_vovnet	81.9470	82.6849	0.9911
tts_angular	13.3451	13.3499	0.9996
vgg16	15.5082	15.5191	0.9993
yolov3	83.8152	84.1460	0.9961

After looking at the results, I guess it's best to go with the if-chain version. Let me know your thoughts.

(FYI: I will be off until May 1)

ezyang · 2023-04-09T16:30:30Z

Alright let's if chain

voznesenskym · 2023-04-09T19:23:07Z

(FYI: I will be off until May 1)

If need be, I can help finish this PR and land it under your attribution.

voznesenskym · 2023-04-10T06:49:25Z

I think a useful thing to know is what the performance difference between:
return a and b and c
vs
if not a:
  return False
if not b:
  return False
if not c:
  return False
return True
is. If there is not much difference, then the easiest thing to do is to stop compiling guards as a big list of and statements and then you can CSE at the first point a subexpression is used, without requiring to reorder things.

#98374

This PR does that.

I did some simple local testing, and the perf seems to be identical.

lezcano · 2023-04-10T08:39:03Z

If need be, I can help finish this PR and land it under your attribution

Yukio, as discussed, will be out until the next of the month. If you want this in now now, I think it's fine for you to finish this one up and land it as a PR by yukio and you.

ysiraichi · 2023-05-16T19:54:27Z

Results are out!

@ezyang @anijain2305
Is it ok if I merge this once this PR turns green?

lezcano · 2023-05-17T08:21:12Z

@pytorchbot merge

pytorchmergebot · 2023-05-17T08:23:32Z

Merge failed

Reason: This PR needs a label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team

Raised by workflow job

ysiraichi · 2023-05-17T10:43:53Z

@pytorchbot merge

pytorchmergebot · 2023-05-17T10:47:17Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

malfet · 2023-05-18T16:34:07Z

This regressed MacOS x86 tests, and in general, string comparison feels like a very fragile thing to do.
Looks like the code you've checked in is indeed correct if astunparse is installed, but if it isnt one needs different patterns, as in that case test_guard_function_builder_with_cse would generate the following:

def ___make_guard_fn():
    def guard(L):
        if not (x[0].a < x[1].a * (3 - x[2].a)):
            return False
        if not (a.b.c[0].d.e + a.b.c[1].d.e * a.b.c[2].d.e > 0):
            return False
        if not (f(m.n[0], '0').x.y.z * f(m.n[0], '1').x.y.z * f(m.n[0], '2').x.y.z < 512):
            return False
        if not (self.g(a, b).k + (1 - self.g(a, b).k) <= m[0].a + self.g(a, b).k):
            return False
        return True
    return guard

which is indential to the one one is trying to match, but without any intermediate vars

ysiraichi · 2023-05-18T16:57:04Z

Ah, right. Good catch. Thanks, @malfet .

malfet · 2023-05-18T19:07:24Z

@ysiraichi Hmm, looks like the way code is written right now, it would not really work without astunparse, should we just add it as mandatory dependency on python-3.8?

ysiraichi · 2023-05-18T19:10:29Z

How so? To me, it looks like it would work without astunparse.

malfet · 2023-05-18T19:23:14Z

@ysiraichi do you see where this call is guarded by HAS_UNPARSE_FUNCTIONS == True

pytorch/torch/_dynamo/guards.py

Line 735 in 556bb69

self._config.expr_count[_ast_unparse(node)] += 1

See following run:

/python3.8 test_misc.py -v -k test_guards_cse_pass_multiple
Fail to import hypothesis in common_utils, tests are not derandomized
test_guards_cse_pass_multiple (__main__.MiscTests) ... ERROR

======================================================================
ERROR: test_guards_cse_pass_multiple (__main__.MiscTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_misc.py", line 5307, in test_guards_cse_pass_multiple
    csepass.count([t.expr for t in testcases])
  File "/Users/nshulga/Library/Python/3.8/lib/python/site-packages/torch/_dynamo/guards.py", line 789, in count
    counter.visit(ast.parse(e))
  File "/Users/nshulga/Library/Python/3.8/lib/python/site-packages/torch/_dynamo/guards.py", line 736, in visit
    super().visit(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 363, in visit
    return visitor(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 371, in generic_visit
    self.visit(item)
  File "/Users/nshulga/Library/Python/3.8/lib/python/site-packages/torch/_dynamo/guards.py", line 736, in visit
    super().visit(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 363, in visit
    return visitor(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 373, in generic_visit
    self.visit(value)
  File "/Users/nshulga/Library/Python/3.8/lib/python/site-packages/torch/_dynamo/guards.py", line 736, in visit
    super().visit(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 363, in visit
    return visitor(node)
  File "/opt/homebrew/Cellar/python38/3.8.3_1/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ast.py", line 373, in generic_visit
    self.visit(value)
  File "/Users/nshulga/Library/Python/3.8/lib/python/site-packages/torch/_dynamo/guards.py", line 735, in visit
    self._config.expr_count[_ast_unparse(node)] += 1
NameError: name '_ast_unparse' is not defined

----------------------------------------------------------------------
Ran 1 test in 0.007s

FAILED (errors=1)

ysiraichi · 2023-05-18T19:26:10Z

The object instantiation is guarded.

pytorch/torch/_dynamo/guards.py

Line 1024 in 556bb69

if HAS_UNPARSE_FUNCTIONS:

malfet · 2023-05-18T19:33:50Z

Ok, so I'll just guard this test than...

If `astunparse` is not installed, following guard will be generated in `test_guard_function_builder_with_cse`: ```python def ___make_guard_fn(): def guard(L): if not (x[0].a < x[1].a * (3 - x[2].a)): return False if not (a.b.c[0].d.e + a.b.c[1].d.e * a.b.c[2].d.e > 0): return False if not (f(m.n[0], '0').x.y.z * f(m.n[0], '1').x.y.z * f(m.n[0], '2').x.y.z < 512): return False if not (self.g(a, b).k + (1 - self.g(a, b).k) <= m[0].a + self.g(a, b).k): return False return True return guard ``` Though, I have to say, hardcoding string comparison is pretty weird. Also, skip `test_guards_cse_pass_[single|multiple]` if AST unparsing is missing. Fixes failure in a test introduced by #98488 copilot:poem Pull Request resolved: #101805 Approved by: https://github.com/atalman, https://github.com/ysiraichi

This PR extracted the CSE part of the code in #89707. Pull Request resolved: #98488 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305

Summary: pytorch#98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 fbshipit-source-id: 60ca99dd075e03ba458ebc4e3250ab0f9ebfb9d7

Summary: Pull Request resolved: pytorch#102120 pytorch#98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 fbshipit-source-id: 259f7c2c16bca111bc81ab6be04b0ce19ba47af9

Summary: #98488 implements CSE for dynamo guards, and it relies on astunparse to perform the optimization. `test_guards_cse_pass_single` was broken and later was fixed by introducing a check_and_skip_if_needed. This actually fixes the root cause on fbcode and should bring some perf gain internally. Test Plan: `buck2 test @//mode/opt //caffe2/test/dynamo:test_dynamo -- --exact 'caffe2/test/dynamo:test_dynamo - test_misc.py::DynamicShapesMiscTests::test_guards_cse_pass_single' --run-disabled` Reviewed By: malfet Differential Revision: D46126742 Pull Request resolved: #102120 Approved by: https://github.com/malfet

github-actions bot added ciflow/inductor module: dynamo labels Apr 6, 2023

pytorchbot added the open source label Apr 6, 2023

ysiraichi requested review from ezyang, voznesenskym and jansel and removed request for voznesenskym April 7, 2023 08:19

ezyang added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 7, 2023

ezyang reviewed Apr 7, 2023

View reviewed changes

torch/_dynamo/guards.py Outdated Show resolved Hide resolved

ezyang reviewed Apr 7, 2023

View reviewed changes

torch/_dynamo/guards.py Show resolved Hide resolved

ezyang approved these changes Apr 7, 2023

View reviewed changes

ezyang reviewed Apr 7, 2023

View reviewed changes

torch/_dynamo/guards.py Outdated Show resolved Hide resolved

ezyang requested changes Apr 7, 2023

View reviewed changes

voznesenskym reviewed Apr 7, 2023

View reviewed changes

torch/_dynamo/guards.py Show resolved Hide resolved

ysiraichi force-pushed the guards-cse branch from 81b3f03 to 55b9e6c Compare May 16, 2023 19:48

jansel approved these changes May 17, 2023

View reviewed changes

pytorchmergebot added the merging label May 17, 2023

pytorchmergebot removed the merging label May 17, 2023

ysiraichi added the release notes: dynamo label May 17, 2023

pytorchmergebot added the merging label May 17, 2023

pytorchmergebot added Merged and removed merging labels May 17, 2023

pytorchmergebot closed this in f72f011 May 17, 2023

anijain2305 mentioned this pull request May 18, 2023

[dynamo] Guard on nn.Module dicts and type #100642

Closed

malfet mentioned this pull request May 18, 2023

Fix/skip CSE tests on Python-3.8 without astunparse #101805

Closed

desertfire mentioned this pull request May 23, 2023

[dynamo] Add astunparse dependency #102120

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement CSE for dynamo guards. #98488

Implement CSE for dynamo guards. #98488

ysiraichi commented Apr 6, 2023 •

edited by pytorch-bot bot

pytorch-bot bot commented Apr 6, 2023 •

edited

ysiraichi commented Apr 6, 2023

ezyang Apr 7, 2023

ysiraichi Apr 8, 2023

ezyang commented Apr 7, 2023

ezyang commented Apr 7, 2023

ezyang left a comment

ysiraichi commented Apr 8, 2023

voznesenskym commented Apr 8, 2023

ezyang commented Apr 8, 2023

ezyang commented Apr 9, 2023

ysiraichi commented Apr 9, 2023

ysiraichi commented Apr 9, 2023 •

edited

ezyang commented Apr 9, 2023

voznesenskym commented Apr 9, 2023

voznesenskym commented Apr 10, 2023

lezcano commented Apr 10, 2023

ysiraichi commented May 16, 2023

lezcano commented May 17, 2023

pytorchmergebot commented May 17, 2023

ysiraichi commented May 17, 2023

pytorchmergebot commented May 17, 2023

malfet commented May 18, 2023 •

edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023 •

edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023 •

edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023

Implement CSE for dynamo guards. #98488

Implement CSE for dynamo guards. #98488

Conversation

ysiraichi commented Apr 6, 2023 • edited by pytorch-bot bot

pytorch-bot bot commented Apr 6, 2023 • edited

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98488

✅ No Failures

ysiraichi commented Apr 6, 2023

ezyang Apr 7, 2023

Choose a reason for hiding this comment

ysiraichi Apr 8, 2023

Choose a reason for hiding this comment

ezyang commented Apr 7, 2023

ezyang commented Apr 7, 2023

ezyang left a comment

Choose a reason for hiding this comment

ysiraichi commented Apr 8, 2023

voznesenskym commented Apr 8, 2023

ezyang commented Apr 8, 2023

ezyang commented Apr 9, 2023

ysiraichi commented Apr 9, 2023

ysiraichi commented Apr 9, 2023 • edited

ezyang commented Apr 9, 2023

voznesenskym commented Apr 9, 2023

voznesenskym commented Apr 10, 2023

lezcano commented Apr 10, 2023

ysiraichi commented May 16, 2023

lezcano commented May 17, 2023

pytorchmergebot commented May 17, 2023

Merge failed

ysiraichi commented May 17, 2023

pytorchmergebot commented May 17, 2023

Merge started

malfet commented May 18, 2023 • edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023 • edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023 • edited

ysiraichi commented May 18, 2023

malfet commented May 18, 2023

ysiraichi commented Apr 6, 2023 •

edited by pytorch-bot bot

pytorch-bot bot commented Apr 6, 2023 •

edited

ysiraichi commented Apr 9, 2023 •

edited

malfet commented May 18, 2023 •

edited

malfet commented May 18, 2023 •

edited

malfet commented May 18, 2023 •

edited