-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build-out an AST optimizer, moving some functionality out of the peephole optimizer #55758
Comments
As pointed out by Raymond, constant folding should be done on AST rather than on generated bytecode. Here's a patch to do that. It's rather long, so overview first. The patch replaces existing peephole pass with a folding pass on AST and a few changes in compiler. Feature-wise it should be on par with old peepholer, applying optimizations more consistently and thoroughly, but maybe missing some others. It passes all peepholer tests (though they seem not very extensive) and 'make test', but more testing is, of course, needed. I've split it in 5 pieces for easier reviewing, but these are not 5 separate patches, they should all be applied together. I can upload it somewhere for review or split it in other ways, let me know. Also, patches are versus 1e00b161f5f5, I will redo against head. TOC:
In detail:
def foo():
"doc" + "string" Without optimizations foo doesn't have a docstring. After folding, however, the first statement in foo is a string literal. This means that docstring depends on the level of optimizations. Making it an attribute avoids the problem.
4, 5. No big surprises here. |
I'm confused. Why aren't there review links? |
Because I don't know how to make them. Any pointers? |
Martin is hacking on the tool these days... So it's no surprise it |
Thanks. Review link: http://codereview.appspot.com/4281051 |
The review links didn't come up automatically because 336137a359ae isn't a hg.python.org/cpython revision ID. |
I see. Should I attach diffs vs. some revision from hg.python.org? |
No need, since you manually created a review on appspot. The local Reitveld links are just a convenience that can avoid the need to manually create a review instance. |
Any comments on the code so far or suggestions on how we should move forward? |
I've been focusing on softer targets during the sprints - I'll dig into this once I'm back home and using my desktop machine again. |
I've updated patches on Rietveld with some small changes. This includes better code generation for boolops used outside of conditions and cleaned up optimize_jumps(). This is probably the last change before I get some feedback. |
Just for fun I've run pystones. W/o my changes it averages to about 70k, with my changes to about 72.5k. |
A couple of somewhat related issues: |
AFAICT my patch has everything that bpo-1346238 has, except BoolOps, which can be easily added (there's a TODO). I don't want to add any new code, though, until the current patch will get reviewed -- adding code will only make reviewing harder. bpo-10399 looks interesting, I will take a look. |
Is anyone looking or planing to look at the patch? |
I suspect someone will sometime. There is bit of a backlog of older issues. |
Finally got around to reviewing this (just a visual scan at this stage) - thanks for the effort. These are mostly "big picture" type comments, so I'm keeping them here rather than burying them amongst all the details in the code review tool. The effect that collapsing Num/Str/Bytes into a single Lit node type has on ast.literal_eval bothered me initially, but looking more closely, I think those changes will actually improve the function (string concatenation will now work, and errors like "'hello' - 'world'" should give a more informative TypeError). (Bikeshed: We use Attribute rather than Attr for that node type, perhaps the full "Literal" name would be better, too) Lib/test/disutil.py should really be made a feature of the dis module itself, by creating an inner disassembly function that returns a string, then making the existing "dis" and "disassembly" functions print that string (i.e. similar to what I already did in making dis.show_code() a thin wrapper around the new dis.code_info() function in 3.2). In the absence of a better name, "dis_to_str" would do. Since the disassembly is interpreter specific, the new disassembly tests really shouldn't go directly in test_compile.py. A separate "test_ast_optimiser" file would be easier for alternate implementations to skip over. A less fragile testing strategy may also be to use the ast.PyCF_ONLY_AST flag and check the generated AST rather than the generated bytecode. I'd like to see a written explanation for the first few changes in test_peepholer.py. Are those cases no longer optimised? Are they optimised differently? Why did these test cases have to change? (The later changes in that file look OK, since they seem to just be updating tests to handle the more comprehensive optimisation) When you get around to rebasing the patch on 3.3 trunk, don't forget to drop any unneeded "from __future__" imports. The generated code for the Lit node type looks wrong: it sets v to Py_None, then immediately checks to see if v is NULL again. Don't use "string" as a C type - use "char *" (and "char **" instead of "string *"). There should be a new compiler flag to skip the AST optimisation step. A bunch of the compiler internal changes seem to make the basic flow of the generated assembly not match the incoming source code. It doesn't seem like a good idea to conflate these with the AST optimisation patch. If that means leaving the peepholer in to handle them for now, that's OK - it's fine to just descope the peepholer without removing it entirely. |
I think the biggest thing to take out of my review is that I strongly encourage deferring the changes for 5(b) and 5(c). I like the basic idea of using a template-based approach to try to get rid of a lot of the boilerplate code currently needed for AST visitors. Providing a hook for optimisation in Python (as Dave Malcolm's approach does) is valuable as well, but I don't think the two ideas need to be mutually exclusive. As a more general policy question... where do we stand in regards to backwards compatibility of the AST? The ast module docs don't have any caveats to say that it may change between versions, but it obviously *can* change due to new language constructs (if nothing else). |
Yes, but can existing constructs produce a different structure from one |
I would provide this via another compile flag a la PyCF_ONLY_AST. If you give only this flag, you get the original AST. If you give (e.g.) |
Thanks.
Also, can I get your opinion on making None/True/False into literals early on and getting rid of forbidden_name? Antoine, Georg -- I think Nick's question is not about AST changing after optimizations (this can indeed be a separate flag), but the structure of AST changing. E.g. collapsing of Num/Str/Bytes into Lit. Btw, if this is acceptable I'd make a couple more changes to make scope structure obvious from AST. This will allow auto-generating much of the symtable pass. |
|
I don't think it can: >>> class X:
... def __eq__(self, other):
... return True
... def __ne__(self, other):
... return True
...
>>> x = X()
>>>
>>> not x == 2
False
>>> x != 2
True
>>> |
|
Also, to avoid any confusion -- currently my patch only runs AST optimizations before code generation, so compile() with ast.PyCF_ONLY_AST returns non-optimized AST. |
While I would not be happy to use class X above, the 3.2 manual explicitly says "There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. " . |
OK, I missed the fact that the new optimisation pass isn't run under PyCF_ONLY_AST. However, as Eugene picked up, my concern is actually with the collapsing of Str/Num/Bytes/Ellipsis into the new Lit node, and the changes to the way None/True/False are handled. They're all changes that *make sense*, but would also likely cause a variety of AST manipulations to break. We definitely don't care when bytecode hacks break, but providing the ast module means that AST manipulation is officially supported. However, the reason I bring up new constructs, is the fact that new constructs may break AST manipulation passes, even if the old structures are left intact - the AST visitor may miss (or misinterpret) things because it doesn't understand the meaning of the new nodes. We may need to take this one back to python-dev (and get input from the other implementations as well). It's a fairly fundamental question when it comes to the structure of any changes. |
If we have to preserve backward compatibility of Python AST API, we can do this relatively easily (at the expense of some code complexity):
Alternative implementation is to leave Num/Str/etc classes in C as well, and write visitors (similar to folding one) to convert AST between old and new forms. Does this sounds reasonable? Should this be posted to python-dev? Should I write a PEP (I'd need some help with this)? Are there any other big issues preventing this to be merged? |
Eugene, I think you're doing great work here and would like to see you succeed. In the near term, I don't have time to participate, but don't let that stop you. |
Fairly sure it's 5 years old. |
Yes, the patch is outdated, conflicts with current code (and would conflict even more after pushing the wordcode patch) and contains bugs. But it moved in right direction, I think your _PyCode_ConstantKey() could help to fix bugs. I'm going to revive this issue. |
Serhiy: Nice! Yes, _PyCode_ConstantKey solved the problem. But bpo-16619 went in the opposite direction of this patch, and introduced a new type of literal node instead of unifying the existing ones. Kind of a shame, since *this* patch, I believe, both fixes that bug and removes the unreachable code in the example :) I also see that Victor has been doing some of the same work, e.g. bpo-26146. |
ast.Constant idea directly comes from your work. The implementatiln may ve It's a first step for AST optimizers. |
@Haypo, how do you think about ast.Lit and ast.Constant? |
I already made two changes in Python 3.6 :-) I added ast.Constant to Python 3.6. While it's not used by .py=>AST compiler, it is understood by the AST->bytecode compiler (ex: handle correctly special cases like docstrings). Moreover, I also modified the format of the co_lnotab structure to support negative line number delta. So a code transformer can move instructions and preserve the Python traceback feature. |
I would like to point out that the changes in At least this library would have a serious risk of remote DoS :
Sorry for the noise if this is a useless/incorrect consideration. |
Hugo Geoffroy added the comment:
Since the Python compiler doesn't produce ast.Constant, there is no
I tried hard to implement a sandbox in Python and I failed: I don't think that literal_eval() is safe *by design*. |
Good point Hugo. Yes, this should be taken into account when move constant folding to AST level. Thank you for the reminder. |
Currently there is no a bug in ast.literal_eval() because the '**' operator is not accepted. >>> ast.literal_eval("2**2**32")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/serhiy/py/cpython/Lib/ast.py", line 85, in literal_eval
return _convert(node_or_string)
File "/home/serhiy/py/cpython/Lib/ast.py", line 84, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.BinOp object at 0xb6f2fa4c> But if move the optimization to AST level this can add a vulnerability to DOS attack. The optimizer should do additional checks first than execute operators that can return too large value or take too much CPU time. Currently this vulnerability have place in the peephole optimizer. |
The doc says "This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing." I don't think that it's a bug, but a deliberate design choice. a**b is an obvious trick to DoS a server (high CPU and memory usage). |
Hugo, Serhiy, and Victor: I think you're all agreeing with each other, but to make sure I'm understanding the point correctly:
However, that's not exactly how this patch works: if you pass "PyCF_ONLY_AST" as ast.parse does, it *doesn't* run the constant-folding step. Instead, the constant folding is run as an AST-to-AST transform during the AST-to-bytecode compilation step, *not* the initial source-to-AST step. (see http://bugs.python.org/issue11549#msg132361 ) This has a few nice properties:
|
I'm working on updating this part. There are some failing tests remains.
We have already Constant and NameConstant. So it seems there are no need for I think converting Num, Str, Bytes, Ellipsis into Constant in folding stage
Take docstring before constant folding isn't enough?
They are all NameConstant already. |
With the current code there can be a simple fix. If original string literals are Str, but constant-folded string constants are Constant, only treat Strs as docstrings. Then the optimizer needs a change to always preserve Str as a first statement in a function/module, and that's it.
|
Then, may I remove ast.Lit, and use Constant and NameConstant?
I think backward compatibility is not guaranteed. So I think we should make change small as possible.
OK.
I know. I want to move this patch forward, but I'm not frontend (parser, AST, and compiler) expert. Then, may I update the patch in following direction?
|
If you would like to implement constant folding at the AST level, I suggest The tricky part is to avoid operations when we know that it will raise an I would prefer to implement an AST optimizer in Python, but converting C By the way, an idea would be to skip all optimizations in some cases like |
Before trying advanced optimizations, I want move suspended obvious optimizations forwards. For example, removing unused constants is suspended because constant folding After that, I'm thinking about shrinking stacksize. frame_dealloc (scans whole stack) is one of hot functions. |
Dropping ast.Lit is fine. As for the docstring part, I'm torn. Yes it's nice as that will show up semantically in the Python code, but it's also easy to check for by just looking if the first statement is a Str (or Constant if that's what we make all strings). So I'll say I'm +0 on the docstring part. |
At the AST level, you have a wide range of possible optimizations. See the optimizations that I implemented in fatoptimizer (FAT Python) to have an idea: FAT Python adds guards checked at runtime, something not possible (not wanted) here. But if you start with constant folding, why not implementing constant propagation as well? What about loop unrolling? Where is the limit? If you implement the AST optimizer in C, the limit will probably be your C skills and your motivation :-) In Python, the limit is more the Python semantics which is... hum... not well defined. For example, does it break the Python semantics to replace [i for i in (1, 2, 3)] with [1, 2, 3]? What if you use a debugger? Do yo expect a list comprehension or a literal list? FYI I suspended my work on FAT Python because almost no other core developer was interested. I didn't get any support, whereas I need support to push core FAT Python features like function specialization and runtime checks (PEP-510, see also PEP-511). Moreover, I failed to show any significant speedup on non-trivial functions. I abandoned before investigating function inlining, even if FAT Python already has a basic support for function inlining. This issue is open since 2011. The question is always the same: is it worth it? An alternative is to experiment an AST optimizer outside CPython and come back later with more data to drive the design of such optimizer. With FAT Python, I chose to add hooks in the Python compiler, but different people told me that it's possible to do that without such hook but importlib (importlib hooks). What do you think Naoki? |
Yes, doing optimizations on AST in CPython is unlikely to give any sizable speed improvements in real world programs. Python as a language is not suited for static optimization, and even if you manage to inline a function, there's still CPython's interpreted overhead and boxed types that dwarf the effect of the optimization. The goal of this patch was never to significantly improve the speed. It was to replace the existing bytecode peephole pass with cleaner and simpler code, which also happens to produce slightly better results. |
My motivation is improve speed, reduce memory usage, and quicker But this time, my motivation is I felt "everyone think constant folding As reading bpo-28813, I think there are consensus about constant folding |
INADA Naoki added the comment:
Ah, if the motivation is performance, I would like to see benchmark
I noticed an issue with the peephole optimizer: the constant folding
You mean faster import time on precompiled .pyc files, right? It's
See http://fatoptimizer.readthedocs.io/en/latest/microbenchmarks.html#microbench FYI it took me something like 2 months to build FAT Python |
I've tried to update ast_opt.c[t] without changing AST. This patch adds only docstring to AST. |
Naoki: Can you please open a new issue for your ast-docstring.patch change? I like it, but this issue became too big, and I'm not sure that everyone in the nosy list is interested by this specific change. |
I created the issue bpo-29471: "AST: add an attribute to FunctionDef to distinguish functions from generators and coroutines". |
I think this particular issue can be closed in now. After it was written, the AST optimizer has been built-out and some peephole logic has moved. Other adjustments and improvements can be continue to be done in the other open issues. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: