-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc strings no longer stored in body of AST #77092
Comments
Python 3.7.0b1+ (heads/3.7:dfa1144, Feb 22 2018, 12:10:59) >>> m = ast.parse("def foo():\n 'help'")
>>> m.body[0].body
[] Correct behaviour (3.6 or earlier) >>> m = ast.parse("def foo():\n 'help'")
>>> m.body[0].body
[<_ast.Expr object at 0x7fb9eeb1d4e0>] |
AST is changed slightly from Python 3.7. $ ./python
Python 3.8.0a0 (heads/master-dirty:451d1edaf4, Feb 22 2018, 21:11:54)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> m = ast.parse("def foo():\n 'help'")
>>> ast.get_docstring(m.body[0])
'help' |
This is an unnecessary and breaking change. |
That issue has to do with bytecode generation, not the AST. The AST should be an accurate representation of the Python source code. Doc-strings may be semantically distinct from other expressions, but they are syntactically the same. |
We implemented AST-level constant folding (bpo-29469) based on the AST change (bpo-29463). I feel it's possible to revert AST change without reverting AST constant folding:
But I don't remember how difficult it is because since this change is made one year ago (GH-46)... |
This isn't a bug, but a feature. You no longer need to check and skip the first statement if it is a literal string. The body attribute now always represents a sequence of statements, and the docstring attribute represents a docstring. |
Stating that "this is a feature not a bug" does not make it so. From a syntactic point of view. |
It seems we have a difference of opinion here. Serhiy closed this issue so, Mark, if you feel strongly enough to pursue it, you should reopen it and solicit other opinions. The clock has just about run out to change the now current behavior for 3.7.0. |
It may be worth to include docstrings explicitly in the grammar. |
Serhiy, thanks for reopening this issue. It seems to be that there are three reasonable choices:
I would prefer 1, as it requires no changes to 3rd party code and doesn't present an additional obstacle when porting from Python 2. 2 would be acceptable, as it allows tools to easily convert the body back to its 3.6 form (or vice-versa) 3 is a pain as it involves re-tokenizing the file to get the location of the doc-string. |
I prefer 2 or 3. There are benefits from representing a docstring as a separate attribute. This simplifies the code for walking the AST tree (no longer special cases for skipping docstring in modules, classes and functions) and the code for retrieving a docstring. It solves the problem that an expression resulting to a constant string (like "a"+"b" or f"a") shouldn't be interpreted as docstrings. The position of a docstring can be useful for determining the position of fragments inside a docstring (for example for doctests). Several active developed third-party libraries (like pyflakes, see PyCQA/pyflakes#273) already updated their code for supporting 3.7. The position of nodes preceding or following a docstring could be used. This is not perfect, it doesn't work with empty lines before or after docstring, but it never was perfect due to escaped newlines in string literals and line continuations. |
Note, that getting the location of the dostring already was a pain since in case of multiline string literals CPython saves the location of the last line, and PyPy saves the location of the first line. Getting the location of the specific fragment of the docstring is even larger pain as shown in following examples: def f()
"""
>>> f()
"""
def f()
"""\
>>> f()
"""
def f()
""\
""\
">>> f()"\
""\
"" |
Is this going to get resolved in time for 3.7.0b2? If we need to change this behavior, I really do not want to delay this for b3 which is when the 3.7.0 ABI freezes; otherwise such a major behavior change would likely need to wait until 3.8. |
I tried this patch: diff --git a/Python/ast.c b/Python/ast.c
index e2092f0f85..93be2bc839 100644
--- a/Python/ast.c
+++ b/Python/ast.c
@@ -3537,9 +3537,9 @@ docstring_from_stmts(asdl_seq *stmts)
if (s->kind == Expr_kind && s->v.Expr.value->kind == Str_kind) {
string doc = s->v.Expr.value->v.Str.s;
/* not very efficient, but simple */
- memmove(&asdl_seq_GET(stmts, 0), &asdl_seq_GET(stmts, 1),
- (stmts->size - 1) * sizeof(void*));
- stmts->size--;
+ //memmove(&asdl_seq_GET(stmts, 0), &asdl_seq_GET(stmts, 1),
+ // (stmts->size - 1) * sizeof(void*));
+ //stmts->size--;
return doc;
}
} But I got "SyntaxError: from __future__ imports must occur at the beginning of the file". docstring is very special while it looks like just a single string literal statement... |
I'm implementing (2). Please check #50177. |
Would be nice to discuss this design question on Python-list. |
Python-list? -dev? -ideas? |
Sorry, Python-Dev, of course. |
Since we are already past the 3.7.0b2 cutoff time and there does not seen to be a consensus that the current 3.7 behavior needs to change and the proposed change is quite large, I think we should not change anything now for b2. You can have a discussion on python-dev and, if there is a consensus for change, we can look at it for b3. |
If Python 3.7 is released with current AST form, I don't want to change it again. I prefer
|
Sorry but we are not going to revisit this decision for 3.7.0. It's too late! |
IMHO Python 3.7 will be simpler to process, since it avoids to check if the first node is a docstring or not. I agree that it's a backward incompatible, but it has been done on purpose. In fact, there is no warranty about backward compatibility on the genreated AST between Python versions (like 3.6 vs 3.7). As Lukasz wrote, AST of Python 3.6 already omit a lot of information about formatting, we are far from a 1:1 mapping between code and AST. For example, tools to "unparse" AST breaks the formatting. The Python 'ast' module is just not appropriate for such use case. There are other solutions for that, like lib2to3. Having a docstring attribute *and* a docstring node seem to be redundant to be, or more like a bug. Redundant data can quickly lead to inconsistencies. The current trend is more to move the AST away from the original Python source code, to produce better AST, especially in term of performance. That's also why I added the ast.Constant node type ;-) And why many peephole optimizations have been reimplemented at the AST level, rather than on the bytecode level. |
Regardless of the value of .docstring change, it seems it had some unexpected consequences. I think providing a new mode for compile mulitline statements is a reasonable way to address those consequences going forward. That said, it's extremely late in the lifecycle of 3.7 to be thinking about new APIs. How about rolling back the whole .docstring change for 3.7? Then, it can be landed again with compile(..., mode='multiline) for 3.8. |
For the record, Serhiy commented on this issue today in a thread on the python-committers list: "I have doubts about two issues. I feel the responsibility for them because I had the opportunity to solve them before, but I lost it.
if a:
b
if c:
d You need to add an empty line between top-level complex statements. If one time CPython will add support of pasting several statements without empty lines between, it might need to add the same hack as IPython. I afraid that we might be needed to change AST again, in 3.7.1 or in 3.8.0." And later: "Inada's patch looked complex (actually it mostly restored the code before his previous change). We didn't know about IPython and we decided that it is not worth to change this code at this stage (after beta2). And definitely it will be later to do this after rc1." Full context/content starting here: |
Based on more recent discussions and indirect feedback from downstream users (primarily the recent IPython experience), I am reluctantly re-opening this issue for 3.7.0. I take responsibility for encouraging us earlier in the beta phase to continue with the feature as it stood, thereby prioritizing schedule over any technical issues. Although it is now very late in the 3.7.0 cycle, with the newly expressed concerns my opinion has changed: I think we owe it to our downstream users to think about this one more time. I agree with Benjamin's remark that it is very late in the 3.7 cycle to be considering new or revised APIs for 3.7.0. So it seems to me we have only two practical alternatives at this point: A. Proceed with 3.7.0rc1 as planned with the docstrings feature as it now stands and consider API changes for 3.8. or B. Revert the feature now for 3.7.0, retargeting for 3.8, and produce a 3.7.0b5 on a somewhat shorter cycle to allow downstream users to adapt to the removal. I don't know how controversial making this decision will be but I think we need to move quickly as this is now *the* blocker for 3.7.0. Coming now at the start of a weekend, in some countries a 3-day holiday weekend, I do want to make sure that the major stakeholders involved with this issue get a chance to "vote", although ultimately it will be up to the release manager (me) to make the final decision. So, to be clear, we will not be deciding here and now what the API should be if we were to retarget for 3.8; please save that discussion for later. The only question open right now is a vote for either option A (proceed as is) or option B (revert for 3.7 and retarget). Let's set a limit for "voting" until Tuesday 2018-05-29 18:00 UTC (to cover the holiday weekend) with the proviso that if a clear consensus one way or the other appears before that we may cut the period short and proceed to implemention. I know this is not a pleasant task especially at this late date. I apologize to everyone, especially to Inada-san, for dragging this out. I certainly appreciate the hard work that has gone into this feature so far and look forward to seeing it in either 3.7 or 3.8. |
In the A/B vote, I cast mine for B, for what it is worth, but it is not strongly held. From the IPython side, I don't view our particular issue as a major regression for users. The only affected case for us is interactively typed string literals in single statement cells not displaying themselves as results. Since the same string is necessarily already displayed in the input, this isn't a huge deal. This is pretty rare (maybe folks do this while investigating unicode issues?) and we can handle it by recompiling empty modules with 'single' instead of the usual 'exec' that we use because most IPython inputs are multi-statement cells coming from things like notebooks. It's relevant to note that any logic in the cell, e.g. The proposed 'muliline' or 'interactive' compile mode would suit IPython very well, since that's what we really want - single * N, not actually a module, and this is illustrated by the way we do execution: compile with exec, then iterate through module.body and run the nodes one at a time. |
Ouch, I'd completely missed the fact that this would affect the parsing of bare strings to a simple AST (I was focused on functions and classes, as in Mark's original example). So even though I'm the author of https://bugs.python.org/issue11549#msg193656 (where I note that we consider it reasonable for AST manipulation code to require updates when going between major Python versions), I'm reluctantly voting "B", since there's a difference between "some AST manipulation code will need to change to account for new node types and arrangements" and "all code calling ast.parse with the default mode and processing the top level node will need to change to account for docstrings now being omitted from the module body, with no readily available quick fix to get the old behaviour back". (Note that in voting for option B, I'm really only objecting to the change when it comes to Module AST nodes - rather than full reversion, I'd also be OK with a change that duplicated the new docstring attribute as body[0] for modules, while continuing to eliminate the redundancy for functions and classes - this would be a more selective variant of Mark's "Option 1" proposal from back in February). |
Łukasz Langa wrote:
That is what I proposed in https://bugs.python.org/issue33477 The other surprise, is that even is I knew AST were change and had a docstring field I was not expecting I think that the change for ast to have a docstring field is a good one from an API point of view for ast nodes. But It should probably be opt-in, at least with a deprecation period to make it the default. The other issue is that as it affect only sequence of statement where the first node is (was) a string, it can lead to really subtle difference in execution. MinRk said:
It might affect our downstream consumers that have syntax/ast transformation like sage, sympy on top of IPython. I doubt we have any but it's a possibility. While I would prefer (B) as well, I see how the new behavior can be useful in some case, and strongly also support the addition of a |
Please note that it can't be reverted simply. There are two ways to revert the change: B1. Just remove B2. Remove In case of B2, thirdparty libraries should follow the change anyway. In case of B1, I hadn't tried it yet and I don't know how difficult it is. I'll try but I can't say estimated time for now. |
Just to quickly touch on Matthias' question about opt-in or deprecations, a key thing to note is the ast module is automatically generated from the ASDL file, so either of those approaches would require developing a new mechanism in the code generator to support either approach. |
Yes, I was mostly thinking of Thinking a bit more about the I'm thinking that splitting
this could allow both to evolve and have their individual advantage, leaving |
Just noting that I'm going to attempt an alternative, hopefully lower impact, approach at implementing the reversion, which will be to modify Parser/asdl_c.py to inject the following constructor postamble for ASDL nodes that define both a "body" and a "docstring" field:
In 3.8, we'd then decide how to eliminate the duplication of information between the two locations, but do so armed with the ability to emit deprecation warnings for code which isn't setting the node docstring attribute explicitly. |
After looking at potential implementation approaches, I believe the following would be the ASDL code generator changes needed to present the docstring as part of the body in the Python API, while keeping it as a separate attribute in the C API:
Since CPython's own AST optimizer works at the C API layer, we then wouldn't need to revert any of the changes that relied on the separation of the docstring out to its own attribute. At the Python level, some differences would still be visible though:
Given the complexity of the required changes to the ASDL code generator, and the Python level API differences that would still remain, I think Serhiy's full reversion patch is going to be the more reliable option at this late stage of the release process. While it isn't nice for the folks that already adapted their code to cope with the earlier beta releases, it's the lowest risk approach that lets us get 3.7.0 out the door without unintended consequences, while allowing the issue to be revisited for 3.8 with greater awareness of the potential backwards compatibility and code migration issues. |
I'm +1 on PR-7121 too. |
OK, since I believe everyone who has spoken up so far has chosen B or a variation on it, I think we can eliminate option A. And there also seems to be a consensus so far among the core developers who have spoken up for the approach in PR 7121. Before we commit to it and produce 3.7.0b5, I really would like to hear from at least one of the downsteamers that this approach seems OK for 3.7.0. mbussonn? MIN RK? Thank you all for your (prompt) help so far with this! |
That should work well for us. Our patches for this are all conditional on the module body being empty, so reverting causes us no issues at all. Thank you! |
Thank you all for your input and thank you, Inada-san and Serhiy, for the PRs. Is there anything more that needs to go in for 3.7.0b5 tomorrow so that this can be tested before the release candidate? If not, can we now close this again? If new issues arise with b5 or later, we should probably track with another issue. |
I think it's OK. |
I concur. |
In that case, closing this as resolved. Folks interested in the idea of a "suite" compilation mode to compile a series of statements without special-casing the first one as a potential docstring, as well as the idea of a dedicated DocString AST node may want to open new RFEs for 3.8 :) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: