-
-
Notifications
You must be signed in to change notification settings - Fork 29.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-30465: Fix lineno and col_offset in fstring AST nodes #1800
Conversation
Is this intended to handle multi-line and nested f-strings? |
I'll add more tests to show what works and what doesn't. |
4c76276
to
d2dc17c
Compare
@markshannon, nested f-strings work correctly, multiline f-strings partially work. Location information about expressions inside FormattedValues is correct. Location information for enclosing nodes (JoinedStr, Str, and FormattedValue itself) is invalid due to bpo-16806. This is not fstring specific. It's a multiline string bug. I'll fix that in a separate patch. |
@serhiy-storchaka @ericvsmith @markshannon I'd appreciate a review here, I'd like this to go into 3.6.2 and the cut off date is looming. |
Python/ast.c
Outdated
`n` is the node which locations are going to be fixed relative to parent. | ||
`expr_str` is the child node's string representation, incuding braces. | ||
*/ | ||
static void fstring_fix_node_location(const node *parent, node *n, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For conforming PEP 7 this should be:
static void
fstring_fix_node_location(...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix.
@@ -4327,14 +4335,73 @@ fstring_compile_expr(const char *expr_start, const char *expr_end, | |||
str[len+2] = 0; | |||
|
|||
cf.cf_flags = PyCF_ONLY_AST; | |||
mod = PyParser_ASTFromString(str, "<fstring>", | |||
Py_eval_input, &cf, c->c_arena); | |||
mod_n = PyParser_SimpleParseStringFlagsFilename(str, "<fstring>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioning PyParser_ASTFromString()
in the comment above no longer valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will fix.
while (parent && parent->n_type != STRING) | ||
parent = parent->n_child; | ||
if (parent && parent->n_str) { | ||
substr = strstr(parent->n_str, expr_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the f-string contains several equivalent subexpressions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we'll report line number and column information for the first occurence in all occurences. This is not currently solvable without tracking already applied occurences, etc. I don't think it's worth fixing, that would complicate this patch.
If we ever redo f-strings to be untokenized properly on first pass, then this will work as expected. The hack to do a separate pass on formatted values is the reason why this entire workaround is necessary in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't expr_start
/expr_end
give a position of a subexpression in the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that. Unfortunately, you'd need to pass the entire string as a new argument through all the fstring_*
functions to make this work. Even then, the full string doesn't have the string prefix or the leading quote. That can be fixed, but even then, the full string is different from expr_start if compile-time concatenation is used.
In other words, this edge case requires an inordinate amount of work to get right. Let's treat this as a separate improvement in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not worth complicating the implementation. Especially because the largest use case is to get error reporting working, and I don't see how the error could be on a second instance of the exact same expression.
{ | ||
n->n_col_offset = n->n_col_offset + col_offset; | ||
for (int i = 0; i < NCH(n); ++i) { | ||
if (n->n_lineno && n->n_lineno < CHILD(n, i)->n_lineno) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When n->n_lineno
is zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nodes don't contain col_offset and lineno information at all. I don't recall now which ones those are but the tests were failing without this additional check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should n->n_col_offset
(above) and n->n_lineno
(below) be updated for these nodes? They become non-zero, but does the new value make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n->n_lineno
is zero for eval_input
nodes only. Interestingly, n->n_col_offset
is uninitialized then (holds value -875836469 which is 0xCBCBCBCB..., one of the magic byte patterns debug PyMalloc initializes fresh memory, too). So, yes, I think setting new values is better than leaving garbage coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have anything to add beyond Serhiy's comments.
Python/ast.c
Outdated
@@ -4265,6 +4266,12 @@ decode_bytes_with_escapes(struct compiling *c, const node *n, const char *s, | |||
return result; | |||
} | |||
|
|||
/* Forward declarations for recursive functions. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet a note. I don't see a recursion except a self-recursion in fstring_shift_node_locations()
. If change the order of function definitions, forward declarations will become not needed.
a50756c
to
e78be7e
Compare
I applied all suggestions by @serhiy-storchaka. @ericvsmith, this is ready for merging. What this doesn't fix is bpo-31140 since that syntax errors are raised during compilation, not after (when the fixes implemented in this PR are applied). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the one suggestion to add a new test and the (unneeded?) forward declaration, this looks good to me. Thanks!
@@ -70,6 +70,253 @@ def __call__(self): | |||
# Make sure x was called. | |||
self.assertTrue(x.called) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about some test that checks the output of eval()? This code doesn't work as-is, because the NameError doesn't include the line number. But I assume there's a way to capture it and inspect it.
I suggest this because this is the behavior that the end user really cares about.
def test_error_line_numbers(self):
expr = '''f"""
some text
other text
{x}
"""'''
with self.assertRaisesRegex(NameError, ', line 5,'):
eval(expr)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, never mind. I was conflating this with bpo-31140.
while (parent && parent->n_type != STRING) | ||
parent = parent->n_child; | ||
if (parent && parent->n_str) { | ||
substr = strstr(parent->n_str, expr_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not worth complicating the implementation. Especially because the largest use case is to get error reporting working, and I don't see how the error could be on a second instance of the exact same expression.
Python/ast.c
Outdated
@@ -4270,6 +4271,58 @@ decode_bytes_with_escapes(struct compiling *c, const node *n, const char *s, | |||
return result; | |||
} | |||
|
|||
/* Forward declaration for a recursive function. */ | |||
static void fstring_shift_node_locations(node *n, int lineno, int col_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line isn't needed, is it? You define the function immediately after the forward declaration.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Alright, unnecessary forward declaration removed. |
🐍🍒⛏🤖 Thanks @ambv for the PR, and @ericvsmith for merging it 🌮🎉.I'm working now to backport this PR to: 3.6. |
…onGH-1800) For f-string ast nodes, fix the line and columns so that tools such as flake8 can identify them correctly. (cherry picked from commit e7c566c)
GH-3409 is a backport of this pull request to the 3.6 branch. |
f-strings are computed in a separate compiler step. This makes their lineno and col_offset information wrong. This is problematic for flake8 which reports problems inside f-strings on the wrong line (typically the first one in the file).
Attached patch fixes the issue.
https://bugs.python.org/issue30465