Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-30465: Fix lineno and col_offset in fstring AST nodes #1800

Merged
merged 1 commit into from
Sep 7, 2017

Conversation

ambv
Copy link
Contributor

@ambv ambv commented May 24, 2017

f-strings are computed in a separate compiler step. This makes their lineno and col_offset information wrong. This is problematic for flake8 which reports problems inside f-strings on the wrong line (typically the first one in the file).

Attached patch fixes the issue.

https://bugs.python.org/issue30465

@markshannon
Copy link
Member

Is this intended to handle multi-line and nested f-strings?

@ambv
Copy link
Contributor Author

ambv commented May 24, 2017

I'll add more tests to show what works and what doesn't.

@ambv ambv force-pushed the fstring_lineno branch 2 times, most recently from 4c76276 to d2dc17c Compare May 25, 2017 02:27
@ambv
Copy link
Contributor Author

ambv commented May 25, 2017

@markshannon, nested f-strings work correctly, multiline f-strings partially work. Location information about expressions inside FormattedValues is correct. Location information for enclosing nodes (JoinedStr, Str, and FormattedValue itself) is invalid due to bpo-16806. This is not fstring specific. It's a multiline string bug. I'll fix that in a separate patch.

@ambv ambv requested a review from ericvsmith May 25, 2017 02:33
@ambv ambv added type-bug An unexpected behavior, bug, or error needs backport to 3.6 labels May 25, 2017
@serhiy-storchaka serhiy-storchaka self-requested a review May 25, 2017 11:03
@ambv
Copy link
Contributor Author

ambv commented May 30, 2017

@serhiy-storchaka @ericvsmith @markshannon I'd appreciate a review here, I'd like this to go into 3.6.2 and the cut off date is looming.

Python/ast.c Outdated
`n` is the node which locations are going to be fixed relative to parent.
`expr_str` is the child node's string representation, incuding braces.
*/
static void fstring_fix_node_location(const node *parent, node *n,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For conforming PEP 7 this should be:

static void
fstring_fix_node_location(...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix.

@@ -4327,14 +4335,73 @@ fstring_compile_expr(const char *expr_start, const char *expr_end,
str[len+2] = 0;

cf.cf_flags = PyCF_ONLY_AST;
mod = PyParser_ASTFromString(str, "<fstring>",
Py_eval_input, &cf, c->c_arena);
mod_n = PyParser_SimpleParseStringFlagsFilename(str, "<fstring>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning PyParser_ASTFromString() in the comment above no longer valid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will fix.

while (parent && parent->n_type != STRING)
parent = parent->n_child;
if (parent && parent->n_str) {
substr = strstr(parent->n_str, expr_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the f-string contains several equivalent subexpressions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we'll report line number and column information for the first occurence in all occurences. This is not currently solvable without tracking already applied occurences, etc. I don't think it's worth fixing, that would complicate this patch.

If we ever redo f-strings to be untokenized properly on first pass, then this will work as expected. The hack to do a separate pass on formatted values is the reason why this entire workaround is necessary in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't expr_start/expr_end give a position of a subexpression in the string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried that. Unfortunately, you'd need to pass the entire string as a new argument through all the fstring_* functions to make this work. Even then, the full string doesn't have the string prefix or the leading quote. That can be fixed, but even then, the full string is different from expr_start if compile-time concatenation is used.

In other words, this edge case requires an inordinate amount of work to get right. Let's treat this as a separate improvement in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not worth complicating the implementation. Especially because the largest use case is to get error reporting working, and I don't see how the error could be on a second instance of the exact same expression.

{
n->n_col_offset = n->n_col_offset + col_offset;
for (int i = 0; i < NCH(n); ++i) {
if (n->n_lineno && n->n_lineno < CHILD(n, i)->n_lineno) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When n->n_lineno is zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nodes don't contain col_offset and lineno information at all. I don't recall now which ones those are but the tests were failing without this additional check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should n->n_col_offset (above) and n->n_lineno (below) be updated for these nodes? They become non-zero, but does the new value make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n->n_lineno is zero for eval_input nodes only. Interestingly, n->n_col_offset is uninitialized then (holds value -875836469 which is 0xCBCBCBCB..., one of the magic byte patterns debug PyMalloc initializes fresh memory, too). So, yes, I think setting new values is better than leaving garbage coordinates.

Copy link
Member

@ericvsmith ericvsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything to add beyond Serhiy's comments.

Python/ast.c Outdated
@@ -4265,6 +4266,12 @@ decode_bytes_with_escapes(struct compiling *c, const node *n, const char *s,
return result;
}

/* Forward declarations for recursive functions. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet a note. I don't see a recursion except a self-recursion in fstring_shift_node_locations(). If change the order of function definitions, forward declarations will become not needed.

@ambv
Copy link
Contributor Author

ambv commented Sep 6, 2017

I applied all suggestions by @serhiy-storchaka.

@ericvsmith, this is ready for merging. What this doesn't fix is bpo-31140 since that syntax errors are raised during compilation, not after (when the fixes implemented in this PR are applied).

Copy link
Member

@ericvsmith ericvsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the one suggestion to add a new test and the (unneeded?) forward declaration, this looks good to me. Thanks!

@@ -70,6 +70,253 @@ def __call__(self):
# Make sure x was called.
self.assertTrue(x.called)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about some test that checks the output of eval()? This code doesn't work as-is, because the NameError doesn't include the line number. But I assume there's a way to capture it and inspect it.

I suggest this because this is the behavior that the end user really cares about.

    def test_error_line_numbers(self):
        expr = '''f"""
some text
other text
{x}
"""'''
        with self.assertRaisesRegex(NameError, ', line 5,'):
            eval(expr)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, never mind. I was conflating this with bpo-31140.

while (parent && parent->n_type != STRING)
parent = parent->n_child;
if (parent && parent->n_str) {
substr = strstr(parent->n_str, expr_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it's not worth complicating the implementation. Especially because the largest use case is to get error reporting working, and I don't see how the error could be on a second instance of the exact same expression.

Python/ast.c Outdated
@@ -4270,6 +4271,58 @@ decode_bytes_with_escapes(struct compiling *c, const node *n, const char *s,
return result;
}

/* Forward declaration for a recursive function. */
static void fstring_shift_node_locations(node *n, int lineno, int col_offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line isn't needed, is it? You define the function immediately after the forward declaration.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I didn't expect the Spanish Inquisition!. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ambv
Copy link
Contributor Author

ambv commented Sep 6, 2017

Alright, unnecessary forward declaration removed.

@ericvsmith ericvsmith merged commit e7c566c into python:master Sep 7, 2017
@miss-islington
Copy link
Contributor

🐍🍒⛏🤖 Thanks @ambv for the PR, and @ericvsmith for merging it 🌮🎉.I'm working now to backport this PR to: 3.6.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 7, 2017
…onGH-1800)

For f-string ast nodes, fix the line and columns so that tools such as flake8 can identify them correctly.
(cherry picked from commit e7c566c)
@bedevere-bot
Copy link

GH-3409 is a backport of this pull request to the 3.6 branch.

ericvsmith pushed a commit that referenced this pull request Sep 7, 2017
…h-3409)

For f-string ast nodes, fix the line and columns so that tools such as flake8 can identify them correctly.
(cherry picked from commit e7c566c)
@ambv ambv deleted the fstring_lineno branch July 12, 2021 11:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants