Group line and column metadata needs special handling #107

lysnikolaou · 2019-12-10T18:13:11Z

When writing the action for the group rule, I found out that parsing something like (yield a) returns the following AST object:

Module(
    body=[
        Expr(
            value=Yield(
                value=Name(
                    id="a", ctx=Load(), lineno=1, col_offset=7, end_lineno=1, end_col_offset=8
                ),
                lineno=1,
                col_offset=1,
                end_lineno=1,
                end_col_offset=8,
            ),
            lineno=1,
            col_offset=0,
            end_lineno=1,
            end_col_offset=9,
        )
    ],
    type_ignores=[],
)

That means that the Expr node's start column offset is 0, although the yield node's is 1. The correct action for group is thus:

group[expr_ty]: '(' a=(yield_expr | full_expression) ')' { a }

but the surrounding Expr node needs to have a different column offset and I can't seem to come up with a way of handling this, so any ideas are welcome!

The text was updated successfully, but these errors were encountered:

gvanrossum · 2019-12-10T22:16:26Z

Hm, so Expr is the node type for an expression statement. I suspect the problem is not specific to group but occurs for dict and set as well, and perhaps others. But can it happen for interior nodes?

I guess this is another reason why I think we may have to come up with a way to gather line/column information out of band, rather than passing it to the node constructors using EXTRA().

(PS. I reformatted your code so I could read it. The easiest way to do it that I've found is to write it to a file and run black over it... I'm weaponizing this as we speak. :-)

lysnikolaou · 2019-12-11T16:41:56Z

I suspect the problem is not specific to group but occurs for dict and set as well, and perhaps others.

Having refactored all of the atom rules(list, tuple, dict, comprehensions, set and so on) it is actually specific only to group, because it's the only one where the metadata of the inner node is different to that of the outer statement. And that makes sense, in a dict for example, the dict node starts at the opening brace and ends at the closing brace. The problem with group is that the parentheses are not part of the expression, but are only used for, well, grouping.

PS Very good idea! I was trying to do it using pprint but couldn't get it formatted the way I wanted.

gvanrossum · 2019-12-11T17:02:39Z

Ah, so it's because the group disappears. Can we introduce a helper object? Or does it have to be an existing expr_ty?

I propose not to worry about this -- I'm not sure it's worth reproducing this particular behavior. OTOH maybe the solution really is to take the line/col info out of band.

PS. #108

lysnikolaou · 2019-12-11T17:10:59Z

What exactly do you mean with out of band?

gvanrossum · 2019-12-11T17:23:52Z

We could add some code to the generated code that extracts the start line/col from the upcoming token at the top of a rule implementation function, and extracts the end line/col from the last token just before calling the action. Something like this:

    if (p->mark == p->fill) {
        if (fill_token(p) < 0) return -1;
    }
    int start_lineno = p->tokens[mark]->lineno;
    int start_col_offset = p->tokens[mark]->col_offset;

before diving into the first alternative (but after checking the memo), and then in each rule, just before the action is executed:

    res = <action>;  // or res = CONSTRUCTOR(...);

insert this:

    assert(p->mark > 0);
    int end_lineno = p->tokens[p->mark-1]->end_lineno;
    int end_col_offset = p->tokens[p->mark-1]->end_col_offset;

And then, finally, change the EXTRA macro to just be this:

#define EXTRA start_lineno, start_col_offset, end_lineno, end_col_offset, p->arena

We can then drop a whole bunch of infrastructure around EXTRA.

lysnikolaou · 2019-12-11T17:39:45Z

That seems like a nice solution. It also resembles what Python/ast.c does with nodes a bit more closely. No clue if that's good or bad!

In #109, I'm just ignoring this problem, until we have a final decision on how to proceed with this one.

gvanrossum · 2019-12-11T17:41:12Z

I recommend spending a few days trying to implement this idea and seeing if there's anything I've missed.

lysnikolaou · 2019-12-11T17:43:54Z

OK! I'll start with it tonight.

lysnikolaou · 2019-12-12T16:33:30Z

After spending some time on this, the only problem I'm having now has to do with the elif rule, because the correct column offset for it is the one where the condition starts and not the one where the keyword elif itself starts. What would the correct solution for that be? The first one that jumps to mind, is to just have a different token offset for the elif rule in the c generator and have something like int start_col_offset = p->tokens[mark+1];, which needs the condition p->mark < p->fill to be true, right?

lysnikolaou · 2019-12-12T17:42:08Z

The first one that jumps to mind, is to just have a different token offset for the elif rule in the c generator and have something like int start_col_offset = p->tokens[mark+1];, which needs the condition p->mark < p->fill to be true, right?

That indeed works and the whole test suite now passes. The problem is as we refactor more of simpy.gram more edge cases might appear, which might overcomplicate the generator. Should I push a PR just for you to see what I've done?

gvanrossum · 2019-12-12T18:50:11Z

Hm, this seems due to some idiosyncrasy in ast.c (ast_for_if_stmt()). I wonder if we could submit a PR to ast.c that fixes the column offset there? It could just use CHILD(n, off - 1) in the asdl_seq_SET() call.

lysnikolaou · 2019-12-12T19:11:44Z

I can do that, should I open an issue on bpo?

gvanrossum · 2019-12-12T20:00:00Z

Sure. Link it here.

lysnikolaou · 2019-12-12T20:16:01Z

Opened an issue at https://bugs.python.org/issue39031.

lysnikolaou · 2019-12-12T20:23:05Z

Also opened a PR on python/cpython#17582.

lysnikolaou · 2019-12-14T16:49:07Z

Two new observations:

First of all, after the newest refactorings we've got two new problems.

If there are function decorators, when parsing a function definition, then the decorator is the first thing parse, so it is what lineno/col_offset points at.
If there is a kwarg in the parameter list of a function, then the lineno/col_offset points to the first star, instead of pointing to the parameter name.

Second of all, there is a REALLY SIGNIFICANT runtime increase. Running simpy_cpython now lasts 188-244 seconds on my machine. Not sure what causes this.

gvanrossum · 2019-12-14T18:31:05Z

So to be clear, the runtime increase is due to #121?

lysnikolaou · 2019-12-14T18:34:03Z

I'm not 100% sure on that. The runtime increase is only reproducible on that branch on my local environment, but Travis thinks otherwise, as it completed in a little more that 6 seconds in #121. I need to investigate further.

lysnikolaou · 2019-12-17T01:24:21Z

Closing due to #121.

This was referenced Dec 13, 2019

[3.8] bpo-39031: Include elif keyword when producing lineno/col-offset info for if_stmt (GH-17582) python/cpython#17589

Merged

Move the definition of the inline functions to pegen.c #118

Closed

lysnikolaou mentioned this issue Dec 14, 2019

Refactor EXTRA macro and extract the line and col info out of band #121

Merged

lysnikolaou closed this as completed Dec 17, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Group line and column metadata needs special handling #107

Group line and column metadata needs special handling #107

lysnikolaou commented Dec 10, 2019 •

edited by gvanrossum

gvanrossum commented Dec 10, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019 via email

lysnikolaou commented Dec 11, 2019

lysnikolaou commented Dec 12, 2019 •

edited

lysnikolaou commented Dec 12, 2019

gvanrossum commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

gvanrossum commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

lysnikolaou commented Dec 14, 2019

gvanrossum commented Dec 14, 2019 via email

lysnikolaou commented Dec 14, 2019

lysnikolaou commented Dec 17, 2019

Group line and column metadata needs special handling #107

Group line and column metadata needs special handling #107

Comments

lysnikolaou commented Dec 10, 2019 • edited by gvanrossum

gvanrossum commented Dec 10, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019

lysnikolaou commented Dec 11, 2019

gvanrossum commented Dec 11, 2019 via email

lysnikolaou commented Dec 11, 2019

lysnikolaou commented Dec 12, 2019 • edited

lysnikolaou commented Dec 12, 2019

gvanrossum commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

gvanrossum commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

lysnikolaou commented Dec 12, 2019

lysnikolaou commented Dec 14, 2019

gvanrossum commented Dec 14, 2019 via email

lysnikolaou commented Dec 14, 2019

lysnikolaou commented Dec 17, 2019

lysnikolaou commented Dec 10, 2019 •

edited by gvanrossum

lysnikolaou commented Dec 12, 2019 •

edited