Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser module fails on legal input #80437

Closed
tyomitch mannequin opened this issue Mar 10, 2019 · 11 comments
Closed

parser module fails on legal input #80437

tyomitch mannequin opened this issue Mar 10, 2019 · 11 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@tyomitch
Copy link
Mannequin

tyomitch mannequin commented Mar 10, 2019

BPO 36256
Nosy @freddrake, @brettcannon, @gpshead, @benjaminp, @berkerpeksag, @serhiy-storchaka, @tyomitch, @pablogsal
PRs
  • bpo-36256: Fix bug in parsermodule when parsing if statements #12477
  • [3.7] bpo-36256: Fix bug in parsermodule when parsing if statements (GH-12477) #12488
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-21.23:56:37.904>
    created_at = <Date 2019-03-10.16:36:58.997>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7']
    title = 'parser module fails on legal input'
    updated_at = <Date 2019-04-05.12:05:53.511>
    user = 'https://github.com/tyomitch'

    bugs.python.org fields:

    activity = <Date 2019-04-05.12:05:53.511>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-21.23:56:37.904>
    closer = 'pablogsal'
    components = ['Extension Modules']
    creation = <Date 2019-03-10.16:36:58.997>
    creator = 'A. Skrobov'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36256
    keywords = ['patch']
    message_count = 11.0
    messages = ['337619', '337692', '337693', '337695', '337782', '337786', '337808', '338571', '338572', '339493', '339497']
    nosy_count = 10.0
    nosy_names = ['fdrake', 'brett.cannon', 'gregory.p.smith', 'benjamin.peterson', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'xcombelle', 'A. Skrobov', 'pablogsal']
    pr_nums = ['12477', '12488']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue36256'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @tyomitch
    Copy link
    Mannequin Author

    tyomitch mannequin commented Mar 10, 2019

    Under bpo-26526, I had optimised the validation code in parser module to use the actual parser DFAs, but my code considers only the token types in the input, and doesn't distinguish between different NAMEs (e.g. different keywords).

    The result is this:

    Python 3.7.0 (default, Aug 22 2018, 20:50:05) 
    [GCC 5.4.0 20160609] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import parser
    >>> parser.sequence2st(parser.suite("if True:\n pass\nelse:\n pass").totuple())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    parser.ParserError: Expected node type 305, got 11.

    The fix for this bug is quite simple, and in fact, it had been languishing for 2.5 years under bpo-26415

    I can easily enough extract the fix into a PR of its own, but the bigger question is: parser module had been advised against using since Python 2.5; now that it has been broken for 2.5 years, nobody even noticed. (if-else must be quite a common code construct, so anybody trying to use the module would have noticed!) So, should perhaps the module be discontinued rather than fixed?

    @tyomitch tyomitch mannequin added 3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Mar 10, 2019
    @brettcannon
    Copy link
    Member

    I would be curious to hear what Pablo has to say with the new parser having landed and if there's something we should be exposing from that code to replace what's in 'parser' today? (Either w/o semantic change or a new API.)

    @pablogsal
    Copy link
    Member

    I would be curious to hear what Pablo has to say with the new parser having landed and if there's something we should be exposing from that code to replace what's in 'parser' today? (Either w/o semantic change or a new API.)

    :)

    One small clarification is that the parser is the same, what has changed is the parser generator. What is exposed in the parser modules today is the parse trees (in a very raw form).

    One thing we can do is expose the parser component that lib2to3/pgen2 has as a substitute/complement to the parser module (which is not exposed as part of the new pgen - I know, is confusing). This is very useful and complementary to the AST (for example, black is using a forked version of this component to obtain the CST as it can do round tripping - code->CST->NEW_CST->code). This piece is in pure Python and can read the parser tables that pgen generates. It also will have the advantage of forcing us to synchronize to the current grammar (black had to fork it among other things because the one in lib2to3 was out of date). This idea and all the challenges are already been discussed here:

    https://bugs.python.org/issue33337

    The major problem with the parser module is that is unsynchronized with the actual parser, it has a very raw API and is moderately unmaintained (as this bug reveals). We would need to evaluate if we want to spend effort into synchronizing them, deprecating completely the parser module, substitute it with a new python version or wait until we have a completely new non-LL(1) C parser to ask these questions.

    What do you think?

    As a side note, the problem described in this bug was more or less foreseen. This is in the header of Modules/parsemodule.c:

    • To debug parser errors like
    •  "parser.ParserError: Expected node type 12, got 333."
      
    • decode symbol numbers using the automatically-generated files
    • Lib/symbol.h and Include/token.h.

    @tyomitch
    Copy link
    Mannequin Author

    tyomitch mannequin commented Mar 11, 2019

    The major problem with the parser module is that is unsynchronized with the actual parser

    The parser module is "sort of" synchronised with the actual parser, in that it uses the same _PyParser_Grammar; this doesn't mean they always behave the same, as this bug shows :-)

    (But before bpo-26526, it used to be much worse, with the parser module having a completely independent re-implementation of the parser.)

    As a side note, the problem described in this bug was more or less foreseen. This is in the header of Modules/parsemodule.c:

    Just to clarify, the bug is not about the cryptic exception message, it's about the exception firing when it shouldn't.

    @brettcannon
    Copy link
    Member

    It's sounding like it might be worth the effort then to make lib2to3's parser not be a "hidden" thing in lib2to3, break it out as a new parser module (I have no stance on name), and then deprecate the old parser module. I think this was discussed at the last language summit when Christian proposed deprecating lib2to3 and everyone said the parser is too useful to lose.

    @xcombelle
    Copy link
    Mannequin

    xcombelle mannequin commented Mar 12, 2019

    never used the parser module nor lib2to3. Does they have any advantage over ast.parse and ast module ?

    @brettcannon
    Copy link
    Member

    @xavier different needs; AST and CST are at different stages of compilation.

    @pablogsal
    Copy link
    Member

    New changeset 9a0000d by Pablo Galindo in branch 'master':
    bpo-36256: Fix bug in parsermodule when parsing if statements (GH-12477)
    9a0000d

    @pablogsal
    Copy link
    Member

    New changeset 00eb97b by Pablo Galindo (Miss Islington (bot)) in branch '3.7':
    bpo-36256: Fix bug in parsermodule when parsing if statements (GH-12488)
    00eb97b

    @tyomitch
    Copy link
    Mannequin Author

    tyomitch mannequin commented Apr 5, 2019

    Is it intentional that the fix is not backported to 3.6 as well?

    @pablogsal
    Copy link
    Member

    3.6 only accepts security fixes at this point.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants