Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 617: new PEG-based parser #84514

Closed
gvanrossum opened this issue Apr 19, 2020 · 110 comments
Closed

PEP 617: new PEG-based parser #84514

gvanrossum opened this issue Apr 19, 2020 · 110 comments
Labels
3.9 type-feature

Comments

@gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Apr 19, 2020

BPO 40334
Nosy @gvanrossum, @rhettinger, @ned-deily, @serhiy-storchaka, @Carreau, @asottile, @lysnikolaou, @pablogsal, @miss-islington, @isidentical, @hauntsaninja
PRs
  • #19503
  • #19664
  • #19666
  • #19667
  • #19669
  • #19670
  • #19668
  • #19671
  • #19672
  • #19675
  • #19684
  • #19692
  • #19694
  • #19695
  • #19704
  • #19721
  • #19723
  • #19736
  • #19743
  • #19744
  • #19745
  • #19771
  • #19774
  • #19775
  • #19778
  • #19779
  • #19780
  • #19782
  • #19786
  • #19818
  • #19827
  • #19828
  • #19830
  • #19833
  • #19837
  • #19839
  • #19849
  • #19854
  • #19865
  • #19887
  • #19911
  • #19962
  • #19963
  • #19964
  • #19966
  • #19969
  • #19973
  • #19987
  • #20003
  • #20020
  • #20050
  • #20076
  • #20106
  • #20151
  • #20153
  • #20296
  • #20307
  • #20367
  • #20368
  • #20973
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-06-19.07:54:58.363>
    created_at = <Date 2020-04-19.20:16:17.317>
    labels = ['type-feature', '3.9']
    title = 'PEP 617: new PEG-based parser'
    updated_at = <Date 2020-06-19.07:54:58.362>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2020-06-19.07:54:58.362>
    actor = 'lys.nikolaou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-06-19.07:54:58.363>
    closer = 'lys.nikolaou'
    components = []
    creation = <Date 2020-04-19.20:16:17.317>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40334
    keywords = ['patch']
    message_count = 110.0
    messages = ['366803', '367009', '367050', '367057', '367059', '367060', '367061', '367063', '367064', '367066', '367067', '367068', '367069', '367071', '367073', '367075', '367076', '367084', '367097', '367105', '367112', '367113', '367114', '367115', '367121', '367126', '367135', '367136', '367137', '367139', '367140', '367141', '367142', '367155', '367158', '367160', '367170', '367171', '367190', '367191', '367196', '367237', '367243', '367246', '367439', '367440', '367448', '367467', '367470', '367471', '367473', '367474', '367521', '367564', '367576', '367577', '367580', '367582', '367598', '367602', '367610', '367637', '367713', '367772', '367815', '367822', '367835', '367836', '367839', '367840', '367843', '367845', '367850', '367854', '367862', '367866', '367891', '367914', '367916', '367974', '367976', '367978', '368004', '368034', '368288', '368302', '368304', '368329', '368330', '368566', '368567', '368593', '368595', '368598', '368599', '368663', '368796', '368892', '369090', '369166', '369203', '369288', '369536', '369553', '369554', '369837', '369840', '371841', '371842', '371848']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'rhettinger', 'ned.deily', 'serhiy.storchaka', 'mbussonn', 'Anthony Sottile', 'lys.nikolaou', 'pablogsal', 'miss-islington', 'BTaskaya', 'hauntsaninja']
    pr_nums = ['19503', '19664', '19666', '19667', '19669', '19670', '19668', '19671', '19672', '19675', '19684', '19692', '19694', '19695', '19704', '19721', '19723', '19736', '19743', '19744', '19745', '19771', '19774', '19775', '19778', '19779', '19780', '19782', '19786', '19818', '19827', '19828', '19830', '19833', '19837', '19839', '19849', '19854', '19865', '19887', '19911', '19962', '19963', '19964', '19966', '19969', '19973', '19987', '20003', '20020', '20050', '20076', '20106', '20151', '20153', '20296', '20307', '20367', '20368', '20973']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40334'
    versions = ['Python 3.9']

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented Apr 19, 2020

    Note: PEP-617 is currently under review by the Steering Council, but if they approve we'd like to get it into alpha 6, and reviews are welcome (even though we're still finagling some corner cases of the implementation).

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 22, 2020

    Once PR 19503 will be merged, I would be interested to see if setjmp()/longjmp() can be avoided. I'm scared by these functions: they require that all functions in between setjmp() and longjmp() call stack have no state and don't need any cleanup at exit.

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented Apr 22, 2020

    VIctor, you were very right about longjmp. See we-like-parsers#119

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 22, 2020

    New changeset c5fc156 by Pablo Galindo in branch 'master':
    bpo-40334: PEP-617 implementation: New PEG parser for CPython (GH-19503)
    c5fc156

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 22, 2020

    New changeset 458004b by Pablo Galindo in branch 'master':
    bpo-40334: Fix errors in parse_string.c with old compilers (GH-19666)
    458004b

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 22, 2020

    bpo-40334: Fix errors in parse_string.c with old compilers (GH-19666)

    Pablo told me in private that he plans to add a buildbot worker to test the old parser. So this change is fine.

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Apr 22, 2020

    Looks like the build changes do not work with a build directory outside of the source directory:

    mkdir build && cd build && ../configure && make

    gcc -c -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -std=c99 -Wextra -Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers -Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden -I../Include/internal -IObjects -IInclude -IPython -I. -I../Include -DPy_BUILD_CORE -o Parser/pegen/pegen.o ../Parser/pegen/pegen.c
    error: unable to open output file 'Parser/pegen/pegen.o': 'No such file or directory'
    1 error generated.
    make: *** [Parser/pegen/pegen.o] Error 1

    Lots of build processes depend on this, like for putting out a release.

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 22, 2020

    Looks like the build changes do not work with a build directory outside of the source directory:

    Opened #19667. Ned, could you review?

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    New changeset a25f3c4 by Pablo Galindo in branch 'master':
    bpo-40334: Fix builds outside the source directory and regenerate autoconf files (GH-19667)
    a25f3c4

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 23, 2020

    New changeset 1def775 by Victor Stinner in branch 'master':
    bpo-40334: Rename PyConfig.use_peg to _use_peg_parser (GH-19670)
    1def775

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 23, 2020

    bpo-40334: Rename PyConfig.use_peg to _use_peg_parser (GH-19670)

    This change removes sys.flags.use_peg which was only used in tests. If someone sees a good reason, we may add it back. In the meanwhile, test.support.use_old_parser() can be used.

    I didn't like the idea of having a new sys.flags.use_peg flag appears in 3.9 but disappears in 3.10.

    FYI sys.flags still supports the tuple API! Example:

    >>> sys.flags[9]
    0

    I have no idea what is this 10th member of sys.flags :-)

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 23, 2020

    test_peg_generator emits compiler warnings:

    https://buildbot.python.org/all/#/builders/612/builds/283

    0:08:01 load avg: 1.40 [423/423/1] test_peg_generator passed (7 min 4 sec)
    /tmp/tmp0vbqe8gp/parse.c: In function ‘start_rule’:
    /tmp/tmp0vbqe8gp/parse.c:32:35: warning: passing argument 2 of ‘_PyPegen_lookahead’ from incompatible pointer type [-Wincompatible-pointer-types]
    32 | _PyPegen_lookahead(1, _PyPegen_name_token, p)
    | ^~~~~~~~~~~~~~~~~~~
    | |
    | struct _expr * ()(Parser *)
    In file included from /tmp/tmp0vbqe8gp/parse.c:2:
    /home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-aarch64.lto-pgo/build/Parser/pegen/pegen.h:90:36: note: expected ‘void * (
    )(Parser *)’ but argument is of type ‘struct _expr * ()(Parser *)’
    90 | int _PyPegen_lookahead(int, void *(func)(Parser *), Parser *);
    | ~~~~~~~^~~~~~~~~~~~~~~
    /tmp/tmp54o68k3n/parse.c: In function ‘start_rule’:
    /tmp/tmp54o68k3n/parse.c:32:35: warning: passing argument 2 of ‘_PyPegen_lookahead’ from incompatible pointer type [-Wincompatible-pointer-types]
    32 | _PyPegen_lookahead(0, _PyPegen_name_token, p)
    | ^~~~~~~~~~~~~~~~~~~
    | |
    | struct _expr * (
    )(Parser *)
    In file included from /tmp/tmp54o68k3n/parse.c:2:
    /home/buildbot/buildarea/3.x.cstratak-fedora-rawhide-aarch64.lto-pgo/build/Parser/pegen/pegen.h:90:36: note: expected ‘void * ()(Parser *)’ but argument is of type ‘struct _expr * ()(Parser *)’
    90 | int _PyPegen_lookahead(int, void *(func)(Parser *), Parser *);
    | ~~~~~~~^~~~~~~~~~~~~~~

    @lysnikolaou
    Copy link
    Contributor

    @lysnikolaou lysnikolaou commented Apr 23, 2020

    test_peg_generator emits compiler warnings

    These are expected. Is this a problem?

    @WadeSanchez WadeSanchez mannequin added performance labels Apr 23, 2020
    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Apr 23, 2020

    These are expected. Is this a problem?

    People and bots running tests normally expect to not see any output from each test in the default case unless there are errors. If these are supposed to be emitted, we should find a way to hide them (and check the results, if necessary) unless a verbose option is selected. Personally, I wouldn't consider that a showstopper for a6 but as something to be fixed by b1.

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    New changeset ee40e4b by Pablo Galindo in branch 'master':
    bpo-40334: Don't downcast from Py_ssize_t to int (GH-19671)
    ee40e4b

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    test_peg_generator emits compiler warnings:

    This is fixed by #19672

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    Those warnings are legitimate: we cannot cast function pointers that have different return types from one to the other. This only happens in the test, but could happen in the grammar if we use lookahead with NAME (&NAME).

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Apr 23, 2020

    bpo-40370 documents test_peg_generator breakage on an AIX buildbot.

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    New changeset 1df5a9e by Pablo Galindo in branch 'master':
    bpo-40334: Fix build errors and warnings in test_peg_generator (GH-19672)
    1df5a9e

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    New changeset 8d1cbff by Lysandros Nikolaou in branch 'master':
    bpo-40334: Suppress all output in test_peg_generator (GH-19675)
    8d1cbff

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 23, 2020

    GCC warning on aarch64 Fedora Rawhide LTO + PGO 3.x buildbot:
    https://buildbot.python.org/all/#/builders/612/builds/286

    In function ‘assemble_lnotab’,
    inlined from ‘assemble_emit’ at Python/compile.c:5709:25,
    inlined from ‘assemble’ at Python/compile.c:6048:18:
    Python/compile.c:5663:19: warning: writing 1 byte into a region of size 0 [-Wstringop-overflow=]
    5663 | *lnotab++ = k;
    | ~~~~~~~~~~^~~

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    GCC warning on aarch64 Fedora Rawhide LTO + PGO 3.x buildbot:

    Is this related to this issue? We didn't change that line

    @stratakis
    Copy link
    Mannequin

    @stratakis stratakis mannequin commented Apr 23, 2020

    Is this related to this issue? We didn't change that line

    I can provide you access to the buildbot if you'd like to debug the issue.

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 23, 2020

    Is this related to this issue? We didn't change that line

    I don't know. It's just that I spotted this compiler warnining while reviewing recent buildbot failures.

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    I don't know. It's just that I spotted this compiler warnining while reviewing recent buildbot failures.

    Given that is a compiler warning and we didn't change those lines, I would say is not related to this PR :)

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Apr 23, 2020

    New changeset ebebb64 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Improve various PEG-Parser related stuff (GH-19669)
    ebebb64

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 4, 2020

    New changeset e10e7c7 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Spacialized error message for invalid args after bare '*' (GH-19865)
    e10e7c7

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 6, 2020

    New changeset 999ec9a by Lysandros Nikolaou in branch 'master':
    bpo-40334: Add type to the assignment rule in the grammar file (GH-19963)
    999ec9a

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 6, 2020

    New changeset 99db2a1 by Pablo Galindo in branch 'master':
    bpo-40334: Allow trailing comma in parenthesised context managers (GH-19964)
    99db2a1

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 6, 2020

    New changeset 470aac4 by Pablo Galindo in branch 'master':
    bpo-40334: Generate comments in the parser code to improve debugging (GH-19966)
    470aac4

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 7, 2020

    New changeset 2f37c35 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Fix error location upon parsing an invalid string literal (GH-19962)
    2f37c35

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 7, 2020

    New changeset 4638c64 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Error message for invalid default args in function call (GH-19973)
    4638c64

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented May 10, 2020

    New changeset ac7a92c by Pablo Galindo in branch 'master':
    bpo-40334: Avoid collisions between parser variables and grammar variables (GH-19987)
    ac7a92c

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented May 10, 2020

    FWIW I propose that we add another Misc/NEWS for the same issue summarizing what happened between alpha6 and beta1. (A lot!)

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 10, 2020

    There is some discrepancy with the codeop module when running with the new parser:

    ./python -c "import codeop; codeop.CommandCompiler()('raise = 2\n\n', symbol='exec')"

    (No error)

    ./python -Xoldparser -c "import codeop; codeop.CommandCompiler()('raise = 2\n\n', symbol='exec')"
    ...
    File "<input>", line 1
    raise = 2
    ^
    SyntaxError: invalid syntax

    @Carreau
    Copy link
    Mannequin

    @Carreau Carreau mannequin commented May 10, 2020

    Thanks Pablo for replying to my tweet, a couple of other non-failing case in
    https://bugs.python.org/issue40585 that I filled concurrently to your message.

    (also using this occasion to thanks all of you for your work on the new parser)

    @Carreau
    Copy link
    Mannequin

    @Carreau Carreau mannequin commented May 10, 2020

    Seem to not be in the new parser but simply in codeop in particular def _maybe_compile

    The logic seem weird (but weird logic usually have a reason), it try to compile thrice by appending many \n.

    1. Why do that and not return the first successful compile directly ? I'm not sure.
    2. It does compare the repr of error when compile(source + "\n") and compile(source+'\n\n') and only raise if both are identical (which they are not in the new parser, they differ by \n...)
    SyntaxError('invalid syntax', ('<input>', 1, 6, 'def a-b')) 
    vs
    SyntaxError('invalid syntax', ('<input>', 1, 6, 'def a-b\n'))

    This logic seem to go back to the 2000s.

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented May 11, 2020

    It definitely has its reasons -- there are pernicious edge cases. Probably revealed by the tests.

    @gvanrossum
    Copy link
    Member Author

    @gvanrossum gvanrossum commented May 11, 2020

    New changeset 27c0d9b by Shantanu in branch 'master':
    bpo-40334: produce specialized errors for invalid del targets (GH-19911)
    27c0d9b

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 13, 2020

    New changeset a15c9b3 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Always show the caret on SyntaxErrors (GH-20050)
    a15c9b3

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 15, 2020

    New changeset 16ab070 by Pablo Galindo in branch 'master':
    bpo-40334: Correctly identify invalid target in assignment errors (GH-20076)
    16ab070

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 17, 2020

    New changeset 2c8cd06 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Improvements to error-handling code in the PEG parser (GH-20003)
    2c8cd06

    @rhettinger
    Copy link
    Contributor

    @rhettinger rhettinger commented May 18, 2020

    One minor nit: We're getting a nuisance compiler warning:

    test_peg_generator passed (2 min 40 sec)
    /var/folders/1b/3499qp8s7xv5w0fvjcgl45100000gn/T/tmp90rifd5z/parse.c:97:19: warning: code will never be executed [-Wunreachable-code]
    p->mark = _mark;
    ^~~~~
    1 warning generated.

    @lysnikolaou
    Copy link
    Contributor

    @lysnikolaou lysnikolaou commented May 18, 2020

    I'm not getting any compiler warnings on macOS (clang) and on Ubuntu (gcc) and I can't find any related warnings on the Windows Github Actions logs either. Could you maybe post a verbose log of test_peg_generator, Raymond?

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 18, 2020

    New changeset 75b863a by Lysandros Nikolaou in branch 'master':
    bpo-40334: Reproduce error message for type comments on bare '*' in the new parser (GH-20151)
    75b863a

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 21, 2020

    New changeset f50516e by Batuhan Taskaya in branch 'master':
    bpo-40334: Correctly generate C parser when assigned var is None (GH-20296)
    f50516e

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 22, 2020

    New changeset ae14583 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153)
    ae14583

    @miss-islington
    Copy link
    Contributor

    @miss-islington miss-islington commented May 22, 2020

    New changeset 55c8923 by Miss Islington (bot) in branch '3.9':
    bpo-40334: Produce better error messages for non-parenthesized genexps (GH-20153)
    55c8923

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented May 24, 2020

    New changeset cba5031 by Batuhan Taskaya in branch 'master':
    bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367)
    cba5031

    @miss-islington
    Copy link
    Contributor

    @miss-islington miss-islington commented May 24, 2020

    New changeset 82c274e by Miss Islington (bot) in branch '3.9':
    bpo-40334: Support suppressing of multiple optional variables in Pegen (GH-20367)
    82c274e

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Jun 18, 2020

    New changeset 01ece63 by Lysandros Nikolaou in branch 'master':
    bpo-40334: Produce better error messages on invalid targets (GH-20106)
    01ece63

    @pablogsal
    Copy link
    Member

    @pablogsal pablogsal commented Jun 19, 2020

    New changeset a5442b2 by Lysandros Nikolaou in branch '3.9':
    [3.9] bpo-40334: Produce better error messages on invalid targets (GH-20106) (GH-20973)
    a5442b2

    @lysnikolaou
    Copy link
    Contributor

    @lysnikolaou lysnikolaou commented Jun 19, 2020

    All parts of the implementation are now in. New issues should be opened for any potential bugs.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants