bpo-40334: Refactor peg_generator to receive a Tokens file when building c code #19745

pablogsal · 2020-04-28T00:15:10Z

https://bugs.python.org/issue40334

This PR does the following:

Fix a bunch of (very minor) mypy stuff that was missing.
Separate the C parser and the Python parser in pegen main (because both receive different arguments). Thread down all these changes to the generator build module.
Add a new option to the C parser command line to receive the Tokens file.
Thread down the Tokens file and add code to parse it and calculate the required token information.
Use the new tokens in the c_generator (and simplify some code that was hardcoding some token names).
Update the build files (Makefile and the Windows one) to use the new option.
Run black over the source.
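
As a rough illustration of the "parse it and calculate the required token information" step, here is a minimal sketch. It assumes a Tokens-style file where each non-comment line is either a bare token name or a name followed by a quoted literal, and that token numbers follow line order. The helper name `parse_tokens_file` and the sample input are hypothetical, not taken from this PR.

```python
import io
import itertools
from typing import Dict, IO, Set, Tuple

TokenDefinitions = Tuple[Dict[int, str], Dict[str, int], Set[str]]


def parse_tokens_file(tokens: IO[str]) -> TokenDefinitions:
    """Hypothetical helper: derive token tables from a Tokens-style file."""
    all_tokens: Dict[int, str] = {}      # token number -> token name
    exact_tokens: Dict[str, int] = {}    # literal such as "(" -> token number
    non_exact_tokens: Set[str] = set()   # names with no fixed literal, e.g. NAME
    numbers = itertools.count(0)

    for raw_line in tokens:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments do not consume a number
        pieces = line.split()
        index = next(numbers)
        if len(pieces) == 1:
            (name,) = pieces
            non_exact_tokens.add(name)
        elif len(pieces) == 2:
            name, literal = pieces
            exact_tokens[literal.strip("'")] = index
        else:
            raise ValueError(f"Unexpected line in tokens file: {line!r}")
        all_tokens[index] = name

    return all_tokens, exact_tokens, non_exact_tokens


# Made-up sample input, only for demonstration.
sample = io.StringIO("ENDMARKER\nNAME\n# comment\nLPAR '('\nRPAR ')'\n")
all_tokens, exact_tokens, non_exact_tokens = parse_tokens_file(sample)
```

With tables like these, a generator can look up token numbers and names instead of hardcoding them.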

pablogsal · 2020-04-28T00:27:26Z

This is how the command line looks now with the sub-parsers:

Main CL

~/github/python/master/Tools/peg_generator bpo-40334-use-tokens
❯ python -m pegen
usage: pegen [-h] [-q] [-v] [--skip-actions] {c,python} ...

Experimental PEG-like parser generator

positional arguments:
  {c,python}      target language for the generated code
    c             Generate C code for inclusion into CPython
    python        Generate Python code

optional arguments:
  -h, --help      show this help message and exit
  -q, --quiet     Don't print the parsed grammar
  -v, --verbose   Print timing stats; repeat for more debug output

C subparser

~/github/python/master/Tools/peg_generator bpo-40334-use-tokens
❯ python -m pegen c -h
usage: pegen c [-h] [--compile-extension] [-o OUT] [--optimized] [--skip-actions] grammar_filename tokens_filename

positional arguments:
  grammar_filename      Grammar description
  tokens_filename       Tokens description

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --output OUT  Where to write the generated parser
  --compile-extension   Compile generated C code into an extension module
  --optimized           Compile the extension in optimized mode
  --skip-actions        Suppress code emission for rule actions

Python subparser

~/github/python/master/Tools/peg_generator bpo-40334-use-tokens
❯ python -m pegen python -h
usage: pegen python [-h] [-o OUT] grammar_filename

positional arguments:
  grammar_filename      Grammar description

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --output OUT  Where to write the generated parser
  --skip-actions        Suppress code emission for rule actions
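
For readers unfamiliar with sub-parsers, the layout above can be approximated with `argparse` roughly as follows. This is a sketch reconstructed from the help output only: the function name `build_parser`, the `dest` names, and the example file names are assumptions, and the real `__main__.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a top-level parser with one sub-parser per target language."""
    parser = argparse.ArgumentParser(
        prog="pegen", description="Experimental PEG-like parser generator"
    )
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Don't print the parsed grammar")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Print timing stats; repeat for more debug output")
    subparsers = parser.add_subparsers(
        dest="target", help="target language for the generated code"
    )

    # The C target needs both the grammar and the Tokens file.
    c_parser = subparsers.add_parser(
        "c", help="Generate C code for inclusion into CPython"
    )
    c_parser.add_argument("grammar_filename", help="Grammar description")
    c_parser.add_argument("tokens_filename", help="Tokens description")
    c_parser.add_argument("-o", "--output", metavar="OUT",
                          help="Where to write the generated parser")

    # The Python target only needs the grammar.
    py_parser = subparsers.add_parser("python", help="Generate Python code")
    py_parser.add_argument("grammar_filename", help="Grammar description")
    py_parser.add_argument("-o", "--output", metavar="OUT",
                           help="Where to write the generated parser")
    return parser


# Example invocations with placeholder file names.
c_args = build_parser().parse_args(["-q", "c", "python.gram", "Tokens"])
py_args = build_parser().parse_args(["python", "python.gram"])
```

Keeping the two targets in separate sub-parsers means `tokens_filename` is required only where it is actually used.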

Tools/peg_generator/pegen/__main__.py

Tools/peg_generator/pegen/build.py

Tools/peg_generator/pegen/c_generator.py

lysnikolaou · 2020-04-28T10:51:25Z

Tools/peg_generator/pegen/c_generator.py

+        if name in self.non_exact_tokens:
             name = name.lower()
             return f"{name}_var", f"_PyPegen_{name}_token(p)"


I'm wondering if we should special-case NAME here and call _PyPegen_name_token for it and then call _PyPegen_expect_token for all the others (which is what the "named" functions already do anyway). This way we could get rid of these "named" expect functions (e.g. _PyPegen_indent_token).
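
To make the two options concrete, here is a small sketch of the strings each scheme would emit. The function names `current_scheme` and `suggested_scheme` are hypothetical, and the emitted call text for the suggested variant is an assumption about what the generator could produce, not code from this PR.

```python
from typing import Tuple


def current_scheme(name: str) -> Tuple[str, str]:
    """One wrapper per token name, as in the snippet under review."""
    lowered = name.lower()
    return f"{lowered}_var", f"_PyPegen_{lowered}_token(p)"


def suggested_scheme(name: str) -> Tuple[str, str]:
    """Special-case NAME; route every other token through the generic expect call."""
    lowered = name.lower()
    if name == "NAME":
        return f"{lowered}_var", "_PyPegen_name_token(p)"
    # Assumed call shape: the token constant is passed as a second argument.
    return f"{lowered}_var", f"_PyPegen_expect_token(p, {name})"


indent_now = current_scheme("INDENT")
indent_suggested = suggested_scheme("INDENT")
name_suggested = suggested_scheme("NAME")
```

Under the suggested scheme, only NAME keeps a dedicated function, and wrappers such as `_PyPegen_indent_token` would no longer be needed.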

Hmmm, I think I'd still like to keep the abstraction in case we need to step in for some token in the future, like we do with the NAME token (this also lets us avoid "hardcoded" names in the generator). But if you think it is better to eliminate the wrappers now and re-add them later if needed, I can totally do it, as I don't feel strongly about it.

What do you think?

> I think I'd still like to keep the abstraction in case we need to step in for some token in the future, like we do with the NAME token

It feels like that's only the case with STRINGs. I don't really expect to do anything more complicated than just returning the token for all the whitespace ones (INDENT, DEDENT, etc.), and I'd re-add those functions for async and await if it were ever needed.

With that said, I really don't feel strongly about either one, so it's your call! 😃

Ok, I am going to merge this and we can revisit it in further refactors of this code. I have some improvements in mind to avoid checking for sub-strings (like var.startswith("name")) when distinguishing the types, and we can explore doing this in that PR. 😉

I experimented with your suggestion, and the main blocker is that _PyPegen_lookahead is called with these functions and requires that they take the parser as their only argument. There is no simple way to make _PyPegen_lookahead forward extra arguments :(

Edit: I am still experimenting with this idea...

I didn't think of this problem. Thanks a lot for doing the investigation!
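
The constraint described above can be illustrated with a small Python analogue. Everything here (the `Parser` class, `expect_token`, `indent_token`, `lookahead`) is a hypothetical stand-in for the C helpers, written under the assumption that the lookahead helper accepts only callables taking the parser and nothing else.

```python
from typing import Callable, List, Optional

INDENT, NAME = "INDENT", "NAME"


class Parser:
    """Toy parser state: a token list and the current position."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.mark = 0


def expect_token(p: Parser, kind: str) -> Optional[str]:
    """Generic matcher: needs the token kind as a second argument."""
    if p.mark < len(p.tokens) and p.tokens[p.mark] == kind:
        p.mark += 1
        return kind
    return None


def indent_token(p: Parser) -> Optional[str]:
    """Named wrapper: fits the 'parser only' signature that lookahead expects."""
    return expect_token(p, INDENT)


def lookahead(positive: bool, func: Callable[[Parser], object], p: Parser) -> bool:
    """Run func without consuming input; it is called with the parser only."""
    mark = p.mark
    result = func(p)
    p.mark = mark  # always rewind, whether or not func matched
    return (result is not None) == positive


p = Parser([INDENT, NAME])
saw_indent = lookahead(True, indent_token, p)
no_indent = lookahead(False, indent_token, Parser([NAME]))
```

Passing `expect_token` directly would fail because nothing supplies `kind`. In Python a lambda could bind it, but C has no closures, so a per-token wrapper is one straightforward way to satisfy the signature.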

Tools/peg_generator/pegen/build.py

Co-Authored-By: Lysandros Nikolaou <lisandrosnik@gmail.com>

lysnikolaou

LGTM! Thanks!

pablogsal · 2020-04-28T12:53:05Z

Hi! The buildbot AMD64 Fedora Stable 3.x has failed when building commit 5b9f498.

Unrelated failure:

1 test failed:
test_concurrent_futures

pablogsal requested a review from a team as a code owner April 28, 2020 00:15

the-knights-who-say-ni added the CLA signed label Apr 28, 2020

bedevere-bot added the awaiting core review label Apr 28, 2020

pablogsal force-pushed the bpo-40334-use-tokens branch from 6c13937 to 45ca184 Compare April 28, 2020 00:16

pablogsal added the skip news label Apr 28, 2020

pablogsal requested review from lysnikolaou and gvanrossum April 28, 2020 00:19

pablogsal force-pushed the bpo-40334-use-tokens branch 2 times, most recently from 1aab4d1 to d7df6b6 Compare April 28, 2020 00:29

bpo-40334: Refactor peg_generator to receive a Tokens file when build…

a371b79

…ing c code

pablogsal force-pushed the bpo-40334-use-tokens branch from d7df6b6 to a371b79 Compare April 28, 2020 00:30

lysnikolaou reviewed Apr 28, 2020

pablogsal and others added 3 commits April 28, 2020 12:34

Apply suggestions from code review

417d503

Co-Authored-By: Lysandros Nikolaou <lisandrosnik@gmail.com>

Update Tools/peg_generator/pegen/c_generator.py

c268ef1

Co-Authored-By: Lysandros Nikolaou <lisandrosnik@gmail.com>

Address feedback

9b8dfd8

pablogsal requested a review from lysnikolaou April 28, 2020 11:51

lysnikolaou approved these changes Apr 28, 2020

pablogsal merged commit 5b9f498 into python:master Apr 28, 2020

bedevere-bot removed the awaiting core review label Apr 28, 2020

pablogsal mentioned this pull request Apr 28, 2020

bpo-40334: refactor and cleanup for the PEG generators #19775

Merged

pablogsal deleted the bpo-40334-use-tokens branch May 1, 2020 17:27
