Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40334: Refactor peg_generator to receive a Tokens file when building c code #19745

Merged
merged 4 commits into from
Apr 28, 2020

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Apr 28, 2020

https://bugs.python.org/issue40334

This PR does the following:

  • Fix a bunch of (very minor) mypy stuff that was missing.
  • Separate the C parser and the Python parser in pegen main (because both receive different arguments). Thread down all these changes to the generator build module.
  • Add a new option to the C parser command line to receive the Tokens file.
  • Thread down the Tokens file and add code to parse it and calculate the required token information.
  • Use the new tokens in the c_generator (and simplify some code that was hardcoding some token names).
  • Update the build files (Makefile and the Windows one) to use the new option.
  • Run black over the source.

@pablogsal
Copy link
Member Author

pablogsal commented Apr 28, 2020

This is how the command line looks now with the sub-parsers:

Main CL

~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen
usage: pegen [-h] [-q] [-v] [--skip-actions] {c,python} ...

Experimental PEG-like parser generator

positional arguments:
  {c,python}      target language for the generated code
    c             Generate C code for inclusion into CPython
    python        Generate Python code

optional arguments:
  -h, --help      show this help message and exit
  -q, --quiet     Don't print the parsed grammar
  -v, --verbose   Print timing stats; repeat for more debug output

C subparser

~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen c -h
usage: pegen c [-h] [--compile-extension] [-o OUT] [--optimized] [--skip-actions] grammar_filename tokens_filename

positional arguments:
  grammar_filename      Grammar description
  tokens_filename       Tokens description

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --output OUT  Where to write the generated parser
  --compile-extension   Compile generated C code into an extension module
  --optimized           Compile the extension in optimized mode
  --skip-actions        Suppress code emission for rule actions

Python subparser

~/github/python/master/Tools/peg_generator [bpo-40334](https://bugs.python.org/issue40334)-use-tokens
❯ python -m pegen python -h
usage: pegen python [-h] [-o OUT] grammar_filename

positional arguments:
  grammar_filename      Grammar description

optional arguments:
  -h, --help            show this help message and exit
  -o OUT, --output OUT  Where to write the generated parser
  --skip-actions  Suppress code emission for rule actions

@pablogsal pablogsal force-pushed the bpo-40334-use-tokens branch 2 times, most recently from 1aab4d1 to d7df6b6 Compare April 28, 2020 00:29
Tools/peg_generator/pegen/__main__.py Outdated Show resolved Hide resolved
Tools/peg_generator/pegen/build.py Outdated Show resolved Hide resolved
Tools/peg_generator/pegen/build.py Outdated Show resolved Hide resolved
Tools/peg_generator/pegen/c_generator.py Outdated Show resolved Hide resolved
Tools/peg_generator/pegen/c_generator.py Outdated Show resolved Hide resolved
Tools/peg_generator/pegen/c_generator.py Outdated Show resolved Hide resolved
Comment on lines +64 to 66
if name in self.non_exact_tokens:
name = name.lower()
return f"{name}_var", f"_PyPegen_{name}_token(p)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should special-case NAME here and call _PyPegen_name_token for it and then call _PyPegen_expect_token for all the others (which is what the "named" functions already do anyway). This way we could get rid of these "named" expect functions (e.g. _PyPegen_indent_token).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think I still like to have the abstraction in case we need to intercede something like with the NAME token in the future (this also allows us to not have "hardcoded" names in the generator) but if you think is better to re-add that in the future and eliminate the wrappers I can totally do it as I don't feel strongly about it.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I still like to have the abstraction in case we need to intercede something like with the NAME token in the future

It feels, that that's only the case with STRINGs. I don't really expect to do anything more complicated that just returning the token with all the whitespace ones (INDENT, DEDENT etc.) and I'd re-add those functions for async and await, if it were ever needed.

With that said, I really don't feel strongly about either one, so it's your call! 😃

Copy link
Member Author

@pablogsal pablogsal Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I am going to merge this and we can revisit this in further refactors of this code. I have some improvements in mind to avoid checking for sub-strings (like var.startswith("name") when distinguishing the types and we can explore doing this in that PR. 😉

Copy link
Member Author

@pablogsal pablogsal Apr 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented with your suggestion and the main blocker is that _PyPegen_lookahead is called with these functions and it imposes that they may take the parser as the only argument and there is not a simple way to make _PyPegen_lookahead allow to forward the arguments :(

Edit: I am continuing experimenting with this idea....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think of this problem. Thanks a lot for doing the investigation!

Tools/peg_generator/pegen/build.py Show resolved Hide resolved
pablogsal and others added 3 commits April 28, 2020 12:34
Co-Authored-By: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-Authored-By: Lysandros Nikolaou <lisandrosnik@gmail.com>
Copy link
Contributor

@lysnikolaou lysnikolaou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!

@pablogsal pablogsal merged commit 5b9f498 into python:master Apr 28, 2020
@bedevere-bot

This comment has been minimized.

@pablogsal
Copy link
Member Author

Hi! The buildbot AMD64 Fedora Stable 3.x has failed when building commit 5b9f498.

Unrelated failure:

1 test failed:
test_concurrent_futures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants