Generate all tokens related code and docs from Grammar/Tokens #74640
Currently Lib/token.py is generated from Include/token.h. This contradicts common practice, where the C code is generated from the Python code (see for example opcode.py and sre_constants.py). In addition, the table in Parser/tokenizer.c has to be kept in sync with Include/token.h by hand. Generating Include/token.h and Parser/tokenizer.c from Lib/token.py would be simpler and more reliable.
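For illustration, here is a minimal sketch of the generation idea: a single token-definition file (one token name per line, optionally followed by its literal string, roughly in the style of Grammar/Tokens) drives emission of the numbered Python constants. The file format details and helper names below are assumptions for the sketch, not the actual Tools/scripts code:

```python
# Sketch only: parse a Tokens-style definition file and emit numbered
# constants in the style of Lib/token.py.

def parse_tokens(path):
    """Yield (name, literal-or-None) pairs, e.g. ('LPAR', '(')."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue                      # skip blanks and comments
            parts = line.split()
            name = parts[0]
            literal = parts[1].strip("'") if len(parts) > 1 else None
            yield name, literal

def emit_token_py(tokens):
    """Render 'NAME = <number>' lines; numbering starts at 0."""
    return "\n".join(f"{name} = {number}"
                     for number, (name, _) in enumerate(tokens))

if __name__ == "__main__":
    print(emit_token_py(list(parse_tokens("Grammar/Tokens"))))
```

The same parsed list can then feed separate emitters for Include/token.h and Parser/tokenizer.c, so all three stay in sync by construction.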
I like the idea.
I can work on it.
I already wrote a patch.
PR 1860 generates the following files from token.py:
A new Makefile target, regen-token, regenerates these files. The dict EXACT_TOKEN_TYPES, which maps operator strings to token names, is now generated automatically and has been moved from tokenize.py to token.py. The tokens COMMENT, NL and ENCODING, previously used only in tokenize.py, are now added to token.py, as proposed in bpo-25324.
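A quick usage example of the relocated mapping (the specific operator lookups are illustrative; the names themselves are the standard token.py constants):

```python
import token

# EXACT_TOKEN_TYPES maps an operator's literal string to its token number.
assert token.EXACT_TOKEN_TYPES['+'] == token.PLUS
assert token.EXACT_TOKEN_TYPES['**='] == token.DOUBLESTAREQUAL

# tok_name maps back from number to name.
print(token.tok_name[token.EXACT_TOKEN_TYPES['+']])  # PLUS
```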
I think this covers all the changes from PR bpo-1608. It looks a lot nicer too, rebuilding it every time from the makefile. You may want to add to the docs that token.py is now the source of the tokens.
The regular expression tokenize.Funny can also be generated. There is not enough information to distinguish between Operator, Bracket and Special, but it seems this isn't needed. Some token names could also be generated from Grammar/Grammar, but that would need an additional mapping between token strings and names ('+' <-> PLUS, etc.).
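As a sketch of one way such a pattern could be derived (this is an assumption about the approach, not the actual implementation), the operator strings already available in EXACT_TOKEN_TYPES can be escaped and sorted longest-first, so that e.g. '**=' is tried before '*':

```python
import re
import token

def make_exact_token_regex():
    # Longest strings first: re alternation tries branches left to right.
    strings = sorted(token.EXACT_TOKEN_TYPES, key=len, reverse=True)
    return re.compile("|".join(map(re.escape, strings)))

pattern = make_exact_token_regex()
print(pattern.match('**=').group())  # matches '**=' rather than '*'
```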
The alternate PR 10370 generates all files from a single file, Grammar/Tokens, using a single script, Tools/scripts/generate_token.py. In addition, the script doesn't rewrite files when the content is unchanged, so it can be used with read-only sources.
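The "don't touch unchanged files" behaviour described above amounts to comparing the freshly generated content with what is on disk and only writing on a real change. A minimal sketch of such a helper (the name update_file is an assumption here):

```python
def update_file(path, new_content):
    """Write new_content to path only if it differs; return True if written."""
    try:
        with open(path) as f:
            if f.read() == new_content:
                return False          # unchanged: leave file and timestamp alone
    except OSError:
        pass                          # file missing or unreadable: write below
    with open(path, 'w') as f:
        f.write(new_content)
    return True
```

Skipping the write keeps timestamps stable, so make doesn't see spurious rebuilds and regeneration succeeds even when the source tree is read-only.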
Could anybody please review this? There are two alternate PRs: PR 1860 and PR 10370. The difference between them is that the former uses Lib/token.py as the source, while the latter uses Grammar/Tokens as the source and generates Lib/token.py as well.
If there are no objections, I am going to merge PR 10370 in a few days.
New changeset 8ac6581 by Serhiy Storchaka in branch 'master':