bpo-42729: Introduce ast.parse_tokens() to interface with "tokenize" #23922
Currently, with Python it's possible to go from the stream-of-characters
program representation to an AST representation (ast.parse()), and from
the AST onward to a code object and an executing program.
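As a quick illustration of that existing pipeline (standard library only; the source string here is just an example):

```python
import ast

# Stream of characters -> AST:
source = "result = 6 * 7"
tree = ast.parse(source)

# AST -> code object:
code = compile(tree, "<demo>", "exec")

# Code object -> executing program:
namespace = {}
exec(code, namespace)
print(namespace["result"])  # -> 42
```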
Python also offers the "tokenize" module, but it stands as a disconnected
island: the only thing it allows is to go from the stream-of-characters
program representation to a stream of tokens, and back. Yet conceptually,
tokenization is not a disconnected feature: it is the first stage of the
language processing pipeline. The fact that "tokenize" is disconnected
from the rest of the pipeline, as described above, is largely an artifact
of the CPython implementation: both the "ast" module and the compile()
built-in are backed by the underlying bytecode compiler written in C,
and that is what connects them. The "tokenize" module, on the other hand,
is pure Python, while the underlying compiler has its own (unexposed)
tokenizer implementation. That is the likely reason for the disconnect
between "tokenize" and the rest of the infrastructure.
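The characters-to-tokens-and-back round trip, which is all that "tokenize" currently connects to, looks like this:

```python
import io
import tokenize

source = "x = 1 + 2\n"

# Stream of characters -> stream of tokens:
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Stream of tokens -> stream of characters (the only way "back out"
# that tokenize offers today):
round_tripped = tokenize.untokenize(tokens)
assert round_tripped == source
```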
This patch closes that gap and establishes an API which allows parsing
a token stream (iterable) into an AST. The initial implementation for
CPython is naive, looping back through the surface program
representation. That is considered acceptable, as the idea is to
establish a standard API for going tokens -> AST; individual Python
implementations can then optimize it based on their needs.
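This is not the patch itself, but a minimal sketch of what the naive round-trip approach described above amounts to, using only the existing stdlib pieces (the function name and defaults here mirror the proposal):

```python
import ast
import io
import tokenize

def parse_tokens(token_stream, filename="<unknown>", mode="exec"):
    """Sketch: parse a token iterable into an AST by detouring through
    the surface (character) representation and reusing ast.parse()."""
    source = tokenize.untokenize(token_stream)
    return ast.parse(source, filename=filename, mode=mode)

# Usage: tokens in, AST out.
tokens = tokenize.generate_tokens(io.StringIO("print('hello')\n").readline)
tree = parse_tokens(tokens)
print(ast.dump(tree))
```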
The function introduced here is ast.parse_tokens(). It follows the
signature of the existing ast.parse(), except that the first parameter
is "token_stream" instead of "source".
An alternative would be to overload the existing ast.parse() to accept
a token iterable. At the current stage, where we are trying to tighten
the type strictness of the API and have clear typing signatures for API
functions, that does not seem to be the favored solution.
Signed-off-by: Paul Sokolovsky pfalcon@users.sourceforge.net
https://bugs.python.org/issue42729