Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mypyc] Introduce FormatOp and add a tokenizer for .format() call #10935

Merged
merged 11 commits into from Aug 6, 2021

Conversation

97littleleaf11
Copy link
Collaborator

@97littleleaf11 97littleleaf11 commented Aug 5, 2021

Description

This PR adds a tokenizer that convert a str.format() format string into literals and specifiers. By doing so, the code structure of translate_str_format is clearer.

This PR also introduces FormatOp. Compare to ConversionSpecifier, FormatOp has fewer attributes and indicates compile time optimizations. For example, to mark a conversion from any object to string, ConversionSpecifier may have several representations, like '%s', '{}' or '{:{}}'. However, there would only exist one corresponding FormatOp.

Currently FormatOp is just an Enum for convenience. We might add several attributes later and upgrade it to a class if we need to support more conversions.

To help for the future optimization, these parts of code are extracted into new functions:

  • generate_format_ops that shrink ConversionSpecifier into FormatOp
  • convert_expr that can help convert the expressions into desired results.

@97littleleaf11 97littleleaf11 changed the title [mypyc] Tokenizer for format call [mypyc] Introduce FormatOp and add a tokenizer for .format() call Aug 5, 2021
@97littleleaf11 97littleleaf11 marked this pull request as ready for review August 5, 2021 13:39
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning up the code! Left several minor comments, mostly about docstrings/missing functionality.

mypyc/irbuild/format_str_tokenizer.py Outdated Show resolved Hide resolved
mypyc/irbuild/format_str_tokenizer.py Show resolved Hide resolved
mypyc/irbuild/format_str_tokenizer.py Outdated Show resolved Hide resolved
format_ops = []
for spec in specifiers:
# TODO: Match specifiers instead of using whole_seq
if spec.whole_seq == '%s' or spec.whole_seq == '{:{}}':
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about %d? Will it be supported in a follow-up PR?

It would be better to make this generic over different kinds of formatting instead of special casing percent sign formatting, etc. Do you plan to improve this in a follow-up PR (I assume that this is what the TODO comment is about)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

mypyc/irbuild/format_str_tokenizer.py Show resolved Hide resolved
mypyc/irbuild/format_str_tokenizer.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JukkaL JukkaL merged commit 58c0a05 into python:master Aug 6, 2021
@97littleleaf11 97littleleaf11 deleted the modify-format-call branch August 7, 2021 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants