Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support for string formatting #7418

Merged
merged 39 commits into from Sep 24, 2019

Conversation

@ilevkivskyi
Copy link
Collaborator

ilevkivskyi commented Aug 29, 2019

Fixes #1444
Fixes #2639
Fixes #7114

This PR does three related things:

  • Fixes some corner cases in % formatting
  • Tightens bytes vs str interactions (including those that are technically not errors)
  • Adds type checking for str.format() calls

This also fixes few issues discovered by this PR during testing (notably a bug in f-string transformation), and adds/extends a bunch of docstring and comments to existing code. The implementation is mostly straightforward, there are few hacky things, but IMO nothing unreasonable.

Here are few comments:

  • It was hard to keep the approach to str.format() calls purely regexp-based (as for % formatting), mostly because the former can be both nested and repeated (And we must support nested formatting because this is how we support f-strings). CPython itself uses a custom parser but it is huge, so I decided to have a mixed approach. This way we can keep code simple while still maintaining practically one-to-one compatibility with runtime behavior (the error messages are sometimes different however).
  • This causes few hundreds of errors internally, I am not sure what to do with these. Most popular ones are:
    • Unicode upcast ('%s' % u'...' results in unicode, not str, on Python 2)
    • Using '%s' % some_bytes and/or '{:s}'.format(some_bytes) on Python 3
    • Unused arguments in str.format() call (probably because at runtime they are silently ignored)
  • I added new error code for string interpolation/formatting and used it everywhere in old and new code. Potentially we might split out the str vs bytes errors into a separate error code, because technically they are not type errors, just suspicious/dangerous code.
@ilevkivskyi ilevkivskyi requested review from msullivan and JukkaL Aug 29, 2019
Ivan Levkivskyi added 4 commits Aug 29, 2019
Ivan Levkivskyi
Copy link
Collaborator

msullivan left a comment

I've about half finished the review and will do the rest tomorrow morning, but it looks good so far and the overall design seems solid.

This is a great feature, especially since this is how we represent F-strings.

return False


def custom_special_method(typ: Type, name: str) -> bool:

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 29, 2019

Collaborator

This is a nice generalization of the custom equality function but should probably live somewhere other than checkstrformat. checker might be a better place than checkexpr where it was before?

This comment has been minimized.

Copy link
@ilevkivskyi

ilevkivskyi Aug 31, 2019

Author Collaborator

This will make it a runtime import cycle, currently 'checkstrformat' only imports others under if TYPE_CHECKING: .... Maybe we should make a new module like checker_utils.py?

StringFormatterChecker.check_str_interpolation() for printf-style % interpolation.
Note that although at runtime format strings are parsed using custom parsers,
here we use a regerx-based approach. This way we 99% match runtime behaviour while keeping

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 29, 2019

Collaborator

regex

[ARG_POS],
[None])
[ARG_POS, ARG_POS],
[None, None])

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 29, 2019

Collaborator

I am somewhat confused how this every worked but oh well.
I guess they always get zipped and zip ends early?

See After https://docs.python.org/3/library/string.html#formatspec for
specifications. The regexps are intentionally wider, to report better errors,
instead of just not matching.
"""

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 29, 2019

Collaborator

I agree with the regex/overmatching approach

self.msg.fail('Formatting nesting must be at most two levels deep',
ctx, code=codes.STRING_FORMATTING)
return None
sub_conv_specs = self.parse_format_value(conv_spec.format_spec[1:], ctx=ctx,

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 29, 2019

Collaborator

What is the [1:] for?

This comment has been minimized.

Copy link
@ilevkivskyi

ilevkivskyi Aug 31, 2019

Author Collaborator

The original format spec includes the starting colon, so I remove it everywhere, but here it is indeed unnecessary.

Copy link
Collaborator

msullivan left a comment

Looks great.

expected_type: Type) -> None:
# TODO: try refactoring to combine this logic with % formatting.
if spec.type == 'c':
if isinstance(repl, (StrExpr, BytesExpr)) and len(cast(StrExpr, repl).value) != 1:

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 30, 2019

Collaborator

This cast is going to crash with mypyc if it is a BytesExpr. (Why is the cast needed?)

" use !r if this is a desired behavior", call,
code=codes.STRING_FORMATTING)
if spec.flags:
numeric_types = UnionType([self.named_type('builtins.int'),

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 30, 2019

Collaborator

How does this interact with things like decimal?

This comment has been minimized.

Copy link
@ilevkivskyi

ilevkivskyi Aug 30, 2019

Author Collaborator

I just tried and it is complicated, Decimal supports some flags, but doesn't actually supports {:d}, but still supports %d. I will try to write some more precise rules.


def validate_and_transform_accessors(self, temp_ast: Expression, original_repl: Expression,
spec: ConversionSpecifier, ctx: Context) -> bool:
"""Validate and transform (in-place) format field accessors.

This comment has been minimized.

Copy link
@msullivan

msullivan Aug 30, 2019

Collaborator

Explain what transforming is here?

Ivan Levkivskyi
Ivan Levkivskyi and others added 5 commits Sep 10, 2019
Ivan Levkivskyi
@ilevkivskyi ilevkivskyi merged commit 5b3346f into python:master Sep 24, 2019
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ilevkivskyi ilevkivskyi deleted the ilevkivskyi:strict-formatting branch Sep 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.