
[mypyc] Faster bytes formatting #10989

Merged. JukkaL merged 5 commits into python:master from the bytes-formatting branch on Aug 19, 2021.

97littleleaf11 (Collaborator)

Description

  • Share the tokenizer used for string formatting.
  • Add a new conversion helper function, convert_expr_to_bytes, that converts values to bytes.
  • Add a C helper function, CPyBytes_Build, that constructs a bytes object, plus a join_formatted_bytes helper that merges literals and substitutions using it.
  • Add a new FormatOp for %b.
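The first two bullets can be pictured with a small pure-Python sketch: split a bytes format into literal chunks and conversion specifiers, then convert each argument according to its specifier. The names tokenize_bytes_format and convert_to_bytes are hypothetical, not mypyc's actual functions, and the sketch only handles %d, %x and %b without flags, widths or %% escapes. Since the tokenizer lives in mypyc/irbuild, the real splitting presumably happens at compile time on the literal format; this sketch does it at runtime purely for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical: matches only %d, %x and %b (no flags, widths or %% escapes).
_SPEC_RE = re.compile(rb"%[dxb]")


def tokenize_bytes_format(fmt: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Split fmt into literal chunks and conversion specifiers.

    The result always has one more literal than specifiers, so the lists
    interleave as lit0, spec0, lit1, spec1, ..., litN.
    """
    literals: List[bytes] = []
    specs: List[bytes] = []
    pos = 0
    for m in _SPEC_RE.finditer(fmt):
        literals.append(fmt[pos:m.start()])
        specs.append(m.group(0))
        pos = m.end()
    literals.append(fmt[pos:])
    return literals, specs


def convert_to_bytes(spec: bytes, value: object) -> bytes:
    """Convert one argument to bytes for its specifier (a PEP 461 subset)."""
    if spec in (b"%d", b"%x") and isinstance(value, int):
        # int(value) turns bools into 0/1, matching b"%d" % True == b"1".
        text = str(int(value)) if spec == b"%d" else format(int(value), "x")
        return text.encode("ascii")
    if spec == b"%b":
        if isinstance(value, (bytes, bytearray)):
            return bytes(value)
        if hasattr(type(value), "__bytes__"):
            return bytes(value)  # calls value.__bytes__()
    raise TypeError(f"cannot format {type(value).__name__} with {spec!r}")
```

A quick way to check a sketch like this is to compare it with CPython's own operator: convert_to_bytes(b"%x", 10) and b"%x" % 10 both give b'a'.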

Test Plan

Several run tests, plus an irbuild test for CPyBytes_Build.

JukkaL (Collaborator) left a comment

Looks good, thanks! Left some minor comments. Optimizing non-ascii literals can be left for a follow-up issue if it's non-trivial (they just need to work correctly).

# https://www.python.org/dev/peps/pep-0461/
def test_bytes_formatting() -> None:
    val = 10
    assert b"%x" % val == b'a'
Collaborator:

Add a test that produces an empty bytes object (e.g. b"%b" % bytes() and b"" % ()).

Add a test for __bytes__ conversion (or create an issue if it doesn't work yet).
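Tests along the lines requested above could look like the following. The class HasBytes and the test names are hypothetical; the expected values follow PEP 461, under which %b accepts objects that support the buffer protocol or define __bytes__. They are written as plain functions that pass under CPython; whether the compiled code handled the __bytes__ case at the time of this PR is exactly what the reviewer asked to check.

```python
class HasBytes:
    """Hypothetical class that supports %b only through __bytes__."""

    def __bytes__(self) -> bytes:
        return b"from __bytes__"


def test_bytes_formatting_empty() -> None:
    # %b with an empty bytes object contributes nothing.
    assert b"%b" % bytes() == b""
    # An empty format with an empty argument tuple is also empty.
    assert b"" % () == b""


def test_bytes_formatting_dunder_bytes() -> None:
    # HasBytes has no buffer protocol, so %b falls back to __bytes__ (PEP 461).
    assert b"<%b>" % HasBytes() == b"<from __bytes__>"
    # Wrapping the argument in a 1-tuple behaves the same way.
    assert b"<%b>" % (HasBytes(),) == b"<from __bytes__>"
```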

@@ -64,3 +64,11 @@
    return_type=bytes_rprimitive,
    c_function_name='CPyBytes_Join',
    error_kind=ERR_MAGIC)

bytes_build_op = custom_op(
Collaborator:

Add comment.
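One way to picture what bytes_build_op and join_formatted_bytes do together is the pure-Python model below. It assumes, without checking the C source, that CPyBytes_Build concatenates its arguments into a single newly allocated bytes object sized from the sum of the input lengths, rather than growing the result step by step. Both function names in the sketch are hypothetical stand-ins, not mypyc APIs.

```python
from typing import List


def bytes_build_sketch(*parts: bytes) -> bytes:
    """Model of a single-allocation bytes builder (hypothetical)."""
    # Pass 1: compute the final size, so only one buffer is allocated.
    total = sum(len(p) for p in parts)
    buf = bytearray(total)
    # Pass 2: copy each part into place.
    offset = 0
    for p in parts:
        buf[offset:offset + len(p)] = p
        offset += len(p)
    # A C implementation could return its buffer directly; Python copies here.
    return bytes(buf)


def join_formatted_sketch(literals: List[bytes], substitutions: List[bytes]) -> bytes:
    """Alternate literals and substitutions, skipping empty literals."""
    assert len(literals) == len(substitutions) + 1
    parts: List[bytes] = []
    for lit, sub in zip(literals, substitutions):
        if lit:  # empty literal chunks add nothing, so don't pass them along
            parts.append(lit)
        parts.append(sub)
    if literals[-1]:
        parts.append(literals[-1])
    return bytes_build_sketch(*parts)
```

Skipping empty literals mirrors the `if a:` check in the join loop quoted in this review; it keeps the builder's argument list as short as possible.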

Resolved threads (outdated): mypyc/irbuild/expression.py (two threads) and mypyc/irbuild/format_str_tokenizer.py.
"""Merge the list of literals and the list of substitutions
alternately using 'bytes_build_op'.

The literals should only contain ASCII literal objects.
Collaborator:

Do these need to be ASCII? Is it sufficient for them to be arbitrary bytes literals?


for a, b in zip(literals, substitutions):
    if a:
        result_list.append(builder.load_bytes(a.encode('ascii')))
Collaborator:

If you used latin1 as the encoding, could we support non-ASCII byte values?
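For context on the latin1 suggestion: latin1 (ISO-8859-1) maps code points U+0000 to U+00FF one-to-one onto byte values 0 to 255, whereas ascii only covers 0 to 127. So if a literal's byte values were carried as a str with one code point per byte (an assumption here; the review does not say how the value is stored), encoding with latin1 would round-trip every byte, while ascii would fail on anything above 0x7F. The helper names below are illustrative only.

```python
def latin1_round_trips(data: bytes) -> bool:
    """True if data survives bytes -> str -> bytes via latin1."""
    return data.decode("latin1").encode("latin1") == data


def ascii_can_encode(text: str) -> bool:
    """True if every code point in text is below 0x80."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


every_byte = bytes(range(256))
# latin1 preserves all 256 byte values.
assert latin1_round_trips(every_byte)
# The decoded text contains code points up to U+00FF, which ascii rejects.
assert not ascii_can_encode(every_byte.decode("latin1"))
```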

Collaborator (author):

The literals inside a BytesExpr are guaranteed to be ASCII; otherwise they wouldn't pass the type checker.
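A related detail for this exchange: CPython's parser itself rejects non-ASCII characters written directly inside a bytes literal, but escape sequences such as \xe9 still produce byte values above 0x7F. In other words, "ASCII in the source" and "ASCII byte values" are not the same thing. The check below uses only the standard ast module and says nothing about how mypy represents BytesExpr internally; the helper name parses is illustrative.

```python
import ast


def parses(source: str) -> bool:
    """True if source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# "\u00e9" puts a literal e-acute character inside the bytes literal's source text.
assert not parses('x = b"\u00e9"')
# "\\xe9" leaves the escape sequence \xe9 in the source, which is allowed.
assert parses('x = b"\\xe9"')
# The escape evaluates to a single byte above 0x7F.
assert ast.literal_eval('b"\\xe9"') == bytes([0xE9])
```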

Resolved thread: mypyc/test-data/run-bytes.test

JukkaL commented Aug 18, 2021

I checked that this makes the bytes_format benchmark about 2.1x faster than before!

JukkaL (Collaborator) left a comment

Thanks for the updates! Looks good.

JukkaL merged commit f1167bc into python:master on Aug 19, 2021.
97littleleaf11 deleted the bytes-formatting branch on August 19, 2021 at 16:20.