[mypyc] Faster bytes formatting #10989

97littleleaf11 · 2021-08-18T09:52:25Z

Description

Share the same tokenizer with string formatting.
Add a new conversion helper function, convert_expr_to_bytes, that converts values to bytes.
Add a C helper function for constructing bytes, CPyBytes_Build, and join_formatted_bytes.
Add a new FormatOp for %b.

Test Plan

Several run tests and an irbuild test for CPyBytes_Build

JukkaL

Looks good, thanks! Left some minor comments. Optimizing non-ascii literals can be left for a follow-up issue if it's non-trivial (they just need to work correctly).

JukkaL · 2021-08-18T10:45:50Z

mypyc/test-data/run-bytes.test

+# https://www.python.org/dev/peps/pep-0461/
+def test_bytes_formatting() -> None:
+    val = 10
+    assert b"%x" % val == b'a'


Test producing an empty bytes object (e.g. b"%b" % bytes() and b"" % ()).

Add a test for __bytes__ conversion (or create an issue if it doesn't work yet).
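For illustration, the two requested cases could be sketched roughly as below. This only shows the behaviour expected from the interpreter under PEP 461; the class name Payload and the function name are hypothetical and not taken from the PR, and whether the compiled code already handles __bytes__ is exactly the open question raised above.

```python
class Payload:
    """Hypothetical class, used only for this illustration."""

    def __bytes__(self) -> bytes:
        return b"payload"


def check_empty_and_dunder_bytes() -> None:
    # Formatting that produces an empty bytes object.
    assert b"%b" % bytes() == b""
    assert b"" % () == b""

    # %b falls back to __bytes__ for objects that define it.
    assert b"%b" % (Payload(),) == b"payload"
    assert b"<%b>" % (Payload(),) == b"<payload>"


check_empty_and_dunder_bytes()
```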

JukkaL · 2021-08-18T10:46:08Z

mypyc/primitives/bytes_ops.py

@@ -64,3 +64,11 @@
    return_type=bytes_rprimitive,
    c_function_name='CPyBytes_Join',
    error_kind=ERR_MAGIC)
+
+bytes_build_op = custom_op(


Add comment.

JukkaL · 2021-08-18T12:36:07Z

mypyc/irbuild/format_str_tokenizer.py

+    """Merge the list of literals and the list of substitutions
+    alternatively using 'bytes_build_op'.
+
+    The literals should only contains ascii literal objects.


Do these need to be ascii? Is it sufficient for them to be arbitrary bytes literals?
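As a rough pure-Python picture of the alternating merge described in that docstring (not the actual IR-building code): the name join_alternating is hypothetical, and the sketch assumes the literals list has exactly one more element than the substitutions list. It also shows that nothing in the merge itself depends on the literals being ascii.

```python
from typing import List


def join_alternating(literals: List[bytes], substitutions: List[bytes]) -> bytes:
    """Interleave literal chunks with already-converted substitutions.

    Assumption for this sketch: len(literals) == len(substitutions) + 1.
    """
    assert len(literals) == len(substitutions) + 1
    parts: List[bytes] = []
    for literal, substitution in zip(literals, substitutions):
        if literal:  # skip empty literal chunks
            parts.append(literal)
        parts.append(substitution)
    if literals[-1]:  # trailing literal after the last substitution
        parts.append(literals[-1])
    return b"".join(parts)


# Same result as b"a%b%bc" % (b"1", b"2")
assert join_alternating([b"a", b"", b"c"], [b"1", b"2"]) == b"a12c"
# Arbitrary (non-ascii) byte values pass through unchanged.
assert join_alternating([b"\xff", b""], [b"x"]) == b"\xffx"
```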

JukkaL · 2021-08-18T12:36:42Z

mypyc/irbuild/format_str_tokenizer.py

+
+    for a, b in zip(literals, substitutions):
+        if a:
+            result_list.append(builder.load_bytes(a.encode('ascii')))


If you'd use latin1 as the encoding, could we support non-ascii bytes values?

The literals inside a BytesExpr are guaranteed to be ascii literals; otherwise the code won't pass the type checker.
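To make the difference between the two encodings concrete, here is a small standalone check. It is independent of mypyc internals and uses only the standard str.encode and bytes.decode behaviour.

```python
text = "caf\xe9"  # contains U+00E9, which is outside the ascii range

# latin1 maps code points 0..255 one-to-one onto byte values 0..255.
assert text.encode("latin1") == b"caf\xe9"

# ascii rejects anything above 127.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
assert ascii_failed

# Every possible byte value survives a latin1 round trip.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin1").encode("latin1") == all_bytes
```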

JukkaL · 2021-08-18T13:14:45Z

I checked that this makes the bytes_format benchmark about 2.1x faster than before!

JukkaL

Thanks for the updates! Looks good.

97littleleaf11 added 2 commits August 18, 2021 17:47

Support bytes formatting

6001aa6

Delete blank line

57e6d6f

JukkaL reviewed Aug 18, 2021

97littleleaf11 added 3 commits August 19, 2021 19:01

Rename convert_expr to convert_format_expr and add some tests.

88a71e9

Add load_bytes_from_str_literal()

ff8c275

Fix

103ebac

JukkaL approved these changes Aug 19, 2021

JukkaL merged commit f1167bc into python:master Aug 19, 2021

97littleleaf11 deleted the bytes-formatting branch August 19, 2021 16:20
