Optimize _PyUnicodeWriter implementation #119396

Closed
vstinner opened this issue May 22, 2024 · 4 comments

vstinner commented May 22, 2024

To prepare for the gh-119182 implementation, I propose first optimizing the _PyUnicodeWriter implementation. For example, optimize the UTF-8 decoder in PyUnicode_FromFormat() by avoiding the creation of a temporary buffer: write characters directly into the writer.
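
To make the idea concrete, here is a minimal sketch of decoding UTF-8 straight into a growable output buffer, so that no intermediate string has to be built and then copied. The `toy_writer` type and both functions are hypothetical names invented for this illustration; they are not the `_PyUnicodeWriter` API. The decoder also skips overlong-sequence and surrogate checks for brevity, and on error it leaves any already-decoded characters in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable buffer of code points (NOT the real _PyUnicodeWriter). */
typedef struct {
    uint32_t *buf;
    size_t len;
    size_t cap;
} toy_writer;

/* Make room for `extra` more code points. Returns 0 on success, -1 on error. */
static int toy_writer_reserve(toy_writer *w, size_t extra)
{
    if (w->len + extra <= w->cap) {
        return 0;
    }
    size_t cap = w->cap ? w->cap : 16;
    while (cap < w->len + extra) {
        cap *= 2;
    }
    uint32_t *p = realloc(w->buf, cap * sizeof(*p));
    if (p == NULL) {
        return -1;
    }
    w->buf = p;
    w->cap = cap;
    return 0;
}

/* Decode n bytes of UTF-8 and append the code points directly to the writer.
   Returns 0 on success, -1 on malformed input or allocation failure. */
static int toy_writer_write_utf8(toy_writer *w, const char *s, size_t n)
{
    assert(w != NULL && s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *end = p + n;

    /* Every code point needs at least one byte, so n is an upper bound. */
    if (toy_writer_reserve(w, n) < 0) {
        return -1;
    }
    while (p < end) {
        uint32_t ch = *p++;
        int extra;
        if (ch < 0x80) { extra = 0; }
        else if ((ch & 0xE0) == 0xC0) { ch &= 0x1F; extra = 1; }
        else if ((ch & 0xF0) == 0xE0) { ch &= 0x0F; extra = 2; }
        else if ((ch & 0xF8) == 0xF0) { ch &= 0x07; extra = 3; }
        else { return -1; }            /* invalid lead byte */
        if (end - p < extra) {
            return -1;                 /* truncated sequence */
        }
        for (; extra > 0; extra--, p++) {
            if ((*p & 0xC0) != 0x80) {
                return -1;             /* invalid continuation byte */
            }
            ch = (ch << 6) | (*p & 0x3F);
        }
        w->buf[w->len++] = ch;         /* written in place, no temporary */
    }
    return 0;
}
```

A caller would zero-initialize a `toy_writer`, call `toy_writer_write_utf8()` once per `%s` argument, and `free()` the buffer at the end.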

vstinner added a commit to vstinner/cpython that referenced this issue May 22, 2024
Add fast paths for str, int and float object types.

Benchmark on %S and %R formats:

+----------------+--------+----------------------+
| Benchmark      | ref    | change               |
+================+========+======================+
| str()          | 654 ns | 556 ns: 1.18x faster |
+----------------+--------+----------------------+
| repr()         | 722 ns | 627 ns: 1.15x faster |
+----------------+--------+----------------------+
| Geometric mean | (ref)  | 1.16x faster         |
+----------------+--------+----------------------+
@serhiy-storchaka

I do not know how much this is needed, but the code looks correct.

Could you please collect some data about the types used with %S and %R? And how often are %S and %R used with and without a width and a precision? Just write the type name to a special log file, run the test suite, and then count the most common names.

vstinner commented May 22, 2024

> Could you please collect some data about types used with %S and %R?

Sure, here it is (I didn't check width or precision):

%R, str: 113038 (80%)
%S, str: 24825 (18%)
%R, bool: 770 (1%)
%R, tuple: 696 (0%)
%R, int: 507 (0%)
%R, list: 393 (0%)
%S, int: 136 (0%)
%S, _GenericAlias: 120 (0%)
%R, bytes: 116 (0%)
%R, dict: 113 (0%)
%R, NoneType: 85 (0%)
%R, frame: 62 (0%)
%R, type: 52 (0%)
%R, types: 43 (0%)
%R, function: 42 (0%)
%R, _io: 33 (0%)
%S, types: 30 (0%)
%R, datetime: 24 (0%)
%R, object: 19 (0%)
%R, float: 18 (0%)
%R, _StoreAction: 16 (0%)
%R, Element: 15 (0%)
%R, _CallableGenericAlias: 12 (0%)
%R, typing: 12 (0%)
%S, NoneType: 11 (0%)
%R, _asyncio: 11 (0%)
%R, QueryTestCase: 10 (0%)
%S, type: 10 (0%)
%A, NoneType: 7 (0%)
%R, FixedOffset: 7 (0%)
%R, EnumType: 6 (0%)
%A, str: 6 (0%)
%S, _AnnotatedAlias: 6 (0%)
%R, NewType: 6 (0%)
%R, CustomPyPicklerClass: 6 (0%)
%R, ArrayBinASCIITest: 5 (0%)
%R, Mock: 5 (0%)
%R, _contextvars: 5 (0%)
%S, object: 4 (0%)
%S, Exception: 4 (0%)
%R, Foo: 4 (0%)
%S, _ProtocolMeta: 4 (0%)
%R, _thread: 4 (0%)
%R, complex: 4 (0%)
%R, Task: 4 (0%)
%R, _ALWAYS_EQ: 3 (0%)
%R, collections: 3 (0%)
%R, builtin_function_or_method: 3 (0%)
%R, Test: 3 (0%)
%S, _CallableGenericAlias: 3 (0%)
%R, functools: 3 (0%)
%R, CPartialSubclass: 3 (0%)
%R, Future: 3 (0%)
%R, Derived: 3 (0%)
%R, StopIteration: 2 (0%)
%S, BadStr: 2 (0%)
%R, code: 2 (0%)
%S, dict: 2 (0%)
%S, NotImplementedType: 2 (0%)
%S, ellipsis: 2 (0%)
%S, Cheese: 2 (0%)
%S, SubBytes: 2 (0%)
%S, list: 2 (0%)
%S, _TypedDictMeta: 2 (0%)
%S, MutatesYourDict: 2 (0%)
%R, Meta: 2 (0%)
%R, SSLSocket: 2 (0%)
%R, CustomStr: 2 (0%)
%S, ValueError: 2 (0%)
%R, C: 2 (0%)
%R, socket: 2 (0%)
%R, TarInfo: 2 (0%)
%R, sub: 1 (0%)
%R, MyFileIO: 1 (0%)
%R, Completer: 1 (0%)
%R, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 1 (0%)
%R, StackObject: 1 (0%)
%R, bytearray: 1 (0%)
%R, CustomBytes: 1 (0%)
%R, CustomByteArray: 1 (0%)
%R, memoryview: 1 (0%)
%R, array: 1 (0%)
%R, posix: 1 (0%)
%R, xml: 1 (0%)
%R, sqlite: 1 (0%)
%R, Base: 1 (0%)
%R, expr: 1 (0%)

For str, int and float:

%R, str: 113038 (80%)
%S, str: 24825 (18%)
%R, int: 507 (0%)
%S, int: 136 (0%)
%R, float: 18 (0%)
%A, str: 6 (0%)

Patch:

diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index 480b671390..d2df7bf62d 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -2750,6 +2750,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *str;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%S): %s\n", Py_TYPE(obj)->tp_name);
         str = PyObject_Str(obj);
         if (!str)
             return NULL;
@@ -2766,6 +2767,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *repr;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%R): %s\n", Py_TYPE(obj)->tp_name);
         repr = PyObject_Repr(obj);
         if (!repr)
             return NULL;
@@ -2782,6 +2784,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *ascii;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%A): %s\n", Py_TYPE(obj)->tp_name);
         ascii = PyObject_ASCII(obj);
         if (!ascii)
             return NULL;
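
For illustration, the lines emitted by this patch could be parsed and tallied along these lines. This is only a sketch: `parse_log_line()` and `count_uses()` are hypothetical helpers, and the thread does not say how the counts above were actually produced. Note that `%%S` in the `fprintf()` format prints a literal `%S`, which is why the `sscanf()` pattern below also uses `%%`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line such as "@@@ PyUnicode_FromFormat(%R): str".
   On success, store the format letter and the type name and return 1;
   return 0 for any line that does not match. */
static int parse_log_line(const char *line, char *fmt, char type_name[64])
{
    assert(line != NULL);
    return sscanf(line, "@@@ PyUnicode_FromFormat(%%%c): %63s",
                  fmt, type_name) == 2;
}

/* Count how many log lines used format letter `fmt` with type `type_name`. */
static size_t count_uses(const char *const *lines, size_t n,
                         char fmt, const char *type_name)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        char f;
        char name[64];
        if (parse_log_line(lines[i], &f, name)
            && f == fmt && strcmp(name, type_name) == 0) {
            count++;
        }
    }
    return count;
}
```

Unrelated test-suite output on stderr is simply skipped, because `parse_log_line()` returns 0 for lines that do not start with the `@@@` marker.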

@serhiy-storchaka

Thank you. According to these results, we can ignore int and float and optimize only str. Of course, data obtained this way is only a very rough estimate of real-world usage, but the difference is too large to dismiss.

Unfortunately, %R for str is not optimized in this PR, and the usage of str with %S is not overwhelming, so this optimization may have only a very minor effect. I am no longer sure that the gain is worth the cost.

@vstinner

I also wrote PR gh-119398 for this issue: "Optimize PyUnicode_FromFormat() UTF-8 decoder".

vstinner added a commit that referenced this issue May 22, 2024
Add unicode_decode_utf8_writer() to write directly characters into a
_PyUnicodeWriter writer: avoid the creation of a temporary string.
Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
vstinner added a commit to vstinner/cpython that referenced this issue May 27, 2024
Use stringlib to specialize unicode_repr() for each string kind
(UCS1, UCS2, UCS4).

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
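
As a rough illustration of what "specializing for each string kind" means, the sketch below generates one scanning loop per character width, so that each loop works on a fixed element type, and dispatches on the kind only once per string. Everything here is hypothetical: the macro, the `toy_kind` enum, and the escape-counting task are simplified stand-ins, not the code from the commit. CPython's stringlib gets a comparable effect by including the same header several times with different macro definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generate one escape-counting function per character type. */
#define DEFINE_COUNT_ESCAPES(NAME, CHAR_T)                            \
    static size_t NAME(const CHAR_T *s, size_t n)                     \
    {                                                                 \
        size_t count = 0;                                             \
        for (size_t i = 0; i < n; i++) {                              \
            CHAR_T ch = s[i];                                         \
            if (ch == '\\' || ch == '\'' || ch == '\n'                \
                || ch == '\r' || ch == '\t') {                        \
                count++;                                              \
            }                                                         \
        }                                                             \
        return count;                                                 \
    }

DEFINE_COUNT_ESCAPES(count_escapes_ucs1, uint8_t)
DEFINE_COUNT_ESCAPES(count_escapes_ucs2, uint16_t)
DEFINE_COUNT_ESCAPES(count_escapes_ucs4, uint32_t)

/* Hypothetical kind tag: the value is the width of one character in bytes. */
enum toy_kind { KIND_UCS1 = 1, KIND_UCS2 = 2, KIND_UCS4 = 4 };

/* Dispatch once per string instead of once per character. */
static size_t count_escapes(enum toy_kind kind, const void *data, size_t n)
{
    switch (kind) {
    case KIND_UCS1: return count_escapes_ucs1((const uint8_t *)data, n);
    case KIND_UCS2: return count_escapes_ucs2((const uint16_t *)data, n);
    case KIND_UCS4: return count_escapes_ucs4((const uint32_t *)data, n);
    }
    assert(!"unknown kind");
    return 0;
}
```

The benchmark rows with `squote` and `nl` are the ones where such a per-character loop does real work, which is consistent with the speedup showing up there and not in the plain `'abc'` and `'a' * 100` cases.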
vstinner added a commit that referenced this issue May 28, 2024
Use stringlib to specialize unicode_repr() for each string kind
(UCS1, UCS2, UCS4).

vstinner added a commit that referenced this issue Jun 3, 2024
Optimize unicode_decode_utf8_writer()

Take the ascii_decode() fast-path even if dest is not aligned on
size_t bytes.
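
One way to scan for ASCII a machine word at a time without requiring any particular alignment is to load and store each word through `memcpy()`, which compilers typically turn into plain unaligned moves. The function below is a sketch under that assumption; `copy_ascii_prefix()` is a hypothetical name and this is not the code of `ascii_decode()` itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the leading run of ASCII bytes from src[0..n) into dest and return
   its length. Neither pointer needs to be aligned on sizeof(size_t). */
static size_t copy_ascii_prefix(const char *src, size_t n, char *dest)
{
    assert(src != NULL && dest != NULL);
    /* 0x80 repeated in every byte of a size_t. */
    const size_t high_bits = (size_t)-1 / 0xFF * 0x80;
    size_t i = 0;

    /* Fast path: test sizeof(size_t) bytes per iteration. */
    while (n - i >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, src + i, sizeof(word));   /* unaligned-safe load */
        if (word & high_bits) {
            break;                              /* a non-ASCII byte is in this word */
        }
        memcpy(dest + i, &word, sizeof(word));  /* unaligned-safe store */
        i += sizeof(word);
    }
    /* Slow path: finish byte by byte, stopping at the first non-ASCII byte. */
    while (i < n && ((unsigned char)src[i] & 0x80) == 0) {
        dest[i] = src[i];
        i++;
    }
    return i;
}
```

The caller can then hand the remaining bytes, starting at the returned offset, to a full UTF-8 decoder.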