Optimize _PyUnicodeWriter implementation #119396

vstinner · 2024-05-22T12:42:01Z

To prepare the gh-119182 implementation, I propose first optimizing the _PyUnicodeWriter implementation. For example, optimize the UTF-8 decoder in PyUnicode_FromFormat() by avoiding the creation of a temporary buffer and writing characters directly into the writer.

Linked PRs

Add fast paths for str, int and float object types.

Benchmark on %S and %R formats:

+----------------+--------+----------------------+
| Benchmark      | ref    | change               |
+================+========+======================+
| str()          | 654 ns | 556 ns: 1.18x faster |
+----------------+--------+----------------------+
| repr()         | 722 ns | 627 ns: 1.15x faster |
+----------------+--------+----------------------+
| Geometric mean | (ref)  | 1.16x faster         |
+----------------+--------+----------------------+

serhiy-storchaka · 2024-05-22T17:51:11Z

I do not know how much this is needed, but the code looks correct.

Could you please collect some data about types used with %S and %R? And how often %S and %R are used with and without the width and the precision? Just write the type name to a special log file, run the test suite, then count the most common names.

vstinner · 2024-05-22T19:33:36Z

> Could you please collect some data about types used with %S and %R?

Sure, here it is (I didn't check width or precision):

%R, str: 113038 (80%)
%S, str: 24825 (18%)
%R, bool: 770 (1%)
%R, tuple: 696 (0%)
%R, int: 507 (0%)
%R, list: 393 (0%)
%S, int: 136 (0%)
%S, _GenericAlias: 120 (0%)
%R, bytes: 116 (0%)
%R, dict: 113 (0%)
%R, NoneType: 85 (0%)
%R, frame: 62 (0%)
%R, type: 52 (0%)
%R, types: 43 (0%)
%R, function: 42 (0%)
%R, _io: 33 (0%)
%S, types: 30 (0%)
%R, datetime: 24 (0%)
%R, object: 19 (0%)
%R, float: 18 (0%)
%R, _StoreAction: 16 (0%)
%R, Element: 15 (0%)
%R, _CallableGenericAlias: 12 (0%)
%R, typing: 12 (0%)
%S, NoneType: 11 (0%)
%R, _asyncio: 11 (0%)
%R, QueryTestCase: 10 (0%)
%S, type: 10 (0%)
%A, NoneType: 7 (0%)
%R, FixedOffset: 7 (0%)
%R, EnumType: 6 (0%)
%A, str: 6 (0%)
%S, _AnnotatedAlias: 6 (0%)
%R, NewType: 6 (0%)
%R, CustomPyPicklerClass: 6 (0%)
%R, ArrayBinASCIITest: 5 (0%)
%R, Mock: 5 (0%)
%R, _contextvars: 5 (0%)
%S, object: 4 (0%)
%S, Exception: 4 (0%)
%R, Foo: 4 (0%)
%S, _ProtocolMeta: 4 (0%)
%R, _thread: 4 (0%)
%R, complex: 4 (0%)
%R, Task: 4 (0%)
%R, _ALWAYS_EQ: 3 (0%)
%R, collections: 3 (0%)
%R, builtin_function_or_method: 3 (0%)
%R, Test: 3 (0%)
%S, _CallableGenericAlias: 3 (0%)
%R, functools: 3 (0%)
%R, CPartialSubclass: 3 (0%)
%R, Future: 3 (0%)
%R, Derived: 3 (0%)
%R, StopIteration: 2 (0%)
%S, BadStr: 2 (0%)
%R, code: 2 (0%)
%S, dict: 2 (0%)
%S, NotImplementedType: 2 (0%)
%S, ellipsis: 2 (0%)
%S, Cheese: 2 (0%)
%S, SubBytes: 2 (0%)
%S, list: 2 (0%)
%S, _TypedDictMeta: 2 (0%)
%S, MutatesYourDict: 2 (0%)
%R, Meta: 2 (0%)
%R, SSLSocket: 2 (0%)
%R, CustomStr: 2 (0%)
%S, ValueError: 2 (0%)
%R, C: 2 (0%)
%R, socket: 2 (0%)
%R, TarInfo: 2 (0%)
%R, sub: 1 (0%)
%R, MyFileIO: 1 (0%)
%R, Completer: 1 (0%)
%R, aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa: 1 (0%)
%R, StackObject: 1 (0%)
%R, bytearray: 1 (0%)
%R, CustomBytes: 1 (0%)
%R, CustomByteArray: 1 (0%)
%R, memoryview: 1 (0%)
%R, array: 1 (0%)
%R, posix: 1 (0%)
%R, xml: 1 (0%)
%R, sqlite: 1 (0%)
%R, Base: 1 (0%)
%R, expr: 1 (0%)

For str, int and float:

%R, str: 113038 (80%)
%S, str: 24825 (18%)
%R, int: 507 (0%)
%S, int: 136 (0%)
%R, float: 18 (0%)
%A, str: 6 (0%)

Patch:

diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
index 480b671390..d2df7bf62d 100644
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -2750,6 +2750,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *str;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%S): %s\n", Py_TYPE(obj)->tp_name);
         str = PyObject_Str(obj);
         if (!str)
             return NULL;
@@ -2766,6 +2767,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *repr;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%R): %s\n", Py_TYPE(obj)->tp_name);
         repr = PyObject_Repr(obj);
         if (!repr)
             return NULL;
@@ -2782,6 +2784,7 @@ unicode_fromformat_arg(_PyUnicodeWriter *writer,
         PyObject *obj = va_arg(*vargs, PyObject *);
         PyObject *ascii;
         assert(obj);
+fprintf(stderr, "@@@ PyUnicode_FromFormat(%%A): %s\n", Py_TYPE(obj)->tp_name);
         ascii = PyObject_ASCII(obj);
         if (!ascii)
             return NULL;
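
The counts above can be derived from the stderr lines this patch emits. A minimal sketch of such a tally follows; it is not the script that was actually used, and the names `tally`, `tally_add_line` and `tally_print` are hypothetical. It assumes each log line has the form produced by the patch, `@@@ PyUnicode_FromFormat(%R): str`, and that the number of distinct keys is small.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 512   /* assumed upper bound on distinct (format, type) pairs */
#define KEY_LEN  128

typedef struct { char key[KEY_LEN]; long count; } tally_entry;
typedef struct { tally_entry entries[MAX_KEYS]; int n; long total; } tally;

/* Count one log line such as "@@@ PyUnicode_FromFormat(%R): str".
   Returns 1 if the line was counted, 0 if it is unrelated output
   or the table is full. */
static int tally_add_line(tally *t, const char *line)
{
    static const char prefix[] = "@@@ PyUnicode_FromFormat(";
    const size_t plen = sizeof(prefix) - 1;
    char fmt[8], type[96], key[KEY_LEN];

    if (strncmp(line, prefix, plen) != 0)
        return 0;                          /* ordinary test-suite output */
    if (sscanf(line + plen, "%7[^)]): %95s", fmt, type) != 2)
        return 0;
    snprintf(key, sizeof(key), "%s, %s", fmt, type);

    for (int i = 0; i < t->n; i++) {
        if (strcmp(t->entries[i].key, key) == 0) {
            t->entries[i].count++;
            t->total++;
            return 1;
        }
    }
    if (t->n == MAX_KEYS)
        return 0;
    strcpy(t->entries[t->n].key, key);
    t->entries[t->n].count = 1;
    t->n++;
    t->total++;
    return 1;
}

/* qsort comparator: highest count first. */
static int by_count_desc(const void *a, const void *b)
{
    long ca = ((const tally_entry *)a)->count;
    long cb = ((const tally_entry *)b)->count;
    return (ca < cb) - (ca > cb);
}

/* Sort the entries and print them as "key: count (percent%)". */
static void tally_print(tally *t, FILE *out)
{
    qsort(t->entries, (size_t)t->n, sizeof(t->entries[0]), by_count_desc);
    for (int i = 0; i < t->n; i++) {
        double pct = 100.0 * (double)t->entries[i].count / (double)t->total;
        fprintf(out, "%s: %ld (%.0f%%)\n",
                t->entries[i].key, t->entries[i].count, pct);
    }
}
```

To use it, feed every line of the captured stderr to `tally_add_line()` (for example from an `fgets()` loop over a zero-initialized `tally`), then call `tally_print()`. Lines that do not start with the `@@@` marker are skipped, so the log can be mixed with normal test output.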

serhiy-storchaka · 2024-05-22T20:28:06Z

Thank you. According to these results, we can ignore int and float and only optimize str. Of course, data obtained this way is only a very rough estimate of real-world usage, but the difference is too large for that roughness to matter.

Unfortunately, %R for str is not optimized in this PR, and str with %S does not dominate the usage, so this optimization may have only a very minor effect. I am no longer sure that the gain is worth the cost.

vstinner · 2024-05-22T20:46:42Z

I also wrote PR gh-119398 for this issue: "Optimize PyUnicode_FromFormat() UTF-8 decoder".

Add unicode_decode_utf8_writer() to write characters directly into a _PyUnicodeWriter writer: avoid the creation of a temporary string.

Optimize PyUnicode_FromFormat() by using the new unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to unicode_fromformat_write_utf8().

Microbenchmark on the code:

    return PyUnicode_FromFormat(
        "%s %s %s %s %s.",
        "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.
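
To illustrate why skipping the temporary string helps, here is a minimal sketch using a toy growable buffer. It is not the _PyUnicodeWriter API and does no real UTF-8 decoding; `toy_writer`, `toy_write_via_temp` and `toy_write_direct` are hypothetical names. Both functions append the same bytes, but the first pays for an extra allocation and an extra copy on every call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for a writer object. */
typedef struct {
    char  *buf;
    size_t len;
    size_t cap;
} toy_writer;

/* Make room for `extra` more bytes. Returns 0 on success, -1 on failure. */
static int toy_writer_reserve(toy_writer *w, size_t extra)
{
    if (w->len + extra <= w->cap)
        return 0;
    size_t cap = w->cap ? w->cap : 16;
    while (cap < w->len + extra)
        cap *= 2;
    char *p = realloc(w->buf, cap);
    if (p == NULL)
        return -1;
    w->buf = p;
    w->cap = cap;
    return 0;
}

/* Indirect approach: produce a temporary copy first, then append it.
   The memcpy into `tmp` stands in for "decode into a temporary string". */
static int toy_write_via_temp(toy_writer *w, const char *s, size_t n)
{
    char *tmp = malloc(n ? n : 1);
    if (tmp == NULL)
        return -1;
    memcpy(tmp, s, n);
    if (toy_writer_reserve(w, n) < 0) {
        free(tmp);
        return -1;
    }
    memcpy(w->buf + w->len, tmp, n);
    w->len += n;
    free(tmp);
    return 0;
}

/* Direct approach: reserve space, then write straight into the writer. */
static int toy_write_direct(toy_writer *w, const char *s, size_t n)
{
    if (toy_writer_reserve(w, n) < 0)
        return -1;
    memcpy(w->buf + w->len, s, n);
    w->len += n;
    return 0;
}
```

With several short arguments, as in the microbenchmark above, the per-argument allocation in the indirect version is a fixed overhead that the direct version never incurs.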

Use stringlib to specialize unicode_repr() for each string kind (UCS1, UCS2, UCS4).

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
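
The benchmark cases above differ in how much quoting and escaping repr() has to do. The following ASCII-only sketch shows that work for a plain C string; it is not unicode_repr() and does not show the stringlib specialization. `ascii_repr` is a hypothetical name, and the input is assumed to be pure ASCII. The quote rule is the one str repr uses: single quotes, unless the text contains a single quote and no double quote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a repr-style quoted form of the ASCII string `s` into `out`.
   Returns the output length (without the NUL), or -1 if `out` is too
   small for the worst case of 4 output bytes per input byte. */
static int ascii_repr(const char *s, char *out, size_t outsize)
{
    size_t len = strlen(s);
    if (outsize < 4 * len + 3)
        return -1;

    /* Prefer single quotes; use double quotes only when the text has a
       single quote and no double quote. */
    int has_sq = strchr(s, '\'') != NULL;
    int has_dq = strchr(s, '"') != NULL;
    char quote = (has_sq && !has_dq) ? '"' : '\'';

    char *p = out;
    *p++ = quote;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == (unsigned char)quote || c == '\\') {
            *p++ = '\\';
            *p++ = (char)c;
        }
        else if (c == '\n') { *p++ = '\\'; *p++ = 'n'; }
        else if (c == '\r') { *p++ = '\\'; *p++ = 'r'; }
        else if (c == '\t') { *p++ = '\\'; *p++ = 't'; }
        else if (c < 0x20 || c >= 0x7f) {
            /* Non-printable byte: \xNN escape. */
            p += sprintf(p, "\\x%02x", c);
        }
        else {
            *p++ = (char)c;
        }
    }
    *p++ = quote;
    *p = '\0';
    return (int)(p - out);
}
```

Strings such as 'abc' take the copy-only branch for every character, which may explain why those rows barely move, while the squote and nl rows exercise the escaping branches on every other character.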

Optimize unicode_decode_utf8_writer()

Take the ascii_decode() fast-path even if dest is not aligned on size_t bytes.
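
As a rough illustration of a word-at-a-time ASCII fast path that does not depend on alignment, the sketch below copies the leading ASCII run of a buffer one size_t at a time. It is not the ascii_decode() code from CPython; `copy_ascii_prefix` and `HIGH_BITS` are hypothetical names, and the sketch relies on memcpy() for the loads and stores so that neither pointer needs to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* size_t value with the high bit of every byte set (0x8080...80). */
#define HIGH_BITS ((size_t)-1 / 0xFF * 0x80)

/* Copy the leading run of ASCII bytes from src to dest and return its
   length. dest must have room for n bytes. */
static size_t copy_ascii_prefix(const char *src, size_t n, char *dest)
{
    size_t i = 0;

    /* Fast path: test and copy sizeof(size_t) bytes per iteration. */
    while (n - i >= sizeof(size_t)) {
        size_t word;
        memcpy(&word, src + i, sizeof(word));    /* alignment-safe load */
        if (word & HIGH_BITS)
            break;                               /* a non-ASCII byte is in this word */
        memcpy(dest + i, &word, sizeof(word));   /* alignment-safe store */
        i += sizeof(word);
    }

    /* Slow path: finish byte by byte, stopping at the first non-ASCII byte. */
    while (i < n && ((unsigned char)src[i] & 0x80) == 0) {
        dest[i] = src[i];
        i++;
    }
    return i;
}
```

Compilers typically lower the fixed-size memcpy() calls to single load and store instructions on platforms that allow unaligned access, so the fast path can stay cheap even when dest starts at an odd offset inside a writer buffer.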

bedevere-app bot mentioned this issue May 22, 2024

gh-119396: Optimize %S format of PyUnicode_FromFormat() #119412

Closed

bedevere-app bot mentioned this issue May 22, 2024

gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder #119398

Merged

bedevere-app bot mentioned this issue May 27, 2024

gh-119396: Optimize unicode_repr() #119617

Merged

vstinner closed this as completed May 29, 2024

bedevere-app bot mentioned this issue Jun 2, 2024

gh-119396: Optimize unicode_decode_utf8_writer() #119957

Merged

vstinner added a commit that referenced this issue Jun 3, 2024

gh-119396: Optimize unicode_decode_utf8_writer() (#119957)

3ea9b92

Optimize unicode_decode_utf8_writer()

Take the ascii_decode() fast-path even if dest is not aligned on size_t bytes.

