gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder #119398

vstinner · 2024-05-22T12:49:25Z

Add unicode_decode_utf8_writer(), which writes characters directly into a _PyUnicodeWriter writer and so avoids the creation of a temporary string. Optimize PyUnicode_FromFormat() by using the new
unicode_decode_utf8_writer().

Rename unicode_fromformat_write_cstr() to
unicode_fromformat_write_utf8().
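
To make the idea concrete, here is a toy Python model of the two strategies. It is only a sketch: ToyWriter, write_via_temporary(), write_utf8_direct() and toy_format() are hypothetical names invented for this illustration, not CPython code, and the hand-written decoder assumes well-formed UTF-8 input. The real change is in C, in Objects/unicodeobject.c.

```python
class ToyWriter:
    """Hypothetical stand-in for _PyUnicodeWriter: an append-only list of code points."""

    def __init__(self):
        self.code_points = []
        self.temporaries = 0  # number of intermediate str objects built along the way

    def write_str(self, text):
        self.code_points.extend(map(ord, text))

    def finish(self):
        return "".join(map(chr, self.code_points))


def write_via_temporary(writer, data):
    """Decode into a temporary str first, then copy it into the writer."""
    temp = data.decode("utf-8")
    writer.temporaries += 1
    writer.write_str(temp)


def write_utf8_direct(writer, data):
    """Decode byte by byte and append each code point straight to the writer.

    Assumes well-formed UTF-8; this toy decoder does no validation of
    continuation bytes, overlong forms or surrogates.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            cp, size = lead, 1
        elif lead >> 5 == 0b110:
            cp, size = lead & 0x1F, 2
        elif lead >> 4 == 0b1110:
            cp, size = lead & 0x0F, 3
        elif lead >> 3 == 0b11110:
            cp, size = lead & 0x07, 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        for k in range(1, size):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        writer.code_points.append(cp)
        i += size


def toy_format(write_utf8, pieces):
    """Mimic the shape of "%s %s %s %s %s." with the given UTF-8 writer function."""
    writer = ToyWriter()
    for index, piece in enumerate(pieces):
        if index:
            writer.write_str(" ")
        write_utf8(writer, piece)
    writer.write_str(".")
    return writer.finish(), writer.temporaries


PIECES = [b"format", b"multiple", b"utf8", b"short", b"strings"]
print(toy_format(write_via_temporary, PIECES))
print(toy_format(write_utf8_direct, PIECES))
```

Both paths build the same string; the direct path simply never creates the per-argument temporary. The toy says nothing about actual timings.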

Microbenchmark on the code:

return PyUnicode_FromFormat(
    "%s %s %s %s %s.",
    "format", "multiple", "utf8", "short", "strings");

Result: 620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster.

Issue: [C API] Add an efficient public PyUnicodeWriter API #119182

Issue: Optimize _PyUnicodeWriter implementation #119396

vstinner · 2024-05-22T12:50:44Z

Benchmark:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index f99ebf0dde..0752b2b1d2 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -3312,6 +3312,14 @@ function_set_warning(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
     Py_RETURN_NONE;
 }
 
+static PyObject *
+bench(PyObject *Py_UNUSED(module), PyObject *Py_UNUSED(args))
+{
+    return PyUnicode_FromFormat(
+        "%s %s %s %s %s.",
+        "format", "multiple", "utf8", "short", "strings");
+}
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -3454,6 +3462,7 @@ static PyMethodDef TestMethods[] = {
     {"check_pyimport_addmodule", check_pyimport_addmodule, METH_VARARGS},
     {"test_weakref_capi", test_weakref_capi, METH_NOARGS},
     {"function_set_warning", function_set_warning, METH_NOARGS},
+    {"bench", bench, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };

Command:

./python -m venv env
env/bin/python -m pip install pyperf
env/bin/python -m pyperf timeit -s 'import _testcapi; func=_testcapi.bench' 'func()' -v -o ref.json

Result, Python built with gcc -O3:

620 ns +- 8 ns -> 382 ns +- 2 ns: 1.62x faster
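
For reference, the "1.62x faster" figure is just the ratio of the two means. A quick check, using only the rounded numbers quoted above (the variable names are mine):

```python
ref_ns = 620.0     # mean time per call before the change
change_ns = 382.0  # mean time per call after the change

speedup = ref_ns / change_ns            # how many times faster
time_saved = 1.0 - change_ns / ref_ns   # fraction of the time removed

print(f"{speedup:.2f}x faster")                # 1.62x faster
print(f"{time_saved:.0%} less time per call")  # 38% less time per call
```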

vstinner · 2024-05-22T14:40:59Z

Oh, there was a performance regression on b"abc".decode(): I fixed it.

Benchmark:

import pyperf
import _testcapi
runner = pyperf.Runner()

utf8 = b'abc'
runner.bench_func('abc', utf8.decode)

utf8 = 'abcé'.encode()
runner.bench_func('abc + UTF-8', utf8.decode)

utf8 = 'éabc'.encode()
runner.bench_func('UTF-8 + abc', utf8.decode)

utf8 = b'x' * (1024 * 1024)
runner.bench_func('ASCII 1 MiB', utf8.decode)

utf8 = ('x' * (1024 * 1024) + 'é').encode()
runner.bench_func('ASCII 1 MiB + UTF-8', utf8.decode)

utf8 = ('é' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 + ASCII 1 MiB', utf8.decode)

utf8 = ('€' + 'x' * (1024 * 1024)).encode()
runner.bench_func('UTF-8 euro + ASCII 1 MiB', utf8.decode)
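
The inputs differ in where the first non-ASCII character sits and how wide it is: 'é' (U+00E9) is 2 bytes in UTF-8 and '€' (U+20AC) is 3 bytes. A small helper to inspect them; describe() is a name made up for this sketch and is not part of the benchmark:

```python
def describe(data):
    """Return (byte length, character length, is pure ASCII, highest code point)."""
    text = data.decode("utf-8")
    return len(data), len(text), data.isascii(), max(map(ord, text))


n = 1024 * 1024
inputs = {
    "abc": b"abc",
    "abc + UTF-8": "abcé".encode(),
    "UTF-8 + abc": "éabc".encode(),
    "ASCII 1 MiB": b"x" * n,
    "ASCII 1 MiB + UTF-8": ("x" * n + "é").encode(),
    "UTF-8 + ASCII 1 MiB": ("é" + "x" * n).encode(),
    "UTF-8 euro + ASCII 1 MiB": ("€" + "x" * n).encode(),
}

for name, data in inputs.items():
    nbytes, nchars, ascii_only, highest = describe(data)
    print(f"{name}: {nbytes} bytes, {nchars} chars, "
          f"ascii={ascii_only}, max code point=U+{highest:04X}")
```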

Results (Python built with gcc -O3, with CPU isolation):

+---------------------+---------+-----------------------+
| Benchmark           | ref     | change                |
+=====================+=========+=======================+
| abc                 | 73.7 ns | 74.7 ns: 1.01x slower |
+---------------------+---------+-----------------------+
| abc + UTF-8         | 167 ns  | 172 ns: 1.03x slower  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB         | 118 us  | 118 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| ASCII 1 MiB + UTF-8 | 1.08 ms | 1.07 ms: 1.00x faster |
+---------------------+---------+-----------------------+
| UTF-8 + ASCII 1 MiB | 572 us  | 570 us: 1.00x faster  |
+---------------------+---------+-----------------------+
| Geometric mean      | (ref)   | 1.00x slower          |
+---------------------+---------+-----------------------+

Benchmark hidden because not significant (2): UTF-8 + abc, UTF-8 euro + ASCII 1 MiB

=> There is no significant impact on bytes.decode() performance (no slowdown).
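
As a rough sanity check, the geometric mean row can be approximated from the rounded table values. This is a sketch under assumptions: pyperf works from the unrounded timings, and treating the two hidden benchmarks as a ratio of exactly 1.0 is my simplification for illustration, not a statement of how pyperf computes it.

```python
import math

# (ref, change) pairs copied from the table; units cancel in each ratio.
pairs = {
    "abc": (73.7, 74.7),                  # ns
    "abc + UTF-8": (167.0, 172.0),        # ns
    "ASCII 1 MiB": (118.0, 118.0),        # us
    "ASCII 1 MiB + UTF-8": (1.08, 1.07),  # ms
    "UTF-8 + ASCII 1 MiB": (572.0, 570.0),  # us
}
HIDDEN_BENCHMARKS = 2  # "not significant" rows, assumed here to have ratio 1.0


def geometric_mean(ratios):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(map(math.log, ratios)) / len(ratios))


visible = [change / ref for ref, change in pairs.values()]
gm_visible = geometric_mean(visible)
gm_all = geometric_mean(visible + [1.0] * HIDDEN_BENCHMARKS)

print(f"visible rows only: {gm_visible:.3f}")
print(f"with hidden rows as 1.0: {gm_all:.3f}")
```

A ratio above 1.0 means "slower"; both values stay within about 1% of 1.0, consistent with the "no significant impact" conclusion.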

vstinner · 2024-05-22T14:42:10Z

cc @serhiy-storchaka

serhiy-storchaka

LGTM.


vstinner · 2024-05-22T19:22:36Z

I enabled automerge. Thanks for the review @serhiy-storchaka.


vstinner added the skip news label May 22, 2024

bedevere-app bot added the awaiting core review label May 22, 2024

bedevere-app bot mentioned this pull request May 22, 2024

[C API] Add an efficient public PyUnicodeWriter API #119182

Closed

vstinner force-pushed the utf8_writer branch from 6c8aedc to d3fe16f Compare May 22, 2024 14:40

serhiy-storchaka approved these changes May 22, 2024


bedevere-app bot added awaiting merge and removed awaiting core review labels May 22, 2024

vstinner added 3 commits May 22, 2024 21:17

Fix unicode_decode_utf8() perf regression

99f4b13

Address review

d496db8

vstinner force-pushed the utf8_writer branch from d3fe16f to d496db8 Compare May 22, 2024 19:19

vstinner enabled auto-merge (squash) May 22, 2024 19:20

vstinner disabled auto-merge May 22, 2024 20:45

vstinner enabled auto-merge (squash) May 22, 2024 20:45

vstinner changed the title ~~gh-119182: Optimize PyUnicode_FromFormat() UTF-8 decoder~~ gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder May 22, 2024

vstinner changed the title ~~gh-119398: Optimize PyUnicode_FromFormat() UTF-8 decoder~~ gh-119396: Optimize PyUnicode_FromFormat() UTF-8 decoder May 22, 2024

bedevere-app bot mentioned this pull request May 22, 2024

Optimize _PyUnicodeWriter implementation #119396

Closed

vstinner merged commit 9b422fc into python:main May 22, 2024
34 checks passed

vstinner deleted the utf8_writer branch May 22, 2024 21:05

bedevere-app bot removed the awaiting merge label May 22, 2024
