gh-119396: Optimize unicode_repr() #119617

Merged: 2 commits merged into python:main on May 28, 2024.

@vstinner (Member) commented on May 27, 2024

Use stringlib to specialize unicode_repr() for each string kind (UCS1, UCS2, UCS4).
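For readers less familiar with string "kinds": under PEP 393, CPython stores each str with a single fixed width per code point, chosen from the largest code point in the string. stringlib's approach is to include the same C template once per width. The hypothetical string_kind() helper below only models that selection rule in Python; it is not CPython code.

```python
def string_kind(s: str) -> int:
    """Bytes per code point CPython uses to store s (PEP 393 "kind").

    Hypothetical helper for illustration only; CPython records the kind
    in the string object itself.
    """
    # The widest code point decides the storage width for the whole string.
    max_char = max(map(ord, s), default=0)
    if max_char <= 0xFF:
        return 1  # UCS1: Latin-1 range, including pure ASCII
    if max_char <= 0xFFFF:
        return 2  # UCS2: rest of the Basic Multilingual Plane
    return 4      # UCS4: code points above U+FFFF


if __name__ == "__main__":
    for sample in ["abc", "caf\u00e9", "\u20ac5", "\U0001F600"]:
        print(ascii(sample), string_kind(sample))
```

A generic C loop has to read each character through PyUnicode_READ(), which branches on the kind; a per-kind specialization can read through a fixed-width pointer instead. That is one plausible source of the speedups below, though the PR does not break the gain down.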

Benchmark:

+-------------------------------------+---------+----------------------+
| Benchmark                           | ref     | change2              |
+=====================================+=========+======================+
| repr('abc')                         | 100 ns  | 103 ns: 1.02x slower |
+-------------------------------------+---------+----------------------+
| repr('a' * 100)                     | 369 ns  | 369 ns: 1.00x slower |
+-------------------------------------+---------+----------------------+
| repr(('a' + squote) * 100)          | 1.21 us | 946 ns: 1.27x faster |
+-------------------------------------+---------+----------------------+
| repr(('a' + nl) * 100)              | 1.23 us | 907 ns: 1.36x faster |
+-------------------------------------+---------+----------------------+
| repr(dquote + ('a' + squote) * 100) | 1.08 us | 858 ns: 1.25x faster |
+-------------------------------------+---------+----------------------+
| Geometric mean                      | (ref)   | 1.16x faster         |
+-------------------------------------+---------+----------------------+
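The faster cases are the ones where repr() must escape characters or handle quotes. As a rough model of the per-character work, here is a pure-Python sketch of the escaping rules CPython's str repr follows, checked against the built-in repr(). It assumes current CPython behavior and that str.isprintable() on a single character matches the C-level printable test; repr_sketch is a hypothetical name, not the C implementation. Like unicode_repr(), it decides the quote character before writing any output. The C code also computes the exact output length and maximum character in that first pass so it can allocate the result once; the sketch skips that and appends to a list.

```python
def repr_sketch(s: str) -> str:
    """Pure-Python model of repr() for str (hypothetical helper, not CPython code)."""
    # First pass: pick the delimiter. Use double quotes only when the string
    # contains single quotes but no double quotes; otherwise use single quotes.
    has_squote = "'" in s
    has_dquote = '"' in s
    quote = '"' if has_squote and not has_dquote else "'"

    # Second pass: escape character by character.
    out = [quote]
    for ch in s:
        code = ord(ch)
        if ch == quote or ch == "\\":
            out.append("\\" + ch)          # escape the delimiter and backslash
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif code < 0x20 or code == 0x7F:
            out.append(f"\\x{code:02x}")   # other ASCII control characters
        elif code < 0x7F:
            out.append(ch)                 # printable ASCII is copied as-is
        elif ch.isprintable():
            out.append(ch)                 # printable non-ASCII is copied as-is
        elif code <= 0xFF:
            out.append(f"\\x{code:02x}")
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04x}")
        else:
            out.append(f"\\U{code:08x}")
    out.append(quote)
    return "".join(out)


samples = [
    "abc",
    "a" * 100,
    ("a" + "'") * 100,
    ("a" + "\n") * 100,
    '"' + ("a" + "'") * 100,
    "tab\there\r\x00\x7f\\",
    "caf\u00e9\xa0\x85",
    "\u20ac\u200b\ud800",
    "\U0001F600\U000E0001",
]
for sample in samples:
    assert repr_sketch(sample) == repr(sample), ascii(sample)
```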

@vstinner (Member, Author) commented:

Benchmark script (uses the third-party pyperf module):

```python
import pyperf

runner = pyperf.Runner()

# Characters that make repr() escape or switch its quote character
squote = "'"
dquote = '"'
nl = '\n'

# One benchmark per input; the name is the expression being timed
runner.bench_func("repr('abc')", repr, 'abc')
runner.bench_func("repr('a' * 100)", repr, 'a' * 100)
runner.bench_func("repr(('a' + squote) * 100)", repr, ('a' + squote) * 100)
runner.bench_func("repr(('a' + nl) * 100)", repr, ('a' + nl) * 100)
runner.bench_func("repr(dquote + ('a' + squote) * 100)", repr, dquote + ('a' + squote) * 100)
```
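Each input is meant to hit a different path in repr(). A quick standard-library-only check (an illustration, not part of the PR) confirms what the three escaping-heavy inputs actually produce:

```python
squote = "'"
dquote = '"'
nl = '\n'

# Only single quotes: repr() switches the delimiter to double quotes, no escaping.
only_squotes = repr(('a' + squote) * 100)
assert only_squotes[0] == only_squotes[-1] == '"'
assert "\\'" not in only_squotes

# Newlines are written as the two-character escape \n.
newlines = repr(('a' + nl) * 100)
assert newlines.count('\\n') == 100

# Both quote characters: repr() keeps single quotes and escapes each one.
both = repr(dquote + ('a' + squote) * 100)
assert both[0] == both[-1] == "'"
assert both.count("\\'") == 100

print(len(only_squotes), len(newlines), len(both))
```

So ('a' + squote) * 100 needs no escapes but forces the double-quote delimiter, while the newline and mixed-quote inputs each add 100 backslash escapes to the output.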

@vstinner (Member, Author) commented:

cc @serhiy-storchaka

@vstinner (Member, Author) commented:

This is a first step. The second step will be to avoid the temporary string that PyUnicode_FromFormat("%R", str_obj) creates (the repr() of str_obj) before copying it into its output.

@vstinner (Member, Author) commented:

I implemented the second step locally. Sadly, it's slower, not faster. IMO the first step (making the code faster) is still worth it :-)

@vstinner merged commit 0518edc into python:main on May 28, 2024 (34 checks passed).
@vstinner deleted the unicode_repr branch on May 28, 2024 at 16:05.
@estyxx pushed a commit to estyxx/cpython that referenced this pull request on Jul 17, 2024.