Description
Hello,
We have found a regression between CPython 3.10.8 and CPython 3.11 that makes string concatenation in loops significantly slower on Windows 10. This is described in detail in this Stack Overflow post.
Here is a minimal, reproducible example of benchmarking code:
```python
import time

a = 'a'
start = time.time()
for _ in range(1000000):
    a += 'a'
end = time.time()
print(a[:5], (end-start) * 1000)
```
CPython 3.11.0 is about 100 times slower than CPython 3.10.8 on this code because the loop runs in quadratic time, whereas it runs in linear time with CPython 3.10.8.
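One way to check the quadratic-versus-linear claim on a given build is to time the same loop at two sizes and compare: doubling the iteration count should roughly double the time if appends are amortized O(1), and roughly quadruple it if every append copies the whole string. The sketch below assumes that running the snippet as module-level code via `exec` (so `a` is a global name, as in the original script) is representative; `time_loop` and `growth_ratio` are hypothetical helper names, and timings are noisy, so treat the ratio as a rough indicator only.

```python
import time

# The loop from the report, compiled as module-level code so that `a` is a
# global name (as in the original script) rather than a function local.
LOOP_SOURCE = """
a = 'a'
for _ in range(n):
    a += 'a'
"""
LOOP_CODE = compile(LOOP_SOURCE, "<bench>", "exec")


def time_loop(n: int) -> float:
    """Run the append loop n times and return elapsed seconds (time.perf_counter)."""
    namespace = {"n": n}
    start = time.perf_counter()
    exec(LOOP_CODE, namespace)
    elapsed = time.perf_counter() - start
    # Sanity check: exactly one character was appended per iteration.
    assert len(namespace["a"]) == n + 1
    return elapsed


def growth_ratio(n: int) -> float:
    """Time for 2n iterations divided by time for n iterations.

    Roughly 2 suggests linear behaviour; roughly 4 suggests quadratic behaviour.
    """
    return time_loop(2 * n) / max(time_loop(n), 1e-9)


print(f"t(2n)/t(n) = {growth_ratio(20_000):.1f}")
```

A moderate `n` keeps the run short even on a build where the loop is quadratic; larger values give a clearer ratio at the cost of a longer run.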
The analysis shows that CPython 3.10.8 generates an `INPLACE_ADD` instruction, so `PyUnicode_Append` is called at runtime, while CPython 3.11.0 now generates a `BINARY_OP` instruction, so `PyUnicode_Concat` is called instead. The latter function creates a new, bigger string on every iteration, which drastically reduces the performance of the string-appending loop in the code above. This appears to be related to issue #89799. If `INPLACE_ADD` is to be replaced by `BINARY_OP`, then I think an optimization that checks the reference count (so that the operation can be done in place when possible) is missing from CPython 3.11.0. What do you think?
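The bytecode difference described above can be inspected with the standard `dis` module. The sketch below compiles the report's statement as module-level code and lists the add-related instructions; on CPython 3.10 one would expect `INPLACE_ADD`, and on 3.11 and later `BINARY_OP` with a `+=` argument. The helper name `add_opcodes` and the opcode set are assumptions made for this illustration.

```python
import dis
import sys

# Opcodes that can implement `a += b`: INPLACE_ADD (CPython <= 3.10) and
# BINARY_OP (CPython >= 3.11). BINARY_ADD is included for completeness.
ADD_OPCODES = {"INPLACE_ADD", "BINARY_ADD", "BINARY_OP"}


def add_opcodes(source: str) -> list:
    """Compile `source` as module-level code; return (opname, argrepr) for add instructions."""
    code = compile(source, "<example>", "exec")
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(code)
        if ins.opname in ADD_OPCODES
    ]


print(sys.version_info[:2], add_opcodes("a = 'a'\na += 'a'\n"))
```

Running this under both interpreters shows which instruction each version emits for the same source line.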
My environment is an embedded CPython 3.10.8 and an embedded CPython 3.11.0, both running on Windows 10 (22H2) with an x86-64 processor (i5-9600KF).