Slower string concatenation in CPython 3.11

Hello,

We have found a regression between CPython 3.10.8 and CPython 3.11 that makes string concatenation in loops significantly slower on Windows 10. It is described in detail in this [StackOverflow post](https://stackoverflow.com/questions/74605279/python-3-11-worse-optimized-than-3-10/74607850).

Here is a minimal, reproducible example of benchmarking code:

```python
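import time
a = 'a'

# Time one million in-place appends to a module-level string.
start = time.time()
for _ in range(1000000):
    a += 'a'
end = time.time()

# Print a short prefix of the result and the elapsed time in milliseconds.
print(a[:5], (end-start) * 1000)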
```

CPython 3.11.0 is about 100 times slower than CPython 3.10.8 due to a quadratic running time (as opposed to a linear running time for CPython 3.10.8).
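One rough way to tell linear from quadratic behavior is to double the loop count and watch how the elapsed time grows: roughly 2x per doubling suggests linear time, roughly 4x suggests quadratic time. The sketch below is only an illustration under a few assumptions: the helper name `time_append` and the chosen sizes are hypothetical, and the loop is compiled as module-level code and run with `exec` so that `a` is stored by name, as in the benchmark above. Absolute numbers will vary by machine and interpreter version.

```python
import time

# Module-level style source, mirroring the benchmark in this report.
TEMPLATE = """
a = 'a'
for _ in range({n}):
    a += 'a'
"""

def time_append(n):
    """Run n string appends as module-level code; return elapsed seconds."""
    code = compile(TEMPLATE.format(n=n), "<bench>", "exec")
    namespace = {}
    start = time.perf_counter()
    exec(code, namespace)
    elapsed = time.perf_counter() - start
    # Sanity check: the loop really appended n characters.
    assert len(namespace["a"]) == n + 1
    return elapsed

# Each size is double the previous one (sizes are arbitrary, kept small).
timings = {n: time_append(n) for n in (50_000, 100_000, 200_000)}
for n, seconds in timings.items():
    print(f"n={n:>7}: {seconds * 1000:.1f} ms")
```

Comparing the printed times between consecutive rows gives the growth factor per doubling.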

Our analysis shows that CPython 3.10.8 generated an `INPLACE_ADD` instruction, so `PyUnicode_Append` was called at runtime, while CPython 3.11.0 now generates a `BINARY_OP` instruction, so `PyUnicode_Concat` is called instead. The latter function allocates a new, larger string on every iteration, which drastically reduces the performance of the string-appending loop in the code above. This appears to be related to issue #89799. I think that if we want to replace `INPLACE_ADD` with `BINARY_OP`, then CPython 3.11.0 is missing an optimization that checks the reference count so that the operation can be done in place when possible. What do you think?
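The bytecode difference can be checked with the standard `dis` module. The following is a minimal sketch, not the exact procedure used for the analysis: it compiles the same loop as module-level code and lists the instruction emitted for `a += 'a'`. The helper name `add_instructions` is hypothetical. Based on the analysis above, an `INPLACE_ADD` entry would be expected on 3.10 and a `BINARY_OP` entry on 3.11.

```python
import dis

# Same loop as the benchmark, compiled as module-level code.
SOURCE = """
a = 'a'
for _ in range(10):
    a += 'a'
"""

code = compile(SOURCE, "<append_loop>", "exec")

def add_instructions(code_obj):
    """Return (opname, argrepr) for each addition instruction in code_obj."""
    wanted = {"INPLACE_ADD", "BINARY_ADD", "BINARY_OP"}
    return [
        (ins.opname, ins.argrepr)
        for ins in dis.get_instructions(code_obj)
        if ins.opname in wanted
    ]

print(add_instructions(code))
```

Running this under both interpreters shows which opcode each one emits for the augmented assignment.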

My environment is an embedded CPython 3.10.8 and an embedded CPython 3.11.0, both running on Windows 10 (22H2) with an x86-64 processor (i5-9600KF).


Slower string concatenation in CPython 3.11 #99862

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

Slower string concatenation in CPython 3.11 #99862

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions