Slower string concatenation in CPython 3.11 #99862

@zephyr111

Hello,

We have found a regression between CPython 3.10.8 and CPython 3.11 that makes string concatenation in loops significantly slower on Windows 10. This is described in detail in this StackOverflow post.

Here is a minimal, reproducible example of benchmarking code:

import time
a = 'a'

start = time.time()
for _ in range(1000000):
    a += 'a'
end = time.time()

print(a[:5], (end-start) * 1000)

CPython 3.11.0 is about 100 times slower than CPython 3.10.8 on this example because its running time is quadratic, whereas CPython 3.10.8 runs in linear time.
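One way to check the linear versus quadratic claim is to time the same loop at several sizes and compare how the time grows when the size doubles. The following is only a sketch: the helper name `time_append`, the snippet string, and the chosen sizes are ours and purely illustrative. The loop is run through `exec` with a fresh namespace so that `a` stays a module-style name, as in the example above, rather than a function local.

```python
import time

# Same loop as in the report, compiled as module-level code.
SNIPPET = """
a = 'a'
for _ in range(n):
    a += 'a'
"""

def time_append(n):
    """Run the append loop n times; return (elapsed seconds, final length)."""
    namespace = {"n": n}  # fresh globals, so 'a' is a module-style name
    code = compile(SNIPPET, "<bench>", "exec")
    start = time.perf_counter()
    exec(code, namespace)
    elapsed = time.perf_counter() - start
    return elapsed, len(namespace["a"])

if __name__ == "__main__":
    previous = None
    for n in (25_000, 50_000, 100_000):  # illustrative sizes only
        elapsed, _ = time_append(n)
        line = f"n={n:>7}  {elapsed * 1000:9.2f} ms"
        if previous:
            # Doubling n: a ratio near 2 suggests linear time,
            # a ratio near 4 suggests quadratic time.
            line += f"  ratio={elapsed / previous:.2f}"
        print(line)
        previous = elapsed
```

The ratios are noisy on small inputs, so they should be read as a rough indication rather than a measurement.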

The analysis shows that CPython 3.10.8 generated an INPLACE_ADD instruction, so PyUnicode_Append was called at runtime, while CPython 3.11.0 now generates a BINARY_OP instruction, so PyUnicode_Concat is called instead. The latter function creates a new, bigger string on every iteration, which drastically reduces the performance of the string-appending loop in the provided code. This appears to be related to issue #89799. I think that if INPLACE_ADD is to be replaced with BINARY_OP, then CPython 3.11.0 is missing an optimization that checks the reference count of the string so that the operation can be done in place when possible. What do you think?
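For anyone who wants to confirm which instruction their interpreter generates, the statement can be disassembled with the standard `dis` module. This is a minimal sketch; the file name `<example>` is arbitrary, and the exact list of instructions varies between CPython versions.

```python
import dis

# Compile the augmented assignment as module-level code, as in the benchmark.
code = compile("a += 'a'", "<example>", "exec")

# Collect the instruction names generated for the statement.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# As described above, INPLACE_ADD is expected on CPython 3.10
# and BINARY_OP on CPython 3.11.
if "INPLACE_ADD" in opnames:
    print("in-place add instruction (INPLACE_ADD)")
elif "BINARY_OP" in opnames:
    print("generic binary instruction (BINARY_OP)")
```

Running this under both interpreters should show the difference in the generated bytecode directly, without any timing.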

My environment is an embedded CPython 3.10.8 and an embedded CPython 3.11.0, both running on Windows 10 (22H2) with an x86-64 processor (i5-9600KF).

Labels: performance (Performance or resource usage), type-bug (An unexpected behavior, bug, or error)
