gh-139871: Optimize bytearray unique bytes iconcat #141862
Conversation
If the bytearray is empty and a uniquely referenced bytes object is
being concatenated (e.g. one just received from `read`), just use its
storage as the backing for the bytearray rather than copying it.
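A quick sketch (my illustration, not code from the PR) of the observable invariant the fast path must preserve: after `ba += data`, later mutation of the bytearray must never be visible through any surviving reference to the original bytes object.

```python
# Invariant the optimization must preserve: adopting the bytes storage
# is only safe when no other reference to `data` can observe mutation.
ba = bytearray()            # empty: eligible for the proposed fast path
data = b'x' * 16
ba += data                  # `data` is still referenced here, so NOT unique
assert ba == data
ba[0] = ord('y')            # mutate the bytearray...
assert data[0] == ord('x')  # ...the original bytes must be untouched
```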
build_bytes_unique: Mean +- std dev: [base] 383 ns +- 11 ns -> [iconcat_opt] 342 ns +- 5 ns: 1.12x faster
build_bytearray: Mean +- std dev: [base] 496 ns +- 8 ns -> [iconcat_opt] 471 ns +- 13 ns: 1.05x faster
encode: Mean +- std dev: [base] 482 us +- 2 us -> [iconcat_opt] 13.8 us +- 0.1 us: 34.78x faster
Benchmark hidden because not significant (1): build_bytes
Geometric mean: 2.53x faster
note: Performance of build_bytes is expected to stay constant.
```python
import pyperf

runner = pyperf.Runner()

count1 = 1_000
count2 = 100
count3 = 10_000

CHUNK_A = b'a' * count1
CHUNK_B = b'b' * count2
CHUNK_C = b'c' * count3

def build_bytes():
    # Bytes not uniquely referenced.
    ba = bytearray()
    ba += CHUNK_A
    ba += CHUNK_B
    ba += CHUNK_C

def build_bytes_unique():
    ba = bytearray()
    # Repeating inline results in uniquely referenced bytes.
    ba += b'a' * count1
    ba += b'b' * count2
    ba += b'c' * count3

def build_bytearray():
    # Each bytearray appended is uniquely referenced.
    ba = bytearray()
    ba += bytearray(CHUNK_A)
    ba += bytearray(CHUNK_B)
    ba += bytearray(CHUNK_C)

runner.bench_func('build_bytes', build_bytes)
runner.bench_func('build_bytes_unique', build_bytes_unique)
runner.bench_func('build_bytearray', build_bytearray)

runner.timeit(
    name="encode",
    setup="a = 'a' * 1_000_000",
    stmt="bytearray(a, encoding='utf8')")
```
Objects/bytearrayobject.c (Outdated)
```c
// optimization: Avoid copying the bytes coming in when possible.
if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
    // note: ob_bytes_object is always the immortal empty bytes here.
```
Can you replace the comment with an assertion?
```c
// optimization: Avoid copying the bytes coming in when possible.
if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
    // note: ob_bytes_object is always the immortal empty bytes here.
    if (!_canresize(self)) {
```
Why not also run this check when `self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)` is false? That is, move it before the outer `if`?
For the general case it's currently checked by `bytearray_resize_lock_held`.
Checking here would technically change the order of operations in which an error can happen: in the main codepath the buffer protocol is always called first, and it can error before `_canresize` does. I'm not certain how important that ordering is to keep in this case. Preserving it would mean moving the `_canresize` check inside the type checks here.
Co-authored-by: Victor Stinner <vstinner@python.org>
```c
// optimization: Avoid copying the bytes coming in when possible.
if (self->ob_alloc == 0 && _PyObject_IsUniquelyReferenced(other)) {
    assert(_Py_IsImmortal(self->ob_bytes_object));
```
I was thinking of something like:

```diff
-assert(_Py_IsImmortal(self->ob_bytes_object));
+assert(self->ob_bytes_object == Py_GetConstantBorrow(Py_CONSTANT_EMPTY_BYTES));
```
Here's a test that should pass, but doesn't:

```c
// make some bytes
PyObject *bytes = PyBytes_FromString("aaB");
assert(bytes);
// make an empty bytearray
PyObject *ba = PyByteArray_FromStringAndSize("", 0);
assert(ba);
// append bytes to bytearray (in place, getting a new reference)
PyObject *new_ba = PySequence_InPlaceConcat(ba, bytes);
assert(new_ba == ba);
Py_DECREF(new_ba);
// pop from bytearray
Py_DECREF(PyObject_CallMethod(ba, "pop", ""));
// check that our bytes was not modified
assert(memcmp(PyBytes_AsString(bytes), "aaB", 3) == 0);
Py_DECREF(bytes);
Py_DECREF(ba);
```

AFAIK, you need to use …
```c
PyObject *taken = PyObject_CallMethodNoArgs(other,
                                            &_Py_ID(take_bytes));
```
This looks unsafe to me. If you call a method, you may invalidate the assumptions you verified earlier
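A hypothetical illustration of the hazard (not code from the PR; `take_bytes` here is a stand-in for any method dispatched on `other`): if `other` is a `bytes` subclass, the method call can run arbitrary Python code that stashes a new reference, so a uniqueness check performed before the call no longer holds afterwards.

```python
# Hypothetical sketch: a subclass override runs arbitrary code during
# the method call and creates a fresh reference to the object.
leaked = []

class SneakyBytes(bytes):
    def take_bytes(self):        # assumed method name, for illustration only
        leaked.append(self)      # new reference appears mid-call
        return bytes(self)

obj = SneakyBytes(b'abc')
# Suppose C code verified "uniquely referenced" just before this call...
obj.take_bytes()
# ...after the call, that assumption is stale: another reference exists.
assert leaked[0] is obj
```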
If the bytearray is empty and a uniquely referenced bytes object is being concatenated (e.g. one just received from `read`), just use its storage as the backing for the bytearray rather than copying it. The bigger the bytes object, the bigger the saving.
From my understanding of reference counting I think this is safe to do for
`iconcat` (and would be safe to do for `ba[:] = b'\0' * 1000`; see the discuss topic). The refcount briefly being 2 isn't ideal, but I think it's good enough for the performance delta. I'm hoping that if I can ship an implementation of gh-87613, I can do the same optimization for `bytearray(b'\0' * 4096)`. If the iconcat refcount-2 part isn't good, I can tweak it to keep the
`encode` + `bytearray` performance improvement without changing `iconcat` generally.

cc: @vstinner, @encukou
`.take_bytes([n])`: a zero-copy path from `bytearray` to `bytes` #139871