@jakelishman jakelishman commented Oct 24, 2025

This ensures the buffers used by the empty bytearray and array.array are aligned the same as a pointer returned by the allocator. This is a more convenient default for interop with other languages that have stricter requirements of type-safe buffers (e.g. Rust's &[T] type) even when empty.

I tried to do the same for bytes, but I think its default buffer is only forcibly aligned on an 8 because of the uint64_t member in PyBytesObject, and it ends up dependent on where bytes_empty gets laid out. If that's desirable too, I might need some help figuring out a strategy for it.

I'm not sure where's appropriate to put a test for this, or if it can/should be documented as reliable.

Issue: #140557

@python-cla-bot

python-cla-bot bot commented Oct 24, 2025

All commit authors signed the Contributor License Agreement.


Member

@serhiy-storchaka serhiy-storchaka left a comment


LGTM. 👍

@cmaloney
Contributor

cmaloney commented Oct 26, 2025

I'm 👎 on this for bytearray; I've been looking at making the empty bytearray point to an empty bytes object where this wouldn't hold true.

see: gh-139871 for ways having a bytes inside can make things faster (often fewer copies of data)

@cmaloney
Contributor

re: bytearray, it also supports a "fast" start-delete which moves the start offset (ob_start) inside the allocated space by an arbitrary count of bytes which, to me, implies that unaligned data access from other languages needs to be supported in accessing/manipulating its internal buffers/data.

@jakelishman
Author

jakelishman commented Oct 26, 2025

This doesn't enforce that every buffer always has aligned access - you can easily take out a view of a bytes or bytearray that you offset by a byte - so other languages have to still handle the case of an unaligned pointer. It just makes the default empty object aligned at zero cost, and that object is common.

The empty bytes internal pointer still ends up aligned on an 8, which would in practice still make the pointer aligned for most data types you might be casting to, so swapping to that would still be an improvement over the status quo.

@jakelishman
Author

We do still have to handle unaligned access everywhere, I agree there's no escaping it. The goal here is only around making unaligned objects less common - bytearray being aligned on a 1 (and actually turning up on a 1) is a lot more commonly visible in Python 3.14, now that pickle protocol 5 uses it in the data stream to represent PickleBuffer in band, so more libraries (e.g. Numpy) want a zero-copy view onto the recreated buffer.

@cmaloney
Contributor

I'm still concerned here:

Both array and bytearray externalize their item storage. The PyObject* that is the array object (or bytearray) is a separate allocation from the storage buffer for the elements. As an optimization, CPython doesn't allocate space if the length is 0, and these two values are used to handle a couple otherwise hard to handle return cases. The array one is only used in array_buffer_getbuf so that memoryview() / buffer protocol works on an array with no storage allocated itself. The bytearray one shows up in the C API PyByteArray_AS_STRING only when bytearray has no internal storage.

  1. These are just placeholders to fill corner cases which seem like they shouldn't be common in code (to use the returned pointer without doing out of bounds access you'd need to look at the len() / size).
  2. This adds a new CPython "guarantee" that it sounds like you want to depend on in code, but no test that would break if changes affect it. It should be possible to write one for alignment with https://docs.python.org/3/library/stdtypes.html#memoryview
  3. The default "buffer" for bytearray (and array.array) are both immutable and shouldn't really be indexed into, read from, or written to; they're 0 bytes long, code can't set / modify data in them, and there is no data to read from them.

@jakelishman
Author

I'm not aiming to make this a guarantee at all, just that by default the zero allocation case is already aligned. In Rust and other languages we do still have to handle unaligned buffers, just like Numpy already has to in pure C.

> The default "buffer" for bytearray (and array.array) are both immutable and shouldn't really be indexed into, read from, or written to; they're 0 bytes long, code can't set / modify data in them, and there is no data to read from them.

Right, this is where other languages have stronger guarantees - I mentioned it in gh-140557 as the motivation that in Rust, creating a &[T] primitive slice requires an aligned non-null pointer even if it's invalid for any reads. We can't convert every buffer to a slice with zero copies, so we already have alternative handling, but this patch makes the empty bytearray object go down a common path rather than a colder path by default. For similar fast-path alignment reasons, Numpy internally sets its "aligned" flag on empty buffers even if the pointer isn't actually aligned for the data type it purportedly points to, to avoid triggering copying code / extra handling in ufuncs that require alignment.

@jakelishman
Author

About rarity of appearance: both of these pointers show through the buffer protocol like you mentioned, and that's where I come across them in language interop - that's the defined way to get zero-copy access to a data buffer owned by Python space.

We can't have zero-copy access to misaligned buffers, but in practice, the vast majority of buffers that are backed by an allocation end up aligned anyway, so requiring copies is rare (since you have to deliberately offset an allocated pointer by a sub-unit amount). As an extra optimisation to avoid a copy, we can add special handling in Rust that, when the pointer is misaligned, produces an empty slice that doesn't refer to the pointer we actually get from the Python buffer protocol, so we don't require CPython support. But if the default is actually aligned, then there's less point propagating this hypothetical special-handling code through a lot of downstream packages; they can just use the slower "copy to force alignment" paths that (should) already exist.

When I wrote this patch, it was zero cost to CPython to achieve that for the defaults. If you have additional work that would seriously raise the cost to CPython then the calculus is different, though even reliably having the empty buffer aligned on an 8, like the empty bytes buffer is in practice, would be enough for the majority of cases I care about.

@cmaloney
Contributor

Can you link to the code or provide a rust sample which needs to special-case handling a zero-length bytearray? I think that would help me understand here.

@serhiy-storchaka
Member

It is not guaranteed that the start of the bytearray buffer has some alignment. This is a CPython implementation detail. But some code depends on this, and it may not work on non-x86 platforms if this is not aligned. There are exceptions: if there was a deletion from the beginning of the bytearray (we cannot do anything with this, but the user can guarantee that this did not happen), and when the bytearray is empty. The latter case is easy to fix for us, it costs nothing.

@jakelishman
Author

cmaloney: Let's say I've got FFI code that wraps the Py_buffer interface, first by making (not precise - I'm just including the illustrative stuff):

struct PyBuffer {
  buf: *mut (),
  itemsize: isize,
  ndim: ::std::ffi::c_int,
  shape: *const isize,
  strides: *const isize,
}
impl PyBuffer {
  /// Is the slice contiguous in memory?
  fn is_contiguous(&self) -> bool { /* ... */ }
  /// How many bytes can be read from it?
  fn len_bytes(&self) -> usize { /* ... */ }
}

Let's say I've got one of these structs that I then initialised with PyObject_GetBuffer, and now I want to expose a Rust-native slice view for a specific Rust type, which might not match the "native" type of the buffer, since I might have been given an array created from a bytearray object whose itemsize is 1 but actually represents storage of uint64_t (u64 in Rust). I return a Result here to signify to the caller that the function is fallible (see footnote 1):

enum SliceError {
  Unaligned,
  Noncontiguous,
  Nullptr,
}

fn slice_from_buffer<T>(buf: &PyBuffer) -> Result<&[T], SliceError> {
  if buf.buf.is_null() {
    return Err(SliceError::Nullptr);
  }
  if !buf.is_contiguous() {
    return Err(SliceError::Noncontiguous);
  }
  // Cast first: alignment has to be checked against `T`, not against `()`.
  let ptr = buf.buf.cast::<T>();
  if !ptr.is_aligned() {
    return Err(SliceError::Unaligned);
  }
  // SAFETY: pointer is non-null, aligned, and valid for this many contiguous reads:
  Ok(unsafe { std::slice::from_raw_parts(
    ptr,
    buf.len_bytes() / std::mem::size_of::<T>()
  ) })
}

I have to do the alignment and contiguous checks before I call std::slice::from_raw_parts, because it's undefined behaviour in Rust to create a slice backed by an unaligned or null pointer, even if the length is zero. (The reason is to enable specific type-niche optimisations in the compiler to save space in compound types that contain &[T].)
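To make the niche point concrete, here is a small sketch; the helper names are hypothetical and not taken from PyO3 or this PR. It only demonstrates the null-pointer niche: because a `&[T]` can never hold a null data pointer, `Option<&[T]>` can reuse that bit pattern for `None` and needs no extra space. It does not show whether the compiler also exploits alignment.

```rust
use std::mem::size_of;

/// Hypothetical helper: true when wrapping `&[T]` in `Option` costs no extra
/// space, which relies on a null data pointer never being a valid slice.
fn option_slice_is_free<T>() -> bool {
    size_of::<Option<&[T]>>() == size_of::<&[T]>()
}

/// Hypothetical helper: size in bytes of a slice reference
/// (a data pointer plus a length).
fn slice_ref_size<T>() -> usize {
    size_of::<&[T]>()
}
```

Calling `option_slice_is_free::<u64>()` should return `true` on current compilers, which is the space saving that the validity rules for `&[T]` pay for.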

A Rust caller of this function is still responsible for handling the case of an unaligned pointer, which might cause them to do something like

match slice_from_buffer::<u64>(&buf) {
  Ok(slice) => use_slice(slice),
  Err(SliceError::Noncontiguous | SliceError::Unaligned) => {
    let aligned_contiguous = /* copy buffer to somewhere aligned */;
    use_slice(aligned_contiguous.as_slice())
  }
  Err(SliceError::Nullptr) => panic!(),
}
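The "copy buffer to somewhere aligned" placeholder above could look roughly like the following. This is a minimal sketch, assuming the data is already available as a contiguous byte slice and that the target type is `u64`; `copy_to_aligned_u64` is a hypothetical name, not an existing PyO3 or rust-numpy function.

```rust
use std::mem::size_of;

/// Hypothetical fallback: copy raw bytes into a freshly allocated `Vec<u64>`,
/// whose storage is always suitably aligned for `u64`.
/// Trailing bytes that do not fill a whole `u64` are ignored.
fn copy_to_aligned_u64(bytes: &[u8]) -> Vec<u64> {
    bytes
        .chunks_exact(size_of::<u64>())
        .map(|chunk| {
            // Copy each 8-byte chunk into a local array so the source
            // pointer's alignment never matters.
            let mut raw = [0u8; 8];
            raw.copy_from_slice(chunk);
            u64::from_ne_bytes(raw)
        })
        .collect()
}
```

The result can be passed on as `copy_to_aligned_u64(bytes).as_slice()`, at the cost of one allocation and one copy, which is the cost the happy path avoids.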

The idea of this PR is just to make it so that slightly more stuff by default gets to go down the happy path, in a way that doesn't cost CPython anything. It's still the Rust user's responsibility to handle the unhappy path since that's totally valid Python code still, and the Rust library's responsibility to make sure everything is safe for FFI use in Rust (see footnote 2). This PR isn't intending to add any restrictions on what CPython is allowed to do or what other Python implementations may do.


Your #139871 looks to me to also achieve the same goal I was going for here, just as a side effect (since the empty bytes buffer happens to be 8-byte aligned in CPython), so if that merged, it'd improve the bytearray situation as well. This PR is slightly stronger for bytearray, but that's largely immaterial (it most likely only affects SIMD types).

Footnotes

  1. this code is usually in Python interface libraries like PyO3, whose implementation is here, but I wrote the example manually because PyO3's has additional complications and spreads the required checks into a few separate places. Also, the same sort of code comes up in lots of specialised places too, like rust-numpy that provides FFI access to Numpy arrays, etc.

  2. there are safe Rust-side optimisations that can be done around empty slices (which I'm planning to contribute in several Rust projects) such as creating a slice out of a dangling pointer magicked from thin air, which helps these cases as well.
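The dangling-pointer idea in footnote 2 can be sketched as follows. This is an illustration under stated assumptions, not the code planned for any particular project: `empty_slice` and `view_u64` are hypothetical names, and the input is taken as a plain `&[u8]` rather than a `Py_buffer`.

```rust
use std::mem::{align_of, size_of};
use std::ptr::NonNull;

/// Hypothetical helper: build an empty `&[T]` without using the pointer
/// supplied by the buffer at all.
fn empty_slice<'a, T>() -> &'a [T] {
    let ptr = NonNull::<T>::dangling().as_ptr();
    // SAFETY: `dangling()` is non-null and aligned for `T`, and a
    // zero-length slice never reads through the pointer.
    unsafe { std::slice::from_raw_parts(ptr, 0) }
}

/// Hypothetical helper: view `bytes` as `&[u64]` without copying.
/// Returns `None` when a copy would be needed.
fn view_u64(bytes: &[u8]) -> Option<&[u64]> {
    let len = bytes.len() / size_of::<u64>();
    if len == 0 {
        // No elements: the alignment of the original pointer is irrelevant.
        return Some(empty_slice());
    }
    let ptr = bytes.as_ptr();
    if (ptr as usize) % align_of::<u64>() != 0 {
        return None;
    }
    // SAFETY: the pointer comes from a reference (non-null), was checked for
    // alignment, covers `len * 8 <= bytes.len()` readable bytes, and every
    // bit pattern is a valid `u64`.
    Some(unsafe { std::slice::from_raw_parts(ptr.cast::<u64>(), len) })
}
```

With this shape, an empty buffer takes the zero-copy path whatever its pointer is, while a non-empty misaligned buffer still reports that a copy is required.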

@cmaloney
Contributor

cmaloney commented Oct 27, 2025

I'm 👍 for doing array here as it supports larger than 1-byte element objects. Does still need a test so if the code is broken/removed accidentally CPython devs will notice. That should be implementable with https://docs.python.org/3/library/ctypes.html#ctypes.alignment.

bytearray should be covered by my CPython internals change which will help your case. For bytes-like and slicing in general across rust and other languages I found https://davidben.net/2024/01/15/empty-slices.html really helpful to improve my knowledge. I don't think it makes sense for CPython to adopt the rust semantic for its "bytes-like" generally; and "buffer protocol" / memoryview across the ecosystem is based on C "bytes" for better and worse.

Contributor

@cmaloney cmaloney left a comment


Reduce scope to just array.array + add a test; empty buffer + non-empty array.array storage being aligned I think is a nice improvement

@serhiy-storchaka
Member

I do not think that bytearray should be excluded. If not for the special optimization of using NULL for an empty array, it would have standard alignment (as returned by malloc()). The fact that it can currently have worse alignment is the result of an optimization. An optimization should not cause a regression.

@cmaloney
Contributor

re: bytearray gh-140557 / GH-140128 makes it aligned as a side effect because bytes is aligned (and removes usage of _PyByteArray_empty_string; ob_start is always set)

@serhiy-storchaka
Member

Then I do not see reasons to object to this change.

@cmaloney
Contributor

  1. I don't like having implementation details relied on by external code/systems without a test / validation. To me that creates a potential future release blocker if the assertion gets broken. I don't see adding a test here as a really onerous requirement.
  2. Aligning _PyByteArray_empty_string is going to be a no-op / just waste space shortly, so there isn't a lot of value in doing it in this PR, as this will just show up in 3.15

I have a slight nit on the wording of the NEWS entry; I think it would be good to mention Rust and to make it more concise.

@serhiy-storchaka
Member

I think there is a good chance that code that relies on this already exists. It works only because the Intel platform is tolerant of unaligned access (and there is no actual access to memory here, because it is an empty buffer). But on other platforms pointers of different types can use different registers.

This is not limited to Rust. In C, casting a pointer with the wrong alignment can be undefined behavior. If bytearray is used instead of array as a collection of 16-, 32- or 64-bit integers, there may be problems.

@jakelishman
Author

Hi both - I'd been busy at work the last few days and not had a chance to check in, sorry.

Given the above comments: I've added some tests of the new behaviour (open to suggestions if I've missed some API that'd make them simpler), and I had a go at shortening the NEWS entry and stressing that it's just about the empty default, not a guarantee for all buffers. Happy to change anything more if there's consensus, or to pause this and take it to Discourse if needs be.

I'm definitely not trying to make all buffers follow Rust semantics - that's never possible anyway, given that we can always do memoryview(b"01234567")[1:] to get a buffer backed by a pointer that'd be unaligned for anything other than a byte, let alone whatever downstream implementers of the buffer protocol do. I care about bytearray because it appears more now because of the default handling of PickleBuffer.
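The effect of offsetting a view by one byte can be shown from the Rust side with a small sketch. Everything here is illustrative: the `Aligned` wrapper stands in for an allocator-returned buffer, and `is_aligned_for` and `offset_view_alignment` are hypothetical helpers.

```rust
use std::mem::align_of;

/// Backing storage forced to 8-byte alignment, standing in for a buffer
/// returned by an allocator.
#[repr(align(8))]
struct Aligned([u8; 8]);

/// Hypothetical helper: could `ptr` be reinterpreted as a pointer to `T`?
fn is_aligned_for<T>(ptr: *const u8) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

/// Reports whether the whole buffer, and the same buffer offset by one byte,
/// are aligned for `u64`.
fn offset_view_alignment() -> (bool, bool) {
    let storage = Aligned(*b"01234567");
    let whole = &storage.0[..];
    // Analogous to `memoryview(b"01234567")[1:]` on the Python side.
    let offset = &storage.0[1..];
    (
        is_aligned_for::<u64>(whole.as_ptr()),
        is_aligned_for::<u64>(offset.as_ptr()),
    )
}
```

Here the start of the storage is aligned for `u64` and the offset view is not, so any consumer that wants a `&[u64]` from the offset view has to take the copying path.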

@cmaloney
Contributor

I think we're all moving towards consensus and actually really near it :). Will do a more thorough review later today

return _linked_to_musl


try:
Contributor

@cmaloney cmaloney Oct 31, 2025


I think this will be easier to do in Lib/test/test_capi/test_misc.py. In particular only need to

  1. Add a new _testcapimodule.c entry point that makes a Py_buffer C API of the PyObject passed to it, gets the pointer and turns that pointer into a PyObject * (https://docs.python.org/3/c-api/long.html#c.PyLong_FromVoidPtr) which it returns
  2. Add one alignment test across the range of types and constructions we care about.

I like how your existing test iterates through / tests all the different array typecodes.

I think it would be good to extend the test to both test empty (the size that caused this bug) + non-empty arrays (they should also be aligned)

Contributor


Actually found a better test file for the alignment pieces to live in: Lib/test/test_buffer.py; still should implement "get the pointer" in _testcapimodule.c
