Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 574: update #883

Merged
merged 2 commits into from Jan 27, 2019

Conversation

Projects
None yet
4 participants
@pitrou
Copy link
Member

commented Jan 23, 2019

Make it a requirement that buffers are contiguous. Efficient tooling for non-contiguous buffers is not available from CPython currently, and consumers shouldn't be required to implement their own handling.

Also add a bit of trivia about an undocumented ZODB-specific cPickle hook.

@pitrou

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2019

cc @ncoghlan

Also cc @skrah about the rudimentary support for non-contiguous buffers (including the slow copy to contiguous).

@ncoghlan ncoghlan merged commit 5bf886e into python:master Jan 27, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@pitrou

This comment has been minimized.

Copy link
Member Author

commented Jan 28, 2019

Actually, it seems I may have to make the contiguity requirement a bit stronger and mandate C-contiguous buffers. The reason is that, in pure Python, there doesn't seem to be any way to read the byte contents of a Fortran-contiguous buffer in physical memory order. Consider this:

>>> array = _testbuffer.ndarray(list(range(6)), format="B", shape=(3, 2), strides=(1, 3))
>>> array.tolist()
[[0, 3], [1, 4], [2, 5]]
>>> array.c_contiguous
False
>>> array.f_contiguous
True
>>> m = memoryview(array)
>>> m.tobytes()
b'\x00\x03\x01\x04\x02\x05'
>>> bytes(m)
b'\x00\x03\x01\x04\x02\x05'

Or with Numpy:

>>> a = np.arange(12, dtype='int8').reshape((3,4))                                                                                                              
>>> bytes(a)                                                                                                                                                    
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
>>> bytes(a.T)                                                                                                                                                  
b'\x00\x04\x08\x01\x05\t\x02\x06\n\x03\x07\x0b'
>>> memoryview(a).tobytes()                                                                                                                                     
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b'
>>> memoryview(a.T).tobytes()                                                                                                                                   
b'\x00\x04\x08\x01\x05\t\x02\x06\n\x03\x07\x0b'

... the logical transposition affects the bytes copy, even though the underlying memory contents are identical.

This is not a problem in C, which has naturally access to the memory pointed to by a Py_buffer, but the pure Python pickle implementation does not look like it will be able to deal with Fortran-contiguous buffers. Perhaps we'll be later able to relax the restriction, if we add some appropriate API to memoryview.

It's a bit unfortunate. @skrah do you see a way of achieving this?

@ncoghlan

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

Hmm, I thought memoryview(original).cast('B') could handle that, but reading https://docs.python.org/3/library/stdtypes.html#memoryview.cast makes me suspect I never tried it with a multi-dimensional input (since the docs are quite explicit about only handling 1D views).

@skrah

This comment has been minimized.

Copy link
Member

commented Jan 29, 2019

@pitrou

This comment has been minimized.

Copy link
Member Author

commented Jan 29, 2019

But not Fortran-contiguous data, which is the issue here:

>>> m = memoryview(x.T)                                                                                                                               
>>> m.cast('B').tobytes()                                                                                                                             
Traceback (most recent call last):
  File "<ipython-input-6-80a67b12e997>", line 1, in <module>
    m.cast('B').tobytes()
TypeError: memoryview: casts are restricted to C-contiguous views
@ncoghlan

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2019

D'oh :(

I don't believe there was any philosophical objection behind that omission, though - just a lack of a concrete use case to justify the extra complexity in the code.

It does raise a question for the pickle 5 spec though: even if the memoryview limitation is resolved in Python 3.8, would the pickle5 backport library be able to unpickle Fortran-contiguous data on older versions?

@pitrou

This comment has been minimized.

Copy link
Member Author

commented Jan 29, 2019

even if the memoryview limitation is resolved in Python 3.8, would the pickle5 backport library be able to unpickle Fortran-contiguous data on older versions?

Unpickling will depend on the __reduce_ex__ implementation rather than on the Unpickler implementation, AFAICT. However, I would have to test it to make sure...

@pitrou

This comment has been minimized.

Copy link
Member Author

commented Feb 2, 2019

Hmm... Thinking about this again, I could add the required API to PickleBuffer so that it doesn't have to depend on some Python 3.8 memoryview improvements. I think the implementation should be reasonably simple.

@ncoghlan

This comment has been minimized.

Copy link
Contributor

commented Feb 2, 2019

Having it work in the pickle5 API backport would definitely be desirable. That said, adding support for F-contiguous data in memoryview for 3.8+ would likely still be desirable for testing purposes at that point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.