memoryviews and ctypes #60148
Comments
I've been playing with the interaction of ctypes and memoryviews and am curious about intended behavior. Consider the following:
>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m.ndim
0
>>> m.shape
()
>>> m.readonly
False
>>> m.itemsize
8
As you can see, you have a memory view for the ctypes double object. However, the fact that it has zero dimensions and an empty shape seems to cause all sorts of weird behavior. For instance, indexing and slicing don't work:
>>> m[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
>>> m[:]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: invalid indexing of 0-dim memory
As such, you can't really seem to do anything interesting with the resulting memory view. For example, you can't pull data out of it. Nor can you overwrite the contents (i.e., replacing the contents with an 8-byte byte string). Attempting to cast the memory view to something else doesn't work either.
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m2 = m.cast('c')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
I must be missing something really obvious here. Is there no way to get access to the memory behind a ctypes object? |
You can still read the underlying representation:
>>> d = ctypes.c_double(0.6)
>>> m = memoryview(d)
>>> bytes(m)
b'333333\xe3?'
>>> d.value = 0.7
>>> bytes(m)
b'ffffff\xe6?' |
I don't want to read the representation by copying it into a bytes object. I want direct access to the underlying memory--including the ability to modify it. As it stands now, it's completely useless. |
0-dim memory is indexed by x[()]. The ctypes example has an additional
Only native single character formats in struct module syntax are
To demonstrate 0-dim indexing, here's an example using _testbuffer:
>>> from _testbuffer import ndarray, ND_WRITABLE
>>> x = ndarray(3.14, shape=[], format='d', flags=ND_WRITABLE)
>>> x[()]
3.14
>>> tau = 6.28
>>> x[()] = tau
>>> x[()]
6.28
>>> m = memoryview(x)
>>> m[()]
6.28
>>> m[()] = 100.111
>>> m[()]
100.111 |
BTW, if c_double means "native machine double", then ctypes should |
Even with the <d format, I'm not sure why it can't be cast to simple byte-view. None of that seems to work at all. |
The decision was made in order to be able to cast back and forth between
Python 3.4 will have support for all formats in struct module syntax,
You can still pack/unpack directly using the struct module:
>>> import ctypes, struct
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> struct.pack_into(m.format, m, 0, 22.7)
>>> struct.unpack_from(m.format, m, 0)[0]
22.7 |
I don't think memoryviews should be imposing any casting restrictions at all. It's low level. Get out of the way. |
So you want to be able to segfault the core interpreter using the |
No, I want to be able to access the raw bytes sitting behind a memoryview as bytes without all of this casting and reinterpretation. Just show me the raw bytes. Not doubles, not ints, not structure packing, not copying into byte strings, or whatever. Is this really impossible? It sure seems so. |
Just to be specific, why is something like this not possible?
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[0:8] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
(Doesn't have to be exactly like this, but what's wrong with overwriting bytes with bytes of a compatible size?) |
I should add that 0-dim indexing doesn't work as described either:
>>> import ctypes
>>> d = ctypes.c_double()
>>> m = memoryview(d)
>>> m[()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: unsupported format <d
>>> |
Please read msg170482. It even contains a copy and paste example! |
As I understand it, you prefer memoryviews where the format is
Typed memoryviews are certainly useful, in fact they are
http://docs.cython.org/src/userguide/memoryviews.html
I can see only one obvious benefit of ignoring the format: All possible
m[0] = b'\x00\x00\x00\x01'
... should be preferable to:
m[0] = 1
If you think that typed memoryviews are a mistake, I suggest raising |
There's probably a bigger discussion about memoryviews for a rainy day. However, the number one thing that would save all of this in my book would be to make sure cast('B') is universally supported regardless of format, including endianness, especially in the standard library. For example, being able to do this:
>>> import array
>>> a = array.array('d', [1.0, 2.0, 3.0, 4.0])
>>> m = memoryview(a).cast('B')
>>> m[0:4] = b'\x00\x01\x02\x03'
>>> a
array('d', [1.0000000112050316, 2.0, 3.0, 4.0])
Right now, it doesn't work for ctypes. For example:
>>> import ctypes
>>> a = (ctypes.c_double * 4)(1,2,3,4)
>>> a
<__main__.c_double_Array_4 object at 0x1006a7cb0>
>>> m = memoryview(a).cast('B')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: memoryview: source format must be a native single character format prefixed with an optional '@'
As some background, being able to work with a "byte" view of memory is important for a lot of problems involving I/O and data interchange, where being able to accurately construct/deconstruct the underlying memory buffers is more useful than actually interpreting their contents. |
One followup note: I think it's fine to punt on cast('B') if the memoryview is non-contiguous. That's probably a rare case. |
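As a rough illustration of the contiguity caveat, here is a minimal sketch, assuming a current CPython 3 release. The variable names are illustrative and not from the thread. It shows that a strided (non-contiguous) view is refused by cast(), while a contiguous one is accepted.

```python
# Sketch only: names below are illustrative, not taken from the thread.
buf = bytearray(range(8))

contiguous = memoryview(buf)      # 1-D view over all 8 bytes, C-contiguous
strided = contiguous[::2]         # every second byte, so not C-contiguous

byte_view = contiguous.cast('B')  # casting a contiguous view is accepted

try:
    strided.cast('B')
    cast_failed = False
except TypeError:
    # memoryview.cast() rejects views that are not C-contiguous
    cast_failed = True
```

The c_contiguous attribute of a memoryview can be checked up front if an exception is not wanted.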
We could add a flag memoryview(x, raw=True) to the constructor. This view
So you could do assignments like:
m[10] = b'\x00\x00\x00\x01'
This would be more flexible in general since memoryview currently only supports
I think the feature would not add much additional complexity to the code.
The question is: Is this a general need? Are many people using memoryviews |
You don't need |
In my experience, I tend to only use memoryview() for “bytes-like” buffers (but see bpo-23756 about clarifying what this means). Example from /Lib/_compression.py:67:
def readinto(self, b):
    with memoryview(b) as view, view.cast("B") as byte_view:
        data = self.read(len(byte_view))
        byte_view[:len(data)] = data
    return len(data)
Fixing cast("B") or adding a memoryview(raw=True) mode could probably help when all you want is a byte buffer. |
A functional memoryview for ctypes objects would avoid having to use workarounds, such as the following:
>>> d = ctypes.c_double()
>>> b = (ctypes.c_char * ctypes.sizeof(d)).from_buffer(d)
>>> b[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
or using numpy.frombuffer as a bridge:
>>> import numpy
>>> d = ctypes.c_double()
>>> m = memoryview(numpy.frombuffer(d, 'B'))
>>> m[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
David's request that cast('B') should be made to work for all contiguous buffers seems reasonable. That said, the ctypes format strings also need fixing. Let's see what happens when "@d" is used instead of "<d":
>>> double_stgdict = stgdict(ctypes.c_double)
>>> double_stgdict
dict:
ob_base:
ob_refcnt: 1
ob_type: py_object(<class 'StgDict'>)
ma_used: 7
ma_keys: LP_PyDictKeysObject(0x1aa5750)
ma_values: LP_LP_PyObject(<NULL>)
size: 8
align: 8
length: 0
ffi_type_pointer:
size: 8
alignment: 8
type: 3
elements: <NULL>
proto: py_object('d')
setfunc: SETFUNC(0x7f9f9b6e3e60)
getfunc: GETFUNC(0x7f9f9b6e3d90)
paramfunc: PARAMFUNC(0x7f9f9b6e31d0)
argtypes: py_object(<NULL>)
converters: py_object(<NULL>)
restype: py_object(<NULL>)
checker: py_object(<NULL>)
flags: 4096
format: b'<d'
ndim: 0
shape: LP_c_long(<NULL>)
>>> d = ctypes.c_double(3.14)
>>> m = memoryview(d)
>>> m[()]
3.14
>>> m[()] = 6.28
>>> d.value
6.28
>>> m = m.cast('B')
>>> m[:] = b'abcdefgh'
>>> d.value
8.540883223036124e+194
This shows that changing the format string (set by PyCSimpleType_new in _ctypes.c) to use "@" makes the memoryview work normally. OTOH, the swapped type (e.g. c_double.__ctype_be__) would need to continue to use a standard little-endian ("<") or big-endian (">") format. |
Yuriy: cast() does not do this. What's requested is that e.g. a
Thus, you'd be able to do:
m[0] = b'\x00\x00\x00\x01'
This has other implications, for example, two NaNs would compare |
Here is a patch that allows any “C-contiguous” memoryview() to be cast to a byte view. Apart from the test that was explicitly checking that this wasn’t supported, the rest of the test suite still passes. I basically removed the check that was generating the “source format must be a native single character” error. If two NANs are represented by the same byte sequence, I would expect their byte views to compare equal, which is the case with my patch. |
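The NaN point can be sketched as follows, assuming Python 3.5 or later and using array.array as a stand-in buffer (the thread itself does not give this example). A typed view compares items as doubles, so NaN never equals NaN, whereas byte views of identical bit patterns compare equal.

```python
from array import array

# Sketch only: array.array is an illustrative choice of buffer.
nan = float('nan')
a = array('d', [nan])
b = array('d', [nan])   # same bit pattern as `a`, separate memory

# Typed views compare element values, and NaN != NaN as a double.
typed_equal = memoryview(a) == memoryview(b)

# Byte views compare the raw bytes, which are identical here.
bytes_equal = memoryview(a).cast('B') == memoryview(b).cast('B')
```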
The question is whether we want this behavior. |
Assuming bpo-23756 is resolved and various standard library functions are meant to work with any C-contiguous buffer, then it makes sense to me for memoryview.cast("B") to work for any C-contiguous buffer. I also got the impression that David, Yuriy, and Eryksun all support this. I don’t understand why you wouldn’t want this behaviour. It seems pointless just to maintain symmetry with being unable to cast back to “<d”. And casting from e.g. floating point to bytes to integers already disregards the original data type, so casting from unsupported types to bytes should be no worse. |
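The remark about casting from floating point to bytes to integers can be illustrated with a short sketch. It assumes a native 8-byte 'Q' format and uses array.array as an illustrative buffer; neither detail comes from the thread.

```python
from array import array
import struct

# Sketch only: reinterpret a double as an unsigned 64-bit integer
# by going through a byte view, which discards the original item type.
a = array('d', [1.0])
as_bytes = memoryview(a).cast('B')   # double -> raw bytes
as_uint = as_bytes.cast('Q')         # raw bytes -> native unsigned 64-bit
bits = as_uint[0]

# The same reinterpretation done with the struct module, for comparison.
expected = struct.unpack('Q', struct.pack('d', 1.0))[0]
```

The detour through 'B' is needed because cast() only converts between a byte format and another format, not directly between two non-byte formats.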
The proposal sounds reasonable to me. |
If people are content with writing m[124:128] = b'abcd' and accept
On the bright side, it is less work.
--
I'll review the patch. |
On 07/08/2015 14:57, Stefan Krah wrote:
As long as the casting has to be explicit, this sounds ok to me. |
Ok, shall we sneak this past Larry for 3.5? |
Why not :) |
New changeset e33f2b8b937f by Stefan Krah in branch '3.5':
New changeset c7c4b8411037 by Stefan Krah in branch 'default':
Done. Thanks for the patch. |
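For readers landing here later, a minimal sketch of what the change enables, assuming Python 3.5 or later (where cast('B') no longer restricts the source format). The values written are arbitrary and only for illustration.

```python
import ctypes
import struct

# Sketch only: byte-level access to ctypes objects through memoryview.
d = ctypes.c_double(3.14)
raw = memoryview(d).cast('B')      # 0-dim ctypes view -> 1-D byte view

before = bytes(raw)                # copy of the current 8 bytes
raw[:] = struct.pack('d', 1.5)     # overwrite in place, native byte layout
after = d.value                    # the ctypes object sees the new bytes

# The same cast works for a ctypes array of doubles.
arr = (ctypes.c_double * 4)(1, 2, 3, 4)
arr_bytes = memoryview(arr).cast('B')
```

No copy is made by the cast, so writes through raw are visible through d immediately.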