Buffer Protocol Support for Custom DType Array #18442
Comments
Just so you are aware of it, we/I are in the process of revamping the dtype API, which will lead to a new API (and eventually force you to move).
I am aware of this limitation, but have treated it as a backburner, partially because I am not sure how often it actually makes sense to export it! For example, given quaternion you can use
But, we definitely can move the creation of the buffer format string to be accessible to user DTypes in the future (I will only aim for after the new API, which is hopefully not far off now).
Another thing that should work (but I am not immediately sure how to write in cython), is to not use
Just curious, but can you say what you are working on? I am trying to see what kind of DTypes will benefit from all my changes (and keep in mind what kind of holes in the API might crop up for them). |
Forgot to include a link to the new DType NEPs (42 might be more interesting, but this is the entry point): https://numpy.org/neps/nep-0041-improved-dtype-support.html |
I am working on a posit datatype extension for NumPy. In case you are unaware of it, posit is an alternative to floating-point arithmetic (see Beating Floating Point at its Own Game) which provides higher accuracy and seems to be really useful in areas such as numerical analysis or Deep Learning (see https://github.com/RaulMurillo/deep-pensieve).
Currently, few libraries support this format (only in software emulation), most of them in C/C++. However, none of them provides an interface for array handling as NumPy does, and other characteristics of the library such as broadcasting and ufuncs could help in the development of applications and of the format itself, IMO.
I referred to quaternion because it is a well-known extension package, and because this posit package is still WIP and not publicly available (it will be in the near future; I will link it here when it is).
I supposed your solution of using
A similar problem arises with posits. The posit types in this package are implemented in C language as a
I think the API changes you mention could be quite beneficial for this package. During its development, I ran into many of the issues mentioned in the User Impact section of NEP 41. |
Cool, nice to see interest, and this type of developments! What I am not certain is how the buffer protocol can even be used for this reasonably. A way like Eric suggested in the So I would think you will always need some |
I tested generating a view of the array with that method:
dt = arr.dtype
new_dt = np.dtype([
('__numpy_value', (np.void, dt.itemsize)),
('__numpy_dtype', [
(str(dt), [])
])
])
arr.view(new_dt)
A memoryview can be created from this array, but then it is not possible to assign it to a typed memoryview:
cdef posit16[:, :] arr_view = arr.view(new_dt)
ValueError: Buffer dtype mismatch, expected 'posit16' but got end
Another problem is that this cannot solve intrinsic calls to
def foo(posit16[:, :] array_1):
...
ValueError: cannot include dtype 'k' in a buffer
Here, it tries to create a
Since I implemented the buffer protocol in the simple DType (I can call numpy/numpy/core/src/multiarray/scalartypes.c.src Line 2449 in 740a8cf
|
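One workaround in the spirit of the view trick above is to hand the consumer a same-width unsigned integer view and reinterpret the bits on the other side. The following is only a minimal sketch: `np.float16` stands in for a 16-bit custom dtype such as posit16, and `flip_sign_bits` is a hypothetical consumer, not part of NumPy or of the posit package.

```python
import numpy as np

# Assumption: float16 is only a stand-in for a 16-bit custom dtype
# such as posit16, which is not available here.
values = np.array([[1.0, 2.5], [-0.5, 4.0]], dtype=np.float16)


def flip_sign_bits(bits):
    """Hypothetical consumer that accepts only uint16 buffers."""
    buf = memoryview(bits)                # exported with the plain 'H' format
    out = np.array(buf, dtype=np.uint16)  # copy with the same 2-D shape
    out ^= np.uint16(0x8000)              # toggle the top (sign) bit
    return out


# Reinterpret the 16-bit elements as raw unsigned integers (no copy).
bits = values.view(np.uint16)

# Reinterpret the result as the original dtype on the way back.
negated = flip_sign_bits(bits).view(np.float16)
```

In Cython, the analogous step would be to declare the typed memoryview with a 16-bit unsigned integer element type and pass `arr.view(np.uint16)`, at the cost of losing the element type inside the function.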
By the way, I forgot to mention that I made the posit types defined in Python (C-API level) available to Cython as an extern extension type as shown in https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html?highlight=complex#external-extension-types |
Well, The only problem really is the "format", and we could very much allow your dtype to effectively have a method My current guess is that the path of least resistance may be to really define such a special "format" string for your own dtype (NumPy could provide a guideline). And then teach Cython that |
I did some tests on the matter. By slightly modifying the numpy/numpy/core/src/multiarray/buffer.c Lines 418 to 422 in 0eb9f54
default:
    if (descr->type == 'k') {
        if (_append_char(str, 'k') < 0) return -1;
    }
    else {
        PyErr_Format(PyExc_ValueError,
                     "cannot include dtype '%c' in a buffer",
                     descr->type);
        return -1;
    }
(here, 'k' is the descriptor type of the custom dtype) it is possible to generate a
>>> import numpy as np
>>> import posit
>>> arr = np.arange(5).astype(np.posit16)
>>> arr_view = memoryview(arr)
>>> arr_view
<memory at 0x7efff724ca00>
>>> arr_view.format
'k'
I guess this method is simpler and could be implemented for custom DTypes, assigning the corresponding descriptor type to the format of the view. However, neither this method nor Eric's allows accessing the internal data of an object from a
>>> arr_view[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: memoryview: format k not supported
The same issue also occurs with, for example,
Finally, regarding Cython, using posits (or any other custom DType) with this compiler has the same problems as using half-precision floats (see https://stackoverflow.com/questions/47421443/using-half-precision-numpy-floats-in-cython). |
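The indexing limitation above comes from Python's `memoryview`, which can only unpack a small set of native format codes. A minimal sketch of the behaviour, using a structured dtype as a stand-in for a custom format (an assumption, since the patched NumPy and the posit package are not available here), together with the byte-cast fallback that still gives access to the raw data:

```python
import numpy as np

# Assumption: a structured dtype plays the role of a format string that
# memoryview cannot unpack (like the custom 'k' format above).
records = np.array([(1, 2.0), (3, 4.0)], dtype=[("a", "<i4"), ("b", "<f8")])
view = memoryview(records)

try:
    view[0]
    index_error = None
except (NotImplementedError, ValueError, TypeError) as exc:
    # Item access fails because the format is not a native single-character code.
    index_error = exc

# Casting to unsigned bytes is allowed for any C-contiguous source format.
raw = view.cast("B")

# The consumer must know the real element layout to decode the bytes again.
restored = np.frombuffer(raw, dtype=records.dtype)
```

The byte view carries no type information, so this only helps when producer and consumer agree on the element layout out of band.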
@RaulMurillo sure, NumPy could allow you put whatever you like into the exported buffer One issue with it is that your So, before we make it convenient to create such clashes, it might be useful to have some loose formalism about it. Maybe that is just introducing a new character code that signals that the |
@seberg, completely agree that checking for
I think the question was answered, so I will close the issue. Please re-open if there is a need. |
Feature
I am defining a Python extension type and adding built-in support for it to NumPy through an extension module that allows creating ndarrays of this custom dtype. Its implementation is similar to that of the quaternion package (so I will use that type for illustration).
This new data type also implements the “buffer protocol”, so applying memoryview to an object of this type gives the expected output. However, if an array of objects of this type is created, applying memoryview raises a ValueError because this type is not supported by NumPy's buffer protocol implementation.
Is there any way to obtain the memory view of a NumPy array containing custom extension types?
Similar issue #4983 and PR #15309 expose this problem for the datetime dtype, and the solution proposed in #4983 (comment) seems to be in the right direction.
However, it would be desirable to support the buffer protocol for any specialized types that themselves implement this protocol (in the same manner as NumPy scalar types do). Finally, my purpose is to integrate this type in a Cython module to speed up the code, but the typed memoryviews used in this language require NumPy array buffer support when passing array arguments to Cython functions, so in this case the solution from #4983 (comment) is not an option either.
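For illustration, the datetime case mentioned above can be sketched as follows. This is a minimal example under stated assumptions, not the code from #4983: `datetime64` is used as a built-in dtype that NumPy declines to export, and the same memory is reinterpreted as 64-bit integers so that a buffer can be obtained.

```python
import numpy as np

# datetime64 serves as a built-in example of a dtype that NumPy does not
# export through the buffer protocol on the versions discussed here.
dates = np.array(["2021-01-01", "2021-01-02"], dtype="datetime64[D]")

try:
    memoryview(dates)
    export_error = None
except (ValueError, BufferError) as exc:
    # Expected: the dtype cannot be included in a buffer.
    export_error = exc

# Reinterpret the same memory as 64-bit integers (no copy is made).
as_int = dates.view(np.int64)
int_view = memoryview(as_int)

# The consumer can restore the original dtype if it knows it.
restored = np.asarray(int_view).view(dates.dtype)
```

The integer view loses the dtype information, which is why this approach does not help a typed consumer that expects the custom element type.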
Reproducing code example:
The buffer protocol is implemented in the quaternion class/type, as described here and in the official documentation.
When trying to get the memory view of an array containing objects of this class, a ValueError is raised.
NumPy/Python version information:
NumPy Version: 1.20.1
Python: 3.8