New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate PyArrayObject* direct field access #116
Changes from 20 commits
84ee545
c625ee0
b7cc20a
c135371
4dce9c0
2bb06a5
bff2e51
d88bfeb
67c1098
09debad
db90bf5
57d6b5b
0a7156f
c9f0f29
f8ff0e3
5771857
405fc9e
57aa914
dcc355a
5590809
a059979
dc81482
a2ed62a
694a383
affea42
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -225,27 +225,29 @@ provides a starting point. | |
|
||
For example,:: | ||
|
||
>>> np.array([1.0, 2.0, np.NA, 7.0], namasked=True) | ||
array([1., 2., NA, 7.], namasked=True) | ||
>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]') | ||
>>> np.array([1.0, 2.0, np.NA, 7.0], maskna=True) | ||
array([1., 2., NA, 7.], maskna=True) | ||
>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA') | ||
array([1., 2., NA, 7.], dtype='NA[<f8]') | ||
>>> np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f4]') | ||
array([1., 2., NA, 7.], dtype='NA[<f4]') | ||
|
||
produce arrays with values [1.0, 2.0, <inaccessible>, 7.0] / | ||
mask [Unmasked, Unmasked, Masked, Unmasked], and | ||
mask [Exposed, Exposed, Exposed, Hidden], and | ||
values [1.0, 2.0, <NA bitpattern>, 7.0] respectively. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you say "respectively", I would expect the same number of outputs as there are examples, but you only have two here. Confusion could occur if someone thought that the mask line was a separate output example, somehow. Plus, I am a little concerned about having different print outputs. It might be ok, but I am not sure it is the best idea. On the other hand, it does allow for future mixing of NA-bitpatterns and masks... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the respectively got out of kilter with multiple edits. The explanation here was to show what's under the hood. I think the reprs should be different, otherwise you wouldn't be able to tell the difference, and eval-ing the repr would produce a different object. |
||
|
||
It may be worth overloading the np.NA __call__ method to accept a dtype, | ||
returning a zero-dimensional array with a missing value of that dtype. | ||
Without doing this, NA printouts would look like:: | ||
|
||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], namasked=True)) | ||
array(NA, dtype='float64', namasked=True) | ||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True)) | ||
array(NA, dtype='float64', maskna=True) | ||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')) | ||
array(NA, dtype='NA[<f8]') | ||
|
||
but with this, they could be printed as:: | ||
|
||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], namasked=True)) | ||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], maskna=True)) | ||
NA('float64') | ||
>>> np.sum(np.array([1.0, 2.0, np.NA, 7.0], dtype='NA[f8]')) | ||
NA('NA[<f8]') | ||
|
@@ -274,12 +276,12 @@ from another view which doesn't have them masked. For example:: | |
|
||
>>> a = np.array([1,2]) | ||
>>> b = a.view() | ||
>>> b.flags.hasnamask = True | ||
>>> b.flags.hasmaskna = True | ||
>>> b | ||
array([1,2], namasked=True) | ||
array([1,2], maskna=True) | ||
>>> b[0] = np.NA | ||
>>> b | ||
array([NA,2], namasked=True) | ||
array([NA,2], maskna=True) | ||
>>> a | ||
array([1,2]) | ||
>>> # The underlying number 1 value in 'a[0]' was untouched | ||
|
@@ -351,10 +353,10 @@ Creating Masked Arrays | |
There are two flags which indicate and control the nature of the mask | ||
used in masked arrays. | ||
|
||
First is 'arr.flags.hasnamask', which is True for all masked arrays and | ||
First is 'arr.flags.hasmaskna', which is True for all masked arrays and | ||
may be set to True to add a mask to an array which does not have one. | ||
|
||
Second is 'arr.flags.ownnamask', which is True if the array owns the | ||
Second is 'arr.flags.ownmaskna', which is True if the array owns the | ||
memory to the mask, and False if the array has no mask, or has a view | ||
into the mask of another array. If this is set to False in a masked | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It might be simpler if copying the mask could be specified when making the view, something like a 'copymask' keyword. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nevermind, I see this is done below. |
||
array, the array will create a copy of the mask so that further modifications | ||
|
@@ -402,8 +404,16 @@ New functions added to the ndarray are:: | |
array is unmasked and has the 'NA' part stripped from the | ||
parameterized type ('NA[f8]' becomes just 'f8'). | ||
|
||
arr.view(namasked=True) | ||
This is a shortcut for 'a = arr.view(); a.flags.hasnamask=True'. | ||
arr.view(maskna=True) | ||
This is a shortcut for | ||
>>> a = arr.view() | ||
>>> a.flags.hasmaskna = True | ||
|
||
arr.view(ownmaskna=True) | ||
This is a shortcut for | ||
>>> a = arr.view() | ||
>>> a.flags.hasmaskna = True | ||
>>> a.flags.ownmaskna = True | ||
|
||
Element-wise UFuncs With Missing Values | ||
======================================= | ||
|
@@ -461,7 +471,7 @@ will also use the unmasked value counts for their calculations if | |
|
||
Some examples:: | ||
|
||
>>> a = np.array([1., 3., np.NA, 7.], namasked=True) | ||
>>> a = np.array([1., 3., np.NA, 7.], maskna=True) | ||
>>> np.sum(a) | ||
array(NA, dtype='<f8', masked=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "masked=True" -> "maskna=True" There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed |
||
>>> np.sum(a, skipna=True) | ||
|
@@ -471,11 +481,11 @@ Some examples:: | |
>>> np.mean(a, skipna=True) | ||
3.6666666666666665 | ||
|
||
>>> a = np.array([np.NA, np.NA], dtype='f8', namasked=True) | ||
>>> a = np.array([np.NA, np.NA], dtype='f8', maskna=True) | ||
>>> np.sum(a, skipna=True) | ||
0.0 | ||
>>> np.max(a, skipna=True) | ||
array(NA, dtype='<f8', namasked=True) | ||
array(NA, dtype='<f8', maskna=True) | ||
>>> np.mean(a) | ||
NA('<f8') | ||
>>> np.mean(a, skipna=True) | ||
|
@@ -487,18 +497,18 @@ The functions 'np.any' and 'np.all' require some special consideration, | |
just as logical_and and logical_or do. Maybe the best way to describe | ||
their behavior is through a series of examples:: | ||
|
||
>>> np.any(np.array([False, False, False], namasked=True)) | ||
>>> np.any(np.array([False, False, False], maskna=True)) | ||
False | ||
>>> np.any(np.array([False, NA, False], namasked=True)) | ||
>>> np.any(np.array([False, NA, False], maskna=True)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nit-picking here, but shouldn't it be "np.NA", not "NA"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changed |
||
NA | ||
>>> np.any(np.array([False, NA, True], namasked=True)) | ||
>>> np.any(np.array([False, NA, True], maskna=True)) | ||
True | ||
|
||
>>> np.all(np.array([True, True, True], namasked=True)) | ||
>>> np.all(np.array([True, True, True], maskna=True)) | ||
True | ||
>>> np.all(np.array([True, NA, True], namasked=True)) | ||
>>> np.all(np.array([True, NA, True], maskna=True)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just thinking out loud here... would it make sense to have skipna=True available for np.all? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense |
||
NA | ||
>>> np.all(np.array([False, NA, True], namasked=True)) | ||
>>> np.all(np.array([False, NA, True], maskna=True)) | ||
False | ||
|
||
Parameterized NA Data Types | ||
|
@@ -609,14 +619,14 @@ The important part of future-proofing the design is making sure | |
the C ABI-level choices and the Python API-level choices have a natural | ||
transition to multi-NA support. Here is one way multi-NA support could look:: | ||
|
||
>>> a = np.array([np.NA(1), 3, np.NA(2)], namasked='multi') | ||
>>> a = np.array([np.NA(1), 3, np.NA(2)], maskna='multi') | ||
>>> np.sum(a) | ||
NA(1) | ||
>>> np.sum(a[1:]) | ||
NA(2) | ||
>>> b = np.array([np.NA, 2, 5], namasked=True) | ||
>>> b = np.array([np.NA, 2, 5], maskna=True) | ||
>>> a + b | ||
array([NA(0), 5, NA(2)], namasked='multi') | ||
array([NA(0), 5, NA(2)], maskna='multi') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to clarify, the implication here is not to mix masked and bit-patterns? Consider the following: ...
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's correct, it isn't about mixing masks and bitpatterns, its about having multiple distinct NAs in both masks and bitpatterns. The NEP makes no distinction between bitpattern-NA and masked NA with regard to computations, however the multi-NA idea would allow a user to distinguish between them by using different multi-NA payloads. |
||
|
||
PEP 3118 | ||
======== | ||
|
@@ -696,28 +706,28 @@ This gives us the following additions to the PyArrayObject:: | |
/* | ||
* Descriptor for the mask dtype. | ||
* If no mask: NULL | ||
* If mask : bool/structured dtype of bools | ||
* If mask : bool/uint8/structured dtype of mask dtypes | ||
*/ | ||
PyArray_Descr *maskdescr; | ||
PyArray_Descr *maskna_descr; | ||
/* | ||
* Raw data buffer for mask. If the array has the flag | ||
* NPY_ARRAY_OWNNAMASK enabled, it owns this memory and | ||
* NPY_ARRAY_OWNMASKNA enabled, it owns this memory and | ||
* must call PyArray_free on it when destroyed. | ||
*/ | ||
npy_uint8 *maskdata; | ||
npy_mask *maskna_data; | ||
/* | ||
* Just like dimensions and strides point into the same memory | ||
* buffer, we now just make the buffer 3x the nd instead of 2x | ||
* and use the same buffer. | ||
*/ | ||
npy_intp *maskstrides; | ||
npy_intp *maskna_strides; | ||
|
||
There are 2 (or 3) flags which must be added to the array flags:: | ||
|
||
NPY_ARRAY_HASNAMASK | ||
NPY_ARRAY_OWNNAMASK | ||
NPY_ARRAY_HASMASKNA | ||
NPY_ARRAY_OWNMASKNA | ||
/* To possibly add in a later revision */ | ||
NPY_ARRAY_HARDNAMASK | ||
NPY_ARRAY_HARDMASKNA | ||
|
||
To allow the easy detection of NA support, and whether an array | ||
has any missing values, we add the following functions: | ||
|
@@ -807,7 +817,7 @@ NPY_ITER_ARRAYMASK | |
can be only one such mask, and there cannot also be a virtual | ||
mask. | ||
|
||
As a special case, if the flag NPY_ITER_USE_NAMASK is specified | ||
As a special case, if the flag NPY_ITER_USE_MASKNA is specified | ||
at the same time, the mask for the operand is used instead | ||
of the operand itself. If the operand has no mask but is | ||
based on an NA dtype, that mask exposed by the iterator converts | ||
|
@@ -827,14 +837,14 @@ Iterator NA-array Features | |
|
||
We add several new per-operand flags: | ||
|
||
NPY_ITER_USE_NAMASK | ||
NPY_ITER_USE_MASKNA | ||
If the operand has an NA dtype, an NA mask, or both, this adds a new | ||
virtual operand to the end of the operand list which iterates | ||
over the mask of the particular operand. | ||
|
||
NPY_ITER_IGNORE_NAMASK | ||
NPY_ITER_IGNORE_MASKNA | ||
If an operand has an NA mask, by default the iterator will raise | ||
an exception unless NPY_ITER_USE_NAMASK is specified. This flag | ||
an exception unless NPY_ITER_USE_MASKNA is specified. This flag | ||
disables that check, and is intended for cases where one has first | ||
checked that all the elements in the array are not NA using the | ||
PyArray_ContainsNA function. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,60 +129,38 @@ def _reconstruct(subtype, shape, dtype): | |
return ndarray.__new__(subtype, shape, dtype) | ||
|
||
|
||
# format_re and _split were taken from numarray by J. Todd Miller | ||
# format_re was originally from numarray by J. Todd Miller | ||
|
||
def _split(input): | ||
"""Split the input formats string into field formats without splitting | ||
the tuple used to specify multi-dimensional arrays.""" | ||
|
||
newlist = [] | ||
hold = asbytes('') | ||
|
||
listinput = input.split(asbytes(',')) | ||
for element in listinput: | ||
if hold != asbytes(''): | ||
item = hold + asbytes(',') + element | ||
else: | ||
item = element | ||
left = item.count(asbytes('(')) | ||
right = item.count(asbytes(')')) | ||
|
||
# if the parenthesis is not balanced, hold the string | ||
if left > right : | ||
hold = item | ||
|
||
# when balanced, append to the output list and reset the hold | ||
elif left == right: | ||
newlist.append(item.strip()) | ||
hold = asbytes('') | ||
|
||
# too many close parenthesis is unacceptable | ||
else: | ||
raise SyntaxError(item) | ||
|
||
# if there is string left over in hold | ||
if hold != asbytes(''): | ||
raise SyntaxError(hold) | ||
|
||
return newlist | ||
|
||
format_re = re.compile(asbytes(r'(?P<order1>[<>|=]?)(?P<repeats> *[(]?[ ,0-9]*[)]? *)(?P<order2>[<>|=]?)(?P<dtype>[A-Za-z0-9.]*)')) | ||
format_re = re.compile(asbytes(r'(?P<order1>[<>|=]?)(?P<repeats> *[(]?[ ,0-9]*[)]? *)(?P<order2>[<>|=]?)(?P<dtype>[A-Za-z0-9.]*(?:\[[a-zA-Z0-9,.]+\])?)')) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be nice to split up the string, 154 char is a bit long. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure |
||
sep_re = re.compile(asbytes(r'\s*,\s*')) | ||
space_re = re.compile(asbytes(r'\s+$')) | ||
|
||
# astr is a string (perhaps comma separated) | ||
|
||
_convorder = {asbytes('='): _nbo} | ||
|
||
def _commastring(astr): | ||
res = _split(astr) | ||
if (len(res)) < 1: | ||
raise ValueError("unrecognized formant") | ||
startindex = 0 | ||
result = [] | ||
for k,item in enumerate(res): | ||
# convert item | ||
while startindex < len(astr): | ||
mo = format_re.match(astr, pos=startindex) | ||
try: | ||
(order1, repeats, order2, dtype) = format_re.match(item).groups() | ||
(order1, repeats, order2, dtype) = mo.groups() | ||
except (TypeError, AttributeError): | ||
raise ValueError('format %s is not recognized' % item) | ||
raise ValueError('format number %d of "%s" is not recognized' % | ||
(len(result)+1, astr)) | ||
startindex = mo.end() | ||
# Separator or ending padding | ||
if startindex < len(astr): | ||
if space_re.match(astr, pos=startindex): | ||
startindex = len(astr) | ||
else: | ||
mo = sep_re.match(astr, pos=startindex) | ||
if not mo: | ||
raise ValueError( | ||
'format number %d of "%s" is not recognized' % | ||
(len(result)+1, astr)) | ||
startindex = mo.end() | ||
|
||
if order2 == asbytes(''): | ||
order = order1 | ||
|
@@ -192,7 +170,7 @@ def _commastring(astr): | |
order1 = _convorder.get(order1, order1) | ||
order2 = _convorder.get(order2, order2) | ||
if (order1 != order2): | ||
raise ValueError('in-consistent byte-order specification %s and %s' % (order1, order2)) | ||
raise ValueError('inconsistent byte-order specification %s and %s' % (order1, order2)) | ||
order = order1 | ||
|
||
if order in [asbytes('|'), asbytes('='), _nbo]: | ||
|
@@ -203,7 +181,7 @@ def _commastring(astr): | |
else: | ||
newitem = (dtype, eval(repeats)) | ||
result.append(newitem) | ||
|
||
return result | ||
|
||
def _getintp_ctype(): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -318,6 +318,7 @@ | |
# End 1.6 API | ||
'PyArray_MaskedCopyInto': 281, | ||
'PyArray_MaskedMoveInto': 282, | ||
'PyArray_SetBase': 283, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does this need to be in the API? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To support full separation of the ABI from the memory layout of PyArrayObject *, all of its accessors will have to move into the API. This one seemed like a no-brainer to start, as it's doing more than just setting the field. |
||
} | ||
|
||
ufunc_types_api = { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,7 +42,8 @@ extern "C" CONFUSE_EMACS | |
#define PyArray_HasArrayInterface(op, out) \ | ||
PyArray_HasArrayInterfaceType(op, NULL, NULL, out) | ||
|
||
#define PyArray_IsZeroDim(op) (PyArray_Check(op) && (PyArray_NDIM(op) == 0)) | ||
#define PyArray_IsZeroDim(op) (PyArray_Check(op) && \ | ||
(PyArray_NDIM((PyArrayObject *)op) == 0)) | ||
|
||
#define PyArray_IsScalar(obj, cls) \ | ||
(PyObject_TypeCheck(obj, &Py##cls##ArrType_Type)) | ||
|
@@ -159,12 +160,18 @@ extern "C" CONFUSE_EMACS | |
(k)*PyArray_STRIDES(obj)[2] + \ | ||
(l)*PyArray_STRIDES(obj)[3])) | ||
|
||
#define PyArray_XDECREF_ERR(obj) \ | ||
if (obj && (PyArray_FLAGS(obj) & NPY_ARRAY_UPDATEIFCOPY)) { \ | ||
PyArray_FLAGS(PyArray_BASE(obj)) |= NPY_ARRAY_WRITEABLE; \ | ||
PyArray_FLAGS(obj) &= ~NPY_ARRAY_UPDATEIFCOPY; \ | ||
} \ | ||
Py_XDECREF(obj) | ||
static NPY_INLINE void | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The use of NPY_INLINE in a public include probably means that extensions need to use the same compiler that numpy was compiled with. Hmm, wonder if that affects cross compiling? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The neighborhood iterator was already doing this, so there is at least some precedence. |
||
PyArray_XDECREF_ERR(PyArrayObject *arr) | ||
{ | ||
if (arr != NULL) { | ||
if (PyArray_FLAGS(arr) & NPY_ARRAY_UPDATEIFCOPY) { | ||
PyArrayObject *base = (PyArrayObject *)PyArray_BASE(arr); | ||
PyArray_ENABLEFLAGS(base, NPY_ARRAY_WRITEABLE); | ||
PyArray_CLEARFLAGS(arr, NPY_ARRAY_UPDATEIFCOPY); | ||
} | ||
Py_DECREF(arr); | ||
} | ||
} | ||
|
||
#define PyArray_DESCR_REPLACE(descr) do { \ | ||
PyArray_Descr *_new_; \ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean "[Exposed, Exposed, Hidden, Exposed]"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed