Skip to content

Loading…

Deprecate PyArrayObject* direct field access #116

Merged
merged 25 commits into from

3 participants

@mwiebe
NumPy member

This pull request deprecates direct access to PyArrayObject fields. This direct access has been discouraged for a while through comments in the header file and documentation, but up till now, there was no way to disable it. I've created such a mechanism, and C extensions can test that they don't use deprecated C API by #defining NPY_NO_DEPRECATED_API at the top of the C file.

I've confirmed that scipy master builds against this branch, and its test failures look unrelated to these changes (iterative methods failures). Additional testing of different versions and platforms would be appreciated!

This also includes a few other miscellaneous changes:

mwiebe and others added some commits
@mwiebe mwiebe ENH: core: Improve casting failure message produced by PyArray_FromArray 84ee545
@mwiebe mwiebe NEP: missingdata: Rename 'namask' to 'maskna', which looks better c625ee0
@mwiebe mwiebe ENH: core: Deprecating direct access to the PyArrayObject fields
NOTE: WIP, code doesn't build
b7cc20a
@mwiebe mwiebe ENH: core: Work in progress making things work without direct field a…
…ccess
c135371
@mwiebe mwiebe ENH: core: More progress refactoring code to not use PyArrayObject fi…
…elds directly
4dce9c0
@mwiebe mwiebe ENH: core: Progress getting NumPy building without direct field access 2bb06a5
@mwiebe mwiebe ENH: core: Some fixes, change some tests to not use yield bff2e51
@mwiebe mwiebe ENH: core: More cleanup towards deprecating direct arrayobject field …
…access
d88bfeb
@mwiebe mwiebe ENH: core: more progress removing direct ArrayObject field access 67c1098
@mwiebe mwiebe ENH: core: Get it fully building without direct field access 09debad
Mark Wiebe WRN: nditer: remove some warnings db90bf5
Mark Wiebe ENH: core: More cleanups removing direct PyArrayObject field access 57d6b5b
Mark Wiebe ENH: core: Got the tests running after the ArrayObject field access d…
…eprecation
0a7156f
Mark Wiebe ENH: core: Fix more test failures post-field access deprecation c9f0f29
Mark Wiebe BUG: dtype: comma-list dtype formats didn't accept M8[] parameterized…
… dtypes
f8ff0e3
Mark Wiebe BUG: core: PyArray_GetArrayParamsFromObject was treating __array_inte…
…rface__ incorrectly

I'm not sure why this came up after the arrayobject field access
changes, since it looks like the bug was in there before already.
5771857
Mark Wiebe TST: rec: DType in join_by test was inconsistent 405fc9e
Mark Wiebe BUG: core: Needed to initialize 'obj' to NULL for Py_XDECREF_ERR in f…
…ail section
57aa914
Mark Wiebe BUG: mmap: Make the memmap subclass rely on the Python mmap's destruc…
…tor being correct

There was some fishy code flushing and closing the _mmap property,
which looked as if it was paranoid in not trusting mmap.mmap to
behave properly. This was done in a way which assumed the .base
attribute isn't collapsed, something which I've changed in the refactoring.

The easiest way to fix this is to trust mmap.mmap - if this is incorrect,
the specific platform on which this fails should have been already
documented in the comments!
dcc355a
Mark Wiebe ENH: core: Add access macros back in conditionally, for backwards com…
…patibilities

Sticking with inline functions when NPY_NO_DEPRECATED_API is defined.
5590809
@charris charris commented on the diff
doc/neps/missing-data.rst
((6 lines not shown))
may be set to True to add a mask to an array which does not have one.
-Second is 'arr.flags.ownnamask', which is True if the array owns the
+Second is 'arr.flags.ownmaskna', which is True if the array owns the
memory to the mask, and False if the array has no mask, or has a view
into the mask of another array. If this is set to False in a masked
@charris NumPy member
charris added a note

It might be simpler if copying the mask could be specified when making the view, something like a 'copymask' keyword.

@charris NumPy member
charris added a note

Nevermind, I see this is done below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on an outdated diff
numpy/core/_internal.py
((28 lines not shown))
- elif left == right:
- newlist.append(item.strip())
- hold = asbytes('')
-
- # too many close parenthesis is unacceptable
- else:
- raise SyntaxError(item)
-
- # if there is string left over in hold
- if hold != asbytes(''):
- raise SyntaxError(hold)
-
- return newlist
-
-format_re = re.compile(asbytes(r'(?P<order1>[<>|=]?)(?P<repeats> *[(]?[ ,0-9]*[)]? *)(?P<order2>[<>|=]?)(?P<dtype>[A-Za-z0-9.]*)'))
+format_re = re.compile(asbytes(r'(?P<order1>[<>|=]?)(?P<repeats> *[(]?[ ,0-9]*[)]? *)(?P<order2>[<>|=]?)(?P<dtype>[A-Za-z0-9.]*(?:\[[a-zA-Z0-9,.]+\])?)'))
@charris NumPy member
charris added a note

Might be nice to split up the string, 154 char is a bit long.

@mwiebe NumPy member
mwiebe added a note

Sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on an outdated diff
numpy/core/code_generators/numpy_api.py
@@ -318,6 +318,7 @@ multiarray_funcs_api = {
# End 1.6 API
'PyArray_MaskedCopyInto': 281,
'PyArray_MaskedMoveInto': 282,
+ 'PyArray_SetBase': 283,
@charris NumPy member
charris added a note

Does this need to be in the API?

@mwiebe NumPy member
mwiebe added a note

To support full separation of the ABI from the memory layout of PyArrayObject *, all of its accessors will have to move into the API. This one seemed like a no-brainer to start, as it's doing more than just setting the field.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on the diff
numpy/core/include/numpy/ndarrayobject.h
@@ -159,12 +160,18 @@ extern "C" CONFUSE_EMACS
(k)*PyArray_STRIDES(obj)[2] + \
(l)*PyArray_STRIDES(obj)[3]))
-#define PyArray_XDECREF_ERR(obj) \
- if (obj && (PyArray_FLAGS(obj) & NPY_ARRAY_UPDATEIFCOPY)) { \
- PyArray_FLAGS(PyArray_BASE(obj)) |= NPY_ARRAY_WRITEABLE; \
- PyArray_FLAGS(obj) &= ~NPY_ARRAY_UPDATEIFCOPY; \
- } \
- Py_XDECREF(obj)
+static NPY_INLINE void
@charris NumPy member
charris added a note

The use of NPY_INLINE in a public include probably means that extensions need to use the same compiler that numpy was compiled with. Hmm, wonder if that affects cross compiling?

@mwiebe NumPy member
mwiebe added a note

The neighborhood iterator was already doing this, so there is at least some precedence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on the diff
numpy/core/include/numpy/ndarraytypes.h
@@ -605,7 +609,25 @@ typedef struct PyArrayObject {
PyArray_Descr *descr; /* Pointer to type structure */
int flags; /* Flags describing array -- see below */
PyObject *weakreflist; /* For weakreferences */
+} PyArrayObject_fieldaccess;
+
+/*
+ * To hide the implementation details, we only expose
+ * the Python struct HEAD.
+ */
+#ifdef NPY_NO_DEPRECATED_API
+typedef struct tagPyArrayObject {
@charris NumPy member
charris added a note

I believe tags are in their own namespace, so the tag prefix isn't really needed. Adding '_t' to the typedef is more commonly done.

@mwiebe NumPy member
mwiebe added a note

I haven't noticed any consistent convention in NumPy with regards to this. I agree that typedefs like npy_int32 should be npy_int32_t as you suggest, but there are lots of things to change to make it consistent. This is the kind of thing that would go in a NumPy coding standards document somewhere...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@charris charris commented on the diff
numpy/core/include/numpy/ndarraytypes.h
((22 lines not shown))
}
-#define _PyArray_ITER_NEXT2(it) { \
- if ((it)->coordinates[1] < (it)->dims_m1[1]) { \
- (it)->coordinates[1]++; \
- (it)->dataptr += (it)->strides[1]; \
- } \
- else { \
- (it)->coordinates[1] = 0; \
- (it)->coordinates[0]++; \
- (it)->dataptr += (it)->strides[0] - \
- (it)->backstrides[1]; \
- } \
+#define _PyArray_ITER_NEXT2(it) { \
@charris NumPy member
charris added a note

Now I contradict myself and wonder if these should be inline functions. Having a typecheck on the 'it' variable could be helpful.

@mwiebe NumPy member
mwiebe added a note

It's unfortunate that all the NumPy array creation functions return PyObject* instead of PyArrayObject*.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
((13 lines not shown))
produce arrays with values [1.0, 2.0, <inaccessible>, 7.0] /
-mask [Unmasked, Unmasked, Masked, Unmasked], and
+mask [Exposed, Exposed, Exposed, Hidden], and

I think you mean "[Exposed, Exposed, Hidden, Exposed]"

@mwiebe NumPy member
mwiebe added a note

Indeed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
((13 lines not shown))
produce arrays with values [1.0, 2.0, <inaccessible>, 7.0] /
-mask [Unmasked, Unmasked, Masked, Unmasked], and
+mask [Exposed, Exposed, Exposed, Hidden], and
values [1.0, 2.0, <NA bitpattern>, 7.0] respectively.

When you say "respectively", I would expect the same number of outputs as there are examples, but you only have two here. Confusion could occur if someone thought that the mask line was a separate output example, somehow.

Plus, I am a little concerned about having different print outputs. It might be ok, but I am not sure it is the best idea. On the other hand, it does allow for future mixing of NA-bitpatterns and masks...

@mwiebe NumPy member
mwiebe added a note

Yes, the respectively got out of kilter with multiple edits. The explanation here was to show what's under the hood. I think the reprs should be different, otherwise you wouldn't be able to tell the difference, and eval-ing the repr would produce a different object.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
@@ -461,7 +471,7 @@ will also use the unmasked value counts for their calculations if
Some examples::
- >>> a = np.array([1., 3., np.NA, 7.], namasked=True)
+ >>> a = np.array([1., 3., np.NA, 7.], maskna=True)
>>> np.sum(a)
array(NA, dtype='<f8', masked=True)

"masked=True" -> "maskna=True"

@mwiebe NumPy member
mwiebe added a note

changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
((6 lines not shown))
False
- >>> np.any(np.array([False, NA, False], namasked=True))
+ >>> np.any(np.array([False, NA, False], maskna=True))

Nit-picking here, but shouldn't it be "np.NA", not "NA"?

@mwiebe NumPy member
mwiebe added a note

changed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod

The NEP might benefit from some consideration of ufuncs applied to 2-D arrays along an axis. I think masked arrays right now have some behaviors in those situations that might differ from what I might logically extrapolate from the 1-D examples.

@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
((16 lines not shown))
True
- >>> np.all(np.array([True, NA, True], namasked=True))
+ >>> np.all(np.array([True, NA, True], maskna=True))

Just thinking out loud here... would it make sense to have skipna=True available for np.all?

@mwiebe NumPy member
mwiebe added a note

Makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mwiebe
NumPy member

Do you have an example case you're thinking for a ufunc along an axis of a multidimensional array which would be surprising? I guess I thought this part was pretty obvious, but I can make some examples.

@WeatherGod WeatherGod commented on an outdated diff
doc/neps/missing-data.rst
((12 lines not shown))
>>> a + b
- array([NA(0), 5, NA(2)], namasked='multi')
+ array([NA(0), 5, NA(2)], maskna='multi')

Just to clarify, the implication here is not to mix masked and bit-patterns? Consider the following:

...

a = np.array([np.NA, 2, 5], maskna=True)
b = np.array([1, np.NA, 7], dtype='NA')
a + b
...
What should be the result? Now, let's say that -- somehow -- resulting array retains the distinction between the masked-NA and the bit-pattern NA. And the user wants to skip masked-NA values, but not skip bitpattern-NA values. I don't see how that is going to be possible in the current framework. Note, I am not saying that it should be possible, I am just raising a possible future issue with maintaining the distinction between bitpattern-NA and masked NA.

@mwiebe NumPy member
mwiebe added a note

That's correct, it isn't about mixing masks and bitpatterns, its about having multiple distinct NAs in both masks and bitpatterns. The NEP makes no distinction between bitpattern-NA and masked NA with regard to computations, however the multi-NA idea would allow a user to distinguish between them by using different multi-NA payloads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@WeatherGod

Consider the following:

>>> import numpy as np
>>> a = np.random.random((3, 2))
>>> b = np.ma.masked_array(a, mask=[[False, True], [True, True], [False, False]])
>>> b
masked_array(data =
 [[0.110804969841 --]
 [-- --]
 [0.955128477746 0.440430735546]],
             mask =
 [[False  True]
 [ True  True]
 [False False]],
       fill_value = 1e+20)
>>> b.mean(axis=0)
masked_array(data = [0.532966723794 0.440430735546],
             mask = [False False],
       fill_value = 1e+20)

>>> b.mean(axis=1)
masked_array(data = [0.110804969841 -- 0.697779606646],
             mask = [False  True False],
       fill_value = 1e+20)

Note that for masked arrays, they also do not return 0.0 for completely masked rows:

>>> b[1].sum()
masked

I am not saying that this is the way it should be, but if there is going to be a difference, then this should be made very clear for former np.ma users.

@mwiebe
NumPy member

I've added a section 'Differences with numpy.ma' which uses your example to show how things behave differently. Thanks!

Mark Wiebe ENH: dtype: Make handling of struct dtype align= flag more rigorous (…
…also fixes ticket #1912)

This adds an 'aligned'= property to the format dict, which gets put
in the str() representation when necessary to preserve it.
dc81482
@mwiebe
NumPy member

I've added another commit which makes the 'align=' struct dtype flag stickiness better, and fixes ticket #1912. Are there any more changes you think are necessary before this gets merged?

@charris
NumPy member

I haven't had time to finish looking through it, but I suspect I won't have anything substantive to add. However, I've been thinking the name PyArray_SetBase is too generic. What is the Base and what does setting it do? So maybe another name and some explanation of the intended use of the function would help.

@mwiebe
NumPy member

Sure, the name PyArray_SetBase is what it is because it matches with PyArray_BASE.

I think the possible confusion is due to poor naming conventions in the NumPy API. First, Py* should be reserved for CPython, not NumPy. I think a transition to Npy* should be done when the API has matured some more. Second, PyArray_* has all the PyArrayObject-specific functions and random other functions mixed in together.

The overloading of the BASE property together with the UPDATEIFCOPY flag also seems unfortunate, as it significantly changes the meaning of that property depending on the flag.

@charris
NumPy member

I'm guessing that PyArray_SetBase sets a pointer to the array from which a view is taken, so maybe something like PyArray_SetBaseArray or PyArray_SetRootArray, or some such. I really think all arrays should be views of some separate memory/buffer object, but that is another issue.

Can you expand on the interplay between the BASE and UPDATECOPY flags? I don't think it matters for this commit, but I would like to learn something ;)

@mwiebe
NumPy member

Yes, except the base can be any object that owns the memory, not just an array. PyArray_SetBaseObject? I agree with you about the separate memory/buffer object.

When the UPDATEIFCOPY flag of arr is set, arr's base is then the original array from which a copy is made. Arr's base is set readonly as well. When arr is freed, it copies its data into its base and sets arr to writeable again. Quite different from arr's base owning the memory...

@charris
NumPy member

PyArray_SetBaseObject sounds good, or maybe PyArray_SetBufferObject or PyArray_SetMemoryObject.

The UPDATEIFCOPY sounds like two paradigms getting mixed together.

@mwiebe
NumPy member

I'll switch it to PyArray_SetBaseObject. Changing it to a name other than Base would need a bigger cascading set of changes.

@mwiebe
NumPy member

I've changed the name to PyArray_SetBaseObject, and added some documentation about it.

@mwiebe
NumPy member

Chuck, did you have anything else you'd like changed before merging this?

@charris
NumPy member

No, go for it. Might clean up the trailing whitespace though ;)

@mwiebe mwiebe merged commit affea42 into numpy:master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 19, 2011
  1. @mwiebe
  2. @mwiebe
  3. @mwiebe

    ENH: core: Deprecating direct access to the PyArrayObject fields

    mwiebe committed with Mark Wiebe
    NOTE: WIP, code doesn't build
  4. @mwiebe
  5. @mwiebe

    ENH: core: More progress refactoring code to not use PyArrayObject fi…

    mwiebe committed with Mark Wiebe
    …elds directly
  6. @mwiebe
  7. @mwiebe

    ENH: core: Some fixes, change some tests to not use yield

    mwiebe committed with Mark Wiebe
  8. @mwiebe
  9. @mwiebe
  10. @mwiebe
  11. WRN: nditer: remove some warnings

    Mark Wiebe committed
  12. ENH: core: Got the tests running after the ArrayObject field access d…

    Mark Wiebe committed
    …eprecation
  13. BUG: core: PyArray_GetArrayParamsFromObject was treating __array_inte…

    Mark Wiebe committed
    …rface__ incorrectly
    
    I'm not sure why this came up after the arrayobject field access
    changes, since it looks like the bug was in there before already.
  14. TST: rec: DType in join_by test was inconsistent

    Mark Wiebe committed
  15. BUG: core: Needed to initialize 'obj' to NULL for Py_XDECREF_ERR in f…

    Mark Wiebe committed
    …ail section
  16. BUG: mmap: Make the memmap subclass rely on the Python mmap's destruc…

    Mark Wiebe committed
    …tor being correct
    
    There was some fishy code flushing and closing the _mmap property,
    which looked as if it was paranoid in not trusting mmap.mmap to
    behave properly. This was done in a way which assumed the .base
    attribute isn't collapsed, something which I've changed in the refactoring.
    
    The easiest way to fix this is to trust mmap.mmap - if this is incorrect,
    the specific platform on which this fails should have been already
    documented in the comments!
  17. ENH: core: Add access macros back in conditionally, for backwards com…

    Mark Wiebe committed
    …patibilities
    
    Sticking with inline functions when NPY_NO_DEPRECATED_API is defined.
Commits on Jul 20, 2011
Commits on Jul 22, 2011
  1. ENH: dtype: Make handling of struct dtype align= flag more rigorous (…

    Mark Wiebe committed
    …also fixes ticket #1912)
    
    This adds an 'aligned'= property to the format dict, which gets put
    in the str() representation when necessary to preserve it.
  2. DOC: core: Document the PyArray_SetBaseObject function

    Mark Wiebe committed
Commits on Jul 26, 2011
  1. STY: Remove trailing whitespace

    Mark Wiebe committed
Something went wrong with that request. Please try again.