Split maskna support out of mainline into a branch #297

Merged
merged 3 commits into numpy:master on Jun 21, 2012

3 participants

@njsmith
Member
njsmith commented Jun 6, 2012

As per earlier discussion on the list, this PR attempts to remove exactly and only the maskna-related code from numpy mainline:
http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html

The suggestion is that we merge this to master for the 1.7 release, and immediately "git revert" it on a branch so that it can be modified further without blocking the release.

The first patch does the actual maskna removal; the second and third rearrange things so that PyArray_ReduceWrapper does not end up in the public API, for reasons described therein.

All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, and 3.2 on 64-bit Ubuntu. The docs also appear to build. Before I rebased this, I also tested against SciPy, matplotlib, and pandas, and all were fine.

@mwiebe
Member
mwiebe commented Jun 14, 2012

I've taken a look through this now. It is as advertised, and the places where bigger changes were required to make things work look OK to me. I'm OK with merging it to get 1.7 out the door.

There's a problem building it, though, since there are variable declarations in the middle of blocks in mapping.c. Here are the key lines:

mapping.c(722) : error C2275: 'npy_intp' : illegal use of this type as an expression
mapping.c(864) : error C2065: 'stransfer' : undeclared identifier

@njsmith
Member
njsmith commented Jun 14, 2012

Thanks for the catch. I rebased to amend the first commit so that those two sets of declarations now sit at the top of their blocks, so it should be fixed now (though I don't have MSVC to test with).
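
For readers unfamiliar with this class of error: compilers that follow C89 rules (as MSVC did for .c files at the time) require every declaration in a block to appear before the first statement. The sketch below shows the hoisted form that such compilers accept. It is not the code from mapping.c; `intp_t` and `sum_strided` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for npy_intp; the real typedef lives in NumPy's headers. */
typedef ptrdiff_t intp_t;

/*
 * C89-compatible layout: all declarations come before the first statement.
 * The rejected layout would have declared `stride` after the `if` below,
 * which a C89 compiler reports as an illegal use of the type name.
 */
static intp_t sum_strided(const intp_t *data, intp_t n, intp_t step)
{
    intp_t i;
    intp_t total = 0;
    intp_t stride;          /* hoisted to the top of the block */

    if (step <= 0) {
        return -1;          /* invalid step */
    }
    stride = step;          /* plain assignment, no declaration here */
    for (i = 0; i < n; i += stride) {
        total += data[i];
    }
    return total;
}
```

The behavior is unchanged by the hoisting; only the position of the declaration moves.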

njsmith added some commits May 11, 2012
@njsmith njsmith Remove maskna API from ndarray, and all (and only) the code supporting it

The original masked-NA-NEP branch contained a large number of changes
in addition to the core NA support. For example:
 - ufunc.__call__ support for where= argument
 - nditer support for arbitrary masks (in support of where=)
 - ufunc.reduce support for simultaneous reduction over multiple axes
 - a new "array assignment API"
 - ndarray.diagonal() returning a view in all cases
 - bug-fixes in __array_priority__ handling
 - datetime test changes
etc. There's no consensus yet on what should be done with the
maskna-related part of this branch, but the rest is generally useful
and uncontroversial, so the goal of this branch is to identify exactly
which code changes are involved in maskna support.

The basic strategy used to create this patch was:
 - Remove the new masking-related fields from ndarray, so no arrays
   are masked
 - Go through and remove all the code that this makes
   dead/inaccessible/irrelevant, in a largely mechanical fashion. So
   for example, if I saw 'if (PyArray_HASMASK(a)) { ... }' then that
   whole block was obviously just dead code if no arrays have masks,
   and I removed it. Likewise for function arguments like skipna that
   are useless if there aren't any NAs to skip.
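
As a rough illustration of that mechanical step (a sketch only, using a hypothetical miniature array type rather than NumPy's real structures or macros): once the mask field is gone, a HASMASK-style check is always false, so the guarded block and the skipna argument can both be dropped without changing behavior.

```c
#include <stddef.h>

/* Hypothetical miniature array type; not NumPy's PyArrayObject. */
typedef struct {
    const double *data;
    size_t size;
} mini_array;

/* With no mask field left in the struct, the check is constant false. */
#define MINI_HASMASK(arr) 0

/* Before: a mask branch and a skipna flag that only the branch uses. */
static double mini_sum_before(const mini_array *a, int skipna)
{
    size_t i;
    double total = 0.0;

    if (MINI_HASMASK(a)) {
        /* Dead code: no array can carry a mask any more. */
        return skipna ? 0.0 : -1.0;
    }
    for (i = 0; i < a->size; i++) {
        total += a->data[i];
    }
    return total;
}

/* After: the dead block and the now-useless argument are removed. */
static double mini_sum_after(const mini_array *a)
{
    size_t i;
    double total = 0.0;

    for (i = 0; i < a->size; i++) {
        total += a->data[i];
    }
    return total;
}
```

Note that the "after" version has a different signature, which is the source of the public API concern described next.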

This changed the signature of a number of functions that were newly
exposed in the numpy public API. I've removed all such functions from
the public API, since releasing them with the NA-less signature in 1.7
would create pointless compatibility hassles later if and when we add
back the NA-related functionality. Most such functions are removed by
this commit; the exception is PyArray_ReduceWrapper, which requires
more extensive surgery, and will be handled in followup commits.

I also removed the new ndarray.setasflat method. Reason: a comment
noted that the only reason this was added was to allow easier testing
of one branch of PyArray_CopyAsFlat. That branch is now the main
branch, so that isn't an issue. Nonetheless this function is arguably
useful, so perhaps it should have remained, but I judged that since
numpy's API is already hairier than we would like, it's not a good
idea to add extra hair "just in case". (Also AFAICT the test for this
method in test_maskna was actually incorrect, as noted here:
   https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py
so I'm not confident that it ever worked in master, though I haven't
had a chance to follow up on this.)

I also removed numpy.count_reduce_items, since without skipna it
became trivial.

I believe that these are the only exceptions to the "remove dead code"
strategy.
b272bc6
@njsmith njsmith Move reduction.{c,h} from multiarray/ to umath/
This is done as a separate commit to make sure git can track the
change. This commit will not build. See next commit for actual changes
and rationale.
605c2b4
@njsmith njsmith Remove PyArray_ReduceWrapper from public API
There are two reasons to want to keep PyArray_ReduceWrapper out of the
public multiarray API:
 - Its signature is likely to change if/when masked arrays are added
 - It is essentially a wrapper for array->scalar transformations
   (*not* just reductions as its name implies -- the whole reason it
   is in multiarray.so in the first place is to support count_nonzero,
   which is not actually a reduction!). It provides some nice
   conveniences (like making it easy to apply such functions to
   multiple axes simultaneously), but we already have a general
   mechanism for writing array->scalar transformations -- generalized
   ufuncs. We do not want to have two independent, redundant
   implementations of this functionality, one in multiarray and one in
   umath! So in the long run we should add these nice features to the
   generalized ufunc machinery. And in the short run, we shouldn't add
   it to the public API and commit ourselves to supporting it.
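
One way to read the remark that count_nonzero is not a reduction, sketched in plain C under the assumption that "reduction" means folding a single binary operation over the elements (as ufunc.reduce does). The function names here are hypothetical and this is not NumPy's implementation.

```c
#include <stddef.h>

/* A reduction in the fold sense: acc = acc + x[i], one binary op throughout. */
static double reduce_add(const double *x, size_t n)
{
    size_t i;
    double acc = 0.0;

    for (i = 0; i < n; i++) {
        acc = acc + x[i];
    }
    return acc;
}

/*
 * A count_nonzero-style transformation: each element is first mapped to
 * 0 or 1, and only those flags are summed. The result type (a count) also
 * differs from the element type, so it is an array->scalar function but
 * not a fold of one operation over the original values.
 */
static size_t count_nonzero_sketch(const double *x, size_t n)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < n; i++) {
        if (x[i] != 0.0) {
            count++;
        }
    }
    return count;
}
```

Both functions map an array to a scalar, which is the shape of computation that generalized ufuncs are meant to express.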

However, simply removing it from numpy_api.py is not easy, because
this code was used in both multiarray and umath. This commit:
 - Moves ReduceWrapper and supporting code to umath/, and makes
   appropriate changes (e.g. renaming it to PyUFunc_ReduceWrapper and
   cleaning up the header files).
 - Reverts numpy.count_nonzero to its previous implementation, so that
   it loses the new axis= and keepdims= arguments. This is
   unfortunate, but this change isn't so urgent that it's worth tying
   our APIs in knots forever. (Perhaps in the future it can become a
   generalized ufunc.)
3626d0c
@teoliphant teoliphant merged commit 134174c into numpy:master Jun 21, 2012