ENH: allow numpy.apply_along_axis() to work with ndarray subclasses #7918

bennyrowland · 2016-08-08T13:20:14Z

This commit modifies the numpy.apply_along_axis() function so that if
it is called with an ndarray subclass, the internal func1d calls
receive subclass instances and the overall function returns an instance
of the subclass. There are two new tests for these two behaviours.

This commit modifies the numpy.apply_along_axis() function so that if it is called with an ndarray subclass, the internal func1d calls receive subclass instances and the overall function returns an instance of the subclass. There are two new tests for these two behaviours.

mhvk · 2016-08-08T16:01:10Z

I like anything that makes numpy more subclass aware, but worry this will not quite work at least for the ndarray subclass I am responsible for, astropy's Quantity. Specifically, you assume here that the output array should be similar to the input array, but this may not necessarily be true. E.g., for a quantity with units of m, my func1d could be calculating a variance, which would result in an output quantity with units of m^2. Or func1d might be a counting function, in which case it would return not a quantity but an integer; or a checking function that returns a bool array, etc.

I think one could circumvent these problems by not running arr.__array_wrap__ at the very end, but rather run res.__array_wrap__ just after creating the output array (see in-line comment). To me, it seems this would make sense for the non-scalar case only (i.e., where func1d returns an ndarray subclass); for the scalar case, I think the output should be a plain ndarray, i.e., no wrapping should be done.

mhvk · 2016-08-08T16:03:30Z

numpy/lib/shape_base.py

@@ -107,7 +107,7 @@ def apply_along_axis(func1d, axis, arr, *args, **kwargs):
            res = func1d(arr[tuple(i.tolist())], *args, **kwargs)
            outarr[tuple(ind)] = res
            k += 1
-        return outarr
+        return arr.__array_wrap__(outarr)
    else:
        Ntot = product(outshape)


So, for this branch, I would suggest to run

res.__array_wrap__(outarr)

just after creating outarr.

mhvk · 2016-08-08T16:12:54Z

After looking at the test cases: what is your own usage where you would want the subclass to be preserved?

Sticking with numpy's own ndarray subclasses, I think a case with a MaskedArray might be more logical, where, e.g., one does want to keep track of a result that was masked.

bennyrowland · 2016-08-08T18:06:47Z

My use case is this: I have an ndarray subclass storing sets of temporal MR data which has additional metadata properties such as the time between data points, spectrometer frequency etc. For example I might have a 3D array which contains temporal data on an x-y grid.
First use case: I have a function which calculates the frequency of an oscillation in the data in Hz, which requires the sampling frequency stored in the subclass. By using asanyarray() on line 77 I ensure that func1d gets passed a single temporal dataset with its metadata attached. In this case it returns a scalar and I think you are absolutely correct that it would be more appropriate for the 2D grid to come in a bare ndarray.
Second use case: I want to apply some processing to each temporal dataset, e.g. a windowing function. In this case the function takes in the subclass slice and returns an identically sized new instance of the same subclass, containing the same metadata. Obviously I want the overall 3D return grid to also have the same values included in it. After reading your comments above I completely agree that it makes more sense to use the return array from the func1d call to wrap the output, rather than the input array, and I will make this change to the code.

In light of your suggestions I agree that the matrix test case is not necessarily an ideal choice, it was motivated by a desire to use existing functions rather than add my own additional ones, but that caused it to become rather far from my original use case. I will change this to something closer to what I wanted.
In the second test there is no check that the return value preserves the subclass, the check is that the func1d calls get the subclass to operate on.

Modified pull request so that result of func1d call gets to call __array_wrap__ instead of the input array. Modified tests to work with this

bennyrowland · 2016-08-08T18:21:38Z

I have also run into the same issue with array scalars coming from np.mean etc. I guess the correct solution in this case is to set the shape of the outarr to be 1 in that dimension if the shape of res == (). This can then be short-circuited if desired at the subclass level in array_wrap by checking if the array is in fact an array scalar and returning the scalar instead, which allows functions like np.mean to be used to get the scalars if desired.

Modified pull request to handle case where calling func1d generates scalar arrays

mhvk · 2016-08-08T18:36:50Z

@bennyrowland - OK, sounds like we agree on what it should do! I also agree that setting the dimension to 1 for the purposes of filling the array makes sense, but I'm not sure about keeping that dimension. Better would be to have a keepdims keyword which decides whether or not that dimension gets kept. Since this presumably is not something func1d would ever need, it may be possible to add that -- you'd just get it out in the code using keepdims = kwargs.pop('keepdims', False).

But maybe best to let the real numpy developers judge that (I'm guessing this may become sufficiently "new feature" that an e-mail to the mailing list would be in order; @ahaldane? @charris?)

I did manage to come up with what I think is a decent test case using masked arrays (sadly, simple things like np.sum are simply going to fail because of the way those return either numpy scalars or the masked singleton; no fault of the code here, though!):

def my_func1d(a):
    return a.dot(a)

m = np.ma.MaskedArray([[1, 2], [3, 4]], mask=[[True, False], [True, True]])
np.apply_along_axis(my_func1d, 0, m)
# Should give [4, --]
# Note that m[0].dot(m[0]) yields a masked array scalar unlike things like m[0].sum()

bennyrowland · 2016-08-08T21:11:35Z

@mhvk Actually in my current version I decided to squeeze that dimension if a scalar array is returned. Certainly could add a keepdims keyword argument though. If we are going to start adding arguments to apply_along_axis, one thing I thought might be interesting was optionally passing the index of the current slice to func1d, so that each call knows where it is in the larger grid. This could be used to index into other arguments passed through from the apply_along_axis call, for example to give each axis custom data. An example use case for this might be: if I have a 2D array containing a set of temporal data and I want to filter a different frequency from each dataset.

def filter_dataset(data, frequency_list, index):
    return data.filter(frequency_list[index])
result = np.apply_along_axis(filter_dataset, 1, input, pass_index=True)

I did send a notification email to numpy-discussion to invite comment on this PR, was that not the right mailing list?

mhvk · 2016-08-08T22:58:52Z

numpy/lib/shape_base.py

@@ -112,8 +112,9 @@ def apply_along_axis(func1d, axis, arr, *args, **kwargs):
        Ntot = product(outshape)
        holdshape = outshape
        outshape = list(arr.shape)
-        outshape[axis] = len(res)
+        outshape[axis] = len(res) if res.shape != () else 1


I guess this could just be outshape[axis] = res.size.

mhvk · 2016-08-08T23:16:51Z

@bennyrowland - I had two very small additional comments, where the most important might be whether or not to assume that func1d returns something array-like. It seems some of the code does not assume so, and the easiest may simply be to force it using res = np.asanyarray(res).
I certainly think it would be OK that way myself. But it might be good to have someone who has looked at this code before have a look, so cc @charris, @rkern (git blame suggests it goes back to Travis...).

p.s. Of course, with this PR you do beg the question of why not also do apply_over_axes -- I think there all you need is a change from asarray to asanyarray 😈

Modified pull request to cast result of func1d into ndarray (or subclass) and ensure that it has the necessary size, dtype and __array_wrap properties

bennyrowland · 2016-08-09T13:04:02Z

@mhvk - some further excellent suggestions I have acted on, as you say we will have to wait for an opinion from @charris or @rkern. I did take a look at apply_over_axes but I don't really get the use case so I left it alone. If this PR gets approved then I think the next step is a new function apply_along_axes which behaves like apply_along_axis but can extract multi-dimensional slices from the parent ndarray.

bennyrowland · 2016-10-06T18:35:32Z

@charris or @rkern, would still like to get this merged. Please let me know if there is anything else I need to do first.

shoyer · 2016-10-07T19:01:43Z

This definitely needs tests before it's ready to merge.

bennyrowland · 2016-10-11T00:49:45Z

@shoyer, thanks for commenting. There are already three new tests in this PR, what further tests would you like to see included?

shoyer · 2016-10-11T01:04:03Z

numpy/lib/tests/test_shape_base.py

+    def test_preserve_subclass(self):
+        def double(row):
+            return row * 2
+        m = np.matrix(np.arange(4).reshape((2, 2)))


Just a minor style thing -- I think this test would be clearer as just np.matrix([[0, 1], [2, 3]]), without the reshape.

shoyer · 2016-10-11T01:05:36Z

numpy/lib/tests/test_shape_base.py

+        result = apply_along_axis(double, 0, m)
+        assert isinstance(result, np.matrix)
+        assert_array_equal(
+            result, np.matrix([0, 2, 4, 6]).reshape((2, 2))


Same as above. I find reshape on np.matrix especially non-intuitive (and I could only guess at whether passing 1D input to np.matrix does).

shoyer

I have a couple (very) minor suggestions for the tests, but otherwise looks good to me.

shoyer · 2016-10-11T01:06:38Z

@bennyrowland oops, somehow I missed those :)

@shoyer

Modified pull request to make stylistic changes to tests suggested by @shoyer

shoyer · 2016-10-11T16:08:58Z

OK, in it goes. Thanks!

charris added 01 - Enhancement component: numpy.lib labels Aug 8, 2016

mhvk reviewed Aug 8, 2016
View reviewed changes

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses

661d483

Modified pull request so that result of func1d call gets to call __array_wrap__ instead of the input array. Modified tests to work with this

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses

53a58c3

Modified pull request to handle case where calling func1d generates scalar arrays

mhvk reviewed Aug 8, 2016
View reviewed changes

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses

b67a350

Modified pull request to cast result of func1d into ndarray (or subclass) and ensure that it has the necessary size, dtype and __array_wrap properties

shoyer reviewed Oct 11, 2016

View reviewed changes

shoyer approved these changes Oct 11, 2016

View reviewed changes

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses

242da41

Modified pull request to make stylistic changes to tests suggested by @shoyer

shoyer merged commit 84b11f5 into numpy:master Oct 11, 2016

mortonjt mentioned this pull request Dec 22, 2016

Scipy fix scikit-bio/scikit-bio#1472

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses #7918

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses #7918

bennyrowland commented Aug 8, 2016

mhvk commented Aug 8, 2016

mhvk Aug 8, 2016

mhvk commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

mhvk commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

mhvk Aug 8, 2016

mhvk commented Aug 8, 2016

bennyrowland commented Aug 9, 2016

bennyrowland commented Oct 6, 2016

shoyer commented Oct 7, 2016

bennyrowland commented Oct 11, 2016

shoyer Oct 11, 2016

shoyer Oct 11, 2016

shoyer left a comment

shoyer commented Oct 11, 2016

shoyer commented Oct 11, 2016

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses #7918

ENH: allow numpy.apply_along_axis() to work with ndarray subclasses #7918

Conversation

bennyrowland commented Aug 8, 2016

mhvk commented Aug 8, 2016

mhvk Aug 8, 2016

Choose a reason for hiding this comment

mhvk commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

mhvk commented Aug 8, 2016

bennyrowland commented Aug 8, 2016

mhvk Aug 8, 2016

Choose a reason for hiding this comment

mhvk commented Aug 8, 2016

bennyrowland commented Aug 9, 2016

bennyrowland commented Oct 6, 2016

shoyer commented Oct 7, 2016

bennyrowland commented Oct 11, 2016

shoyer Oct 11, 2016

Choose a reason for hiding this comment

shoyer Oct 11, 2016

Choose a reason for hiding this comment

shoyer left a comment

Choose a reason for hiding this comment

shoyer commented Oct 11, 2016

shoyer commented Oct 11, 2016