Support for axis arguments on reduction functions #1269
Well, as long as we support only a couple of signatures (e.g. …). Another question: can specifying the axis change the return type?
If you mean "can the dtype change if axis is specified?", I don't think so. There are still the handful of reduction functions (like …). One aspect of …
Is there a workaround to pass …
Is it feasible to at least raise an error when summing over one axis? I just spent a long time debugging, and eventually found that …
Re: combinatorial explosion (@pitrou): I often use …

Re: tuples: It may be easiest to transform

```
my_ndarray.sum(axis=(2,3,4)) = my_ndarray.sum(axis=2) \
                                         .sum(axis=3-1) \
                                         .sum(axis=4-1-1)

my_ndarray.sum(axis=(2,3,4), keepdims=True) = my_ndarray.sum(axis=2, keepdims=True) \
                                                        .sum(axis=3, keepdims=True) \
                                                        .sum(axis=4, keepdims=True)
```

This should incur very little overhead, since the array sizes are much smaller for the later … Could an owner please label this issue as …?
Are there any plans on implementing this? For me it would be convenient to have the axis argument for np.mean and np.sum.
I have built the following workaround for 2d arrays:

```python
import numba as nb
import numpy as np


@nb.njit
def np_apply_along_axis(func1d, axis, arr):
    assert arr.ndim == 2
    assert axis in [0, 1]
    if axis == 0:
        result = np.empty(arr.shape[1])
        for i in range(len(result)):
            result[i] = func1d(arr[:, i])
    else:
        result = np.empty(arr.shape[0])
        for i in range(len(result)):
            result[i] = func1d(arr[i, :])
    return result


@nb.njit
def np_mean(array, axis):
    return np_apply_along_axis(np.mean, axis, array)


@nb.njit
def np_std(array, axis):
    return np_apply_along_axis(np.std, axis, array)
```

This allows using np_mean/np_std instead of np.mean/np.std with axis support in numba.
Since a bunch of reduction functions have been implemented lately (like …)
Adding to the chorus here – the axis argument for numpy aggregation functions is crucial for so many applications. Numba is already an awesome project; it would be great to have this added in …
Hi, any news on this side? The only thing which prevents me from decorating almost all of my utility functions with @njit is numba's lack of support for the axis parameter. Thank you for all the hard work!
Hi, I ended up here looking for solutions for error messages using keepdims in sum. Allowing both axis and keepdims would be a great feature. 👍 I managed a workaround for keepdims using array.sum(axis=0).reshape((1, -1)) and array.sum(axis=1).reshape((-1, 1)) for 2d arrays. It is not the nicest solution, but numba compiled it at least, and I got some speedup for my function. I guess this should work for other numpy functions as well. Although I have not tried, I guess this could extend to nd arrays with axis=sometuple if one does something like @fritzo explains above, and handles the dimensions to reshape something like …
Hope this can be of some help to someone :)
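The reshape workaround described above can be verified in plain NumPy for 2d arrays:

```python
# reshape after a single-axis sum reproduces keepdims=True for 2d arrays
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

assert np.array_equal(a.sum(axis=0).reshape((1, -1)), a.sum(axis=0, keepdims=True))
assert np.array_equal(a.sum(axis=1).reshape((-1, 1)), a.sum(axis=1, keepdims=True))
```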
Inspired by the above comment by @joelrich, here is a way to apply numpy functions (e.g. …)
Any news? Is there a plan to add …?
@disadone thanks for asking about this. This is a long-standing feature request which is both a lot of work to implement and also probably not too straightforward. As such, it is still awaiting a champion.
@joelrich's comment is really neat. However, I got a problem with certain kinds of functions like …; the only change from the version above is passing dtype=arr.dtype when allocating the result:

```python
import numba as nb
import numpy as np


@nb.njit
def np_apply_along_axis(func1d, axis, arr):
    assert arr.ndim == 2
    assert axis in [0, 1]
    if axis == 0:
        # allocate with the input's dtype instead of the float64 default
        result = np.empty(arr.shape[1], dtype=arr.dtype)
        for i in range(len(result)):
            result[i] = func1d(arr[:, i])
    else:
        result = np.empty(arr.shape[0], dtype=arr.dtype)
        for i in range(len(result)):
            result[i] = func1d(arr[i, :])
    return result
```
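The effect of that dtype argument can be seen in plain NumPy: np.empty defaults to float64, so with the earlier version integer inputs were silently promoted:

```python
import numpy as np

arr = np.arange(6, dtype=np.int32).reshape(2, 3)

default_result = np.empty(arr.shape[1])                  # float64, as in the earlier version
typed_result = np.empty(arr.shape[1], dtype=arr.dtype)   # int32, matching the input

assert default_result.dtype == np.float64
assert typed_result.dtype == np.int32
```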
No progress yet?
I am super new to numba but I know my way around numpy. The following is a pure Python alternative to …:

```python
from typing import Tuple, Union

import numpy as np


def reduce(x: np.ndarray, reduce_op: np.ufunc, *,
           axis: Union[int, Tuple[int]] = None, keepdims: bool = False) -> np.ndarray:
    if axis is None:
        axis = tuple(range(x.ndim))
    if isinstance(axis, int):
        axis = (axis,)

    # tuple to list for indexing semantics
    axis = [*axis]
    reduced_axes = np.zeros(x.ndim, dtype=bool)
    reduced_axes[axis] = True
    original_shape = np.array(x.shape)
    new_shape = original_shape[~reduced_axes]

    # this could be reversed, but we are calling a reduction op on it anyway
    new_axes = -np.arange(1, len(axis) + 1)
    x = np.moveaxis(x, axis, new_axes)

    total_reduce = np.prod(original_shape[axis])
    total_keep = np.prod(new_shape)
    x = np.reshape(x, (total_keep, total_reduce))

    result = np.empty((total_keep,), dtype=x.dtype)
    for idx in range(result.size):
        result[idx] = reduce_op(x[idx, ...])

    if keepdims:
        new_shape = original_shape
        new_shape[axis] = 1
    return np.reshape(result, new_shape)
```

with the following test snippet

```python
# import the reduce function
import numpy as np

shape = (1, 3, 4, 5)
x = np.arange(np.prod(shape)).reshape(shape)

keepdims = False
axes = (-1, -2, 1)

result = reduce(x, np.sum, axis=axes, keepdims=keepdims)
expected = np.sum(x, axis=axes, keepdims=keepdims)
assert np.allclose(result, expected)
```

and it should be easy to turn into a numba function ... except that I don't know how to use numba properly, so I'm fighting my way through beginner mistakes at the moment (enforcing consistent types, etc.). I will update this post once I get it working.

Update: So apparently …

On a tangent, I am quite interested in the idea of testing/choosing numba as the accelerator for scikit-bot instead of going down the cython or raw C route 🚀. ~90% of my performance-critical code relies on the ability to use …
For future reference, here is a numba implementation of a generic reduction op with support for …:

```python
import numba
import numpy as np
# location of to_fixed_tuple in recent numba versions
from numba.np.unsafe.ndarray import to_fixed_tuple


@numba.jit(nopython=True)
def reduce(x: np.ndarray, axis: np.ndarray, keepdims: bool) -> np.ndarray:
    # replace with your favourite reduction op
    reduce_op = np.sum

    if keepdims is False:
        raise NotImplementedError("Numba can't np.squeeze yet.")

    mask = np.zeros(x.ndim, dtype=np.bool8)
    mask[axis] = True
    original_shape = np.array(x.shape)
    squeezed_shape = original_shape[~mask]

    # this could be reversed, but we are calling a reduction op on it anyway
    new_axes = -np.arange(1, len(axis) + 1)

    # note that this will copy if reduction happens along a non-contiguous axis
    x_work = np.moveaxis(x, axis, new_axes)
    x_work = np.ascontiguousarray(x_work)

    total_reduce = np.prod(original_shape[axis])
    total_keep = np.prod(squeezed_shape)
    tmp_shape = to_fixed_tuple(np.array((total_keep, total_reduce)), 2)
    x_work = np.reshape(x_work, tmp_shape)

    result = np.empty((total_keep,), dtype=x_work.dtype)
    for idx in range(result.size):
        result[idx] = reduce_op(x_work[idx, ...])

    new_shape = original_shape.copy()
    new_shape[axis] = 1
    new_shape_tuple = to_fixed_tuple(new_shape, x.ndim)
    return np.reshape(result, new_shape_tuple)
```

This makes use of …
I just wanted to mention that I also need the axis keyword in …
Guessing here: probably this is technically feasible and would be a meaningful first step to improving https://github.com/numba/numba/blob/main/numba/np/arraymath.py#L165-L351. Just in case you want to take a closer look 😉
Hello, is there any progress on getting axis support for amax/max?
I have a patch that adds … The issue is, due to the complexity of the approach, the function implementations take a performance hit, and it ends up slowing the overall testing framework (this can be noticed in the runtime of the patch CI tests). The function used to generate the intermediate indices transposes the array according to … (see lines 745 to 786 at commit 2ca22c2).
The other approach to this problem would be to build reduction-specific iterators (similar to the …).
I have a similar issue with …
but still get some errors:
Do you have a solution / an idea why this is happening? One solution would be …
but it would not fit in the above framework.
While I was looking for a solution to the same problem and appreciated the previous suggestions, I soon realized that most calculations can be decomposed into additions and multiplications. Using this fact, basic stat functions can be made from …
The variance function can be defined using the formula var = mean(x**2) - mean(x)**2:
The standard deviation:
Note: a tuple axis is not supported.
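The original comment's code blocks did not survive extraction; here is an illustrative plain-NumPy sketch of the described decomposition (function names are made up), building mean, variance, and standard deviation from single-axis sums:

```python
import numpy as np


def mean_1axis(x, axis):
    # mean as a sum divided by the length of the reduced axis
    return x.sum(axis=axis) / x.shape[axis]


def var_1axis(x, axis):
    # var = mean(x**2) - mean(x)**2
    return mean_1axis(x * x, axis) - mean_1axis(x, axis) ** 2


def std_1axis(x, axis):
    return np.sqrt(var_1axis(x, axis))


x = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.allclose(mean_1axis(x, 0), x.mean(axis=0))
assert np.allclose(var_1axis(x, 1), x.var(axis=1))
assert np.allclose(std_1axis(x, 1), x.std(axis=1))
```

Note that the mean(x**2) - mean(x)**2 formulation can lose precision for data with a large mean relative to its spread; np.var uses the numerically safer two-pass formula.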
@lucasjinreal Nobody is stopping you 🤷🏻
@lucasjinreal Please stop this.
@lucasjinreal I meant no insult; I was just trying to keep it constructive. And I'm afraid that I can't answer your questions, since the presumptions are false.
Can't you follow my suggestion? |
Hi, just stumbled upon this problem, too. Can we hope to have this implemented by at least 2030? )
Pull-requests welcome! 😈 |
Now that we support array expressions, I'm finding that I really want to be able to pass the axis argument to reduction functions like np.sum(), np.mean(), etc. This raises the issue again of how to do dispatch based on argument values, but it is becoming much more critical now that I can so easily create arrays to reduce in nopython mode.