ENH: Create the ability for fused operations (fused ufuncs or map_reduce) style #11622

Open · jakirkham opened this issue on Jul 26, 2018 · 9 comments

@jakirkham (Contributor)

It would be nice to have something in NumPy that gives the largest value by magnitude, i.e. abs(a).max(). However, it would be better if it didn't create a second copy of the data the way abs(a) does.

@eric-wieser (Member) commented Jul 26, 2018

"I want to do this chain of operations without intermediate copies" seems to already be a problem solved well by numba - I don't see numpy stepping up in its place any time soon.

See also #8528, which is similar.
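
For illustration, a minimal sketch of the numba route mentioned above (the max_abs name and the 1-D restriction are assumptions for the example, not an established API):

import numba
import numpy as np

@numba.njit
def max_abs(a):
    # Single pass over a 1-D array; no abs(a) temporary is allocated.
    result = 0.0
    for i in range(a.shape[0]):
        m = abs(a[i])
        if m > result:
            result = m
    return result

a = np.random.randn(1_000_000)
assert max_abs(a) == np.abs(a).max()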

@rgommers (Member)

I agree with @eric-wieser that this is not something we would want to introduce a new function for.

If you don't want the memory copy of the full array a, you could do something a bit uglier: max(a.max(), abs(a.min()))
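
A quick check of that identity on a float array (the largest magnitude is attained either by the maximum or by the negated minimum):

import numpy as np

a = np.random.randn(1000)
# a.max() and a.min() each scan `a` without materializing abs(a);
# the outer max() only combines two scalars.
assert max(a.max(), abs(a.min())) == np.abs(a).max()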

@seberg (Member) commented Jul 26, 2018

Numexpr does this too, although I think it supports almost no reductions. This can also be done in a bit of (non-pretty) Python code. If someone has an elegant enough solution, I don't mind adding it to numpy, although it probably might as well be its own sub-package.
I also personally don't necessarily mind adding a few "optimization" ufuncs (i.e. such as this one), but that too could possibly be maintained in its own namespace/package!

@eric-wieser (Member) commented Jul 26, 2018

I don't like the idea of adding ufuncs for a small subset of (op, reduction) pairs. If this is a common enough case, perhaps we should support something like np.ufunc.map_reduce, used as:

max_abs   = np.ufunc.map_reduce(np.abs, np.maximum)
all_equal = np.ufunc.map_reduce(np.equal, np.logical_and)

max_abs(a)       # np.maximum.reduce(np.abs(a)), but more efficient
all_equal(a, b)  # np.logical_and.reduce(np.equal(a, b)), but more efficient

> could possibly be maintained in its own package!

That seems reasonable to me - anyone is free to create their own gufunc that does these things.
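
As a rough sketch of how such a helper might be emulated in pure Python today (the map_reduce name follows the proposal above; the chunked implementation and the blocksize parameter are assumptions, and the reduction is assumed to be associative):

import numpy as np

def map_reduce(map_ufunc, reduce_ufunc, blocksize=2**16):
    def fused(*arrays):
        # Broadcast, then flatten, so all inputs can be walked in
        # fixed-size blocks; the map_ufunc temporary never exceeds
        # `blocksize` elements.
        flat = [np.ravel(a) for a in np.broadcast_arrays(*arrays)]
        n = flat[0].size
        result = None
        for start in range(0, n, blocksize):
            block = map_ufunc(*(a[start:start + blocksize] for a in flat))
            partial = reduce_ufunc.reduce(block)
            result = partial if result is None else reduce_ufunc(result, partial)
        return result
    return fused

max_abs = map_reduce(np.abs, np.maximum)
all_equal = map_reduce(np.equal, np.logical_and)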

@seberg (Member) commented Jul 26, 2018

Yeah, we would have to be very careful to add these sparingly, so probably a bad idea. Someone could make a package outside numpy if they feel it is important ;). Still, such a map_reduce may be something we could think about. Generally, I think such things need someone to pick them up and push them a bit in any case though...

@jakirkham (Contributor, Author)

Sure, having some map_reduce-style operation in NumPy would be very useful.

It's true that there is a plethora of useful things that build on NumPy. However, it's often harder to get a library (especially one you don't own) to take on a new dependency. Requiring NumPy is pretty much a given in these cases, and downstream libraries tend to encourage moving features like this one into NumPy. For a related discussion (and part of my motivation for thinking about this), see scikit-learn/scikit-learn#11681.

@seberg (Member) commented Jul 26, 2018

Well, I think we just need to:

  1. Make a good enough case (showing that enough people are interested).
  2. Think about how exactly to include this type of thing (a map_reduce-like helper, or a new ufunc...).
  3. Make a reasonable case that it does not clutter numpy or hurt maintenance (including not being a band-aid solution that only feels like it should be in numpy).

Frankly, I don't think any of those is necessarily a super high bar, at least for individual functions, but my guess is that most of us currently maintaining numpy are not that interested in figuring them out...

Also, nothing stops us from moving well-maintained external stuff into numpy if everyone feels it is super useful, but at the moment it feels like we are heading towards a separate package rather than away from one. Such a package could be maintained under the numpy umbrella as well, to keep things a bit tighter and make it clearer what we depend on?

If someone wants an idea of how such things can be partially done using nditer (it would be better in the C-API): https://gist.github.com/seberg/30f7bff59b347e71a21571122a5d9245
But I doubt it is a good idea to actually use it, though for huge datasets it might help. AFAIK, numexpr does something similar and probably in a much better way.
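
In the same spirit, a simplified sketch (not the gist's actual code): nditer's buffered mode with external_loop yields contiguous chunks, so the abs() temporary stays bounded by the buffer size:

import numpy as np

def max_abs(a, blocksize=2**14):
    it = np.nditer(a, flags=['buffered', 'external_loop'],
                   buffersize=blocksize, order='K')
    result = None
    for chunk in it:
        # Each chunk holds at most `blocksize` elements, so the abs()
        # temporary stays small regardless of a.size.
        partial = np.abs(chunk).max()
        result = partial if result is None else max(result, partial)
    return result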

@jakirkham (Contributor, Author)

On a more general note about the value of max_abs, it is worth pointing out that there was previous work on a max_abs function (#5509) and that it is frequently used in norms (#5218). For me, this came up in scikit-learn, which has rolled its own version of it. I wouldn't be surprised to encounter it in other machine learning libraries as well.

Side note: a solution like max(a.max(), abs(a.min())) might not be so bad if we had minmax (#9836).
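
With such a hypothetical minmax, the trick would need only a single pass over the data, e.g.:

mn, mx = np.minmax(a)  # hypothetical API from gh-9836, not in NumPy today
max_abs = max(abs(mn), mx)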

@charris (Member) commented Aug 17, 2018

How does scikit-learn handle signed integers?
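
Presumably the concern is the most negative value of a signed type, whose absolute value is not representable, so np.abs wraps around. An illustrative example:

>>> import numpy as np
>>> np.abs(np.array([-128], dtype=np.int8))
array([-128], dtype=int8)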

@seberg changed the title from "Fast, memory efficient absolute value max" to "ENH: Create the ability for fused operations (fused ufuncs or map_reduce) style" on Feb 24, 2021