Clarification of setxor1d documentation #14670

synapticarbors · 2019-10-09T22:38:23Z

I've been working on adding an implementation of np.setxor1d to numba (numba/numba#4677), and noticed that the documentation might need some clarifications, and I also had some questions about the expected behavior:

The assume_unique argument has the following description:

assume_unique : bool
If True, the input arrays are both assumed to be unique, which can speed up the calculation.
Default is False.

More specifically the code to work properly requires that the inputs are unique, 1D and sorted.

The input arrays ar1 and ar2, according to the docs, should just be "array-like", but a user gets unexpected errors if the arrays have ndim > 1, when assume_unique is True. The function will fail either at the point where the first call to np.concatenate if the two arrays have incompatible dimensions, or at the second call to np.concatenate when there is an attempt to concat with the 1D [True] value.

I'm not sure what the proper way of handling this is. Perhaps just a clarification of the docs, or a call to ravel() if assume_unique is True to get around the concat issue if the N-D arrays do indeed only contain unique values and they are sorted according to the (default?) order.

I'm happy to put in a PR once I get some feedback from the Numpy devs about the approach that they feel like is most appropriate.

cc/ @stuartarchibald

The text was updated successfully, but these errors were encountered:

synapticarbors · 2019-10-24T14:44:10Z

I would appreciate feedback on this issue if anyone has the time to take a quick look. Just re-upping this in case it got lost in the shuffle two weeks ago.

mattip · 2019-12-08T15:00:45Z

I think the intention in setxor1d is that the input be 1d, but only in certain cases does it actually matter. So the choices are:

clearly state in the documentation that ravel will be applied if ndims > 1 (and make the code enforce that)
always error if ndims > 1
The second choice seems like it would break more code, so I would prefer the first one. Other opinions?

stuartarchibald · 2019-12-12T17:24:36Z

@mattip thanks for your input. I think I'd go with option 1. This is also something Numba can do. @synapticarbors do you have a preference?

synapticarbors · 2019-12-12T17:40:27Z

I think option #1 is probably better to avoid breaking existing code.

eric-wieser added the 04 - Documentation label Nov 1, 2019

eric-wieser self-assigned this Nov 1, 2019

eric-wieser removed their assignment May 31, 2020

rossbar added component: numpy.lib 03 - Maintenance labels Jul 22, 2020

This was referenced Mar 21, 2022

[Tracker] Implement all numpy.* APIs in CuPy cupy/cupy#6078

Open

add cupy.setxor1d api cupy/cupy#6582

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarification of setxor1d documentation #14670

Clarification of setxor1d documentation #14670

synapticarbors commented Oct 9, 2019

synapticarbors commented Oct 24, 2019

mattip commented Dec 8, 2019

stuartarchibald commented Dec 12, 2019

synapticarbors commented Dec 12, 2019

Clarification of setxor1d documentation #14670

Clarification of setxor1d documentation #14670

Comments

synapticarbors commented Oct 9, 2019

synapticarbors commented Oct 24, 2019

mattip commented Dec 8, 2019

stuartarchibald commented Dec 12, 2019

synapticarbors commented Dec 12, 2019