n-array versions of set operations, especially intersect1d #5179

Closed
cbrueffer opened this issue Oct 13, 2014 · 9 comments

@cbrueffer
Contributor

The stock Python set operations support multiple inputs, while the NumPy equivalents only support two arrays.

A version of intersect1d supporting n arrays was posted to numpy-discussion back in 2009: http://numpy-discussion.10968.n7.nabble.com/intersect1d-for-N-input-arrays-td25956.html

The posted multi-array version works well for me; it would be great to see it (or a variant) in NumPy.
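
For reference, a minimal sketch of a sort-based n-array intersection in the spirit of that mailing-list post (the name intersect_n and this implementation are illustrative, not the posted code; it assumes each input array already has unique elements):

import numpy as np

def intersect_n(*arrays):
    # Illustrative sketch: assumes at least one input array and
    # unique elements within each array (the assume_unique case).
    m = len(arrays)
    if m == 1:
        return np.sort(arrays[0])
    stacked = np.concatenate(arrays)
    stacked.sort()
    # After sorting, a value present in all m arrays fills exactly
    # m consecutive slots, so compare each element with the one
    # m - 1 positions ahead.
    head = stacked[:-(m - 1)]
    return head[head == stacked[m - 1:]]

# e.g. intersect_n(np.array([1, 2]), np.array([2, 3]))  ->  array([2])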

@njsmith
Member

njsmith commented Oct 13, 2014

I can't think why we would have any objection to these in principle, though for the usual free-software reasons I also doubt anyone will jump up to do it for you. Want to prepare a pull request?

@cbrueffer
Contributor Author

Absolutely. Before I do that, do you have an opinion on the issues raised in the referenced email thread? I suppose for API stability the n-array versions should be added as separate functions, but should they be implemented by repeatedly calling the respective two-array versions, or is something like the provided implementation preferable?

@njsmith
Member

njsmith commented Oct 13, 2014

API issues should be raised on numpy-discussion for the community to weigh in on (most people don't read the bug tracker).

I guess it's possible that there are people passing the assume_unique argument as a positional arg, in which case yeah, you might need to make a new function.

I don't really care about the implementation much myself, since I'm not the one who'll be using it :-), and implementations are easily fixable later as long as the API is good.

@jaimefrio
Member

Well, since the method, both old and new, is sorting-based, when you intersect m arrays of n elements each, the whole operation is O(m*n*log(m*n)). If, on the other hand, you do m-1 pairwise intersections, you end up with a worst case of O((m-1)*2*n*log(2*n)). And that worst case only happens if all arrays contain the exact same items, which is probably not the most typical of cases.

My point is that there probably is no major advantage to what you propose over something like reduce(np.intersect1d, arrays), and it may even perform worse in many typical use cases. So saving a few keystrokes doesn't seem to justify adding anything to the API.
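
To illustrate the reduce-based approach (on Python 3, reduce must be imported from functools; it was a builtin on the Python 2 current at the time):

from functools import reduce
import numpy as np

arrays = [np.array([1, 2, 3, 4]),
          np.array([2, 3, 4, 5]),
          np.array([3, 4, 5, 6])]

# Left fold of pairwise intersections:
# intersect1d(intersect1d(arrays[0], arrays[1]), arrays[2])
reduce(np.intersect1d, arrays)  # -> array([3, 4])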

@cbrueffer
Contributor Author

Wow, I wasn't aware of reduce() until now, thanks for the hint! I agree this is a better solution. I'll submit a patch which adds an example of this instead.

@njsmith
Member

njsmith commented Oct 13, 2014

Built-in reduce annoyingly requires an identity, so I think it'd be

reduce(np.intersect1d, arrays, [])

@cbrueffer
Contributor Author

The third argument is the initializer, so adding [] results in the first intersection being between [] and array1 (if I've understood the API docs correctly), resulting in []. Not exactly what's intended :-)
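
A quick check of that pitfall (illustrative):

from functools import reduce
import numpy as np

arrays = [np.array([1, 2, 3]), np.array([2, 3, 4])]

# The initializer is the left operand of the first call, so every
# element ends up intersected with the empty array:
reduce(np.intersect1d, arrays, [])  # -> an empty array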

@njsmith
Member

njsmith commented Oct 13, 2014

Doh, I was thinking of union, obviously. Well, give it a try - it's possible you'll have to do something awkward like

reduce(intersect1d, arrs[1:], arrs[0])

but I'm on my phone, so don't take my word for it (as if you would after my last suggestion...).
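
That form does work, and it matches the two-argument call: without an initializer, reduce seeds the fold with the first element. A quick illustration:

from functools import reduce
import numpy as np

arrs = [np.array([1, 2, 3]), np.array([2, 3]), np.array([2, 4])]

reduce(np.intersect1d, arrs[1:], arrs[0])  # -> array([2])
reduce(np.intersect1d, arrs)               # -> array([2]), same fold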

@cbrueffer
Contributor Author

reduce(np.intersect1d, arrays) as suggested by Jaime actually works perfectly for me. Thanks to both of you for the input!

cbrueffer added a commit to cbrueffer/numpy that referenced this issue Oct 15, 2014
The approach was suggested by Jaime Frio in issue numpy#5179.