Contains method is not consistent for subarrays #3016

seberg · 2013-02-25T12:29:48Z

The __contains__ method is written to be used for a single array element. However, in a list of lists, for example, __contains__ does a check that is closer to subarray matching. Either way, in must return a single boolean.
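A minimal sketch of the inconsistency, for illustration only. It assumes that `in` on an ndarray currently behaves like `(arr == item).any()`, which is my understanding of the existing behaviour; the variable names are made up for this example.

```python
import numpy as np

rows = [[0, 1], [2, 3]]
arr = np.array(rows)

# List of lists: `in` compares the candidate against each first-level item.
list_has_row = [0, 1] in rows        # [0, 1] is one of the rows
list_has_mixed = [0, 3] in rows      # [0, 3] is not a row
list_has_scalar = 0 in rows          # a scalar is not a first-level item

# ndarray: the candidate is broadcast and any single matching element counts.
arr_has_row = np.array([0, 1]) in arr
arr_has_mixed = np.array([0, 3]) in arr
arr_has_scalar = 0 in arr
```

The list check is False for `[0, 3]` and for `0`, while the array check is True for both, so the two containers disagree on every case except an exact row.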

After some discussion on the list (nabble), there are three main possibilities:

1. The first item must be an element. That means that for an array a, a in b will normally be a simple error (as Nathaniel mentioned on the list).
2. Do a list-of-lists-like comparison, i.e. in operates on the first dimension.
3. Do some kind of subarray matching (there are many different versions of this, allowing different things).

Point 2 seems wrong, since arrays are not lists of lists. Point 3 has some merit: it can go as far as allowing things similar to strings ('a' in 'cat'), but there are some problems with the details. Point 1 is the simplest and safest solution. One problem with point 2 is that for object arrays it is not quite clear how to interpret, for example, a tuple or list.

At this time (there was not much discussion yet though), it seems that the best solution is to just raise an error (i.e. solution 1.). Finding subarrays is better suited for a dedicated function.
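A rough sketch of what "item must be an element" could look like as a standalone function. The name `contains_element` and the choice of `TypeError` are assumptions for illustration, not anything decided in this thread, and the object-array case mentioned above is deliberately left out.

```python
import numpy as np

def contains_element(arr, item):
    """Hypothetical helper: accept only scalar-like items (option 1)."""
    if np.ndim(item) != 0:
        # Assumption: the thread only says "raise an error", not which one.
        raise TypeError(
            "item must be a single element, got ndim=%d" % np.ndim(item)
        )
    return bool((np.asarray(arr) == item).any())

found_one = contains_element(np.eye(3), 1.0)
found_five = contains_element(np.eye(3), 5)
```

With this rule, `contains_element(np.eye(3), np.arange(3))` raises instead of returning a value that is hard to interpret.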


charris · 2013-02-25T14:44:07Z

This might be worth raising on the list.

eric-wieser · 2017-12-09T01:15:52Z

An example of this giving a meaningless result:

>>> np.arange(3) in np.eye(3)
True
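To spell out why this comes back True, here is a small sketch, again assuming the current check is equivalent to `(b == a).any()`. The row-wise check is shown only for contrast.

```python
import numpy as np

a = np.arange(3)   # [0, 1, 2]
b = np.eye(3)      # identity matrix

# `a` is broadcast against every row of `b`, element by element.
eq = b == a
# eq is [[False, False, False],
#        [ True,  True, False],
#        [ True, False, False]]

any_element_matches = bool(eq.any())            # what `a in b` reports today
any_row_matches = bool(eq.all(axis=-1).any())   # is `a` actually a row of `b`?
```

A few elements happen to coincide after broadcasting, so the result is True even though `[0, 1, 2]` is not a row of the identity matrix.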

Implementing one option from the mailing list (@seberg's):

Another way of seeing this would be ignoring one sized dimensions in a for the sake of defining its "element". This would allow:
In [1]: b = np.arange(10).reshape(5,2) 

In [2]: b 
Out[2]: 
array([[0, 1], 
       [2, 3], 
       [4, 5], 
       [6, 7], 
       [8, 9]]) 

In [3]: a = np.array([[0, 1]]) # extra dimensions at the start 

In [4]: a in b 
Out[4]: True 

# But would also allow transpose, since now the last axis is a dummy:
In [5]: a.T in b.T 
Out[5]: True 
Those two examples could also be a shape mismatch error. I tend to think they are reasonable enough to work, but then the user could just reshape/transpose to achieve the same.

Gives us:

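def __contains__(self, other):
    other = np.asanyarray(other)
    # Only the trailing axes that both arrays have are compared.
    ndim = min(self.ndim, other.ndim)
    eq = self == other
    matched_axes = []
    for ax in range(-ndim, 0):
        if other.shape[ax] == self.shape[ax]:
            # Same length: every element along this axis has to match.
            matched_axes.append(ax)
        elif other.shape[ax] > self.shape[ax]:
            # other is longer than self along this axis, so it cannot fit.
            return False
    # axis has to be a tuple (a list is not accepted by reductions).
    return eq.all(axis=tuple(matched_axes)).any()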

Which looks to me like a sensible variant of option 3.
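For anyone who wants to try this without patching ndarray, the same logic can be written as a free function. This is only a sketch: the name `contains_subarray` is made up here, and the axis list is converted to a tuple because, as far as I know, reductions do not accept a list for `axis`.

```python
import numpy as np

def contains_subarray(arr, other):
    """Hypothetical free-function version of the __contains__ above."""
    arr = np.asanyarray(arr)
    other = np.asanyarray(other)
    ndim = min(arr.ndim, other.ndim)
    eq = arr == other
    matched_axes = []
    for ax in range(-ndim, 0):
        if other.shape[ax] == arr.shape[ax]:
            matched_axes.append(ax)
        elif other.shape[ax] > arr.shape[ax]:
            return False
    return bool(eq.all(axis=tuple(matched_axes)).any())

b = np.arange(10).reshape(5, 2)
a = np.array([[0, 1]])

row_found = contains_subarray(b, a)               # example In [4]
transposed_found = contains_subarray(b.T, a.T)    # example In [5]
mixed_found = contains_subarray(b, np.array([[0, 3]]))
eye_case = contains_subarray(np.eye(3), np.arange(3))
```

Both examples from the mailing list come out True, while `[0, 3]` and the `np.arange(3) in np.eye(3)` case from earlier in this comment come out False.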

seberg · 2017-12-09T10:52:44Z

I personally think we should go with the "item must be an element" solution, then maybe create a fancy function for more complex stuff.

eric-wieser · 2017-12-09T18:32:57Z

Would you be able to link to the past mailing list discussion?

Either way, if we're going to change behaviour, we'd need to pass through the single-element-or-FutureWarning path first.
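A sketch of what such a transition step might look like, written as a standalone function. The name `contains_transitional` and the warning text are invented for illustration; this is not NumPy's actual deprecation code.

```python
import warnings
import numpy as np

def contains_transitional(arr, item):
    """Hypothetical shim: keep today's result, warn on non-scalar items."""
    arr = np.asarray(arr)
    if np.ndim(item) != 0:
        warnings.warn(
            "non-scalar item in a containment check; "
            "this may become an error in the future",
            FutureWarning,
            stacklevel=2,
        )
    # Assumed current semantics, left unchanged during the transition.
    return bool((arr == item).any())
```

Scalar lookups stay silent, and array-valued lookups keep returning what they return today while telling the user that the behaviour is expected to change.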

seberg · 2017-12-09T18:35:28Z

http://numpy-discussion.10968.n7.nabble.com/What-should-np-ndarray-contains-do-td32964.html

I guess this is the one from that time. I really don't remember much about where it went; maybe I was just too lazy to do it, or a bit too confused about the object array case, and then lost interest.

seberg · 2017-12-09T18:35:57Z

Better link probably: https://mail.python.org/pipermail/numpy-discussion/2013-February/065572.html

eric-wieser · 2017-12-09T18:55:23Z

Thanks - updated the top post with that, and found the suggestion in that thread that matches my implementation.

shoyer · 2017-12-09T23:52:13Z

I would also only support single elements, and raise an error for higher dimensional keys. It is hard to understand the current behavior as anything other than a bug.

paul-the-noob mentioned this issue Dec 8, 2017

__contains__: erroneous broadcasting when operand is a list #10179

Open

shoyer mentioned this issue Apr 16, 2018

__contains__ does not work with DataArray pydata/xarray#2062

Closed

zou3519 mentioned this issue Aug 14, 2019

Consider changing the behavior of Tensor.__contains__(Tensor) to make more sense pytorch/pytorch#24338

Open

ewmoore mentioned this issue May 7, 2020

Strange behaviour when asking whether list is in numpy array of lists #16181

Closed

seberg added the 00 - Bug, Priority: high, and component: numpy._core labels May 11, 2020

sadielbartholomew mentioned this issue Feb 15, 2022

dask: Data.__contains__ NCAS-CMS/cf-python#320

Merged

seberg mentioned this issue Jul 12, 2022

Usage of in statement with arrays produces ambiguous behaviour to the user #21933

Open
