Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contains method is not consistent for subarrays #3016

Open
seberg opened this issue Feb 25, 2013 · 8 comments
Open

Contains method is not consistent for subarrays #3016

seberg opened this issue Feb 25, 2013 · 8 comments
Labels
00 - Bug component: numpy._core Priority: high High priority, also add milestones for urgent issues

Comments

@seberg
Copy link
Member

seberg commented Feb 25, 2013

The __contains__ method is written to be used for a single array element. However for example in list of list, __contains__ does a check more equivalent to subarrays. in must return a single boolean.

After some discussion on the list (nabble), there are three main possibilities:

  1. The first item must be an element. That means that an array a for a in b will normally be a simple error. (As nathaniel mentioned on the list).
  2. Do a list of list like comparison. I.e. in operates on the first dimension.
  3. Do some kind of subarray matching (there are many different versions of this allowing different things)

Point 2. seems wrong, since arrays are not list of lists. Point 3. has some merit, it can go as far as allowing things similar to strings 'a' in 'cat', however there are some problems with the details. Point 1. is the simplest and safest solution. One problem with 2. is that for object arrays it can be not quite clear how to interpret for example a tuple/list.

At this time (there was not much discussion yet though), it seems that the best solution is to just raise an error (i.e. solution 1.). Finding subarrays is better suited for a dedicated function.

@charris
Copy link
Member

charris commented Feb 25, 2013

This might be worth raising on the list.

@eric-wieser
Copy link
Member

eric-wieser commented Dec 9, 2017

An example of this giving a meaningless result:

>>> np.arange(3) in np.eye(3)
True

Implementing one option from the mailing list (@seberg's):

Another way of
seeing this would be ignoring one sized dimensions in a for the sake
of defining its "element". This would allow:

In [1]: b = np.arange(10).reshape(5,2) 

In [2]: b 
Out[2]: 
array([[0, 1], 
       [2, 3], 
       [4, 5], 
       [6, 7], 
       [8, 9]]) 

In [3]: a = np.array([[0, 1]]) # extra dimensions at the start 

In [4]: a in b 
Out[4]: True 

# But would also allow transpose, since now the last axes is a dummy: 
In [5]: a.T in b.T 
Out[5]: True 

Those two examples could also be a shape mismatch error, I tend to think
they are reasonable enough to work, but then the user could just
reshape/transpose to achieve the same.

Gives us:

def __contains__(self, other):
    other = np.asanyarray(other)
    ndim = min(self.ndim, other.ndim)
    eq = self == other
    matched_axes = []
    for ax in range(-ndim, 0):
        if other.shape[ax] == self.shape[ax]:
            matched_axes.append(ax)
        elif other.shape[ax] > self.shape[ax]:
            return False
    return eq.all(axis=matched_axes).any()

Which looks to me like a sensible variant of option 3

@seberg
Copy link
Member Author

seberg commented Dec 9, 2017

I personally think we should go with the "item must be an element" solution, then maybe create a fancy function for more complex stuff.

@eric-wieser
Copy link
Member

Would you be able to link to the past mailing list discussion?

Either way, if we're going to change behaviour, we'd need to pass through the single-element-or-FutureWarning path first.

@seberg
Copy link
Member Author

seberg commented Dec 9, 2017

http://numpy-discussion.10968.n7.nabble.com/What-should-np-ndarray-contains-do-td32964.html

I guess this is the one from that time, I really don't remember much about where it went, maybe I was just to lazy to do it, or a bit too confused about the object array case and then lost interest.

@seberg
Copy link
Member Author

seberg commented Dec 9, 2017

@eric-wieser
Copy link
Member

Thanks - updated the top post with that, and found the suggestion in that thread that matches my implementation.

@shoyer
Copy link
Member

shoyer commented Dec 9, 2017

I would also only support single elements, and raise an error for higher dimensional keys. It is hard to understand the current behavior as anything other than a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: numpy._core Priority: high High priority, also add milestones for urgent issues
Projects
None yet
Development

No branches or pull requests

4 participants