Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Segfault for classes with deceptive __len__ #7266

Merged
merged 2 commits into from
Feb 19, 2016

Conversation

ahaldane
Copy link
Member

Fixes #7264

@charris charris added this to the 1.11.0 release milestone Feb 17, 2016
@ahaldane
Copy link
Member Author

Hmm, failures in python2 are fixed by removing the code block a few lines above, which has the comment /* Not exactly sure what this is about... */ :) Looking into it...

@ahaldane
Copy link
Member Author

OK, I have to leave this for a little bit.

Here are notes so I remember what I was thinking. The issue is that python2 and python3 give different results for the example in #7264.

The first reason is this block marked /* Not exactly sure what this is about... */, which in python2 causes the C() from the example to count simply as an object type by testing PyInstance_Check. In other words, in python2 the sequence checks are completely skipped, and we drop directly to the object array case. I'm not sure there is an equivalent instancecheck in python3, as "old style" classes/instances were removed.

However, if we remove that block at least we now get a ValueError in both python3 and python2... but they are different ValueErrors. At first python2 and python3 are similar in that they create an array of NPY_DOUBLE (a bit strange itself) in PyArray_GetArrayParamsFromObject but they differ when calculating the number of dimensions of this array.

This is caused by yet another call to PyInstance_Check in python2, here, which also has a warning /* FIXME: XXX -- what is the correct thing to do here? */. In python2 this case is True, so ndim gets set to 0. In python3 this is False, so ndim gets set to 1 further down during sequence checks. That means that, when trying to assign to the array of doubles, they go down different branches here, each giving a different ValueError.

I am tempted to just remove the first block above, and live with the fact that we get different ValueErrors in python2 and python3. However that probably means there is a bug that will turn up when more people convert from python2 to python3.

@ahaldane ahaldane force-pushed the segfault_invalidlenclass branch 2 times, most recently from 5965905 to f98673b Compare February 17, 2016 17:04
@ahaldane
Copy link
Member Author

Fixed up, should be ready. Is a release note needed?

I decided to remove the code blocks involving PyInstance_Check (which only exists in python2). It turns out those were the only two uses in all of numpy! Now python2 and python3 have the same behavior.

This does mean a small change in behavior for python2:

class C:
    def __init__(self):
        pass
    def __getitem__(self, ind):
        if ind in [0,1]:
            return ind
        else:
            raise IndexError()
    def __len__(self):
        return 2

np.array([C()])

Old result:

 array([<__main__.C instance at 0x7fe4ce8eb9e0>], dtype=object)

New result:

 array([[0, 1]])

In words, if a class defines both __len__ and __getitem__, it is now considered a sequence, but it wasn't before.

However, as before, if __len__ is not defined but __getitem__ is, you get an object. Apparently some libraries expect that.

@charris
Copy link
Member

charris commented Feb 17, 2016

@anntzer How did you discover the error?

I think a note would be appropriate, put it in 1.11-notes, as we might as well get it fixed and tested. Note that on Python 2.7 the following class also segfaults

In [3]: class C(object):
    def __getitem__(self, i): raise IndexError
    def __len__(self): return 42
   ...:     

@anntzer
Copy link
Contributor

anntzer commented Feb 17, 2016

I was trying to figure out what rules numpy uses to decide between creating an object array and trying to iterate over the array and get floats out of that (basically my starting point was #7233), so at some point I basically wrote the same class as you did and saw the segfault.

@ahaldane ahaldane force-pushed the segfault_invalidlenclass branch 3 times, most recently from 492c105 to e3df40e Compare February 18, 2016 05:08
That function does not exist in python3, causing differing
behavior in python2 vs python3.

This will slightly affect dtype detection when creating arrays from
user-supplied object, in cases where an object array might be produced.
charris added a commit that referenced this pull request Feb 19, 2016
BUG: Segfault for classes with deceptive __len__
@charris charris merged commit 2112e7d into numpy:master Feb 19, 2016
@charris
Copy link
Member

charris commented Feb 19, 2016

Thanks @ahaldane .

@ahaldane
Copy link
Member Author

Thanks, and as Post Script note: I updated the final version to remove dead code in _use_default_type, and then made sure that we did not lose the detection of 'user-defined' types which the dead code checked for.

charris added a commit to charris/numpy that referenced this pull request Feb 19, 2016
@charris charris removed this from the 1.11.0 release milestone Feb 19, 2016
charris added a commit that referenced this pull request Feb 19, 2016
Backport #7266, BUG: Segfault for classes with deceptive __len__
@ahaldane ahaldane deleted the segfault_invalidlenclass branch January 17, 2018 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants