Changes in PyArray_FromAny between 1.5.x and 1.6.x #291

Closed
mwhansen opened this Issue · 16 comments

8 participants

@mwhansen

In Numpy 1.5.x, we have

sage: f = 0.5
sage: f.__array_interface__
{'typestr': '=f8'}
sage: numpy.array(f)
array(0.5)
sage: numpy.array(float(f))
array(0.5)

In 1.6, we get the following,

sage: f = 0.5
sage: f.__array_interface__
{'typestr': '=f8'}
sage: numpy.array(f)
array(0.500000000000000, dtype=object)

This seems to be due to the changes in PyArray_FromAny introduced in
http://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48.
In particular, _array_find_type used to query our
__array_interface__ attribute, and that lookup no longer seems to work.

It should be reproducible with the following minimal example:

class Foo(object):
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    @property
    def __array_interface__(self):
        return {'typestr': '=f8'}

f = Foo(0.5)
import numpy
numpy.array(f)
@mwiebe mwiebe was assigned
@mwiebe
Owner

In 1.7, there's a different error. It feels like this interface needs a solid set of tests to nail down what it's actually expected to do. In the documentation, it says that shape, typestr, and version are required parameters:

http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#python-side

but neither 1.5 nor 1.6 enforced that. Here's what it gives me in 1.7 master:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
D:\Develop\temp\<ipython-input-1-82f13de30894> in <module>()
     10 f = Foo(0.5)
     11 import numpy
---> 12 numpy.array(f)
     13 

ValueError: Missing __array_interface__ shape

Here's one way of using it that works for me in both 1.6/1.7:

class Foo(str):
    def __init__(self, value):
        super(Foo, self).__init__(self, value)
    def __float__(self):
        return float(self.value)
    @property
    def __array_interface__(self):
        return {'shape': (), 'typestr': '>f8', 'version':3}

f = Foo("\x40\x09\x1e\xb8\x51\xeb\x85\x1f")
import numpy
numpy.array(f)
Out[10]: array(3.14)
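
For reference, the byte string in that example is the big-endian IEEE 754 encoding of 3.14, which is why the '>f8' typestr produces array(3.14). A minimal sketch that decodes the same bytes directly, using the standard struct module and numpy.frombuffer (the variable names are illustrative, not part of the example above):

```python
import struct
import numpy

# Same eight bytes as in the example above.
raw = b"\x40\x09\x1e\xb8\x51\xeb\x85\x1f"

# ">d" means big-endian double, matching the '>f8' typestr.
value = struct.unpack(">d", raw)[0]

# numpy reads the same buffer when told the matching dtype.
arr = numpy.frombuffer(raw, dtype=">f8")
```

This only shows how the buffer is interpreted; it does not go through __array_interface__ at all.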
@njsmith
Owner

I think that Sage wants numpy.array([f, f]) to also give an array of floats, though. That works in 1.5, but not in 1.6 or 1.7, even with Mark's modified version.

@njsmith
Owner

(But like I said on the list, I think 1.5's handling of this was kind of array_interface, so I'm not sure what the best solution for sage's use case is.)

@mwhansen

Yes, the goal would be for numpy.array([f, f]) to give an array of floats.
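
Until the type inference is restored, one workaround that does not depend on how numpy inspects __array_interface__ is to convert each element explicitly. A minimal sketch, assuming the objects implement __float__; the Foo class here is a stand-in for Sage's real number type and deliberately omits __array_interface__:

```python
import numpy

class Foo(object):
    """Stand-in for a scalar type that knows how to become a float."""
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)

f = Foo(0.5)

# Convert explicitly so numpy only ever sees Python floats.
arr = numpy.array([float(x) for x in [f, f]])
```

The cost is that callers have to opt in; it does not restore the automatic behaviour of 1.5.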

@kiwifb

Hi, I have looked at this issue on the Sage side for a while. What strikes me is how differently arange and linspace behave in 1.6.x (I haven't looked at 1.7):

sage: import numpy
sage: numpy.arange(10.0)
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
sage: numpy.linspace(0,9,10)
array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9], dtype=object)

I would have expected the output to be identical, and certainly to have the same type. That was the case in 1.5.x. My point here is that the return type is inconsistent between a function that is coded purely in C in numpy and one that uses some Python. For reference, what happens in linspace is:

def linspace(start, stop, num=50, endpoint=True, retstep=False):
    num = int(num)
    if num <= 0:
        return array([], float)
    if endpoint:
        if num == 1:
            return array([float(start)])
        step = (stop-start)/float((num-1))
        y = _nx.arange(0, num) * step + start
        y[-1] = stop
    else:
        step = (stop-start)/float(num)
        y = _nx.arange(0, num) * step + start
    if retstep:
        return y, step
    else:
        return y

I find it interesting that the two special cases return a float array regardless of input, but in the other cases the type of the returned array is not checked.
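
The effect of an unchecked step type can be sketched without Sage. In the example below, fractions.Fraction is only a stand-in for a scalar type that numpy does not recognise natively (an assumption for illustration; Sage's own types are not used here). Multiplying an integer arange by such a scalar yields an object array, and an explicit conversion is needed to get floats back:

```python
from fractions import Fraction
import numpy

# Stand-in for a non-native scalar such as the step computed in linspace.
step = Fraction(1, 2)
start = Fraction(0)

# Same shape of expression as in linspace: arange(0, num) * step + start.
y = numpy.arange(0, 4) * step + start

# Explicit conversion forces a float array regardless of the scalar type.
y_float = numpy.asarray(y, dtype=float)
```

Here y ends up with dtype object, while y_float is a regular float64 array.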

I also tried changing the __array_interface__ definition in Sage from just {'typestr': '=f8'} to {'shape': (), 'typestr': '=f8', 'version':3}, but that made no difference for that particular example.

@teoliphant
Owner

We should make sure that 1.7 does the same thing here that 1.5 did. Even if this is "off-label" use, it's a reasonable use-case, and if the array factory function used to inspect the array interface to get type information on its objects, then it should continue to do that. This should not have changed. I think this is a blocker for 1.7.

@teoliphant
Owner

The following example breaks on 1.6.x / HEAD --- Updated!

class Foo(object):
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    @property
    def __array_interface__(self):
        return {'typestr': '=f8'}

f = Foo(0.5)
import numpy
result2 = numpy.array(f)
assert result2.dtype == numpy.dtype('f8')
@teoliphant
Owner

What is causing this regression is that PyArray_GetArrayParamsFromObject, called from PyArray_FromAny, has different semantics with respect to __array_interface__ than _array_find_type did, at least for a single object.

It seems like we need to add code into PyArray_GetArrayParamsFromObject that allows only typestr to be queried from the ArrayInterface and returns that type information (instead of the full array) when possible. This relies on later code outside of GetArrayParamsFromObject to actually produce the array. This should likely be done with a separate function call not exposed to the API.
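
The idea of querying only typestr can be sketched in pure Python. This is an illustration of the proposed semantics under stated assumptions, not NumPy's C implementation; the helper names dtype_hint and to_array are hypothetical:

```python
import numpy

def dtype_hint(obj):
    """Return a dtype taken from obj.__array_interface__['typestr'], or None."""
    iface = getattr(obj, "__array_interface__", None)
    if isinstance(iface, dict) and "typestr" in iface:
        return numpy.dtype(iface["typestr"])
    return None

def to_array(obj):
    """Use the typestr only as a type hint; build the array from float(obj)."""
    hint = dtype_hint(obj)
    if hint is not None and hint.kind == "f":
        return numpy.array(float(obj), dtype=hint)
    # No usable hint: fall back to numpy's normal conversion.
    return numpy.array(obj)

class Foo(object):
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    @property
    def __array_interface__(self):
        return {'typestr': '=f8'}

result = to_array(Foo(0.5))
plain = to_array(1.5)
```

The object is never handed to numpy.array directly when a hint is found, so the incomplete interface (no shape, no data) is never interpreted as a buffer description.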

@kiwifb

Thank you for your investigation. I am guessing mwhansen tried his code with numpy-1.6.1, I don't know if he tested 1.6.2. My result with linspace is still present in 1.6.2 and it sounds in line with what you are saying.

@teoliphant
Owner

Yes, all the 1.6 releases will have this problem. This can be fixed.

@87
87 commented

Hmm, it seems that the reason this worked is that the Foo object was coerced into a float without going through the array interface (PyArray_FromInterface) at all. The array interface specifies that the data should be available through the buffer interface, which isn't the case for Foo.

The following works for me:

class Foo(object):
    def __init__(self, value):
        self.value = value
    def __float__(self):
        return float(self.value)
    @property
    def __array_interface__(self):
        return {'typestr': '=f8'}

f = Foo(0.5)

np.array([f,f])
array([ 0.5,  0.5])

But this does not:

np.array(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Missing __array_interface__ shape

When supplying shape information:

np.array(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object

Is this still regarded as a bug? I think that would change the way the array interface was supposed to work.

(See also: documentation)
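
For comparison, an object can satisfy the documented interface fully by delegating to a small backing array, so that shape, typestr, data and version are all present. A minimal sketch, assuming it is acceptable for the object to hold an ndarray internally; the class name FloatBox is hypothetical:

```python
import numpy

class FloatBox(object):
    """Scalar wrapper that exposes a complete __array_interface__."""
    def __init__(self, value):
        # Zero-dimensional float64 array that owns the actual data.
        self._arr = numpy.array(float(value), dtype="f8")
    def __float__(self):
        return float(self._arr)
    @property
    def __array_interface__(self):
        # Reuse the backing array's interface dict (shape, typestr, data, version).
        return self._arr.__array_interface__

out = numpy.array(FloatBox(0.5))
```

This is a different design from the typestr-only hint discussed above: the data really is reachable through the interface, at the cost of carrying an array per object.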

@pcpa

This is now the major source of doctest failures in my "work in progress" sagemath fedora package.
Latest log at http://fedorapeople.org/~pcpa/sagemath/test.log

$ grep "ValueError: Missing __array_interface__ shape" ~/.sage/tmp/test.log | wc -l
132
$ rpm -q numpy
numpy-1.7.0-0.3.b1.fc19.x86_64
@certik
Owner

Thanks @pcpa for reporting the log. It looks like what has to be done to fix this issue is described in the comment by Travis at #291 (comment).

In the meantime, I've added it to the release TODO (#396).

@87 87 referenced this issue
Merged

Fix for issue #291 #444

@pcpa

Many thanks. I confirm it corrects the sagemath issues. Testing a sagemath 5.4.beta1 build. The only minor issue I see is the pattern:

File "/usr/share/sagemath/devel/sage/sage/rings/integer.pyx", line 4821:
sage: numpy.array(2**400).dtype
Expected:
dtype('object')
Got:
dtype('O')

That should be trivial to correct once sagemath updates to numpy 1.7.
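
A check that does not depend on how the dtype is printed is to compare dtype objects rather than their repr. A small sketch (the variable names are illustrative):

```python
import numpy

# 2**400 does not fit any native integer type, so numpy stores it as an object.
dt = numpy.array(2**400).dtype

# Compare against the dtype itself instead of its printed form.
is_object = (dt == numpy.dtype(object))
```

The same comparison holds whether the repr reads dtype('object') or dtype('O').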

@teoliphant teoliphant closed this