Series no longer returns float64 #510

Closed
craustin opened this issue Dec 20, 2011 · 5 comments

Comments

@craustin
commented Dec 20, 2011

Is this desired?

import numpy as np
from pandas import Series
s2 = Series({'A': np.float64(5.0), 'B': np.float64(0.0)})
print type(s2['A'])

<type 'float'>

In 0.4.0, this returned numpy.float64. That seems more expected.

@wesm

Member

commented Dec 20, 2011

Well, in theory it shouldn't matter. Under the hood you have a C double and the question is "which kind of box was it put in"? With some of the performance work recently I needed a fast generic __getitem__ for ndarrays, which can be found here:

https://github.com/wesm/pandas/blob/master/pandas/src/util.pxd#L12

cdef inline object get_value_at(ndarray arr, object loc):
    cdef:
        Py_ssize_t i, sz
        void* data_ptr
    if is_float_object(loc):
        casted = int(loc)
        if casted == loc:
            loc = casted
    i = <Py_ssize_t> loc
    sz = cnp.PyArray_SIZE(arr)

    if i < 0:
        i += sz
    elif i >= sz:
        raise IndexError('index out of bounds')
    data_ptr = cnp.PyArray_GETPTR1(arr, i)
    return cnp.PyArray_GETITEM(arr, data_ptr)

It turned out (and I noticed this when I was doing it) that PyArray_GETITEM wants to box float64 objects as Python floats. Since float64 inherits from float:


In [9]: np.float64.mro()                                                                  
Out[9]:                                                                                   
[numpy.float64,                                                                           
 numpy.floating,                                                                          
 numpy.inexact,                                                                           
 numpy.number,                                                                            
 numpy.generic,                                                                           
 float,                                                                                   
 object]

I decided I was willing to live with this for the speed gains from the above function. Was it a source of bugs? Just curious.
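The "which box" distinction can also be seen from plain Python, without going through the C API. This is a minimal sketch assuming only NumPy; `arr` and the variable names are illustrative and not part of pandas:

```python
import numpy as np

arr = np.array([5.0, 0.0])  # float64 ndarray with illustrative data

boxed_numpy = arr[0]        # regular indexing boxes the C double as np.float64
boxed_python = arr.item(0)  # ndarray.item boxes the same C double as a Python float

print(type(boxed_numpy).__name__)
print(type(boxed_python).__name__)

# np.float64 subclasses float, so isinstance checks pass for both boxes
both_are_floats = isinstance(boxed_numpy, float) and isinstance(boxed_python, float)
same_value = boxed_numpy == boxed_python
```

The two objects compare equal and both pass `isinstance(..., float)`, so the difference only shows up in code that checks the exact type or relies on NumPy-specific arithmetic behavior.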

@craustin

Author

commented Dec 20, 2011

Gotcha. It causes only a few issues for us: when x is zero, 1. / x == inf if x is a float64, but raises a ZeroDivisionError if x is a float. We can work around it in the instances where we expect this to happen.
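The difference in division behavior can be reproduced with a short sketch. It assumes only NumPy; `reciprocal` is a hypothetical helper written for this illustration, and re-boxing with `np.float64(...)` is just one possible workaround, not necessarily the one used in the reporter's code:

```python
import numpy as np

def reciprocal(x):
    """Return 1. / x, or the name of the exception raised instead."""
    try:
        # Silence NumPy's divide-by-zero RuntimeWarning for this demo only
        with np.errstate(divide='ignore'):
            return 1. / x
    except ZeroDivisionError as exc:
        return type(exc).__name__

numpy_result = reciprocal(np.float64(0.0))  # NumPy scalar: IEEE 754 result
python_result = reciprocal(0.0)             # Python float: raises

# One possible workaround: re-box the value as float64 before dividing
x = 0.0
reboxed_result = reciprocal(np.float64(x))

print(numpy_result, python_result, reboxed_result)
```

Without the `np.errstate` context, the NumPy division still returns `inf` but also emits a `RuntimeWarning`.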

@wesm

Member

commented Dec 21, 2011

Craig, I went back and looked at this and figured out the right way to use the NumPy C API. The current git master returns float64 as before, and the performance is about the same (within, say, 50 nanoseconds), which is perfectly acceptable.

@wesm wesm closed this Dec 21, 2011

@craustin

Author

commented Dec 21, 2011

Great Wes, thanks.

@wesm

Member

commented Dec 21, 2011

As an aside, another reason to use float64 is that it avoids entering Python's scalar value memory allocation nightmare (where internal "free lists" can end up consuming a lot of memory). This is particularly problematic when reading lots of data from a database, since the DB drivers convert first to Python float/int objects, which are then converted to NumPy arrays. Not sure how much you have looked into this.
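One way to limit how many Python float objects are alive at once during such a conversion is to stream values into a preallocated array rather than building an intermediate list. The sketch below is only an illustration under assumptions: `fake_db_rows` is a hypothetical stand-in for a DB cursor, and how much this helps depends on the driver and the interpreter version.

```python
import numpy as np

def fake_db_rows(n):
    """Hypothetical stand-in for a DB cursor yielding (id, value) rows."""
    for i in range(n):
        yield (i, float(i) * 0.5)  # drivers typically hand back Python floats

n_rows = 1000

# np.fromiter consumes the generator one item at a time and writes each value
# straight into a float64 buffer, so no list of Python floats is built first.
values = np.fromiter(
    (row[1] for row in fake_db_rows(n_rows)),
    dtype=np.float64,
    count=n_rows,  # lets NumPy allocate the output array once
)

print(values.dtype, values.shape)
```

Elements read back with `values[i]` are then boxed as `np.float64`, not as Python floats.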

2 participants