arr.dtype.type has different hashes #4779

Open
jreback opened this Issue Jun 4, 2014 · 2 comments

@jreback

Not sure if this is supposed to be like this.

original issue here: pydata/pandas#7332
cross post to numexpr here: https://code.google.com/p/numexpr/issues/detail?id=126

Essentially, in pandas we were looking up a dtype.type in a Cython dictionary.

It turns out that for int64 (and for int32, but NOT int64, on 32-bit platforms) the hashes are DIFFERENT, while they are the same for other dtypes (including float64).

Is this an implementation detail of numexpr, incorrect usage of dtype.type on our side, or a guarantee that this object does not actually make?

FYI, we switched to using dtype.name for the lookup and have seen no issues.
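
A minimal sketch of the dtype.name lookup mentioned above, not the actual pandas code: the `HANDLERS` table and the `lookup` helper are hypothetical names, and the sketch assumes a platform where C `long long` is 64 bits wide, so that `np.longlong` arrays report the name `int64`.

```python
import numpy as np

# Hypothetical lookup table, keyed by the dtype's name string
# instead of by the scalar type object (dtype.type).
HANDLERS = {
    "int64": "int64 path",
    "float64": "float64 path",
}

def lookup(arr):
    """Pick a handler from the array's dtype name."""
    return HANDLERS[arr.dtype.name]

# Two arrays that hold the same native 64-bit integers but may be
# backed by different scalar type objects, depending on the platform.
via_int64 = np.arange(10, dtype=np.int64)
via_longlong = np.arange(10, dtype=np.longlong)

# The name is derived from kind and bit width, so both resolve
# to the same dictionary entry.
assert lookup(via_int64) == lookup(via_longlong)
```

Keying on the name string sidesteps the question of which type object backs the array, at the cost of a string comparison instead of an identity-based hash.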

In [20]: import numexpr as ne
In [21]: import numpy as np
In [22]: ne.__version__
Out[22]: '2.4'
In [23]: np.__version__
Out[23]: '1.8.1'

In [24]: a = np.arange(10,dtype='int64')
In [25]: b = np.arange(10,dtype='int64')

In [26]: result_ne = ne.evaluate('a+b')
In [27]: result_numpy = a+b

In [28]: (result_ne == result_numpy).all()
Out[28]: True

In [29]: result_ne.dtype.type
Out[29]: numpy.int64

In [30]: result_numpy.dtype.type
Out[30]: numpy.int64

In [31]: hash(result_ne.dtype.type)
Out[31]: 8768103730016

In [32]: hash(result_numpy.dtype.type)
Out[32]: 8768103729990

For floats the hashes are the same, though:

In [1]: a = np.arange(10.)

In [2]: b = np.arange(10.)

In [4]: hash(ne.evaluate('a+b').dtype.type)
Out[4]: 8751212391216

In [5]: hash((a+b).dtype.type)
Out[5]: 8751212391216
@juliantaylor

This is an issue caused by the ambiguity of C integer types.
On a 64-bit POSIX machine np.int64 is the C long type, which is also what numpy returns, while numexpr returns np.longlong.
While they are aliases for the same native type, they are not the same type object.

hash(np.intp) == hash(result_numpy.dtype.type)
True
hash(np.longlong) == hash(result_ne.dtype.type)
True

On 32-bit you should have the same issue with np.intp and np.intc.
For added fun, on 64-bit Windows long is int32.

I guess a solution could be to make the longlong and long type the same object if they are the same native type.
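
To illustrate the distinction described above, here is a small sketch that separates type-object identity from dtype equivalence. It assumes a platform where C `long long` is 64 bits wide. Whether the two scalar types are the same object depends on the platform and NumPy version, so the sketch only records that result rather than asserting it. The variable names are illustrative.

```python
import numpy as np

int64_dtype = np.dtype(np.int64)
longlong_dtype = np.dtype(np.longlong)

# Identity of the scalar type objects: platform dependent.
# In the report above (64-bit POSIX) the two objects were distinct.
same_type_object = int64_dtype.type is longlong_dtype.type

# Equivalence of the dtypes: they describe the same native
# 64-bit integer, so they compare equal.
dtypes_equal = int64_dtype == longlong_dtype

# One way to normalize: round-trip through the dtype name to get
# back the scalar type that NumPy exposes as np.int64.
canonical_type = np.dtype(longlong_dtype.name).type

print("same type object:", same_type_object)
print("dtypes equal:", dtypes_equal)
print("canonical type:", canonical_type)
```

Comparing dtype instances, or normalizing through the name first, avoids depending on which of the aliased type objects a library happens to return.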

@charris
NumPy member

> I guess a solution could be to make the longlong and long type the same object if they are the same native type.

If the Python C-API didn't depend so much on longs, I'd just remove the long type. It doesn't seem to offer anything new, as it is either 32 bits (int) or 64 bits (long long).
