numpy.int32 != numpy.int32 #3000

Closed
ubershmekel opened this Issue Feb 18, 2013 · 6 comments

I have a function that converts all numpy values to plain Python ones before insertion into a MongoDB database.

import types

import numpy as np

def jsonize(obj):
    if isinstance(obj, dict):
        for k, v in obj.items():
            obj[k] = jsonize(v)
        return obj
    elif isinstance(obj, (np.ndarray, list, tuple)):
        return [jsonize(i) for i in obj]
    elif isinstance(obj, (np.float32, np.float64)):
        return float(obj)
    elif isinstance(obj, np.int32):
        return long(obj)
    elif isinstance(obj, np.uint8):
        return int(obj)
    elif isinstance(obj, (str, unicode, int, float, long, bool, types.NoneType)):
        return obj
    else:
        print "--- UNKNOWN TYPE %s ---" % type(obj)
        import pdb;pdb.set_trace()
    return obj

But every once in a while a strange np.int32 slips through!

(Pdb) type(x)
<type 'numpy.int32'>
(Pdb) isinstance(x, np.int32)
False
(Pdb) !x.dtype.char
'i'
(Pdb) !np.int32(1).dtype.char
'l'

It took me forever to figure this out, and I see someone else was bitten by it too: http://projects.scipy.org/numpy/ticket/1246

Maybe the isinstance behavior should be fixed, but the repr should also reflect the dtype.char.
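
For context on the session above: 'i' is the type character for C int and 'l' is the one for C long. On platforms where both are 32 bits wide, the two scalar classes can both display as numpy.int32 while np.int32 names only one of them, which is what the session shows. Below is a minimal sketch of a check based on the dtype rather than on the class. The helper name is_int32 is hypothetical, and the sketch assumes that comparing dtypes (which describe kind and width) is good enough for the use case.

```python
import numpy as np

def is_int32(value):
    """Return True for any signed 32-bit numpy integer scalar.

    Hypothetical helper: it compares dtypes instead of scalar classes,
    so it does not matter which C type backs the value.
    """
    return isinstance(value, np.generic) and value.dtype == np.dtype(np.int32)

# Scalars built from the C int ('i') and C long ('l') type characters.
from_c_int = np.dtype("i").type(123)
from_c_long = np.dtype("l").type(123)

# The two classes are distinct objects; their widths depend on the platform.
print(type(from_c_int) is type(from_c_long))
print(from_c_int.dtype.itemsize, from_c_long.dtype.itemsize)
print(is_int32(from_c_int), is_int32(from_c_long))
```

The printed widths and the last line vary by platform, so no expected output is given here.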

This threw me off as well. Why would np.int32 be an instance of int? What's the correct way to detect numpy ints?

>>> isinstance(np.dtype('l').type(123), int)
True
>>> isinstance(np.dtype('i').type(123), int)
True
>>> isinstance(np.float32(1.2), float)
False
>>> isinstance(np.dtype('i').type(123), np.dtype('l').type)
False
>>> isinstance(np.dtype('l').type(123), np.dtype('i').type)
False
Member

seberg commented Feb 20, 2013

Because you are on a 32-bit system. You can use np.issubdtype(arr.dtype, np.integer), for example (isinstance would probably also work there). For your code above, I would suggest checking for np.generic and using generic.item(), and maybe array.tolist() (you still need to check the items after that, of course, but they should be Python types). All these comparisons seem right if you look at the hardware specifics, so I don't know whether they should be changed.
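
The suggestions above can be sketched as follows. This is an illustration, not code from the numpy maintainers: to_native is a hypothetical helper name, and the sketch assumes every numpy scalar derives from np.generic.

```python
import numpy as np

def to_native(value):
    """Convert numpy scalars and arrays to built-in Python objects.

    Hypothetical helper illustrating the np.generic / .item() / .tolist()
    suggestion; anything that is not a numpy object is returned unchanged.
    """
    if isinstance(value, np.generic):
        # Matches np.int32, np.uint8, np.float32, ... whatever the C type.
        return value.item()
    if isinstance(value, np.ndarray):
        # tolist() converts the elements too, including nested dimensions.
        return value.tolist()
    return value

arr = np.array([1, 2, 3], dtype="i")

# Integer check on the dtype rather than on a concrete scalar class.
print(np.issubdtype(arr.dtype, np.integer))
# isinstance against the abstract np.integer base works for scalars.
print(isinstance(arr[0], np.integer))
print(type(to_native(arr[0])), to_native(arr))
```

As noted in the comment, the result of .item() is still worth checking afterwards if unusual dtypes can show up.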

I ended up ditching the unpredictable isinstance in favor of type. The solution isn't as flexible, but it works for my use case.

def jsonize(obj):
    if isinstance(obj, dict):
        for k, v in obj.items():
            obj[jsonize(k)] = jsonize(v)
        return obj
    elif isinstance(obj, (np.ndarray, list, tuple)):
        return [jsonize(i) for i in obj]
    elif isinstance(obj, (np.float32, np.float64)):
        return float(obj)
    elif type(obj) in (str, unicode, int, float, long, bool, types.NoneType):
        return obj
    elif hasattr(obj, '__int__'):
        # np.int32, np.uint8, ...
        return long(obj)
    else:
        print "--- UNKNOWN TYPE %s ---" % type(obj)
        raise Exception('unserializable to mongo: %s' % type(obj))

Perhaps it's ok if you keep some of these behaviors. But repr should represent the correct underlying class.

Member

seberg commented Feb 20, 2013

There are a lot of ways to do this in numpy (there probably should be one clearer way). You also have dtype.kind, or even (I am probably forgetting something) np.typecodes together with dtype.char. However, the way you are doing it, you should use __index__ and not __int__, because that code will silently convert float128s and float16s (Edit: OK, not complex) to ints. As I said, though, .item()/tolist() will always convert a numpy object to a native Python type, so since that is what you want to do, it sounds like by far the most general way.

That is, call operator.index or __index__() by hand rather than testing for the attribute: all arrays have the attribute, the call just fails.
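
To make the difference concrete, here is a small sketch contrasting int() with operator.index(). The helper name to_python_int is hypothetical, and the sketch assumes a numpy version in which floating-point scalars do not implement __index__, which to my knowledge holds for current releases.

```python
import operator

import numpy as np

def to_python_int(value):
    """Return a built-in int for integer-like values only.

    operator.index() calls __index__, which is meant for lossless integer
    conversion, so floats raise TypeError instead of being truncated.
    """
    return operator.index(value)

print(to_python_int(np.int32(7)), to_python_int(np.uint8(200)))

# int() goes through __int__ and silently drops the fractional part.
print(int(np.float32(2.75)))

try:
    to_python_int(np.float32(2.75))
except TypeError as exc:
    print("rejected:", exc)

# dtype.kind is another way to classify: 'i' signed, 'u' unsigned, 'f' float.
print(np.dtype(np.int32).kind, np.dtype(np.uint8).kind, np.dtype(np.float32).kind)
```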

Thank you for the good advice. I modified the converter to use .item() when it's available.

def jsonize(obj):
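    if isinstance(obj, dict):
        # Build a new dict: assigning obj[jsonize(k)] in place keeps the old
        # key object whenever the converted key compares equal to it.
        return dict((jsonize(k), jsonize(v)) for k, v in obj.items())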
    elif isinstance(obj, (np.ndarray, list, tuple)):
        return [jsonize(i) for i in obj]
    elif type(obj) in (str, unicode, int, float, long, bool, types.NoneType):
        return obj
    elif hasattr(obj, 'item'):
        # np.int32, np.uint8, np.float32, np.float64, ...
        return obj.item()
    else:
        print "--- UNKNOWN TYPE %s ---" % type(obj)
        raise Exception('unserializable to mongo: %s' % type(obj))

ucyo commented Nov 19, 2013

Hi. Your little tool here helped me a lot, thanks. I added datetime.datetime to the elif type(obj) block so that dates are created in MongoDB. Maybe you want to update this.
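
For readers on Python 3, where long and unicode no longer exist, the converter with the datetime.datetime addition might look like the sketch below. It is an adaptation, not the code either poster actually ran: the PASSTHROUGH_TYPES name is hypothetical, the set of types the database driver accepts as-is is an assumption, and np.generic is used in place of the hasattr(obj, 'item') test.

```python
import datetime

import numpy as np

# Types stored as-is. datetime.datetime is included per the suggestion above;
# the exact set is an assumption about what the database driver accepts.
PASSTHROUGH_TYPES = (str, int, float, bool, type(None), datetime.datetime)

def jsonize(obj):
    """Recursively replace numpy objects with built-in Python equivalents."""
    if isinstance(obj, dict):
        # Build a new dict so converted keys replace the original key objects.
        return {jsonize(k): jsonize(v) for k, v in obj.items()}
    if isinstance(obj, (np.ndarray, list, tuple)):
        return [jsonize(i) for i in obj]
    if type(obj) in PASSTHROUGH_TYPES:
        return obj
    if isinstance(obj, np.generic):
        # np.int32, np.uint8, np.float32, np.float64, ...
        return obj.item()
    raise TypeError('unserializable to mongo: %s' % type(obj))

record = {
    "count": np.int32(3),
    "samples": np.array([0.5, 1.5], dtype=np.float32),
    "created": datetime.datetime(2013, 11, 19),
}
print(jsonize(record))
```

The exact type(obj) comparison is kept from the thread so that numpy scalars which subclass a Python type still go through .item().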

charris closed this Jan 6, 2016
