lib.maybe_convert_objects will fail on uint64 values that exceed int64 max #4471

Closed
wesm opened this Issue Aug 5, 2013 · 18 comments

Member

wesm commented Aug 5, 2013

xref #11440 for additional tests

Observed in the wild. cc @blais

ghost assigned jtratner Sep 15, 2013

Contributor

jtratner commented Sep 15, 2013

@jreback @wesm objects that would pass this wouldn't be compatible with anything but integers that are all > 0, right? Float, float128, complex, and longdouble all lose precision.
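To illustrate the precision concern for float64 specifically, here is a minimal sketch using plain Python floats, which are IEEE 754 doubles (the same format as float64). The helper name `survives_float64` is made up for this example and is not part of pandas or NumPy.

```python
# A uint64 value just above the int64 maximum (2**63 - 1).
big = 2**63 + 1

# A double carries a 53-bit significand, so integers above 2**53
# are rounded to the nearest representable value on conversion.
as_float = float(big)


def survives_float64(n):
    """Return True if integer n converts to a double and back unchanged."""
    return int(float(n)) == n


print(survives_float64(2**53))  # exactly representable
print(survives_float64(big))    # rounded, so the round trip changes it
```

So a float64 fallback cannot hold every value in the upper half of the uint64 range exactly.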

Contributor

jtratner commented Sep 15, 2013

I'm wrong, float128 (which I think is the same as longdouble) can work...

Member

cpcloud commented Sep 15, 2013

float128 doesn't have to be long double... long double could be 64 bits... it's an implementation detail, but it ends up being what you expect most of the time...

Member

cpcloud commented Sep 15, 2013

same applies to int64 etc ... e.g., long is 32 bits on 32 bit arch and 64 on 64-bit arch
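A quick way to see how platform-dependent these C types are is to ask `ctypes` for their storage sizes on the current machine. This is only a sketch for inspection; the reported size of `long double` is its storage size, which can be larger than the precision it actually carries.

```python
import ctypes

# Storage sizes, in bytes, of the C types on the machine running this script.
sizes = {
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "double": ctypes.sizeof(ctypes.c_double),
    "long double": ctypes.sizeof(ctypes.c_longdouble),
}

for name, nbytes in sizes.items():
    print(f"{name:12s} {nbytes * 8} bits")
```

The output differs between platforms, which is the point being made above.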

Contributor

jtratner commented Sep 15, 2013

@cpcloud so what's the right dtype to contain something that's uint64 and greater than what int64 can handle? - this SO answer claims float128 is 'a mess'. http://stackoverflow.com/questions/9062562/what-is-the-internal-precision-of-numpy-float128

Contributor

jtratner commented Sep 15, 2013

i.e., just allow uint64_t size and go from there? and then disallow with anything that's not an actual integer > 0?

Member

cpcloud commented Sep 15, 2013

I'm not sure why this is happening... uint64 should hold values up to 2 * INT64_MAX + 1... I think allowing uint64 is probably the way to go... not sure I follow the second question.

Contributor

jtratner commented Sep 15, 2013

@cpcloud in convert_objects, if you can't fit everything into the same container, then it doesn't work. This is why uint64 doesn't work:

        elif util.is_integer_object(val):
            seen_int = 1
            floats[i] = <float64_t> val
            complexes[i] = <double complex> val
            if not seen_null:
                try:
                    ints[i] = val
                except OverflowError:
                    seen_object = 1
                    break
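The overflow path in the excerpt above can be imitated in plain Python. This is a rough stand-in, not the pandas code: `array("q")` plays the role of the `int64_t` buffer, and `fits_int64_buffer` is a hypothetical helper name.

```python
from array import array

INT64_MAX = 2**63 - 1


def fits_int64_buffer(value):
    """Try to store value in a signed 64-bit slot, catching the overflow."""
    slot = array("q", [0])  # "q" is a signed 8-byte integer
    try:
        slot[0] = value
    except OverflowError:
        # The quoted Cython loop sets seen_object = 1 and breaks here.
        return False
    return True


print(fits_int64_buffer(INT64_MAX))      # fits
print(fits_int64_buffer(INT64_MAX + 1))  # valid uint64, but overflows int64
```

Any value above the int64 maximum takes the `except` branch, even though it would fit in a uint64 container.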
Contributor

jtratner commented Sep 15, 2013

it's not hard to set this up, I just wanted to clarify I had the right idea...going to fix it now.

Contributor

jtratner commented Sep 15, 2013

@cpcloud what I mean by the second question is what should be returned from this:

import sys
import numpy as np
arr = np.array([-5, sys.maxint + 5, 3], dtype=object)
lib.maybe_convert_objects(arr)

It should be object, right? Otherwise the -5 becomes gobbledygook.
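The rule being proposed here (int64 when everything fits, uint64 when all values are non-negative, object when negatives are mixed with values above the int64 maximum) could be sketched as below. `pick_integer_dtype` is a hypothetical pure-Python helper for illustration only, not the pandas implementation.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1


def pick_integer_dtype(values):
    """Name the 64-bit container that can hold every Python int in values.

    Hypothetical helper; it returns a dtype name as a string.
    """
    lo, hi = min(values), max(values)
    if INT64_MIN <= lo and hi <= INT64_MAX:
        return "int64"
    if lo >= 0 and hi <= UINT64_MAX:
        return "uint64"
    # Negative values mixed with values above INT64_MAX fit neither type.
    return "object"


print(pick_integer_dtype([-5, 3]))
print(pick_integer_dtype([1, 2**63 + 4]))
print(pick_integer_dtype([-5, 2**63 + 4, 3]))
```

Under this rule the array in the question stays object, so the -5 is preserved as is.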

Contributor

jtratner commented Sep 15, 2013

Well, this is mostly useless anyways, because BlockManager converts uint64 to object internally in form_blocks:

        elif issubclass(v.dtype.type, np.integer):
            if v.dtype == np.uint64:
                # HACK #2355 definite overflow
                if (v > 2 ** 63 - 1).any():
                    object_items.append((i, k, v))
                    continue
            int_items.append((i, k, v))

So we need an unsigned int type or something in the block manager.

Contributor

jtratner commented Sep 15, 2013

Anyways, working version of lib.maybe_convert_objects here: https://github.com/jtratner/pandas/tree/GH4471_fix_uint64_maybe_convert_objects

Contributor

pwaller commented May 16, 2016

I keep hitting this while importing a dataset which has uint64s in it. Is there anything I can do to help it along, given that someone already made a patch but it didn't get in?

Contributor

jreback commented May 16, 2016

where's the patch?

Contributor

pwaller commented May 17, 2016

@jreback see @jtratner's comment above. Is the patch unsuitable or is it just that it wasn't shepherded into master?

Contributor

jreback commented May 17, 2016

that's 2 years old; if someone wants to cherry-pick it and submit it, then we can take a look

DrRibosome Nov 19, 2016

also hitting this bug - just wondering if the fix is in progress, or if interest is simply too low

Contributor

jreback commented Nov 19, 2016

well, we need someone motivated to push a fix

gfyoung added a commit to gfyoung/pandas that referenced this issue Dec 19, 2016

BUG: Convert uint64 in maybe_convert_objects
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where gh-4845 left off. Closes gh-4471.

jreback modified the milestones: 0.20.0, Next Major Release Dec 19, 2016

gfyoung added a commit to gfyoung/pandas that referenced this issue Dec 20, 2016

BUG: Convert uint64 in maybe_convert_objects
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where gh-4845 left off. Closes gh-4471.

jreback closed this in 0c52813 Dec 20, 2016

ShaharBental added a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016

BUG: Convert uint64 in maybe_convert_objects
Adds handling for `uint64` objects during conversion. When negative
numbers and `uint64` are detected, we then convert the result to
`object`. Picks up where #4845 left off. Closes #4471.

Author: gfyoung <gfyoung17@gmail.com>

Closes #14916 from gfyoung/convert-objects-uint64 and squashes the following commits:

ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects