Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib.maybe_convert_objects will fail on uint64 values that exceed int64 max #4471

Closed
wesm opened this issue Aug 5, 2013 · 18 comments
Closed
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@wesm
Copy link
Member

wesm commented Aug 5, 2013

xref #11440 for addtl tests

Observed in the wild. cc @blais

@ghost ghost assigned jtratner Sep 15, 2013
@jtratner
Copy link
Contributor

@jreback @wesm objects that would pass this wouldn't be compatible with anything else but integers that are all > 0 right? Float, float128, complex, and longdouble all lose precision.

@jtratner
Copy link
Contributor

I'm wrong, float128 (which I think is the same as longdouble) can work...

@cpcloud
Copy link
Member

cpcloud commented Sep 15, 2013

float128 doesn't have be long double... long double could be 64 bits... it's an implementation detail but it ends up being what you expect most of the time...

@cpcloud
Copy link
Member

cpcloud commented Sep 15, 2013

same applies to int64 etc ... e.g., long is 32 bits on 32 bit arch and 64 on 64-bit arch

@jtratner
Copy link
Contributor

@cpcloud so what's the right dtype to contain something that's uint64 and greater than what int64 can handle? - this SO answer claims float128 is 'a mess'. http://stackoverflow.com/questions/9062562/what-is-the-internal-precision-of-numpy-float128

@jtratner
Copy link
Contributor

i.e., just allow uint64_t size and go from there? and then disallow with anything that's not an actual integer > 0?

@cpcloud
Copy link
Member

cpcloud commented Sep 15, 2013

I'm not sure why this is happening ... uint64 should hold values up to 2 * INT_MAX... i think probably allowing uint64 is the way 2 go...not sure i follow the second question.

@jtratner
Copy link
Contributor

@cpcloud in convert_objects, if you can't fit everything into the same container, then it doesn't work. This is why uint64 doesn't work:

        elif util.is_integer_object(val):
            seen_int = 1
            floats[i] = <float64_t> val
            complexes[i] = <double complex> val
            if not seen_null:
                try:
                    ints[i] = val
                except OverflowError:
                    seen_object = 1
                    break

@jtratner
Copy link
Contributor

it's not hard to set this up, I just wanted to clarify I had the right idea...going to fix it now.

@jtratner
Copy link
Contributor

@cpcloud what I mean by the second question is what should be returned from this:

import sys
arr = np.array([-5, sys.maxint + 5, 3], dtype=object)
lib.maybe_convert_objects(arr)

It should be object right? Otherwise the -5 becomes gobbdledygook.

@jtratner
Copy link
Contributor

Well, this is mostly useless anyways, because BlockManager converts uint64 to object internally in form_block:

        elif issubclass(v.dtype.type, np.integer):
            if v.dtype == np.uint64:
                # HACK #2355 definite overflow
                if (v > 2 ** 63 - 1).any():
                    object_items.append((i, k, v))
                    continue
            int_items.append((i, k, v))

So need a unsigned int type or something in block manager

@jtratner
Copy link
Contributor

Anyways, working version of lib.maybe_convert_objects here: https://github.com/jtratner/pandas/tree/GH4471_fix_uint64_maybe_convert_objects

@pwaller
Copy link
Contributor

pwaller commented May 16, 2016

I keep hitting this while importing a dataset which has uint64's in it. Is there anything I can do to help it along, given that someone already made a patch but it didn't get in?

@jreback
Copy link
Contributor

jreback commented May 16, 2016

where's the patch?

@pwaller
Copy link
Contributor

pwaller commented May 17, 2016

@jreback see @jtratner's comment above. Is the patch unsuitable or is it just that it wasn't shepherded into master?

@jreback
Copy link
Contributor

jreback commented May 17, 2016

that's 2 years old - if someone wants to cherry pick and present then can look

@DrRibosome
Copy link

also hitting this bug - just wondering if the fix is in progress, or if interest is simply too low

@jreback
Copy link
Contributor

jreback commented Nov 19, 2016

well need someone motivated to push a fix

gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 19, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-8485 left off. Closes pandas-devgh-4471.
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 19, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
@jreback jreback modified the milestones: 0.20.0, Next Major Release Dec 19, 2016
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 19, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 19, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 19, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
gfyoung added a commit to forking-repos/pandas that referenced this issue Dec 20, 2016
Adds handling for uint64 objects during conversion.
When negative numbers and uint64 are detected, we
then convert the result to object.

Picks up where pandas-devgh-4845 left off. Closes pandas-devgh-4471.
ShaharBental pushed a commit to ShaharBental/pandas that referenced this issue Dec 26, 2016
Adds handling for `uint64` objects during conversion.  When negative
numbers and `uint64` are detected, we then convert the result to
`object`.    Picks up where pandas-dev#4845 left off. Closes pandas-dev#4471.

Author: gfyoung <gfyoung17@gmail.com>

Closes pandas-dev#14916 from gfyoung/convert-objects-uint64 and squashes the following commits:

ed325cd [gfyoung] BUG: Convert uint64 in maybe_convert_objects
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants