Fix overflow error in cartesian_product #15265

Closed
wants to merge 5 commits into
from

Conversation

Projects
None yet
3 participants
Contributor

david-hoffman commented Jan 30, 2017

When the numbers in X are large it can cause an overflow error on windows machine where the native int is 32 bit. Switching to np.intp alleviates this problem.

Other fixes would include switching to np.uint32 or np.uint64.

#15234

@david-hoffman david-hoffman Fix overflow error in cartesian_product
When the numbers in `X` are large it can cause an overflow error on windows machine where the native `int` is 32 bit. Switching to np.intp alleviates this problem.

Other fixes would include switching to np.uint32 or np.uint64.
b196878

codecov-io commented Jan 30, 2017 edited

Codecov Report

Merging #15265 into 0.19.x will not impact coverage.

@@           Coverage Diff           @@
##           0.19.x   #15265   +/-   ##
=======================================
  Coverage   85.27%   85.27%           
=======================================
  Files         144      144           
  Lines       50946    50946           
=======================================
  Hits        43444    43444           
  Misses       7502     7502
Impacted Files Coverage Δ
pandas/tools/util.py 96.77% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ead784f...c9c8d5e. Read the comment docs.

Contributor

jreback commented Jan 30, 2017

need a test for this (IOW something simple that fails w/o the fix and passes with it).

best way is to step thru your code until you hit this and see what the inputs are (to cartesian product) and can use that as a test.

need a whats new (bug fix) as well.

@david-hoffman david-hoffman Added tests for large numbers
7aeee85
@david-hoffman david-hoffman Update test so that it will actually run on "normal" machine
47a6c6c
pandas/tools/tests/test_util.py
+ X = np.arange(65536)
+ Y = np.arange(65535)
+ result1, result2 = cartesian_product([X, Y])
+ expected_size = X.size * Y.size
@jreback

jreback Jan 30, 2017

Contributor

hmm, this is still quite large.

@david-hoffman

david-hoffman Jan 30, 2017

Contributor

I couldn't think of another way to cause the error without adding checking code to the cartesian_product function itself, e.g.

if np.any(cumprodX < 0):
    raise RuntimeError(...)
@jreback

jreback Jan 30, 2017

Contributor

so, let's take the test out (just verify what you were doing works on windows).

pls add a whatsnew note and can merge

note that this is still buggy, because we do the cartesian product in the first place (this is #14942), so I think this will still break for you (just in a different place now).

@david-hoffman

david-hoffman Jan 30, 2017

Contributor

Yeah, I switched to an altered version of numpy's histogramdd for this problem.

jreback added this to the 0.20.0 milestone Jan 30, 2017

Contributor

jreback commented Jan 30, 2017

ping on green.

david-hoffman added some commits Jan 30, 2017

@david-hoffman david-hoffman Remove `test_large_input` because it's too big
d54583e
@david-hoffman david-hoffman Update v0.19.2.txt
c9c8d5e

jreback closed this in 48fc9d6 Feb 1, 2017

Contributor

jreback commented Feb 1, 2017

thanks @david-hoffman

FYI, you were actually branched off of the 0.19.x branch, rather than master, but I picked the commit.

@AnkurDedania AnkurDedania added a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017

@david-hoffman @AnkurDedania david-hoffman + AnkurDedania BUG: Fix overflow error in cartesian_product
When the numbers in `X` are large it can cause an overflow error on
windows machine where the native `int` is 32 bit. Switching to np.intp
alleviates this problem.    Other fixes would include switching to
np.uint32 or np.uint64.

closes #15234

Author: David Hoffman <dave.p.hoffman@gmail.com>

Closes #15265 from david-hoffman/patch-1 and squashes the following commits:

c9c8d5e [David Hoffman] Update v0.19.2.txt
d54583e [David Hoffman] Remove `test_large_input` because it's too big
47a6c6c [David Hoffman] Update test so that it will actually run on "normal" machine
7aeee85 [David Hoffman] Added tests for large numbers
b196878 [David Hoffman] Fix overflow error in cartesian_product
cfb916e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment