BUG: Fixes gh-12218, TypeError converting int to float inside stats.ks_2samp #12280

pvanmulbregt · 2020-05-30T20:44:08Z

A product of large Python integers passed to np.sqrt no longer fit in 64 bits,
and np.sqrt could not convert it to float, raising a TypeError inside
stats.ks_2samp() for large samples. The condition is triggered when both sample
sizes are larger than about 2^21 (roughly the cube root of 2^64, or about
2 million).
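A rough check of that threshold, as a minimal sketch: it assumes the value handed to np.sqrt is the exact integer product m*n*(m+n) (three factors of order n, hence the cube root) and that the largest integer NumPy stores natively is 2^64 - 1 (uint64). Both are assumptions for illustration, and the helper names are hypothetical. For equal sample sizes the factor of 2 coming from (m + n) puts the threshold at exactly 2^21 = 2,097,152, the cube root of 2^63.

```python
# Hypothetical helpers; assumes the failing np.sqrt argument is m*n*(m+n).
UINT64_MAX = 2**64 - 1  # largest integer NumPy can store natively (uint64)


def overflows_uint64(m: int, n: int) -> bool:
    """Return True if the exact integer product m*n*(m+n) exceeds uint64."""
    return m * n * (m + n) > UINT64_MAX


def smallest_equal_size_overflow() -> int:
    """Smallest N with overflows_uint64(N, N), found by bisection."""
    lo, hi = 1, 2**22  # overflows_uint64(2**22, 2**22) is already True
    while lo < hi:
        mid = (lo + hi) // 2
        if overflows_uint64(mid, mid):
            hi = mid  # mid overflows: the answer is mid or smaller
        else:
            lo = mid + 1  # mid is safe: the answer is larger
    return lo


N = smallest_equal_size_overflow()
print(N, N == 2**21)  # prints: 2097152 True
```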

Reference issue

Closes gh-12218

What does this implement/fix?

Converts the sample sizes n1 and n2 to floats before multiplying them. Python can hold the large product exactly as an int, but np.sqrt() cannot.
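The failure and the fix can be reproduced in a few lines. This is a sketch, not the ks_2samp code: the helpers are hypothetical and assume the failing argument is the product m*n*(m+n). The PR reports a TypeError; the sketch also catches OverflowError in case other NumPy versions report the out-of-range integer differently.

```python
import math

import numpy as np


def sqrt_of_int_product(m: int, n: int):
    """Hypothetical helper: exact Python int product, as before the fix."""
    # For large m, n the product exceeds 2**64 - 1, so NumPy (at least the
    # 1.x series) cannot store it as int64/uint64 and uses an object array,
    # whose sqrt loop raises TypeError.
    return np.sqrt(m * n * (m + n))


def sqrt_of_float_product(m: int, n: int):
    """Hypothetical helper: convert to float first, as the patch does."""
    m, n = float(m), float(n)
    return np.sqrt(m * n * (m + n))  # float64 product, no integer overflow


def int_product_raises(m: int, n: int) -> bool:
    """Return True if np.sqrt rejects the exact integer product."""
    try:
        sqrt_of_int_product(m, n)
    except (TypeError, OverflowError):  # gh-12218 reports TypeError
        return True
    return False


big = 3_000_000  # both sample sizes above ~2**21
print(int_product_raises(big, big))
print(int_product_raises(1_000, 2_000))  # prints: False
print(math.isclose(sqrt_of_float_product(big, big), math.sqrt(2 * big**3)))  # prints: True
```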


WarrenWeckesser

Thanks @pvanmulbregt, this looks good. I made one small suggestion inline.

WarrenWeckesser · 2020-06-27T12:34:05Z

scipy/stats/stats.py

@@ -6731,13 +6731,16 @@ def ks_2samp(data1, data2, alternative='two-sided', mode='auto'):

    if mode == 'asymp':
        # The product n1*n2 is large.  Use Smirnov's asymptoptic formula.
+        # Ensure float to avoid overflow in multiplication
+        # sorted because the one-sided formula is not symmetric in n1, n2
+        m, n = sorted(np.array([n1, n2], float), reverse=True)


No need to create a numpy array here. This is quite a bit faster:

Suggested change

-        m, n = sorted(np.array([n1, n2], float), reverse=True)
+        m, n = sorted([float(n1), float(n2)], reverse=True)
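A quick way to confirm the suggestion keeps the behaviour of the original line: both versions put the larger size first as a float, so the later product stays in floating point. The list version returns plain Python floats instead of NumPy float64 scalars and skips the array allocation. The helper names and sample sizes are hypothetical, and the timing loop is only a rough local check of the "quite a bit faster" remark, not a measurement from the PR.

```python
import timeit

import numpy as np


def sizes_via_array(n1: int, n2: int):
    # Original patch: build a float64 array, then sort its elements in Python.
    return sorted(np.array([n1, n2], float), reverse=True)


def sizes_via_list(n1: int, n2: int):
    # Suggested change: plain Python floats, no array allocation.
    return sorted([float(n1), float(n2)], reverse=True)


n1, n2 = 2_500_000, 4_000_000
m_a, n_a = sizes_via_array(n1, n2)
m_l, n_l = sizes_via_list(n1, n2)
print(m_a == m_l and n_a == n_l)  # prints: True
print(type(m_a).__name__, type(m_l).__name__)  # prints: float64 float

# Rough timing only; absolute numbers depend on the machine and NumPy version.
for f in (sizes_via_array, sizes_via_list):
    t = timeit.timeit(lambda: f(n1, n2), number=20_000)
    print(f.__name__, f"{t:.3f} s")
```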

Replace the use of np.array([]) with [] as the np array doesn't provide any additional value.

WarrenWeckesser · 2020-06-28T19:07:02Z

Thanks @pvanmulbregt, merged.


pvanmulbregt added the scipy.stats label May 30, 2020

WarrenWeckesser closed this Jun 27, 2020

WarrenWeckesser reopened this Jun 27, 2020

WarrenWeckesser requested changes Jun 27, 2020


Use a Python list to hold the two sizes to be sorted inside ks_2samp.

2a15c0d


WarrenWeckesser merged commit f974e4b into scipy:master Jun 28, 2020

tylerjereddy added this to the 1.6.0 milestone Jun 28, 2020

tylerjereddy added the defect and backport-candidate labels Jun 28, 2020

tylerjereddy modified the milestones: 1.6.0, 1.5.1 Jul 4, 2020

tylerjereddy removed the backport-candidate label Jul 5, 2020

