New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fixes gh-12218, TypeError converting int to float inside stats.ks_2samp #12280
Conversation
A multiplication of large integers fed as input to np.sqrt exceeded its ability to convert to float, generating a TypeError inside stats.ks_2samp() for large samples. If both sample sizes were larger than about 2^21 (~cube root of 2^64, or about 2million), this condition would be triggered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @pvanmulbregt, this looks good. I made one small suggestion inline.
scipy/stats/stats.py
Outdated
@@ -6731,13 +6731,16 @@ def ks_2samp(data1, data2, alternative='two-sided', mode='auto'): | |||
|
|||
if mode == 'asymp': | |||
# The product n1*n2 is large. Use Smirnov's asymptoptic formula. | |||
# Ensure float to avoid overflow in multiplication | |||
# sorted because the one-sided formula is not symmetric in n1, n2 | |||
m, n = sorted(np.array([n1, n2], float), reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to create a numpy array here. This is quite a bit faster:
m, n = sorted(np.array([n1, n2], float), reverse=True) | |
m, n = sorted([float(n1), float(n2)], reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Replace the use of np.array([]) with [] as the np array doesn't provide any additional value.
Thanks @pvanmulbregt, merged. |
…ats.ks_2samp (scipy#12280) A multiplication of large integers fed as input to np.sqrt exceeded its ability to convert to float, generating a TypeError inside stats.ks_2samp() for large samples. If both sample sizes were larger than about 2^21 (~cube root of 2^64, or about 2million), this condition would be triggered.
A multiplication of large integers fed as input to
np.sqrt
exceeded its abilityto convert to float, generating a TypeError inside
stats.ks_2samp(
) for largesamples. If both sample sizes were larger than about 2^21 (~cube root of 2^64,
or about 2million), this condition would be triggered.
Reference issue
Closes gh-12218
What does this implement/fix?
Converts some large integers to floats before multiplying them. Python could handle the large result as an int, but
np.sqrt()
couldn't.