TST: groupby.sum with large integers #62372
I don't think these tests actually verify the error, mainly because everything is below 64 bits.
You should have a test using object dtype using large integers >64 bits and >128 bits.
Oh, I misunderstood the task, sorry. I will change this with a new approach.
Sorry, I installed pandas==1.0.4 on Python 3.8.20 and ran the reproduction. The errors occur at 54-63 bits. The datatypes are int64 and uint64.

import pandas as pd

for i in range(129):
    n = 2 ** i
    df = pd.DataFrame([['A', 14], ['A', n]], columns=['gb', 'val'])
    gb_sum = df.groupby('gb').sum().values[0][0]
    df_sum = df.sum().values[1]
    if gb_sum != df_sum:
        print(df["val"])
        print(f"Trying n = 2 ** {i} '{n}'...")
        print(f"df.sum().values[1] '{df_sum}' != df.groupby('gb').sum().values[0][0] '{gb_sum}'")

Output:
Then the tests are correct, but the bug I thought was fixed still exists, right?
It was fixed. I can't reproduce it on main.
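
For context on why the failures cluster at 54-63 bits: this thread does not state the root cause, but the range is consistent with a sum that passes through float64, whose significand holds 53 bits. The sketch below is only an illustration of that assumption using plain Python floats. It does not call pandas, and the function name `float64_round_trip_mismatches` is hypothetical.

```python
def float64_round_trip_mismatches(small=14, max_bits=64):
    """Return exponents i where small + 2**i is not preserved by a float64 sum."""
    mismatches = []
    for i in range(max_bits):
        exact = small + 2 ** i  # Python ints are arbitrary precision
        # Python floats are IEEE 754 doubles (53-bit significand), so this
        # mimics an integer sum computed in float64 and cast back to int.
        via_float = int(float(small) + float(2 ** i))
        if via_float != exact:
            mismatches.append(i)
    return mismatches


print(float64_round_trip_mismatches())
```

With the values from the reproduction (14 and 2 ** i), the mismatching exponents are 54 through 63, the same range reported above.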
This PR adds regression tests for groupby.sum with large integers (int64, uint64, and nullable dtypes). These tests would have failed before the bug was fixed, and they now pass, ensuring no regressions in the future.
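
A minimal sketch of what such a regression check could look like is shown below. It is not the test code added in this PR: the helper name `check_groupby_sum_large_int64` is hypothetical, it covers only the int64 case, and it assumes a pandas version in which the bug is already fixed.

```python
import pandas as pd


def check_groupby_sum_large_int64(bits=range(54, 63)):
    """Check that groupby.sum is exact for int64 values above 2**53."""
    for i in bits:
        n = 2 ** i
        df = pd.DataFrame({"gb": ["A", "A"], "val": [14, n]})
        # 14 + 2**62 still fits in int64, so the column should stay int64.
        assert df["val"].dtype == "int64"
        result = df.groupby("gb")["val"].sum()
        # Compare against exact Python integer arithmetic.
        assert int(result.loc["A"]) == 14 + n, f"mismatch at 2 ** {i}"
    return True


check_groupby_sum_large_int64()
```

Comparing against a Python `int` keeps the expected value exact, so a result that went through float64 would fail the assertion instead of passing silently.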