Getting an error when I try to call an aggregate on a groupby object #409
Comments
Hi @eyadsibai, thanks for posting this issue! We have recently made some fixes to that code and have a pre-release out. Would you be able to tell me if
Hi @devin-petersohn, the issue remains.
Looking through the stack, it is a problem with Does Would you be able to share the
Hi @devin-petersohn, the following reproduces the error.

import pandas as pd
df = pd.DataFrame({'A': [2, 2, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})
df = df.groupby('A').size()
print(df)

import modin.pandas as pd
df = pd.DataFrame({'A': [2, 2, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})
df = df.groupby('A').size()
print(df)
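For reference, here is a small sketch of what the plain pandas half of the reproduction should return. It uses only the data from the snippet above; the variable name `sizes` is mine. The result is a single Series indexed by the distinct values of `A`, with counts 3, 1 and 1.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# size() counts the rows in each group and returns one Series, not one column per input column.
sizes = df.groupby('A').size()

# A=2 appears three times; A=3 and A=4 appear once each.
print(sizes)
```

This is the shape the Modin result would need to match.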
The aggregation is done correctly, but on all columns atomically in
we end up with 3 different columns (in the example above) with the same aggregated values. @devin-petersohn, any suggestions for an elegant solution?
I see what is happening now, thanks @eavidan! The deeper problem with As for We correctly remove the result from the columns list, but the result still has the column internally. It looks like we need to drop the column before we do the operation. It should happen in
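To make the "identical columns" symptom concrete, here is a rough sketch in plain pandas. It is not Modin's internal code: grouping by an external key array (the `keys` variable, my own name) is just an assumed way to imitate the grouped column still being present in the data, and `count` stands in for a per-column size.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Group by the values of A as an external key, so A itself is still a data column.
keys = df['A'].to_numpy()

# Aggregating column by column yields one column per input column, all holding
# the same values (count equals size here only because nothing is missing).
per_column = df.groupby(keys).count()
print(per_column)

# Dropping the grouped column first and keeping a single column gives the same
# values as pandas' groupby('A').size().
collapsed = df.drop(columns='A').groupby(keys).count().iloc[:, 0]
expected = df.groupby('A').size()
print(collapsed.tolist() == expected.tolist())
```

The point of the sketch is only that the grouped column has to be dropped before the aggregation runs, and that `size` should collapse to a single Series rather than one column per input column.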
* Resolves modin-project#409
* Removes a grouped column from the result to match pandas
* Changes the way we compute `size` to match pandas
* Adds consistency between the DataFrame being computed on and the result
…413)
* Add a fix to Groupby for aggregations by a column from the DataFrame
* Resolves #409
* Removes a grouped column from the result to match pandas
* Changes the way we compute `size` to match pandas
* Adds consistency between the DataFrame being computed on and the result
* Additional updates for `groupby` + agg:
* Computing columns more directly now. We reset the index or columns and use those indices to compute the actual index externally. This is more correct (and was actually being computed previously, but incorrectly).
* Adding **kwargs to `modin.pandas.groupby.DataFrameGroupby.rank`
* Adding tests for string + integer inter-operations
* Cleaning up and making some code more consistent
* Fix lint and version issue
* Fix lint
* Making sort correct for python2
* Fix lint
* Python2 and Python3 compat fix
System information
Describe the problem
Getting an error when I try to call `.size` or other aggregates like `.count` on a groupby object.
Source code / logs