BUG: groupby on empty dataframe returns pandas type #4941

Open
cristianmtr opened this issue Sep 8, 2022 · 1 comment
Labels
bug 🦗: Something isn't working
External: Pull requests and issues from people who do not regularly contribute to modin
P2: Minor bugs or low-priority feature requests

Comments


cristianmtr commented Sep 8, 2022

System information

DISTRIB_ID=Pop
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Pop!_OS 22.04 LTS"
modin==0.15.2
Python 3.8.13
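    import modin.pandas as md_pd
    import pandas

    # An empty Modin dataframe is what triggers the behavior.
    df_modin = md_pd.DataFrame(
        [], columns=["smth", "answer_uuid", "item_id", "item_level"]
    )
    # Despite the Modin input, groupby hands back a pandas object.
    pandas_df = df_modin.groupby("answer_uuid")
    # This check on the pandas type is what demonstrates the problem.
    assert type(pandas_df) == pandas.core.groupby.generic.DataFrameGroupBy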

Describe the problem

Calling groupby on an empty Modin dataframe returns a pandas DataFrameGroupBy object. Shouldn't it return a Modin type?

cristianmtr added the bug 🦗 (Something isn't working) and Triage 🩹 (Issues that need triage) labels on Sep 8, 2022
@mvashishtha
Collaborator

@cristianmtr thank you for reporting this bug. I can reproduce it at version c9fc326. The bug only applies to empty dataframes. For empty dataframes we default to pandas and end up using the pandas groupby result.

You can work around this bug by

  1. doing your groupby on the pandas dataframe
  2. doing a groupby operation like sum
  3. converting the result to a modin series or dataframe

E.g., for your snippet above, you can do md_pd.DataFrame(pandas_df.sum()).
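The three steps above could look roughly like this. It is only a sketch, not Modin code: it assumes you start from a plain pandas dataframe with the same columns as the snippet in the report, and the helper names `grouped_sum_in_pandas` and `to_modin` are made up for illustration. Modin is imported lazily inside `to_modin` so the pandas part runs on its own.

```python
import pandas


def grouped_sum_in_pandas(pandas_frame, key):
    """Steps 1 and 2: group the pandas dataframe and aggregate with sum."""
    return pandas_frame.groupby(key).sum()


def to_modin(pandas_result):
    """Step 3: wrap the pandas result in a Modin dataframe.

    Hypothetical helper; Modin is imported here so the rest of the
    sketch does not need it.
    """
    import modin.pandas as md_pd

    return md_pd.DataFrame(pandas_result)


# Assumption: an empty pandas frame with the columns from the report.
empty_frame = pandas.DataFrame(
    [], columns=["smth", "answer_uuid", "item_id", "item_level"]
)
summed = grouped_sum_in_pandas(empty_frame, "answer_uuid")
```

With Modin installed, `to_modin(summed)` then gives you a Modin dataframe to keep working with.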

I don't see an easy fix for this bug. Probably the right thing is to not default to pandas on the groupby call, but instead default to pandas for the operations on the groupby object, e.g. groupby.sum. The default to pandas would have to convert the original dataframe to pandas, then do the operation on that pandas object.
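To make the idea concrete, here is a toy sketch of deferring the pandas fallback from the groupby call to the operation on the groupby object. None of this is Modin's actual implementation: `FrameStandIn` and `GroupByStandIn` are hypothetical stand-ins, and only `sum` is shown.

```python
import pandas


class FrameStandIn:
    """Hypothetical stand-in for a Modin dataframe (not a real Modin class)."""

    def __init__(self, pandas_frame):
        self._pandas_frame = pandas_frame

    def to_pandas(self):
        return self._pandas_frame

    def groupby(self, by):
        # No fallback here: return our own groupby type, even when empty.
        return GroupByStandIn(self, by)


class GroupByStandIn:
    """Hypothetical groupby wrapper that remembers the frame and the key."""

    def __init__(self, frame, by):
        self._frame = frame
        self._by = by

    def sum(self):
        # The fallback happens at the operation: convert the original
        # frame to pandas, run the groupby there, and wrap the result.
        pandas_result = self._frame.to_pandas().groupby(self._by).sum()
        return FrameStandIn(pandas_result)


empty = FrameStandIn(
    pandas.DataFrame([], columns=["smth", "answer_uuid", "item_id", "item_level"])
)
grouped = empty.groupby("answer_uuid")
result = grouped.sum()
```

In this sketch both `grouped` and `result` stay in the wrapper types, so the caller never sees a pandas object even though pandas does the work.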

mvashishtha added the P2 (Minor bugs or low-priority feature requests) label and removed the Triage 🩹 (Issues that need triage) label on Sep 8, 2022
mvashishtha changed the title from "groupby returns pandas type" to "BUG: groupby on empty dataframe returns pandas type" on Sep 8, 2022
anmyachev added the External (Pull requests and issues from people who do not regularly contribute to modin) label on Apr 19, 2023