Skip to content

Conversation

@Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Feb 9, 2022

Here's a test using mypy that reveals the issue.

import pandas as pd

ind = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["ab", "xy"])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
s = pd.Series([1, 2, 3, 4], index=ind)

cnt_df = df.groupby("x").count()

siz_df = df.groupby("x").size()

app_df = df.groupby("x").apply(sum)

cnt_s = s.groupby("ab").count()

siz_s = s.groupby("xy").size()

app_s = s.groupby("xy").apply(sum)

reveal_type(cnt_df)
reveal_type(siz_df)
reveal_type(app_df)
reveal_type(cnt_s)
reveal_type(siz_s)
reveal_type(app_s)

Before this PR, the result of mypy on the reveal_type lines is:

issue45875.py:17: note: Revealed type is "Union[pandas.core.series.Series, pandas.core.frame.DataFrame]"
issue45875.py:18: note: Revealed type is "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]"
issue45875.py:19: note: Revealed type is "Any"
issue45875.py:20: note: Revealed type is "Union[pandas.core.series.Series, pandas.core.frame.DataFrame]"
issue45875.py:21: note: Revealed type is "Union[pandas.core.frame.DataFrame, pandas.core.series.Series]"
issue45875.py:22: note: Revealed type is "Any"

After this PR, the result is:

issue45875.py:19: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:20: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:21: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:22: note: Revealed type is "pandas.core.series.Series"
issue45875.py:23: note: Revealed type is "pandas.core.series.Series"
issue45875.py:24: note: Revealed type is "pandas.core.series.Series"

This will be needed as we work on the general typing issues.

@Dr-Irv Dr-Irv requested a review from jbrockmendel February 9, 2022 18:08
@Dr-Irv Dr-Irv added the Typing type annotations, mypy/pyright type checking label Feb 9, 2022
Copy link
Member

@twoertwein twoertwein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me from a typing perspective.

@simonjayhawkins
Copy link
Member

Here's a test using mypy that reveals the issue.

This code sample raises ValueError: No axis named 1 for object type Series in the python interpreter

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 10, 2022

Here's a test using mypy that reveals the issue.

This code sample raises ValueError: No axis named 1 for object type Series in the python interpreter

I fixed it. Edited above.

@simonjayhawkins @jbrockmendel just did a newer commit that takes advantage of the generic aspect of the GroupBy class. Requires a couple of casts, but I think this is cleaner.

@simonjayhawkins
Copy link
Member

After this PR, the result is:

issue45875.py:19: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:20: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:21: note: Revealed type is "pandas.core.frame.DataFrame"
issue45875.py:22: note: Revealed type is "pandas.core.series.Series"
issue45875.py:23: note: Revealed type is "pandas.core.series.Series"
issue45875.py:24: note: Revealed type is "pandas.core.series.Series"

running code sample through interpreter gives...

   y
x   
1  1
2  1
3  1
4  1
<class 'pandas.core.frame.DataFrame'>
x
1    1
2    1
3    1
4    1
dtype: int64
<class 'pandas.core.series.Series'>
   x  y
x      
1  1  4
2  2  5
3  3  6
4  4  7
<class 'pandas.core.frame.DataFrame'>
ab
a    2
b    2
dtype: int64
<class 'pandas.core.series.Series'>
xy
x    2
y    2
dtype: int64
<class 'pandas.core.series.Series'>
xy
x    4
y    6
dtype: int64
<class 'pandas.core.series.Series'>

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Dr-Irv lgtm can you update the PR title to be more descriptive TYP: ...

@Dr-Irv Dr-Irv changed the title Issue45875 TYP: Update type results for Groupby.count() and Groupby.size() Feb 10, 2022
@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 10, 2022

For the record, here is what I used for testing:

import pandas as pd
from typing import TYPE_CHECKING

if not TYPE_CHECKING:

    def reveal_type(*args, **kwargs):
        pass


ind = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]], names=["ab", "xy"])
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
s = pd.Series([1, 2, 3, 4], index=ind)

cnt_df = df.groupby("x").count()
print(type(cnt_df))

cnt_df_s = df.groupby("x").y.count()
print(type(cnt_df_s))

# siz_df = df.groupby("x").size()
# print(type(siz_df))

app_df = df.groupby("x").apply(sum)
print(type(app_df))

cnt_df_a = df.groupby("x", as_index=False).count()
print(type(cnt_df_a))

cnt_df_s_a = df.groupby("x", as_index=False).y.count()
print(type(cnt_df_s_a))

# siz_df_a = df.groupby("x", as_index=False).size()
# print(type(siz_df_a))

app_df_a = df.groupby("x", as_index=False).apply(sum)
print(type(app_df_a))

cnt_s = s.groupby("ab").count()
print(type(cnt_s))

# siz_s = s.groupby("xy").size()
# print(type(siz_s))

app_s = s.groupby("xy").apply(sum)
print(type(app_s))

reveal_type(cnt_df)
reveal_type(cnt_df_s)
# reveal_type(siz_df)
reveal_type(app_df)
reveal_type(cnt_df_a)
reveal_type(cnt_df_s_a)
# reveal_type(siz_df_a)
reveal_type(app_df_a)
reveal_type(cnt_s)
# reveal_type(siz_s)
reveal_type(app_s)

I took out the size() test because it is dynamically typed based on as_index. Running this program yields:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>

mypy reports:

issue45875.py:47: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:48: note: Revealed type is "Any"
issue45875.py:50: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:51: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:52: note: Revealed type is "Any"
issue45875.py:54: note: Revealed type is "pandas.core.frame.DataFrame*"
issue45875.py:55: note: Revealed type is "pandas.core.series.Series*"
issue45875.py:57: note: Revealed type is "pandas.core.series.Series"

The two Any values from mypy are because we can't type df.groupby("x").y or df.groupby("x", as_index=False).y

@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

@Dr-Irv status here

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 7, 2022

@jreback all green now - merged with latest main

@jreback jreback added this to the 1.5 milestone Mar 7, 2022
@jreback jreback merged commit 5ab67d7 into pandas-dev:main Mar 7, 2022
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
@Dr-Irv Dr-Irv deleted the issue45875 branch February 13, 2023 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Typing type annotations, mypy/pyright type checking

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: groupby().count() should be better typed

5 participants