groupby().first() docs should explain distinction between nth and first #27578

Open
kyleabeauchamp opened this issue Jul 25, 2019 · 5 comments

kyleabeauchamp commented Jul 25, 2019

Problem description

The existing doc for groupby().first() (https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.core.groupby.GroupBy.first.html?highlight=first#pandas.core.groupby.GroupBy.first) does not describe how missing data is handled. In particular, it does not mention that first() is evaluated column by column and skips null entries, so a single output row can combine values from different input rows.

The docs read: "Compute first of group values...Computed first of values within each group." I think the correct description is "For each column, compute the first non-null entry, possibly aggregating values from across multiple rows." We might also want a simple example to explain the behavior.

Code Sample, a copy-pastable example if possible

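import pandas as pd

x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))
# first(): first non-null value of each column within each group
print(x.groupby("A", as_index=False).first())
# nth(0): first row of each group, nulls kept
print(x.groupby("A", as_index=False).nth(0))
# head(1): first row of each group, nulls kept
print(x.groupby("A", as_index=False).head(1))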
[...]
   A    B  C
0  1  5.0  1
1  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
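
The column-wise behavior described above can be spelled out by hand. The following is only a sketch: `first_non_null` is a hypothetical helper written for illustration, not part of pandas, and it is assumed here to mirror what `first()` does on this particular frame.

```python
import pandas as pd

x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))


def first_non_null(col):
    """Hypothetical helper: first non-null entry of a Series, NaN if none."""
    non_null = col.dropna()
    return non_null.iloc[0] if len(non_null) else float("nan")


# Apply the helper to every column of every group independently.
manual = x.groupby("A").agg(first_non_null)

# The built-in aggregation being discussed in this issue.
builtin = x.groupby("A").first()

print(manual)
print(builtin)
```

For group A=1, column B skips the null in the first row and takes 5.0 from the second row, while column C takes 1 from the first row, so the output row mixes values from two input rows.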

ghost commented Jul 25, 2019

IIUC, you're pointing out that the docstring for first does not make it clear that the function ignores nan values? I think you're right.

Why don't you open a PR? Little fixes like that are usually fairly painless to get in.

WillAyd (Member) commented Jul 25, 2019

This is discussed in #8427; we may just want to align these.

kyleabeauchamp (Author) commented

So I looked at adding a docstring, but the docstrings are currently auto-generated from the function name and a pre-existing template, so I'm going to say this is not amenable to a trivial doc-only fix. https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L1346
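
To illustrate why a templated docstring resists a one-line fix, here is a rough sketch of the general pattern. Everything in it (`_DOC_TEMPLATE`, `templated_doc`, the `extra` parameter) is hypothetical and written for illustration; it is not the pandas implementation. Only the template wording is taken from the docs quoted earlier in this issue.

```python
# Hypothetical sketch of a templated docstring; these names are NOT pandas internals.
_DOC_TEMPLATE = """Compute {name} of group values.

Returns
-------
Series or DataFrame
    Computed {name} of values within each group.
"""


def templated_doc(name, extra=""):
    """Build a docstring from the shared template plus optional per-function text."""
    def decorator(func):
        func.__doc__ = _DOC_TEMPLATE.format(name=name) + extra
        return func
    return decorator


@templated_doc(
    "first",
    extra="\nNotes\n-----\nNull entries are skipped column by column.\n",
)
def first(values):
    """Placeholder body: return the first entry that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


print(first.__doc__)
```

With a shared template, a note that applies only to `first` needs some hook such as the assumed `extra` argument; editing the template itself would change the docs of every function that uses it.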

mroeschke added the Docs label Jul 10, 2021
NumberPiOso (Contributor) commented

take

NumberPiOso (Contributor) commented

I was working on this issue, and I have a PR almost ready. However, I see in #8427 that computing the first non-null entry is not the desired behaviour of this method.

The solution for #8427 would solve both problems by changing the result of first.

However, I will still publish the PR, expecting the best decision to be taken here.
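
Whichever way #8427 is resolved, the two candidate semantics can be written out explicitly. This is a sketch on the frame from the original report; the lambda is just one assumed way to express "positional first" as an aggregation, not a recommendation from the maintainers.

```python
import pandas as pd

x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))

# Positional first: take each column's first entry in the group, null or not.
positional = x.groupby("A").agg(lambda col: col.iloc[0])

# Current first(): skip nulls column by column.
non_null = x.groupby("A").first()

print(positional)
print(non_null)
```

The two results differ only for group A=1, column B, where the positional version keeps the null and `first()` reports 5.0.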
