`groupby().first()` docs should explain distinction between nth and first #27578

kyleabeauchamp · 2019-07-25T00:18:21Z

Problem description

The existing docs for groupby().first() (https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.core.groupby.GroupBy.first.html?highlight=first#pandas.core.groupby.GroupBy.first) do not describe how the method handles missing data. In particular, they do not mention that the behavior applies column by column: each column is resolved independently of the others.

The docs read: "Compute first of group values...Computed first of values within each group." I think the correct description is "For each column, compute the first non-null entry, possibly aggregating values from across multiple rows." We might also want a simple example to explain the behavior.

Code Sample, a copy-pastable example if possible

import pandas as pd
x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))
print(x.groupby("A", as_index=False).first())
print(x.groupby("A", as_index=False).nth(0))
print(x.groupby("A", as_index=False).head(1))
[...]
   A    B  C
0  1  5.0  1
1  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
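
To make the column-wise behavior explicit, here is a minimal sketch built on the same data as above. It compares `first()` with `head(1)` only; the variable names are illustrative and not part of the original report. The point is that for group `A == 1`, `first()` takes `B` from the second row and `C` from the first row, so the result row does not correspond to any single input row.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 3], "B": [None, 5, 6], "C": [1, 2, 3]})

# first(): for each column, the first non-null value within each group.
first_vals = df.groupby("A").first()

# head(1): the first row of each group as-is, NaN included.
first_rows = df.groupby("A").head(1).set_index("A")

# For group A == 1, first() pulls B from row 1 and C from row 0,
# while head(1) keeps row 0 intact, including the missing B.
print(first_vals)
print(first_rows)
```

If the first row of each group is what you want, `head(1)` is the safer choice here because it never mixes values from different rows.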


ghost · 2019-07-25T06:54:02Z

IIUC, you're pointing out that the docstring for first does not make it clear that the function ignores NaN values? I think you're right.

Why don't you open a PR? Little fixes like that are usually fairly painless to get in.

WillAyd · 2019-07-25T14:34:47Z

This is discussed in #8427; we may just want to align these.

kyleabeauchamp · 2019-07-26T02:39:25Z

So I looked at adding a docstring, but the docstrings are currently auto-templated from the function name and a pre-existing template, so I'm going to say this is not amenable to a trivial doc-only fix. https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L1346
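
As a rough illustration of why a templated docstring resists a one-line fix, here is a hypothetical sketch. It is not the pandas source: `_TEMPLATE`, `templated_doc`, and the `extra` parameter are invented for this example. When every docstring is generated from a shared template, method-specific notes need an explicit hook such as `extra`.

```python
# Hypothetical shared template, loosely modeled on the wording quoted in the issue.
_TEMPLATE = "Compute {name} of group values."


def templated_doc(name, extra=""):
    """Attach a docstring built from the shared template (illustrative only)."""

    def decorator(func):
        doc = _TEMPLATE.format(name=name)
        if extra:
            # Without a hook like this, every method gets identical wording.
            doc += "\n\n" + extra
        func.__doc__ = doc
        return func

    return decorator


@templated_doc("first", extra="For each column, return the first non-null entry.")
def first(values):
    # Toy stand-in for the real aggregation: skip None, return the first value.
    return next((v for v in values if v is not None), None)


print(first.__doc__)
```

Under this assumption, the doc fix means changing the templating mechanism itself, not just editing a string.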

NumberPiOso · 2022-02-21T16:41:40Z

take

NumberPiOso · 2022-03-01T22:00:54Z

I was working on this issue, and I have a PR almost ready. However, I see in #8427 that computing the first non-null entry is not the desired behaviour of this method.

The solution for #8427 would solve both problems by changing the result of first.

However, I will still publish the PR and leave the final decision to the maintainers.

WillAyd added the Groupby label Jul 25, 2019

mroeschke added the Docs label Jul 10, 2021

github-actions bot assigned NumberPiOso Feb 21, 2022
