
groupby().last() drops columns #1809

Closed
gtakacs opened this issue Aug 25, 2012 · 5 comments

@gtakacs

gtakacs commented Aug 25, 2012

DataFrame.groupby().last() gives an incorrect result in the following case (pandas version: 0.8.1):

```python
from pandas import DataFrame

df1 = DataFrame({"A": [1, 1], "B": [1, 1], "C": ["x", "y"]})
print(df1.groupby("A").last())
# Output on pandas 0.8.1 (column C is missing):
#    B
# A   
# 1  1
```

The integer column B is handled correctly, but the string column C is dropped.
The result is correct if column B is not present:

```python
df2 = DataFrame({"A": [1, 1], "C": ["x", "y"]})
print(df2.groupby("A").last())
#    C
# A   
# 1  y
```
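For comparison, here is a minimal check on a pandas release that includes the fix (the thread below reports it was merged via PR #1935, and the fix commits appear in the 0.9.0rc1 tag). This sketch assumes a modern pandas under Python 3; both `last()` and `first()` should keep the string column C alongside B:

```python
import pandas as pd

# Same frame as the report: B is numeric, C holds strings.
df1 = pd.DataFrame({"A": [1, 1], "B": [1, 1], "C": ["x", "y"]})

res_last = df1.groupby("A").last()    # C is kept; its value for group 1 is "y"
res_first = df1.groupby("A").first()  # C is kept; its value for group 1 is "x"

print(res_last)
print(res_first)
```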
@manuteleco

I've also come across the same issue today using "first()".

@changhiskhan
Contributor

groupby currently excludes non-numeric/non-boolean columns from many of the "shortcut" methods.
You can do something like this as a workaround for now:

```
In [32]: df1.groupby('A').agg(lambda x: x.iloc[-1])  # written x.irow(-1) in 2012; irow was later removed
Out[32]:
   B  C
A      
1  1  y
```
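One difference to keep in mind with the positional `agg` workaround (`x.iloc[-1]`, the modern spelling of `x.irow(-1)`): `last()` returns the last *non-null* value in each column, whereas the positional version takes whatever is in the final row, NaN included. A small sketch assuming a modern pandas; the frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the final row of group 1 has a missing C value.
df = pd.DataFrame({"A": [1, 1], "B": [1, 2], "C": ["x", np.nan]})
grouped = df.groupby("A")

# last() skips nulls column by column, so C falls back to "x".
by_last = grouped.last()

# The positional workaround takes the final row as-is, so C is NaN.
by_position = grouped.agg(lambda x: x.iloc[-1])

print(by_last)
print(by_position)
```

If the data has no missing values, the two give the same result, as in the original example.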

@wesm
Member

wesm commented Sep 13, 2012

In principle this should work, though, since it's not really a numerical computation. I'll have to look and see how difficult it will be to fix.

ghost assigned changhiskhan Sep 18, 2012
wesm added a commit that referenced this issue Sep 20, 2012
* chang/groupby-last:
  cython methods for group bins #1809
  BUG: allow non-numeric columns in groupby first/last #1809
@wesm
Member

wesm commented Sep 20, 2012

Fixed after merging PR #1935

@wesm wesm closed this as completed Sep 20, 2012
yarikoptic added a commit to neurodebian/pandas that referenced this issue Sep 27, 2012
Version 0.9.0 Release Candidate 1

* tag 'v0.9.0rc1': (58 commits)
  RLS: Version 0.9.0 Release Candidate 1
  BLD: add lib depends pandas-dev#1945
  BUG: missing case for assigning DataFrame via ix
  BUG: python 3.1 timedelta compat issue
  BUG: python 3 tzoffset is not hashable
  TST: adds dateutil to travis-ci install commands
  BUG: let selecting multiple columns in DataFrame.__getitem__ work when there are duplicates. close pandas-dev#1943
  BUG: DatetimeConverter does not handle datetime64 arrays properly
  BUG: reindex with axis=1 when setting Series to scalar location, close pandas-dev#1942
  BUG: fix formatting of Timestamps in to_html/IPython notebook. refactor to_html code. close pandas-dev#1940
  ENH: allow single str input to na_values pandas-dev#1944
  TST: when xlrd is not installed skip tests needing it, close pandas-dev#1941
  BUG: DatetimeIndex localizes twice if input is localized DatetimeIndex pandas-dev#1838
  BUG: align input on setting via ix pandas-dev#1630
  cython methods for group bins pandas-dev#1809
  BUG: allow non-numeric columns in groupby first/last pandas-dev#1809
  TST: skip unicode filename test if system requires encoding to ascii
  BUG: more fixedoffset occurrences pandas-dev#1928
  BUG: no zone in tzinfo pandas-dev#1838
  BUG: handle lists too in DataFrame.xs when partially selecting data from DataFrame. close pandas-dev#1796
  ...
@zhaoxiaozhi-zz

Try `df.groupby(['A', 'B'], as_index=False)[['col1', 'col2']].first()`
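A sketch of that suggestion on a made-up frame, assuming a modern pandas. `col1` and `col2` are the placeholder names from the comment. Recent pandas versions expect a list (double brackets) when selecting several columns after `groupby`; the single-bracket tuple form is deprecated and rejected in newer releases:

```python
import pandas as pd

# Hypothetical data; 'col1' and 'col2' stand in for real column names.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": ["p", "p", "q"],
    "col1": [10, 20, 30],
    "col2": ["x", "y", "z"],
})

# as_index=False keeps the group keys A and B as ordinary columns
# instead of moving them into a MultiIndex; first() keeps the string column.
result = df.groupby(["A", "B"], as_index=False)[["col1", "col2"]].first()
print(result)
```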
