Skip to content

BUGI: groupby issue with user function with mutation, and duplicate indices #5758

Open
jreback opened this Issue Dec 20, 2013 · 0 comments

1 participant

@jreback
jreback commented Dec 20, 2013

Only happens when the user function mutates the input

Originally from: http://stackoverflow.com/questions/20691168/pandas-apply-to-data-frame-groupby/20705226#20705226

Could be auto-fixed or maybe just a better error report to the user (about the dups)

In [40]: df = DataFrame(dict(A = ['foo','foo','bar','bar'], B = np.random.randn(4)),index=[1,1,2,2])

In [41]: df
Out[41]: 
     A         B
1  foo  0.971425
1  foo -1.151693
2  bar  1.265031
2  bar -0.219011

[4 rows x 2 columns]

In [42]: def f(x):                                                
    x['std'] = x['B'].std()
    return x
   ....: 

Cannot straight perform the apply

In [44]: df.groupby('A').apply(f)
ValueError: cannot reindex from a duplicate axis

By using unique indices everything is ok (so straightforward to fix)

In [45]: df.reset_index().groupby('A').apply(f).set_index('index')
Out[45]: 
         A         B       std
index                         
1      foo  0.971425  1.501271
1      foo -1.151693  1.501271
2      bar  1.265031  1.049376
2      bar -0.219011  1.049376

[4 rows x 3 columns]
@jreback jreback modified the milestone: 0.15.0, 0.14.0 Apr 9, 2014
@jreback jreback modified the milestone: 0.16.0, Next Major Release Mar 3, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.