ER/DOC: Sorting in multi-index columns: misleading error message, unclear docs #4370

Closed
jgehrcke opened this issue Jul 26, 2013 · 4 comments · Fixed by #6639
Labels
Docs Error Reporting Incorrect or improved errors from pandas
Comments

@jgehrcke
Contributor

related #739

Have a look at this example:

import pandas as pd
import numpy as np
from StringIO import StringIO
print "Pandas version %s\n\n" % pd.__version__

data1 = """idx,metric
0,2.1
1,2.5
2,3"""

data2 = """idx,metric
0,2.7
1,2.2
2,2.8"""

df1 = pd.read_csv(StringIO(data1))
df2 = pd.read_csv(StringIO(data2))
concatenated = pd.concat([df1, df2], ignore_index=True)
merged = concatenated.groupby("idx").agg([np.mean, np.std])

print merged
print merged.sort('metric')

and its output:

$ python test.py 
Pandas version 0.11.0


     metric          
       mean       std
idx                  
0      2.40  0.424264
1      2.35  0.212132
2      2.90  0.141421
Traceback (most recent call last):
  File "test.py", line 22, in <module>
    print merged.sort('metric')
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3098, in sort
    inplace=inplace)
  File "/***/Python-2.7.3/lib/python2.7/site-packages/pandas/core/frame.py", line 3153, in sort_index
    % str(by))
ValueError: Cannot sort by duplicate column metric

The problem here is not that there is a duplicate column metric, as the error message states. The problem is that metric still has two sub-levels (mean and std), so the label alone does not identify a single column. The solution in this case is to use

merged.sort([('metric', 'mean')])

to sort by the mean of the metric. It took me quite a while to figure this out. First, the error message should be clearer in this case. Second, maybe I was too stupid, but I could not find the solution in the docs, only in a thread on StackOverflow. The error message above looks like the result of an over-generalized condition around https://github.com/pydata/pandas/blob/v0.12.0rc1/pandas/core/frame.py#L3269
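
For anyone who lands here on a recent pandas, where DataFrame.sort no longer exists, the same idea can be sketched with sort_values. This is only an illustration under assumptions, not the fix that closed this issue: it assumes Python 3 and a pandas release that provides DataFrame.sort_values, and it passes the aggregations as strings instead of np.mean and np.std. The key point is unchanged: the sort key is the full column tuple, not the top-level label.

```python
import io

import pandas as pd

# Same sample data as in the report above.
data1 = "idx,metric\n0,2.1\n1,2.5\n2,3"
data2 = "idx,metric\n0,2.7\n1,2.2\n2,2.8"

df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))
concatenated = pd.concat([df1, df2], ignore_index=True)

# Aggregating with a list of functions yields two-level (MultiIndex) columns.
merged = concatenated.groupby("idx").agg(["mean", "std"])
print(merged.columns.tolist())

# Address the column by its full tuple: (top level, sub-level).
sorted_by_mean = merged.sort_values([("metric", "mean")])
print(sorted_by_mean)
```

Printing merged.columns.tolist() first makes the tuple labels visible, which is the quickest way to see what key the sort call expects.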

@jreback
Contributor

jreback commented Jul 26, 2013

yep....docs/error msg are unclear

@adambernier

Thanks for raising this issue. Saved me today.

@jreback
Contributor

jreback commented Feb 8, 2016

u realize this is several years old and is closed

@adambernier

Yes, sorry to bug. Just wanted to say "thank you".
