BUG: sort_index/sortlevel fails MultiIndex after columns are added. #8017
Comments
|
just make the example ONLY the muli_float and just run it on one. simpler/shorter is much better |
8one6
commented
Aug 13, 2014
|
Sure. I'll make the change above. I had thought it was important to show that it worked fine with a float index that was not a multiindex. But no problem. (Done.) |
|
@8one6 thanks, your title says it all though |
jreback
changed the title from
BUG: sort_index fails with numerical level on MultiIndex after columns are added. to BUG: sort_index fails with float index level on MultiIndex after columns are added.
Aug 13, 2014
8one6
commented
Aug 13, 2014
|
@jreback Just one note on your title update. The problem is still there if all of the |
jreback
changed the title from
BUG: sort_index fails with float index level on MultiIndex after columns are added. to BUG: sort_index/sortlevel fails MultiIndex after columns are added.
Aug 13, 2014
|
yeh I think has to do with the adding, will look |
jreback
added Bug Indexing MultiIndex
labels
Aug 13, 2014
jreback
added this to the
0.15.0
milestone
Aug 13, 2014
8one6
commented
Aug 22, 2014
|
I think the fix for this will be much deeper in the bowels of Pandas indices than I'm able to handle. Is there any other way I could help toward a patch for this bug? |
|
np. would appreciate a pull-request on any other issue. thanks! |
8one6
commented
Sep 15, 2014
|
In an attempt to get around this issue, I started sticking a character at the end of some of my column names (to turn them from numbers into strings). While the
which looks fine. and sorts fine:
But when I add a new column and try to sort, this still goes wrong:
(I.e. I think that after the |
jreback
referenced
this issue
Sep 16, 2014
Merged
BUG: make sure that the multi-index is lex-sorted before passing to _lexsort_indexer (GH8017) #8282
|
@8one6 pls take a look at #8282. This was actually a very strange bug. In essence, when you add the column it is inserted in the multi-index at the end. This makes the index no longer lexsorted itself (in fact the lexsort_depth goes from 2 to 1). And the only way to actually lexsort it then is to reconstruct it in its entirety. I believe this was designed this way to avoid having to do a complete refactorization anytime anything is inserted into a multi-index. Secondarily, there was a display bug when using FloatIndexes e.g. setup
master
this PR
|
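The mechanism described above can be sketched on a bare `MultiIndex`. This is a hypothetical illustration, not the code from the thread or from #8282: the labels are invented, and it assumes a recent pandas where `MultiIndex.is_monotonic_increasing` is the public check for sortedness (the thread itself refers to `lexsort_depth` as it existed in 0.14).

```python
import pandas as pd

# Hypothetical index: string first level, float second level.
mi = pd.MultiIndex.from_product([["a", "b"], [1.0, 2.0]])
print(mi.is_monotonic_increasing)  # sortedness as constructed

# Adding a column appends its key at the end of the index,
# which is imitated here with an explicit insert at the last position.
appended = mi.insert(len(mi), ("b", 1.5))
print(appended.is_monotonic_increasing)  # sortedness after the append

# Rebuilding the index from its sorted tuples reconstructs it in full.
rebuilt = pd.MultiIndex.from_tuples(sorted(appended))
print(rebuilt.is_monotonic_increasing)  # sortedness after the rebuild
```

Rebuilding from tuples is only one way to reconstruct the index; it is used here because it makes the "reconstruct it in its entirety" step explicit.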
8one6
commented
Sep 16, 2014
|
First off, thank you so much for your time on this. I'll try to give this a test ASAP. Do you think your PR also addresses the version of this issue that I highlighted in the post I put up yesterday (the one immediately before your last post)? |
|
@8one6 yes it's the same issue (the printing issue is only with a Float64Index among the levels). |
|
@jreback Are you sure it is only with a FloatIndex? If I change it to integers in the example above, I have the exact same behaviour |
|
@jorisvandenbossche you are talking about the printing or sorting issue? |
|
@jreback both. With int columns:
|
jreback
closed this
in #8282
Sep 17, 2014
|
@jorisvandenbossche this is fixed/tested with all dtypes |
8one6
commented
Oct 29, 2014
|
I think there is still some lingering issue here. I still need to get a MWE up and running, but in the meantime, here is a screenshot showing the issue. I would say that I.e. the thing to notice here is that the first columns in the first two display cells have Just to make sure I'm not going nuts here, can you guys confirm that this looks like a bug and that it's worth the effort to put together a MWE to demonstrate from scratch? |
8one6
referenced
this issue
Oct 29, 2014
Open
BUG: columns misaligned in repr when having >10 columns with integer index #8300
|
That seems like a possible bug, as this sorts differently/correctly without a multi-index. Can you try to show a small reproducible example showing the issue? |
8one6
commented
Oct 29, 2014
|
I'll have a shot at it. It's odd because I have two DataFrames whose generation is very similar but which don't exhibit parallel behavior in this case. I.e. DF1 comes out of its process sorting just fine, but DF2 (which is the one up above) comes out sorted incorrectly, even though they have very similar structures. One key difference is that DF1 has many more columns than DF2. Not sure if that could be related. Either way, I'll have a look. |

8one6 commented Aug 13, 2014
I have a `DataFrame` with a `MultiIndex` on the columns. The first level of the MultiIndex contains `string`s. The second, `float`s (though the problem persists if the second level is `int`s). I add a column to the `DataFrame` (which should not come last if the columns are sorted). I try to sort the `DataFrame`. The result does not seem to be sorted. The behavior is fine if the columns are simply an `Index` (even after adding columns). And the sort works fine in the `MultiIndex` case as long as no columns have been added since the `DataFrame` was created.
MWE:
This sorts just fine as it is now:
But if I add a column to this `DataFrame` and then show it sorted, I get what looks to be a wrong result (the new column remains last, rather than being placed second-to-last as it should be):
I'm able to produce this behavior on two systems. The first runs Pandas 0.14.0 and Numpy 1.8.1 and the second runs Pandas 0.14.1 and Numpy 1.8.2. This issue is described here: http://stackoverflow.com/questions/25287130/pandas-sort-index-fails-with-multiindex-containing-floats-as-one-level-when-col?noredirect=1#comment39408150_25287130
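
The code blocks from this post did not survive in this copy of the thread. The following is a hypothetical reconstruction of the scenario described above, not the author's MWE: the labels and values are invented, and it is written against a recent pandas, where the bug has been fixed by #8282 and the sort is expected to place the new column second-to-last.

```python
import numpy as np
import pandas as pd

# Hypothetical data: string first level, float second level on the columns.
cols = pd.MultiIndex.from_product([["a", "b"], [1.0, 2.0]])
df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=cols)

# Add a column whose key belongs second-to-last in sorted order.
df[("b", 1.5)] = 0.0

# The new column is appended at the end, so the columns are unsorted here.
print(list(df.columns))

# Sort the columns; on the affected 0.14.x versions the new column
# reportedly stayed last instead of moving into place.
sorted_df = df.sort_index(axis=1)
print(list(sorted_df.columns))
```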