BUG - sparse dataframes lose multi-index column names #11600
Comments
jreback added Bug, Sparse, MultiIndex labels Nov 14, 2015
jreback added this to the Next Major Release milestone Nov 14, 2015
jreback added Difficulty Intermediate, Effort Medium labels Nov 14, 2015
Sparse has not gotten a lot of love, so pull requests are welcome.
Ezekiel-Kruglick commented Nov 14, 2015
I've been looking for a way to get involved with pandas and contribute; maybe this will be my start. Although once I get set up, I see some other stuff on the list that looks less challenging :)
That would be great! There are a bunch of sparse issues on the easier side as well (though I don't think this one is terribly involved). Let me know when you need help.
See http://pandas.pydata.org/pandas-docs/stable/contributing.html for how to contribute.
Ezekiel-Kruglick commented Nov 14, 2015
Yup, I already have it forked and cloned to my desktop and am exploring how the code connects. That page was quite useful!
Ezekiel-Kruglick commented Nov 14, 2015
Okay, I think I have it fixed. The test code is longer than the fix code by far. It passes the most obvious testing. I'm going to run the whole suite of tests, then I will get you a pull request to review.
Ezekiel-Kruglick commented Nov 14, 2015
OK, pull request submitted. Ran all nosetests and got OK (SKIP=116) on 9172 tests. If this gets integrated I'll post a short update as an answer on SO.
Ezekiel-Kruglick referenced this issue Nov 15, 2015
Closed: BUG GH11600 - MultiIndex column level names lost when to_sparse() called #11606
jreback modified the milestone: 0.17.1, Next Major Release Nov 19, 2015
jreback added a commit that referenced this issue Nov 19, 2015
207e0ce (jreback): closed by #11606
Ezekiel-Kruglick commented Nov 14, 2015
From SO: http://stackoverflow.com/questions/33702198/do-python-pandas-sparse-dataframes-lose-multi-index-column-names-or-am-i-doing-i
The bug is simple in concept: a multi-index with column level names loses those names when converted to a sparse dataframe.
Minimal example - first create a multi-index dataframe:
This gives us a nice test multi-index with column and row level names. Now if I make a sparse matrix out of that and show it, the column level names are gone.
And if I convert the sparse version back to dense those level names are still gone.
I AM aware that displaying the sparse version calls to_dense(), but the loss appears to be happening at the conversion to sparse. I'm exploring moving to sparse to reduce memory usage for a code base, and my attempts to access the levels within the sparse dataframe generate "KeyError: 'Level not found'".
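A minimal sketch of the kind of reproduction described above, not the original poster's code: the level names (`outer`, `inner`, `row`) and the data are made up for illustration. It assumes `DataFrame.to_sparse()` is available, as it was when this issue was filed; on later pandas versions where that method no longer exists, the sketch falls back to `astype(pd.SparseDtype(...))` as a stand-in conversion.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical level names and data, chosen only for illustration.
columns = pd.MultiIndex.from_product(
    [["a", "b"], ["x", "y"]], names=["outer", "inner"]
)
index = pd.Index([0, 1, 2], name="row")
dense = pd.DataFrame(
    np.arange(12, dtype="float64").reshape(3, 4), index=index, columns=columns
)

with warnings.catch_warnings():
    # Some pandas versions emit a deprecation warning for to_sparse().
    warnings.simplefilter("ignore")
    if hasattr(dense, "to_sparse"):
        # The conversion discussed in this issue.
        sparse = dense.to_sparse()
    else:
        # Stand-in for pandas versions without to_sparse().
        sparse = dense.astype(pd.SparseDtype("float64", np.nan))

# Compare the column level names before and after the conversion.
print("dense :", list(dense.columns.names))
print("sparse:", list(sparse.columns.names))

names_survived = list(sparse.columns.names) == list(dense.columns.names)
print("column level names preserved:", names_survived)
```

On a pandas version affected by this bug, the last line would be expected to report `False`; on a version that includes the fix, it should report `True`.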