BUG - sparse dataframes lose multi-index column names #11600

Closed
Ezekiel-Kruglick opened this Issue Nov 14, 2015 · 8 comments

Ezekiel-Kruglick commented Nov 14, 2015

From SO: http://stackoverflow.com/questions/33702198/do-python-pandas-sparse-dataframes-lose-multi-index-column-names-or-am-i-doing-i

The bug is simple in concept: a multi-index with column level names loses those names when converted to a sparse dataframe.

Minimal example - first create a multi-index dataframe:

In[2]: import pandas as pd
In[3]: miindex = pd.MultiIndex.from_product([["x","y"], ["10","20"]],names=['row-foo', 'row-bar'])
micol = pd.MultiIndex.from_product([['a','b','c'], ["1","2"]],names=['col-foo', 'col-bar'])
df = pd.DataFrame(index=miindex, columns=micol).sortlevel().sortlevel(axis=1)
df = df.fillna(value=3.14)
df
Out[3]: 
col-foo             a           b           c      
col-bar             1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

This gives us a nice test multi-index with column and row level names. Now if I make a sparse matrix out of that and show it, the column level names are gone.

In[4]: ds = df.to_sparse()
ds
Out[4]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

And if I convert the sparse version back to dense those level names are still gone.

In[6]: ds.to_dense()
Out[6]: 
                    a           b           c      
                    1     2     1     2     1     2
row-foo row-bar                                    
x       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14
y       10       3.14  3.14  3.14  3.14  3.14  3.14
        20       3.14  3.14  3.14  3.14  3.14  3.14

I AM aware that displaying the sparse version calls to_dense(), but the loss appears to happen at the conversion to sparse. I'm exploring moving to sparse to reduce memory usage for a code base, and my attempts to access the levels within the sparse dataframe raise "KeyError: 'Level not found'".
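Until the conversion itself keeps the names, one possible stopgap is to copy the column level names back from the original frame after the round trip. The sketch below is only an illustration: `restore_column_names` is a hypothetical helper, not a pandas function, and the loss is simulated with `set_names` on a plain DataFrame so that the example does not depend on the `to_sparse()` call from the report.

```python
import numpy as np
import pandas as pd

# Same shape and labels as the frame in the report.
miindex = pd.MultiIndex.from_product(
    [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
)
micol = pd.MultiIndex.from_product(
    [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
)
df = pd.DataFrame(np.full((4, 6), 3.14), index=miindex, columns=micol)


def restore_column_names(converted, original):
    """Hypothetical helper: return a copy of `converted` whose column
    levels carry the level names of `original`."""
    out = converted.copy()
    out.columns = out.columns.set_names(list(original.columns.names))
    return out


# Simulate the reported symptom: same data, column level names dropped.
lost = df.copy()
lost.columns = lost.columns.set_names([None, None])

restored = restore_column_names(lost, df)
print(list(restored.columns.names))

# Level access by name works again on the restored frame.
first_level = restored.columns.get_level_values("col-foo")
```

This only papers over the symptom on the caller's side; it assumes the converted frame still has the same number of column levels as the original.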

@jreback jreback added this to the Next Major Release milestone Nov 14, 2015

Contributor

jreback commented Nov 14, 2015

sparse has not gotten a lot of love, so pull-requests are welcome.

Ezekiel-Kruglick commented Nov 14, 2015

I've been looking for a way to get involved with pandas and contribute; maybe this will be my start. Although once I get set up, I see some other stuff on the list that looks less challenging :)

Contributor

jreback commented Nov 14, 2015

that would be great!

there are a bunch of sparse issues which are on the easier side as well (though I don't think this is terribly involved)

lmk when u need help

Contributor

jreback commented Nov 14, 2015

Ezekiel-Kruglick commented Nov 14, 2015

Yup, already have it forked and cloned to my desktop and am exploring how the code connects, that page was quite useful!

Ezekiel-Kruglick commented Nov 14, 2015

Okay, I think I have it fixed. The test code is longer than the fix code by far.

Passes the most obvious testing. I'm going to run the whole suite of tests, then will get you a pull request to review.
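For readers on current pandas, a regression check for this behaviour might look roughly like the sketch below. It is not the test from the pull request. It assumes pandas 1.0 or later, where `to_sparse()` no longer exists and sparse data is held in columns with a `pd.SparseDtype`; the function name `check_sparse_roundtrip_keeps_column_names` is made up for illustration.

```python
import numpy as np
import pandas as pd


def check_sparse_roundtrip_keeps_column_names():
    """Sketch of a regression check (assumes pandas >= 1.0)."""
    miindex = pd.MultiIndex.from_product(
        [["x", "y"], ["10", "20"]], names=["row-foo", "row-bar"]
    )
    micol = pd.MultiIndex.from_product(
        [["a", "b", "c"], ["1", "2"]], names=["col-foo", "col-bar"]
    )
    df = pd.DataFrame(np.full((4, 6), 3.14), index=miindex, columns=micol)

    # Dense -> sparse: every column becomes a sparse column.
    sparse_df = df.astype(pd.SparseDtype("float64", fill_value=3.14))
    assert list(sparse_df.columns.names) == ["col-foo", "col-bar"]

    # Sparse -> dense: the names should survive the way back as well.
    dense_df = sparse_df.sparse.to_dense()
    assert list(dense_df.columns.names) == ["col-foo", "col-bar"]
    return sparse_df, dense_df


sparse_df, dense_df = check_sparse_roundtrip_keeps_column_names()
```

The check looks only at the column level names, since that is the behaviour reported here; row level names and values would need their own assertions.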

Ezekiel-Kruglick commented Nov 14, 2015

OK, pull request submitted. Ran all nosetests and got OK (SKIP=116) on 9172 tests. If this gets integrated I'll post a short update as an answer on SO.

Contributor

jreback commented Nov 19, 2015

closed by #11606

@jreback jreback closed this Nov 19, 2015
