get_dummies with sparse doesn't convert numeric to sparse #18686

Closed
NagabhushanS opened this Issue Dec 8, 2017 · 4 comments

@NagabhushanS

NagabhushanS commented Dec 8, 2017

I got the error
AttributeError: 'IntBlock' object has no attribute 'sp_index'

when converting a SparseDataFrame to a SciPy csr_matrix using the following code:

dfTotalCat = get_dummies(dfTotalCat, sparse=True)

XTotalCat = csr_matrix(dfTotalCat.to_coo())

The SparseDataFrame is obtained from get_dummies.

Following is the exact error trace:

Traceback (most recent call last):
  File "pandaSrc.py", line 76, in <module>
    XTotalCat = csr_matrix(dfTotalCat.to_coo())
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\frame.py", line 255, in to_coo
    row = s.sp_index.to_int_index().indices
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
  File "C:\Users\nagabhushan.s\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\sparse\series.py", line 245, in sp_index
    return self.block.sp_index
AttributeError: 'IntBlock' object has no attribute 'sp_index'

@TomAugspurger

Contributor

TomAugspurger commented Dec 8, 2017

Could you make a reproducible example? What's dfTotalCat?

@dfaivre

dfaivre commented Dec 22, 2017

If not all of your columns are dummy encoded, then get_dummies will return some columns that are not sparse. It seems that if you call df.to_sparse() before dummy encoding, the error goes away.

@TomAugspurger -- I don't know enough to know if this is expected behavior and docs just need to be updated (took me a bit to figure it out...)

Repro code below

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4]
    })
df['A'] = df['A'].astype('category')


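def _throw_no_attribute_sp_index_err():
    # Column "B" is numeric, so get_dummies leaves it dense even with sparse=True.
    one_hot = pd.get_dummies(df, sparse=True)
    print(one_hot.columns)
    # Raises AttributeError: 'IntBlock' object has no attribute 'sp_index'
    one_hot.to_coo()


def _no_throw():
    # Converting to sparse first makes every column sparse, so to_coo() works.
    one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
    print(one_hot.columns)
    one_hot.to_coo()

@TomAugspurger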

Contributor

TomAugspurger commented Dec 24, 2017

Thanks. I'm not sure that get_dummies(sparse=True) should convert numeric columns. I think it's best to just document this.
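
For anyone landing here on a newer pandas, below is a minimal sketch of the workaround under stated assumptions. It assumes pandas 1.0 or later, where SparseDataFrame and to_sparse() no longer exist and sparsity is expressed through pd.SparseDtype instead, so it is not the code path from the traceback above. The helpers sparse_column_report and make_all_sparse are hypothetical names made up for this illustration, not pandas APIs, and the fill value of 0 is an arbitrary choice.

```python
import pandas as pd


def sparse_column_report(frame):
    """Map each column name to True if it is backed by a sparse dtype."""
    return {col: isinstance(dtype, pd.SparseDtype)
            for col, dtype in frame.dtypes.items()}


def make_all_sparse(frame, fill_value=0):
    """Return a copy in which every dense column is cast to a sparse dtype.

    Hypothetical helper for illustration; fill_value=0 is an assumption.
    """
    report = sparse_column_report(frame)
    casts = {col: pd.SparseDtype(frame[col].dtype, fill_value)
             for col, is_sparse in report.items() if not is_sparse}
    return frame.astype(casts)


# Same data as the repro above: one categorical column, one numeric column.
df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})
df["A"] = df["A"].astype("category")

one_hot = pd.get_dummies(df, sparse=True)

# Only the dummy columns are sparse; the numeric column "B" stays dense.
before = sparse_column_report(one_hot)
# After the explicit cast, every column is sparse.
after = sparse_column_report(make_all_sparse(one_hot))

print(before)
print(after)
```

With every column sparse, calling .sparse.to_coo() on the result of make_all_sparse(one_hot) should then be possible (it needs SciPy installed), which is the modern counterpart of the to_coo() call that failed in the original report.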

@TomAugspurger changed the title from "Attribute Error when converting SparseDataFrame to Scipy sparse csr_matrix." to "get_dummies sparse with sparse doesn't convert numeric to sparse" Dec 24, 2017

@TomAugspurger changed the title from "get_dummies sparse with sparse doesn't convert numeric to sparse" to "get_dummies with sparse doesn't convert numeric to sparse" Dec 24, 2017

@TomAugspurger TomAugspurger added the Docs label Dec 24, 2017

@hexgnu

Contributor

hexgnu commented Dec 26, 2017

I did a little digging into this. What happens is that get_dummies somehow casts the non-sparse column as sparse even though the underlying block is not sparse, which causes cascading issues such as the sp_index error. I haven't quite figured out what is going on yet, but my current hypothesis is that it has something to do with how concat works with sparse frames.
