Pivot to SparseDataFrame: TypeError: ufunc 'isnan' not supported in sparse matrix conversion #11633

Closed
DSLituiev opened this Issue Nov 18, 2015 · 7 comments

I want to convert a DataFrame to a SparseDataFrame before pivoting it, because the pivoted result gets really sparse (see also this discussion). I have a textual key ("chr") that I need to keep:

import pandas as pd

df = pd.DataFrame(
    list(zip([3, 2, 4, 1, 5, 3, 2],
             ["chr1", "chr1", "chr1", "chr1", "chr2", "chr2", "chr3"],
             [100, 100, 100, 200, 1, 3, 1],
             [True, True, True, False, True, False, True],
             [-1, 0, 1, 3, 0, 2, 1])),
    columns=["counts", "chr", "pos", "strand", "distance"])

df.iloc[:,1:].dtypes
Out[]: 
chr         object
pos          int64
strand        bool
distance     int64
dtype: object

For this small table, pivoting works well with a regular DataFrame:

pd.pivot_table(df, index=["chr", "pos"], columns=["strand", "distance"], values="counts").fillna(0)

strand   False     True
distance     2  3    -1  0  1
chr  pos                     
chr1 100     0  0     3  2  4
     200     0  1     0  0  0
chr2 1       0  0     0  5  0
     3       3  0     0  0  0
chr3 1       0  0     0  0  2

But I need to do this on much larger matrices, so I tried the following trick:

dfpiv = pd.pivot_table(pd.SparseDataFrame(df), index=["chr", "pos"], columns=["strand", "distance"], values="counts")

but I am getting:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Are there any plans to add an option to the pivot function for automatic conversion to SparseDataFrame?

If I include default_fill_value=0, which makes sense in my case, I get yet another error:

>>> dfsp = pd.SparseDataFrame(df, default_fill_value=0)
ValueError: could not convert string to float: '<value from "chr" column>'
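
The ValueError is consistent with the object-dtype "chr" column being cast to float once a numeric fill value is supplied. The sketch below reproduces that kind of cast in plain NumPy. It is an assumption about the cause, not a trace of the pandas internals, and the names chr_values and cast_error are made up for illustration:

```python
import numpy as np

# Object-dtype array of text keys, standing in for the "chr" column.
chr_values = np.array(["chr1", "chr2", "chr3"], dtype=object)

# Casting text to float fails; this is assumed (not confirmed) to be
# what happens inside the sparse constructor when a numeric fill value is given.
try:
    chr_values.astype(float)
    cast_error = None
except ValueError as exc:
    cast_error = str(exc)

print(cast_error)
```

If that is indeed the cause, converting only the numeric columns to sparse and leaving the text key alone would sidestep the cast.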

jreback commented Nov 18, 2015

you would have to show a copy-pastable example, and the output of pd.show_versions()

please see the updated post above, which now includes an example

pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.0
nose: 1.3.7
pip: 7.1.2
setuptools: 18.4
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: None
IPython: 4.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2.dev0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 2.2.6
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)

jreback added this to the Next Major Release milestone Dec 10, 2015

jreback commented Dec 10, 2015

this is quite easy to fix: need to replace ~np.isnan(arr) with pd.notnull(arr)

pull-requests are welcome
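
A minimal sketch of why that replacement helps, assuming arr is an object-dtype array like the one built from the "chr" column (the sample values and variable names here are made up for illustration). np.isnan is only defined for numeric dtypes, while pd.notnull also accepts object arrays:

```python
import numpy as np
import pandas as pd

# Object-dtype array with text keys and one missing entry.
arr = np.array(["chr1", "chr2", None], dtype=object)

# np.isnan raises TypeError on object dtype, which matches the reported error.
try:
    mask_isnan = ~np.isnan(arr)
    isnan_error = None
except TypeError as exc:
    mask_isnan = None
    isnan_error = exc

# pd.notnull handles object dtype and treats None/NaN as missing.
mask_notnull = pd.notnull(arr)

print(type(isnan_error).__name__)
print(mask_notnull)
```

On numeric arrays the two expressions give the same mask, so the swap should not change behaviour for the dtypes that already worked.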

Do you have a test file dedicated to sparse?

jreback modified the milestone: 0.18.1, Next Major Release Feb 23, 2016

jreback modified the milestone: 0.18.2, 0.18.1 Apr 18, 2016

jreback closed this in 86f68e6 May 18, 2016

nps added a commit to nps/pandas that referenced this issue May 30, 2016

sinhrks + nps BUG: Sparse creation with object dtype may raise TypeError
closes #11633
closes #11856

Author: sinhrks <sinhrks@gmail.com>

Closes #13201 from sinhrks/sparse_isnull and squashes the following commits:

443b47e [sinhrks] BUG: Sparse creation with object dtype may raise TypeError
6749e72