Support dtypes other than float in sparse data structures #667
benjello referenced this issue on Jul 24, 2013 (closed): SparseDataFrame should be able to handle also non float non sparse Columns #2873
partially in #3482
jreback added the Testing label on Feb 15, 2014
jreback modified the milestone: 0.15.0, 0.14.0 on Feb 15, 2014
jreback modified the milestone: 0.16.0, 0.17.0 on Jan 26, 2015
aolieman commented on Apr 10, 2016

Would it be easier to work towards supporting floats other than float64 first? The overall enhancement of supporting other kinds of dtypes seems like a major effort that should probably be tackled in smaller steps. I'm particularly interested in reducing the memory usage of dummy variables (i.e. bool) and small-valued counts (e.g. uint8). When sparsifying dataframes that contain such dtypes, being able to convert to float16 rather than float64 would already help a lot. I've posted a StackOverflow question (and answer) about my attempts to achieve this.
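As a rough back-of-the-envelope sketch of why the stored value dtype matters, the snippet below estimates the bytes needed for one column. It assumes a sparse layout that keeps one 4-byte (int32) position per non-fill value plus the value itself, and ignores all other overhead; the per-element sizes are NumPy's, and the 1,000,000-row, 5%-dense column is a hypothetical example, not data from this thread.

```python
# Rough storage estimate for one column; a sketch, not a measurement of pandas.
# ASSUMPTION: sparse storage = one int32 position + one value per non-fill entry.
ITEMSIZE = {"float64": 8, "float16": 2, "uint8": 1, "bool": 1}  # NumPy bytes per element
INDEX_ITEMSIZE = 4  # assumed int32 position stored for every non-fill value


def dense_bytes(length, dtype):
    """Bytes for a dense column of `length` elements of `dtype`."""
    return length * ITEMSIZE[dtype]


def sparse_bytes(nnz, dtype):
    """Bytes for `nnz` stored (non-fill) values of `dtype` plus their positions."""
    return nnz * (ITEMSIZE[dtype] + INDEX_ITEMSIZE)


length, nnz = 1_000_000, 50_000  # hypothetical 5%-dense dummy/count column
for dtype in ("float64", "float16", "uint8"):
    print("sparse", dtype, sparse_bytes(nnz, dtype))
print("dense float64", dense_bytes(length, "float64"))
print("dense bool", dense_bytes(length, "bool"))
```

Under these assumptions the sparse column takes 600,000 bytes as float64, 300,000 as float16 and 250,000 as uint8, against 8,000,000 for dense float64 and 1,000,000 for dense bool. Dropping from float64 to float16 halves the footprint, while going further to uint8 saves comparatively little because the positions dominate.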
aolieman commented on Apr 10, 2016

@jreback, thanks for mentioning the 0.18.1 fixes. They should solve some of the issues I encountered, but dtype coercion still occurs with frames. Even if a …

Attempts to construct a one-column frame in 0.18.0 (same result for multiple columns):

In []: dense_series = pd.Series([False]*5 + [True]*3 + [False]*5, dtype='bool', name='b')
In []: dense_df = pd.DataFrame(dense_series)
In []: sparse_df = dense_df.to_sparse(fill_value=False)
In []: sparse_df['b'].dtype
Out[]: dtype('float64')
In []: sparse_series = dense_series.to_sparse(fill_value=False)
In []: sparse_series.dtype
Out[]: dtype('bool')
In []: sparse_df = pd.SparseDataFrame(sparse_series)
In []: sparse_df['b'].dtype
Out[]: dtype('float64')
In []: sparse_df = pd.SparseDataFrame(sparse_series, dtype='bool')
In []: sparse_df['b'].dtype
Out[]: dtype('bool')
In []: sparse_df.info()
------------------------
[traceback omitted]
AttributeError: ("'SingleBlockManager' object has no attribute 'view'", 'occurred at index b')
In []: sparse_df['b'].values
Out[]:
SingleBlockManager
Items: RangeIndex(start=0, stop=13, step=1)
BoolBlock: 13 dtype: bool

My apologies if this is solved in 0.18.1 (which I'm not able to test right now) or if I'm doing it wrong.
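For contrast with the coercion in the session above, here is a toy, pure-Python sketch of dtype-preserving sparse construction. Everything in it (`SparseColumn`, `make_frame`, `info`) is hypothetical and is not pandas' API; it only illustrates the expected behaviour: a column sparsified with `fill_value=False` keeps its bool values when placed in a frame, and an `info()`-style summary can be computed from the stored positions without upcasting to float64.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SparseColumn:
    """Hypothetical sparse column: stores only entries that differ from fill_value."""
    length: int
    fill_value: Any
    dtype: type  # value type is kept as given, never coerced to float
    indices: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)

    @classmethod
    def from_dense(cls, dense, fill_value, dtype):
        col = cls(len(dense), dtype(fill_value), dtype)
        for i, v in enumerate(dense):
            v = dtype(v)
            if v != col.fill_value:  # keep only non-fill entries
                col.indices.append(i)
                col.values.append(v)
        return col

    def to_dense(self):
        out = [self.fill_value] * self.length
        for i, v in zip(self.indices, self.values):
            out[i] = v
        return out

    @property
    def density(self):
        return len(self.values) / self.length if self.length else 0.0


def make_frame(**columns: SparseColumn) -> Dict[str, SparseColumn]:
    """Hypothetical frame constructor: keeps each column's own dtype (no upcasting)."""
    if len({c.length for c in columns.values()}) > 1:
        raise ValueError("all columns must have the same length")
    return dict(columns)


def info(frame):
    """One summary line per column: name, dtype, number of stored values, density."""
    return [
        f"{name}: {col.dtype.__name__}, {len(col.values)} stored, density={col.density:.3f}"
        for name, col in frame.items()
    ]


dense_b = [False] * 5 + [True] * 3 + [False] * 5
frame = make_frame(b=SparseColumn.from_dense(dense_b, fill_value=False, dtype=bool))
print(frame["b"].dtype)  # <class 'bool'>
print(info(frame))       # ['b: bool, 3 stored, density=0.231']
```

Storing the value type per column, rather than choosing one frame-wide float dtype, is what lets the summary and the round trip back to dense keep bool values.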
Well, it's an open issue; tests and such are welcome.
This was referenced on Apr 27, 2016
jreback added a commit that referenced this issue on Aug 3, 2016 (sinhrks + jreback, 45d54d0)
wesm commented on Jan 23, 2012
No description provided.