Labels: Enhancement, Needs Discussion (requires discussion from core team before further action), Sparse (Sparse Data Type)
Description
Previously, you could check for the SparseDataFrame class, but now we want people to store sparse data in normal DataFrames/Series.
So for a library developer, what is the recommended way to check for sparse data?
For a Series (or SparseArray), pd.api.types.is_sparse still works, including on a plain Series that holds sparse data:
In [42]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1])})
In [43]: s = df['a']
In [44]: type(s)
Out[44]: pandas.core.series.Series
In [45]: pd.api.types.is_sparse(s)
Out[45]: True
For a DataFrame, however, that function does not work. Instead, you can apply it to each column's dtype and reduce the result with any or all, depending on your requirements:
In [56]: df = pd.DataFrame({'a': pd.SparseArray([1, np.nan, 1]), 'b': [1, 2, 3]})
In [57]: df.dtypes.apply(pd.api.types.is_sparse)
Out[57]:
a True
b False
dtype: bool
In [58]: df.dtypes.apply(pd.api.types.is_sparse).any()
Out[58]: True
In [59]: df.dtypes.apply(pd.api.types.is_sparse).all()
Out[59]: False
So that actually works quite well, now that I have written it out.
Do we want to make this even easier somehow? Or document this?
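To make the question concrete, here is a rough sketch of what a small helper could look like. The name has_sparse and its how parameter are hypothetical, not an existing pandas API. It checks dtypes with isinstance(dtype, pd.SparseDtype) rather than pd.api.types.is_sparse, on the assumption that the isinstance form is the more portable check (newer pandas releases deprecate is_sparse), and it builds the data with pd.arrays.SparseArray instead of the top-level pd.SparseArray alias used above.

```python
import numpy as np
import pandas as pd


def has_sparse(obj, how="any"):
    """Hypothetical helper: report whether a Series or DataFrame holds sparse data.

    how="any" -> True if at least one column is sparse
    how="all" -> True only if every column is sparse
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    if isinstance(obj, pd.DataFrame):
        # One flag per column, based on the column dtype only.
        flags = [isinstance(dtype, pd.SparseDtype) for dtype in obj.dtypes]
        return any(flags) if how == "any" else all(flags)

    # Series, SparseArray and other objects exposing a single .dtype
    return isinstance(obj.dtype, pd.SparseDtype)


df = pd.DataFrame({"a": pd.arrays.SparseArray([1, np.nan, 1]), "b": [1, 2, 3]})
mixed_any = has_sparse(df)             # one sparse column is enough
mixed_all = has_sparse(df, how="all")  # column "b" is dense
```

Note that with how="all" this sketch returns True for a DataFrame with no columns, since all() of an empty list is True; whether that is the desired behaviour is one of the things that would need discussion.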