BUG: DataFrame.apply loses sparse dtype #23744

Closed
jorisvandenbossche opened this issue Nov 16, 2018 · 2 comments · Fixed by #23755
Labels: ExtensionArray (Extending pandas with custom dtypes or arrays), Sparse (Sparse Data Type)

@jorisvandenbossche
Member

In [151]: df = pd.SparseDataFrame(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]]), 
                                  columns=['a', 'b', 'c'], default_fill_value=0)
In [152]: df2 = pd.DataFrame(df)

In [153]: df.apply(np.exp)['a']
Out[153]: 
0    1.0
1    1.0
2    1.0
Name: a, dtype: Sparse[float64, 1.0]
BlockIndex
Block locations: array([], dtype=int32)
Block lengths: array([], dtype=int32)

In [155]: df2.apply(np.exp)['a']
Out[155]: 
0    1.0
1    1.0
2    1.0
Name: a, dtype: float64
jorisvandenbossche added the Sparse and ExtensionArray labels on Nov 16, 2018
@JustinZhengBC
Contributor

JustinZhengBC commented Nov 16, 2018

I observe the sparse dtype being lost after the copy, even before the apply:

>>> df = pd.SparseDataFrame(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]]), columns=list('abc'), default_fill_value=0)
>>> df2 = pd.DataFrame(df)
>>> type(df)
<class 'pandas.core.sparse.frame.SparseDataFrame'>
>>> type(df2)
<class 'pandas.core.frame.DataFrame'>

df2 = pd.SparseDataFrame(df) results in expected behaviour.

@jorisvandenbossche
Member Author

@JustinZhengBC The pd.DataFrame(df) instead of pd.SparseDataFrame(df) is on purpose, as I was testing the sparse support in a normal DataFrame. That was not fully clear from the issue.

Normal DataFrames also support storing sparse columns:

In [14]: type(df2)
Out[14]: pandas.core.frame.DataFrame

In [15]: df2.dtypes
Out[15]: 
a    Sparse[int64, 0]
b    Sparse[int64, 0]
c    Sparse[int64, 0]
dtype: object
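
For readers on pandas 1.0 or later, where pd.SparseDataFrame has been removed, the same check can be sketched with sparse arrays placed directly in a regular DataFrame. This is a minimal illustration, not the code from the issue: building the frame from pd.arrays.SparseArray columns is an assumption made here to get an equivalent starting point, and whether apply preserves the sparse dtype depends on the installed pandas version, so the snippet inspects the result rather than asserting a particular outcome.

```python
import numpy as np
import pandas as pd

# Same values as in the report; each column is stored as a SparseArray
# with fill value 0 (assumed equivalent to the SparseDataFrame setup).
data = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]])
df = pd.DataFrame(
    {
        name: pd.arrays.SparseArray(data[:, i], fill_value=0)
        for i, name in enumerate(["a", "b", "c"])
    }
)

# The container is a plain DataFrame, yet every column has a sparse dtype.
print(type(df))
print(df.dtypes)

# Apply a ufunc and inspect the resulting dtypes instead of assuming them.
result = df.apply(np.exp)
print(result.dtypes)
print([isinstance(dt, pd.SparseDtype) for dt in result.dtypes])
```

If the last line prints False for any column, the sparse dtype was dropped by apply, which is the behaviour this issue reports.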
