Implement DataFrame.__array_ufunc__ #23743

Closed
jorisvandenbossche opened this issue Nov 16, 2018 · 2 comments · Fixed by #36955
Labels
Enhancement Sparse Sparse Data Type

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 16, 2018

Applying a ufunc on a DataFrame with sparse columns does not retain its sparse dtype:

In [105]: df = pd.SparseDataFrame(np.array([[0, 1, 0], [1, 0, 1]]),
                                  columns=['a', 'b', 'c'], default_fill_value=0)
In [106]: df2 = pd.DataFrame(df)

In [107]: np.exp(df)['a']
Out[107]: 
0    1.000000
1    2.718282
Name: a, dtype: Sparse[float64, 0]
BlockIndex
Block locations: array([0], dtype=int32)
Block lengths: array([2], dtype=int32)

In [108]: np.exp(df2)['a']
Out[108]: 
0    1.000000
1    2.718282
Name: a, dtype: float64

Although SparseDataFrame returns the correct result here, I am not sure it works as desired: I am not sure it avoids materializing the full dense data, which should in principle be avoidable.
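
SparseDataFrame was removed in pandas 1.0, so the session above cannot be rerun on current releases. A rough equivalent, assuming a regular DataFrame with sparse-dtype columns stands in for the SparseDataFrame, might look like the sketch below. The variable names are illustrative, and the dtype it prints depends on the installed pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Assumption: a plain DataFrame cast to a sparse dtype replaces SparseDataFrame.
dense = pd.DataFrame(np.array([[0, 1, 0], [1, 0, 1]]), columns=["a", "b", "c"])
sparse_df = dense.astype(pd.SparseDtype("int64", 0))

# Apply the ufunc to the whole frame, as in the original report.
result = np.exp(sparse_df)

# Inspect whether column 'a' kept a sparse dtype (version dependent).
print(result["a"].dtype)
```

If the printed dtype is a plain float64, the sparse dtype was lost in the same way as for df2 above.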


edit from Tom

Implementing DataFrame.__array_ufunc__ is probably the best way to do this.

The semantics will be similar to Series.__array_ufunc__, but applied blockwise.

  1. Series and DataFrame objects in the inputs will first be aligned.
  2. All arrays will be unboxed from blocks.
  3. The ufunc will be applied to each array. If the array defines __array_ufunc__, it'll be called.
  4. The results will be re-boxed in a DataFrame with the original labels.

There are some additional complications with dimensionality, shapes, broadcasting... But the basic idea of using __array_ufunc__ blockwise so that the underlying array's __array_ufunc__ is called makes sense.
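
The four steps above can be sketched roughly as follows. This is only an illustration, not the implementation that landed in pandas: `apply_ufunc_columnwise` is a hypothetical helper, it works per column rather than per internal block, it assumes unique column labels and DataFrame-only inputs, and it ignores the dimensionality and broadcasting complications mentioned above.

```python
import numpy as np
import pandas as pd


def apply_ufunc_columnwise(ufunc, left, right=None):
    """Hypothetical helper: apply a ufunc to one or two DataFrames, column by column."""
    # Step 1: align the inputs on both axes when a second frame is given.
    if right is not None:
        left, right = left.align(right)

    results = {}
    for label in left.columns:  # assumes unique column labels
        # Step 2: unbox the underlying array (NumPy-backed or ExtensionArray).
        arrays = [left[label].array]
        if right is not None:
            arrays.append(right[label].array)
        # Step 3: calling the ufunc lets NumPy dispatch to the array's own
        # __array_ufunc__ when it defines one (SparseArray does).
        results[label] = ufunc(*arrays)

    # Step 4: re-box the per-column results with the original labels.
    return pd.DataFrame(results, index=left.index)


# Usage: a frame with one sparse column and one dense column.
frame = pd.DataFrame(
    {
        "a": pd.arrays.SparseArray([0, 1], fill_value=0),
        "b": [1.0, 2.0],
    }
)
out = apply_ufunc_columnwise(np.exp, frame)
print(out.dtypes)
```

Because each column's array handles the ufunc itself, a sparse column can stay sparse without being converted to a dense ndarray first.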

@jorisvandenbossche jorisvandenbossche added the Sparse Sparse Data Type label Nov 16, 2018
@JustinZhengBC
Contributor

I don't think the ufunc is what causes pd.DataFrame to lose its sparse type; the type is lost as soon as the copy is made with the pd.DataFrame constructor, because the two are different classes (similar to #23744).

>>> df = pd.SparseDataFrame(np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]]), columns=list('abc'), default_fill_value=0)
>>> df2 = pd.DataFrame(df)
>>> type(df)
<class 'pandas.core.sparse.frame.SparseDataFrame'>
>>> type(df2)
<class 'pandas.core.frame.DataFrame'>

@jorisvandenbossche
Member Author

See #23744 (comment) for an answer.

@jreback jreback modified the milestones: 0.24.0, Contributions Welcome Nov 18, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 May 28, 2019
jreback added a commit to jreback/pandas that referenced this issue May 29, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
jreback added a commit to jreback/pandas that referenced this issue May 30, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
jreback added a commit to jreback/pandas that referenced this issue Jun 2, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
jreback added a commit to jreback/pandas that referenced this issue Jun 8, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
jreback added a commit to jreback/pandas that referenced this issue Jun 9, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
jreback added a commit to jreback/pandas that referenced this issue Jun 21, 2019
preserve dtypes when applying a ufunc to a sparse dtype

closes pandas-dev#18502
closes pandas-dev#23743
@TomAugspurger TomAugspurger modified the milestones: 0.25.0, 0.25.1 Jul 2, 2019
@TomAugspurger TomAugspurger removed this from the 0.25.1 milestone Aug 20, 2019
@TomAugspurger TomAugspurger changed the title Applying ufuncs on DataFrame with Sparse data looses sparse dtype Implement DataFrame.__array_ufunc__ Sep 17, 2019