Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame.__setitem__ is broken for SparseArray #22367

Closed
TomAugspurger opened this issue Aug 15, 2018 · 2 comments
Closed

DataFrame.__setitem__ is broken for SparseArray #22367

TomAugspurger opened this issue Aug 15, 2018 · 2 comments
Labels
Sparse Sparse Data Type

Comments

@TomAugspurger
Copy link
Contributor

It doesn't sparsify, and the values are incorrect.

I'll fix this in #22325

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [1., None]})

In [3]: df['A'] = pd.SparseSeries(df.A)

In [4]: df.A.values
Out[4]: array([1., 1.])

In [5]: df = pd.DataFrame({"A": [True, False]})

In [6]: df['A'] = pd.SparseSeries(df.A)

In [7]: df.A.values
Out[7]: array([ True,  True])
@TomAugspurger TomAugspurger added the Sparse Sparse Data Type label Aug 15, 2018
@TomAugspurger
Copy link
Contributor Author

It's actually already fixed on #22325, aside from bool.

@TomAugspurger
Copy link
Contributor Author

Essentially came down to BoolBlock.should_store. For "container" EA types like SparseDtype, .type will be the type of the underlying value (bool in this case). This made BoolBlock think it could hold SparseArray[bool], which it should actually return a new ExtensionBlock.

So a new general rule, non-ExtensionBlock blocks shouldn't think they can store ExtensionArrays.

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index f320258e3..4f58a576f 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -2298,7 +2298,8 @@ class TimeDeltaBlock(DatetimeLikeBlockMixin, IntBlock):
         return result
 
     def should_store(self, value):
-        return issubclass(value.dtype.type, np.timedelta64)
+        return (issubclass(value.dtype.type, np.timedelta64) and
+                not is_extension_array_dtype(value))
 
     def to_native_types(self, slicer=None, na_rep=None, quoting=None,
                         **kwargs):
@@ -2337,7 +2338,8 @@ class BoolBlock(NumericBlock):
         return isinstance(element, (bool, np.bool_))
 
     def should_store(self, value):
-        return issubclass(value.dtype.type, np.bool_)
+        return (issubclass(value.dtype.type, np.bool_) and not
+                is_extension_array_dtype(value))
 
     def replace(self, to_replace, value, inplace=False, filter=None,
                 regex=False, convert=True, mgr=None):
@@ -2879,7 +2881,8 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
 
     def should_store(self, value):
         return (issubclass(value.dtype.type, np.datetime64) and
-                not is_datetimetz(value))
+                not is_datetimetz(value) and
+                not is_extension_array_dtype(value))
 
     def set(self, locs, values, check=False):
         """

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 16, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this issue Nov 19, 2018
Makes SparseArray an ExtensionArray.

* Fixed DataFrame.__setitem__ for updating to sparse.

Closes pandas-dev#22367

* Fixed Series[sparse].to_sparse

Closes pandas-dev#22389

Closes pandas-dev#21978
Closes pandas-dev#19506
Closes pandas-dev#22835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Sparse Sparse Data Type
Projects
None yet
Development

No branches or pull requests

1 participant