Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataFrame.__setitem__ is broken for SparseArray #22367

Closed
TomAugspurger opened this issue Aug 15, 2018 · 2 comments

Comments

Projects
None yet
1 participant
@TomAugspurger
Copy link
Contributor

commented Aug 15, 2018

It doesn't sparsify, and the values are incorrect.

I'll fix this in #22325

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [1., None]})

In [3]: df['A'] = pd.SparseSeries(df.A)

In [4]: df.A.values
Out[4]: array([1., 1.])

In [5]: df = pd.DataFrame({"A": [True, False]})

In [6]: df['A'] = pd.SparseSeries(df.A)

In [7]: df.A.values
Out[7]: array([ True,  True])
@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2018

It's actually already fixed on #22325, aside from bool.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2018

Essentially came down to BoolBlock.should_store. For "container" EA types like SparseDtype, .type will be the type of the underlying value (bool in this case). This made BoolBlock think it could hold SparseArray[bool], which it should actually return a new ExtensionBlock.

So a new general rule, non-ExtensionBlock blocks shouldn't think they can store ExtensionArrays.

diff --git a/pandas/core/internals/blocks.py b/pandas/core/internals/blocks.py
index f320258e3..4f58a576f 100644
--- a/pandas/core/internals/blocks.py
+++ b/pandas/core/internals/blocks.py
@@ -2298,7 +2298,8 @@ class TimeDeltaBlock(DatetimeLikeBlockMixin, IntBlock):
         return result
 
     def should_store(self, value):
-        return issubclass(value.dtype.type, np.timedelta64)
+        return (issubclass(value.dtype.type, np.timedelta64) and
+                not is_extension_array_dtype(value))
 
     def to_native_types(self, slicer=None, na_rep=None, quoting=None,
                         **kwargs):
@@ -2337,7 +2338,8 @@ class BoolBlock(NumericBlock):
         return isinstance(element, (bool, np.bool_))
 
     def should_store(self, value):
-        return issubclass(value.dtype.type, np.bool_)
+        return (issubclass(value.dtype.type, np.bool_) and not
+                is_extension_array_dtype(value))
 
     def replace(self, to_replace, value, inplace=False, filter=None,
                 regex=False, convert=True, mgr=None):
@@ -2879,7 +2881,8 @@ class DatetimeBlock(DatetimeLikeBlockMixin, Block):
 
     def should_store(self, value):
         return (issubclass(value.dtype.type, np.datetime64) and
-                not is_datetimetz(value))
+                not is_datetimetz(value) and
+                not is_extension_array_dtype(value))
 
     def set(self, locs, values, check=False):
         """

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Aug 16, 2018

brute4s99 added a commit to brute4s99/pandas that referenced this issue Nov 19, 2018

[API/REF]: SparseArray is an ExtensionArray (pandas-dev#22325)
Makes SparseArray an ExtensionArray.

* Fixed DataFrame.__setitem__ for updating to sparse.

Closes pandas-dev#22367

* Fixed Series[sparse].to_sparse

Closes pandas-dev#22389

Closes pandas-dev#21978
Closes pandas-dev#19506
Closes pandas-dev#22835
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.