BUG: Series construction with EA dtype and index but no data fails #26469

Closed
jorisvandenbossche opened this issue May 20, 2019 · 5 comments · Fixed by #33846
Labels
Bug Constructors Series/DataFrame/Index/pd.array Constructors ExtensionArray Extending pandas with custom dtypes or arrays.

@jorisvandenbossche
Member

Using master (0.25.0dev):

In [8]: pd.Series(None, index=[1, 2, 3], dtype='Int64')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/scipy/pandas/pandas/core/internals/construction.py in _try_cast(arr, take_fast_path, dtype, copy, raise_cast_failure)
    691         if is_integer_dtype(dtype):
--> 692             subarr = maybe_cast_to_integer_array(arr, dtype)
    693 

~/scipy/pandas/pandas/core/dtypes/cast.py in maybe_cast_to_integer_array(arr, dtype, copy)
   1311         if not hasattr(arr, "astype"):
-> 1312             casted = np.array(arr, dtype=dtype, copy=copy)
   1313         else:

TypeError: data type not understood

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-8-9447295feee6> in <module>
----> 1 pd.Series(None, index=[1, 2, 3], dtype='Int64')

~/scipy/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    202                 data = data._data
    203             elif isinstance(data, dict):
--> 204                 data, index = self._init_dict(data, index, dtype)
    205                 dtype = None
    206                 copy = False

~/scipy/pandas/pandas/core/series.py in _init_dict(self, data, index, dtype)
    295 
    296         # Input is now list-like, so rely on "standard" construction:
--> 297         s = Series(values, index=keys, dtype=dtype)
    298 
    299         # Now we just make sure the order is respected, if any

~/scipy/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    253             else:
    254                 data = sanitize_array(data, index, dtype, copy,
--> 255                                       raise_cast_failure=True)
    256 
    257                 data = SingleBlockManager(data, index, fastpath=True)

~/scipy/pandas/pandas/core/internals/construction.py in sanitize_array(data, index, dtype, copy, raise_cast_failure)
    620         subarr = _try_cast(arr, False, dtype, copy, raise_cast_failure)
    621     else:
--> 622         subarr = _try_cast(data, False, dtype, copy, raise_cast_failure)
    623 
    624     # scalar like, GH

~/scipy/pandas/pandas/core/internals/construction.py in _try_cast(arr, take_fast_path, dtype, copy, raise_cast_failure)
    711             # create an extension array from its dtype
    712             array_type = dtype.construct_array_type()._from_sequence
--> 713             subarr = array_type(arr, dtype=dtype, copy=copy)
    714         elif dtype is not None and raise_cast_failure:
    715             raise

~/scipy/pandas/pandas/core/arrays/integer.py in _from_sequence(cls, scalars, dtype, copy)
    305     @classmethod
    306     def _from_sequence(cls, scalars, dtype=None, copy=False):
--> 307         return integer_array(scalars, dtype=dtype, copy=copy)
    308 
    309     @classmethod

~/scipy/pandas/pandas/core/arrays/integer.py in integer_array(values, dtype, copy)
    110     TypeError if incompatible types
    111     """
--> 112     values, mask = coerce_to_array(values, dtype=dtype, copy=copy)
    113     return IntegerArray(values, mask)
    114 

~/scipy/pandas/pandas/core/arrays/integer.py in coerce_to_array(values, dtype, mask, copy)
    202 
    203     if not values.ndim == 1:
--> 204         raise TypeError("values must be a 1D list-like")
    205     if not mask.ndim == 1:
    206         raise TypeError("mask must be a 1D list-like")

TypeError: values must be a 1D list-like

The same construction works fine for non-EA dtypes.

The problem is that the None / np.nan scalar is eventually passed to the _from_sequence method, which expects a list-like. I don't think we should make EA authors change their _from_sequence to handle this; rather, we should make sure we never pass a scalar to it.
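As a rough illustration of "never pass a scalar to _from_sequence", the sketch below expands the missing value to a 1D list-like of the index's length before calling it. The helper name build_missing_ea is hypothetical and not part of pandas, and this is not necessarily how the issue was eventually fixed; it assumes a pandas version where the dtype exposes na_value.

```python
import pandas as pd


def build_missing_ea(index, dtype):
    """Hypothetical helper: build an all-missing extension array for `index`."""
    # Resolve a string such as "Int64" to the extension dtype instance.
    dtype = pd.api.types.pandas_dtype(dtype)
    array_cls = dtype.construct_array_type()
    # Hand _from_sequence a 1D list-like of missing values, never a bare scalar.
    values = [dtype.na_value] * len(index)
    return array_cls._from_sequence(values, dtype=dtype)


arr = build_missing_ea([1, 2, 3], "Int64")
ser = pd.Series(arr, index=[1, 2, 3])
```

Because the list is built from len(index), the same helper also covers an empty index, where it passes an empty list.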

@jorisvandenbossche jorisvandenbossche added Bug ExtensionArray Extending pandas with custom dtypes or arrays. labels May 20, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.25.0 milestone May 20, 2019
@jreback jreback modified the milestones: 0.25.0, Contributions Welcome Jun 28, 2019
@vanschelven

This problem also manifests when calling apply on a Series with an extension-array dtype, e.g.:

import pandas as pd
pd.Series([], dtype="Int64").apply(lambda f: f)

The reason is that the apply code contains a shortcut for the empty series case, which constructs a new series in exactly the problematic way described in this issue: with an EA datatype and an index, but no data:

pandas/pandas/core/series.py

Lines 3947 to 3950 in 2ae5152

if len(self) == 0:
    return self._constructor(dtype=self.dtype, index=self.index).__finalize__(
        self
    )

I'm on pandas 0.24.2
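One way to sidestep the empty-series shortcut is to pass an explicit empty list as data, so the constructor never has to invent a scalar placeholder. The sketch below is only an illustration of that idea; empty_like is a hypothetical helper, not pandas API, and it is not claimed to be the change that was eventually merged.

```python
import pandas as pd


def empty_like(series):
    """Hypothetical helper: empty Series with the same dtype and index as `series`."""
    # Passing [] explicitly keeps the data list-like all the way down.
    result = pd.Series([], index=series.index, dtype=series.dtype)
    return result.__finalize__(series)


source = pd.Series([], dtype="Int64")
out = empty_like(source)
```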

@TomAugspurger
Contributor

@vanschelven interested in working on this? I think the fix is to call ExtensionArray._from_sequence([]) rather than ExtensionArray._from_sequence(None).
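For reference, a minimal sketch of what the list-like call looks like for the nullable integer dtype, assuming a pandas version that provides pd.Int64Dtype. Note that _from_sequence is a private extension-array hook, so calling it directly here is for illustration only.

```python
import pandas as pd

dtype = pd.Int64Dtype()
array_cls = dtype.construct_array_type()

# An empty list is a valid 1D list-like, unlike None or np.nan.
empty = array_cls._from_sequence([], dtype=dtype)
```

An empty list only matches an empty index; for a non-empty index the data would need one missing value per label.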

@vanschelven

vanschelven commented Jul 19, 2019

I'm willing to do some work on this, but setting up a full development environment for pandas is a bit much for me now. I've made a "shoot from the hip" (completely untested) PR #27475 with what I think could be the solution though.

@vanschelven

The problem with your suggestion is that at the moment of failure arr is already not an array (it's a scalar, in particular np.nan). That is, there is no line ExtensionArray._from_sequence(None) to replace. In the PR referenced above I've gone to what I understand to be the point where np.nan gets introduced and removed that behavior.

@vanschelven

As evidenced by the PR's Travis logs, that's not a solution, so I'm letting this rest for now.

@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@jreback jreback modified the milestones: Contributions Welcome, 1.1 Apr 30, 2020
5 participants