REGR: assigning column with name that is registered as extension dtype errors #38386

Closed
jorisvandenbossche opened this issue Dec 9, 2020 · 2 comments · Fixed by #38427
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version

@jorisvandenbossche
Member

In [34]: df = pd.DataFrame([0])

In [35]: df["Int64"] = [1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/scipy/pandas/pandas/core/generic.py in _set_item(self, key, value)
   3822         try:
-> 3823             loc = self._info_axis.get_loc(key)
   3824         except KeyError:

~/scipy/pandas/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    353                     raise KeyError(key) from err
--> 354             raise KeyError(key)
    355         return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 'Int64'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-35-26205ed43ee5> in <module>
----> 1 df["Int64"] = [1]

~/scipy/pandas/pandas/core/frame.py in __setitem__(self, key, value)
   3161         else:
   3162             # set column
-> 3163             self._set_item(key, value)
   3164 
   3165     def _setitem_slice(self, key: slice, value):

~/scipy/pandas/pandas/core/frame.py in _set_item(self, key, value)
   3238         self._ensure_valid_index(value)
   3239         value = self._sanitize_column(key, value)
-> 3240         NDFrame._set_item(self, key, value)
   3241 
   3242         # check if we are modifying a copy

~/scipy/pandas/pandas/core/generic.py in _set_item(self, key, value)
   3824         except KeyError:
   3825             # This item wasn't present, just insert at end
-> 3826             self._mgr.insert(len(self._info_axis), key, value)
   3827             return
   3828 

~/scipy/pandas/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
   1195 
   1196         # insert to the axis; this could possibly raise a TypeError
-> 1197         new_axis = self.items.insert(loc, item)
   1198 
   1199         if value.ndim == self.ndim - 1 and not is_extension_array_dtype(value.dtype):

~/scipy/pandas/pandas/core/indexes/numeric.py in insert(self, loc, item)
    172     def insert(self, loc: int, item):
    173         try:
--> 174             item = self._validate_fill_value(item)
    175         except TypeError:
    176             return self.astype(object).insert(loc, item)

~/scipy/pandas/pandas/core/indexes/numeric.py in _validate_fill_value(self, value)
    117         Convert value to be insertable to ndarray.
    118         """
--> 119         if is_bool(value) or is_bool_dtype(value):
    120             # force conversion to object
    121             # so we don't lose the bools

~/scipy/pandas/pandas/core/dtypes/common.py in is_bool_dtype(arr_or_dtype)
   1398         return arr_or_dtype.is_object and arr_or_dtype.inferred_type == "boolean"
   1399     elif is_extension_array_dtype(arr_or_dtype):
-> 1400         return getattr(arr_or_dtype, "dtype", arr_or_dtype)._is_boolean
   1401 
   1402     return issubclass(dtype.type, np.bool_)

AttributeError: 'str' object has no attribute '_is_boolean'

This started failing recently. I noticed it in the geopandas CI, which tests against pandas master.

@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Dec 9, 2020
@jorisvandenbossche
Member Author

I suppose this is caused by #38030 (@jbrockmendel)

That PR introduced:

try:
    item = self._validate_fill_value(item)
except TypeError:
    return self.astype(object).insert(loc, item)

but the problem here is that the validation function raises an AttributeError, not a TypeError, so the fallback to object dtype is never reached. This, in turn, seems to be caused by a bug in is_bool_dtype:

In [36]: pd.api.types.is_bool_dtype("Int64")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-36-e7e8bfa4bbdd> in <module>
----> 1 pd.api.types.is_bool_dtype("Int64")

~/scipy/pandas/pandas/core/dtypes/common.py in is_bool_dtype(arr_or_dtype)
   1398         return arr_or_dtype.is_object and arr_or_dtype.inferred_type == "boolean"
   1399     elif is_extension_array_dtype(arr_or_dtype):
-> 1400         return getattr(arr_or_dtype, "dtype", arr_or_dtype)._is_boolean
   1401 
   1402     return issubclass(dtype.type, np.bool_)

AttributeError: 'str' object has no attribute '_is_boolean'
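
To make the failure mode concrete, here is a minimal, pandas-free sketch of the pattern described above. Every name in it (FakeExtensionDtype, REGISTERED_NAMES, is_extension_like, is_bool_dtype_buggy, validate_fill_value, insert_label) is a hypothetical stand-in, not pandas code. It only illustrates that an AttributeError raised inside the validator is not caught by an `except TypeError` clause.

```python
class FakeExtensionDtype:
    """Hypothetical stand-in for an extension dtype object."""
    _is_boolean = False


# Hypothetical stand-in for the set of registered dtype string aliases.
REGISTERED_NAMES = {"Int64"}


def is_extension_like(obj):
    # Accepts dtype objects AND strings that name a registered dtype,
    # mirroring how the string "Int64" takes the extension branch above.
    return isinstance(obj, FakeExtensionDtype) or (
        isinstance(obj, str) and obj in REGISTERED_NAMES
    )


def is_bool_dtype_buggy(obj):
    if is_extension_like(obj):
        # For the string "Int64" there is no .dtype attribute, so getattr
        # returns the string itself, and str has no _is_boolean attribute.
        return getattr(obj, "dtype", obj)._is_boolean
    return False


def validate_fill_value(value):
    # Signals "needs the object fallback" by raising TypeError.
    if isinstance(value, bool) or is_bool_dtype_buggy(value):
        raise TypeError("value requires object dtype")
    return value


def insert_label(labels, loc, item):
    try:
        item = validate_fill_value(item)
    except TypeError:
        # Fallback path: in this sketch the labels are already plain
        # Python objects, so there is nothing to convert.
        labels = list(labels)
    return labels[:loc] + [item] + labels[loc:]


# An ordinary label inserts fine, and a bool takes the TypeError fallback.
print(insert_label([0], 1, "a"))
print(insert_label([0], 1, True))

# A label that happens to be a registered dtype name escapes the handler.
try:
    insert_label([0], 1, "Int64")
except AttributeError as err:
    print("escaped the except TypeError clause:", err)
```

Widening the handler to also catch AttributeError would hide the symptom, but the string would still be misclassified by the dtype check, which is why the discussion points at is_bool_dtype itself.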

@jbrockmendel
Member

Agreed we should fix this in is_bool_dtype
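
One possible shape for such a fix, as a sketch only: resolve a string to a dtype object before reading `_is_boolean`, and treat anything that cannot be resolved as non-boolean. This is not the change made in #38427, whose contents are not shown in this thread. The classes, REGISTRY, resolve_dtype, and is_bool_dtype_sketch below are all hypothetical stand-ins and do not use pandas.

```python
class Int64Dtype:
    """Hypothetical stand-in for a registered integer extension dtype."""
    name = "Int64"
    _is_boolean = False


class BooleanDtype:
    """Hypothetical stand-in for a registered boolean extension dtype."""
    name = "boolean"
    _is_boolean = True


# Hypothetical registry mapping string aliases to dtype classes.
REGISTRY = {cls.name: cls for cls in (Int64Dtype, BooleanDtype)}


def resolve_dtype(arr_or_dtype):
    """Return a dtype object, or None if the input names no known dtype."""
    if isinstance(arr_or_dtype, str):
        cls = REGISTRY.get(arr_or_dtype)
        return cls() if cls is not None else None
    # Array-like objects expose .dtype; dtype objects are returned as-is.
    return getattr(arr_or_dtype, "dtype", arr_or_dtype)


def is_bool_dtype_sketch(arr_or_dtype):
    dtype = resolve_dtype(arr_or_dtype)
    # Only read _is_boolean from a resolved dtype, never from a raw string.
    return bool(getattr(dtype, "_is_boolean", False))


print(is_bool_dtype_sketch("Int64"))
print(is_bool_dtype_sketch("boolean"))
print(is_bool_dtype_sketch("some column name"))
```

With a check along these lines, a column label such as "Int64" is classified as non-boolean instead of raising, so the surrounding insert logic can proceed normally.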

@rhshadrach rhshadrach added the Indexing Related to indexing on series/frames, not to indexes themselves label Dec 10, 2020
@rhshadrach rhshadrach self-assigned this Dec 12, 2020
@rhshadrach rhshadrach added Regression Functionality that used to work in a prior pandas version and removed Bug labels Dec 12, 2020