DataFrame.loc/iloc fails to update object with new dtype #24269

TomAugspurger · 2018-12-13T14:13:15Z

Similar to #4312 and #5702, but this seems specific to object dtype -> new dtype.

In this first case, we likely convert the whole block, even though we just wanted A.

In [20]: df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)

In [21]: df.dtypes
Out[21]:
A    object
B    object
dtype: object

In [22]: df.loc[:, ['A']] = df.loc[:, ['A']].astype(int)

In [23]: df.dtypes
Out[23]:
A    int64
B    int64
dtype: object

In this one (maybe a different bug), we fail to convert ['a', 'b'] to float when they start out in an object block with other values.

In [13]: df = pd.DataFrame([[np.nan, np.nan, 1, pd.Timestamp('2000')]], columns=['a', 'b', 'c', 'd'], dtype=object)

In [14]: df.loc[:, ['a', 'b']] = df.loc[:, ['a', 'b']].astype(float)

In [15]: df.dtypes
Out[15]:
a    object
b    object
c    object
d    object
dtype: object
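
For anyone who only needs the end result rather than in-place `.loc` assignment, one option is to pass a column-to-dtype mapping to `DataFrame.astype`. This is a sketch of a workaround under that assumption, not a fix for the behaviour reported above: it returns a new frame instead of updating `df`, and the names `converted` and `converted2` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)

# astype with a {column: dtype} mapping converts only the listed columns
# and returns a new DataFrame; B is left as object.
converted = df.astype({"A": "int64"})
print(converted.dtypes)

# Same idea for the second example: only a and b are converted to float.
df2 = pd.DataFrame(
    [[np.nan, np.nan, 1, pd.Timestamp("2000")]],
    columns=["a", "b", "c", "d"],
    dtype=object,
)
converted2 = df2.astype({"a": "float64", "b": "float64"})
print(converted2.dtypes)
```

The original frames `df` and `df2` are not modified by these calls.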


slnguyen · 2018-12-20T16:37:34Z

Something that might be useful to note is that when indexing with a single label we get the expected output. For example:

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)
df.loc[:, 'A'] = (df.loc[:, 'A']).astype(int)

df.dtypes returns the expected:

A     int64
B    object

However, as you mentioned before:

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)
df.loc[:, ['A']] = (df.loc[:, ['A']]).astype(int)

df.dtypes returns

A    int64
B    int64
dtype: object

So the issue might have something to do with casting and indexing the dataframe with a list of labels versus single label?

slnguyen · 2018-12-21T16:19:12Z

I'm not sure if this fix is the right way to resolve the problem, but I've added another control statement so that updating dataframe values corresponding to a list of labels is handled the same way as updating dataframe values corresponding to a single label. The performance test asv_bench/benchmarks/indexing.py reports no significant change in the benchmarks.

pandas/pandas/core/indexing.py

Lines 618 to 652 in 04a0eac

        if isinstance(indexer, tuple):
            indexer = maybe_convert_ix(*indexer)

            # if we are setting on the info axis ONLY
            # set using those methods to avoid block-splitting
            # logic here
            if (len(indexer) > info_axis and
                    is_integer(indexer[info_axis]) and
                    all(com.is_null_slice(idx)
                        for i, idx in enumerate(indexer)
                        if i != info_axis) and
                    item_labels.is_unique):
                self.obj[item_labels[indexer[info_axis]]] = value
                return

        if isinstance(value, (ABCSeries, dict)):
            # TODO(EA): ExtensionBlock.setitem this causes issues with
            # setting for extensionarrays that store dicts. Need to decide
            # if it's worth supporting that.
            value = self._align_series(indexer, Series(value))

        elif isinstance(value, ABCDataFrame):
            value = self._align_frame(indexer, value)

        if isinstance(value, ABCPanel):
            value = self._align_panel(indexer, value)

        # check for chained assignment
        self.obj._check_is_chained_assignment_possible()

        # actually do the set
        self.obj._consolidate_inplace()
        self.obj._data = self.obj._data.setitem(indexer=indexer,
                                                value=value)
        self.obj._maybe_update_cacher(clear=True)

Changes were made based on my previous comment. When indexing via a single label, the code goes through lines 624-631 (see the code snippet above). When indexing via a list of labels, the code goes through lines 649-652.
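
As a rough user-level illustration of the idea (not the actual patch), the single-label path boils down to `self.obj[label] = value`, which replaces the column rather than writing into the existing object block. A minimal sketch of doing the same for a list of labels, assuming a hypothetical helper `assign_columns` that is not part of pandas:

```python
import pandas as pd


def assign_columns(df, labels, value):
    """Hypothetical helper (not a pandas API): assign each labelled column
    of `value` into `df` one at a time via df[label] = ..., which replaces
    the column and its dtype instead of setting into the shared block."""
    for label in labels:
        df[label] = value[label]
    return df


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)
assign_columns(df, ["A"], df.loc[:, ["A"]].astype("int64"))
print(df.dtypes)
```

With this per-column assignment only A picks up the new dtype and B stays object, which is the result the single-label example above produces.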

simonjayhawkins · 2020-03-30T18:25:58Z

In this one, (maybe a different bug), we fail to convert ['a', 'b'] to float when they start out in an object block with other values.

On master, the first case now fails to convert as well.

>>> import numpy as np
>>>
>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1029.gbdf969cd6'
>>>
>>> df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, dtype=object)
>>>
>>> print(df.dtypes)
A    object
B    object
dtype: object
>>> # Out[21]:
>>> # A    object
>>> # B    object
>>> # dtype: object
>>>
>>> df.loc[:, ["A"]] = df.loc[:, ["A"]].astype(int)
>>>
>>> print(df.dtypes)
A    object
B    object
dtype: object
>>> # Out[23]:
>>> # A    int64
>>> # B    int64
>>> # dtype: object
>>>
>>> df = pd.DataFrame(
...     [[np.nan, np.nan, 1, pd.Timestamp("2000")]],
...     columns=["a", "b", "c", "d"],
...     dtype=object,
... )
>>>
>>> df.loc[:, ["a", "b"]] = df.loc[:, ["a", "b"]].astype(float)
>>>
>>> df.dtypes
a    object
b    object
c    object
d    object
dtype: object
>>> # Out[15]:
>>> # a    object
>>> # b    object
>>> # c    object
>>> # d    object
>>> # dtype: object
>>>

TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Difficulty Intermediate labels Dec 13, 2018

TomAugspurger added this to the Contributions Welcome milestone Dec 13, 2018

slnguyen mentioned this issue Dec 21, 2018

BUG: DataFrame.loc/iloc fails to update object with new dtype GH24269 #24383

Closed

4 tasks

gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Bug labels Dec 23, 2018

jbrockmendel removed Effort Medium labels Oct 21, 2019

This was referenced Jan 6, 2022

BUG/API: DataFrame.iloc[:, foo] = bar inplaceness? #44353

Closed

BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype #20635

Open

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
