loc row assignment with NaN and NaT coerces to either NaN or NaT #12499

jonathanstrong · 2016-03-01T03:46:44Z

import datetime
import numpy as np
import pandas as pd
import pytz
data = [{'one': 0, 'two': datetime.datetime(2016, 3, 1, 3, 13, 22, 98986, tzinfo=pytz.timezone('UTC'))}]
df = pd.DataFrame(data)
# enlarging the frame with a new row of missing values raises the TypeError below
df.loc[1] = [np.nan, np.datetime64('NaT')]

traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-7cc4bcff4319> in <module>()
      4 data = [{'one': 0, 'two': datetime.datetime(2016, 3, 1, 3, 13, 22, 98986, tzinfo=pytz.timezone('UTC'))}]
      5 df = pd.DataFrame(data)
----> 6 df.loc[1] = [np.nan, np.datetime64('NaT')]

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/indexing.pyc in __setitem__(self, key, value)
    115     def __setitem__(self, key, value):
    116         indexer = self._get_setitem_indexer(key)
--> 117         self._setitem_with_indexer(indexer, value)
    118 
    119     def _has_valid_type(self, k, axis):

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    337                         value = Series(value,index=self.obj.columns,name=indexer)
    338 
--> 339                     self.obj._data = self.obj.append(value)._data
    340                     self.obj._maybe_update_cacher(clear=True)
    341                     return self.obj

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity)
   4229             to_concat = [self, other]
   4230         return concat(to_concat, ignore_index=ignore_index,
-> 4231                       verify_integrity=verify_integrity)
   4232 
   4233     def join(self, other, on=None, how='left', lsuffix='', rsuffix='',

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    811                        verify_integrity=verify_integrity,
    812                        copy=copy)
--> 813     return op.get_result()
    814 
    815 

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/tools/merge.pyc in get_result(self)
    993 
    994             new_data = concatenate_block_managers(
--> 995                 mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)
    996             if not self.copy:
    997                 new_data._consolidate_inplace()

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/internals.pyc in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   4454                                                 copy=copy),
   4455                          placement=placement)
-> 4456               for placement, join_units in concat_plan]
   4457 
   4458     return BlockManager(blocks, axes)

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/internals.pyc in concatenate_join_units(join_units, concat_axis, copy)
   4551     to_concat = [ju.get_reindexed_values(empty_dtype=empty_dtype,
   4552                                          upcasted_na=upcasted_na)
-> 4553                  for ju in join_units]
   4554 
   4555     if len(to_concat) == 1:

/home/jstrong/src/envs/vc4/local/lib/python2.7/site-packages/pandas/core/internals.pyc in get_reindexed_values(self, empty_dtype, upcasted_na)
   4799 
   4800             if self.is_null and not getattr(self.block,'is_categorical',None):
-> 4801                 missing_arr = np.empty(self.shape, dtype=empty_dtype)
   4802                 if np.prod(self.shape):
   4803                     # NumPy 1.6 workaround: this statement gets strange if all

TypeError: data type not understood

The same error occurs with df.loc[1] = [np.nan, np.nan]. The problem seems to be loc assignment involving NaN for a datetime column. I'm not sure of the full breadth of the bug.

Expected Output

   one                              two
0  0.0 2016-03-01 03:13:22.098986+00:00
1  NaN                              NaT

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-79-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.0.2
setuptools: 0.6
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: None
patsy: 0.4.1
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.5.0+1301.g7b517da
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.4.1
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
Jinja2: None


jonathanstrong · 2016-03-01T03:50:56Z

fyi - quickie workaround:

df['a_datetime_col'] = df['a_datetime_col'].astype('object')
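For anyone trying this against the frame from the report, here is a minimal sketch of the workaround. It assumes a recent pandas and builds the timestamp with `pd.Timestamp(..., tz="UTC")` instead of pytz; the variable names `ts` and `dtype_before` are only for illustration. The resulting dtypes may differ between pandas versions, so the sketch prints them rather than asserting them.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report with one tz-aware datetime column.
ts = pd.Timestamp("2016-03-01 03:13:22.098986", tz="UTC")
df = pd.DataFrame([{"one": 0, "two": ts}])

# Workaround: give up the datetime dtype before enlarging with .loc.
df["two"] = df["two"].astype("object")
dtype_before = df["two"].dtype

# Same row assignment as in the report.
df.loc[1] = [np.nan, np.datetime64("NaT")]

print(df)
print(df.dtypes)
```

The cost is that `two` no longer has a datetime dtype, so datetime-specific operations on that column are lost until it is converted back.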

jreback · 2016-03-01T12:13:23Z

xref #11365 and #3746

note that your workaround completely defeats the purpose of dtypes.

mroeschke · 2018-06-22T15:25:18Z

Update:

This almost looks fixed

In [10]: df.loc[1] = [np.nan, np.datetime64('NaT')]

In [11]: df
Out[11]:
   one                               two
0    0  2016-03-01 03:13:22.098986+00:00
1  NaT                               NaT

However the element in df.loc[1, 'one'] should be np.nan.

This looks incorrect as well:

In [12]: df.loc[1, :] = [np.nan, np.datetime64('NaT')]

In [13]: df
Out[13]:
   one                               two
0    0  2016-03-01 03:13:22.098986+00:00
1  NaN                               NaN

jbrockmendel · 2022-01-01T00:18:23Z

@mroeschke making sure we're on the same page before i write (and xfail) a test: in [11] we expect df["one"] to be float64 and have nan instead of NaT, right?

mroeschke · 2022-01-01T00:20:31Z

> @mroeschke making sure we're on the same page before i write (and xfail) a test: in [11] we expect df["one"] to be float64 and have nan instead of NaT, right?

Correct
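To make the agreed expectation concrete, here is a rough sketch of what such a check could look like. It is not the test that was actually written for pandas; the helper name `loc_row_assignment_matches_expected` is hypothetical, and the timestamp is built with `pd.Timestamp(..., tz="UTC")` rather than pytz. The helper returns a bool instead of asserting, since the result depends on whether the installed pandas still coerces the row.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2016-03-01 03:13:22.098986", tz="UTC")

# The frame the thread agrees on: float64 "one" holding NaN,
# tz-aware "two" holding NaT.
expected = pd.DataFrame({"one": [0.0, np.nan], "two": [ts, pd.NaT]})


def loc_row_assignment_matches_expected():
    """Hypothetical helper: True if .loc row assignment gives `expected`."""
    df = pd.DataFrame([{"one": 0, "two": ts}])
    try:
        df.loc[1] = [np.nan, np.datetime64("NaT")]
        pd.testing.assert_frame_equal(df, expected)
    except (AssertionError, TypeError, ValueError):
        # Either the assignment failed or the dtypes/values were coerced.
        return False
    return True


print(expected.dtypes)
print(loc_row_assignment_matches_expected())
```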

jreback added the Bug, Indexing, Timezones, and Difficulty Intermediate labels Mar 1, 2016

jreback added this to the 0.18.1 milestone Mar 1, 2016

jreback modified the milestones: 0.18.1, 0.18.2 Apr 25, 2016

jorisvandenbossche modified the milestones: Next Major Release, 0.19.0 Aug 21, 2016

jorisvandenbossche added the Regression label Aug 21, 2016

jorisvandenbossche modified the milestones: 0.19.0, Next Major Release Aug 21, 2016

jreback modified the milestones: 0.20.0, 0.19.0 Sep 8, 2016

tambu-j mentioned this issue Sep 8, 2016

setitem type coercion fails when setting a datetime column in a single row dataframe #14179

Closed

joseortiz3 mentioned this issue Feb 27, 2017

.loc assignment of pd.Timestamp to Series of dtype object results in cast to Long #15526

Closed

jreback modified the milestones: 0.20.0, Next Major Release Mar 23, 2017

jreback mentioned this issue Oct 30, 2017

BUG: setitem with at not inferring dtype correctly #6942

Closed

mroeschke changed the title ~~loc assignment with NaN/NaT causes crash~~ loc row assignment with NaN and NaT coerces to either NaN or NaT Jul 26, 2018

mroeschke mentioned this issue Oct 8, 2018

BUG-22796 Concat multicolumn tz-aware DataFrame #23036

Merged

4 tasks

mroeschke mentioned this issue Feb 26, 2019

BUG: repr of np.datetime64('NaT') in Series/DataFrame with dtype object #25445

Merged

3 tasks

jbrockmendel removed the Difficulty Intermediate label Oct 21, 2019

mroeschke removed the Regression label Apr 23, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
