Assigning timedelta64 to new column casts to float instead of filling missing values with NaT #7592

ischwabacher · 2014-06-27T18:31:42Z

When assigning a timedelta64 array to a subset of a new column of a DataFrame, missing data is not filled with NaT as expected; rather, the new column is cast to float64 and NaN is used instead. This cast does not usually occur when all values are present, except when there are already float64 columns but no timedelta64 columns in the DataFrame and indexing is done through .ix or .loc.

It's possible these should be two separate issues.

There are a lot of issues involving NaT in the issue tracker; I'm not 100% sure that this isn't a duplicate. (Nor am I 100% sure this isn't intended behavior, but if it is I'd expect it to be documented more prominently.)

import numpy as np
import pandas as pd

one_hour = 60*60*10**9

temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
temp['A'] = np.array([1*one_hour]*4, dtype='m8[ns]')
temp.loc[:,'B'] = np.array([2*one_hour]*4, dtype='m8[ns]')
temp.loc[:3,'C'] = np.array([3*one_hour]*3, dtype='m8[ns]')
temp.ix[:,'D'] = np.array([4*one_hour]*4, dtype='m8[ns]')
temp.ix[:3,'E'] = np.array([5*one_hour]*3, dtype='m8[ns]')
temp['F'] = np.timedelta64('NaT')
temp.ix[:-1,'F'] = np.array([6*one_hour]*3, dtype='m8[ns]')

temp
#                   A        B             C        D             E        F
#2014-01-01 01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
#2014-01-02 01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
#2014-01-03 01:00:00 02:00:00  1.080000e+13 04:00:00  1.800000e+13 06:00:00
#2014-01-04 01:00:00 02:00:00           NaN 04:00:00           NaN      NaT
# 
# [4 rows x 6 columns]

temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
# Partial assignment converts
temp.ix[:-1,'A'] = np.array([1*one_hour]*3, dtype='m8[ns]')
# DataFrame is all floats; converts
temp.ix[:,'B'] = np.array([2*one_hour]*4, dtype='m8[ns]')
# .ix and .loc behave the same
temp.loc[:,'C'] = np.array([3*one_hour]*4, dtype='m8[ns]')
# straight column assignment doesn't convert
temp['D'] = np.array([4*one_hour]*4, dtype='m8[ns]')
# Now there are timedeltas; doesn't convert
temp.ix[:,'E'] = np.array([5*one_hour]*4, dtype='m8[ns]')
# .ix and .loc still behave the same
temp.loc[:,'F'] = np.array([6*one_hour]*4, dtype='m8[ns]')

temp
#                        A             B             C        D        E  \
#2014-01-01  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00   
#2014-01-02  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00   
#2014-01-03  3.600000e+12  7.200000e+12  1.080000e+13 04:00:00 05:00:00   
#2014-01-04           NaN  7.200000e+12  1.080000e+13 04:00:00 05:00:00   
# 
#                   F  
#2014-01-01 06:00:00  
#2014-01-02 06:00:00  
#2014-01-03 06:00:00  
#2014-01-04 06:00:00  
# 
# [4 rows x 6 columns]

temp = pd.DataFrame({}, index=pd.date_range('2014-1-1', periods=4))
# No columns yet, no conversion
temp.ix[:,'A'] = np.array([2*one_hour]*4, dtype='m8[ns]')
#                   A
#2014-01-01 02:00:00
#2014-01-02 02:00:00
#2014-01-03 02:00:00
#2014-01-04 02:00:00
# 
# [4 rows x 1 columns]
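Column F in the first example points at one possible way around the cast: give the column a timedelta64 dtype before filling part of it. The following is a minimal sketch of that idea, assuming a recent pandas in which label-based `.loc` stands in for `.ix`; the names `df`, `idx`, and `filled` are illustrative and not part of the report, and it has not been checked against 0.13.1.

```python
import numpy as np
import pandas as pd

one_hour = 60 * 60 * 10**9  # nanoseconds in one hour

idx = pd.date_range('2014-1-1', periods=4)
df = pd.DataFrame(index=idx)

# Create the column as timedelta64[ns] up front, filled entirely with NaT.
df['F'] = np.full(len(idx), np.timedelta64('NaT', 'ns'))

# Fill the first three rows by label; the last row keeps its NaT.
filled = np.array([6 * one_hour] * 3, dtype='m8[ns]')
df.loc[idx[:3], 'F'] = filled

# Inspect the dtype directly rather than relying on the printed frame.
print(df['F'].dtype)
print(df['F'].isna().tolist())
```

Checking `dtype` explicitly makes the float64 cast easy to spot, since the printed frame only hints at it through scientific notation.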

jreback · 2014-06-27T18:50:10Z

show your numpy / pandas versions

ischwabacher · 2014-06-27T19:02:02Z

Oops.

pd.show_versions()
# 
# INSTALLED VERSIONS
# ------------------
# commit: None
# python: 2.7.5.final.0
# python-bits: 64
# OS: Darwin
# OS-release: 13.2.0
# machine: x86_64
# processor: i386
# byteorder: little
# LC_ALL: None
# LANG: en_US.UTF-8
# 
# pandas: 0.13.1
# Cython: None
# numpy: 1.8.0
# scipy: 0.12.1
# statsmodels: None
# IPython: 1.1.0
# sphinx: None
# patsy: None
# scikits.timeseries: None
# dateutil: 2.2
# pytz: 2013.9
# bottleneck: None
# tables: None
# numexpr: None
# matplotlib: 1.3.0
# openpyxl: None
# xlrd: None
# xlwt: None
# xlsxwriter: None
# sqlalchemy: None
# lxml: None
# bs4: None
# html5lib: None
# bq: None
# apiclient: None

jreback · 2014-06-27T19:17:26Z

hmm, this is a partial assignment with dtype inference. Looks like a bug (e.g. this works for a rhs of datetime64[ns]). You're welcome to dig in!

jreback · 2014-06-27T20:44:54Z

this is fixed in master. Had an erroneous dtype inference on the setting value (when you were setting on a FloatBlock)
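For readers who want to avoid the partial-setitem inference path altogether, one option may be to let index alignment supply the missing rows. This is a sketch under assumptions, not the fix that was merged: it is written against a recent pandas, the names `df`, `idx`, and `partial` are hypothetical, and whether it behaves the same on 0.13.1 is untested.

```python
import numpy as np
import pandas as pd

one_hour = 60 * 60 * 10**9  # nanoseconds in one hour

idx = pd.date_range('2014-1-1', periods=4)
df = pd.DataFrame(index=idx)

# Build a Series that covers only the first three rows.
partial = pd.Series(np.array([3 * one_hour] * 3, dtype='m8[ns]'),
                    index=idx[:3])

# Plain column assignment aligns on the index; the row missing from
# `partial` is filled with the missing value for its dtype.
df['C'] = partial

print(df['C'].dtype)
print(df['C'].isna().sum())
```

Because the value being assigned is already a timedelta64 Series, the new column's dtype comes from that Series rather than from inference on an existing block.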

ischwabacher mentioned this issue Jun 27, 2014

Chained assignment yields unexpected result with timedelta64 values (np v1.8, pd v0.13.1) #7585

Closed

jreback added Bug labels Jun 27, 2014

jreback modified the milestones: 0.15.0, 0.14.1 Jun 27, 2014

jreback mentioned this issue Jun 27, 2014

BUG: Bug in timedelta inference when assigning an incomplete Series (GH7592) #7593

Merged

jreback closed this as completed in #7593 Jun 27, 2014
