BUG: comparisons fail for NaT in DataFrame #15697

adbull · 2017-03-16T14:38:55Z

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> nat = pd.NaT
>>> x = pd.Series([nat])
>>> x.eq(nat)
0    False
dtype: bool
>>> x == nat
0    False
dtype: bool
>>> y = pd.DataFrame(dict(x=x))
>>> y.eq(nat)
       x
0    NaT
>>> y == nat
       x
0    True

Problem description

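Comparing a DataFrame that contains a single NaT against NaT gives incorrect answers: `.eq` returns NaT and `==` returns True, whereas the equivalent Series comparisons correctly return False. Note this occurs with both datetime and timedelta NaTs.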
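A minimal sketch of the two kinds of NaT column mentioned above. The explicit dtype strings are an assumption about how such columns might be built (the report does not show this), and only the Series path is exercised, which the report says already behaves correctly.

```python
import pandas as pd

# Hypothetical single-element columns, one per NaT flavour.
dt_nat = pd.Series([pd.NaT], dtype="datetime64[ns]")
td_nat = pd.Series([pd.NaT], dtype="timedelta64[ns]")

# Series comparisons against NaT are False for both dtypes.
dt_result = dt_nat.eq(pd.NaT)
td_result = td_nat.eq(pd.NaT)
print(dt_result.tolist(), td_result.tolist())
```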

Expected Output

0    False
dtype: bool

0    False
dtype: bool

       x
0  False

       x
0  False

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.0+579.g4ce9c0c
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback · 2017-03-16T15:01:47Z

This is as expected: NaT != NaT, just as nan != nan.

see the big red box: http://pandas-docs.github.io/pandas-docs-travis/missing_data.html#values-considered-missing

In [1]: nat = pd.NaT
   ...: x = pd.Series([nat])
   ...: 

In [2]: x.eq(nat)
Out[2]: 
0    False
dtype: bool

In [3]: x.isnull()
Out[3]: 
0    True
dtype: bool

In [4]: x = pd.Series([np.nan])
   ...: 
   ...: 

In [5]: x.eq(np.nan)
Out[5]: 
0    False
dtype: bool

In [6]: x.isnull()
Out[6]: 
0    True
dtype: bool

jreback · 2017-03-16T15:04:10Z

Actually, you are right: this is broken for DataFrame with NaT, but works for np.nan.

So I'll mark it as a bug, though you should never do this. Maybe we should just raise.

In [15]: y = DataFrame(dict(x=[np.nan]))

In [16]: y.eq(np.nan)
Out[16]: 
       x
0  False

In [17]: y == np.nan
Out[17]: 
       x
0  False

In [18]: y = DataFrame(dict(x=[pd.NaT]))

In [19]: y.eq(pd.NaT)
Out[19]: 
    x
0 NaT

In [20]: y == pd.NaT
Out[20]: 
      x
0  True

adbull · 2017-03-16T16:08:12Z

Agreed; as above, the correct result of NaT == NaT is False. The bug is that for a DataFrame, we instead get NaT or True, depending on how we test for equality.

Is there a reason why this call should never be made? At worst, we could just call apply() and use the Series methods, which work fine.
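The apply() fallback suggested above could look roughly like this. It is a sketch under the assumption of a one-column frame like the one in the report; the names `df` and `result` are illustrative only.

```python
import pandas as pd

# Hypothetical one-column frame holding a single NaT, as in the report.
df = pd.DataFrame({"x": pd.Series([pd.NaT], dtype="datetime64[ns]")})

# apply() hands each column to the function as a Series, so the
# comparison goes through Series.eq rather than DataFrame.eq.
result = df.apply(lambda col: col.eq(pd.NaT))
print(result)
```

Because each column is compared as a Series, the result follows the Series semantics shown earlier in the thread (False for NaT against NaT).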

jreback · 2017-03-16T16:15:45Z

> Is there a reason why this call should never be made? At worst, we could just call apply() and use the Series methods, which work fine.

You don't want to compare against a null value. It's not intuitive to do this, as nan != nan is just plain confusing to most people.

The more explicit

df[df.isnull()] is much more obvious.

So it's not that you shouldn't do it if you know what you are doing; it's just non-obvious from reading. Further, it can provide lots of opportunities for odd bugs. Consider:

for x in ['foo', np.nan]:
    df.eq(x)

This will give totally unexpected results, because the np.nan iteration matches nothing, not even the missing cells.
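To make the pitfall concrete, here is a small sketch. It swaps the 'foo' strings of the loop above for a numeric column (an assumption made purely for illustration) and contrasts equality against NaN with the explicit isnull() check.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one present and one missing value.
df = pd.DataFrame({"a": [1.0, np.nan]})

# Equality against an ordinary value matches as expected.
eq_one = df.eq(1.0)

# Equality against NaN never matches, not even the NaN cell itself.
eq_nan = df.eq(np.nan)

# isnull() is the explicit way to locate missing values.
missing = df.isnull()

print(eq_one["a"].tolist(), eq_nan["a"].tolist(), missing["a"].tolist())
```

A loop over candidate values that happens to include NaN silently yields an all-False mask for that iteration, which is why the explicit check reads more clearly.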

adbull · 2017-03-16T16:18:39Z

Sure, that specific call would be better as .isnull(), but nats break comparisons in dataframes more generally.

>>> nat = pd.NaT
>>> now = pd.to_datetime('now')
>>> nat < now
False
>>> pd.DataFrame([[nat]]) < now
      0
0  True
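For reference, a sketch of the scalar and Series semantics that the DataFrame result is being measured against. The fixed timestamp `ts` is a stand-in for 'now', chosen only so the example is reproducible.

```python
import pandas as pd

nat = pd.NaT
ts = pd.Timestamp("2017-03-16")  # fixed stand-in for 'now'

# Every comparison involving NaT is False, except inequality.
scalar_checks = {
    "lt": nat < ts,
    "gt": nat > ts,
    "eq": nat == ts,
    "ne": nat != ts,
}

# A Series follows the same rule element by element.
series_lt = pd.Series([nat, ts - pd.Timedelta(days=1)]) < ts

print(scalar_checks)
print(series_lt.tolist())
```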

jreback · 2017-03-16T16:23:28Z

Here is the issue for Series: #9005

This is not very common on frames. You generally cannot compare frames unless they are of a single dtype. You would usually select out a portion and then compare.

So it's a bug. Pull requests are welcome!
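The "select out a portion and then compare" approach might be sketched as follows. The frame, its column names, and the cutoff are all hypothetical.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one datetime column, one string column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2017-03-15", None]),  # second entry is NaT
    "label": ["a", "b"],
})
cutoff = pd.Timestamp("2017-03-16")

# Select the datetime column first, then compare it as a Series.
col_before = df["when"] < cutoff
print(col_before.tolist())
```

Comparing the selected column keeps the operation on the Series path, where the NaT row evaluates to False.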

jreback closed this as completed Mar 16, 2017

jreback added this to the No action milestone Mar 16, 2017

jreback added the Missing-data and Usage Question labels Mar 16, 2017

jreback reopened this Mar 16, 2017

jreback modified the milestones: Next Major Release, No action Mar 16, 2017

jreback added the Bug, Indexing, Timedelta, and Timeseries labels and removed the Usage Question label Mar 16, 2017

jreback mentioned this issue Sep 17, 2017

Unexpected exception on column with NaT #17559

Closed

jreback modified the milestones: Next Major Release, Interesting Issues Sep 17, 2017

jreback modified the milestones: Interesting Issues, Next Major Release Nov 26, 2017

jreback mentioned this issue Dec 18, 2017

BUG: NaT value comparation always result in True within DataFrame but not Series #18816

Closed

jbrockmendel mentioned this issue Dec 19, 2017

DataFrame vs Series vs Index arithmetic Roundup #18824

Closed


nmusolino mentioned this issue Mar 13, 2018

DataFrame[timedelta64] / timedelta64 or pydatetime has wrong dtype and wrong values #20088

Closed

jbrockmendel mentioned this issue Aug 7, 2018

dispatch scalar DataFrame ops to Series #22163

Merged

jreback modified the milestones: Contributions Welcome, 0.24.0 Aug 14, 2018

jreback closed this as completed in #22163 Aug 14, 2018
