Fix DTI comparison with None, datetime.date #19301

Merged
merged 17 commits into from Feb 2, 2018

3 participants
@jbrockmendel
Member

commented Jan 18, 2018

Discussed in #19288

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
                                       datetime(2016, 1, 1).date()])
    def test_dti_cmp_invalid(self, tz, other):
        # GH#19301
        dti = pd.date_range('2016-01-01', periods=2, tz=tz)

@TomAugspurger

TomAugspurger Jan 18, 2018

Contributor

So is this saying that

DatetimeIndex([None, '2016-01-01']) == [None, datetime.date(2016, 1, 1)]

is [False, False]? I thought that in #18188 we decided that DatetimeIndex compared with datetime.date would coerce the date to a datetime at midnight?

@jbrockmendel

jbrockmendel Jan 18, 2018

Author Member

@TomAugspurger I think part of the confusion is over Timestamp comparison vs DatetimeIndex comparison vs Series[datetime64] comparison (e.g. the whatsnew note in #18188 talks about Series (but puts the tests in index tests...)). This PR and a bunch of other recent ones have been focused on making the Index/Series behavior more consistent.

Following this, #19288 can be de-kludged to make Series[datetime64] comparison dispatch to DatetimeIndex comparison, ensuring internal consistency. But you're right that this would mean a change in the behavior of Series[datetime64] comparisons.

For the moment I'm taking Timestamp behavior as canonical and making DatetimeIndex match that.

>>> ts = pd.Timestamp('2016-01-01')
>>> ts == ts.to_pydatetime().date()
False
>>> ts < ts.to_pydatetime().date()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/_libs/tslib.pyx", line 1165, in pandas._libs.tslib._Timestamp.__richcmp__
TypeError: Cannot compare type 'Timestamp' with type 'date'

I recall a discussion of whether date should be treated as datetime-at-midnight for Timestamp comparison purposes, my thought being it should be treated as Period(..., freq='D').

@jreback

jreback Jan 18, 2018

Contributor
In [1]: ts = pd.Timestamp('2016-01-01')

In [2]: ts
Out[2]: Timestamp('2016-01-01 00:00:00')

In [3]: ts.date()
Out[3]: datetime.date(2016, 1, 1)

In [4]: ts.to_pydatetime()
Out[4]: datetime.datetime(2016, 1, 1, 0, 0)

In [5]: ts.to_pydatetime() == ts.date()
Out[5]: False

In [6]: ts.to_pydatetime().date() == ts.date()
Out[6]: True

I find [5] a bit odd.

@jbrockmendel

jbrockmendel Jan 19, 2018

Author Member

I find [5] a bit odd.

The behavior is analogous to comparing Timestamp vs Period. __eq__ and __ne__ return False and True, respectively, and all others raise TypeError. It's odd if you interpret date as "datetime implicitly at midnight", but pretty intuitive if you interpret it as "less specific than a timestamp".
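
For reference, the standard library applies the same convention to datetime vs date, which is what [5] above is showing. A minimal sketch using only the stdlib (no pandas involved; the variable names are illustrative only):

```python
from datetime import date, datetime

dt = datetime(2016, 1, 1)  # midnight on 2016-01-01
d = date(2016, 1, 1)       # the same calendar day, with no time component

# Equality does not coerce the date to midnight: the two are simply unequal.
eq_result = (dt == d)
ne_result = (dt != d)

# Ordering comparisons refuse to guess and raise TypeError instead.
try:
    dt < d
    ordering_raised = False
except TypeError:
    ordering_raised = True

print(eq_result, ne_result, ordering_raised)  # False True True
```

Comparing only the date parts, as in [6], is the explicit way to ask whether both fall on the same day.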

@@ -41,6 +41,25 @@ def addend(request):
    return request.param


class TestDatetimeIndexComparisons(object):

@jreback

jreback Jan 18, 2018

Contributor

can you lump the None with the NaT comparisons. Do we have the same testing of these comparisons for scalars?

@jbrockmendel

jbrockmendel Jan 19, 2018

Author Member

AFAICT the comparison tests are scattered. I had planned to consolidate them here in a follow-up to keep narrow focus here.

can you lump the None with the NaT comparisons

They would need to be separate tests since the behavior is different, but I can put the tests adjacent to each other in this class.

@jbrockmendel

jbrockmendel Jan 19, 2018

Author Member

Centralizing the DTI comparison tests is now a part of #19317. After that I'll make a pass to make sure all the relevant cases are covered.

@jbrockmendel

Member Author

commented Jan 19, 2018

Looks like Travis cancellation.

@codecov

commented Jan 21, 2018

Codecov Report

Merging #19301 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #19301   +/-   ##
=======================================
  Coverage   91.67%   91.67%           
=======================================
  Files         148      148           
  Lines       48553    48553           
=======================================
  Hits        44513    44513           
  Misses       4040     4040
Flag        Coverage Δ
#multiple   90.04% <100%> (ø) ⬆️
#single     41.71% <16.66%> (-0.01%) ⬇️

Impacted Files                     Coverage Δ
pandas/core/indexes/datetimes.py   95.23% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35812ea...2ea1205. Read the comment docs.

@jreback

Contributor

commented Jan 21, 2018

rebase

                                  Index, ABCSeries)):
            # Following Timestamp convention, __eq__ is all-False
            # and __ne__ is all True, others raise TypeError.
            if opname == '__eq__':

@jreback

jreback Jan 21, 2018

Contributor

I think I prefer to return in the if/elif, then just raise instead of an else
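
The structure being asked for can be sketched as follows. This is not the code from the PR: `compare_with_invalid_scalar` and its `length` argument are hypothetical names, and plain lists stand in for the boolean arrays the real method would return.

```python
def compare_with_invalid_scalar(length, opname):
    """Hypothetical helper: result of comparing `length` values to a non-datetime scalar."""
    if opname == '__eq__':
        # Nothing can be equal to an incomparable scalar.
        return [False] * length
    elif opname == '__ne__':
        # ...so everything is unequal to it.
        return [True] * length
    # No else branch: every remaining op is an ordering comparison.
    raise TypeError('Invalid comparison for op {}'.format(opname))


print(compare_with_invalid_scalar(2, '__eq__'))  # [False, False]
print(compare_with_invalid_scalar(2, '__ne__'))  # [True, True]
```

Returning from each branch and raising at the end keeps the error path at the lowest indentation level, so it reads as the default outcome.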

            dti > other
        with pytest.raises(TypeError):
            dti >= other

    # TODO: De-duplicate with test_comparisons_nat below

@jreback

jreback Jan 21, 2018

Contributor

is there a reason NOT to de-dupe with the other comparisons in this file (e.g. the TODO comment), in this PR? This is adding code that will then be inserted into the parameterization; I would rather just do it right once.

@jbrockmendel

jbrockmendel Jan 21, 2018

Author Member

I think this comment was added in #19317, which was largely intended to be cut/paste with edits done in a separate PR. I can go and add this de-duplication to this PR, but that would be expanding the scope.

@jreback

jreback Jan 21, 2018

Contributor

let's do it. right now these are a bit orphaned and should be combined with existing tests. in general this is a good thing to do anyhow.

@jbrockmendel

jbrockmendel Jan 21, 2018

Author Member

Done. Turned out they weren't duplicated so much as poorly-named.

@@ -434,6 +434,7 @@ Conversion
- Bug in ``.astype()`` to non-ns timedelta units would hold the incorrect dtype (:issue:`19176`, :issue:`19223`, :issue:`12425`)
- Bug in subtracting :class:`Series` from ``NaT`` incorrectly returning ``NaT`` (:issue:`19158`)
- Bug in comparison of timezone-aware :class:`DatetimeIndex` against ``NaT`` incorrectly raising ``TypeError`` (:issue:`19276`)
- Bug in comparison of :class:`DatetimeIndex` against ``None`` or ``datetime.date`` objects raising ``TypeError`` for ``==`` and ``!=`` comparisons instead of all-``False`` and all-``True``, respectively (:issue:`19301`)

@jreback

jreback Jan 21, 2018

Contributor

rebase again, I just updated

        with pytest.raises(TypeError):
            dti < other
        with pytest.raises(TypeError):
            dti <= other

@jreback

jreback Jan 21, 2018

Contributor

this should also test pd.NaT here (and I think we have a test for np.nan that should test raising).

@jbrockmendel

jbrockmendel Jan 22, 2018

Author Member

Just added nan to the params for this test. pd.NaT test is immediately below this one.

@jreback

jreback Jan 24, 2018

Contributor

the datetime.date part of this needs to move to where we test comparisons with Timestamp, datetime.datetime and np.datetime64

    @pytest.mark.parametrize('other', [None,
                                       datetime(2016, 1, 1).date(),
                                       np.nan])
    def test_dti_cmp_invalid(self, tz, other):

@jreback

jreback Jan 23, 2018

Contributor

hmm, I think leave None and np.nan here is ok (rename this from invalid -> null or something more descriptive).

put the date with a Timestamp / datetime test.

@jbrockmendel

Member Author

commented Jan 23, 2018

11 hours and the OSX build hasn't started. Possible problem with Travis?

@jreback

Contributor

commented Jan 24, 2018

see that red X on the status (on the travis page). They give messages.

@jbrockmendel

Member Author

commented Jan 24, 2018

the datetime.date part of this needs to move to where we test comparisons with Timestamp, datetime.datetime and np.datetime64

OK, but it'll be a test identical to the one here, just in a different place.

@jreback

Contributor

commented Jan 24, 2018

my point is there already are tests for this - need to collocate them

@jbrockmendel

Member Author

commented Jan 24, 2018

my point is there already are tests for this - need to collocate them

OK. There are no such tests in this file; possibly in scalar. My first choice is to not move anything in this PR, but if we have to either move DTI-with-timestamp-comparison test to tests.scalar or timestamp-with-DTI-comparison test to tests.indexes, I'd prefer the latter.

@jbrockmendel

Member Author

commented Jan 24, 2018

Not seeing it in scalar either. I guess I'll add some extras here.

            dti >= other

    @pytest.mark.parametrize('other', [None,
                                       np.nan])

@jreback

jreback Jan 24, 2018

Contributor

can you add pd.NaT here

@jbrockmendel

jbrockmendel Jan 24, 2018

Author Member

No, pd.NaT behaves differently from None or np.nan.

@jreback

jreback Jan 25, 2018

Contributor

huh? that would be troubling, why is that

@jbrockmendel

jbrockmendel Jan 25, 2018

Author Member

The __eq__ and __ne__ behave the same, but None and np.nan raise TypeError for the inequalities, whereas pd.NaT returns False.
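
The distinction can be sketched in plain Python, reading "inequalities" as the ordering operators. Everything here is a stand-in for illustration: `NAT` is a hypothetical sentinel playing the role of pd.NaT, `cmp_null_scalar` is not a pandas function, and lists replace boolean arrays.

```python
import math

NAT = object()  # hypothetical stand-in for pd.NaT


def is_plain_null(other):
    """True for None or a float NaN, the two non-datetime nulls discussed here."""
    return other is None or (isinstance(other, float) and math.isnan(other))


def cmp_null_scalar(length, other, opname):
    """Sketch of comparing `length` datetime values against a null scalar."""
    if other is not NAT and not is_plain_null(other):
        raise ValueError('expected None, NaN, or the NAT stand-in')
    if opname == '__eq__':
        return [False] * length   # same for all three nulls
    if opname == '__ne__':
        return [True] * length    # same for all three nulls
    if other is NAT:
        return [False] * length   # ordering against NaT is all-False
    # Ordering against None or NaN is not meaningful.
    raise TypeError('Invalid comparison for op {}'.format(opname))


print(cmp_null_scalar(2, None, '__eq__'))  # [False, False]
print(cmp_null_scalar(2, NAT, '__lt__'))   # [False, False]
```

Under this reading, __eq__ and __ne__ can share one parametrized test over all three nulls, while the ordering operators need separate expectations for NaT.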

@jreback

jreback Jan 25, 2018

Contributor

right. ok then split the tests to reflect this; testing all nulls at once for eq/ne makes the test easy to grok.

@jbrockmendel

jbrockmendel Jan 25, 2018

Author Member

make the test easy to grok.

I'll do this, but object to the notion that fixtures and easy-to-grok can go in the same file. If I can't import the test and run it in the interpreter, then I'm in a Strange Land.

@@ -44,7 +45,74 @@ def addend(request):


class TestDatetimeIndexComparisons(object):
    # TODO: De-duplicate with test_comparisons_nat below
    @pytest.mark.parametrize('other', [datetime(2016, 1, 1),

@jreback

jreback Jan 24, 2018

Contributor

can you move all of these comparisons (that you are moving anyway) to a new test_compare.py

@jbrockmendel

jbrockmendel Jan 24, 2018

Author Member

Are you sure? The other test_arithmetic.py files we've refactored out recently mostly include a TestComparison class.

@jreback

jreback Jan 27, 2018

Contributor

I would like to separate them, a followup is ok.


    @pytest.mark.parametrize('other', [None,
                                       np.nan])
    def test_dti_cmp_non_datetime(self, tz, other):

@jreback

jreback Jan 24, 2018

Contributor

dti_cmp_null_scalar

@jbrockmendel

jbrockmendel Jan 24, 2018

Author Member

Sure

@jbrockmendel jbrockmendel reopened this Jan 25, 2018

@jreback
Contributor

left a comment

looks ok. pls rebase

@jreback jreback added this to the 0.23.0 milestone Jan 27, 2018

@jbrockmendel

Member Author

commented Jan 28, 2018

Thoughts here? This is now a blocker for the next steps in core.ops.

@jreback

Contributor

commented Feb 1, 2018

this looks good. rebase and ping on green.

@jbrockmendel

Member Author

commented Feb 1, 2018

ping

@jreback

jreback approved these changes Feb 2, 2018

@jreback jreback merged commit cd6510d into pandas-dev:master Feb 2, 2018

3 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@jreback

Contributor

commented Feb 2, 2018

thanks!

@jbrockmendel just wanted to say. very much appreciate your changes. As they get more complicated and/or hit edge cases, I am necessarily being more of a stickler on things. don't take it personally (and you are very responsive!)

pandas is quite complicated and enforcing consistency across user AND developer experiences is hard, but very important.

@jbrockmendel jbrockmendel deleted the jbrockmendel:dti_cmp_fix branch Feb 4, 2018

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018
