Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dispatch Series[datetime64] comparison ops to DatetimeIndex #19800

Merged
merged 16 commits into from
Mar 1, 2018
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -139,9 +139,21 @@ def wrapper(self, other):

if isinstance(other, Index):
o_mask = other.values.view('i8') == libts.iNaT
elif isinstance(other, ABCSeries):
try:
# GH#19800 On 32-bit Windows builds this fails, but we can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last few commits have been troubleshooting appveyor-only errors. Based on this line in the logs I guessed it was a 32-bit build, but now I see a "64" in their so am back to no-clue:

[gw1] win32 -- Python 3.6.4 C:\Miniconda3_64\envs\pandas\python.exe

It looks like in this build pandas.tests.frame.test_operators.TestDataFrameOperators.test_comparison_invalid is trying to compare a DatetimeIndex with a Series[int32]. In other builds the same test
compares with a Series[int64] and goes through OK. The exact error varies depending on which commit we're looking at, but the pertinent one is a call to other.values.view('i8') which raises ValueError with a Series[int32].

Note that pd.date_range('2016-01-01', periods=3) == pd.Series([0, 1, 2], dtype='i4') raises locally, so the appveyor-specific behavior is whatever is causing it to pass an int32 series when everything else passes an int64.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this it a bug then, something is assuming int64 but getting int32, if you want to attach the last:

In [6]: pd.Series([0, 1, 2], dtype='i4').view('i8')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-b2f0cc20755c> in <module>()
----> 1 pd.Series([0, 1, 2], dtype='i4').view('i8')

~/pandas/pandas/core/series.py in view(self, dtype)
    537 
    538     def view(self, dtype=None):
--> 539         return self._constructor(self._values.view(dtype),
    540                                  index=self.index).__finalize__(self)
    541 

ValueError: new type not compatible with array.

In [7]: pd.Series([0, 1, 2], dtype='i4').astype('i8', copy=False)
Out[7]: 
0    0
1    1
2    2
dtype: int64

I get why this is not allowed, but its a bit annoying.

# infer that the mask should be all-False
view = other.values.view('i8')
except ValueError:
o_mask = np.zeros(shape=other.shape, dtype=bool)
else:
o_mask = view == libts.iNaT
else:
o_mask = other.view('i8') == libts.iNaT

# for older numpys we need to be careful not to pass a Series
# as a mask below
o_mask = com._values_from_object(o_mask)
if o_mask.any():
result[o_mask] = nat_result

Expand Down
32 changes: 16 additions & 16 deletions pandas/core/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,7 @@
import numpy as np
import pandas as pd

from pandas._libs import (lib, index as libindex,
algos as libalgos)
from pandas._libs import lib, algos as libalgos

from pandas import compat
from pandas.util._decorators import Appender
Expand Down Expand Up @@ -944,24 +943,20 @@ def na_op(x, y):
# integer comparisons

# we have a datetime/timedelta and may need to convert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still correct? Or is it only for y now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only for y now.

assert not needs_i8_conversion(x)
mask = None
if (needs_i8_conversion(x) or
(not is_scalar(y) and needs_i8_conversion(y))):

if is_scalar(y):
mask = isna(x)
y = libindex.convert_scalar(x, com._values_from_object(y))
else:
mask = isna(x) | isna(y)
y = y.view('i8')
if not is_scalar(y) and needs_i8_conversion(y):
mask = isna(x) | isna(y)
y = y.view('i8')
x = x.view('i8')

try:
method = getattr(x, name, None)
if method is not None:
with np.errstate(all='ignore'):
result = getattr(x, name)(y)
result = method(y)
if result is NotImplemented:
raise TypeError("invalid type comparison")
except AttributeError:
else:
result = op(x, y)

if mask is not None and mask.any():
Expand Down Expand Up @@ -991,6 +986,12 @@ def wrapper(self, other, axis=None):
return self._constructor(res_values, index=self.index,
name=res_name)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments at various places

if is_datetime64_dtype(self) or is_datetime64tz_dtype(self):
res_values = dispatch_to_index_op(op, self, other,
pd.DatetimeIndex)
return self._constructor(res_values, index=self.index,
name=res_name)

elif is_timedelta64_dtype(self):
res_values = dispatch_to_index_op(op, self, other,
pd.TimedeltaIndex)
Expand All @@ -1008,8 +1009,7 @@ def wrapper(self, other, axis=None):
elif isinstance(other, (np.ndarray, pd.Index)):
# do not check length of zerodim array
# as it will broadcast
if (not is_scalar(lib.item_from_zerodim(other)) and
len(self) != len(other)):
if other.ndim != 0 and len(self) != len(other):
raise ValueError('Lengths must match to compare')

res_values = na_op(self.values, np.asarray(other))
Expand Down
4 changes: 2 additions & 2 deletions pandas/tests/indexes/datetimes/test_partial_slicing.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

import pytest

from datetime import datetime, date
from datetime import datetime
import numpy as np
import pandas as pd
import operator as op
Expand Down Expand Up @@ -349,7 +349,7 @@ def test_loc_datetime_length_one(self):

@pytest.mark.parametrize('datetimelike', [
Timestamp('20130101'), datetime(2013, 1, 1),
date(2013, 1, 1), np.datetime64('2013-01-01T00:00', 'ns')])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me, why was this test removed?

Copy link
Member Author

@jbrockmendel jbrockmendel Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b/c ATM Series[datetime64].__cmp__(date) treats the date as a datetime, i.e. allows the comparison. But DatetimeIndex does not -- following convention set by Timestamp (and datetime itself). The DatetimeIndex behavior is canonical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a separate test for date (I know its wrong, but assert that it is wrong)

np.datetime64('2013-01-01T00:00', 'ns')])
@pytest.mark.parametrize('op,expected', [
(op.lt, [True, False, False, False]),
(op.le, [True, True, False, False]),
Expand Down
23 changes: 15 additions & 8 deletions pandas/tests/test_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
import pandas as pd
import pandas.compat as compat
from pandas.core.dtypes.common import (
is_object_dtype, is_datetimetz,
is_object_dtype, is_datetimetz, is_datetime64_dtype,
needs_i8_conversion)
import pandas.util.testing as tm
from pandas import (Series, Index, DatetimeIndex, TimedeltaIndex,
Expand Down Expand Up @@ -296,14 +296,21 @@ def test_none_comparison(self):
# result = None != o # noqa
# assert result.iat[0]
# assert result.iat[1]
if (is_datetime64_dtype(o) or is_datetimetz(o)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be simpler to separate out into datetimelike and non-datetimelike

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in separate tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, all of these ifs make it hard to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. This whole test should be parametrized, and there isinstance(o, Series) check should be rendered unnecessary. Vote for doing this in the next pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok that's fine

# Following DatetimeIndex (and Timestamp) convention,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume inequality comparison raising is tested elsewhere? Or should we assert that in an else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to port that test from the previous PR, good catch.

# inequality comparisons with Series[datetime64] raise
with pytest.raises(TypeError):
None > o
with pytest.raises(TypeError):
o > None
else:
result = None > o
assert not result.iat[0]
assert not result.iat[1]

result = None > o
assert not result.iat[0]
assert not result.iat[1]

result = o < None
assert not result.iat[0]
assert not result.iat[1]
result = o < None
assert not result.iat[0]
assert not result.iat[1]

def test_ndarray_compat_properties(self):

Expand Down