Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Assorted DatetimeIndex bugfixes #24157

Merged
merged 32 commits into from
Dec 14, 2018
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
092b3b9
Deprecate time_rule
jbrockmendel Dec 7, 2018
fbcc04b
cleanup
jbrockmendel Dec 7, 2018
ac52857
fix test
jbrockmendel Dec 7, 2018
121c373
Fix indexing with ellipsis, GH#21282
jbrockmendel Dec 7, 2018
ef66bb9
isort tests.arithmetic, tests.dtypes
jbrockmendel Dec 7, 2018
91738d3
raise on date_range with NaT for either start or end
jbrockmendel Dec 7, 2018
bb0d065
revert noncentral changes
jbrockmendel Dec 7, 2018
ff3a5c0
more useful error messages on validate_frequency failure
jbrockmendel Dec 7, 2018
e54159d
Fix DTI-->Categorical -->DTI roundtrip
jbrockmendel Dec 7, 2018
44e0126
extend test to DTA
jbrockmendel Dec 8, 2018
12e0f4e
comment
jbrockmendel Dec 8, 2018
28bf2de
update GH reference
jbrockmendel Dec 8, 2018
06d0a8e
make TDI catch nanos valuerror
jbrockmendel Dec 8, 2018
81c7d0f
fix error message, hopefully
jbrockmendel Dec 8, 2018
97f976f
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 8, 2018
9e55d97
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 9, 2018
c7f280f
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 9, 2018
af303c7
improve comments
jbrockmendel Dec 9, 2018
8ece686
use take
jbrockmendel Dec 9, 2018
30f01a0
whitespace
jbrockmendel Dec 9, 2018
9617b85
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 11, 2018
3731098
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 13, 2018
df05c88
do ellipsis check later
jbrockmendel Dec 13, 2018
8f39b23
Avoid AssertionError
jbrockmendel Dec 13, 2018
e958ce6
remove resolved comment
jbrockmendel Dec 13, 2018
97cb6b3
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 13, 2018
36d37e4
more whatsnew
jbrockmendel Dec 13, 2018
3fc1c19
comment harder
jbrockmendel Dec 13, 2018
88f7094
comment harder
jbrockmendel Dec 13, 2018
e178bb9
remove time_rule entirely
jbrockmendel Dec 13, 2018
50f6b7e
whatsnew
jbrockmendel Dec 13, 2018
b927925
flake8 fixup
jbrockmendel Dec 13, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1105,7 +1105,7 @@ Deprecations
- Passing a string alias like ``'datetime64[ns, UTC]'`` as the `unit` parameter to :class:`DatetimeTZDtype` is deprecated. Use :class:`DatetimeTZDtype.construct_from_string` instead (:issue:`23990`).
- In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`).
- :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`)

- Passing a `time_rule` to `pandas.tseries.offsets.generate_range` is deprecated and will raise a ``TypeError`` in a future version. Pass an ``offset`` instead (:issue:`24157`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these don't jive with what's at the top of the PR. missing 2 notes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about #11587 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it merit a note? All that is changed is an error message. But yes, this does close 11587, I'll update this above.


.. _whatsnew_0240.deprecations.datetimelike_int_ops:

Expand Down Expand Up @@ -1395,6 +1395,7 @@ Indexing
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the ``Index`` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
- Bug in :class:`Index` slicing with boolean :class:`Index` may raise ``TypeError`` (:issue:`22533`)
- Bug in ``PeriodArray.__setitem__`` when accepting slice and list-like value (:issue:`23978`)
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would lose their ``freq`` attribute (:issue:`21282`)

Missing
^^^^^^^
Expand Down
19 changes: 16 additions & 3 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,10 @@ def __getitem__(self, key):
"numpy.newaxis (`None`) and integer or boolean "
"arrays are valid indices")

if key is Ellipsis:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a more explicit comment. This is likely a more general fix (IOW this is likely broken for all __getitem__ on EA)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't this just work as is? why is this not a view (and not a copy)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't this just work as is?

as-is DatetimeIndex[...] loses its freq attribute

why is this not a view (and not a copy)?

Since the copy doesn't have deep=True, self.copy() effectively a view?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you be more explicit in your comment, this is not obvious

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a hopefully clearer spot a few lines down

# GH#21282 avoid losing `freq` attribute
return self.copy()

getitem = self._data.__getitem__
if is_int:
val = getitem(key)
Expand Down Expand Up @@ -547,9 +551,18 @@ def _validate_frequency(cls, index, freq, **kwargs):
if index.size == 0 or inferred == freq.freqstr:
return None

on_freq = cls._generate_range(start=index[0], end=None,
periods=len(index), freq=freq, **kwargs)
if not np.array_equal(index.asi8, on_freq.asi8):
try:
on_freq = cls._generate_range(start=index[0], end=None,
periods=len(index), freq=freq,
**kwargs)
assert np.array_equal(index.asi8, on_freq.asi8)
except (ValueError, AssertionError) as e:
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
if "non-fixed" in str(e):
# non-fixed frequencies are not meaningful for timedelta64;
# we retain that error message
raise e
# GH#11587 if index[0] is NaT _generate_range will raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too cryptic, pls make more readable if you don't actually know what's happening here

# ValueError, which we re-raise with a targeted error message
raise ValueError('Inferred frequency {infer} from passed values '
'does not conform to passed frequency {passed}'
.format(infer=inferred, passed=freq.freqstr))
Expand Down
15 changes: 12 additions & 3 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@
from pandas.util._decorators import Appender

from pandas.core.dtypes.common import (
_INT64_DTYPE, _NS_DTYPE, is_datetime64_dtype, is_datetime64tz_dtype,
is_extension_type, is_float_dtype, is_int64_dtype, is_object_dtype,
is_period_dtype, is_string_dtype, is_timedelta64_dtype)
_INT64_DTYPE, _NS_DTYPE, is_categorical_dtype, is_datetime64_dtype,
is_datetime64tz_dtype, is_extension_type, is_float_dtype, is_int64_dtype,
is_object_dtype, is_period_dtype, is_string_dtype, is_timedelta64_dtype)
from pandas.core.dtypes.dtypes import DatetimeTZDtype
from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna
Expand Down Expand Up @@ -264,6 +264,8 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
if closed is not None:
raise ValueError("Closed has to be None if not both of start"
"and end are defined")
if start is NaT or end is NaT:
raise ValueError("Neither `start` nor `end` can be NaT")

left_closed, right_closed = dtl.validate_endpoints(closed)

Expand Down Expand Up @@ -1652,6 +1654,13 @@ def maybe_convert_dtype(data, copy):
raise TypeError("Passing PeriodDtype data is invalid. "
"Use `data.to_timestamp()` instead")

elif is_categorical_dtype(data):
# GH#18664 preserve tz in going DTI->Categorical->DTI
# TODO: cases where we need to do another pass through this func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the TODO case here, what is an example? do you have an xfailed test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example is Categorical(TimedeltaIndex) where we should do another pass through maybe_convert_dtype in order to issue the FutureWarning. I'm assuming there are other corner cases here that will need to be caught/tested.

# e.g. the categories are timedelta64s
data = data.categories.take(data.codes, fill_value=NaT)
# TODO: does this always make a copy? If so, set copy=False
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved

elif is_extension_type(data) and not is_datetime64tz_dtype(data):
# Includes categorical
# TODO: We have no tests for these
Expand Down
32 changes: 31 additions & 1 deletion pandas/tests/indexes/datetimes/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,42 @@
from pandas import (
DatetimeIndex, Index, Timestamp, date_range, datetime, offsets,
to_datetime)
from pandas.core.arrays import period_array
from pandas.core.arrays import (
DatetimeArrayMixin as DatetimeArray, period_array)
import pandas.util.testing as tm


class TestDatetimeIndex(object):

@pytest.mark.parametrize('dt_cls', [DatetimeIndex, DatetimeArray])
def test_freq_validation_with_nat(self, dt_cls):
# GH#11587 make sure we get a useful error message when generate_range
# raises
msg = ("Inferred frequency None from passed values does not conform "
"to passed frequency D")
with pytest.raises(ValueError, match=msg):
dt_cls([pd.NaT, pd.Timestamp('2011-01-01')], freq='D')
with pytest.raises(ValueError, match=msg):
dt_cls([pd.NaT, pd.Timestamp('2011-01-01').value],
freq='D')

def test_categorical_preserves_tz(self):
# GH#18664 retain tz when going DTI-->Categorical-->DTI
# TODO: parametrize over DatetimeIndex/DatetimeArray
# once CategoricalIndex(DTA) works

dti = pd.DatetimeIndex(
[pd.NaT, '2015-01-01', '1999-04-06 15:14:13', '2015-01-01'],
tz='US/Eastern')

ci = pd.CategoricalIndex(dti)
carr = pd.Categorical(dti)
cser = pd.Series(ci)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take a look

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without adding casting code that makes the verbosity a wash.

Once Categorical(DatetimeArray) works (currently raises bc DatetimeArray doesn't have _constructor) then this test can be parametrized over DatetimeIndex/DatetimeArray

for obj in [ci, carr, cser]:
result = pd.DatetimeIndex(obj)
tm.assert_index_equal(result, dti)

def test_dti_with_period_data_raises(self):
# GH#23675
data = pd.PeriodIndex(['2016Q1', '2016Q2'], freq='Q')
Expand Down
14 changes: 12 additions & 2 deletions pandas/tests/indexes/datetimes/test_date_range.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,14 @@ def test_date_range_timestamp_equiv_preserve_frequency(self):


class TestDateRanges(TestData):
def test_date_range_nat(self):
# GH#11587
msg = "Neither `start` nor `end` can be NaT"
with pytest.raises(ValueError, match=msg):
date_range(start='2016-01-01', end=pd.NaT, freq='D')
with pytest.raises(ValueError, match=msg):
date_range(start=pd.NaT, end='2016-01-01', freq='D')

def test_date_range_out_of_bounds(self):
# GH#14187
with pytest.raises(OutOfBoundsDatetime):
Expand Down Expand Up @@ -533,12 +541,14 @@ class TestGenRangeGeneration(object):

def test_generate(self):
rng1 = list(generate_range(START, END, offset=BDay()))
rng2 = list(generate_range(START, END, time_rule='B'))
with tm.assert_produces_warning(FutureWarning):
rng2 = list(generate_range(START, END, time_rule='B'))
assert rng1 == rng2

def test_generate_cday(self):
rng1 = list(generate_range(START, END, offset=CDay()))
rng2 = list(generate_range(START, END, time_rule='C'))
with tm.assert_produces_warning(FutureWarning):
rng2 = list(generate_range(START, END, time_rule='C'))
assert rng1 == rng2

def test_1(self):
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/indexes/datetimes/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,15 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = pd.date_range('2011-01-01', '2011-01-31', freq='D',
tz='Asia/Tokyo', name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx')
idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D',
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/indexes/period/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,14 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = period_range('2011-01-01', '2011-01-31', freq='D',
name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = pd.period_range('2011-01-01', '2011-01-31', freq='D',
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/indexes/timedeltas/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = timedelta_range('1 day', '31 day', freq='D', name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = timedelta_range('1 day', '31 day', freq='D', name='idx')

Expand Down
22 changes: 13 additions & 9 deletions pandas/tests/tseries/offsets/test_offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,21 +49,25 @@ class WeekDay(object):
####


def test_time_rule_deprecated():
start = datetime(2007, 10, 1)
end = datetime(2012, 4, 9)

with tm.assert_produces_warning(FutureWarning):
# Note: generate_range returns a generator, and the warning is not
# issued until the first __next__ call
list(offsets.generate_range(start=start, end=end, time_rule="W"))


def test_to_m8():
valb = datetime(2007, 10, 1)
valu = _to_m8(valb)
assert isinstance(valu, np.datetime64)
# assert valu == np.datetime64(datetime(2007,10,1))

# def test_datetime64_box():
# valu = np.datetime64(datetime(2007,10,1))
# valb = _dt_box(valu)
# assert type(valb) == datetime
# assert valb == datetime(2007,10,1)

#####
# DateOffset Tests
#####
#####
# DateOffset Tests
#####


class Base(object):
Expand Down
4 changes: 4 additions & 0 deletions pandas/tseries/offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
from datetime import date, datetime, timedelta
import functools
import operator
import warnings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob going to fail the build as this is not needed


from dateutil.easter import easter
import numpy as np
Expand Down Expand Up @@ -2487,6 +2488,9 @@ def generate_range(start=None, end=None, periods=None,

"""
if time_rule is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you need to change any existing tests to account for this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, tests.indexes.datetimes.test_date_range test_generate and test_generate_cday

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sufficiently internal that we don't need to do a deprecation cycle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, let's just drop it entirely (fix the tests). I don't think generate_range is exposed, though maybe add in a note in api breaking changes just to cover.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. OK with moving this to arrays.datetimes while we're at it?

warnings.warn("`time_rule` is deprecated and will be removed in a "
"future version. Use `offset` instead.",
FutureWarning, stacklevel=2)
from pandas.tseries.frequencies import get_offset

offset = get_offset(time_rule)
Expand Down