Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Assorted DatetimeIndex bugfixes #24157

Merged
merged 32 commits into from
Dec 14, 2018
Merged
Show file tree
Hide file tree
Changes from 31 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
092b3b9
Deprecate time_rule
jbrockmendel Dec 7, 2018
fbcc04b
cleanup
jbrockmendel Dec 7, 2018
ac52857
fix test
jbrockmendel Dec 7, 2018
121c373
Fix indexing with ellipsis, GH#21282
jbrockmendel Dec 7, 2018
ef66bb9
isort tests.arithmetic, tests.dtypes
jbrockmendel Dec 7, 2018
91738d3
raise on date_range with NaT for either start or end
jbrockmendel Dec 7, 2018
bb0d065
revert noncentral changes
jbrockmendel Dec 7, 2018
ff3a5c0
more useful error messages on validate_frequency failure
jbrockmendel Dec 7, 2018
e54159d
Fix DTI-->Categorical -->DTI roundtrip
jbrockmendel Dec 7, 2018
44e0126
extend test to DTA
jbrockmendel Dec 8, 2018
12e0f4e
comment
jbrockmendel Dec 8, 2018
28bf2de
update GH reference
jbrockmendel Dec 8, 2018
06d0a8e
make TDI catch nanos valuerror
jbrockmendel Dec 8, 2018
81c7d0f
fix error message, hopefully
jbrockmendel Dec 8, 2018
97f976f
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 8, 2018
9e55d97
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 9, 2018
c7f280f
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 9, 2018
af303c7
improve comments
jbrockmendel Dec 9, 2018
8ece686
use take
jbrockmendel Dec 9, 2018
30f01a0
whitespace
jbrockmendel Dec 9, 2018
9617b85
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 11, 2018
3731098
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 13, 2018
df05c88
do ellipsis check later
jbrockmendel Dec 13, 2018
8f39b23
Avoid AssertionError
jbrockmendel Dec 13, 2018
e958ce6
remove resolved comment
jbrockmendel Dec 13, 2018
97cb6b3
Merge branch 'master' of https://github.com/pandas-dev/pandas into deets
jbrockmendel Dec 13, 2018
36d37e4
more whatsnew
jbrockmendel Dec 13, 2018
3fc1c19
comment harder
jbrockmendel Dec 13, 2018
88f7094
comment harder
jbrockmendel Dec 13, 2018
e178bb9
remove time_rule entirely
jbrockmendel Dec 13, 2018
50f6b7e
whatsnew
jbrockmendel Dec 13, 2018
b927925
flake8 fixup
jbrockmendel Dec 13, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion doc/source/whatsnew/v0.24.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,7 @@ Backwards incompatible API changes
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably won't render as its not in the api.rst but ok


Percentage change on groupby changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down Expand Up @@ -1133,7 +1134,6 @@ Deprecations
- In :meth:`Series.where` with Categorical data, providing an ``other`` that is not present in the categories is deprecated. Convert the categorical to a different dtype or add the ``other`` to the categories first (:issue:`24077`).
- :meth:`Series.clip_lower`, :meth:`Series.clip_upper`, :meth:`DataFrame.clip_lower` and :meth:`DataFrame.clip_upper` are deprecated and will be removed in a future version. Use ``Series.clip(lower=threshold)``, ``Series.clip(upper=threshold)`` and the equivalent ``DataFrame`` methods (:issue:`24203`)


.. _whatsnew_0240.deprecations.datetimelike_int_ops:

Integer Addition/Subtraction with Datetime-like Classes Is Deprecated
Expand Down Expand Up @@ -1310,6 +1310,9 @@ Datetimelike
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`)
- Bug in :class:`Categorical.__setitem__` not allowing setting with another ``Categorical`` when both are undordered and have the same categories, but in a different order (:issue:`24142`)
- Bug in :func:`date_range` where using dates with millisecond resolution or higher could return incorrect values or the wrong number of values in the index (:issue:`24110`)
- Bug in :class:`DatetimeIndex` where constructing a :class:`DatetimeIndex` from a :class:`Categorical` or :class:`CategoricalIndex` would incorrectly drop timezone information (:issue:`18664`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is the note for #11587?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pushed, should be just below this line.

- Bug in :class:`DatetimeIndex` and :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would incorrectly lose the index's ``freq`` attribute (:issue:`21282`)
- Clarified error message produced when passing an incorrect ``freq`` argument to :class:`DatetimeIndex` with ``NaT`` as the first entry in the passed data (:issue:`11587`)

Timedelta
^^^^^^^^^
Expand Down Expand Up @@ -1422,6 +1425,7 @@ Indexing
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the ``Index`` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
- Bug in :class:`Index` slicing with boolean :class:`Index` may raise ``TypeError`` (:issue:`22533`)
- Bug in ``PeriodArray.__setitem__`` when accepting slice and list-like value (:issue:`23978`)
- Bug in :class:`DatetimeIndex`, :class:`TimedeltaIndex` where indexing with ``Ellipsis`` would lose their ``freq`` attribute (:issue:`21282`)

Missing
^^^^^^^
Expand Down
23 changes: 20 additions & 3 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -351,6 +351,10 @@ def __getitem__(self, key):
freq = key.step * self.freq
else:
freq = self.freq
elif key is Ellipsis:
jreback marked this conversation as resolved.
Show resolved Hide resolved
# GH#21282 indexing with Ellipsis is similar to a full slice,
# should preserve `freq` attribute
freq = self.freq

attribs['freq'] = freq

Expand Down Expand Up @@ -547,9 +551,22 @@ def _validate_frequency(cls, index, freq, **kwargs):
if index.size == 0 or inferred == freq.freqstr:
return None

on_freq = cls._generate_range(start=index[0], end=None,
periods=len(index), freq=freq, **kwargs)
if not np.array_equal(index.asi8, on_freq.asi8):
try:
on_freq = cls._generate_range(start=index[0], end=None,
periods=len(index), freq=freq,
**kwargs)
if not np.array_equal(index.asi8, on_freq.asi8):
raise ValueError
except ValueError as e:
if "non-fixed" in str(e):
# non-fixed frequencies are not meaningful for timedelta64;
# we retain that error message
raise e
# GH#11587 the main way this is reached is if the `np.array_equal`
# check above is False. This can also be reached if index[0]
# is `NaT`, in which case the call to `cls._generate_range` will
# raise a ValueError, which we re-raise with a more targeted
# message.
raise ValueError('Inferred frequency {infer} from passed values '
'does not conform to passed frequency {passed}'
.format(infer=inferred, passed=freq.freqstr))
Expand Down
15 changes: 12 additions & 3 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@
from pandas.util._decorators import Appender

from pandas.core.dtypes.common import (
_INT64_DTYPE, _NS_DTYPE, is_datetime64_dtype, is_datetime64tz_dtype,
is_extension_type, is_float_dtype, is_int64_dtype, is_object_dtype,
is_period_dtype, is_string_dtype, is_timedelta64_dtype)
_INT64_DTYPE, _NS_DTYPE, is_categorical_dtype, is_datetime64_dtype,
is_datetime64tz_dtype, is_extension_type, is_float_dtype, is_int64_dtype,
is_object_dtype, is_period_dtype, is_string_dtype, is_timedelta64_dtype)
from pandas.core.dtypes.dtypes import DatetimeTZDtype
from pandas.core.dtypes.generic import ABCIndexClass, ABCSeries
from pandas.core.dtypes.missing import isna
Expand Down Expand Up @@ -264,6 +264,8 @@ def _generate_range(cls, start, end, periods, freq, tz=None,
if closed is not None:
raise ValueError("Closed has to be None if not both of start"
"and end are defined")
if start is NaT or end is NaT:
raise ValueError("Neither `start` nor `end` can be NaT")

left_closed, right_closed = dtl.validate_endpoints(closed)

Expand Down Expand Up @@ -1652,6 +1654,13 @@ def maybe_convert_dtype(data, copy):
raise TypeError("Passing PeriodDtype data is invalid. "
"Use `data.to_timestamp()` instead")

elif is_categorical_dtype(data):
# GH#18664 preserve tz in going DTI->Categorical->DTI
# TODO: cases where we need to do another pass through this func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the TODO case here, what is an example? do you have an xfailed test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example is Categorical(TimedeltaIndex) where we should do another pass through maybe_convert_dtype in order to issue the FutureWarning. I'm assuming there are other corner cases here that will need to be caught/tested.

# e.g. the categories are timedelta64s
data = data.categories.take(data.codes, fill_value=NaT)
copy = False

elif is_extension_type(data) and not is_datetime64tz_dtype(data):
# Includes categorical
# TODO: We have no tests for these
Expand Down
32 changes: 31 additions & 1 deletion pandas/tests/indexes/datetimes/test_construction.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,42 @@
from pandas import (
DatetimeIndex, Index, Timestamp, date_range, datetime, offsets,
to_datetime)
from pandas.core.arrays import period_array
from pandas.core.arrays import (
DatetimeArrayMixin as DatetimeArray, period_array)
import pandas.util.testing as tm


class TestDatetimeIndex(object):

@pytest.mark.parametrize('dt_cls', [DatetimeIndex, DatetimeArray])
def test_freq_validation_with_nat(self, dt_cls):
# GH#11587 make sure we get a useful error message when generate_range
# raises
msg = ("Inferred frequency None from passed values does not conform "
"to passed frequency D")
with pytest.raises(ValueError, match=msg):
dt_cls([pd.NaT, pd.Timestamp('2011-01-01')], freq='D')
with pytest.raises(ValueError, match=msg):
dt_cls([pd.NaT, pd.Timestamp('2011-01-01').value],
freq='D')

def test_categorical_preserves_tz(self):
# GH#18664 retain tz when going DTI-->Categorical-->DTI
# TODO: parametrize over DatetimeIndex/DatetimeArray
# once CategoricalIndex(DTA) works

dti = pd.DatetimeIndex(
[pd.NaT, '2015-01-01', '1999-04-06 15:14:13', '2015-01-01'],
tz='US/Eastern')

ci = pd.CategoricalIndex(dti)
carr = pd.Categorical(dti)
cser = pd.Series(ci)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take a look

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without adding casting code that makes the verbosity a wash.

Once Categorical(DatetimeArray) works (currently raises bc DatetimeArray doesn't have _constructor) then this test can be parametrized over DatetimeIndex/DatetimeArray

for obj in [ci, carr, cser]:
result = pd.DatetimeIndex(obj)
tm.assert_index_equal(result, dti)

def test_dti_with_period_data_raises(self):
# GH#23675
data = pd.PeriodIndex(['2016Q1', '2016Q2'], freq='Q')
Expand Down
12 changes: 10 additions & 2 deletions pandas/tests/indexes/datetimes/test_date_range.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,14 @@ def test_date_range_timestamp_equiv_preserve_frequency(self):


class TestDateRanges(TestData):
def test_date_range_nat(self):
# GH#11587
msg = "Neither `start` nor `end` can be NaT"
with pytest.raises(ValueError, match=msg):
date_range(start='2016-01-01', end=pd.NaT, freq='D')
with pytest.raises(ValueError, match=msg):
date_range(start=pd.NaT, end='2016-01-01', freq='D')

def test_date_range_out_of_bounds(self):
# GH#14187
with pytest.raises(OutOfBoundsDatetime):
Expand Down Expand Up @@ -533,12 +541,12 @@ class TestGenRangeGeneration(object):

def test_generate(self):
rng1 = list(generate_range(START, END, offset=BDay()))
rng2 = list(generate_range(START, END, time_rule='B'))
rng2 = list(generate_range(START, END, offset='B'))
assert rng1 == rng2

def test_generate_cday(self):
rng1 = list(generate_range(START, END, offset=CDay()))
rng2 = list(generate_range(START, END, time_rule='C'))
rng2 = list(generate_range(START, END, offset='C'))
assert rng1 == rng2

def test_1(self):
Expand Down
9 changes: 9 additions & 0 deletions pandas/tests/indexes/datetimes/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,15 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = pd.date_range('2011-01-01', '2011-01-31', freq='D',
tz='Asia/Tokyo', name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = pd.date_range('2011-01-01', '2011-01-31', freq='D', name='idx')
idx2 = pd.date_range('2011-01-01', '2011-01-31', freq='D',
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/indexes/period/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,14 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = period_range('2011-01-01', '2011-01-31', freq='D',
name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = pd.period_range('2011-01-01', '2011-01-31', freq='D',
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/indexes/timedeltas/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,14 @@


class TestGetItem(object):
def test_ellipsis(self):
# GH#21282
idx = timedelta_range('1 day', '31 day', freq='D', name='idx')

result = idx[...]
assert result.equals(idx)
assert result is not idx

def test_getitem(self):
idx1 = timedelta_range('1 day', '31 day', freq='D', name='idx')

Expand Down
12 changes: 3 additions & 9 deletions pandas/tests/tseries/offsets/test_offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,17 +53,11 @@ def test_to_m8():
valb = datetime(2007, 10, 1)
valu = _to_m8(valb)
assert isinstance(valu, np.datetime64)
# assert valu == np.datetime64(datetime(2007,10,1))

# def test_datetime64_box():
# valu = np.datetime64(datetime(2007,10,1))
# valb = _dt_box(valu)
# assert type(valb) == datetime
# assert valb == datetime(2007,10,1)

#####
# DateOffset Tests
#####
#####
# DateOffset Tests
#####


class Base(object):
Expand Down
14 changes: 4 additions & 10 deletions pandas/tseries/offsets.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
from datetime import date, datetime, timedelta
import functools
import operator
import warnings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob going to fail the build as this is not needed


from dateutil.easter import easter
import numpy as np
Expand Down Expand Up @@ -2457,8 +2458,7 @@ class Nano(Tick):
# ---------------------------------------------------------------------


def generate_range(start=None, end=None, periods=None,
offset=BDay(), time_rule=None):
def generate_range(start=None, end=None, periods=None, offset=BDay()):
"""
Generates a sequence of dates corresponding to the specified time
offset. Similar to dateutil.rrule except uses pandas DateOffset
Expand All @@ -2470,26 +2470,20 @@ def generate_range(start=None, end=None, periods=None,
end : datetime (default None)
periods : int, (default None)
offset : DateOffset, (default BDay())
time_rule : (legacy) name of DateOffset object to be used, optional
Corresponds with names expected by tseries.frequencies.get_offset

Notes
-----
* This method is faster for generating weekdays than dateutil.rrule
* At least two of (start, end, periods) must be specified.
* If both start and end are specified, the returned dates will
satisfy start <= date <= end.
* If both time_rule and offset are specified, time_rule supersedes offset.

Returns
-------
dates : generator object

"""
if time_rule is not None:
from pandas.tseries.frequencies import get_offset

offset = get_offset(time_rule)
from pandas.tseries.frequencies import to_offset
offset = to_offset(offset)

start = to_datetime(start)
end = to_datetime(end)
Expand Down