REF/DEPR: DatetimeIndex constructor #23675

Merged: 40 commits, merged Dec 3, 2018
Changes from 10 commits

Commits (40)
f8efbef
start untangling DatetimeIndex constructor; deprecate passing of time…
jbrockmendel Nov 13, 2018
9d20bc9
add GH references
jbrockmendel Nov 13, 2018
aef3f4c
Fix incorrect usage of DatetimeIndex
jbrockmendel Nov 13, 2018
66ae42b
dummy commit to force CI
jbrockmendel Nov 14, 2018
d0e8ee3
more explicit name, test with TimedeltaIndex
jbrockmendel Nov 14, 2018
e1f4e17
remove comment
jbrockmendel Nov 14, 2018
d18e0df
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
a4c8c77
make exception catching less specific
jbrockmendel Nov 14, 2018
5dc5980
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
7e5587e
Merge remote-tracking branch 'upstream/master' into jbrockmendel-pre-…
TomAugspurger Nov 14, 2018
e94e826
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
7464d15
checks in both to_datetime and DatetimeIndex.__new__
jbrockmendel Nov 15, 2018
80b5dbe
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
3c822f1
name and docstring
jbrockmendel Nov 15, 2018
ba7e5e8
isort and flake8 fixup
jbrockmendel Nov 15, 2018
3ba9da7
move freq check earlier
jbrockmendel Nov 15, 2018
49c11e1
Merge branch 'pre-fixes4' of https://github.com/jbrockmendel/pandas i…
jbrockmendel Nov 15, 2018
f1d3fd8
improve exc message
jbrockmendel Nov 16, 2018
d44055e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 16, 2018
1471a2b
tests for to_datetime and PeriodDtype
jbrockmendel Nov 16, 2018
11b5f6c
use objects_to_datetime64ns in to_datetime
jbrockmendel Nov 16, 2018
9f56d23
isort fixup
jbrockmendel Nov 17, 2018
1c3a5aa
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 18, 2018
be4d472
requested edits, name changes
jbrockmendel Nov 18, 2018
145772d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
7c99105
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
6b60da2
comments, remove has_format
jbrockmendel Nov 19, 2018
a7038bb
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 20, 2018
14d923b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 22, 2018
c9dbf24
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 25, 2018
ce9914d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 26, 2018
b3d5bb7
dummy commit to force CI
jbrockmendel Nov 26, 2018
09c88fc
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
0367d6f
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
7cc8577
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
b3a096b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 29, 2018
fd5af18
Flesh out comment
jbrockmendel Dec 2, 2018
2cdd215
comment
jbrockmendel Dec 3, 2018
782ca81
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Dec 3, 2018
03d5b35
comment more
jbrockmendel Dec 3, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -985,6 +985,7 @@ Deprecations
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
- Constructing a :class:`TimedeltaIndex` from ``datetime64``-dtyped data is deprecated and will raise a ``TypeError`` in a future version (:issue:`23539`)
- Constructing a :class:`DatetimeIndex` from ``timedelta64``-dtyped data is deprecated and will raise a ``TypeError`` in a future version (:issue:`23675`)

.. _whatsnew_0240.deprecations.datetimelike_int_ops:

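For reference, a minimal sketch of the pattern deprecated by the new whatsnew entry, plus a warning-free equivalent (assuming pandas 0.24.0 with this change applied; the array contents are illustrative):

import numpy as np
import pandas as pd

tdarr = np.array([0, 86400 * 10**9], dtype='timedelta64[ns]')

# Deprecated: passing timedelta64-dtype data to DatetimeIndex emits a
# FutureWarning; for now the values are still viewed as datetime64[ns].
dti = pd.DatetimeIndex(tdarr)

# Warning-free equivalent: view the data as datetime64[ns] explicitly.
dti = pd.DatetimeIndex(tdarr.view('datetime64[ns]'))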
4 changes: 2 additions & 2 deletions pandas/_libs/tslibs/conversion.pyx
@@ -858,8 +858,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
- bool if True, treat all vals as DST. If False, treat them as non-DST
- 'NaT' will return NaT where there are ambiguous times

nonexistent : str
If arraylike, must have the same length as vals
nonexistent : {None, "NaT", "shift", "raise"}
How to handle non-existent times when converting wall times to UTC

.. versionadded:: 0.24.0

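As a rough illustration of the nonexistent options documented in the docstring above, a sketch using the public tz_localize API (assuming pandas 0.24.0, where the argument was added; 2018-03-11 02:30 is a nonexistent wall time in US/Eastern because of the DST spring-forward):

import pandas as pd

ts = pd.Timestamp('2018-03-11 02:30:00')

ts.tz_localize('US/Eastern', nonexistent='NaT')    # returns NaT
ts.tz_localize('US/Eastern', nonexistent='shift')  # shifts forward to 03:00
ts.tz_localize('US/Eastern', nonexistent='raise')  # raises pytz NonExistentTimeError (the default)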
2 changes: 1 addition & 1 deletion pandas/core/arrays/datetimes.py
@@ -186,7 +186,7 @@ class DatetimeArrayMixin(dtl.DatetimeLikeArrayMixin):
_freq = None

@classmethod
def _simple_new(cls, values, freq=None, tz=None, **kwargs):
def _simple_new(cls, values, freq=None, tz=None):
"""
we require that we have a dtype compat for the values
if we are passed a non-dtype compat, then coerce using the constructor
69 changes: 52 additions & 17 deletions pandas/core/indexes/datetimes.py
@@ -16,10 +16,11 @@
from pandas.util._decorators import Appender, Substitution, cache_readonly

from pandas.core.dtypes.common import (
_INT64_DTYPE, _NS_DTYPE, ensure_int64, is_datetime64_dtype,
is_datetime64_ns_dtype, is_datetimetz, is_dtype_equal, is_float,
is_integer, is_integer_dtype, is_list_like, is_period_dtype, is_scalar,
is_string_like, pandas_dtype)
_INT64_DTYPE, _NS_DTYPE, ensure_int64, is_datetime64_ns_dtype,
is_datetime64tz_dtype, is_dtype_equal, is_extension_type, is_float,
is_float_dtype, is_integer, is_list_like, is_object_dtype, is_period_dtype,
is_scalar, is_string_dtype, is_string_like, is_timedelta64_dtype,
pandas_dtype)
import pandas.core.dtypes.concat as _concat
from pandas.core.dtypes.generic import ABCSeries
from pandas.core.dtypes.missing import isna
@@ -252,19 +253,54 @@ def __new__(cls, data=None,
# if dtype has an embedded tz, capture it
tz = dtl.validate_tz_from_dtype(dtype, tz)

if not isinstance(data, (np.ndarray, Index, ABCSeries, DatetimeArray)):
# other iterable of some kind
if not isinstance(data, (list, tuple)):
if not hasattr(data, "dtype"):
# e.g. list, tuple
if np.ndim(data) == 0:
# i.e. generator
data = list(data)
data = np.asarray(data, dtype='O')
data = np.asarray(data)
copy = False
elif isinstance(data, ABCSeries):
data = data._values

# data must be Index or np.ndarray here
if not (is_datetime64_dtype(data) or is_datetimetz(data) or
is_integer_dtype(data) or lib.infer_dtype(data) == 'integer'):
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)
# By this point we are assured to have either a numpy array or Index

if is_float_dtype(data):
# Note: we must cast to datetime64[ns] here in order to treat these
# as wall-times instead of UTC timestamps.
data = data.astype(_NS_DTYPE)
copy = False
# TODO: Why do we treat this differently from integer dtypes?
jbrockmendel (Member Author):

@jreback any idea why we treat floats differently from ints here?

Contributor:

no idea, i would try to remove this special handling. only thing i can think of is maybe this could have some odd rounding in the astype if it's out of range of an int64.

jbrockmendel (Member Author):

> i would try to remove this special handling

Yah, the immediate goal is to pass fewer cases to to_datetime since it is painfully circular and hides weird behavior like this float-dtype behavior.

If we just cast floats to int64 (and mask with iNaT), then exactly one test fails, in tests.dtypes.test_missing. The behavior (in master) that differs between float and int is that floats are treated as wall-times instead of UTC times, e.g.:

iarr = np.array([0], dtype='i8')
farr = np.array([0], dtype='f8')

>>> pd.DatetimeIndex(iarr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(iarr, tz='US/Eastern')._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

>>> pd.DatetimeIndex(farr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr, tz='US/Eastern')._data
array(['1970-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

I don't see any especially good reason why it should work this way, save for keeping this test working and not introducing breaking changes.


elif is_timedelta64_dtype(data):
warnings.warn("Passing timedelta64-dtype data to {cls} is "
"deprecated, will raise a TypeError in a future "
"version".format(cls=cls.__name__),
FutureWarning, stacklevel=2)
data = data.view(_NS_DTYPE)

elif is_period_dtype(data):
# Note: without explicitly raising here, PeriodIndex
# test_setops.test_join_does_not_recur fails
raise TypeError("Passing PeriodDtype data to {cls} is invalid. "
"Use `data.to_timestamp()` instead"
.format(cls=cls.__name__))
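For illustration, a minimal sketch of the new error path and the workaround suggested in the message (assuming pandas with this PR applied; the period range is illustrative):

import pandas as pd

pi = pd.period_range('2018-01', periods=3, freq='M')

# pd.DatetimeIndex(pi) now raises TypeError with the message above.

# Suggested workaround: convert explicitly first.
dti = pd.DatetimeIndex(pi.to_timestamp())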

elif is_extension_type(data) and not is_datetime64tz_dtype(data):
# Includes categorical
# TODO: We have no tests for these
data = np.array(data, dtype=np.object_)
copy = False
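A hedged sketch of what this branch covers; since the TODO above notes there are no tests for it, the result shown is assumed rather than verified:

import pandas as pd

cat = pd.Categorical(['2018-01-01', '2018-01-02'])

# Extension types other than datetime64tz (e.g. Categorical) are first
# unboxed to an object ndarray, then parsed by the object/string path below.
dti = pd.DatetimeIndex(cat)  # expected: DatetimeIndex(['2018-01-01', '2018-01-02'])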

if is_object_dtype(data) or is_string_dtype(data):
# TODO: We do not have tests specific to string-dtypes,
Contributor:

you could just write this as

try:
    data = data.astype(np.int64)
except:
    ...

might be more clear

jbrockmendel (Member Author):

The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.

Contributor:

ok sure

# also complex or categorical or other extension
copy = False
if lib.infer_dtype(data) == 'integer':
data = data.astype(np.int64)
else:
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)
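To make the distinction from the review thread above concrete, a small sketch of the two paths (behavior inferred from the branch logic, not asserted by this PR's tests):

import numpy as np
import pandas as pd

obj_strs = np.array(['20160405'], dtype=object)
obj_ints = np.array([20160405], dtype=object)

# Inferred as strings: routed through to_datetime and parsed as dates.
pd.DatetimeIndex(obj_strs)   # DatetimeIndex(['2016-04-05'], dtype='datetime64[ns]')

# Inferred as integers: cast to int64 and treated as epoch nanoseconds.
pd.DatetimeIndex(obj_ints)   # DatetimeIndex(['1970-01-01 00:00:00.020160405'], ...)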

if isinstance(data, DatetimeArray):
if tz is None:
Expand All @@ -281,6 +317,7 @@ def __new__(cls, data=None,
subarr = data._data

if freq is None:
# TODO: Should this be the stronger condition of `freq_infer`?
freq = data.freq
verify_integrity = False
elif issubclass(data.dtype.type, np.datetime64):
@@ -319,17 +356,15 @@
return subarr._deepcopy_if_needed(ref_to_data, copy)

@classmethod
def _simple_new(cls, values, name=None, freq=None, tz=None,
dtype=None, **kwargs):
def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):
"""
we require that we have a dtype compat for the values
if we are passed a non-dtype compat, then coerce using the constructor
"""
# DatetimeArray._simple_new will accept either i8 or M8[ns] dtypes
assert isinstance(values, np.ndarray), type(values)

result = super(DatetimeIndex, cls)._simple_new(values, freq, tz,
**kwargs)
result = super(DatetimeIndex, cls)._simple_new(values, freq, tz)
result.name = name
result._reset_identity()
return result
13 changes: 13 additions & 0 deletions pandas/tests/indexes/datetimes/test_construction.py
@@ -18,6 +18,19 @@

class TestDatetimeIndex(object):

def test_dti_with_timedelta64_data_deprecation(self):
# GH#23675
data = np.array([0], dtype='m8[ns]')
with tm.assert_produces_warning(FutureWarning):
result = DatetimeIndex(data)

assert result[0] == Timestamp('1970-01-01')

with tm.assert_produces_warning(FutureWarning):
result = DatetimeIndex(pd.TimedeltaIndex(data))

assert result[0] == Timestamp('1970-01-01')

def test_construction_caching(self):

df = pd.DataFrame({'dt': pd.date_range('20130101', periods=3),
2 changes: 1 addition & 1 deletion pandas/tests/scalar/timedelta/test_timedelta.py
@@ -548,7 +548,7 @@ def test_overflow(self):

# mean
result = (s - s.min()).mean()
expected = pd.Timedelta((pd.DatetimeIndex((s - s.min())).asi8 / len(s)
expected = pd.Timedelta((pd.TimedeltaIndex((s - s.min())).asi8 / len(s)
).sum())

# the computation is converted to float so
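This last hunk corresponds to the "Fix incorrect usage of DatetimeIndex" commit: s - s.min() produces timedelta64 data, so wrapping it in DatetimeIndex just to reach .asi8 is exactly the pattern this PR deprecates; TimedeltaIndex is the appropriate container and yields the same underlying integer nanoseconds.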