Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF/DEPR: DatetimeIndex constructor #23675

Merged
merged 40 commits into from Dec 3, 2018
Merged
Show file tree
Hide file tree
Changes from 33 commits
Commits
Show all changes
40 commits
Select commit Hold shift + click to select a range
f8efbef
start untangling DatetimeIndex constructor; deprecate passing of time…
jbrockmendel Nov 13, 2018
9d20bc9
add GH references
jbrockmendel Nov 13, 2018
aef3f4c
Fix incorrect usage of DatetimeIndex
jbrockmendel Nov 13, 2018
66ae42b
dummy commit to force CI
jbrockmendel Nov 14, 2018
d0e8ee3
more explicit name, test with TimedeltaIndex
jbrockmendel Nov 14, 2018
e1f4e17
remove comment
jbrockmendel Nov 14, 2018
d18e0df
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
a4c8c77
make exception catching less specific
jbrockmendel Nov 14, 2018
5dc5980
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 14, 2018
7e5587e
Merge remote-tracking branch 'upstream/master' into jbrockmendel-pre-…
TomAugspurger Nov 14, 2018
e94e826
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
7464d15
checks in both to_datetime and DatetimeIndex.__new__
jbrockmendel Nov 15, 2018
80b5dbe
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 15, 2018
3c822f1
name and docstring
jbrockmendel Nov 15, 2018
ba7e5e8
isort and flake8 fixup
jbrockmendel Nov 15, 2018
3ba9da7
move freq check earlier
jbrockmendel Nov 15, 2018
49c11e1
Merge branch 'pre-fixes4' of https://github.com/jbrockmendel/pandas i…
jbrockmendel Nov 15, 2018
f1d3fd8
improve exc message
jbrockmendel Nov 16, 2018
d44055e
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 16, 2018
1471a2b
tests for to_datetime and PeriodDtype
jbrockmendel Nov 16, 2018
11b5f6c
use objects_to_datetime64ns in to_datetime
jbrockmendel Nov 16, 2018
9f56d23
isort fixup
jbrockmendel Nov 17, 2018
1c3a5aa
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 18, 2018
be4d472
requested edits, name changes
jbrockmendel Nov 18, 2018
145772d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
7c99105
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 19, 2018
6b60da2
comments, remove has_format
jbrockmendel Nov 19, 2018
a7038bb
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 20, 2018
14d923b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 22, 2018
c9dbf24
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 25, 2018
ce9914d
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 26, 2018
b3d5bb7
dummy commit to force CI
jbrockmendel Nov 26, 2018
09c88fc
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
0367d6f
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
7cc8577
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 27, 2018
b3a096b
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Nov 29, 2018
fd5af18
Flesh out comment
jbrockmendel Dec 2, 2018
2cdd215
comment
jbrockmendel Dec 3, 2018
782ca81
Merge branch 'master' of https://github.com/pandas-dev/pandas into pr…
jbrockmendel Dec 3, 2018
03d5b35
comment more
jbrockmendel Dec 3, 2018
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
80 changes: 77 additions & 3 deletions pandas/core/arrays/datetimes.py
Expand Up @@ -1458,15 +1458,18 @@ def maybe_convert_dtype(data, copy):
"""
Convert data based on dtype conventions, issuing deprecation warnings
or errors where appropriate.
Parameters

Parameters
----------
data : np.ndarray or pd.Index
copy : bool
Returns

Returns
-------
data : np.ndarray or pd.Index
copy : bool
Raises

Raises
------
TypeError : PeriodDType data is passed
"""
Expand Down Expand Up @@ -1499,6 +1502,77 @@ def maybe_convert_dtype(data, copy):
return data, copy


def objects_to_datetime64ns(data, dayfirst, yearfirst,
utc=False, errors="raise",
require_iso8601=False, allow_object=False):
"""
Convert data to array of timestamps.

Parameters
----------
data : np.ndarray[object]
dayfirst : bool
yearfirst : bool
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
utc : bool, default False
Whether to convert timezone-aware timestamps to UTC
errors : {'raise', 'ignore', 'coerce'}
allow_object : bool
Whether to return an object-dtype ndarray instead of raising if the
data contains more than one timezone.

Returns
-------
result : ndarray
np.int64 dtype if returned values represent UTC timestamps
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to verify: getting an you'll get an np.int64-dtype result if and only if inferred_tz is not None?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct

np.datetime64[ns] if returned values represent wall times
object if mixed timezones
inferred_tz : tzinfo or None

Raises
------
ValueError : if data cannot be converted to datetimes
"""
assert errors in ["raise", "ignore", "coerce"]

# if str-dtype, convert
data = np.array(data, copy=False, dtype=np.object_)

try:
result, tz_parsed = tslib.array_to_datetime(
data,
errors=errors,
utc=utc,
dayfirst=dayfirst,
yearfirst=yearfirst,
require_iso8601=require_iso8601
)
except ValueError as e:
try:
values, tz_parsed = conversion.datetime_to_datetime64(data)
# If tzaware, these values represent unix timestamps, so we
# return them as i8 to distinguish from wall times
return values.view('i8'), tz_parsed
except (ValueError, TypeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this addtiional level of try/except here? wouldn't this just raise anyhow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably to re-raise the original exception if the fallback fails. This is the existing behavior.

I think @mroeschke and I are vaguely planning to revisit this before long and combine datetime_to_datetime64 into array_to_datetime, fixing the many idiosyncracies of these calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i wasn't clear. I think you can simply remove the try/except and it will work the way it is now. (and same below).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the claim you're making is that it will raise under the same conditions, I agree. If the claim is that it will raise the same exception, I don't. i.e. if the conversion.datetime_to_datetime64 were not inside its own try/except, then we could end up getting a TypeError in cases where we currently get a ValueError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really? i don't think that is actually possible. the original exception is re-raised here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback did this resolve the miscommunication?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again if datetime_to_datetime64 raises
won’t it just bubble up which is the same thing as a reraise here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel is correct here.

side-note, we could clarify all this with python-3 style

except (ValueError, TypeError) as e2:
    raise e2 from e

or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change?

not resolved, but I see from @TomAugspurger which what i was getting at. i guess ok for now.

raise e

if tz_parsed is not None:
# We can take a shortcut since the datetime64 numpy array
# is in UTC
# Return i8 values to denote unix timestamps
return result.view('i8'), tz_parsed
elif is_datetime64_dtype(result):
# returning M8[ns] denotes wall-times; since tz is None
# the distinction is a thin one
return result, tz_parsed
elif is_object_dtype(result):
if allow_object:
# allowed by to_datetime, not by DatetimeIndex constructor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this difference from? why is this allowed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because to_datetime will return a Index of Timestamps in cases where there are mixed-tz or mixed tz-aware/naive entries. DatetimeIndex can't do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right and that is the correct behavior, actually DTI should just return a Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestatmps. These should always just return an Index of objects. No conversion / inference should happen at all.

return result, tz_parsed
raise TypeError(result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just remove the else and do the raise TypeError(result) at the end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its worth distinguishing between hittable TypeError and defensive TypeError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment more then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

else: # pragma: no cover
raise TypeError(result)


def _generate_regular_range(cls, start, end, periods, freq):
"""
Generate a range of dates with the spans between dates described by
Expand Down
24 changes: 16 additions & 8 deletions pandas/core/indexes/datetimes.py
Expand Up @@ -16,17 +16,17 @@

from pandas.core.dtypes.common import (
_INT64_DTYPE, _NS_DTYPE, ensure_int64, is_datetime64_dtype,
is_datetime64_ns_dtype, is_datetime64tz_dtype, is_datetimetz,
is_dtype_equal, is_float, is_integer, is_integer_dtype, is_list_like,
is_period_dtype, is_scalar, is_string_like, pandas_dtype)
is_datetime64_ns_dtype, is_datetime64tz_dtype, is_dtype_equal, is_float,
is_integer, is_list_like, is_object_dtype, is_period_dtype, is_scalar,
is_string_dtype, is_string_like, pandas_dtype)
import pandas.core.dtypes.concat as _concat
from pandas.core.dtypes.generic import ABCSeries
from pandas.core.dtypes.missing import isna

from pandas.core.arrays import datetimelike as dtl
from pandas.core.arrays.datetimes import (
DatetimeArrayMixin as DatetimeArray, _to_m8, maybe_convert_dtype,
maybe_infer_tz)
maybe_infer_tz, objects_to_datetime64ns)
from pandas.core.base import _shared_docs
import pandas.core.common as com
from pandas.core.indexes.base import Index, _index_shared_docs
Expand Down Expand Up @@ -268,10 +268,18 @@ def __new__(cls, data=None,
# By this point we are assured to have either a numpy array or Index
data, copy = maybe_convert_dtype(data, copy)

if not (is_datetime64_dtype(data) or is_datetimetz(data) or
is_integer_dtype(data) or lib.infer_dtype(data) == 'integer'):
data = tools.to_datetime(data, dayfirst=dayfirst,
yearfirst=yearfirst)
if is_object_dtype(data) or is_string_dtype(data):
# TODO: We do not have tests specific to string-dtypes,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could just write this as

try:
      data = datas.astype(np.int64)
except:
     ...

might be more clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure

# also complex or categorical or other extension
copy = False
if lib.infer_dtype(data) == 'integer':
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved
data = data.astype(np.int64)
else:
# data comes back here as either i8 to denote UTC timestamps
# or M8[ns] to denote wall times
data, inferred_tz = objects_to_datetime64ns(
data, dayfirst=dayfirst, yearfirst=yearfirst)
tz = maybe_infer_tz(tz, inferred_tz)

if is_datetime64tz_dtype(data):
tz = maybe_infer_tz(tz, data.tz)
Expand Down
82 changes: 41 additions & 41 deletions pandas/core/tools/datetimes.py
Expand Up @@ -171,7 +171,8 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None,
- ndarray of Timestamps if box=False
"""
from pandas import DatetimeIndex
from pandas.core.arrays.datetimes import maybe_convert_dtype
from pandas.core.arrays.datetimes import (
maybe_convert_dtype, objects_to_datetime64ns)

if isinstance(arg, (list, tuple)):
arg = np.array(arg, dtype='O')
Expand Down Expand Up @@ -231,10 +232,11 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None,
require_iso8601 = not infer_datetime_format
format = None

try:
result = None
result = None
tz_parsed = None

if format is not None:
if format is not None:
try:
# shortcut formatting here
if format == '%Y%m%d':
try:
Expand Down Expand Up @@ -266,45 +268,43 @@ def _convert_listlike_datetimes(arg, box, format, name=None, tz=None,
if errors == 'raise':
raise
result = arg

if result is None and (format is None or infer_datetime_format):
result, tz_parsed = tslib.array_to_datetime(
arg,
errors=errors,
utc=tz == 'utc',
dayfirst=dayfirst,
yearfirst=yearfirst,
require_iso8601=require_iso8601
)
if tz_parsed is not None:
if box:
# We can take a shortcut since the datetime64 numpy array
# is in UTC
return DatetimeIndex._simple_new(result, name=name,
tz=tz_parsed)
else:
# Convert the datetime64 numpy array to an numpy array
# of datetime objects
result = [Timestamp(ts, tz=tz_parsed).to_pydatetime()
for ts in result]
return np.array(result, dtype=object)

except ValueError as e:
# Fallback to try to convert datetime objects
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, why the extra try/except?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cam you be slightly more specific on 'fallback', e.g. when would this be triggered

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll flesh out the comment. The gist of it is if tz-aware datetime objects are passed and utc=True is not passed, then array_to_datetime will raise a ValueError and datetime_to_datetime64 will handle this case.

It's a mess and is a big reason why I've been making array_to_datetime PRs: #24006 is an attempt to fix it, but there are some design questions @mroeschke and I are still batting around.

values, tz = conversion.datetime_to_datetime64(arg)
return DatetimeIndex._simple_new(values, name=name, tz=tz)
except (ValueError, TypeError):
raise e
jbrockmendel marked this conversation as resolved.
Show resolved Hide resolved

if result is None:
assert format is None or infer_datetime_format
utc = tz == 'utc'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this lowercase?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it is implemented in master. We'll improve this in the upcoming passes specific to to_datetime. Changing this behavior here is out of scope.

result, tz_parsed = objects_to_datetime64ns(
arg, dayfirst=dayfirst, yearfirst=yearfirst,
utc=utc, errors=errors, require_iso8601=require_iso8601,
allow_object=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find this allow_object pretty obtuse

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alternative is to raise unconditionally in objects_to_datetime64ns, include the result in the exception args, then catch and extract it here. That was actually what I did in the first pass, but this is much more straightforward.


if tz_parsed is not None:
if box:
# Ensure we return an Index in all cases where box=True
if is_datetime64_dtype(result):
return DatetimeIndex(result, tz=tz, name=name)
elif is_object_dtype(result):
# e.g. an Index of datetime objects
from pandas import Index
return Index(result, name=name)
return result
# We can take a shortcut since the datetime64 numpy array
# is in UTC
return DatetimeIndex._simple_new(result, name=name,
tz=tz_parsed)
else:
# Convert the datetime64 numpy array to an numpy array
# of datetime objects
result = DatetimeIndex(result, tz=tz_parsed).to_pydatetime()
return result

except ValueError as e:
try:
values, tz = conversion.datetime_to_datetime64(arg)
return DatetimeIndex._simple_new(values, name=name, tz=tz)
except (ValueError, TypeError):
raise e
if box:
# Ensure we return an Index in all cases where box=True
if is_datetime64_dtype(result):
return DatetimeIndex(result, tz=tz, name=name)
elif is_object_dtype(result):
# e.g. an Index of datetime objects
from pandas import Index
return Index(result, name=name)
jreback marked this conversation as resolved.
Show resolved Hide resolved
return result


def _adjust_to_origin(arg, origin, unit):
Expand Down