Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF/DEPR: DatetimeIndex constructor #23675

Merged
merged 40 commits into from Dec 3, 2018

Conversation

Projects
None yet
7 participants
@jbrockmendel
Copy link
Member

commented Nov 13, 2018

This is preliminary to implementing DatetimeArray._from_sequence. The attempt to implement that directly (i.e. without existing circularity) failed, so trying it as a 2-parter.

Similar to #23539, this deprecates the option of passing timedelta64 data to DatetimeIndex.

There are a few places where I've made TODO notes of the "is this really intentional?" form; I'll point this out in inline comments.

@mroeschke I edited a couple of docstrings for the recently-implemented "nonexistent" kwarg, can you confirm that I didn't mess them up?

@pep8speaks

This comment has been minimized.

Copy link

commented Nov 13, 2018

Hello @jbrockmendel! Thanks for submitting the PR.

# as wall-times instead of UTC timestamps.
data = data.astype(_NS_DTYPE)
copy = False
# TODO: Why do we treat this differently from integer dtypes?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 13, 2018

Author Member

@jreback any idea why we treat floats differently from ints here?

This comment has been minimized.

Copy link
@jreback

jreback Nov 14, 2018

Contributor

no idea, i would try to remove this special handling. only thing i can think of maybe this could have some odd rounding in the astype if its out of range of an int64 .

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Nov 14, 2018

Author Member

i would try to remove this special handling

Yah, the immediate goal is to pass fewer cases to to_datetime since it is painfully circular and hides weird behavior like this float-dtype behavior.

If If we just cast floats to int64 (and mask with iNaT) then exactly one test fails in tests.dtypes.test_missing. The behavior (in master) that is different between float vs int is that floats are treated as wall-times instead of UTC times. eg:

iarr = np.array([0], dtype='i8')
farr = np.array([0], dtype='f8')

>>> pd.DatetimeIndex(iarr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(iarr, tz='US/Eastern')._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

>>> pd.DatetimeIndex(farr)._data
array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
>>> pd.DatetimeIndex(farr, tz='US/Eastern')._data
array(['1970-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

I don't see any especially good reason why it should work this way, save for keeping this test working and not introducing breaking changes.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 13, 2018

azure failure is a Hypothesis failure

@codecov

This comment has been minimized.

Copy link

commented Nov 13, 2018

Codecov Report

Merging #23675 into master will increase coverage by <.01%.
The diff coverage is 94.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23675      +/-   ##
==========================================
+ Coverage   92.31%   92.31%   +<.01%     
==========================================
  Files         161      161              
  Lines       51562    51586      +24     
==========================================
+ Hits        47599    47622      +23     
- Misses       3963     3964       +1
Flag Coverage Δ
#multiple 90.71% <94.73%> (ø) ⬆️
#single 42.44% <63.15%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.55% <100%> (+0.02%) ⬆️
pandas/core/tools/datetimes.py 85.27% <92.3%> (-0.27%) ⬇️
pandas/core/arrays/datetimes.py 98.43% <94.73%> (+0.06%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 022f458...03d5b35. Read the comment docs.

# as wall-times instead of UTC timestamps.
data = data.astype(_NS_DTYPE)
copy = False
# TODO: Why do we treat this differently from integer dtypes?

This comment has been minimized.

Copy link
@jreback

jreback Nov 14, 2018

Contributor

no idea, i would try to remove this special handling. only thing i can think of maybe this could have some odd rounding in the astype if its out of range of an int64 .

Show resolved Hide resolved pandas/core/indexes/datetimes.py
Show resolved Hide resolved pandas/core/indexes/datetimes.py Outdated
Show resolved Hide resolved pandas/core/tools/datetimes.py Outdated
Show resolved Hide resolved pandas/tests/indexes/datetimes/test_construction.py Outdated
Show resolved Hide resolved pandas/tests/indexes/datetimes/test_construction.py Outdated
@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Nov 14, 2018

Merged master to fix the merge conflict.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 14, 2018

Travis error looks unrelated

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Nov 14, 2018

@jbrockmendel Can you give a bit more context here? Eg, why are you adding so much new code (while there are actually no user facing changes, I suppose, except from the deprecation) ?
What is the rationale of not using to_datetime (if we refactor things, I would rather think it is to share more implemenation) ?

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 15, 2018

Can you give a bit more context here? Eg, why are you adding so much new code (while there are actually no user facing changes, I suppose, except from the deprecation) ?
What is the rationale of not using to_datetime (if we refactor things, I would rather think it is to share more implemenation) ?

Sharing implementation is fine, circularity is not. The fact that to_datetime calls the DatetimeIndex constructor and vice-versa makes it very difficult to sort out (in fact I spent a big chunk of Monday trying to untangle it in one pass and couldn't) what happens in which case. The addition of new code is to make these behaviors more clear/explicit.

Moreover I am not convinced that the float_dtype behavior is intentional.

There is an issue (need to dig it up) reporting a bug in passing a Categorical to DatetimeIndex. Point being: there is value in being more explicit about what we are doing and why.

@jorisvandenbossche

This comment has been minimized.

Copy link
Member

commented Nov 15, 2018

Sharing implementation is fine, circularity is not.

Yes, but that circularity already existed. What changes did you try to make that it became more problematic?

The addition of new code is to make these behaviors more clear/explicit.

But it also means duplicating in DatetimeIndex things that are already handled in to_datetime. Not sharing it will much more easily lead to a future change leading to inconsistencies between both?

For the TimedeltaIndex PR, you made some common functions that were used by both. Is a similar pattern not possible here?

@jorisvandenbossche
Copy link
Member

left a comment

To be clear, I find the logic of what to do by each dtype (as you are introducing here) nice and clear to follow, but mainly wondering why not doing that in to_datetime to avoid divergence between both

Show resolved Hide resolved pandas/core/indexes/datetimes.py Outdated
Show resolved Hide resolved pandas/core/indexes/datetimes.py Outdated
@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 15, 2018

For the TimedeltaIndex PR, you made some common functions that were used by both. Is a similar pattern not possible here?

That's the goal behind #23702.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2018

yeah i agree this [treating floats as wall-times, ints as UTC] seems a little odd, these should just be cast to integer i think.

Change behavior here, deprecation cycle, or follow-up?

@jbrockmendel jbrockmendel referenced this pull request Nov 30, 2018

Merged

REF: DatetimeLikeArray #24024

7 of 12 tasks complete
@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Nov 30, 2018

gentle ping. Getting the constructors done will noticeably tighten the scope of #24024.

# If tzaware, these values represent unix timestamps, so we
# return them as i8 to distinguish from wall times
return values.view('i8'), tz_parsed
except (ValueError, TypeError):

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 1, 2018

Contributor

@jbrockmendel is correct here.

side-note, we could clarify all this with python-3 style

except (ValueError, TypeError) as e2:
    raise e2 from e

or six's https://pythonhosted.org/six/#six.raise_from, but it's probably just easier to wait until next month.

Returns
-------
result : ndarray
np.int64 dtype if returned values represent UTC timestamps

This comment has been minimized.

Copy link
@TomAugspurger

TomAugspurger Dec 1, 2018

Contributor

Just to verify: getting an you'll get an np.int64-dtype result if and only if inferred_tz is not None?

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 1, 2018

Author Member

correct

# If tzaware, these values represent unix timestamps, so we
# return them as i8 to distinguish from wall times
return values.view('i8'), tz_parsed
except (ValueError, TypeError):

This comment has been minimized.

Copy link
@jreback

jreback Dec 2, 2018

Contributor

That sounds like a "no" on the resolving miscommunication. Did I at least accurately summarize your suggested change?

not resolved, but I see from @TomAugspurger which what i was getting at. i guess ok for now.

return result, tz_parsed
elif is_object_dtype(result):
if allow_object:
# allowed by to_datetime, not by DatetimeIndex constructor

This comment has been minimized.

Copy link
@jreback

jreback Dec 2, 2018

Contributor

right and that is the correct behavior, actually DTI should just return a Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestatmps. These should always just return an Index of objects. No conversion / inference should happen at all.

if allow_object:
# allowed by to_datetime, not by DatetimeIndex constructor
return result, tz_parsed
raise TypeError(result)

This comment has been minimized.

Copy link
@jreback

jreback Dec 2, 2018

Contributor

maybe just remove the else and do the raise TypeError(result) at the end

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 2, 2018

Author Member

I think its worth distinguishing between hittable TypeError and defensive TypeError

This comment has been minimized.

Copy link
@jreback

jreback Dec 3, 2018

Contributor

can you comment more then.

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 3, 2018

Author Member

sure

yearfirst=yearfirst)

if is_object_dtype(data) or is_string_dtype(data):
# TODO: We do not have tests specific to string-dtypes,

This comment has been minimized.

Copy link
@jreback

jreback Dec 2, 2018

Contributor

you could just write this as

try:
      data = datas.astype(np.int64)
except:
     ...

might be more clear

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 2, 2018

Author Member

The issue with that is np.array(['20160405']) becomes np.array([20160405]) instead of 2016-04-05.

This comment has been minimized.

Copy link
@jreback

jreback Dec 3, 2018

Contributor

ok sure

raise e
except ValueError as e:
# Fallback to try to convert datetime objects
try:

This comment has been minimized.

Copy link
@jreback

jreback Dec 2, 2018

Contributor

cam you be slightly more specific on 'fallback', e.g. when would this be triggered

This comment has been minimized.

Copy link
@jbrockmendel

jbrockmendel Dec 2, 2018

Author Member

I'll flesh out the comment. The gist of it is if tz-aware datetime objects are passed and utc=True is not passed, then array_to_datetime will raise a ValueError and datetime_to_datetime64 will handle this case.

It's a mess and is a big reason why I've been making array_to_datetime PRs: #24006 is an attempt to fix it, but there are some design questions @mroeschke and I are still batting around.

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 2, 2018

actually DTI should just return a Index in that case, otherwise this is very confusing. I believe you have an issue talking about mixed-timestatmps. These should always just return an Index of objects. No conversion / inference should happen at all.

I'm -0.5 on having DatetimeIndex.new return an object Index, would rather have it raise. Either way, that is a design change that should be orthogonal to this PR.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 3, 2018

I'm -0.5 on having DatetimeIndex.new return an object Index, would rather have it raise. Either way, that is a design change that should be orthogonal to this PR.

sure (and raising is prob correct)

return result, tz_parsed
raise TypeError(result)
else: # pragma: no cover
# GH#23675 this TypeError should never be hit, whereas the TypeError

This comment has been minimized.

Copy link
@jreback

jreback Dec 3, 2018

Contributor

this is not more clear.

Can you put the cases where each of these would be hit

@jbrockmendel

This comment has been minimized.

Copy link
Member Author

commented Dec 3, 2018

This is a blocker for DatetimeArray._from_sequence, so probably merits a 0.24.0 milestone tag

@jreback jreback added this to the 0.24.0 milestone Dec 3, 2018

@jreback

jreback approved these changes Dec 3, 2018

@jreback jreback merged commit e7991b3 into pandas-dev:master Dec 3, 2018

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20181203.6 succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 3, 2018

@jbrockmendel ok thanks. I realy really want to limit scope to things that we absolutely need to pushout the Datetime and associated EA's. sure there are other things to do, but theses need much more consideration that we have time for. (basically I am saying, this code IMHO is not strictly necessary for this refactor, though its a nice to do).

@jbrockmendel jbrockmendel deleted the jbrockmendel:pre-fixes4 branch Dec 3, 2018

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Pingviinituutti added a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.