API: DatetimeIndex creation with mixed tz timestamps #11488

Closed
sinhrks opened this Issue Oct 31, 2015 · 6 comments

Member

sinhrks commented Oct 31, 2015

Related to #11456. Currently, DatetimeIndex handles mixed tz values as shown below. This behavior sometimes triggers implicit coercion between tz-aware and tz-naive values.

pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
# DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
# -> should be normal Index with object dtype?

pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
# DatetimeIndex(['2010-12-31 15:00:00', '2011-01-02 05:00:00'], dtype='datetime64[ns]', freq=None)
# -> should be normal Index with object dtype?

pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
# DatetimeIndex(['2011-01-01 09:00:00+09:00', '2011-01-02 14:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
# -> OK, localized to explicitly passed tz ('Asia/Tokyo')
Contributor

jreback commented Oct 31, 2015

Yeah, I think the first 2 should be Index.

sinhrks added this to the 0.18.0 milestone Oct 31, 2015

Member

sinhrks commented Oct 31, 2015

OK, I set the milestone to 0.18 since it causes breaking changes.

How about the following rules? The changes are marked as CHANGED.

Creation with Index

The user wants an Index but doesn't specify its type, so the result may not be a DatetimeIndex.

  • When all inputs have the same timezone or no timezone, the result will be DatetimeIndex (no change)

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')])
    # DatetimeIndex(['2011-01-01', '2011-01-02'], dtype='datetime64[ns]', freq=None)
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='Asia/Tokyo')])
    # DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
    
  • When inputs have different timezones (so they cannot be represented by DatetimeIndex without tz conversion), the result will be Index (dtype=object) (CHANGED)

    Before:

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 15:00:00', '2011-01-02 05:00:00'], dtype='datetime64[ns]', freq=None)
    

    After:

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # Index([2011-01-01 00:00:00, 2011-01-02 00:00:00-05:00], dtype='object')
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # Index([2011-01-01 00:00:00+09:00, 2011-01-02 00:00:00-05:00], dtype='object')
    
  • When the user passes the tz kw to Index, the user wants to use that timezone. Convert/localize to the passed tz, and the result will be DatetimeIndex (because it can be represented by dti) (no change)

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # DatetimeIndex(['2011-01-01 09:00:00+09:00', '2011-01-02 14:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
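
The three rules above can be summarized as a small decision function. This is only a sketch of the proposed logic, not pandas code: `infer_index_kind`, `TOKYO`, and `EASTERN` are hypothetical names, and the zones are modeled as fixed offsets from the standard library, which is a simplification.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones used in the examples (an assumption:
# real zones have DST rules, and two zones sharing an offset compare equal here).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
EASTERN = timezone(timedelta(hours=-5), "US/Eastern")


def infer_index_kind(values, tz=None):
    """Hypothetical helper: name the index type the proposed rules would produce."""
    if tz is not None:
        # An explicit tz means every value is converted/localized to it.
        return "DatetimeIndex"
    # tzinfo is None for tz-naive values, so naive + aware counts as a mismatch.
    distinct = {v.tzinfo for v in values}
    return "DatetimeIndex" if len(distinct) <= 1 else "Index(dtype=object)"


naive = datetime(2011, 1, 1)
tokyo = datetime(2011, 1, 1, tzinfo=TOKYO)
eastern = datetime(2011, 1, 2, tzinfo=EASTERN)

print(infer_index_kind([naive, datetime(2011, 1, 2)]))  # all naive
print(infer_index_kind([naive, eastern]))               # naive + aware
print(infer_index_kind([tokyo, eastern]))               # two different zones
print(infer_index_kind([naive, eastern], tz=TOKYO))     # explicit tz
```

Only the second and third calls fall under the CHANGED rule.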
    

Creation with DatetimeIndex

The user wants a DatetimeIndex. Raise if there is a timezone mismatch.

  • When all inputs have the same timezone or no timezone, the result will be DatetimeIndex (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')])
    # DatetimeIndex(['2011-01-01', '2011-01-02'], dtype='datetime64[ns]', freq=None)
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='Asia/Tokyo')])
    # DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
    
  • When inputs have different timezones, localize tz-naive values but do not convert tz-aware ones. (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
    
  • When the user passes the tz kw to DatetimeIndex, the user wants to use that timezone. Do not convert tz-aware values implicitly (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # TypeError: Already tz-aware, use tz_convert to convert.
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
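
For comparison, the stricter DatetimeIndex rules can be sketched the same way. Again this is not pandas code: `resolve_datetimeindex_tz` is a hypothetical helper that only reports which timezone the result would carry or raises, with the zones modeled as fixed offsets. Inputs that are already aware in the same zone as the passed tz are not covered by the list above; the sketch treats them like any other aware input, which is a guess.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones used in the examples (an assumption).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
EASTERN = timezone(timedelta(hours=-5), "US/Eastern")


def resolve_datetimeindex_tz(values, tz=None):
    """Hypothetical helper: return the tz the result would carry, or raise."""
    # Collect the distinct timezones among the tz-aware inputs only.
    aware = {v.tzinfo for v in values if v.tzinfo is not None}
    if len(aware) > 1:
        # Two different aware zones are never converted implicitly.
        raise ValueError("inputs carry more than one timezone")
    if tz is not None and aware:
        # An explicit tz does not convert values that are already aware.
        raise TypeError("Already tz-aware, use tz_convert to convert.")
    if tz is not None:
        return tz
    # Either the single aware zone (naive values get localized) or None.
    return next(iter(aware), None)


naive = datetime(2011, 1, 1)
tokyo = datetime(2011, 1, 1, tzinfo=TOKYO)
eastern = datetime(2011, 1, 2, tzinfo=EASTERN)

print(resolve_datetimeindex_tz([naive, eastern]))  # the single aware zone wins
for args, kwargs in [([tokyo, eastern], {}), ([naive, eastern], {"tz": TOKYO})]:
    try:
        resolve_datetimeindex_tz(args, **kwargs)
    except (ValueError, TypeError) as exc:
        print(type(exc).__name__, exc)
```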
    
Contributor

jreback commented Oct 31, 2015

0.18.0 is fine, though this is much more of a bug-fix than an API change. I think it is simply wrong that we are forcing conversions now. (in your changed section).

Member

sinhrks commented Oct 31, 2015

Yes, but I assume quite a few methods depend on the CHANGED logic to output a DTI.

I'm not sure yet how many ops rely on it... Let me try implementing it first, and reconsider the milestone if it only affects a narrow range.

Contributor

jreback commented Oct 31, 2015

sounds good!

Contributor

jreback commented Dec 10, 2015

closed by #11696

jreback closed this Dec 10, 2015
