ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

mroeschke · 2018-09-01T02:05:04Z

closes AmbiguousTimeError merging two timezone-aware DataFrames with DST change #18885
closes AmbiguousTimeError in floor() operation #18946
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

The issues above required a new method to keep track if each timestamp in the DatetimeIndex was previously in daylight savings time in order to pass into tz_localize(..., ambiguous=...)

Therefore, added a is_dst method to DatetimeIndex and Timestamp.

jbrockmendel · 2018-09-01T02:17:09Z

pandas/_libs/tslibs/nattype.pyx

@@ -353,7 +353,6 @@ class NaTType(_NaT):
    strptime = _make_error_func('strptime', datetime)
    strftime = _make_error_func('strftime', datetime)
    isocalendar = _make_error_func('isocalendar', datetime)
-    dst = _make_error_func('dst', datetime)


Why removing?

jbrockmendel · 2018-09-01T02:18:56Z

pandas/_libs/tslibs/timezones.pyx

+        Py_ssize_t n = len(values)
+        # Cython boolean memoryviews are not supported yet
+        # https://github.com/cython/cython/issues/2204
+        # bint[:] result


I think the workaround is to use a uint8

jbrockmendel · 2018-09-01T02:19:38Z

pandas/core/indexes/datetimes.py

+        >>> dti.is_dst()
+        Index([True, True, False, False], dtype='object')
+        """
+        return Index(timezones._is_dst(self.asi8, self.tz))


Make is_dst public?

jbrockmendel · 2018-09-01T02:20:49Z

Is this related to the “fold” arg added in py3?

mroeschke · 2018-09-04T00:28:06Z

Is this related to the “fold” arg added in py3?

Yup fairly similar. The ambiguous=True/False argument is essentially similar to the fold=0/1 argument in py3.

These AmbiguousTimeErrors stem from internal routines that transform a timestamp to an ambiguous time. I think it's best we try to return a sensible time based on the original timeseries instead of leaving the user with an error they cannot control.

codecov · 2018-09-04T02:23:21Z

Codecov Report

❗ No coverage uploaded for pull request base (master@a5fe9cf). Click here to learn what that means.
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #22560   +/-   ##
=========================================
  Coverage          ?   92.04%           
=========================================
  Files             ?      169           
  Lines             ?    50789           
  Branches          ?        0           
=========================================
  Hits              ?    46747           
  Misses            ?     4042           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.45% <100%> (?)`
#single	`42.29% <33.33%> (?)`

Impacted Files	Coverage Δ
pandas/core/indexes/datetimelike.py	`97.77% <100%> (ø)`
pandas/core/indexes/datetimes.py	`95.56% <100%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5fe9cf...c03792e. Read the comment docs.

mroeschke · 2018-09-04T06:14:54Z

Ready for another look @jbrockmendel when you get the chance.

jreback · 2018-09-04T11:27:14Z

pandas/_libs/tslibs/timezones.pyx

@@ -186,16 +187,28 @@ cdef object get_utc_trans_times_from_dateutil_tz(object tz):
    return new_trans


-cdef int64_t[:] unbox_utcoffsets(object transinfo):
+cdef int64_t[:] unbox_utcoffsets(object transinfo, dst):


can you add bint to type dst

jreback · 2018-09-04T11:27:27Z

pandas/_libs/tslibs/timezones.pyx


    sz = len(transinfo)
    arr = np.empty(sz, dtype='i8')
-
+    key = int(dst)


you may not need this if typing bint

jreback · 2018-09-04T11:27:38Z

pandas/_libs/tslibs/timezones.pyx


    sz = len(transinfo)
    arr = np.empty(sz, dtype='i8')
-
+    key = int(dst)
    for i in range(sz):


can you add a comment about what you are extracting

jreback · 2018-09-04T11:27:48Z

pandas/_libs/tslibs/timezones.pyx

+    ----------
+    tz : object
+        timezone
+    dst : bool


type with bitn

jreback · 2018-09-04T11:28:18Z

pandas/_libs/tslibs/timezones.pyx

+    Parameters
+    ----------
+    tz : object
+        timezone


I think we accept only timezone objects and not strings here?

jreback · 2018-09-04T11:28:46Z

pandas/_libs/tslibs/timezones.pyx

+        return result
+    transitions, offsets, typ = get_dst_info(tz, True)
+    offsets = np.array(offsets)
+    # Fixed timezone offsets do not have DST transitions


new line here

jreback · 2018-09-04T11:28:58Z

pandas/_libs/tslibs/timezones.pyx

+    if typ not in {'pytz', 'dateutil'}:
+        return result
+    positions = transitions.searchsorted(values, side='right') - 1
+    # DST has nonzero offset


jreback · 2018-09-04T11:29:38Z

pandas/tests/indexes/datetimes/test_timezones.py

@@ -1012,6 +1012,22 @@ def test_iteration_preserves_nanoseconds(self, tz):
        for i, ts in enumerate(index):
            assert ts == index[i]

+    @pytest.mark.parametrize('arg, expected_arg', [
+        [[], []],
+        [date_range('2018-11-04', periods=4, freq='H', tz='US/Pacific'),


can you try with a dateutil as well (I think we default to pytz)

@pganssle Do dateutil timezones provide a way to access the possible daylight savings time shifts like pytz.timezone._transition_info does?

# I'm interested in the 2nd element of the tuples below In [4]: tz._transition_info Out[4]: [(datetime.timedelta(-1, 58020), datetime.timedelta(0), 'LMT'), (datetime.timedelta(-1, 57600), datetime.timedelta(0), 'PST'), (datetime.timedelta(-1, 61200), datetime.timedelta(0, 3600), 'PDT'), ...

Context: I am trying to create a DatetimeIndex.is_dst method that indicates whether a particular timestamp is in daylight savings time. I know it's possible to do bool(datetime.dst()) on each timestamp of the DatetimeIndex, but I am curious if there's a more efficient way determine DST for a dateutil timezone than iterating.

@mroeschke Yes and no. Yes in the sense that this is Python and obviously dateutil has to store that information somewhere. No in the sense that neither dateutil nor pytz exposes that information publicly. Note that _transition_info is a private member.

That said:

I'm not entirely sure why you want to implement a "fast isdst" - I'm not sure I've ever heard of a case where it was useful to know whether or not a time had or did not have DST.

I don't really see what this would buy you - the member you list looks like what you would get from utcoffset, dst and tzname, respectively. How do you expect to make use of this information?

jreback · 2018-09-04T11:30:05Z

pandas/tests/scalar/timestamp/test_timezones.py

+        assert ts_naive.is_dst() is False
+
+        ts_aware = ts_naive.tz_localize('US/Pacific')
+        assert ts_aware.is_dst() is True


can you test dateutil

pep8speaks · 2018-09-05T02:22:05Z

Hello @mroeschke! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/indexes/datetimelike.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimes.py !
There are no PEP8 issues in the file pandas/tests/indexes/datetimes/test_timezones.py !
There are no PEP8 issues in the file pandas/tests/reshape/merge/test_merge.py !
There are no PEP8 issues in the file pandas/tests/scalar/test_nat.py !
There are no PEP8 issues in the file pandas/tests/scalar/timestamp/test_timezones.py !
There are no PEP8 issues in the file pandas/tests/series/test_datetime_values.py !

pganssle · 2018-09-05T12:12:42Z

I would recommend against adding an is_dst method. If this is for resolving ambiguous times, it would be preferable to use something more akin to a fold method, since there are several situations where an ambiguity cannot be resolved by querying DST or not (base offset shifts).

In general DST/not DST is only a proxy for the real information you want in all cases I can think of - the offset, the tzname or which side of an ambiguity you are on. All of those can be independently queried and/or specified without mentioning DST.

jbrockmendel · 2018-09-05T15:20:19Z

These AmbiguousTimeErrors stem from internal routines that transform a timestamp to an ambiguous time.

I'm not clear on what problem this is solving. Can you give an example of the internal routines that produce ambiguous times?

@pganssle and I don't always see eye-to-eye on how to treat public vs private attributes, but in general if he's got a strong opinion, it is worth listening to.

jorisvandenbossche · 2018-09-05T16:01:41Z

I also wanted to mention that we deliberately added an ambiguous keyword and deprecated is/infer_dst because not all ambiguities are DST issues, if I remember well (#7963, there was some discussion about it, but I am not very familiar with timezone handling details), so we should maybe consider that issue here as well. But I see that @pganssle in the meantime already noted something similar :)

mroeschke · 2018-09-07T06:17:18Z

@jbrockmendel

Can you give an example of the internal routines that produce ambiguous times?

When rounding a tz-aware date time to a particular frequency, essentially the steps are:

convert the date to a naive timestamp
round the i8 value to the frequency
relocalize the i8 value to the timezone

You can't necessarily convert to UTC first because it can affect the rounding (unless I am missing something).

So in #18946, a date that sits on the DST border 2017-10-29 02:00:00+02:00 that's rounded to an hour ('H'):

Convert to 2017-10-29 02:00:00
Round to the nearest hour (2017-10-29 02:00:00)
Relocalize to 2017-10-29 02:00:00+02:00 or 2017-10-29 02:00:00+01:00?

I guess #18946 can be solved by added an ambiguous kwarg to round also. However the routine above is the cause for other issues as well #18885, but that may be solved by converting to UTC first.

Using DST information may be a too hand-wavy solution to solve both #18946 and #18885. At minimum, I had also though having a is_dst would be a nice-to-have enhancement for DatetimeIndexes.

pganssle · 2018-09-08T16:27:49Z

@mroeschke I'm guessing that rounding issues in UTC is mainly at the day-or-higher level, right? Since rounding to the nearest minute or hour will cause no problems.

I think the correct solution here is to introduce the fold concept into pandas, and start deprecating pytz-isms.

I see a few good ways forward:

Write fold compatible wrappers for pytz and start deprecating pytz-isms like localize and normalize.
Write wrappers for dateutil time zones that expose some pytz-isms (which you can deprecate), and stop using pytz entirely.
Implement a third time zone library optimized for array types and drop both pytz and dateutil.

In all cases, I think fold semantics is the best way to go.

I am very strongly hoping that within a year I'll be able to get a fast, compiled Rust or C backend for many of dateutil's modules, including tz, which should make it a lot cheaper to make bulk function calls (particularly if I expose a C API or similar so you don't have any Python function call overhead), but number 3 might still be the best option since it's likely that supporting something that would work really well for array types in dateutil and work really well for normal Python implementations would be more complicated than just maintaining two separate libraries.

jbrockmendel · 2018-09-09T01:12:54Z

You can't necessarily convert to UTC first because it can affect the rounding (unless I am missing something).

I think there may be an implicit assumption in here that should be made explicit. Does the user want the rounding to be based on wall time or actual time? (My intuition is actual time)

ts = pd.Timestamp('2017-10-29 02:59:00+0200', tz='Europe/Madrid')
m = pd.Timedelta(minutes=1)

>>> ts + m
Timestamp('2017-10-29 02:00:00+0100', tz='Europe/Madrid')
>>> ts - 59*m
Timestamp('2017-10-29 02:00:00+0200', tz='Europe/Madrid')

If I were rounding ts to the nearest hour, it seems clear I'd want the one 1 minute away rather than 59 minutes away, despite the fact they have the same difference in wall time.

Is this intuition not applicable to the case you're considering?

pganssle · 2018-09-09T01:35:09Z

@jbrockmendel I think for time scales of hours or less it's probably actual time, but time scale of days they almost certainly want wall time.

Imagine rounding to midnight UTC, then going back to PST/PDT. No one wants to round to 17:00 sometimes and 16:00 others.

To be honest it's probably always wall time, but on sufficiently small time scales the two are equivalent.

mroeschke · 2018-09-09T01:42:56Z

I agree with @pganssle and would imagine users would want to round based on wall time in general.

Thanks for the feedback guys. I'm convinced I need to surface a fold (or ambiguous in pandas terms) argument in round then. Closing.

Matt Roeschke added 8 commits August 30, 2018 23:43

ENH: Add is_dst method to DatetimeIndex

be2b6dd

Address merge issue

18f8611

Add test for Nat

442f888

Add is_dst for timestamps

254b5a2

Add cython version of dst

7c73d0a

Add cython version of is_dst

b0938d5

Add cython version of is_dst pt2

641e295

Add fixedoffset test

820df35

jbrockmendel reviewed Sep 1, 2018

View reviewed changes

gfyoung added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype labels Sep 1, 2018

address review and test failures

9304b50

mroeschke added this to the 0.24.0 milestone Sep 4, 2018

jreback requested changes Sep 4, 2018

View reviewed changes

Matt Roeschke added 3 commits September 4, 2018 18:49

Merge remote-tracking branch 'upstream/master' into datetimeindex_dst

83109eb

type dst and document

627bb19

Add dateutil test for scalars

c03792e

mroeschke closed this Sep 9, 2018

mroeschke mentioned this pull request Sep 9, 2018

BUG/ENH: Handle AmbiguousTimeError in date rounding #22647

Merged

4 tasks

mroeschke deleted the datetimeindex_dst branch September 26, 2018 17:11

jreback mentioned this pull request Apr 20, 2019

list of day-light savings days #26160

Closed

mroeschke mentioned this pull request Nov 4, 2020

ENH: pandas.Timestamp.round infer correct timezone around DST #37592

Open

3 tasks

mroeschke mentioned this pull request Nov 14, 2021

DOC: Clarify DST rounding behavior in Timestamp/DatetimeIndex #44357

Merged

1 task

mroeschke mentioned this pull request May 26, 2022

ENH: Add fold attribute to DatetimeIndex #47130

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

mroeschke commented Sep 1, 2018 •

edited

Loading

jbrockmendel Sep 1, 2018

jbrockmendel Sep 1, 2018

jbrockmendel Sep 1, 2018

jbrockmendel commented Sep 1, 2018

mroeschke commented Sep 4, 2018

codecov bot commented Sep 4, 2018 •

edited

Loading

mroeschke commented Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

jreback Sep 4, 2018

mroeschke Sep 5, 2018

pganssle Sep 5, 2018

jreback Sep 4, 2018

pep8speaks commented Sep 5, 2018

pganssle commented Sep 5, 2018

jbrockmendel commented Sep 5, 2018

jorisvandenbossche commented Sep 5, 2018

mroeschke commented Sep 7, 2018 •

edited

Loading

pganssle commented Sep 8, 2018

jbrockmendel commented Sep 9, 2018

pganssle commented Sep 9, 2018

mroeschke commented Sep 9, 2018

ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

ENH/BUG: Add is_dst method to DatetimeIndex and Timestamp to solve AmbiguousTimeError #22560

Conversation

mroeschke commented Sep 1, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbrockmendel commented Sep 1, 2018

mroeschke commented Sep 4, 2018

codecov bot commented Sep 4, 2018 • edited Loading

Codecov Report

mroeschke commented Sep 4, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pep8speaks commented Sep 5, 2018

pganssle commented Sep 5, 2018

jbrockmendel commented Sep 5, 2018

jorisvandenbossche commented Sep 5, 2018

mroeschke commented Sep 7, 2018 • edited Loading

pganssle commented Sep 8, 2018

jbrockmendel commented Sep 9, 2018

pganssle commented Sep 9, 2018

mroeschke commented Sep 9, 2018

mroeschke commented Sep 1, 2018 •

edited

Loading

codecov bot commented Sep 4, 2018 •

edited

Loading

mroeschke commented Sep 7, 2018 •

edited

Loading