API/ENH: tz_localize handling of nonexistent times: rename keyword + add shift option #22644

Merged
merged 64 commits into from
Oct 25, 2018
64 commits
bf5e7bf
ENH: Add handling of nonexistent times
Sep 4, 2018
36d13c7
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 5, 2018
a5ea445
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 7, 2018
8753d00
correct misspelling
Sep 7, 2018
e1a6c6a
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 7, 2018
a7c86c8
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 8, 2018
1884c7b
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 8, 2018
a6a05df
change method of handling nonexistent times
Sep 8, 2018
c4dc8aa
Add another comment
Sep 8, 2018
1bc81db
Add tests for timestamps
Sep 9, 2018
c81d58c
Add tests for datetimeindex
Sep 9, 2018
b2c8429
Add series test and entry in timeseries.rst
Sep 9, 2018
a65987d
Add whatsnew
Sep 9, 2018
710014c
Clean up docstring
Sep 9, 2018
93159e5
Fix nat doc
Sep 9, 2018
a0ffcdd
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 11, 2018
219256f
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 11, 2018
d435481
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 14, 2018
7c849b6
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 14, 2018
56ac4fe
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 14, 2018
b7b09bd
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 19, 2018
94a72a5
add versionadded
Sep 19, 2018
39b769e
Remove whitespace
Sep 19, 2018
18664d8
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 24, 2018
8852d43
Depreciate errors and see what needs warning captures
Sep 24, 2018
38b95e9
Correct NaT docstring
Sep 24, 2018
c88b0d8
edit whatsnew and check for raised DeprecationWarning
Sep 24, 2018
1bae682
Address review
Sep 26, 2018
d30f891
change default errors argument to None
Sep 26, 2018
f337692
Map depreciation correctly and test
Sep 26, 2018
6a12a7e
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 26, 2018
a7b8357
Try to correctly test for FutureWarning
Sep 26, 2018
7ad87ec
Try adjusting catching FutureWarning
Sep 27, 2018
abad726
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 27, 2018
6be1c25
Reorder context managers
Sep 27, 2018
f8be4b6
clear previously seen FutureWarning
Sep 28, 2018
c192c9f
separate test
Sep 28, 2018
8909f38
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 28, 2018
49f203f
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 30, 2018
01678c7
adjust test
Sep 30, 2018
707fdde
Merge remote-tracking branch 'upstream/master' into normalize_tz
Sep 30, 2018
ae27a50
Remove errors argument to tz_localize_to_utc
Sep 30, 2018
85ed25e
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 3, 2018
9041ebe
Add nonexistent assert
Oct 4, 2018
a4cdac2
Handle default None arg
Oct 5, 2018
0a9c1db
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 5, 2018
efb382e
Address review
Oct 6, 2018
61c73ca
Catch another warning
Oct 6, 2018
20cc925
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 7, 2018
394a0db
Add extra docstring
Oct 7, 2018
a5253ee
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 8, 2018
5185683
Edit whatsnew
Oct 8, 2018
ba1bfed
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 11, 2018
8b06c96
Address comments
Oct 11, 2018
42ae923
Remove stacklevel
Oct 12, 2018
fe575fe
Add back check_stacklevel
Oct 12, 2018
3482f92
Add blank line for rendering
Oct 17, 2018
f0e43e2
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 18, 2018
b98d4cf
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 18, 2018
e6c5b2d
Validate nonexistent argument
Oct 18, 2018
83423ad
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 19, 2018
1ca0ab2
Fix type
Oct 19, 2018
5bcc977
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 24, 2018
8cf16e2
Merge remote-tracking branch 'upstream/master' into normalize_tz
Oct 24, 2018
32 changes: 32 additions & 0 deletions doc/source/timeseries.rst
@@ -2357,6 +2357,38 @@ constructor as well as ``tz_localize``.
# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
didx.tz_convert('UCT').tz_localize(None)

.. _timeseries.timezone_nonexistent:

Nonexistent Times when Localizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A DST transition may also shift the local time ahead by 1 hour, creating
nonexistent local times. The behavior of localizing a time series that contains
nonexistent times is controlled by the ``nonexistent`` argument. The following options are available:

* ``raise``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
* ``NaT``: Replaces nonexistent times with ``NaT``
* ``shift``: Shifts nonexistent times forward to the closest real time

.. ipython:: python

   dti = date_range(start='2015-03-29 01:30:00', periods=3, freq='H')
   # 2:30 is a nonexistent time

Localization of nonexistent times will raise an error by default.

.. code-block:: ipython

   In [2]: dti.tz_localize('Europe/Warsaw')
   NonExistentTimeError: 2015-03-29 02:30:00

Use ``nonexistent='NaT'`` to replace nonexistent times with ``NaT``, or ``nonexistent='shift'`` to move them forward to the closest real time.

.. ipython:: python

   dti
   dti.tz_localize('Europe/Warsaw', nonexistent='shift')
   dti.tz_localize('Europe/Warsaw', nonexistent='NaT')
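
The three options described above can be sketched without pandas. The snippet below is only an illustration under stated assumptions: the helper ``resolve_nonexistent`` is hypothetical, the gap boundaries are hard-coded for the Europe/Warsaw spring-forward on 2015-03-29, ``None`` stands in for ``NaT``, and ``ValueError`` stands in for ``pytz.NonExistentTimeError``.

```python
from datetime import datetime

# Assumed gap for Europe/Warsaw on 2015-03-29: wall clocks jump from
# 02:00 straight to 03:00, so [02:00, 03:00) never occurs locally.
GAP_START = datetime(2015, 3, 29, 2, 0)
GAP_END = datetime(2015, 3, 29, 3, 0)


def resolve_nonexistent(naive, nonexistent="raise"):
    """Hypothetical helper that applies one policy to a naive wall time."""
    if nonexistent not in ("raise", "NaT", "shift"):
        raise ValueError("nonexistent must be one of 'raise', 'NaT' or 'shift'")
    if not (GAP_START <= naive < GAP_END):
        return naive  # the wall time exists, nothing to do
    if nonexistent == "shift":
        return GAP_END  # closest real wall time after the gap
    if nonexistent == "NaT":
        return None  # stand-in for NaT
    raise ValueError("nonexistent time: {}".format(naive))


# Same three wall times as the dti example: 01:30, 02:30, 03:30.
times = [datetime(2015, 3, 29, hour, 30) for hour in (1, 2, 3)]
shifted = [resolve_nonexistent(t, "shift") for t in times]
filled = [resolve_nonexistent(t, "NaT") for t in times]
```

Only the middle value changes: it becomes 03:00 under ``shift`` and the ``NaT`` stand-in under ``NaT``, while the default policy would raise for it.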


.. _timeseries.timezone_series:

TZ Aware Dtypes
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -205,6 +205,7 @@ Other Enhancements
- New attribute :attr:`__git_version__` will return git commit sha of current build (:issue:`21295`).
- Compatibility with Matplotlib 3.0 (:issue:`22790`).
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)

.. _whatsnew_0240.api_breaking:

@@ -912,6 +913,7 @@ Deprecations
- :meth:`FrozenNDArray.searchsorted` has deprecated the ``v`` parameter in favor of ``value`` (:issue:`14645`)
- :func:`DatetimeIndex.shift` and :func:`PeriodIndex.shift` now accept ``periods`` argument instead of ``n`` for consistency with :func:`Index.shift` and :func:`Series.shift`. Using ``n`` throws a deprecation warning (:issue:`22458`, :issue:`22912`)
- The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
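
As a rough illustration of this deprecation, the sketch below maps the old ``errors`` values onto ``nonexistent`` and emits a ``FutureWarning``, in the spirit of the ``Timestamp.tz_localize`` change in this PR. The helper name ``resolve_nonexistent_arg`` is hypothetical and the function is a standalone stand-in, not the pandas implementation.

```python
import warnings

VALID_NONEXISTENT = ("raise", "NaT", "shift")


def resolve_nonexistent_arg(nonexistent="raise", errors=None):
    """Hypothetical helper: fold the deprecated `errors` into `nonexistent`."""
    if errors is not None:
        warnings.warn(
            "The errors argument is deprecated; use nonexistent='NaT' "
            "or nonexistent='raise' instead.",
            FutureWarning,
            stacklevel=2,
        )
        if errors == "coerce":
            nonexistent = "NaT"
        elif errors == "raise":
            nonexistent = "raise"
        else:
            raise ValueError("errors must be either 'coerce' or 'raise'")
    if nonexistent not in VALID_NONEXISTENT:
        raise ValueError("nonexistent must be one of 'raise', 'NaT' or 'shift'")
    return nonexistent


# Old-style call: still works, but records a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = resolve_nonexistent_arg(errors="coerce")

# New-style call: same outcome, no warning.
new_style = resolve_nonexistent_arg(nonexistent="NaT")
```

Callers migrating away from ``errors='coerce'`` would pass ``nonexistent='NaT'``; ``errors='raise'`` corresponds to ``nonexistent='raise'``.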

.. _whatsnew_0240.prior_deprecations:

82 changes: 48 additions & 34 deletions pandas/_libs/tslibs/conversion.pyx
@@ -1,5 +1,4 @@
# -*- coding: utf-8 -*-

import cython
from cython import Py_ssize_t

@@ -44,6 +43,7 @@ from nattype cimport NPY_NAT, checknull_with_nat
# Constants

cdef int64_t DAY_NS = 86400000000000LL
cdef int64_t HOURS_NS = 3600000000000
NS_DTYPE = np.dtype('M8[ns]')
TD_DTYPE = np.dtype('m8[ns]')

@@ -458,8 +458,7 @@ cdef _TSObject convert_str_to_tsobject(object ts, object tz, object unit,
if tz is not None:
# shift for localize_tso
ts = tz_localize_to_utc(np.array([ts], dtype='i8'), tz,
ambiguous='raise',
errors='raise')[0]
ambiguous='raise')[0]

except OutOfBoundsDatetime:
# GH#19382 for just-barely-OutOfBounds falling back to dateutil
@@ -826,7 +825,7 @@ def tz_convert(int64_t[:] vals, object tz1, object tz2):
@cython.boundscheck(False)
@cython.wraparound(False)
def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
object errors='raise'):
object nonexistent=None):
"""
Localize tzinfo-naive i8 to given time zone (using pytz). If
there are ambiguities in the values, raise AmbiguousTimeError.
@@ -837,7 +836,10 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
tz : tzinfo or None
ambiguous : str, bool, or arraylike
If arraylike, must have the same length as vals
errors : {"raise", "coerce"}, default "raise"
nonexistent : {None, "raise", "NaT", "shift"}, default None
How to handle nonexistent times; None behaves like "raise"

.. versionadded:: 0.24.0

Returns
-------
@@ -849,16 +851,13 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
ndarray ambiguous_array
Py_ssize_t i, idx, pos, ntrans, n = len(vals)
int64_t *tdata
int64_t v, left, right
int64_t v, left, right, val, v_left, v_right
ndarray[int64_t] result, result_a, result_b, dst_hours
npy_datetimestruct dts
bint infer_dst = False, is_dst = False, fill = False
bint is_coerce = errors == 'coerce', is_raise = errors == 'raise'
bint shift = False, fill_nonexist = False

# Vectorized version of DstTzInfo.localize

assert is_coerce or is_raise

if tz == UTC or tz is None:
return vals

@@ -888,39 +887,45 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
"the same size as vals")
ambiguous_array = np.asarray(ambiguous)

    if nonexistent == 'NaT':
        fill_nonexist = True
    elif nonexistent == 'shift':
        shift = True
    else:
        assert nonexistent in ('raise', None), ("nonexistent must be one of"
                                                " {'NaT', 'raise', 'shift'}")

trans, deltas, typ = get_dst_info(tz)

tdata = <int64_t*> cnp.PyArray_DATA(trans)
ntrans = len(trans)

# Determine whether each date lies left of the DST transition (store in
# result_a) or right of the DST transition (store in result_b)
result_a = np.empty(n, dtype=np.int64)
result_b = np.empty(n, dtype=np.int64)
result_a.fill(NPY_NAT)
result_b.fill(NPY_NAT)

# left side
idx_shifted = (np.maximum(0, trans.searchsorted(
idx_shifted_left = (np.maximum(0, trans.searchsorted(
vals - DAY_NS, side='right') - 1)).astype(np.int64)

for i in range(n):
v = vals[i] - deltas[idx_shifted[i]]
pos = bisect_right_i8(tdata, v, ntrans) - 1

# timestamp falls to the left side of the DST transition
if v + deltas[pos] == vals[i]:
result_a[i] = v

# right side
idx_shifted = (np.maximum(0, trans.searchsorted(
idx_shifted_right = (np.maximum(0, trans.searchsorted(
vals + DAY_NS, side='right') - 1)).astype(np.int64)

for i in range(n):
v = vals[i] - deltas[idx_shifted[i]]
pos = bisect_right_i8(tdata, v, ntrans) - 1
val = vals[i]
v_left = val - deltas[idx_shifted_left[i]]
pos_left = bisect_right_i8(tdata, v_left, ntrans) - 1
# timestamp falls to the left side of the DST transition
if v_left + deltas[pos_left] == val:
result_a[i] = v_left

v_right = val - deltas[idx_shifted_right[i]]
pos_right = bisect_right_i8(tdata, v_right, ntrans) - 1
# timestamp falls to the right side of the DST transition
if v + deltas[pos] == vals[i]:
result_b[i] = v
if v_right + deltas[pos_right] == val:
result_b[i] = v_right

if infer_dst:
dst_hours = np.empty(n, dtype=np.int64)
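
The left/right lookup in the hunk above can be sketched in plain Python. This is an illustration under stated assumptions rather than the Cython code itself: the transition table is hand-written for Europe/Warsaw in 2014 and 2015, ``classify`` is a hypothetical helper, ``None`` plays the role of ``NPY_NAT``, and inputs are assumed to fall after the first table entry.

```python
import bisect
import calendar
from datetime import datetime

DAY_NS = 86_400_000_000_000
HOUR_NS = 3_600_000_000_000


def to_ns(dt):
    """Nanoseconds since the epoch for a naive datetime read as UTC."""
    return calendar.timegm(dt.timetuple()) * 1_000_000_000


# Assumed table: UTC instants of the transitions and the UTC offset that
# applies from each instant onward (CEST = +2h, CET = +1h).
TRANS = [to_ns(datetime(2014, 3, 30, 1)), to_ns(datetime(2014, 10, 26, 1)),
         to_ns(datetime(2015, 3, 29, 1)), to_ns(datetime(2015, 10, 25, 1))]
DELTAS = [2 * HOUR_NS, 1 * HOUR_NS, 2 * HOUR_NS, 1 * HOUR_NS]


def classify(val):
    """Classify a local wall time (in ns) as unique, ambiguous or nonexistent."""
    # Offsets in force one day before and one day after the wall time.
    idx_left = max(0, bisect.bisect_right(TRANS, val - DAY_NS) - 1)
    idx_right = max(0, bisect.bisect_right(TRANS, val + DAY_NS) - 1)

    left = right = None
    # Candidate UTC value using the earlier offset; keep it only if
    # converting back to local time reproduces the input.
    v_left = val - DELTAS[idx_left]
    pos_left = bisect.bisect_right(TRANS, v_left) - 1
    if v_left + DELTAS[pos_left] == val:
        left = v_left
    # Same check using the later offset.
    v_right = val - DELTAS[idx_right]
    pos_right = bisect.bisect_right(TRANS, v_right) - 1
    if v_right + DELTAS[pos_right] == val:
        right = v_right

    if left is None and right is None:
        return "nonexistent"
    if left is not None and right is not None and left != right:
        return "ambiguous"
    return "unique"


spring = classify(to_ns(datetime(2015, 3, 29, 2, 30)))
autumn = classify(to_ns(datetime(2015, 10, 25, 2, 30)))
summer = classify(to_ns(datetime(2015, 6, 1, 12, 0)))
```

A wall time for which neither candidate round-trips is exactly the case that the new ``nonexistent`` argument governs.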
@@ -935,7 +940,7 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
stamp = _render_tstamp(vals[trans_idx])
raise pytz.AmbiguousTimeError(
"Cannot infer dst time from %s as there "
"are no repeated times" % stamp)
"Cannot infer dst time from {} as there "
"are no repeated times".format(stamp))
# Split the array into contiguous chunks (where the difference between
# indices is 1). These are effectively dst transitions in different
# years which is useful for checking that there is not an ambiguous
@@ -960,18 +965,19 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
if switch_idx.size > 1:
raise pytz.AmbiguousTimeError(
"There are %i dst switches when "
"there should only be 1." % switch_idx.size)
"There are {} dst switches when "
"there should only be 1.".format(switch_idx.size))
switch_idx = switch_idx[0] + 1
# Pull the only index and adjust
a_idx = grp[:switch_idx]
b_idx = grp[switch_idx:]
dst_hours[grp] = np.hstack((result_a[a_idx], result_b[b_idx]))

for i in range(n):
val = vals[i]
left = result_a[i]
right = result_b[i]
if vals[i] == NPY_NAT:
result[i] = vals[i]
if val == NPY_NAT:
result[i] = val
elif left != NPY_NAT and right != NPY_NAT:
if left == right:
result[i] = left
@@ -986,19 +992,27 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
elif fill:
result[i] = NPY_NAT
else:
stamp = _render_tstamp(vals[i])
stamp = _render_tstamp(val)
raise pytz.AmbiguousTimeError(
"Cannot infer dst time from %r, try using the "
"'ambiguous' argument" % stamp)
"Cannot infer dst time from {!r}, try using the "
"'ambiguous' argument".format(stamp))
elif left != NPY_NAT:
result[i] = left
elif right != NPY_NAT:
result[i] = right
else:
if is_coerce:
# Handle nonexistent times
if shift:
# Shift the nonexistent time forward to the closest existing
# time
remaining_minutes = val % HOURS_NS
new_local = val + (HOURS_NS - remaining_minutes)
delta_idx = trans.searchsorted(new_local, side='right') - 1
result[i] = new_local - deltas[delta_idx]
elif fill_nonexist:
result[i] = NPY_NAT
else:
stamp = _render_tstamp(vals[i])
stamp = _render_tstamp(val)
raise pytz.NonExistentTimeError(stamp)

return result
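
The ``shift`` branch above rounds the nonexistent wall time up to the next full hour and then subtracts the UTC offset in force there. A minimal sketch of that arithmetic follows, assuming a hand-written two-entry transition table for Europe/Warsaw and a hypothetical helper ``shift_forward_to_utc``. Like the hunk, it assumes the gap ends on an hour boundary.

```python
import bisect
import calendar
from datetime import datetime

HOUR_NS = 3_600_000_000_000


def to_ns(dt):
    """Nanoseconds since the epoch for a naive datetime read as UTC."""
    return calendar.timegm(dt.timetuple()) * 1_000_000_000


# Assumed table: UTC transition instants and the offset valid from each.
TRANS = [to_ns(datetime(2014, 10, 26, 1)), to_ns(datetime(2015, 3, 29, 1))]
DELTAS = [1 * HOUR_NS, 2 * HOUR_NS]


def shift_forward_to_utc(local_ns):
    """Move a nonexistent local time to the next hour, then convert to UTC."""
    leftover = local_ns % HOUR_NS  # nanoseconds past the hour
    new_local = local_ns + (HOUR_NS - leftover)  # next full hour, local
    idx = bisect.bisect_right(TRANS, new_local) - 1
    return new_local - DELTAS[idx]


local = to_ns(datetime(2015, 3, 29, 2, 30))  # nonexistent in Warsaw
utc = shift_forward_to_utc(local)
```

Here 02:30 local becomes 03:00 local, which is 01:00 UTC, the instant of the transition itself.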
20 changes: 16 additions & 4 deletions pandas/_libs/tslibs/nattype.pyx
@@ -564,14 +564,26 @@ class NaTType(_NaT):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest

Sorry, rst formatting nitpick: there needs to be a blank line between the first sentences, and the start of this list ... (getting rst right can be annoying ..)

existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

errors : 'raise', 'coerce', default None
- 'raise' will raise a NonExistentTimeError if a timestamp is not
valid in the specified timezone (e.g. due to a transition from
or to DST time)
or to DST time). Use ``nonexistent='raise'`` instead.
- 'coerce' will return NaT if the timestamp can not be converted
into the specified timezone
into the specified timezone. Use ``nonexistent='NaT'`` instead.

.. versionadded:: 0.19.0
.. deprecated:: 0.24.0

Returns
-------
43 changes: 37 additions & 6 deletions pandas/_libs/tslibs/timestamps.pyx
@@ -961,7 +961,8 @@ class Timestamp(_Timestamp):
def is_leap_year(self):
return bool(ccalendar.is_leapyear(self.year))

def tz_localize(self, tz, ambiguous='raise', errors='raise'):
def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
errors=None):
"""
Convert naive Timestamp to local time zone, or remove
timezone from tz-aware Timestamp.
@@ -978,14 +979,26 @@ class Timestamp(_Timestamp):
- 'NaT' will return NaT for an ambiguous time
- 'raise' will raise an AmbiguousTimeError for an ambiguous time

errors : 'raise', 'coerce', default 'raise'
nonexistent : 'shift', 'NaT', default 'raise'
A nonexistent time does not exist in a particular timezone
where clocks moved forward due to DST.

- 'shift' will shift the nonexistent time forward to the closest
existing time
- 'NaT' will return NaT where there are nonexistent times
- 'raise' will raise a NonExistentTimeError if there are
nonexistent times

.. versionadded:: 0.24.0

errors : 'raise', 'coerce', default None
- 'raise' will raise a NonExistentTimeError if a timestamp is not
valid in the specified timezone (e.g. due to a transition from
or to DST time)
or to DST time). Use ``nonexistent='raise'`` instead.
- 'coerce' will return NaT if the timestamp can not be converted
into the specified timezone
into the specified timezone. Use ``nonexistent='NaT'`` instead.

.. versionadded:: 0.19.0
.. deprecated:: 0.24.0

Returns
-------
@@ -999,13 +1012,31 @@ class Timestamp(_Timestamp):
if ambiguous == 'infer':
raise ValueError('Cannot infer offset with only one time.')

        if errors is not None:
            warnings.warn("The errors argument is deprecated and will be "
                          "removed in a future release. Use "
                          "nonexistent='NaT' or nonexistent='raise' "
                          "instead.", FutureWarning)
            if errors == 'coerce':
                nonexistent = 'NaT'
            elif errors == 'raise':
                nonexistent = 'raise'
            else:
                raise ValueError("The errors argument must be either 'coerce' "
                                 "or 'raise'.")

        if nonexistent not in ('raise', 'NaT', 'shift'):
            raise ValueError("The nonexistent argument must be one of 'raise',"
                             " 'NaT' or 'shift'")

if self.tzinfo is None:
# tz naive, localize
tz = maybe_get_tz(tz)
if not is_string_object(ambiguous):
ambiguous = [ambiguous]
value = tz_localize_to_utc(np.array([self.value], dtype='i8'), tz,
ambiguous=ambiguous, errors=errors)[0]
ambiguous=ambiguous,
nonexistent=nonexistent)[0]
return Timestamp(value, tz=tz)
else:
if tz is None: