DOC/CLN: Timezone section in timeseries.rst #24825

Merged
merged 9 commits into from Feb 3, 2019
204 changes: 83 additions & 121 deletions doc/source/user_guide/timeseries.rst
@@ -2129,11 +2129,13 @@ These can easily be converted to a ``PeriodIndex``:
Time Zone Handling
------------------

Pandas provides rich support for working with timestamps in different time
zones using ``pytz`` and ``dateutil`` libraries. ``dateutil`` currently is only
supported for fixed offset and tzfile zones. The default library is ``pytz``.
Support for ``dateutil`` is provided for compatibility with other
applications e.g. if you use ``dateutil`` in other Python packages.
pandas provides rich support for working with timestamps in different time
zones using the ``pytz`` and ``dateutil`` libraries.

.. note::

pandas does not yet support ``datetime.timezone`` objects from the standard
library.

Working with Time Zones
~~~~~~~~~~~~~~~~~~~~~~~
@@ -2145,94 +2147,87 @@ By default, pandas objects are time zone unaware:
rng = pd.date_range('3/6/2012 00:00', periods=15, freq='D')
rng.tz is None

To supply the time zone, you can use the ``tz`` keyword to ``date_range`` and
other functions. Dateutil time zone strings are distinguished from ``pytz``
time zones by starting with ``dateutil/``.
To localize these dates to a time zone (assign a particular time zone to a naive date),
you can use the ``tz_localize`` method or the ``tz`` keyword argument in
:func:`date_range`, :class:`Timestamp`, or :class:`DatetimeIndex`.
You can either pass ``pytz`` or ``dateutil`` time zone objects or Olson time zone database strings.
Olson time zone strings will return ``pytz`` time zone objects by default.
To return ``dateutil`` time zone objects, prepend ``dateutil/`` to the string.

* In ``pytz`` you can find a list of common (and less common) time zones using
``from pytz import common_timezones, all_timezones``.
* ``dateutil`` uses the OS timezones so there isn't a fixed list available. For
* ``dateutil`` uses the OS time zones so there isn't a fixed list available. For
common zones, the names are the same as ``pytz``.

.. ipython:: python

import dateutil

# pytz
rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
tz='Europe/London')
rng_pytz.tz

# dateutil
rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
tz='dateutil/Europe/London')
rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D')
rng_dateutil = rng_dateutil.tz_localize('dateutil/Europe/London')
rng_dateutil.tz

# dateutil - utc special case
rng_utc = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
rng_utc = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
tz=dateutil.tz.tzutc())
rng_utc.tz

Note that the ``UTC`` timezone is a special case in ``dateutil`` and should be constructed explicitly
as an instance of ``dateutil.tz.tzutc``. You can also construct other timezones explicitly first,
which gives you more control over which time zone is used:
Note that the ``UTC`` time zone is a special case in ``dateutil`` and should be constructed explicitly
as an instance of ``dateutil.tz.tzutc``. You can also construct other time
zone objects explicitly first.

.. ipython:: python

import pytz

# pytz
tz_pytz = pytz.timezone('Europe/London')
rng_pytz = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
tz=tz_pytz)
rng_pytz = pd.date_range('3/6/2012 00:00', periods=3, freq='D')
rng_pytz = rng_pytz.tz_localize(tz_pytz)
rng_pytz.tz == tz_pytz

# dateutil
tz_dateutil = dateutil.tz.gettz('Europe/London')
rng_dateutil = pd.date_range('3/6/2012 00:00', periods=10, freq='D',
rng_dateutil = pd.date_range('3/6/2012 00:00', periods=3, freq='D',
tz=tz_dateutil)
rng_dateutil.tz == tz_dateutil

Timestamps, like Python's ``datetime.datetime`` object can be either time zone
naive or time zone aware. Naive time series and ``DatetimeIndex`` objects can be
*localized* using ``tz_localize``:

.. ipython:: python

ts = pd.Series(np.random.randn(len(rng)), rng)

ts_utc = ts.tz_localize('UTC')
ts_utc

Again, you can explicitly construct the timezone object first.
You can use the ``tz_convert`` method to convert tz-aware pandas objects
to another time zone:
To convert a time zone aware pandas object from one time zone to another,
you can use the ``tz_convert`` method.

.. ipython:: python

ts_utc.tz_convert('US/Eastern')
rng_pytz.tz_convert('US/Eastern')

.. warning::

Be wary of conversions between libraries. For some zones ``pytz`` and ``dateutil`` have different
definitions of the zone. This is more of a problem for unusual timezones than for
Be wary of conversions between libraries. For some time zones, ``pytz`` and ``dateutil`` have different
definitions of the zone. This is more of a problem for unusual time zones than for
'standard' zones like ``US/Eastern``.

.. warning::

Be aware that a timezone definition across versions of timezone libraries may not
be considered equal. This may cause problems when working with stored data that
is localized using one version and operated on with a different version.
See :ref:`here<io.hdf5-notes>` for how to handle such a situation.
Be aware that a time zone definition across versions of time zone libraries may not
be considered equal. This may cause problems when working with stored data that
is localized using one version and operated on with a different version.
See :ref:`here<io.hdf5-notes>` for how to handle such a situation.

.. warning::

It is incorrect to pass a timezone directly into the ``datetime.datetime`` constructor (e.g.,
``datetime.datetime(2011, 1, 1, tz=timezone('US/Eastern'))``. Instead, the datetime
needs to be localized using the localize method on the timezone.
For ``pytz`` time zones, it is incorrect to pass a time zone object directly into
the ``datetime.datetime`` constructor
(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``).
Instead, the datetime needs to be localized using the ``localize`` method
on the ``pytz`` time zone object.
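
As a minimal sketch of the pitfall described in the warning above (assuming ``pytz`` is installed; the variable names are illustrative only), the UTC offset obtained by passing the zone through ``tzinfo`` can be compared with the one obtained from ``localize``:

```python
import datetime

import pytz

eastern = pytz.timezone('US/Eastern')

# Passing the pytz zone straight to the constructor attaches the zone's
# earliest recorded offset (local mean time), not the offset in effect
# on the given date.
via_constructor = datetime.datetime(2011, 1, 1, tzinfo=eastern)

# localize() selects the offset that applies on that date (EST, UTC-05:00).
via_localize = eastern.localize(datetime.datetime(2011, 1, 1))

print(via_constructor.utcoffset())
print(via_localize.utcoffset())
```

The two offsets differ, which is why the constructor form silently produces the wrong instant for ``pytz`` zones.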

Under the hood, all timestamps are stored in UTC. Scalar values from a
``DatetimeIndex`` with a time zone will have their fields (day, hour, minute)
Under the hood, all timestamps are stored in UTC. Values from a time zone aware
:class:`DatetimeIndex` or :class:`Timestamp` will have their fields (day, hour, minute, etc.)
localized to the time zone. However, timestamps with the same UTC value are
still considered to be equal even if they are in different time zones:

@@ -2241,114 +2236,78 @@ still considered to be equal even if they are in different time zones:
rng_eastern = rng_utc.tz_convert('US/Eastern')
rng_berlin = rng_utc.tz_convert('Europe/Berlin')

rng_eastern[5]
rng_berlin[5]
rng_eastern[5] == rng_berlin[5]

Like ``Series``, ``DataFrame``, and ``DatetimeIndex``; ``Timestamp`` objects
can be converted to other time zones using ``tz_convert``:

.. ipython:: python

rng_eastern[5]
rng_berlin[5]
rng_eastern[5].tz_convert('Europe/Berlin')

Localization of ``Timestamp`` functions just like ``DatetimeIndex`` and ``Series``:

.. ipython:: python

rng[5]
rng[5].tz_localize('Asia/Shanghai')

rng_eastern[2]
rng_berlin[2]
rng_eastern[2] == rng_berlin[2]
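
The same equality rule can be sketched with scalar :class:`Timestamp` objects. The date and variable names below are chosen only for illustration:

```python
import pandas as pd

stamp_utc = pd.Timestamp('2012-03-08 12:00', tz='UTC')
stamp_eastern = stamp_utc.tz_convert('US/Eastern')
stamp_berlin = stamp_utc.tz_convert('Europe/Berlin')

# The wall-clock fields are localized to each zone and therefore differ ...
print(stamp_eastern.hour, stamp_berlin.hour)

# ... but all three objects describe the same instant, so they compare equal.
print(stamp_eastern == stamp_berlin)
print(stamp_eastern == stamp_utc)
```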

Operations between ``Series`` in different time zones will yield UTC
``Series``, aligning the data on the UTC timestamps:
Operations between :class:`Series` in different time zones will yield UTC
:class:`Series`, aligning the data on the UTC timestamps:

.. ipython:: python

ts_utc = pd.Series(range(3), pd.date_range('20130101', periods=3, tz='UTC'))
eastern = ts_utc.tz_convert('US/Eastern')
berlin = ts_utc.tz_convert('Europe/Berlin')
result = eastern + berlin
result
result.index

To remove timezone from tz-aware ``DatetimeIndex``, use ``tz_localize(None)`` or ``tz_convert(None)``.
``tz_localize(None)`` will remove timezone holding local time representations.
``tz_convert(None)`` will remove timezone after converting to UTC time.
To remove time zone information, use ``tz_localize(None)`` or ``tz_convert(None)``.
``tz_localize(None)`` will remove the time zone yielding the local time representation.
``tz_convert(None)`` will remove the time zone after converting to UTC time.

.. ipython:: python

didx = pd.date_range(start='2014-08-01 09:00', freq='H',
periods=10, tz='US/Eastern')
periods=3, tz='US/Eastern')
didx
didx.tz_localize(None)
didx.tz_convert(None)

# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
# tz_convert(None) is identical to tz_convert('UTC').tz_localize(None)
didx.tz_convert('UTC').tz_localize(None)
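
The difference between the two removal methods is easiest to see by comparing the resulting hours. The following is a small sketch with an explicitly constructed index (the names ``local_naive`` and ``utc_naive`` are illustrative):

```python
import pandas as pd

didx = pd.DatetimeIndex(['2014-08-01 09:00', '2014-08-01 10:00'],
                        tz='US/Eastern')

# Drop the zone but keep the local wall-clock times (09:00 and 10:00).
local_naive = didx.tz_localize(None)

# Convert to UTC first, then drop the zone. US/Eastern is UTC-04:00 in
# August, so the naive result reads 13:00 and 14:00.
utc_naive = didx.tz_convert(None)

print(local_naive)
print(utc_naive)
```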

.. _timeseries.timezone_ambiguous:

Ambiguous Times when Localizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some cases, localize cannot determine the DST and non-DST hours when there are
duplicates. This often happens when reading files or database records that simply
duplicate the hours. Passing ``ambiguous='infer'`` into ``tz_localize`` will
attempt to determine the right offset. Below the top example will fail as it
contains ambiguous times and the bottom will infer the right offset.
``tz_localize`` may not be able to determine the UTC offset of a timestamp
because daylight saving time (DST) in a local time zone causes some times to occur
twice within one day ("clocks fall back"). The ``ambiguous`` argument accepts the following options:

* ``'raise'``: Raises a ``pytz.AmbiguousTimeError`` (the default behavior)
* ``'infer'``: Attempt to determine the correct offset based on the monotonicity of the timestamps
* ``'NaT'``: Replaces ambiguous times with ``NaT``
* ``bool``: ``True`` represents a DST time, ``False`` represents non-DST time. An array-like of ``bool`` values is supported for a sequence of times.

.. ipython:: python

rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
'11/06/2011 01:00', '11/06/2011 02:00',
'11/06/2011 03:00'])
'11/06/2011 01:00', '11/06/2011 02:00'])

This will fail as there are ambiguous times
This will fail because there are ambiguous times (``'11/06/2011 01:00'``):

.. code-block:: ipython

In [2]: rng_hourly.tz_localize('US/Eastern')
AmbiguousTimeError: Cannot infer dst time from Timestamp('2011-11-06 01:00:00'), try using the 'ambiguous' argument

Infer the ambiguous times

.. ipython:: python

rng_hourly_eastern = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
rng_hourly_eastern.to_list()

In addition to 'infer', there are several other arguments supported. Passing
an array-like of bools or 0s/1s where True represents a DST hour and False a
non-DST hour, allows for distinguishing more than one DST
transition (e.g., if you have multiple records in a database each with their
own DST transition). Or passing 'NaT' will fill in transition times
with not-a-time values. These methods are available in the ``DatetimeIndex``
constructor as well as ``tz_localize``.
Handle these ambiguous times by specifying the following.

.. ipython:: python

rng_hourly_dst = np.array([1, 1, 0, 0, 0])
rng_hourly.tz_localize('US/Eastern', ambiguous=rng_hourly_dst).to_list()
rng_hourly.tz_localize('US/Eastern', ambiguous='NaT').to_list()

didx = pd.date_range(start='2014-08-01 09:00', freq='H',
periods=10, tz='US/Eastern')
didx
didx.tz_localize(None)
didx.tz_convert(None)

# tz_convert(None) is identical with tz_convert('UTC').tz_localize(None)
didx.tz_convert('UCT').tz_localize(None)
rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
rng_hourly.tz_localize('US/Eastern', ambiguous='NaT')
rng_hourly.tz_localize('US/Eastern', ambiguous=[True, True, False, False])
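
To make the effect of each option concrete, the sketch below localizes the same four timestamps in each way and inspects the result in UTC. The broad ``except Exception`` is an assumption made for illustration, since the exact exception class depends on the installed time zone backend:

```python
import pandas as pd

rng_hourly = pd.DatetimeIndex(['11/06/2011 00:00', '11/06/2011 01:00',
                               '11/06/2011 01:00', '11/06/2011 02:00'])

# The default (ambiguous='raise') refuses to guess which 01:00 is meant.
try:
    rng_hourly.tz_localize('US/Eastern')
    raised = False
except Exception:
    raised = True

inferred = rng_hourly.tz_localize('US/Eastern', ambiguous='infer')
flagged = rng_hourly.tz_localize('US/Eastern',
                                 ambiguous=[True, True, False, False])
with_nat = rng_hourly.tz_localize('US/Eastern', ambiguous='NaT')

# Viewed in UTC, the four local times become four consecutive hours.
print(inferred.tz_convert('UTC'))

# The explicit DST flags describe the same choice that 'infer' made.
print(inferred.equals(flagged))

# Only the two ambiguous entries are replaced by NaT.
print(with_nat.isna())
```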

.. _timeseries.timezone_nonexistent:

Nonexistent Times when Localizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A DST transition may also shift the local time ahead by 1 hour creating nonexistent
local times. The behavior of localizing a timeseries with nonexistent times
local times ("clocks spring forward"). The behavior of localizing a timeseries with nonexistent times
can be controlled by the ``nonexistent`` argument. The following options are available:

* ``'raise'``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
@@ -2382,58 +2341,61 @@ Transform nonexistent times to ``NaT`` or shift the times.

.. _timeseries.timezone_series:

TZ Aware Dtypes
~~~~~~~~~~~~~~~
Time Zone Series Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``Series/DatetimeIndex`` with a timezone **naive** value are represented with a dtype of ``datetime64[ns]``.
A :class:`Series` with time zone **naive** values is
represented with a dtype of ``datetime64[ns]``.

.. ipython:: python

s_naive = pd.Series(pd.date_range('20130101', periods=3))
s_naive

``Series/DatetimeIndex`` with a timezone **aware** value are represented with a dtype of ``datetime64[ns, tz]``.
A :class:`Series` with time zone **aware** values is
represented with a dtype of ``datetime64[ns, tz]``, where ``tz`` is the time zone.

.. ipython:: python

s_aware = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern'))
s_aware

Both of these ``Series`` can be manipulated via the ``.dt`` accessor, see :ref:`here <basics.dt_accessors>`.
The time zone information of both of these :class:`Series`
can be manipulated via the ``.dt`` accessor; see :ref:`the dt accessor section <basics.dt_accessors>`.

For example, to localize and convert a naive stamp to timezone aware.
For example, to localize a naive stamp and then convert it to another time zone:

.. ipython:: python

s_naive.dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
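
The chained call above can be broken into its two steps. This is a minimal sketch; the intermediate names ``s_utc`` and ``s_eastern`` are illustrative:

```python
import pandas as pd

s_naive = pd.Series(pd.date_range('20130101', periods=3))

# Step 1: declare that the naive values are UTC wall-clock times.
s_utc = s_naive.dt.tz_localize('UTC')

# Step 2: express the same instants in US/Eastern local time
# (midnight UTC is 19:00 on the previous day in January).
s_eastern = s_utc.dt.tz_convert('US/Eastern')

print(s_eastern)
```

Localizing changes which instant a value refers to, whereas converting only changes how an already fixed instant is displayed.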


Furthermore, you can ``.astype(...)`` timezone aware (and naive) data. This operation is effectively a localize AND convert on a naive stamp, and
a convert on an aware stamp.
Time zone information can also be manipulated using the ``astype`` method.
This method can localize and convert time zone naive timestamps or
convert time zone aware timestamps.

.. ipython:: python

# localize and convert a naive timezone
# localize and convert a naive time zone
s_naive.astype('datetime64[ns, US/Eastern]')

# make an aware tz naive
s_aware.astype('datetime64[ns]')

# convert to a new timezone
# convert to a new time zone
s_aware.astype('datetime64[ns, CET]')

.. note::

Using :meth:`Series.to_numpy` on a ``Series`` returns a NumPy array of the data.
NumPy does not currently support timezones (even though it is *printing* in the local timezone!),
therefore an object array of Timestamps is returned for timezone aware data:
NumPy does not currently support time zones (even though it is *printing* in the local time zone!),
so an object array of Timestamps is returned for time zone aware data:

.. ipython:: python

s_naive.to_numpy()
s_aware.to_numpy()
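
A short sketch of how the two results differ, checking the array dtypes and element types directly:

```python
import pandas as pd

s_naive = pd.Series(pd.date_range('20130101', periods=3))
s_aware = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern'))

naive_values = s_naive.to_numpy()   # native NumPy datetime64 array
aware_values = s_aware.to_numpy()   # object array holding Timestamp objects

print(naive_values.dtype, aware_values.dtype)

# Each element of the object array is a Timestamp that still carries its zone.
print(type(aware_values[0]), aware_values[0].tz)
```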

By converting to an object array of Timestamps, it preserves the timezone
Converting to an object array of Timestamps preserves the time zone
information. For example, when converting back to a Series:

.. ipython:: python