DOC: Document breaking change to read_csv (#24989)
TomAugspurger authored and jreback committed Jan 29, 2019
1 parent 3fd47fe commit ba7b895
Showing 3 changed files with 84 additions and 3 deletions.
30 changes: 30 additions & 0 deletions doc/source/user_guide/io.rst
@@ -989,6 +989,36 @@ a single date rather than the entire array.
os.remove('tmp.csv')
.. _io.csv.mixed_timezones:

Parsing a CSV with mixed timezones
++++++++++++++++++++++++++++++++++

Pandas cannot natively represent a column or index with mixed timezones. If your CSV
file contains columns with a mixture of timezones, the default result will be
an object-dtype column with strings, even with ``parse_dates``.


.. ipython:: python

   content = """\
   a
   2000-01-01T00:00:00+05:00
   2000-01-01T00:00:00+06:00"""
   df = pd.read_csv(StringIO(content), parse_dates=['a'])
   df['a']

To parse the mixed-timezone values as a datetime column, pass a partially-applied
:func:`to_datetime` with ``utc=True`` as the ``date_parser``.

.. ipython:: python

   df = pd.read_csv(StringIO(content), parse_dates=['a'],
                    date_parser=lambda col: pd.to_datetime(col, utc=True))
   df['a']

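
The same conversion can also be done after reading, which is the route the ``read_csv`` docstring suggests for non-standard datetime parsing. The following is a minimal sketch, not part of the original commit: it reads the column as plain strings and then applies a partially-applied :func:`to_datetime`. The name ``to_utc`` is a hypothetical helper introduced only for illustration.

```python
import functools
import io

import pandas as pd

content = """\
a
2000-01-01T00:00:00+05:00
2000-01-01T00:00:00+06:00"""

# Read the column as plain strings; no date parsing is requested here.
df = pd.read_csv(io.StringIO(content))

# "Partially applied" means fixing utc=True ahead of time.
# `to_utc` is a hypothetical name used only in this sketch.
to_utc = functools.partial(pd.to_datetime, utc=True)

# Each offset-aware string is converted to the same instant expressed in UTC,
# so the column ends up with a single timezone.
df["a"] = to_utc(df["a"])
print(df["a"])
```

Because every value is normalized to UTC, the column can be stored with a single timezone-aware datetime dtype instead of object. The original per-row offsets are not retained.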
.. _io.dayfirst:


46 changes: 46 additions & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -648,6 +648,52 @@ that the dates have been converted to UTC
pd.to_datetime(["2015-11-18 15:30:00+05:30",
"2015-11-18 16:30:00+06:30"], utc=True)
.. _whatsnew_0240.api_breaking.read_csv_mixed_tz:

Parsing mixed-timezones with :func:`read_csv`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_csv` no longer silently converts mixed-timezone columns to UTC (:issue:`24987`).

*Previous Behavior*

.. code-block:: python

   >>> import io
   >>> content = """\
   ... a
   ... 2000-01-01T00:00:00+05:00
   ... 2000-01-01T00:00:00+06:00"""
   >>> df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
   >>> df.a
   0   1999-12-31 19:00:00
   1   1999-12-31 18:00:00
   Name: a, dtype: datetime64[ns]

*New Behavior*

.. ipython:: python

   import io
   content = """\
   a
   2000-01-01T00:00:00+05:00
   2000-01-01T00:00:00+06:00"""
   df = pd.read_csv(io.StringIO(content), parse_dates=['a'])
   df.a

As can be seen, the ``dtype`` is object; each value in the column is a string.
To convert the strings to an array of datetimes, use the ``date_parser`` argument.

.. ipython:: python

   df = pd.read_csv(io.StringIO(content), parse_dates=['a'],
                    date_parser=lambda col: pd.to_datetime(col, utc=True))
   df.a

See :ref:`whatsnew_0240.api.timezone_offset_parsing` for more.
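
The timestamps shown under *Previous Behavior* can be checked by hand: converting to UTC subtracts each value's own offset, so two local midnights with different offsets become two different instants. The sketch below uses only the Python standard library to illustrate that arithmetic. It is not how pandas parses dates internally, and the variable names are illustrative.

```python
from datetime import datetime, timezone

raw = ["2000-01-01T00:00:00+05:00", "2000-01-01T00:00:00+06:00"]

# fromisoformat keeps each value's own UTC offset (Python 3.7+).
parsed = [datetime.fromisoformat(s) for s in raw]

# Converting to UTC subtracts the offset from the local wall-clock time.
as_utc = [dt.astimezone(timezone.utc) for dt in parsed]

# Dropping tzinfo gives naive UTC values, which is what the old
# datetime64[ns] column displayed.
naive = [dt.replace(tzinfo=None) for dt in as_utc]
for dt in naive:
    print(dt)
```

The two printed values are ``1999-12-31 19:00:00`` and ``1999-12-31 18:00:00``, matching the old output. Note that the naive result no longer shows that a conversion took place, which is why the silent behavior was removed.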

.. _whatsnew_0240.api_breaking.period_end_time:

Time values in ``dt.end_time`` and ``to_timestamp(how='end')``
11 changes: 8 additions & 3 deletions pandas/io/parsers.py
@@ -203,9 +203,14 @@
  * dict, e.g. {{'foo' : [1, 3]}} -> parse columns 1, 3 as date and call
    result 'foo'
- If a column or index contains an unparseable date, the entire column or
- index will be returned unaltered as an object data type. For non-standard
- datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv``
+ If a column or index cannot be represented as an array of datetimes,
+ say because of an unparseable value or a mixture of timezones, the column
+ or index will be returned unaltered as an object data type. For
+ non-standard datetime parsing, use ``pd.to_datetime`` after
+ ``pd.read_csv``. To parse an index or column with a mixture of timezones,
+ specify ``date_parser`` to be a partially-applied
+ :func:`pandas.to_datetime` with ``utc=True``. See
+ :ref:`io.csv.mixed_timezones` for more.
  Note: A fast-path exists for iso8601-formatted dates.
  infer_datetime_format : bool, default False