BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST #7798

Merged
merged 1 commit into from Aug 3, 2014

Conversation

Projects
None yet
3 participants
Member

sinhrks commented Jul 18, 2014

These functions may return different result in case of DST. There seems to be 2 problems:

  • tslib.tz_convert checks DST and change deltas by adjusting pos by 1. If input has time-gaps more than 2 DST spans, result will be incorrect.
  • tslib.tz_convert_single results incorrect if input is just on DST edge.
import pandas as pd
import numpy as np
import datetime
import pytz

idx = pd.date_range('2014-03-01', '2015-01-10', freq='H')

f = lambda x: pd.tslib.tz_convert_single(x, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result = pd.tslib.tz_convert(idx.asi8, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result_single = np.vectorize(f)(idx.asi8)
result[result != result_single]
# [1394370000000000000 1394373600000000000 1394377200000000000 ...,
#  1414918800000000000 1414922400000000000 1414926000000000000]

Note

Additionally, it was modifed to close #7880 also.

Member

sinhrks commented Jul 18, 2014

Perf result. I think it increases a performance of inferred_freq for tz-aware data a bit.

...
write_store                                  |  11.6420 |   9.7557 |   1.1934 |
reindex_fillna_pad                           |   1.0404 |   0.8606 |   1.2089 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [7a61a96] : BUG: tslib.tzconvert handles DST incorrectly
Base   [30764bd] : Merge pull request #7783 from lexual/docs_categorical_tidyup
Contributor

jreback commented Jul 18, 2014

can you run the perf test here: pydata#7652 (and the 'test' in #7633,
which couldn't figure out how to put in a vbench, its basically a str(df) with a big period index).

just to make sure the perf is still ok

(your fix looks good though)

jreback added this to the 0.15.0 milestone Jul 18, 2014

Member

sinhrks commented Jul 19, 2014

OK, done.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
datetimeindex_normalize                      |   5.6077 |   5.7670 |   0.9724 |

Target [3c8e2b3] : BUG: tslib.tzconvert handles DST incorrectly
Base   [8cd3dd6] : Merge pull request #7652 from jreback/dst_transitions
Member

sinhrks commented Jul 19, 2014

And test in #7633 takes few seconds.

import pandas as pd

import numpy as np
period = 1.0 / 2048 * 1e9
freq = pd.datetools.Nano(period)
df = pd.DataFrame({'a': np.random.randn(6e6)}, index=pd.date_range('now', periods=6e6, freq=freq, tz='EST'))
print(df.a)
...
# 2014-07-19 03:08:12.685023438-05:00    0.650118
# 2014-07-19 03:08:12.685511719-05:00   -0.683781
# Freq: 488281N, Name: a, Length: 6000000
# [Finished in 1.9s]
Contributor

rockg commented Jul 20, 2014

Should we add a test for the fall DST transition similar to what you did for spring? I understand there is not really any specific "spring" code per se, but sometimes strange things popup. Otherwise the changes look fine.

Member

sinhrks commented Jul 22, 2014

@rockg Yep, added tests. Could you check whether it is valid, otherwise lmk better one?

@jreback jreback commented on an outdated diff Jul 22, 2014

pandas/tseries/tests/test_timezones.py
@@ -784,6 +784,43 @@ def test_utc_with_system_utc(self):
# check that the time hasn't changed.
self.assertEqual(ts, ts.tz_convert(dateutil.tz.tzutc()))
+ def test_tslib_tz_convert_trans_pos_plus_1__bug(self):
+ # Regression test for tslib.tz_convert(vals, tz1, tz2).
+ # See https://github.com/pydata/pandas/issues/4496 for details.
+ idx = date_range(datetime(2011, 3, 26, 23), datetime(2011, 3, 27, 1), freq='1min')
+ idx = idx.tz_localize('UTC')
+ idx = idx.tz_convert('Europe/Moscow')
+
+ expected = np.repeat(np.array([3, 4, 5]), np.array([60, 60, 1]))
+ self.assert_numpy_array_equal(idx.hour, expected)
+
+ def test_tslib_tz_convert_dst(self):
+ # Start DST
+ idx = date_range('2014-03-08 23:00', '2014-03-09 09:00', freq='1min', tz='UTC')
@jreback

jreback Jul 22, 2014

Contributor

can you add a freq loop around this with several freqs? (I know a bit annoying as you have to specify the result as well). you have T, maybe H and D? (as D should be unaffected by the DST transitions, while T and H are)

Contributor

rockg commented Jul 22, 2014

That's exactly what I was thinking.

Member

sinhrks commented Jul 22, 2014

OK, modified test to cover other freqs.

Member

sinhrks commented Jul 30, 2014

Added the fix for #7880.

Member

sinhrks commented Aug 2, 2014

@jreback I think it is ready, lmk if anything.

Contributor

jreback commented Aug 2, 2014

looks fine, what was the problem with #7880 ? (e.g. what fixed it)

Member

sinhrks commented Aug 3, 2014

Currently, it includes 2 fixes (described in release note)

  • Original issue is difference between tz_convert and tz_convert_single described in problem description.
  • Additionally, added the fix for #7880 after receiving the issue. (because #7880 can be fixed by small modification on the same part)

@jreback jreback added a commit that referenced this pull request Aug 3, 2014

@jreback jreback Merge pull request #7798 from sinhrks/tz_convert_bug
BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST
ce5a4ef

@jreback jreback merged commit ce5a4ef into pandas-dev:master Aug 3, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
Contributor

jreback commented Aug 3, 2014

thanks!

sinhrks deleted the sinhrks:tz_convert_bug branch Aug 4, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment