
Timestamp.tz_localize() NonExistentTimeError handling #8917

Closed
mskrajnowski opened this issue Nov 28, 2014 · 16 comments · Fixed by #22644
Labels: Enhancement Timezones Timezone data dtype
Comments

@mskrajnowski

I'm trying to use pandas to speed up timezone conversions on many datetimes, but I can't get around NonExistentTimeErrors. Pandas' tz_localize() seems to ignore the ambiguous argument in the non-existent time case.

Example:

import datetime

import pytz
import pandas

tz = pytz.timezone('Europe/Warsaw')
non_existent = datetime.datetime(2015, 3, 29, 2, 30)

tz.normalize(tz.localize(non_existent))
#2015-03-29 03:30:00+02:00

tz.normalize(tz.localize(non_existent, is_dst=False))
#2015-03-29 03:30:00+02:00

pandas.Timestamp(non_existent).tz_localize(tz)
# NonExistentTimeError: 2015-03-29 02:30:00

pandas.Timestamp(non_existent).tz_localize(tz, ambiguous=0)
# NonExistentTimeError: 2015-03-29 02:30:00

It would be nice if the ambiguous argument worked the same way as is_dst in pytz. As it stands, it's impossible, as far as I know, to reliably localize a series of datetimes using pandas, since any of them might raise a NonExistentTimeError and there's no way of telling pandas what to do with such datetimes.
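For reference, a sketch of an element-wise workaround based on the pytz calls above: a hypothetical localize_pytz_style helper (not part of pandas) that applies localize plus normalize per value, so it avoids NonExistentTimeError but also gives up the vectorized speed that motivated this issue.

import datetime

import pandas
import pytz

tz = pytz.timezone('Europe/Warsaw')

# Hypothetical helper: localize a naive Timestamp the way pytz does,
# letting normalize() push a nonexistent wall time onto a valid one.
def localize_pytz_style(ts, tz):
    return pandas.Timestamp(tz.normalize(tz.localize(ts.to_pydatetime())))

naive = pandas.Series(pandas.to_datetime(['2015-03-29 01:30', '2015-03-29 02:30']))
naive.map(lambda ts: localize_pytz_style(ts, tz))
# 01:30 stays 01:30+01:00; the nonexistent 02:30 comes back as 03:30+02:00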

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.1
nose: 1.3.4
Cython: None
numpy: 1.9.1
scipy: None
statsmodels: None
IPython: 2.3.1
sphinx: 1.2.2
patsy: None
dateutil: 2.2
pytz: 2013.9
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: 0.9
apiclient: 1.3.1
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.4 (dt dec pq3 ext)
@jorisvandenbossche
Member

cc @rockg @ischwabacher

@rockg
Contributor

rockg commented Nov 28, 2014

ambiguous is only structured to differentiate duplicate times in the fall transition. It would be very easy to extend to the spring transition.
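For contrast, a small sketch (against a recent pandas) of what the ambiguous flag already handles for the fall transition, using Europe/Warsaw on 2014-10-26, when the wall time 02:30 occurs twice:

import pandas as pd

dup = pd.Timestamp('2014-10-26 02:30')  # ambiguous: clocks fall back from 03:00 to 02:00

dup.tz_localize('Europe/Warsaw', ambiguous=True)
# Timestamp('2014-10-26 02:30:00+0200', tz='Europe/Warsaw')  -- first (DST) occurrence

dup.tz_localize('Europe/Warsaw', ambiguous=False)
# Timestamp('2014-10-26 02:30:00+0100', tz='Europe/Warsaw')  -- second (standard-time) occurrence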

@jreback added Enhancement Timezones Timezone data dtype labels Nov 28, 2014
@jreback added this to the 0.16.0 milestone Nov 28, 2014
@rockg
Contributor

rockg commented Nov 30, 2014

@mskrajnowski I have been thinking about this more and wonder whether your data is really in a fixed standard time zone, in which case you should be localizing to that and then converting. For example, what does the data look like for that spring DST change day? Are there 24 hours or just 23? Maybe the localizing issue is masking something else.

@ischwabacher
Contributor

It seems to me that the only overlap between the sorts of options that make sense for handling ambiguous times and those that make sense for nonexistent times is NaT and raise. Here are the other possible behaviors I can come up with, though some of them are pretty wacky:

  1. Choose the time at the jump. So Timestamp('2014-03-09 02:30:00').tz_localize('America/Chicago') is Timestamp('2014-03-09 03:00:00-0500', tz='America/Chicago').
  2. Apply the (non-DST) offset before the discontinuity to the given time, then normalize (in the pytz sense). So Timestamp('2014-03-09 02:30:00').tz_localize('America/Chicago') becomes Timestamp('2014-03-09 03:30:00-0500', tz='America/Chicago').
  3. Apply the (DST) offset after the discontinuity to the given time, then normalize.
  4. Apply the "before" offset, then don't normalize. This yields a nonexistent time, but would allow date_range to be emulated by repeated subtraction of an offset from a Timestamp (so that, for instance, Timestamp('2014-03-10 02:30:00-0500', tz='America/Chicago') - Day() - Day() could equal Timestamp('2014-03-08 02:30:00-0600', tz='America/Chicago')). But this is a lot of crazy complexity for an invariant that I'm starting to think is a bad idea anyway.
  5. Apply the "after" offset, then don't normalize. This is like the previous option but for repeated addition instead of subtraction.

How do these match up against the options for ambiguous time handling? Does the knob for nonexistent time handling need to be separate from the one for ambiguous times?

@mskrajnowski
Author

@rockg I'm implementing a scheduling application. The user defines working hours in his/her own timezone, and I combine the times provided by the user with dates to get the actual work start and end UTC datetimes. I can't really forbid the user from setting 02:30 as a work start/end time (maybe he/she likes to wake up in the middle of the night and code ;) ), so I need a way to reliably localize any datetime. Even if a given time technically doesn't exist, I still need to output some logical UTC datetime.

@ischwabacher I'd go with the way pytz handles non-existent/ambiguous times.

@ischwabacher
Contributor

Unfortunately, pytz uses is_dst=True to mean option 5, is_dst=False to mean option 4, and is_dst=None to mean raise. This is partly due to pytz's workaround for limitations in the datetime API, which are tentatively scheduled to be fixed in python3.5 (woohoo!), so we will probably see pytz switch to behaviors 3 and 2, respectively.

One issue I have here is that is_dst=True returns a time that (once normalized) is not DST but would be the given time if it were DST, while is_dst=False returns one that is DST but would be the given time if it were not DST. I am not sure whether this is more or less confusing than swapping them.
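To make that concrete, here is a quick check of those pytz calls for America/Chicago's 2014 spring transition (standard time is -06:00, DST is -05:00); the commented outputs are what pytz returns for this nonexistent wall time:

import datetime
import pytz

tz = pytz.timezone('America/Chicago')
gap = datetime.datetime(2014, 3, 9, 2, 30)  # nonexistent: clocks jump from 02:00 to 03:00

tz.localize(gap, is_dst=True)                 # 2014-03-09 02:30:00-05:00 (option 5, not a real wall time)
tz.normalize(tz.localize(gap, is_dst=True))   # 2014-03-09 01:30:00-06:00 (option 3: lands on standard time)
tz.localize(gap, is_dst=False)                # 2014-03-09 02:30:00-06:00 (option 4)
tz.normalize(tz.localize(gap, is_dst=False))  # 2014-03-09 03:30:00-05:00 (option 2: lands on DST)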

Also, if your users set a time of 2:30 as a work start/end time, do you think your users will be least surprised by an alarm at 1:30, 3:00, or 3:30? Does it depend on whether it's a start or end time?

@mskrajnowski
Author

Since 2:00 becomes 3:00 in that transition, IMHO it's logical that 2:30 would become 3:30. That's what I get with pytz using localize and normalize (without passing is_dst). Of course, the question then becomes what to do with a time interval of 2:30 - 3:00, which would become 3:30 - 3:00. Still, I'd rather deal with that problem, because then I have better data to work with.

@mskrajnowski
Author

Another thing that would help is an API that would let us fix ambiguous and non-existent times. Maybe tz_localize could return NaT along with information about why a time wasn't localized? With that information I could, for example, add an hour to non-existent times and retry. At the moment tz_localize raises on the first error, which isn't very helpful when working with series.
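A rough sketch of that fix-and-retry idea as it can be done today, catching pytz's NonExistentTimeError per element and shifting the wall time forward an hour before localizing again (localize_or_shift is a hypothetical helper, not a pandas API):

import pandas
import pytz
from pytz.exceptions import NonExistentTimeError

tz = pytz.timezone('Europe/Warsaw')

# Hypothetical helper: try the normal localization; on a nonexistent
# wall time, add an hour and localize again.
def localize_or_shift(ts, tz):
    try:
        return ts.tz_localize(tz)
    except NonExistentTimeError:
        return (ts + pandas.Timedelta(hours=1)).tz_localize(tz)

naive = pandas.Series(pandas.to_datetime(['2015-03-29 01:30', '2015-03-29 02:30']))
naive.map(lambda ts: localize_or_shift(ts, tz))
# the nonexistent 02:30 is retried as 03:30 and localizes to 03:30+02:00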

@ischwabacher
Contributor

I definitely think there should be an option to return NaT instead of raising, but it doesn't seem feasible to attach any other information to that unless you just want a warning, which we should not emit unless it's explicitly requested (since nonexistent times can arise without necessarily coming from a programmer error).

As far as defaults go, I think we should keep raise as the default (or possibly raise for single operations and NaT for vectorized operations if we can do that?) because "errors should not pass silently // unless explicitly silenced".

@jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@esvhd

esvhd commented Jun 23, 2017

Hi guys - was this released in 0.20.2 yet? I'm seeing the same problem here.
Thanks.

@jorisvandenbossche
Member

This issue is still open, so no, this has not been fixed in 0.20.2. Contributions are welcome.

@randomgambit

@jorisvandenbossche @esvhd Same problem here. I've got some timestamps in Central time, and trying to localize them to EST gives me this error. Is there a quick and dirty fix for that? Thanks!

@rockg
Contributor

rockg commented Jul 17, 2017

There should be nothing wrong with localizing to EST, as that zone does not have DST offsets. Can you post your example?

In [10]: ts = pd.Timestamp("2017-03-12 02:30")

In [11]: ts.tz_localize("EST")
Out[11]: Timestamp('2017-03-12 02:30:00-0500', tz='EST')

@randomgambit

@rockg No, I mean I convert from Central time to EST.

df.timestamp.map(lambda x: x.tz_localize('US/Central', ambiguous='NaT').tz_convert('US/Eastern').tz_localize(None))

@randomgambit

randomgambit commented Jul 17, 2017

@jorisvandenbossche @mskrajnowski @rockg would this be the pytz-equivalent (and correct) way to convert from US/Central to US/Eastern in a pandas DataFrame?

Assume df.timestamp is a naive datetime column produced by pd.to_datetime().

import pytz

tz_est = pytz.timezone('US/Eastern')
tz_central = pytz.timezone('US/Central')

df.timestamp.map(lambda x: tz_central.localize(x).astimezone(tz_est).tz_localize(None))
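For comparison, the same conversion can stay inside pandas via the Series.dt accessor; this is a sketch assuming df['timestamp'] is a naive datetime64 column as described above, and note that, per this issue, it will still raise NonExistentTimeError for wall times that fall inside the spring-forward gap:

# Vectorized equivalent using the .dt accessor instead of map()
converted = (df['timestamp']
             .dt.tz_localize('US/Central', ambiguous='NaT')
             .dt.tz_convert('US/Eastern')
             .dt.tz_localize(None))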

@randomgambit

@jorisvandenbossche @mskrajnowski @rockg Any ideas? Sorry for the spam, but this is an important question in my humble opinion :D
