Off-by-one microsecond when parsing with certain timezones #74
It's not limited to parsing:

```python
>>> import pendulum
>>> pendulum.create(2016, 11, 12, 2, 9, 39, 594000, 'America/Panama').microsecond
593999
```

After checking, it seems that the wrong value is due to the C extension used to calculate offsets. If you install the library without the C extension, the problem disappears.
It seems the difference between the Python and C versions is that Python rounds the value before casting to int, which the C version does not.

C:

```c
microsecond = (int64_t) (unix_time * 1000000) % 1000000;
if (microsecond < 0) {
    microsecond += 1000000;
}
```

Python:

```python
microsecond = int(round(unix_time % 1, 6) * 1e6)
```

Taking the fractional part exposes the floating point error:

```python
>>> 1.594000 % 1
0.5940000000000001
>>> 123.594000 % 1
0.5939999999999941
```

and without the rounding, truncation is off by one:

```python
>>> int((123.594000 % 1) * 1e6)
593999
>>> int(round(123.594000 % 1, 6) * 1e6)
594000
```
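The two code paths can be compared side by side in plain Python. This is a sketch: the function names are mine, and the C extension's behavior is modelled on the fractional part rather than the full `int64_t` cast it actually performs.

```python
def truncating_microsecond(unix_time):
    # Truncate the scaled fraction, as the C extension effectively did.
    return int((unix_time % 1) * 1e6)

def rounding_microsecond(unix_time):
    # Round to 6 decimal places first, as the Python implementation does.
    return int(round(unix_time % 1, 6) * 1e6)

print(truncating_microsecond(123.594000))  # 593999 (off by one)
print(rounding_microsecond(123.594000))    # 594000
```

Rounding works here because the stored double is within a half-microsecond of the intended value; it only papers over the representation error rather than removing it.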
As described in sdispater#74 there is a difference in the behavior of the C extension and the Python implementation in terms of handling microseconds. The problem is that a microsecond value passed in may have an off-by-one difference for certain values, due to how floating point numbers are handled, unless rounding is performed.

```python
>>> import pendulum
>>> pendulum.create(2016, 11, 12, 2, 9, 39, 594000, 'America/Panama').microsecond
593999
```

This commit adds the same rounding behavior to the C extension as used in the Python version.
As @dokai pointed out, it's a floating point truncation error, but I think this is similar to what I described in #71. Always rounding instead of truncating would be a workaround to the issue, but I think Pendulum should never use floating point numbers internally, unless the result itself has to be a float. In this case, it goes back to the timezone code:

```python
unix_time = tr.unix_time - (tr.pre_time - dt).total_seconds()
unix_time = tr.unix_time + (dt - tr.time).total_seconds()
```

For the given example:

```python
>>> import pendulum
>>> from datetime import datetime
>>> dt = datetime(2016, 11, 12, 2, 9, 39, 594000)
>>> tz = pendulum.timezone("America/Panama")
>>> tr = tz.transitions[-1]
```

This method would be internally called on creation:

```python
>>> tz._normalize(dt, "post")
(2016, 11, 12, 2, 9, 39, 593999, <TimezoneInfo [America/Panama, -18000, False]>)
```

And it does this:

```python
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> offset = tz._tzinfos[tr._transition_type_index].offset
>>> pendulum._extensions._helpers.local_time(unix_time, offset)
(2016, 11, 12, 2, 9, 39, 593999)
```

But for very high year values, the microsecond is simply lost (i.e., the double can no longer represent it):

```python
>>> dt = datetime(2316, 11, 12, 2, 9, 39, 857)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time  # Rounding/truncating wouldn't be enough
'10945955379.000856399536132812'
>>> dt = datetime(2222, 11, 12, 2, 9, 39, 1454)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time  # Neither rounding nor truncating would do it, again
'7979584179.001453399658203125'
>>> dt = datetime(2180, 7, 4, 13, 16, 8, 12)
>>> unix_time = tr.unix_time + (dt - tr.time).total_seconds()
>>> "%.18f" % unix_time
'6643016168.000011444091796875'
```
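The magnitude problem can be checked with nothing but the standard library. The sketch below (using `math.ulp`, available since Python 3.9) shows both effects: adding a large epoch offset to a clean fractional second already perturbs it, and at the year-2316 magnitude the spacing between adjacent doubles exceeds one microsecond. The epoch-scale literals are taken from the examples above.

```python
import math

# Adding a large epoch offset to a clean fractional second already
# perturbs the fraction: the sum must snap to the coarser double grid
# at ~1.5e9, so subtracting the offset back does not recover 0.594.
print((1478934579 + 0.594) - 1478934579)  # not exactly 0.594 any more

# At the year-2316 magnitude, the intended .000857 fraction cannot be
# stored; the nearest representable double is .000856399536...:
print("%.18f" % 10945955379.000857)  # 10945955379.000856399536132812

# The gap between adjacent doubles here is larger than one microsecond,
# so no amount of rounding or truncating can recover the lost digit:
print(math.ulp(10945955379.000857))  # 1.9073486328125e-06
```

This is why the comment above says rounding wouldn't be enough: once the spacing of representable doubles is coarser than 1e-6, the microsecond is gone before any rounding strategy can run.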
@danilobellini I agree that working with floating point numbers is prone to error and approximation. I will check if I can cook up a better implementation to be sure we don't lose information.
Commit 184b94a fixes the reported cases:

```python
>>> import pendulum
>>> dt = pendulum.parse('2016-11-12T02:09:39.594000', 'America/Panama')
>>> dt.isoformat()
'2016-11-12T02:09:39.594000-05:00'
>>> dt.microsecond
594000
>>> dt = pendulum.create(2316, 11, 12, 2, 9, 39, 857, 'America/Panama')
>>> dt.isoformat()
'2316-11-12T02:09:39.000857-05:00'
>>> dt.microsecond
857
```

Basically, microseconds are now treated separately to avoid having to round the value.
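The integer-based idea can be sketched in pure Python (illustrative only, not the code from the commit): `datetime` subtraction yields a `timedelta` whose `days`/`seconds`/`microseconds` fields are exact integers, so carrying the microsecond separately survives even the year-2316 case that breaks the `total_seconds()` float path. The helper name is hypothetical.

```python
from datetime import datetime

def split_epoch(dt, epoch=datetime(1970, 1, 1)):
    # Illustrative helper: integer whole seconds plus integer
    # microseconds, with no float anywhere in between.
    delta = dt - epoch
    return delta.days * 86400 + delta.seconds, delta.microseconds

whole, micro = split_epoch(datetime(2316, 11, 12, 2, 9, 39, 857))
print(micro)  # 857: preserved exactly

# The float path loses it at this magnitude:
delta = datetime(2316, 11, 12, 2, 9, 39, 857) - datetime(1970, 1, 1)
print(round(delta.total_seconds() % 1, 6))  # 0.000856: off by one
```

Keeping seconds and microseconds in separate integer fields is the same design choice `timedelta` itself makes, which is why it never exhibits this bug.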
I found a curious behavior when parsing timestamps where the microsecond value is off by one depending on the timezone used. As a minimal example, I'm attempting to parse the string `2016-11-12T02:09:39.594000`. Parsing with the default timezone works as expected, but using a particular timezone shows the off-by-one difference. Several timezones (though by no means all) have this behavior. I don't think anything in the particular choice of time or timezone should result in a difference of a single microsecond.