Time zone awareness in perspective-python #867
Merged
This PR implements and tests `perspective-python`'s time zone semantics and explains how Perspective handles time zones.

The Problem
In the browser, time zones do not pose a problem because all times are localized to the browser's time zone. When `perspective-python` runs on a server, however, the server may not be in the same time zone as the client, so time zone handling must be defined explicitly.

Currently, `perspective-python` assumes that all `datetime` and `Timestamp` objects are in local time, and the `tzinfo` attribute is ignored. Internally, the C++ engine stores datetime values as Unix timestamps in milliseconds since epoch, and the conversion from `datetime` to Unix timestamp is not time zone aware.

When datetime values are serialized through `to_dict`, `to_records`, etc., PyBind treats the Unix timestamp as local time, and the resulting `datetime` object has no `tzinfo` attribute and is in local time.

When the data from Python is consumed by a `perspective-viewer` in the browser, it is serialized back to a Unix timestamp before being sent over the network, and the browser then calls `new Date()` on the timestamp value to create the final representation.

Because `new Date()` interprets the timestamp in the browser's local time zone, which may differ from the server's local time zone, the values passed into `perspective-python` can differ from the values a user sees in `perspective-viewer`. The unclear and unspecified semantics around datetime and time zone handling make this harder to reason about. This PR attempts to codify those semantics and explain the resulting behavior.

The Solution
This PR modifies the date validator in Python to convert all time zone aware `datetime` and `Timestamp` objects (or any object passed in that has the `tzinfo` attribute set) to UTC before the timestamp is stored in Perspective. Naive datetimes are assumed to be in local time and are processed as-is, without conversion. When the timestamp is serialized, it is converted (via PyBind) to local time as determined by the Python runtime.
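The validator rule above can be modelled in a few lines of standard-library Python. This is a sketch, not Perspective's actual validator: `normalize_datetime` is a hypothetical name, and the fixed UTC-5 offset stands in for a real time zone. Because `pandas.Timestamp` subclasses `datetime`, the same `tzinfo` check would apply to it.

```python
from datetime import datetime, timedelta, timezone


def normalize_datetime(value: datetime) -> datetime:
    """Hypothetical sketch of the validator rule (not Perspective's code).

    Values with ``tzinfo`` set are converted to UTC; naive values pass
    through unchanged and are therefore interpreted as local time.
    """
    if value.tzinfo is not None:
        return value.astimezone(timezone.utc)
    return value


eastern = timezone(timedelta(hours=-5))  # fixed offset, for illustration only

print(normalize_datetime(datetime(2020, 1, 1, 9, 0, tzinfo=eastern)))
# 2020-01-01 14:00:00+00:00
print(normalize_datetime(datetime(2020, 1, 1, 9, 0)))
# 2020-01-01 09:00:00
```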
`perspective-viewer` will continue to interpret timestamps with `new Date()` in local time as determined by the browser.

This implementation does not change the behavior of `perspective-python` for naive datetimes, but it allows us to use aware datetimes and view them in local time:

The same datetime, but as a naive datetime:
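To make the aware/naive difference concrete without a running server, the following standard-library model stores each value as milliseconds since epoch and renders it in a given zone. It assumes, for illustration, a server running in UTC and a browser at a fixed UTC-5 offset; `store_ms` and `render` are hypothetical helpers that mirror the rules described above, not Perspective APIs.

```python
from datetime import datetime, timedelta, timezone

UTC_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Illustrative zones (assumptions): the server runs in UTC and the
# browser sits at a fixed UTC-5 offset (e.g. New York in winter).
SERVER_TZ = timezone.utc
BROWSER_TZ = timezone(timedelta(hours=-5))


def store_ms(value: datetime, local_tz: timezone) -> int:
    """Hypothetical model of storage as milliseconds since epoch.

    Aware values keep their instant (equivalent to converting to UTC);
    naive values are read as wall-clock time in ``local_tz``.
    """
    if value.tzinfo is None:
        value = value.replace(tzinfo=local_tz)
    # Exact integer arithmetic: timedelta // timedelta returns an int.
    return (value - UTC_EPOCH) // timedelta(milliseconds=1)


def render(ms: int, local_tz: timezone) -> datetime:
    """Hypothetical model of display: the instant as naive wall-clock time in ``local_tz``."""
    instant = UTC_EPOCH + timedelta(milliseconds=ms)
    return instant.astimezone(local_tz).replace(tzinfo=None)


aware = datetime(2020, 1, 1, 9, 0, tzinfo=BROWSER_TZ)  # 09:00 at UTC-5
naive = datetime(2020, 1, 1, 9, 0)                      # 09:00, no tzinfo

for label, value in [("aware", aware), ("naive", naive)]:
    ms = store_ms(value, SERVER_TZ)
    print(label, "server:", render(ms, SERVER_TZ), "browser:", render(ms, BROWSER_TZ))
```

Under these assumptions the aware value renders as 14:00 on the server and 09:00 in the browser, so the viewer sees the wall-clock time it was created with, while the naive value is taken as 09:00 server time and therefore appears as 04:00 in the browser.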
Caveat: Pandas DataFrames
For Pandas DataFrames, any `datetime64` columns will always be treated as UTC and always serialized from Perspective in local time. For timestamps to be serialized exactly as they were entered, use `tz_localize` or `tz_convert` from Pandas to localize the Timestamps into your local time zone.

Changelog
- Convert time zone aware `datetime` and `pandas.Timestamp` objects to UTC
- … `pandas.Timestamp`s
- … for all of the above
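Returning to the Pandas caveat above, here is a minimal sketch of the localization step it recommends. It assumes a naive `datetime64` column whose wall-clock values were recorded in `America/New_York` (an illustrative choice): `Series.dt.tz_localize` attaches that zone without changing the wall-clock values, and `Series.dt.tz_convert` expresses the same instants in another zone. Handing the resulting frame to a Perspective table is not shown.

```python
import pandas as pd

# A naive datetime64 column; the wall-clock values are assumed (for
# illustration) to have been recorded in New York local time.
df = pd.DataFrame(
    {"ts": pd.to_datetime(["2020-01-01 09:00", "2020-06-01 09:00"])}
)

# tz_localize attaches a time zone without changing the wall-clock values,
# making the column time zone aware (EST in January, EDT in June).
df["ts"] = df["ts"].dt.tz_localize("America/New_York")

# tz_convert keeps the same instants but expresses them in another zone.
utc_view = df["ts"].dt.tz_convert("UTC")

print(df["ts"].tolist())
print(utc_view.tolist())
```

Because the localized column carries its offset, each value identifies a single instant (14:00 and 13:00 UTC here) rather than an ambiguous wall-clock time.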