Time zone awareness in perspective-python #867

Merged
merged 3 commits into from Jan 15, 2020

@sc1f sc1f commented Jan 10, 2020

This PR implements and tests perspective-python's time zone semantics, and explains how Perspective handles time zones.

The Problem

In the browser use case, time zones do not pose a problem as all times are localized to the browser's time zone. When running perspective-python on the server, however, the server may not be in the same time zone as the client, and time zone handling must be defined.

Currently, perspective-python assumes that all datetime and Timestamp objects are in local time, and ignores the tzinfo attribute. Internally, the C++ engine stores datetime values as Unix timestamps in milliseconds since epoch, and the conversion from datetime to Unix timestamp is not time zone aware.

When datetime values are serialized through to_dict, to_records, etc., Pybind treats the Unix timestamp as local time, and the resulting datetime object has no tzinfo attribute and is in local time.

When the data from Python is consumed by a perspective-viewer in the browser, it is serialized back to a Unix timestamp before being sent over the network, and then the browser calls new Date() on the timestamp value to create the final representation inside the browser.

Because new Date() renders the timestamp in the browser's local time, which may differ from local time on the server, the values passed into perspective-python can differ from the values a user sees in perspective-viewer. The unclear and unspecified semantics around datetime and time zone handling make this harder to reason about. This PR attempts to codify time zone semantics and explain the resulting behavior.
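To make the mismatch concrete, here is a minimal sketch using only the standard library. It is not Perspective code: SERVER_TZ, BROWSER_TZ, and wall_clock are hypothetical names, and fixed UTC offsets stand in for real time zones. It shows one stored millisecond timestamp producing two different wall-clock values, depending on which zone renders it.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets stand in for real time zones (assumption for illustration).
SERVER_TZ = timezone(timedelta(hours=-5), "EST")   # hypothetical server zone
BROWSER_TZ = timezone(timedelta(hours=-8), "PST")  # hypothetical browser zone


def wall_clock(epoch_ms, tz):
    """Render a millisecond Unix timestamp as a naive wall-clock datetime in tz."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz).replace(tzinfo=None)


# 08:30:45 as entered on the server, stored as milliseconds since epoch.
entered = datetime(2020, 1, 10, 8, 30, 45, tzinfo=SERVER_TZ)
epoch_ms = int(entered.timestamp() * 1000)

server_view = wall_clock(epoch_ms, SERVER_TZ)
browser_view = wall_clock(epoch_ms, BROWSER_TZ)

print(server_view)   # 2020-01-10 08:30:45
print(browser_view)  # 2020-01-10 05:30:45
```

Both values describe the same instant; only the zone used for display differs.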

The Solution

This PR modifies the date validator in Python to convert all time zone aware datetime and Timestamp objects (or any object with the tzinfo attribute set) to UTC before storing the timestamp in Perspective. Naive datetimes are assumed to be in local time and are processed as-is, without conversion.
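The rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the validator from this PR: to_epoch_ms and EPOCH_UTC are hypothetical names, and fixed UTC offsets are used in place of pytz zones so the sketch needs only the standard library.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(value):
    """Hypothetical helper: turn a datetime into milliseconds since epoch."""
    if value.tzinfo is not None:
        # Aware datetime: normalize to UTC first.
        aware = value.astimezone(timezone.utc)
    else:
        # Naive datetime: astimezone() interprets it as system local time.
        aware = value.astimezone()
    return (aware - EPOCH_UTC) // timedelta(milliseconds=1)


pst = timezone(timedelta(hours=-8))  # fixed offset standing in for PST
est = timezone(timedelta(hours=-5))  # fixed offset standing in for EST

in_pst = datetime(2020, 1, 10, 8, 30, 45, tzinfo=pst)
in_est = datetime(2020, 1, 10, 11, 30, 45, tzinfo=est)

# The same instant expressed in two zones maps to one stored value.
print(to_epoch_ms(in_pst) == to_epoch_ms(in_est))  # True
```

Normalizing aware values to a single reference zone before storage means the stored integer identifies an instant, independent of the zone the caller happened to use.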

When the timestamp is serialized, it is converted (via Pybind) to local time as determined by the Python runtime. perspective-viewer will continue to display the result of new Date() in local time as determined by the browser.
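The serialization side can be approximated in pure Python. This is only a stand-in for the conversion described above, with from_epoch_ms as a hypothetical name: the stored milliseconds become a naive datetime in whatever zone the Python runtime considers local.

```python
from datetime import datetime, timezone


def from_epoch_ms(epoch_ms):
    """Hypothetical helper: milliseconds since epoch to a naive local datetime."""
    # With no tz argument, fromtimestamp() uses the system local time zone.
    return datetime.fromtimestamp(epoch_ms / 1000)


stored = 1578673845000  # 2020-01-10 16:30:45 UTC
local = from_epoch_ms(stored)

print(local.tzinfo)                      # None
print(local.astimezone(timezone.utc))    # 2020-01-10 16:30:45+00:00
```

The printed wall-clock value of local depends on the machine running the code, which is why the output above only shows the parts that do not vary.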

This implementation does not change the behavior of perspective-python for naive datetimes, but it allows us to use aware datetimes and view them in local time:

import pytz, datetime, perspective
pst = pytz.timezone("America/Los_Angeles")
# 8AM PST
data = {"time": [pst.localize(datetime.datetime(2020, 1, 10, 8, 30, 45))]}
table = perspective.Table(data)
print(table.view().to_dict())
# 11AM EST (my local time)
# {"time": [datetime.datetime(2020, 1, 10, 11, 30, 45)]}

The same datetime, but as a naive datetime:

import datetime, perspective
# 8AM local time
data = {"time": [datetime.datetime(2020, 1, 10, 8, 30, 45)]}
table = perspective.Table(data)
print(table.view().to_dict())
# 8AM EST (local time)
# {"time": [datetime.datetime(2020, 1, 10, 8, 30, 45)]}

Caveat: Pandas DataFrames

For Pandas DataFrames, datetime64 columns are always treated as UTC and always serialized from Perspective in local time. For timestamps to be serialized exactly as they were entered, use tz_localize or tz_convert from Pandas to localize the Timestamps to your local time zone:

from datetime import datetime
import pandas, pytz, perspective

EST = pytz.timezone("US/Eastern")

# Localized to EST, my local time
data = pandas.DataFrame({
  "time": [pandas.Timestamp(datetime(2019, 3, 9, 12, 30)).tz_localize(EST)]
})

table = perspective.Table(data)
print(table.view().to_records())
# [{"time": datetime(2019, 3, 9, 12, 30)}]

Changelog

  • Converts aware datetime and pandas.Timestamp objects to UTC
  • Documents time zone semantics within Python and C++ code
  • Comprehensive test suite asserting correctness for:
    • pytz and dateutil.tz timezones
    • Non-conversion of naive datetimes, and conversion of aware datetimes
    • Conversion of datetimes from one timezone to another, including DST correctness
    • Conversion of pandas.Timestamps for all of the above

@texodus texodus left a comment

Thanks for the PR! Awesome work and great write up!

@texodus texodus merged commit 3dfcdfe into master Jan 15, 2020
6 checks passed
@texodus texodus added bug Python labels Jan 27, 2020
@texodus texodus deleted the tz branch Sep 3, 2020
Labels
bug cla-present Python
3 participants