Consider options for accepting UTC datetimes #145

Closed
thobbs opened this Issue May 23, 2012 · 5 comments

Projects

None yet

3 participants

@thobbs
pycassa member

Rather than always using time.mktime(datetime.timetuple()), which assumes the local timezone when converting, this needs to be configurable with an easy way to assume UTC datetimes.

@spladug

Perhaps honoring the tzinfo could work? Then all times could be converted to UTC for storage and when demarshalled have a choice of generating the naive datetime or an aware one with tzinfo=<UTC>. I'm running into this right now and would be happy to hack this up if that sounds appropriate.

@spladug

For your consideration, two patches with slightly different approaches.

Option one: maintain compatibility with the existing interface, while
quietly doing The Right Thing for timezone-aware datetime objects.
This assumes naive datetimes are in local time and deserializes to a
naive local-time datetime.

diff --git a/pycassa/marshal.py b/pycassa/marshal.py
index d935b67..b652166 100644
--- a/pycassa/marshal.py
+++ b/pycassa/marshal.py
@@ -6,6 +6,7 @@ in Cassandra.
 import uuid
 import time
 import struct
+import calendar
 from datetime import datetime

 import pycassa.util as util
@@ -64,7 +65,10 @@ def _get_composite_name(typestr):
 def _to_timestamp(v):
     # Expects Value to be either date or datetime
     try:
-        converted = time.mktime(v.timetuple())
+        if v.tzinfo:
+            converted = calendar.timegm(v.utctimetuple())
+        else:
+            converted = time.mktime(v.timetuple())
         converted = converted * 1e3 + getattr(v, 'microsecond', 0)/1e3
     except AttributeError:
         # Ints and floats are valid timestamps too

Option two: break compatibility and assume that naive datetime objects
are in UTC, and do the right thing for aware objects. Deserialization
would generate naive datetimes in UTC.

diff --git a/pycassa/marshal.py b/pycassa/marshal.py
index d935b67..1b9aec4 100644
--- a/pycassa/marshal.py
+++ b/pycassa/marshal.py
@@ -4,8 +4,8 @@ in Cassandra.
 """

 import uuid
-import time
 import struct
+import calendar
 from datetime import datetime

 import pycassa.util as util
@@ -64,7 +64,7 @@ def _get_composite_name(typestr):
 def _to_timestamp(v):
     # Expects Value to be either date or datetime
     try:
-        converted = time.mktime(v.timetuple())
+        converted = calendar.timegm(v.utctimetuple())
         converted = converted * 1e3 + getattr(v, 'microsecond', 0)/1e3
     except AttributeError:
         # Ints and floats are valid timestamps too
@@ -269,10 +269,10 @@ def unpacker_for(typestr):

     elif data_type == 'DateType':
         if _have_struct:
-            return lambda v: datetime.fromtimestamp(
+            return lambda v: datetime.utcfromtimestamp(
                     _long_packer.unpack(v)[0] / 1e3)
         else:
-            return lambda v: datetime.fromtimestamp(
+            return lambda v: datetime.utcfromtimestamp(
                     struct.unpack('>q', v)[0] / 1e3)

     elif data_type == 'BooleanType':

I'd personally prefer the latter for my use case since I'd be able to just
do value.replace(tzinfo=pytz.UTC) rather than converting back from local
time to UTC, but thought it may be more important not to break clients.

@cmcmaugh

I think the 2nd approach is preferable for a number of reasons:

  • best practice for applications is to use UTC internally and convert to local time as close to presentation as possible
  • pycassa is predominantly used in server applications and server location has no real relationship to where the application is used, and correct default behaviour should not depend on the server's timezone being set to UTC.
  • the defaulting to local time for naive datetimes is particularly insidious if you live in Ireland/UK timezone where localtime is the same as UTC for half the year
@thobbs
pycassa member

Thanks for the patches and thoughts, I really appreciate it.

I definitely agree that keeping datetimes in UTC as long as possible is ideal. I was hesitant to break backwards compatibility, but dealing with timezones can be such a pain that I think it might be worth it. So, I agree that we should go for the second approach.

Do you think there would be any value in seeing if pytz is available and setting the tzinfo to pytz.utc when unpacking? At least in that case, it would be clear what the datetime is and there would be less manipulation needed for conversion later on. Perhaps just including the UTC implementation from pytz and always using that would be a better option, instead.

Thoughts?

@spladug

Yay, I like option 2 a lot more :)

It sounds fair to me to use pytz.UTC when it's available. The UTC object is a singleton that does some extra work to unpickle to the same object, so it may make sense not to copy that code. That said, other than the pickling stuff it's the same as the one in the datetime docs, so maybe it's fine to copy in?

@thobbs thobbs closed this May 29, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment