Expand datetime support #220
Comments
cc @provinzkraut for thoughts, since you've been providing some good feedback lately. Feel free to ignore if this issue doesn't affect your use case.
Also cc @adriangb for thoughts ^^.
After thinking about this throughout the day, I'm now wondering if a simpler don't-try-to-stop-the-user-from-shooting-themselves-in-the-foot solution may be better.
Pros: simpler. No encoding configuration needed. Decoding by default roundtrips all types, and users have the flexibility to mandate aware types if needed (usually this is needed).
Cons: easy to shoot yourself in the foot by using
Still not sure what the best option is, or if this solution is better than the one above. Just something I'm thinking about. Feedback on any of this very welcome.
So, in general, this is very welcome! Broadened support for Regarding the type vs. As for the footgun. I think dropping the timezone by default isn't good, as it's very counterintuitive. If I put a timezone aware |
Thanks for the feedback!
I think there's some confusion between the options presented above - I'm not advocating for dropping the timezone at all. Apologies, I should have done a clearer writeup of what the design options and questions are. There are 4 types of objects that I'm not clear on how we should handle:

import datetime

# Note: the last numeric argument is microseconds, so ".123" seconds is 123000.

# 1. Timezone Aware Datetime: "2022-12-03T11:42:31.123+00:00"
tz_aware_datetime = datetime.datetime(
    2022, 12, 3, 11, 42, 31, 123000, datetime.timezone.utc
)
# 2. Timezone Naive Datetime: "2022-12-03T11:42:31.123"
tz_naive_datetime = datetime.datetime(2022, 12, 3, 11, 42, 31, 123000)
# 3. Timezone Aware Time: "11:42:31.123+00:00"
tz_aware_time = datetime.time(11, 42, 31, 123000, datetime.timezone.utc)
# 4. Timezone Naive Time: "11:42:31.123"
tz_naive_time = datetime.time(11, 42, 31, 123000)

Currently only the first (timezone aware datetimes) is supported for encoding and decoding. The rest can be handled through custom
The reason we don't support naive datetimes currently is that RFC3339 (the RFC we follow for our datetime encoding format) doesn't support datetimes without a timezone component. Neither does
On the decoding side the consuming code often is written to only work with aware (or naive) datetime objects. As such, I'm a bit reluctant to make the
On the other hand, a user might be confused why a
Given all that info, the open questions are:
I think it was me who was being unclear actually, and I misread something 🙃 I think in general, it would be the most intuitive approach if you get out what you put in, that is, if the input includes a timezone, the output should include a timezone. If the input doesn't include a timezone, the output shouldn't include a timezone. This is the least confusing behaviour in my opinion. Regarding the open questions this would mean:
Naive objects should be supported by default. They should not include a timezone component. This mimics the behaviour of
Naive objects should be supported by default. They should not include a timezone component. This mimics the behaviour of
Both naive and aware should be supported. If the input data includes a timezone, the result should be an aware
Both naive and aware should be supported. If the input data includes a timezone, the result should be an aware

Handling of constraints

Further constraints could be addressed with
I think there's an option missing that only accepts naive objects and errors if any timezone info is included. I would also use literal strings or an enum instead of True/False.
To support different behaviors for encoding/decoding you could have
For what it's worth I believe Pydantic just encodes/decodes naive/aware by default and offers no options (other than custom validators). I don't think I've heard many complaints in this department, despite the possible footgun. It's not uncommon for things to pass around naive timezones and assume they're UTC so I think (hope maybe) that most developers are used to double checking timestamps for timezone handling.
Thanks y'all, I think this all makes sense, and mostly matches the option presented in #220 (comment). This feedback is really helpful. One last open question - since datetimes are common, I want to provide some type aliases so the user doesn't need to write
Not critical, just curious what names seem clearest. All of these would be implemented using python type hint features:

from typing import TypeVar, Annotated
from msgspec import Meta

T = TypeVar("T")
Aware = Annotated[T, Meta(tz=True)]
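For readers skimming the thread, here is a rough stdlib-only sketch of the aware / naive / either check that a tz constraint like the one above implies. It is not msgspec's implementation. The function name decode_datetime and its tz parameter are made up for illustration, and parsing is delegated to datetime.fromisoformat rather than a strict RFC3339 parser.

```python
import datetime


def decode_datetime(text, tz=None):
    """Hypothetical helper: parse an ISO 8601 string and enforce a tz constraint.

    tz=True  -> only timezone-aware values are accepted
    tz=False -> only timezone-naive values are accepted
    tz=None  -> either is accepted
    """
    dt = datetime.datetime.fromisoformat(text)
    # A datetime is aware when it carries a usable UTC offset.
    is_aware = dt.utcoffset() is not None
    if tz is True and not is_aware:
        raise ValueError("expected a timezone-aware datetime")
    if tz is False and is_aware:
        raise ValueError("expected a timezone-naive datetime")
    return dt


aware = decode_datetime("2022-12-03T11:42:31.123+00:00", tz=True)
naive = decode_datetime("2022-12-03T11:42:31.123", tz=False)
```

With tz=None the value round-trips as given: a string with an offset comes back aware, and one without comes back naive.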
Ok, this is mostly done in #224. Added a new Remaining todos:
Closing this now that
Currently msgspec supports encoding and decoding only timezone-aware datetime.datetime objects, holding strict conformance to RFC3339. Naive datetime.datetime objects can be encoded using a custom enc_hook, but there's no way to decode a naive datetime.datetime object.

I would like to expand our builtin support for datetime types to include:

datetime.datetime (both aware and naive)
datetime.date
datetime.time (both aware and naive)

Here's the plan I've come up with:
Encoding

To support encoding, we add a support_naive_datetimes keyword argument to msgspec.*.encode and msgspec.*.Encoder to configure the treatment of naive datetimes. This would take one of:

False: the default. Naive datetime and time objects error on encoding.
True: allow encoding naive datetime and time objects. These will be encoded as their RFC3339 compatible counterparts, just missing the offset component.
"UTC": naive datetime and time objects will be treated as if they have a UTC timezone.
tzinfo object: naive datetime and time objects will be treated as if they have this timezone.

I'm not attached to the keyword name (or boolean options), so if someone can think of a nicer spelling I'd be happy. I think this supports all the common options.
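The four options can be sketched in plain Python to make the proposal concrete. This is only an illustration of the proposed semantics, not an existing msgspec API. The helper name encode_datetime is hypothetical, it handles datetime.datetime only, and it uses isoformat() for the string form.

```python
import datetime


def encode_datetime(dt, support_naive_datetimes=False):
    """Hypothetical helper mirroring the proposed support_naive_datetimes options."""
    if dt.utcoffset() is None:  # naive datetime
        if support_naive_datetimes is False:
            # Default: refuse naive datetimes so timezone bugs surface early.
            raise TypeError("naive datetimes are not supported")
        if support_naive_datetimes is True:
            # Same format as the aware case, just without the offset.
            return dt.isoformat()
        if support_naive_datetimes == "UTC":
            dt = dt.replace(tzinfo=datetime.timezone.utc)
        elif isinstance(support_naive_datetimes, datetime.tzinfo):
            # Treat the naive value as wall-clock time in the given zone.
            dt = dt.replace(tzinfo=support_naive_datetimes)
        else:
            raise ValueError("invalid support_naive_datetimes value")
    return dt.isoformat()


naive = datetime.datetime(2022, 12, 3, 11, 42, 31, 123000)
as_naive = encode_datetime(naive, support_naive_datetimes=True)
as_utc = encode_datetime(naive, support_naive_datetimes="UTC")
```

Aware datetimes pass through unchanged under every option, so the setting only affects values that lack an offset.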
One benefit of supporting these options builtin is that we no longer have the weird behavior of enc_hook only being called for naive datetime.datetime objects. This would admittedly be less weird if Python had different types for aware and naive datetimes.

I could hear an argument that the default should be True (encoding naive datetimes/times by default), but I'm hesitant to make that change. Having an error by default if you're using a naive datetime will force users to think about timezones early on - if they really want a naive datetime they can explicitly opt into it. Supporting naive datetimes/times by default could let programming errors slip by, since most times the user does want an aware datetime rather than a naive datetime.

Decoding
To support decoding, we want to handle the following use cases:

Since msgspec will only ever decode an object into a datetime if type information is provided, then the natural place to enable this configuration is through our existing type annotations system. The question then is - what does an unannotated datetime.datetime mean?

I want msgspec to make it easy to do the right thing, and (within reason) possible to do the flexible thing. As such, I'd argue that raw datetime.datetime and datetime.time annotations should only decode timezone-aware objects. This means that by default APIs built with msgspec are compatible with json-schema (which lacks a naive datetime/time format), and common web languages like golang (which requires RFC3339 compatible strings in JSON by default).

To support naive-datetime or any-datetime types, we'd add a new config to Meta annotations. Something like:

Like above, I don't love the timezone=True (aware), timezone=False (naive), timezone=None (aware or naive) syntax, if anyone can think of a better API spelling please let me know.

We could also add type aliases in a new submodule msgspec.types to make this easier to spell (since datetimes are common):

Msgpack Complications
Currently we use msgpack's timestamp extension (https://github.com/msgpack/msgpack/blob/master/spec.md#timestamp-extension-type) when encoding datetimes to msgpack. This extension by design only supports timezone-aware datetimes. msgpack has no standard representation for naive datetimes (or time/date objects in general). To handle this, I plan to encode naive datetimes as strings in the same format as JSON. This is an edge case that I don't expect to affect most users. I think the main benefit of supporting it is parity between types supported by both protocols.
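As a rough illustration of that split, the sketch below picks a representation for a datetime without producing any msgpack bytes. The function name msgpack_repr and the tuple shapes are invented for this example. The only grounded parts are that the timestamp extension stores seconds and nanoseconds since the Unix epoch, and that naive values would fall back to the JSON string format.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def msgpack_repr(dt):
    """Hypothetical helper: choose how a datetime would be represented."""
    if dt.utcoffset() is None:
        # Naive: no instant in time is defined, so fall back to a string.
        return ("str", dt.isoformat())
    # Aware: an absolute instant, expressible as seconds + nanoseconds.
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    nanoseconds = delta.microseconds * 1000
    return ("timestamp", seconds, nanoseconds)


aware = datetime.datetime(1970, 1, 1, 0, 0, 1, 500000, datetime.timezone.utc)
naive = datetime.datetime(2022, 12, 3, 11, 42, 31, 123000)
aware_repr = msgpack_repr(aware)  # ("timestamp", 1, 500000000)
naive_repr = msgpack_repr(naive)  # ("str", "2022-12-03T11:42:31.123000")
```

A decoder that sees the string form would still need type information to know it should produce a datetime, which matches the annotation-driven decoding described earlier in the issue.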