Proposal to add Timestamp predefined ext type: -1 #207

Closed
wants to merge 1 commit into from


tagomoris
Member

In my opinion, timestamps are likely among the most common data in many workloads, and many languages and programming environments provide a feature to represent time (as a class or something similar).
I propose adding Time as the first predefined ext format, to make serializing and deserializing time objects easier than ever.

Programming languages differ in the precision of their time types, but several of them use nanoseconds, so nanosecond precision looks sufficient to satisfy these requirements.

@tagomoris
Member Author

@frsyuki ping?

@frsyuki
Member

frsyuki commented Dec 22, 2015

I agree that it is a good idea to add a timestamp type.
I checked the timestamp support of several programming environments:

  • Java 8 java.time.Instant: nanoseconds
  • Ruby Time: nanoseconds
  • Python datetime: microseconds
  • Swift / Objective-C NSDate: nanoseconds
  • C# DateTime: 100 nanoseconds
  • PHP localtime: seconds, microtime: microseconds
  • C/UNIX gettimeofday(3): microseconds, clock_gettime(2): nanoseconds
  • JavaScript Date: milliseconds
  • Erlang & Elixir erlang.timestamp: microseconds
  • D std.datetime: 100 nanoseconds
  • PostgreSQL TIMESTAMP: microseconds
  • MySQL TIMESTAMP: microseconds
  • Oracle TIMESTAMP: nanoseconds
  • SQL Server DATETIME2: 100 nanoseconds

I think 1-nanosecond precision is the best choice.
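As a quick illustration of why nanoseconds work as a common unit, the sketch below converts tick counts from a few of the environments listed above into a (seconds, nanoseconds) pair. The `TICK_NS` table and `split_ticks` function are illustrative names, not part of any library. Every listed tick size is a whole number of nanoseconds, so the conversion is exact, and the sub-second part never exceeds 999,999,999, which fits in 30 bits.

```python
NS_PER_SECOND = 1_000_000_000

# Illustrative subset of the survey above: tick size of each environment, in nanoseconds.
TICK_NS = {
    "Java 8 java.time.Instant": 1,
    "C# DateTime": 100,
    "SQL Server DATETIME2": 100,
    "Python datetime": 1_000,
    "PostgreSQL TIMESTAMP": 1_000,
    "JavaScript Date": 1_000_000,
}


def split_ticks(ticks: int, tick_ns: int) -> tuple[int, int]:
    """Convert ticks since 1970-01-01 UTC into (seconds, nanoseconds).

    divmod floors, so pre-1970 values give negative seconds with a
    nanosecond part that always stays in [0, 999_999_999].
    """
    return divmod(ticks * tick_ns, NS_PER_SECOND)


if __name__ == "__main__":
    # A JavaScript Date value in milliseconds (2015-12-22T00:00:00.123Z).
    print(split_ticks(1_450_742_400_123, TICK_NS["JavaScript Date"]))  # (1450742400, 123000000)
    # The largest sub-second value needs 30 bits.
    print((NS_PER_SECOND - 1).bit_length())  # 30
```

The 30-bit bound is what makes it possible to pair the nanosecond fraction with a 34-bit seconds field inside a single 64-bit word.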

About time zones, I think it is not a good idea to include time zone information in the timestamp. A timestamp represents an instant, and a time zone is additional information needed to represent that instant on a calendar. Applications should therefore be able to choose between storing only a timestamp or storing both a timestamp and time zone information, because not all applications need to represent a timestamp using a calendar.
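A minimal Python sketch of that separation, assuming the serialized value is just whole seconds since the Unix epoch: the time zone is applied only when the application renders the instant on a calendar. The `to_calendar` helper and the +09:00 offset are illustrative choices, not part of any proposal. Python's `datetime` only has microsecond precision, so this shows the idea rather than a full nanosecond round trip.

```python
from datetime import datetime, timedelta, timezone


def to_calendar(epoch_seconds: int, tz: timezone) -> datetime:
    """Render a zone-less instant in whatever zone the application chooses."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


if __name__ == "__main__":
    instant = 1450742400  # the serialized value: an instant, no zone attached
    jst = timezone(timedelta(hours=9))  # illustrative zone chosen by the application
    utc_view = to_calendar(instant, timezone.utc)
    tokyo_view = to_calendar(instant, jst)
    print(utc_view.isoformat())    # 2015-12-22T00:00:00+00:00
    print(tokyo_view.isoformat())  # 2015-12-22T09:00:00+09:00
    print(utc_view == tokyo_view)  # True: same instant, two calendar views
```

An application that does need the zone can store it next to the timestamp (for example as a separate field in the same map), leaving the timestamp itself unchanged.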

About the format, I expect three use cases:

  • a) An application doesn't care about sub-second precision. In this case, the application wants to store seconds as compactly as possible.
  • An application wants to store millisecond or finer precision:
    • b) the time is reasonably recent (around 2015-12-22)
    • c) the time is not recent (older than 1970-01-01 or newer than 2900-01-01...)

To support these three use cases, I created a proposal in #209.
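The thread does not quote #209 itself, so the encoder sketch below assumes the three-format layout that the MessagePack specification eventually adopted for ext type -1: timestamp 32 (fixext 4, unsigned 32-bit seconds), timestamp 64 (fixext 8, 30-bit nanoseconds above 34-bit unsigned seconds) and timestamp 96 (ext 8 with length 12, unsigned 32-bit nanoseconds followed by signed 64-bit seconds), all big-endian. Details of the #209 draft may differ. `pack_timestamp` is a hypothetical name; it picks the smallest format, which roughly maps onto use cases a), b) and c).

```python
import struct

TIMESTAMP_EXT_TYPE = -1  # ext type reserved for timestamps (encoded as 0xff)


def pack_timestamp(seconds: int, nanoseconds: int = 0) -> bytes:
    """Encode seconds/nanoseconds since 1970-01-01 UTC as an ext -1 value.

    Chooses the smallest of the three formats that can hold the value.
    """
    if not 0 <= nanoseconds <= 999_999_999:
        raise ValueError("nanoseconds must be in [0, 999999999]")
    if (seconds >> 34) == 0:
        # Non-negative seconds that fit in 34 bits (1970 to 2514).
        data64 = (nanoseconds << 34) | seconds
        if (data64 >> 32) == 0:
            # Typical of use case a): timestamp 32, fixext 4 (0xd6),
            # whole seconds up to 2106.
            return struct.pack(">BbI", 0xD6, TIMESTAMP_EXT_TYPE, data64)
        # Typical of use case b): timestamp 64, fixext 8 (0xd7),
        # nanoseconds in the top 30 bits, seconds in the low 34 bits.
        return struct.pack(">BbQ", 0xD7, TIMESTAMP_EXT_TYPE, data64)
    # Use case c): timestamp 96, ext 8 (0xc7) with length 12:
    # unsigned 32-bit nanoseconds followed by signed 64-bit seconds.
    return struct.pack(">BBbIq", 0xC7, 12, TIMESTAMP_EXT_TYPE, nanoseconds, seconds)


if __name__ == "__main__":
    print(len(pack_timestamp(1450742400)))               # 6
    print(len(pack_timestamp(1450742400, 123_000_000)))  # 10
    print(len(pack_timestamp(-1)))                       # 15
```

Including headers, the three formats come out at 6, 10 and 15 bytes, so a recent second-precision value costs no more than a fixext 4.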

@ludocode

I am so glad you decided to go against the mess of bitflags and variable lengths that seemed to be winning in #130. I was really not on board with that, and it's so unlike the rest of MessagePack.

I'm generally opposed to adding a bunch of new types to MessagePack. Each new type increases the complexity of decoding libraries and of the code that uses them, and is a new avenue for bugs and security issues. The current supported types are good because they are used by the majority of users. Everybody needs maps, arrays, strings and ints. Most people need floats. Lots of people need binary blobs (as evidenced by the preponderance of base64 strings in JSON.)

With datetime, timestamps, bigintegers, and so on, we're starting to get into the territory of niche types for very specific use cases. This is especially problematic when these can be trivially put in a custom ext type by users, or even just an ordinary string, like an ISO 8601-encoded date. So, since the very good str/bin/ext split has passed, I'm glad MessagePack has been avoiding new types.

Handling of dates and timezones is insanely complicated, more complicated than all of MessagePack, so it was weird seeing the author of Joda-Time suggesting moving all of this complexity into the protocol instead of handling it at a higher level from a more opaque type. This is really tangential to structured data serialization. If you want all of this data, just put it in an ISO 8601 string.

The timestamp proposal in #209 though is simple and straightforward enough that I can get behind it. I like having explicit ext sizes to discriminate precision rather than letting it be variable, and the result is extremely efficient, since second-precision timestamps will be six bytes and (virtually) all nanosecond timestamps will be ten bytes.
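Because each format has a distinct payload length, a decoder needs no flag bits: it can dispatch on the ext length alone. A minimal sketch, assuming the timestamp 32/64/96 layouts that the MessagePack specification eventually adopted and that the caller has already parsed the ext header and confirmed type -1. `unpack_timestamp_payload` is a hypothetical name.

```python
import struct


def unpack_timestamp_payload(data: bytes) -> tuple[int, int]:
    """Decode the payload of an ext -1 value into (seconds, nanoseconds)."""
    if len(data) == 4:
        # timestamp 32: unsigned 32-bit seconds, no fraction.
        (seconds,) = struct.unpack(">I", data)
        return seconds, 0
    if len(data) == 8:
        # timestamp 64: upper 30 bits nanoseconds, lower 34 bits seconds.
        (data64,) = struct.unpack(">Q", data)
        return data64 & 0x3_FFFF_FFFF, data64 >> 34
    if len(data) == 12:
        # timestamp 96: unsigned 32-bit nanoseconds, then signed 64-bit seconds.
        nanoseconds, seconds = struct.unpack(">Iq", data)
        return seconds, nanoseconds
    raise ValueError(f"unsupported timestamp payload length: {len(data)}")


if __name__ == "__main__":
    print(unpack_timestamp_payload(b"\x00\x00\x00\x04\x00\x00\x00\x01"))  # (1, 1)
```

Rejecting every other length is what keeps the scheme forward compatible: a later revision could assign new sizes without existing payloads becoming ambiguous.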

I'm somewhat torn on the issue of timezones, and I'd also like to err on the side of leaving it out, especially since this proposal is forward compatible: since this defines sizes 4, 8 and 12, we could always later define sizes 6, 10 and 14 which just add the timezone.

One possibility we should consider is to make the 34-bit seconds field signed instead of unsigned in timestamp64. With the current proposal there's no way to represent a time before 1970 in less than timestamp96 precision. Making it signed would allow timestamp64 to represent dates from 1698-2242 instead of 1970-2514. It would be easy to implement it as signed if the seconds were placed above the nanoseconds instead of below, since downshifting from signed would extend the sign. (The more I think about it the more this seems like a bad idea, but it's something we should probably discuss at least.)
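The ranges quoted above can be checked with a short calculation, and the sign-extension remark can be illustrated with a hypothetical alternative layout that puts signed seconds in the upper 34 bits. Neither `pack_signed_alt` nor `unpack_signed_alt` corresponds to any adopted format; they are sketches of the idea under discussion, and `seconds_range` is an illustrative helper.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
NS_MASK = (1 << 30) - 1  # low 30 bits: nanoseconds in the alternative layout


def seconds_range(bits: int, signed: bool) -> tuple[datetime, datetime]:
    """Earliest and latest instants a `bits`-wide seconds field can represent."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return EPOCH + timedelta(seconds=lo), EPOCH + timedelta(seconds=hi)


def pack_signed_alt(seconds: int, nanoseconds: int) -> bytes:
    """Hypothetical layout: signed 34-bit seconds above 30-bit nanoseconds."""
    if not -(1 << 33) <= seconds < (1 << 33):
        raise ValueError("seconds out of signed 34-bit range")
    if not 0 <= nanoseconds <= 999_999_999:
        raise ValueError("nanoseconds out of range")
    data = ((seconds & ((1 << 34) - 1)) << 30) | nanoseconds
    return struct.pack(">Q", data)


def unpack_signed_alt(payload: bytes) -> tuple[int, int]:
    """Read the payload as a signed 64-bit integer; >> sign-extends the seconds."""
    (data,) = struct.unpack(">q", payload)
    return data >> 30, data & NS_MASK


if __name__ == "__main__":
    for signed in (False, True):
        lo, hi = seconds_range(34, signed)
        print("signed" if signed else "unsigned", lo.year, hi.year)
    print(unpack_signed_alt(pack_signed_alt(-1, 5)))  # (-1, 5)
```

Run as a script, this shows the unsigned field covering 1970 through 2514 and the signed field covering late 1697 through 2242, so the 1698 lower bound above is a rounded figure. In the alternative layout, reading the payload as a signed 64-bit integer and shifting right by 30 recovers negative seconds directly, because Python's `>>` on a negative int is an arithmetic shift that preserves the sign.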

Anyway I like the idea of standardizing a simple nanosecond timestamp, and I can see some immediate uses for it in my own projects. I'm on board, and if #209 is accepted you can count on MPack implementing it 👍

@sonots

sonots commented Jan 27, 2016

Any progress on this? I want to use the Timestamp type for embulk's JSON type (internally, it is MessagePack). Ref. embulk/embulk#417

@jodastephen
Contributor

jodastephen commented Jan 16, 2017

As I noted in the other thread, representing date and time as a simple timestamp is next to useless, as it doesn't capture the semantic meaning of the data. However, it is entirely reasonable to question whether MsgPack wants to get into this.

A newish format, Temporenc, has appeared (not connected with me, and still alpha, I think), which provides a compact representation of date/time. It's not perfect (it doesn't support min/max), but it would work well with MsgPack, i.e. MsgPack would simply decide on an extension code for Temporenc and make no further changes.

@tagomoris
Member Author

Timestamp is now included in spec.

tagomoris closed this on Sep 18, 2017

5 participants