Proposal for timestamp type #209

Merged — 3 commits merged into master from extension-timestamp on Aug 10, 2017

Conversation — 7 participants
frsyuki (Member) commented Dec 22, 2015

Follows up #130 and #207.
See #207 for discussion.

* Timestamp 32 format can represent a timestamp in the [1970-01-01 00:00:00 UTC, 2106-02-07 06:28:16 UTC) range. The nanoseconds part is 0.
* Timestamp 64 format can represent a timestamp in the [1970-01-01 00:00:00.000000000 UTC, 2514-05-30 01:53:04.000000000 UTC) range.
* Timestamp 96 format can represent a timestamp in the [-584554047284-02-23 16:59:44 UTC, 584554051223-11-09 07:00:16.000000000 UTC) range.
* In the timestamp 64 and timestamp 96 formats, nanoseconds must not be larger than 999999999.
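A minimal sketch of how these three payload sizes could be produced, assuming the bit layouts from the merged spec (32-bit unsigned seconds; 30-bit nanoseconds above 34-bit seconds; 32-bit nanoseconds followed by 64-bit signed seconds). The function name is illustrative, and the surrounding ext header (fixext 4/8 or ext 8, type -1) is left to the serializer:

```python
import struct

def encode_timestamp(seconds, nanoseconds):
    """Pack a timestamp into the ext type -1 payload.

    Returns only the raw ext data; writing the ext header is the
    serializer's job and is omitted here.
    """
    if not 0 <= nanoseconds <= 999999999:
        raise ValueError("nanoseconds must be in [0, 999999999]")
    if 0 <= seconds < 2**32 and nanoseconds == 0:
        # timestamp 32: 32-bit unsigned seconds, nanoseconds part is 0
        return struct.pack(">I", seconds)
    if 0 <= seconds < 2**34:
        # timestamp 64: nanoseconds in the upper 30 bits,
        # seconds in the lower 34 bits
        return struct.pack(">Q", (nanoseconds << 34) | seconds)
    # timestamp 96: 32-bit unsigned nanoseconds + 64-bit signed seconds
    return struct.pack(">Iq", nanoseconds, seconds)
```

Note how the smallest representable form is chosen, matching the ranges listed above.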

ludocode commented Dec 22, 2015

Interesting. This line seems to be the first rule regarding "canonical" representation of data when multiple representations would otherwise be available. None of the other types have such restrictions. For example a uint16 doesn't say the value must be at least 256 (since if it's less than that, it could have been represented by a uint8.)

Not that I'm complaining. I do think being strict about representation is a good thing, and I'd love it if the spec were strict everywhere else whenever multiple representations are available. UTF-8 forbids overlong sequences after all. (Although looking at the Protocol Buffers spec, it doesn't explicitly forbid overlong varints. I'm not sure why. Maybe it's unimportant.)


frsyuki (Member) commented Dec 22, 2015

I added this restriction specifically because deserialization becomes unexpectedly complicated if the nanoseconds part can include a carry-over, and following this restriction doesn't make serialization code complicated.

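The simplification this restriction buys can be seen in a decoder sketch (assuming the timestamp 64 layout from the merged spec; the function name is illustrative): without it, the decoder would have to normalize a nanoseconds carry into the seconds field; with it, out-of-range nanoseconds are simply an error.

```python
import struct

def decode_timestamp64(data):
    """Decode an 8-byte timestamp 64 payload, rejecting carry-over."""
    (value,) = struct.unpack(">Q", data)
    nanoseconds = value >> 34          # upper 30 bits
    seconds = value & 0x3FFFFFFFF      # lower 34 bits
    if nanoseconds > 999999999:
        # No normalization needed: malformed data is just an error.
        raise ValueError("invalid timestamp: nanoseconds out of range")
    return seconds, nanoseconds
```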

vmihailenco commented Feb 24, 2016

This proposal looks good and I would like to implement it for https://github.com/vmihailenco/msgpack. Any chance it is going to be merged/changed soon?


tagomoris (Member) commented Feb 24, 2016

@vmihailenco The msgpack organization doesn't have a golang implementation, and msgpack.org introduces ugorji/go/codec. AFAIK, the msgpack organization does not want to increase the number of repositories under itself.
Of course, you can implement anything in your own repository, but it's off topic for this pull request.


vmihailenco commented Feb 25, 2016

I am asking whether this proposal is going to be merged. The reference to the repository is just proof that I am really interested in it.


tagomoris (Member) commented Feb 25, 2016

I got it. Sorry for the misunderstanding.


tagomoris (Member) commented Feb 25, 2016

IMO, we can merge this new spec because there are no objections...


drewnoakes commented Nov 17, 2016

+1 for merging this. Looks solid and very useful.

It'd be useful to know the process, if any, for evolving the spec. Publishing such details would build confidence in using the MsgPack format.



PotterDai commented Apr 25, 2017

Will this be merged?



@frsyuki frsyuki merged commit 40e3d3d into master Aug 10, 2017

@frsyuki frsyuki deleted the extension-timestamp branch Aug 10, 2017

tagomoris (Member) commented Aug 10, 2017

🎉


ludocode commented Sep 27, 2017

When deserializing these new timestamps, does it make sense to treat an extension of type -1 with an unrecognized length as an error?

I'm currently implementing this in MPack and I'm trying to decide how to handle this. The spec doesn't explicitly say that such data should be considered invalid, but the pseudocode in the spec does say "error" if the length is not 4, 8 or 12. So I'm guessing raising an error is the expected behaviour, and judging by the implementors linking to this issue so far, this seems to be the consensus.

It seems to me that this is a bit of a problem though because data that was previously considered valid, like say an extension of type -1 and length 7, is now considered invalid and will flag errors in the newest parsers. Although negative extension types were reserved, this still feels like a backwards-incompatible tightening of the rules. It seems to contradict the new description of extensions which is that it is the application (not the library) that gets to decide whether to treat it as opaque or reject it as invalid.

I had considered reporting an extension of type -1 as a timestamp if it's parseable (i.e. the length is 4, 8 or 12 and the nanoseconds are in bounds), and otherwise just reporting it as an opaque extension object like any other instead of raising an error. Extension functions (like accessing the raw data) would still be available for extensions of type -1 regardless of length and regardless of whether it's recognized as a timestamp. This way if someone for whatever reason was using -1, this change would not break their code or data.

I suppose we can just say that nobody should have been using an extension of type -1 in the first place. So maybe it doesn't matter and I'm worrying over nothing. Still, as far as I know most libraries do not differentiate between reserved extensions and application-defined extensions (at least not through anything more than documentation.) So there's nothing stopping a user from accidentally choosing a reserved type and having it break later when libraries start rejecting their data.

For this reason I'm starting to think that maybe libraries and the spec should completely differentiate between reserved extensions and application-defined extensions. I may decide to handle reserved extensions as an entirely separate type within MPack, that way if users use them, it will be obvious that they may break in the future and they're on their own.

Something else to keep in mind with raising errors on unknown lengths is that this prevents us from inserting new lengths for timestamp in the future. For example lots of people wanted timezones, so I had previously suggested that we could extend this later to add one or two bytes to each length to specify it. This is actually not possible because it won't be backwards compatible. Existing libraries will report a parsing error (probably discarding the whole message) when they encounter a timestamp with the new lengths. So this means that this definition of timestamps is frozen forever: no other lengths can be added, and any change or addition will have to be introduced as a different extension type.

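The lenient fallback described above could be sketched as follows. This is one possible design, not the spec's required behavior: the layouts are the ones from the merged spec, and the function name is illustrative. A type -1 extension is reported as a timestamp only when its length and nanoseconds are recognized; anything else is passed through as opaque extension data rather than raising an error.

```python
import struct

def decode_ext(ext_type, data):
    """Lenient handling: parse type -1 as a timestamp when possible,
    otherwise fall back to an opaque ext object instead of erroring."""
    if ext_type == -1:
        if len(data) == 4:
            # timestamp 32: 32-bit unsigned seconds
            (seconds,) = struct.unpack(">I", data)
            return ("timestamp", seconds, 0)
        if len(data) == 8:
            # timestamp 64: 30-bit nanoseconds over 34-bit seconds
            (value,) = struct.unpack(">Q", data)
            nanoseconds, seconds = value >> 34, value & 0x3FFFFFFFF
            if nanoseconds <= 999999999:
                return ("timestamp", seconds, nanoseconds)
        if len(data) == 12:
            # timestamp 96: 32-bit nanoseconds + 64-bit signed seconds
            nanoseconds, seconds = struct.unpack(">Iq", data)
            if nanoseconds <= 999999999:
                return ("timestamp", seconds, nanoseconds)
    # Unrecognized type, length, or out-of-range nanoseconds:
    # report as opaque extension data, preserving old behavior.
    return ("ext", ext_type, data)
```

With this approach, an extension of type -1 and length 7 decodes as an opaque ext object, exactly as it did before the timestamp type existed.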

tagomoris (Member) commented Sep 27, 2017

@ludocode This looks like an issue with the merged spec. Could you create a separate issue for it?
We can't follow discussions under closed pull requests.

