Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MessagePack timestamp type #168

Merged
merged 18 commits into from Jun 20, 2019

Conversation

@gfx
Copy link
Member

gfx commented Jun 17, 2019

This PR implements the timestamp type of MessagePack: timestamp32, imestamp64, and timestamp96. It also registers Time as the timestamp type for DefaultFactory.

MessagePack::Timestamp class

This is the core logic for the timestamp type.

This is independent from the built-in Time class because I'll use the timestamp type in a Rails app, overriding it by MessagePack::DefaultFactory.register_type(MessagePack::TImestamp::TYPE, ActiveSupport::TimeWithZone).

Default behavior

This PR does not change the default behavior because it is not a trivial thing.

However, factory.register_type(-1, Time) is tested.

cf. d31e669

Specification

https://github.com/msgpack/msgpack/blob/master/spec.md

@gfx gfx changed the title start implementing MessagePack timestamp type MessagePack timestamp type Jun 17, 2019
@gfx gfx marked this pull request as ready for review Jun 17, 2019
case data.length
when 4
# timestamp32 = (uint32be)
sec = data.unpack('L>').first

This comment has been minimized.

Copy link
@gfx

gfx Jun 17, 2019

Author Member

FYI ">" indicates big-endian.

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Let's have it as comments!

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

Hmm, but it's just a built-in feature of Ruby. "uint32be" isn't enough?

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Probably yes!
I don't have a strong opinion, but "If you want to add notes on a pull-req beforehand, it should be done as code comments".

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

Added comments about the big-endian specifier with a link to the Ruby reference! 7d16656

module MessagePack
Timestamp = Struct.new(:sec, :nsec)

class Timestamp # a.k.a. "TimeSpec"

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

That looks non-standard way to add methods to the class from Struct.new. Use block parameter for Struct.new to define methods on that class: https://docs.ruby-lang.org/en/2.6.0/Struct.html#method-c-new

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

I can't find how to use the Struct.new BLOCK syntax with constants:

NG(1):

$ ruby -e 'C = Struct.new(:foo) do CONST = 42 end; pp(C::CONST)'
Traceback (most recent call last):
-e:1:in `<main>': uninitialized constant C::CONST (NameError)

NG(2):

$ ruby -e 'Struct.new("::C", :foo) do pp self; CONST = 42 end; pp(C::CONST)'
Traceback (most recent call last):
	1: from -e:1:in `<main>'
-e:1:in `new': identifier ::C needs to be constant (NameError)

NG(3; it's not what I want):

$ ruby -e 'Struct.new("C", :foo) do pp self; self::CONST = 42 end; pp(Struct::C::CONST)'
Struct::C
42

I'm sorry I'm not good at Ruby 😭

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Do you mean that there is no way to refer MessagePack::Timestamp::TYPE from other classes?
It's not impossible (for example, C = Struct.new(); C.const_set(:CONST, 42); p C::CONST), but I guess we should define normal modules/classes instead of using Struct.
Is there any reason to use Struct?

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

I use Struct because of its auto-generated #== method, which is useful only for testing.

But yes, I'll use a normal class; it's just easy.

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

done: 733ed14

sec = data.unpack('L>').first
new(sec, 0)
when 8
# timestamp64 = (uint32be, uint32be)

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

This comment is misleading... actually it's (uint30be, uint34be).
I understood that (uint32be, uint32be) is the format to unpack the binary, but it confused me for a while.

when 8
# timestamp64 = (uint32be, uint32be)
n, s = data.unpack('L>2')
sec = (n & 0x3) * 0x100000000 + s

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Can you use 0b11 as the mask and << 32 instead of * 0x100000000?
It's easy to read for me because my recommendation shows "lower 2bit mask" and "use it as upper bytes". Counting the number of zero in 0x100000000 is scary.

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

And... this is the just an idea, which might be a kind of too-early optimization.
How about this code? sec = (n == 0 ? s : (n & 0b11 << 32) + s)
Until 2038, n is 0 always (or in most cases).

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

I think its optimization is a bit more complicated like this:

sec_hi_2bits = n & 0b11
sec = sec_hi_2bits == 0 ? s : (sec_hi_2bits << 32) + s

I'm not sure it is faster than just (n & 0b11) << 32) + s.

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Ah, your code looks right. And here's my microbenchmark:

require 'benchmark'

t = Time.now
n = t.nsec << 2
s = t.sec

times = 1_000_000
Benchmark.bm do |x|
  x.report(:simple) do
    for i in 1..times
      sec = (n & 0b11 << 32) + s
    end
  end
  x.report(:optimized) do
    for i in 1..times
      sec_hi_2bits = n & 0b11
      sec = sec_hi_2bits == 0 ? s : (sec_hi_2bits << 32) + 2
    end
  end
end

The result was:

       user     system      total        real
simple  0.055794   0.000129   0.055923 (  0.056197)
optimized  0.047181   0.000082   0.047263 (  0.047393)

Let's forget this idea...

class Timestamp # a.k.a. "TimeSpec"
TYPE = -1 # timestamp extension type

TIMESTAMP32_MAX_SEC = 0x100000000 # 32-bit unsigned int

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

IMO "max" means the inclusive upper limit. For example, the max of [0, 99] is 99, but 98 for [0, 99).
The max value of 32-bit bytes looks the 8 chars of f, not 9 chars.
How do you feel about it?

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

Yep, it's confusing. (1 << 32) - 1 is absolutely better.

[sec].pack('L>')
else
# timestamp64 (uint32be, uint32be)
sec_high = sec << 32

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

I have the same idea to deserialization: What do you think to have if sec < TIMESTAMP32_MAX_SEC here?

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

I'm not sure what you said. I think it's already checked the above if condition.

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Oh, sorry. This comment was about the optimization idea until 2038. Nevermind.

case data.length
when 4
# timestamp32 = (uint32be)
sec = data.unpack('L>').first

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

Let's have it as comments!

class Time
include MessagePack::CoreExt

def self.from_msgpack_ext(payload)

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

This may break existing user code if users defined their own Time.from_msgpack_ext (and Time#to_msgpack_ext).
It seems better not to have these methods, and to add options to enable predefined types on Factory class. What do you think about this idea?
(It requires patches for C/Java ext, so I'm ok to do it by myself, out of this change)

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

The latest code no longer break the existing code: Instead, use MessagePack::Time::{Packer,Unpacker} by hand:

      factory.register_type(
        MessagePack::Timestamp::TYPE,
        Time,
        packer: MessagePack::Time::Packer,
        unpacker: MessagePack::Time::Unpacker
      )

as tests show.

!!::Time.at(0, 0, :nanosecond)
rescue ArgumentError
false
end

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

FYI the above crazy indentations are made by rubocop.

This comment has been minimized.

Copy link
@tagomoris

tagomoris Jun 18, 2019

Member

How about this code?

TIME_AT_3_AVAILABLE = !!::Time.at(0, 0, :nanosecond) rescue false

This comment has been minimized.

Copy link
@gfx

gfx Jun 18, 2019

Author Member

rubocop fixes it to:

    TIME_AT_3_AVAILABLE = begin
                            !!::Time.at(0, 0, :nanosecond)
                          rescue StandardError
                            false
                          end

but yes, it's better than the current, so I've applied it f49a8ff

I wonder the rubocop behavior that it sometimes seems to change the code at random.

gfx added 2 commits Jun 18, 2019
@gfx

This comment has been minimized.

Copy link
Member Author

gfx commented Jun 18, 2019

Everything is done!

BTW should I add documents about it in this PR?

@tagomoris

This comment has been minimized.

Copy link
Member

tagomoris commented Jun 18, 2019

@gfx It's better to have if possible!

I'm thinking about the option to move code in MessagePack::Time to Timestamp (those two look confusing to me...), but I'll add fixes on those at the same time with the switch to enable predefined ext types on Factory.
So, this change looks good to me. I'll merge this after the commit for docs.

@gfx

This comment has been minimized.

Copy link
Member Author

gfx commented Jun 18, 2019

I think MessagePack::Time and MessagePack::Timestamp are completely different classes.

MessagePack::Timestamp provides utilities for implementing a MessagePack timestamp mapping.

MessagePack::Time provides the packer and unpacker for built-in Time class. If msgpack-ruby supports ActiveSupport::TImeWithZone, its name will be MessagePack::ActiveSupportTimeWithZone or something alike.

@gfx

This comment has been minimized.

Copy link
Member Author

gfx commented Jun 18, 2019

doc is added! 7566db5

@tagomoris tagomoris merged commit ceab59e into msgpack:master Jun 20, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gfx gfx deleted the gfx:timestamp_type branch Jun 20, 2019
@tagomoris

This comment has been minimized.

Copy link
Member

tagomoris commented Jun 20, 2019

I've added some tests 6c89d3b and released v1.3.0.
Thank you for the great patches!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.