Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Add method BSON::OID::from_epoch #2

Closed
wants to merge 3 commits into from

Conversation

oliwer
Copy link
Contributor

@oliwer oliwer commented Jul 3, 2016

This method allows to create a new OID using a specific epoch time.
Mango users really need this feature because their application's
logic relies on it in some cases.

This method allows to create a new OID using a specific epoch time.
Mango users really need this feature because their application's
logic relies on it in some cases.
That was a hard one...
@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 87.375% when pulling 7da2b01 on oliwer:oid_from_epoch into 3ec51dc on mongodb:master.

@xdg
Copy link
Contributor

xdg commented Jul 5, 2016

My team lead, @behackett, raised a concern that constructing OIDs from a timestamp can violate the uniqueness properties of OIDs. It's still useful for queries, but dangerous for inserts.

See how pymongo implemented it and the caveats they included. They also zero the non-timestamp fields, which is particularly useful for queries.

I think the documentation caveat is useful no matter what. Then there are three broad options (I think) for what to do with the non-timestamp fields (last 8 bytes):

  • Zero them – this is the best for range queries
  • Randomize them – a 64-bit random value, while expensive to generate, has a probabalistically minimized chance of violating the uniqueness property
  • Generate as normal – I think this is bad, as it helps neither range queries nor uniqueness and depends on just the 24-bit counter to avoid collisions

I'm inclined to match what pymongo does. If users want their own special OIDs, they can always generate their own 12-byte strings and pass them as the oid parameter.

What do you think?

@oliwer
Copy link
Contributor Author

oliwer commented Jul 5, 2016

Ho right! I will update the documentation.

While I agree pymongo does "the right thing", I know some (badly designed) applications which require unique OIDs for the same date. And while the 3 bytes counter has been enough so far, it would be better to use the whole 64 bits like you said. Does it have to be a random number? How about i concatenate 2 special counters randomly initialized?

But I also like the possibility of generating "clean" OID with only the date. Maybe I could modify the from_epoch method to accept an optional argument which would tell the function to use zeros.

@xdg
Copy link
Contributor

xdg commented Jul 7, 2016

I think two counters is really no better than one.

For compatibility, how about this as a possible API:

BSON::OID->from_epoch( $time );          # "compatible": time + host + pid + counter
BSON::OID->from_epoch( $time, $int );    # "custom"    : time + (random) int packed to 64 bits
BSON::OID->from_epoch( $time, 0 );       # "zeroed"    : time + "\0" x 8

This leaves the default behavior compatible with current practice, gives an option for users to improve the uniqueness property by providing their own random value to pack into the 64-bits of the host+pid+counter fields, and abuses the int-packing to give an easy way to have zeroed host+pid+counter fields for queries.

@oliwer
Copy link
Contributor Author

oliwer commented Jul 7, 2016

I think your API looks good. I'm going to implement it as soon as possible. Note that from_epoch must be callable as BSON::OID->from_epoch and bson_oid->from_epoch.

@xdg
Copy link
Contributor

xdg commented Jul 7, 2016

bson_oid->from_epoch should work but is inefficient, as I think it would construct a new BSON::OID object, call from_epoch on it to return a new BSON::OID, and throw away the first.

This method now supports 3 modes: standard, zeroed and randomized.
@coveralls
Copy link

coveralls commented Jul 9, 2016

Coverage Status

Coverage decreased (-0.09%) to 87.094% when pulling 8faf4b7 on oliwer:oid_from_epoch into 3ec51dc on mongodb:master.

@xdg
Copy link
Contributor

xdg commented Jul 11, 2016

I'm sorry -- I think you misunderstood what I meant. I didn't mean that from_epoch($time, 1) should have us generate a random number.

I meant that the second argument should literally be an integer, and that we should pack whatever they give us. If we want to be really nice, we could pack Math::BigInt as well. See pack_int64 in BSON::PP -- that and its dependent subs could be pulled out into BSON::Util or something for us in multiple places.

The reasons I have for this behavior:

  • consistency -- 0 or non-zero second args are both packed
  • flexibility -- users can control how random they want their number to be, they aren't forced to use the internal rand() call
  • hurdle -- making users come up with the random number reinforces the notion that they shouldn't be using this for insertion; making it harder discourages that.

If this feels more daunting to do that you have time for currently, I'm willing to take it forward from here, as I do have strong feelings about the API and I do want this to happen to support Mango::BSON. :-)

@xdg
Copy link
Contributor

xdg commented Jul 18, 2016

@oliwer, ping. Is this something you want to do or should I take it from here?

@xdg
Copy link
Contributor

xdg commented Jul 27, 2016

I've amended the method the way I described, more or less, and expanded the documentation a bit. I hope you find this is sufficient for what Mango needs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants