-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated V7 generator to Draft04. #112
Conversation
generator_test.go
Outdated
func makeTestNewV7TestVector() func(t *testing.T) { | ||
return func(t *testing.T) { | ||
pRand := make([]byte, 10) | ||
//TODO make the comparison work with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to reconcile this TODO before thinking about merging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove the TODO but I failed to do the actual validation on the random data compared with the example from the draft. At least for now I think a partial validation is better than nothing (the test now only asserts the first 15bytes).
Codecov ReportBase: 100.00% // Head: 100.00% // No change to project coverage 👍
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more Additional details and impacted files@@ Coverage Diff @@
## master #112 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 4 4
Lines 473 498 +25
=========================================
+ Hits 473 498 +25
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Thanks for the submission @bgadrian! Would you mind rebasing this branch with master? Also, thoughts on the code coverage loss? |
u[1] = byte(ms >> 32) | ||
u[2] = byte(ms >> 24) | ||
u[3] = byte(ms >> 16) | ||
u[4] = byte(ms >> 8) | ||
u[5] = byte(ms) | ||
|
||
//The 6th byte contains the version and partially rand_a data. | ||
//We will lose the most significant bites from the clockSeq (with SetVersion), but it is ok, we need the least significant that contains the counter to ensure the monotonic property | ||
binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with clock seq which is random and monotonic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be better to make the API user-selectable whether to consider batch generation or not.
Because getClockSequence
performs a mutex lock, and using it will result in worse performance and reduced generation capability.
For non-batch generation use cases, it is probably undesirable to have getClockSequence
run, so a user-selectable API might be better.
(For example, the implementation related to draft allows breaking changes, so add isBatch
to the NewV7()
argument.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we moved this line from the top so that we can batch generate the UUID better, yes?
Can we call out in a comment here that this is done here specifically to support batching? I can see someone moving it around and unintentionally breaking that behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cameracker I moved that line for an improved readability. It was confusing to me first to fill bytes 8+ first, and then fill the 1-8 bytes. By moving that line specifically after or before the first bytes it would not affect the result, but all the lines after this one needs to be in order because of the overrides.
@convto If the user does not generate batches then it means it calls the method once each ms, so contention is not a problem. And without the monotonic seq the UUID is not v7 according to the specification.
As alternative we could refactor or add a new method that uses atomic package instead of a mutex to improve the concurrency If needed.
On Tue, Jan 10, 2023, at 07:10, YuyaOkumura wrote:
***@***.**** commented on this pull request.
In generator.go <#112 (comment)>:
> u[1] = byte(ms >> 32)
u[2] = byte(ms >> 24)
u[3] = byte(ms >> 16)
u[4] = byte(ms >> 8)
u[5] = byte(ms)
+ //The 6th byte contains the version and partially rand_a data.
+ //We will lose the most significant bites from the clockSeq (with SetVersion), but it is ok, we need the least significant that contains the counter to ensure the monotonic property
+ binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with clock seq which is random and monotonic
It may be better to make the API user-selectable whether to consider batch generation or not.
This is because `getClockSequence` performs a mutex lock, and using it will result in worse performance and reduced generation capability.
For non-batch generation use cases, it is probably undesirable to have getClockSequence run, so a user-selectable API might be better.
(For example, if breaking changes are allowed, adding `isBatch` to the NewV7 argument.)
—
Reply to this email directly, view it on GitHub <#112 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGKUMMCHZL5SX2JPLMRRSTWRTVLTANCNFSM6AAAAAATPXQRRM>.
You are receiving this because you were mentioned.Message ID: ***@***.***>
/*******************************/
⚛ Bledea-Georgescu Adrian
🖧 Sofware Engineer
✍ https://coder.today
📧 ***@***.***
🖥 github.com/bgadrian
💁 linkedin.com/in/bgadrian <https://www.linkedin.com/in/bgadrian/>
|
@bgadrian Thanks for your reply! |
Hello, I have addressed the comments, restored the code coverage and rebased with the master. |
Hi @bgadrian ! I'm still planning on reviewing and accepting this contribution but haven't had the time to study the new updates to the draft to check for correct implementation. I'll do my best to get to it this week, I appreciate your patience. |
Also, I am tentatively planning on putting out a release as soon as this is merged. Another thing @bgadrian , a couple of PRs have been merged to Master. One meaningful PR is the change to how coverage is collected. The addition of Generator options may have a small impact on this PR but unclear. Would you rather me update this PR for you or would you like to process these updates yourself? |
| rand_b | | ||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */ | ||
|
||
ms, clockSeq, err := g.getClockSequence(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and then just to make sure I understand: this isnt really strictly needed for the PR, it looks like this is just a refactor to move this calculation into the clock sequence rather than just doing it here to meet the ms requirement for this specific uuid. Is that the case? I don't have a strong preference here but I'll say that the boolean flag parameteter on getClockSequence is slightly more mysterious if we're trying to understand "why" that flag exists. It's private so it's not a big deal and I won't to ask for a reshuffle if other maintainers are ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wanted to reuse the code sequencer and the mutex, but with a different timestamp, hence the flag.
u[1] = byte(ms >> 32) | ||
u[2] = byte(ms >> 24) | ||
u[3] = byte(ms >> 16) | ||
u[4] = byte(ms >> 8) | ||
u[5] = byte(ms) | ||
|
||
//The 6th byte contains the version and partially rand_a data. | ||
//We will lose the most significant bites from the clockSeq (with SetVersion), but it is ok, we need the least significant that contains the counter to ensure the monotonic property | ||
binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with clock seq which is random and monotonic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we moved this line from the top so that we can batch generate the UUID better, yes?
Can we call out in a comment here that this is done here specifically to support batching? I can see someone moving it around and unintentionally breaking that behavior.
@@ -272,28 +281,50 @@ func (g *Gen) getClockSequence() (uint64, uint16, error) { | |||
// NewV7 returns a k-sortable UUID based on the current millisecond precision | |||
// UNIX epoch and 74 bits of pseudorandom data. | |||
// | |||
// This is implemented based on revision 03 of the Peabody UUID draft, and may | |||
// This is implemented based on revision 04 of the Peabody UUID draft, and may | |||
// be subject to change pending further revisions. Until the final specification | |||
// revision is finished, changes required to implement updates to the spec will | |||
// not be considered a breaking change. They will happen as a minor version | |||
// releases until the spec is final. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this draft focuses on being more tentative about how strongly the implementations need to respect monotonicity of the increments vs unguessability, do we owe it to users to be explicit about which behavior we're leaning towards in the implementation?
Ok, I completed a review. Sorry it took me so long. And thank you so much for the contribution. Last request: Could we update the README.md to reflect which version of the Draft we're implementing for v6 and v7? As an overall comment, I believe this PR correctly implements the v6 and v7 UUIDs to specification, but I'm getting the sense that we're not being as clear as we could be on which "MAY" "SHOULD" behaviors we chose to address in this implementation and worry that some user is going to pick up those UUIDs and run into "undefined behavior" sort of problems. What do you think? Should we be more explicit on our approach anywhere? @convto @bgadrian @dylan-bourque |
I have merged with the latest master, updated the Readme and addressed some comments. As for being explicit or not, I think the v4 specifications is not a MAY or SHOULD, it is mandatory (SHOULD) to ensure the monotonic property
But the specs does not enforce which algorithm to use
The draft states that
This indeed makes the Batching optional, which is confusing, but the problem is that, the users will not know if they need or not batching most likely, I presume most real world scenarios of generating UUIDs are based on events that cannot be controlled (new users, new resources), so the "need" or "not need" of batching cannot be guaranteed, only presumed that is ok 99.99% of the time. |
Updated V7 generator to enforce the monotonic property for ids generated in the same timestamp.
Updated tests and go docs.