Updated V7 generator to Draft04. #112

Merged
merged 7 commits into from
Jan 26, 2023

Conversation

bgadrian
Contributor

@bgadrian bgadrian commented Jan 3, 2023

Updated V7 generator to enforce the monotonic property for ids generated in the same timestamp.
Updated tests and go docs.

func makeTestNewV7TestVector() func(t *testing.T) {
return func(t *testing.T) {
pRand := make([]byte, 10)
//TODO make the comparison work with
Member

Need to reconcile this TODO before thinking about merging this.

Contributor Author

I will remove the TODO, but I was not able to validate the random data against the example from the draft. For now, I think a partial validation is better than nothing (the test now only asserts the first 15 bytes).

@codecov-commenter

codecov-commenter commented Jan 3, 2023

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage 👍

Coverage data is based on head (6088057) compared to base (7b40032).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #112   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          473       498   +25     
=========================================
+ Hits           473       498   +25     
Impacted Files Coverage Δ
generator.go 100.00% <100.00%> (ø)


@cameracker
Collaborator

Thanks for the submission @bgadrian! Would you mind rebasing this branch with master?

Also, thoughts on the code coverage loss?

u[1] = byte(ms >> 32)
u[2] = byte(ms >> 24)
u[3] = byte(ms >> 16)
u[4] = byte(ms >> 8)
u[5] = byte(ms)

// The 6th byte contains the version and part of the rand_a data.
// SetVersion overwrites the most significant bits of clockSeq, which is acceptable:
// the least significant bits hold the counter that ensures the monotonic property.
binary.BigEndian.PutUint16(u[6:8], clockSeq) // set rand_a with the clock sequence, which is random and monotonic
Contributor

@convto convto Jan 10, 2023

It may be better to make the API user-selectable whether to consider batch generation or not.
Because getClockSequence performs a mutex lock, and using it will result in worse performance and reduced generation capability.
For non-batch generation use cases, it is probably undesirable to have getClockSequence run, so a user-selectable API might be better.

(For example, the implementation related to draft allows breaking changes, so add isBatch to the NewV7() argument.)
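
A rough sketch of what such an opt-in could look like. The `gen` type and the `randA` method are hypothetical and are not the library's API; the only point is that the non-batch path can skip the mutex and draw fresh random bits instead.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync"
)

// gen is a hypothetical generator holding a mutex-guarded counter.
// It is an illustration only, not the type used in this PR.
type gen struct {
	mu  sync.Mutex
	seq uint16
}

// randA returns the 16 bits that would be written to u[6:8] before the
// version is set. With isBatch it takes the lock and increments a shared
// counter; without it, it reads two random bytes and never locks.
func (g *gen) randA(isBatch bool) (uint16, error) {
	if isBatch {
		g.mu.Lock()
		defer g.mu.Unlock()
		g.seq++
		return g.seq, nil
	}
	var b [2]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint16(b[:]), nil
}

func main() {
	g := &gen{}
	a, _ := g.randA(true)
	b, _ := g.randA(true)
	fmt.Println(a, b) // counter path: consecutive values

	r, err := g.randA(false)
	fmt.Println(r, err) // random path: no lock taken
}
```

The trade-off is the one raised above: the counter path serializes callers on the mutex, while the random path gives up ordering within the same millisecond.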

Collaborator

So we moved this line from the top so that we can batch generate the UUID better, yes?

Can we call out in a comment here that this is done here specifically to support batching? I can see someone moving it around and unintentionally breaking that behavior.

Contributor Author

@cameracker I moved that line for improved readability. It was confusing to me to fill bytes 8+ first and then fill bytes 1-8. Moving that line specifically before or after the first bytes does not affect the result, but all the lines after this one need to stay in order because of the overrides.
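
A minimal sketch of the ordering constraint mentioned here. `layoutV7` is a hypothetical helper written for illustration, not the code from this PR; it assumes the version is applied by overwriting the top four bits of byte 6, which is why that step has to come after the clock sequence is written.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// layoutV7 fills the first 8 bytes of a 16-byte UUID. It is an
// illustration only; bytes 8-15 (variant and rand_b) are left as zero.
func layoutV7(ms uint64, clockSeq uint16) [16]byte {
	var u [16]byte

	// Bytes 0-5: 48-bit big-endian Unix timestamp in milliseconds.
	u[0] = byte(ms >> 40)
	u[1] = byte(ms >> 32)
	u[2] = byte(ms >> 24)
	u[3] = byte(ms >> 16)
	u[4] = byte(ms >> 8)
	u[5] = byte(ms)

	// Bytes 6-7: the full 16-bit sequence value.
	binary.BigEndian.PutUint16(u[6:8], clockSeq)

	// The version nibble overwrites the top 4 bits of byte 6, so it must
	// be written after the sequence. The low 12 bits of clockSeq survive.
	u[6] = (u[6] & 0x0f) | 0x70

	return u
}

func main() {
	u := layoutV7(0x017F22E279B0, 0xABCD)
	fmt.Printf("% x\n", u[:8]) // 01 7f 22 e2 79 b0 7b cd
}
```

Swapping the last two steps would let the sequence value clobber the version nibble, which is the kind of override the comment refers to.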

@bgadrian
Contributor Author

bgadrian commented Jan 10, 2023 via email

@convto
Contributor

convto commented Jan 10, 2023

@bgadrian
The Rev04 draft monotonic counter specification was defined with SHOULD and MAY requirement levels, so I wanted to give the user a flexible option. But as you commented, it doesn't seem to be much of a problem.

Thanks for your reply!

@bgadrian
Contributor Author

Thanks for the submission @bgadrian! Would you mind rebasing this branch with master?

Also, thoughts on the code coverage loss?

Hello, I have addressed the comments, restored the code coverage and rebased with the master.

@cameracker
Collaborator

Hi @bgadrian ! I'm still planning on reviewing and accepting this contribution but haven't had the time to study the new updates to the draft to check for correct implementation. I'll do my best to get to it this week, I appreciate your patience.

@cameracker
Collaborator

Also, I am tentatively planning on putting out a release as soon as this is merged.

Another thing, @bgadrian: a couple of PRs have been merged to master. One meaningful PR changes how coverage is collected. The addition of Generator options may have a small impact on this PR, but that is unclear. Would you rather I update this PR for you, or would you like to handle these updates yourself?

generator.go
| rand_b |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */

ms, clockSeq, err := g.getClockSequence(true)
Collaborator

OK, just to make sure I understand: this isn't strictly needed for the PR. It looks like a refactor that moves this calculation into the clock sequence instead of doing it here to meet the ms requirement for this specific UUID. Is that the case? I don't have a strong preference here, but the boolean flag parameter on getClockSequence is slightly mysterious if we're trying to understand why that flag exists. It's private, so it's not a big deal, and I won't ask for a reshuffle if other maintainers are OK with it.

Contributor Author

Yes, I wanted to reuse the code sequencer and the mutex, but with a different timestamp, hence the flag.
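
A simplified sketch of the idea of sharing one mutex-guarded sequencer between two timestamp sources. The `sequencer` type, its fields, and the injected clock functions are assumptions made for illustration and are not the library's internals; in particular, the counter starts at zero here rather than at a random value.

```go
package main

import (
	"fmt"
	"sync"
)

// sequencer is a hypothetical stand-in for the generator's clock-sequence
// state. The two clock functions are injected so the example is deterministic.
type sequencer struct {
	mu       sync.Mutex
	lastTime uint64
	seq      uint16
	unixMs   func() uint64 // millisecond Unix clock (assumed, for v7)
	ticks    func() uint64 // any other timestamp unit (assumed, for other versions)
}

// next picks the timestamp source based on the flag and bumps the counter
// whenever the clock has not advanced since the previous call.
func (s *sequencer) next(useUnixMs bool) (uint64, uint16) {
	s.mu.Lock()
	defer s.mu.Unlock()

	var now uint64
	if useUnixMs {
		now = s.unixMs()
	} else {
		now = s.ticks()
	}
	if now <= s.lastTime {
		s.seq++
	}
	s.lastTime = now
	return now, s.seq
}

func main() {
	frozen := func() uint64 { return 1000 } // a clock stuck in one millisecond
	s := &sequencer{unixMs: frozen, ticks: frozen}
	for i := 0; i < 3; i++ {
		ts, seq := s.next(true)
		fmt.Println(ts, seq)
	}
}
```

With the frozen clock, the timestamp stays at 1000 while the counter goes 0, 1, 2, which is the property the flag is meant to reuse for a millisecond timestamp.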

@@ -272,28 +281,50 @@ func (g *Gen) getClockSequence() (uint64, uint16, error) {
// NewV7 returns a k-sortable UUID based on the current millisecond precision
// UNIX epoch and 74 bits of pseudorandom data.
//
-// This is implemented based on revision 03 of the Peabody UUID draft, and may
+// This is implemented based on revision 04 of the Peabody UUID draft, and may
// be subject to change pending further revisions. Until the final specification
// revision is finished, changes required to implement updates to the spec will
// not be considered a breaking change. They will happen as a minor version
// releases until the spec is final.
Collaborator

Given that this draft focuses on being more tentative about how strongly the implementations need to respect monotonicity of the increments vs unguessability, do we owe it to users to be explicit about which behavior we're leaning towards in the implementation?

@cameracker
Collaborator

cameracker commented Jan 24, 2023

Ok, I completed a review. Sorry it took me so long. And thank you so much for the contribution.

Last request: Could we update the README.md to reflect which version of the Draft we're implementing for v6 and v7?

As an overall comment, I believe this PR correctly implements the v6 and v7 UUIDs to specification, but I'm getting the sense that we're not being as clear as we could be on which "MAY" "SHOULD" behaviors we chose to address in this implementation and worry that some user is going to pick up those UUIDs and run into "undefined behavior" sort of problems. What do you think? Should we be more explicit on our approach anywhere? @convto @bgadrian @dylan-bourque

@bgadrian
Contributor Author

I have merged with the latest master, updated the Readme and addressed some comments.

As for being explicit or not, I don't think the draft 04 specification leaves the monotonic property optional; it uses SHOULD:

Additionally, care SHOULD be taken to ensure UUIDs generated in batches are also monotonic. That is, if one-thousand UUIDs are generated for the same timestamp; there is sufficient logic for organizing the creation order of those one-thousand UUIDs.

But the spec does not enforce which algorithm to use:

MAY utilize a monotonic counter

The draft states that

For single-node UUID implementations that do not need to create batches of UUIDs,

This does make batching optional, which is confusing. The problem is that users most likely will not know whether they need batching. I presume most real-world UUID generation is driven by events that cannot be controlled (new users, new resources), so whether batching is needed cannot be guaranteed; it can only be presumed that going without it is fine 99.99% of the time.
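
To make the batch requirement concrete, here is a small sketch that builds one thousand IDs for the same millisecond and checks that they sort in creation order. `makeID` and `isMonotonic` are hypothetical helpers, not the library's API; the sketch assumes a plain counter in the 12 bits after the version nibble and leaves the variant and rand_b bytes as zero.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// makeID writes a 48-bit millisecond timestamp into bytes 0-5 and a
// version-7 nibble plus a 12-bit counter into bytes 6-7. Illustration only.
func makeID(ms uint64, counter uint16) [16]byte {
	var u [16]byte
	binary.BigEndian.PutUint64(u[0:8], ms<<16) // timestamp lands in bytes 0-5
	binary.BigEndian.PutUint16(u[6:8], (counter&0x0fff)|0x7000)
	return u
}

// isMonotonic reports whether each ID sorts strictly after the previous one
// when compared byte by byte.
func isMonotonic(ids [][16]byte) bool {
	for i := 1; i < len(ids); i++ {
		if bytes.Compare(ids[i-1][:], ids[i][:]) >= 0 {
			return false
		}
	}
	return true
}

func main() {
	const ms = 0x017F22E279B0
	ids := make([][16]byte, 0, 1000)
	for c := uint16(0); c < 1000; c++ {
		ids = append(ids, makeID(ms, c))
	}
	fmt.Println(isMonotonic(ids)) // true
}
```

In this sketch only 12 counter bits survive the version nibble, so a counter that crosses a multiple of 4096 within a single millisecond would break the ordering.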

@cameracker cameracker merged commit 8345c9a into gofrs:master Jan 26, 2023