Drop getClockSequence mutex in favour of atomics #83
Conversation
// Should increase clock sequence.
if timeNow <= g.lastTime {
    g.clockSequence++
timeHi := atomic.LoadUint64(&g.lastTime)
From the sync/atomic docs:

On ARM, x86-32, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.
It should be. sync.Once structs are 96 bits each, io.Reader as an interface is 2*word size, epochFunc and hwAddrFunc are word-sized func pointers, which puts lastTime at bit 320. Am I missing something?

Regardless of the above, I agree that if this PR were to get merged, the alignment requirement should at the very least be mentioned (in the struct declaration?). I'll wait for additional feedback before I commit further.
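For reference, a rough sketch of the layout being counted here on a 32-bit platform. The field names and signatures follow this discussion, so treat the offsets as an illustration of the arithmetic rather than the library's exact definition:

```go
package uuid

import (
	"io"
	"net"
	"sync"
	"time"
)

// Hypothetical generator layout on a 32-bit platform (offsets in bits,
// assuming no padding between these fields).
type Gen struct {
	clockSequenceOnce sync.Once                        // bits   0-95  (sync.Once is 96 bits)
	hwAddrOnce        sync.Once                        // bits  96-191
	rand              io.Reader                        // bits 192-255 (interface: 2 * 32-bit words)
	epochFunc         func() time.Time                 // bits 256-287 (word-sized func value)
	hwAddrFunc        func() (net.HardwareAddr, error) // bits 288-319
	lastTime          uint64                           // starts at bit 320: a 64-bit boundary
	clockSequence     uint16
	hardwareAddr      [6]byte
}
```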
@alcore would it be fair to say one risk would be if the standard library changed the sync.Once struct? Would we be able to mitigate that risk by moving the lastTime field to the beginning of the struct definition?
The structure of sync.Once should be irrelevant to the alignment unless it suddenly starts containing a 128-bit value. But that concern then applies to just about any struct member. If Go ever ran on a platform with 128-bit memory addressing, the function pointers would become 128 bits wide and the compiler would use 128-bit alignment for the struct, assuming the current rules.

I think my previous comment might've been a bit misleading here. Alignment is inferred from the largest sized field of a struct (as per https://golang.org/ref/spec#Size_and_alignment_guarantees). In our case that's precisely the timestamp. Off the top of my head: if it were stored as a pointer directly (instead of being referenced for the CAS in getClockSequence), on 32-bit platforms it would be a 32-bit value and would therefore drop the entire struct's alignment to 32 bits, since no other field would then be larger than a 32-bit word.

My previous statement made the mistake of counting the memory offset and potential padding (i.e. there is none) instead of the alignment.

tl;dr: lastTime can be moved to the beginning without any adverse effect and fall under the "The first word in (...) an allocated struct (...) can be relied upon to be 64-bit aligned" guarantee, but as I understand https://golang.org/ref/spec#Size_and_alignment_guarantees, that is not necessary. It could even be moved to the very end - that would increase the size of the generator (padding after hardwareAddr), but as per Go's spec the alignment would still be that of the largest member. I believe the comment in the sync/atomic docs is merely a reminder for people who juggle unsafe, raw memory pointers, since the 64-bit atomics are implemented as 32-bit words on 32-bit ARM, IIRC.
"but as per Go's spec, the alignment would still be that of the largest member"

This is not correct. The alignment of the beginning of the struct is equal to the alignment of the field with the largest alignment requirement (which is often not the largest field), and there are no types on 32-bit platforms with 64-bit alignment. This creates a requirement that the sum of the sizes of the fields before lastTime is a multiple of 8 bytes and that the struct is in its own allocation.

So we really do need to move lastTime to the beginning of the struct.
An issue with my phrasing: the suggestion that it can be moved to the end was based on counting the preceding bits - it'd end up at bit 384 on 32-bit platforms (hardwareAddr being padded), which puts it on a 64-bit boundary again.

But yes, the fact that this conversation is even taking place is proof enough that it should be moved to offset 0, along with the quote from the sync/atomic docs as a reminder of why it's there :-) Makes it oh so much simpler.

Sidenote: storing it as a pointer would also alleviate the requirements altogether, wouldn't it? It would get its own allocation once it escapes to the heap during struct construction, and by extension Go would allocate the value on a 64-bit boundary, while the pointer in the struct takes the alignment of the struct. I generally do just that, but didn't want to introduce a pointer to chase in this case.
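To make the suggestion concrete, a sketch of what the reordering could look like, keeping the sync/atomic note next to the field. Field names follow the earlier sketch; the uint32 widening of clockSequence is my assumption (16-bit fields cannot be updated with sync/atomic), not something stated in the thread:

```go
package uuid

import (
	"io"
	"sync"
)

// Sketch of the proposed reordering, not the library's exact definition.
type Gen struct {
	// lastTime is read and written with 64-bit atomics. It is kept as the
	// first field so that, per the sync/atomic docs, it is guaranteed to be
	// 64-bit aligned even on ARM, x86-32, and 32-bit MIPS.
	lastTime uint64

	// Assumed to be widened from uint16 so it can be bumped atomically.
	clockSequence uint32

	clockSequenceOnce sync.Once
	hwAddrOnce        sync.Once
	rand              io.Reader
	// ... remaining fields unchanged ...
}
```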
I am not personally in favor of merging this PR, as it makes the code a lot more complicated and doesn't seem to have a huge impact. The lock is unlikely to be problematically contended unless the user code does nothing but generate UUIDs.
@alcore thank you for this contribution. I'm interested in merging this in, and want to understand the alignment risk before merging. I'm also aligned with Jaden on the readability concern.

Edit: I do wonder if you could speak more about the system or environment that saw benefit from this change.
@theckman I hope my comment in #83 (comment) clears up the memory alignment concern. As for readability - so am I :-) I could try to reduce the verbiage of the comments a bit, but as for the code itself, I can't currently think of anything that would simplify it.

The setup where this mattered is (was) precisely the kind @jaddr2line mentioned - user code that does little besides generating UUIDs. In the end I simply rolled my own, but I thought this library could benefit from this change, since the ~20% speedup is - IMHO - relevant not just to cases with such contention.

Edit: The code can be simplified, as I mentioned in the PR comment. Essentially, if increasing the clock sequence is not gated behind the if timeNow <= timeHi condition but applies to every ID, then getting and setting the clock sequence becomes a single atomic increment. But this then turns into a case of reading the spec as specifying when the sequence has to be changed, but not that it can't be increased all the time. It doesn't really affect the safety, as the sequence will just wrap around anyway. But it would be a breaking change if anyone relies on the assumption that it only increases for IDs in the same time frame (I can't think of a use case for this assumption).
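For illustration only, the ungated variant described in that edit might reduce to something like this (method and field names follow the sketches above; this is not the PR's actual code):

```go
// Hypothetical ungated variant: the clock sequence is bumped for every
// generated ID, so no comparison against the previous timestamp is needed.
// Assumes the Gen fields and sync/atomic import from the earlier sketch.
func (g *Gen) nextClockSequence(timeNow uint64) (uint64, uint16) {
	atomic.StoreUint64(&g.lastTime, timeNow)
	return timeNow, uint16(atomic.AddUint32(&g.clockSequence, 1))
}
```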
@alcore Out of curiosity, is there any reason why the logic has changed such that you no longer set the last generation time when the time has gone backwards? https://github.com/gofrs/uuid/pull/83/files#diff-0d0495ad16c6f603876bbf484c43b549f53d0b33d4cd74c908b0ee95a94369eaR238-R240 vs https://github.com/gofrs/uuid/blob/master/generator.go#L237-L242
Thanks for the contribution here. For the most part, this review has sat idle for a while due to some strong objections about the increase in code complexity and potential byte-alignment concerns. Going to close this. If anybody has any objections, we can continue to talk about the changes here and the PR can be re-opened.
This PR drops the mutex used by getClockSequence in favour of an atomic CAS and atomic sequence increment, primarily motivated by performance.
It introduces a (documented) edge case wherein, under extreme contention, a time progression can result in the first UUID in the new time interval not reusing the old clock sequence but instead ending up with an incremented sequence (as a safety measure to avoid duplicates). This changes the internal behaviour and may not be compliant with the spec, depending on the interpretation. The safety net could be simplified if the generator were allowed to increment the clockSequence along with each time progression (which I refrained from), not just within the current (unchanged) time interval.
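The PR's exact code is not reproduced here, but a minimal self-contained sketch of a CAS-plus-atomic-increment scheme along these lines (with the edge case from the paragraph above called out in comments, and with assumed field names) could look like:

```go
package uuid

import "sync/atomic"

// Sketch only; the field names and the uint32 clock sequence are assumptions.
type Gen struct {
	lastTime      uint64 // first field: 64-bit aligned for atomics on 32-bit platforms
	clockSequence uint32
}

// getClockSequence returns the timestamp to use and the clock sequence,
// without taking a lock.
func (g *Gen) getClockSequence(timeNow uint64) (uint64, uint16) {
	timeHi := atomic.LoadUint64(&g.lastTime)
	if timeNow > timeHi && atomic.CompareAndSwapUint64(&g.lastTime, timeHi, timeNow) {
		// We claimed the new time interval; the old sequence can be reused.
		return timeNow, uint16(atomic.LoadUint32(&g.clockSequence))
	}
	// Either the clock did not advance, or another goroutine claimed the new
	// interval first. Incrementing the sequence is the safety net against
	// duplicates, and it is exactly the documented edge case: the first UUID
	// of a new interval may end up with a bumped sequence under contention.
	return timeNow, uint16(atomic.AddUint32(&g.clockSequence, 1))
}
```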
The gains on Go 1.14.1, Windows 10, i7 4770k @ 4.4 GHz (windows/amd64) are as follows:

Sequential:

Concurrent (8 threads):
Since there is no concurrent benchmark in the test suite, I used the trivial...
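The exact benchmark isn't reproduced here, but a trivial concurrent benchmark of this sort could look like the following, using testing.B.RunParallel against the package-level NewV1:

```go
package uuid_test

import (
	"testing"

	"github.com/gofrs/uuid"
)

// A minimal concurrent benchmark sketch; not necessarily the code used for
// the numbers above.
func BenchmarkNewV1Parallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			if _, err := uuid.NewV1(); err != nil {
				b.Error(err)
			}
		}
	})
}
```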
The PR includes a new test case for the V1 generator (and V2 by extension) which simulates a concurrent race of G=1024 goroutines attempting to generate N=256 UUIDs each, to tickle the race detector. Afterwards it checks for duplicates (i.e. primarily to cover the documented edge case). The test is deliberately slow (about 3s on my platform) to give the CPU and Go's scheduler some chances to shuffle things around in the execution flow.
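For context, a duplicate-checking race test in the spirit described might be structured roughly like this (G and N taken from the description above; the PR's actual test may differ):

```go
package uuid_test

import (
	"sync"
	"testing"

	"github.com/gofrs/uuid"
)

// Sketch of a concurrent duplicate check: G goroutines generate N V1 UUIDs
// each, then all results are compared for duplicates.
func TestNewV1ConcurrentDuplicates(t *testing.T) {
	const G, N = 1024, 256

	results := make([][]uuid.UUID, G)
	var wg sync.WaitGroup
	for i := 0; i < G; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids := make([]uuid.UUID, 0, N)
			for j := 0; j < N; j++ {
				id, err := uuid.NewV1()
				if err != nil {
					t.Error(err)
					return
				}
				ids = append(ids, id)
			}
			results[i] = ids
		}(i)
	}
	wg.Wait()

	// UUID is a [16]byte array, so it can be used directly as a map key.
	seen := make(map[uuid.UUID]struct{}, G*N)
	for _, ids := range results {
		for _, id := range ids {
			if _, dup := seen[id]; dup {
				t.Fatalf("duplicate UUID generated: %s", id)
			}
			seen[id] = struct{}{}
		}
	}
}
```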
Sidenote: this is the only test that actually exercises the race detector, given that no other tests in the suite run concurrently to make use of it (even though it's enabled for Travis builds). If there's interest, I may rectify this in a subsequent PR and race through all generators in a similar manner.