GuidGenerator.GenerateTimeBasedGuid() can create duplicate Time-based UUIDs #66
Comments
That was the reader we used DateTimePrecise. But it was problematic and It was removed for the stability of the code. So I am happy to see somebody working on a better solution. See issue #48 for more info. |
Tell me what you think of this new revamped Thoughts?? |
It is much cleaner and easier to understand, that is for sure. However, I don't think it is going to work because even though you are adding a high-precision number of seconds to _baseTime, it appears it is rounded to the nearest millisecond. I think sub-millisecond precision is needed. See this MSDN page: http://msdn.microsoft.com/en-us/library/system.datetimeoffset.addseconds(v=vs.100).aspx I will work on confirming that this does not work as precise as expected. In the meantime, @nberardi I think you're on the right track with resetting the stopwatch. I haven't thought about it much yet, but I assume we should be able to make some transformation between DateTime.Ticks and Stopwatch.Ticks since the Frequency of the stopwatch is known. |
That is easy to overcome by turning the seconds into datetime ticks (which i think are nanosecond based) and I have verified the article that I referenced. That you do get a much more |
Found it. Multiply the seconds by 10,000,000 and that will give you ticks. Then we should be able to just ass ticks to the base time. |
A couple comments. First, I think we can just use the
Second, @GrLawrence wrote a test function on one of his branches to test out his proposal in issue #66, and I am still getting Guid collisions using this method. I'll work on it a little more to see if it is really just happening so fast that the operations are on the same Tick (seems hard to believe) or if there is some other bug. |
Actually we can't use Elapsed, take a look at this article. Alot of precision is lost when converted to TimeSpan. Specifically look under the title "Sidebar: What’s the Frequency?" |
After we get the precision part of the equation down for DateTime. Which is not only important for TimeUUID, but also the column timestamp. We should look at actually making the clockSequence of TimeUUID unique per instance created. https://github.com/managedfusion/fluentcassandra/blob/master/src/GuidGenerator.cs#L112 I think this is where some of the uniqueness problems come from, because I never properly created a clockSequence that changes with each UUID generated. I propose we take the last two bytes from I think @GrLawrence sort of touched on this in his code where he was increasing the tick by 1. https://github.com/GrLawrence/fluentcassandra/blob/timeuuid/src/GuidGenerator.cs#L104 |
Re: Elapsed. Yes, when you look at those two raw numbers it looks like you lost a lot of precision. In reality, when you are multiplying the "precise" seconds to normalize it to 100-ns Ticks ( Pop open a console app and try it:
|
I see what you are saying, but I am getting small rounding variations every once and a while. It seems like TimeSpan truncates the numbers.
who knows does that rounding make a difference? it might just be simpler to just Add the TimeSpan to make the code more readable. |
OK I think we put the DateTime issue to bed. Now just need to get a real varying clockSequence in TimeUUID and I think we will be golden. |
Have either of you guys seen this? http://stackoverflow.com/questions/1008345/system-diagnostics-stopwatch-returns-negative-numbers-in-elapsed-properties Apparently on Virtual Machines Stopwatch can report negative numbers. So we may want to consider boxing the elapsed TimeSpan to make sure it is a positive number. |
Running on bare metal right now, but I can try it on a Windows Server 2008R2 VM. Re: clockSequence - I was trying to make sense of the RFC, and it sounded like the clockSequence is only to be incremented when the clock attempts to go backwards - not on each UUID. Maybe I just don't understand it.. I'll give it another look tomorrow. |
RE: Stopwatch - so the more I read about the issues with the Stopwatch and it switching CPU's, getting different timings, it producing negative values, unusually large values. I think we need to build in a little more error checking. And at the very least default back to DateTimeOffset.UtcNow when we feel uncomfortable with the value it is returning. RE: clockSequence - From what I see from other languages who have implemented it, nobody follows the exact letter of the law for the clockSequence in TimeUUID. Most just get a semi-random value generated by the system to use. Which I think is fine, because the TimeUUID is unlikely to conflict with another system's generated TimeUUID because of the randomness built into the node. And the clockSequence is merely there to make sure the same TimeUUID isn't generated twice for the same date, which I think will be easy to accomplish using the system timestamp from Stopwatch.GetTimestamp(). I know this diverges from the exact letter of the law of the RFC, but I still think it keeps the spirit of the RFC in mind. |
I am hoping both of these changes that I made this morning will work. Let me know if you see any glaring or corner cases that I might have missed. |
So, if the purpose of the clock sequence is to prevent collisions when the same DateTimeOffset timestamp is being used (either by speed or system clock going backwards), isn't it going to continue to collide with this implementation since the sequence is also based on time? |
Shouldn't occur. Because GetTimestamp() changes a lot faster than DateTime and between two side by side calls will produce different numbers unlike DateTime.Now. GetTimestamp() is what the Stopwatch uses under the covers to produce the timing. So I feel pretty safe. Also by the way GetTimestamp() is the CPU's clock, which is why you get much more precise values from Stopwatch. public static long GetTimestamp()
{
if (IsHighResolution)
{
long num = 0L;
SafeNativeMethods.QueryPerformanceCounter(out num);
return num;
}
return DateTime.UtcNow.Ticks;
} So I feel much safer using it over |
What do you guys think? Can we put this one to bed? |
@GrLawrence has been working on it more. I think we have something pretty solid. Look for an update from him within the next 24 hours and then we can see what you think. |
Don't spend too much time on trying to perfect the clock sequence, it is throw away data. And not even Cassandra changes the clock sequence as the RFC says. https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/utils/UUIDGen.java In Cassandra the clock sequence is read only. |
I’ve checked in a change on my fork to address the generation of TimeUUIDs. This implementation locks around the generation of the UUID to help in multi-threaded scenarios. Also, this goes back to using random bytes for the clock sequence, but only generates a new sequence if it appears that the clock has gone backward, or stalled. So this honors trying to keep the time portion close to the system time, but recognizes it must be from a different sequence. This seems closer to the spec as well for when to start a new sequence. It also continues to use DateTimePrecise. See pull request. |
Finally think we have a good option for this. |
In a tight loop GuidGenerator.GenerateTimeBasedGuid() will generate duplicate guids. When using these guids for column names, values could potentially be overwritten.
I'm working on a fix based on code from a java project (http://johannburkard.de/software/uuid/) which is used in Hector (a Java client for Cassandra). Basically, it keeps track of the ticks used for the previously generated guid, if the next guid would use the same (or earlier) ticks value, the value would be incremented by 1, stored as the last used value, and then the new guid would be generated. I think (at least hope) the code will be clearer.
The text was updated successfully, but these errors were encountered: