In a tight loop GuidGenerator.GenerateTimeBasedGuid() will generate duplicate guids. When using these guids for column names, values could potentially be overwritten.
I'm working on a fix based on code from a java project (http://johannburkard.de/software/uuid/) which is used in Hector (a Java client for Cassandra). Basically, it keeps track of the ticks used for the previously generated guid, if the next guid would use the same (or earlier) ticks value, the value would be incremented by 1, stored as the last used value, and then the new guid would be generated. I think (at least hope) the code will be clearer.
Tell me what you think of this new revamped DateTimePrecise. After looking at and studying the old one, I think the flaw in the design was leaving the Stopwatch running forever. It eventually by passed the overflow point and caused the process the crash. This new one will reset the stopwatch every 10 seconds. My inspiration came from the following post which shows the resolution you get by just dividing ElapsedTicks by Frequency.
Attempting a rewrite of DateTimePrecise for issue #66 and #48
It is much cleaner and easier to understand, that is for sure. However, I don't think it is going to work because even though you are adding a high-precision number of seconds to _baseTime, it appears it is rounded to the nearest millisecond. I think sub-millisecond precision is needed. See this MSDN page: http://msdn.microsoft.com/en-us/library/system.datetimeoffset.addseconds(v=vs.100).aspx
I will work on confirming that this does not work as precise as expected.
In the meantime, @nberardi I think you're on the right track with resetting the stopwatch. I haven't thought about it much yet, but I assume we should be able to make some transformation between DateTime.Ticks and Stopwatch.Ticks since the Frequency of the stopwatch is known.
Found it. Multiply the seconds by 10,000,000 and that will give you ticks. Then we should be able to just ass ticks to the base time.
changed from adding seconds to adding ticks, since the resolution get…
…s truncated to milliseconds when adding seconds to a date time, thanks to @dlbromen for pointing this out in issue #66
added back in support for the new DateTimePrecise for issue #66
A couple comments.
First, I think we can just use the stopwatch.Elapsed property to save some math.
TimeSpan elapsed = _stopwatch.Elapsed;
if (elapsed.TotalSeconds > _syncSeconds)
Second, @GrLawrence wrote a test function on one of his branches to test out his proposal in issue #66, and I am still getting Guid collisions using this method. I'll work on it a little more to see if it is really just happening so fast that the operations are on the same Tick (seems hard to believe) or if there is some other bug.
Actually we can't use Elapsed, take a look at this article.
Alot of precision is lost when converted to TimeSpan. Specifically look under the title "Sidebar: What’s the Frequency?"
After we get the precision part of the equation down for DateTime. Which is not only important for TimeUUID, but also the column timestamp. We should look at actually making the clockSequence of TimeUUID unique per instance created.
I think this is where some of the uniqueness problems come from, because I never properly created a clockSequence that changes with each UUID generated. I propose we take the last two bytes from Stopwatch.GetTimestamp() and use that as the clock sequence.
I think @GrLawrence sort of touched on this in his code where he was increasing the tick by 1. https://github.com/GrLawrence/fluentcassandra/blob/timeuuid/src/GuidGenerator.cs#L104
Re: Elapsed. Yes, when you look at those two raw numbers it looks like you lost a lot of precision. In reality, when you are multiplying the "precise" seconds to normalize it to 100-ns Ticks (Convert.ToInt64(elapsedSeconds * TicksInOneSecond), you are getting the same answer. I guess it doesn't really matter to me all that much, just so the info we're putting out there is accurate.
Convert.ToInt64(elapsedSeconds * TicksInOneSecond
Pop open a console app and try it:
Stopwatch stopWatch2 = new Stopwatch();
Console.WriteLine("SW Precise Seconds To DT Ticks " + Convert.ToInt64((stopWatch2.ElapsedTicks / (double)Stopwatch.Frequency) * 10000000));
Console.WriteLine("SW Elapsed Ticks " + stopWatch2.Elapsed.Ticks);;
I see what you are saying, but I am getting small rounding variations every once and a while. It seems like TimeSpan truncates the numbers.
SW Precise Seconds To DT Ticks: 50004973
SW Elapsed Ticks: 50004972
who knows does that rounding make a difference? it might just be simpler to just Add the TimeSpan to make the code more readable.
OK I think we put the DateTime issue to bed. Now just need to get a real varying clockSequence in TimeUUID and I think we will be golden.
Have either of you guys seen this? http://stackoverflow.com/questions/1008345/system-diagnostics-stopwatch-returns-negative-numbers-in-elapsed-properties
Apparently on Virtual Machines Stopwatch can report negative numbers. So we may want to consider boxing the elapsed TimeSpan to make sure it is a positive number.
Running on bare metal right now, but I can try it on a Windows Server 2008R2 VM.
Re: clockSequence - I was trying to make sense of the RFC, and it sounded like the clockSequence is only to be incremented when the clock attempts to go backwards - not on each UUID. Maybe I just don't understand it.. I'll give it another look tomorrow.
RE: Stopwatch - so the more I read about the issues with the Stopwatch and it switching CPU's, getting different timings, it producing negative values, unusually large values. I think we need to build in a little more error checking. And at the very least default back to DateTimeOffset.UtcNow when we feel uncomfortable with the value it is returning.
RE: clockSequence - From what I see from other languages who have implemented it, nobody follows the exact letter of the law for the clockSequence in TimeUUID. Most just get a semi-random value generated by the system to use. Which I think is fine, because the TimeUUID is unlikely to conflict with another system's generated TimeUUID because of the randomness built into the node. And the clockSequence is merely there to make sure the same TimeUUID isn't generated twice for the same date, which I think will be easy to accomplish using the system timestamp from Stopwatch.GetTimestamp(). I know this diverges from the exact letter of the law of the RFC, but I still think it keeps the spirit of the RFC in mind.
added clockSequence back in using the Environment.TickCount to close …
…out issue #66
Environment.Ticks doesn't have enough resolution for the TimeUUID's c…
…lockSequence issue #66
I am hoping both of these changes that I made this morning will work. Let me know if you see any glaring or corner cases that I might have missed.
So, if the purpose of the clock sequence is to prevent collisions when the same DateTimeOffset timestamp is being used (either by speed or system clock going backwards), isn't it going to continue to collide with this implementation since the sequence is also based on time?
Shouldn't occur. Because GetTimestamp() changes a lot faster than DateTime and between two side by side calls will produce different numbers unlike DateTime.Now. GetTimestamp() is what the Stopwatch uses under the covers to produce the timing. So I feel pretty safe. Also by the way GetTimestamp() is the CPU's clock, which is why you get much more precise values from Stopwatch.
public static long GetTimestamp()
long num = 0L;
So I feel much safer using it over Environment.TickCount which is only millisecond based.
What do you guys think? Can we put this one to bed?
@GrLawrence has been working on it more. I think we have something pretty solid. Look for an update from him within the next 24 hours and then we can see what you think.
Don't spend too much time on trying to perfect the clock sequence, it is throw away data. And not even Cassandra changes the clock sequence as the RFC says.
In Cassandra the clock sequence is read only.
I’ve checked in a change on my fork to address the generation of TimeUUIDs. This implementation locks around the generation of the UUID to help in multi-threaded scenarios. Also, this goes back to using random bytes for the clock sequence, but only generates a new sequence if it appears that the clock has gone backward, or stalled. So this honors trying to keep the time portion close to the system time, but recognizes it must be from a different sequence. This seems closer to the spec as well for when to start a new sequence. It also continues to use DateTimePrecise. See pull request.
the random generator is faster than the timestamp as talked about in …
decided to add an option of Fast vs NoDuplicates, since NoDuplicates …
…is 10 times slower, this should satisfy issue #68 and #66
Finally think we have a good option for this.