Deadlock caused by lock in static constructor #1193
I've found a rather subtle deadlock condition in Noda Time that can be triggered by creating patterns from different threads.
Here's a minimal program that reproduces the issue:
The program may not deadlock every time, as there is a race condition involved. To reproduce the deadlock consistently, follow these steps in Visual Studio:
Suppose the thread "Invariant" starts running first. It eventually calls into
Thread "en-US" now runs. It also first goes through a
To ensure the static constructor is called only once, the CLR acquires a unique lock for the static constructor of
Suppose the thread "Invariant" is now scheduled again. It is still in the
Here are the stack traces for both threads at the time of the deadlock:
Generally speaking, blocking (directly or indirectly) inside a static constructor or initializer should be avoided as much as possible. As this example shows, it's not always obvious when a static constructor will run. That makes deadlocks like this very difficult to prevent, because the order in which the locks are taken can be hard to control.
So in my opinion the goal should be to eliminate blocking calls from static constructors (or initializers). Unfortunately I'm not versed enough in the intricacies of Noda Time to make that change myself or propose an alternative solution to this particular problem.
Okay, I've now read through in more detail and I believe I understand. While it would be great not to have anything that needs to lock within the type initializers in Noda Time, there are a lot of "natural constants" that we wouldn't want to give up. (We've had a similar problem before, but with a slightly different cause.)
It feels to me like the biggest problem is that Cache.GetOrAdd calls arbitrary code within the lock. I'll need to think about how much of a problem it is (if at all) to call the value factory more times than necessary; if that's not a problem, then it shouldn't be too hard to fix.
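To make the idea concrete, here is a minimal sketch of a cache that never runs the value factory while holding its lock. It is not Noda Time's actual `Cache` class: the name `SimpleCache` and its members are hypothetical, and it assumes that calling the factory more than once for the same key is harmless.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not Noda Time's actual Cache implementation.
public sealed class SimpleCache<TKey, TValue>
{
    private readonly object mutex = new object();
    private readonly Dictionary<TKey, TValue> dictionary = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> valueFactory;

    public SimpleCache(Func<TKey, TValue> valueFactory)
    {
        this.valueFactory = valueFactory ?? throw new ArgumentNullException(nameof(valueFactory));
    }

    public TValue GetOrAdd(TKey key)
    {
        // Fast path: the lock only guards the dictionary lookup.
        lock (mutex)
        {
            if (dictionary.TryGetValue(key, out TValue cached))
            {
                return cached;
            }
        }

        // No lock is held here, so the factory is free to trigger type
        // initializers (or anything else that blocks) without holding up
        // other callers of the cache.
        TValue created = valueFactory(key);

        lock (mutex)
        {
            // Another thread may have added the key in the meantime;
            // keep the first value so all callers see the same instance.
            if (dictionary.TryGetValue(key, out TValue winner))
            {
                return winner;
            }
            dictionary[key] = created;
            return created;
        }
    }
}
```

The cost is that two threads racing on the same key may both run the factory, with one result being discarded. That is only acceptable if the factory is free of side effects.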
Alternatively, I can look into whether ConcurrentDictionary is supported by all the platforms we now target, as that would be a simple fix as well.
Note: to observe the issue, I've had to change the pattern to "o" to ensure that the Patterns nested class is actually initialized. As it doesn't have a static constructor in the source code, the timing is implementation-specific.
With that change, I can reproduce the problem. I'm now going to try to reproduce it outside the debugger by adding some manual sleeps where we froze/thawed the threads.
Okay, I've now reproduced this without having to break into the debugger until we think it's deadlocked, by adding these changes:
Importantly, this doesn't require changing the cache... which means I will be able to validate that when the cache implementation is changed, it works...
Jon, thank you for looking into this so quickly.
That's interesting, I did not know that. I should have mentioned that I tested this with .NET Framework 4.7.1.
ConcurrentDictionary is available almost anywhere (.NET Core >= 1.0, .NET Framework >= 4.0, .NET Standard >= 1.1), so it should be safe to use. But I think we would still need manual synchronization for the cache eviction, right?
Cache eviction is a little interesting, but I've got a solution which I think will work okay.
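For readers wondering what lock-free eviction could look like, the following is one possible sketch, not necessarily the solution that was adopted. It pairs a `ConcurrentDictionary` with a `ConcurrentQueue` that records insertion order, and evicts the oldest keys once the cache grows past a limit. The class name `BoundedCache` and the `maxSize` parameter are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical illustration only; eviction is approximate under contention.
public sealed class BoundedCache<TKey, TValue>
{
    private readonly int maxSize;
    private readonly Func<TKey, TValue> valueFactory;
    private readonly ConcurrentDictionary<TKey, TValue> dictionary =
        new ConcurrentDictionary<TKey, TValue>();
    // Keys in the order they were first added, used to pick eviction victims.
    private readonly ConcurrentQueue<TKey> insertionOrder = new ConcurrentQueue<TKey>();

    public BoundedCache(int maxSize, Func<TKey, TValue> valueFactory)
    {
        if (maxSize < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxSize));
        }
        this.maxSize = maxSize;
        this.valueFactory = valueFactory ?? throw new ArgumentNullException(nameof(valueFactory));
    }

    public int Count => dictionary.Count;

    public TValue GetOrAdd(TKey key)
    {
        if (dictionary.TryGetValue(key, out TValue existing))
        {
            return existing;
        }

        // The factory runs without any lock held by this class.
        TValue created = valueFactory(key);

        if (dictionary.TryAdd(key, created))
        {
            insertionOrder.Enqueue(key);
            // Evict the oldest keys until the cache is back within its limit.
            while (dictionary.Count > maxSize && insertionOrder.TryDequeue(out TKey oldest))
            {
                dictionary.TryRemove(oldest, out _);
            }
            return created;
        }

        // Lost the race: prefer the value another thread stored, if it is still there.
        return dictionary.TryGetValue(key, out TValue winner) ? winner : created;
    }
}
```

Under contention the cache may briefly exceed `maxSize` or evict slightly more than necessary, which seems acceptable for a cache whose entries can always be recomputed.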
Another thing to work out: how/where we're going to release this.
I was wondering whether to create a 2.4.0 branch now, cherry-pick this work into it, and do a single release, leaving older versions broken. The alternative is to create a 2.3.1 release first and then do 2.4.0 separately. What do you think?
added a commit
Aug 18, 2018
It wouldn't have been nearly as quick a fix if you hadn't gone into such detail reproducing it. Incredibly helpful! I'm sorry you've run into this at all - the bug has probably been there since Noda Time 1.0 in 2012...
I'll try to get 2.4.0 released either tomorrow or some time over the next week. (It'll be good to have the netstandard2.0 target in there too - adding NodaTime in NuGet will be a lot less scary in terms of dependencies.)