Event system can cause deadlock #5372
There's an issue with the event system which can cause what is effectively a deadlock during normal operations. It's not a logical deadlock, it's just that sending events can end up taking forever, which blocks other things. We often send events while holding other locks, which makes it worse.
Specifically, when an event listener no longer polls for events, the Log() call will block for 150 ms per non-polling listener. If there are several dead listeners this adds up, and because Log() holds the lock for the whole time, every other Log() call is blocked as well.
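To make the cost concrete, here is a minimal sketch of the pattern described above. The `logger` type and its methods are hypothetical simplifications written for illustration, not the actual event code; the only detail taken from the description is the 150 ms wait per listener while the lock is held.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// sendTimeout mirrors the 150 ms per-listener wait described above.
const sendTimeout = 150 * time.Millisecond

// logger is a hypothetical, simplified event bus (an assumption for
// illustration, not the real implementation).
type logger struct {
	mut  sync.Mutex
	subs []chan string
}

// subscribe registers a new listener and returns its channel.
func (l *logger) subscribe() chan string {
	l.mut.Lock()
	defer l.mut.Unlock()
	ch := make(chan string) // unbuffered: a send needs an active reader
	l.subs = append(l.subs, ch)
	return ch
}

// log delivers ev to every listener while holding the mutex, waiting
// up to sendTimeout for each one that is not currently reading.
func (l *logger) log(ev string) {
	l.mut.Lock()
	defer l.mut.Unlock()
	for _, ch := range l.subs {
		select {
		case ch <- ev:
		case <-time.After(sendTimeout):
			// Listener is not polling; give up on it after the wait.
		}
	}
}

// timeLog reports how long a single log call takes when there are
// n listeners that never read from their channel.
func timeLog(n int) time.Duration {
	l := &logger{}
	for i := 0; i < n; i++ {
		l.subscribe() // returned channel is deliberately never read
	}
	start := time.Now()
	l.log("event")
	return time.Since(start)
}

func main() {
	// With three dead listeners one call takes at least 3 * sendTimeout,
	// and any concurrent log call has to wait for the mutex that long.
	fmt.Println(timeLog(3))
}
```

In this sketch a single call with three dead listeners takes at least 450 ms, and every other caller queues behind the mutex for that time.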
There are a couple of places where we can end up with dead event listeners. The watch aggregator can exit without ever unsubscribing. But more seriously, anything that 1) ceases to process events and then 2) unsubscribes, perhaps as part of a defer, causes this issue. For example, sendIndexes.
The reason for the latter is that once we stop reading events, the Unsubscribe() can take effectively forever. There might be a thousand event senders producing events quite frequently (for example, lots of folders which are scanning on a short interval). These all compete for the mutex which must be held to unsubscribe. As each of these calls can take over a second and there are thousands and thousands of them waiting for the lock, the odds that our Unsubscribe() call actually gets to acquire the lock in a reasonable time are slim.
I have a test that reproduces this sad situation.
I see several potential solutions. First of all, we must ensure that every subscribe is matched by an unsubscribe. Then:
I think we should do number 1 at minimum, and possibly number 3. In the cases where we expect many events and have to catch 'em all, that's what the buffered subscription is for.
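For illustration only, this is roughly what "drop instead of wait" looks like for a listener backed by a buffered channel. The `subscription` type, `newSubscription` and `trySend` are hypothetical names made up for this sketch; it is not the existing buffered subscription, just a small example of a send that never blocks the sender.

```go
package main

import "fmt"

// subscription is a hypothetical listener handle backed by a buffered
// channel (an assumption for illustration, not the real type).
type subscription struct {
	events  chan string
	dropped int // events discarded because the buffer was full
}

// newSubscription creates a listener that can queue up to size events.
func newSubscription(size int) *subscription {
	return &subscription{events: make(chan string, size)}
}

// trySend queues ev without ever blocking and reports whether it was
// accepted. It assumes the caller serializes sends (for example by
// holding the sender's lock), since dropped is not synchronized here.
func (s *subscription) trySend(ev string) bool {
	select {
	case s.events <- ev:
		return true
	default:
		// Buffer full: the listener is not keeping up, so drop the event
		// rather than stalling the sender.
		s.dropped++
		return false
	}
}

func main() {
	s := newSubscription(2)
	for i := 0; i < 5; i++ {
		s.trySend(fmt.Sprintf("event-%d", i))
	}
	fmt.Println(len(s.events), s.dropped) // prints "2 3"
}
```

The trade-off is that a slow listener loses events, so this only fits listeners that can tolerate gaps.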
As a motivator, here's a massive backtrace showing the problem (on 0.14.48).