Fix the race on the timers #240
Merged
Previously, the logic was as follows: check the timer at the front
of the priority queue of timers and, if it has expired, call its
callback and pop it from the queue. This logic is flawed because the
timer queue is accessed from multiple threads. The reactor thread may
check the timer at the front of the queue and run its callback, and
then another thread may concurrently add a new timer that ends up at
the front of the queue. When the reactor thread then pops the item at
the front of the queue, it removes the newly added timer before its
callback has been executed.
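The interleaving described above can be replayed deterministically in a single thread. This is only an illustrative sketch: it models the timer queue as a `heapq` list of `(deadline, name)` tuples, and the names `timers`, `fired`, and `now` are made up for the example rather than taken from the code base.

```python
import heapq

fired = []   # names of timers whose callbacks ran
timers = []  # min heap of (deadline, name), shared by both "threads"

heapq.heappush(timers, (10.0, "A"))
now = 10.0

# Reactor thread, step 1: peek at the front timer and run its callback.
deadline, name = timers[0]
if deadline <= now:
    fired.append(name)

# Another thread, in between: add a timer with an earlier deadline.
heapq.heappush(timers, (5.0, "B"))

# Reactor thread, step 2: pop whatever is at the front of the queue.
popped = heapq.heappop(timers)

# "B" was removed without its callback ever running, while "A" is
# still queued even though its callback has already been executed.
print(popped, fired, timers)
```

In this replay the popped entry is the late arrival `"B"`, not the timer `"A"` that was actually checked, which is the lost-timer behavior the description refers to.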
To solve this, we use a double-buffering approach. Newly added timers
are stored in a thread-safe queue that can be modified efficiently at
both ends from multiple threads. The reactor thread periodically pops
items from that queue in FIFO order and pushes them onto a min heap,
which is a list maintained with the heappush and heappop functions.
Timers are popped from the min heap and executed only if the heap
has some elements.
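A minimal sketch of the double-buffering idea follows. It assumes `collections.deque` as the thread-safe queue and plain callables as callbacks; the class `TimerLoop` and its methods `add_timer` and `process_timers` are hypothetical names for illustration, not the identifiers used in this pull request.

```python
import heapq
import itertools
from collections import deque


class TimerLoop:
    """Hypothetical sketch: a deque as inbox, a heap owned by one thread."""

    def __init__(self):
        self._new_timers = deque()     # appended to from any thread
        self._heap = []                # touched only by the reactor thread
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def add_timer(self, deadline, callback):
        # deque.append is thread-safe, so no lock is needed here.
        self._new_timers.append((deadline, callback))

    def process_timers(self, now):
        # Called only from the reactor thread.
        # Step 1: drain the inbox in FIFO order into the min heap.
        while True:
            try:
                deadline, callback = self._new_timers.popleft()
            except IndexError:
                break
            heapq.heappush(self._heap, (deadline, next(self._seq), callback))
        # Step 2: pop each expired timer first, then run its callback.
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()


loop = TimerLoop()
fired = []
loop.add_timer(10.0, lambda: fired.append("A"))
loop.add_timer(5.0, lambda: fired.append("B"))
loop.process_timers(now=7.0)    # only "B" has expired
loop.process_timers(now=12.0)   # now "A" has expired as well
```

Because other threads only ever touch the deque, the front of the heap cannot change between the expiry check and the pop, which is what removes the race.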
Also, the unneeded `timer_cancelled_cb` field is removed. Its only
use was to remove the timer from the queue after it was canceled.
The new approach has the same effect: a canceled timer returns `True`
from its `check_timer` call without executing its `timer_ended_cb`,
and is then removed from the heap.
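The cancellation behavior can be sketched as below. Only `check_timer` and `timer_ended_cb` come from the description above; the `Timer` class layout, the `end` and `canceled` attributes, the `cancel` method, and the `run_due_timers` helper are assumptions made for illustration and may differ from the real implementation.

```python
import heapq


class Timer:
    """Hypothetical timer; the attribute names are assumptions."""

    def __init__(self, end, timer_ended_cb):
        self.end = end
        self.timer_ended_cb = timer_ended_cb
        self.canceled = False

    def __lt__(self, other):
        # Lets heapq order timers by their deadline.
        return self.end < other.end

    def cancel(self):
        self.canceled = True

    def check_timer(self, now):
        # True means "done, remove me from the heap".
        if self.canceled:
            return True            # dropped without running the callback
        if self.end <= now:
            self.timer_ended_cb()
            return True
        return False


def run_due_timers(heap, now):
    # Reactor thread only: the heap is not shared with other threads.
    while heap and heap[0].check_timer(now):
        heapq.heappop(heap)


fired = []
heap = []
first = Timer(5.0, lambda: fired.append("first"))
second = Timer(6.0, lambda: fired.append("second"))
heapq.heappush(heap, first)
heapq.heappush(heap, second)

first.cancel()
run_due_timers(heap, now=10.0)
```

In this sketch a canceled timer needs no separate cancellation callback: it is discarded the next time it reaches the front of the heap.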