Skip to content

Conversation

@mdumandag
Copy link
Contributor

Formerly, we were doing a logic like the following: Check the
timer in front of the priority queue of timers. If it is expired,
call the callback and pop it from the queue. However, this logic
is flawed because the timers queue was accessed from multiple threads.
So, it may be the case that, we check the timer in front of the queue,
run its callback, then, some other thread adds another timer to the
queue concurrently and it is put in front of the queue. Now, when the
reactor thread pops the item in front of the queue, it will be the newly
added timer, before its callback being executed.

To solve this, we use a double buffering approach. A thead-safe queue
is used to store newly added timers which can be efficiently modified
on both ends on multiple threads. Then, the reactor thread periodically
pops items from that queue in FIFO order and maintains a min heap
using a list with the help of heappush and heappop functions. Only if
the min heap has some elements, timers are popped from that and executed.

Also, the unneeded timer_cancelled_cb is field is removed. We were
removing the timer from the queue after canceling it. That was the
only usage of this field. We do the same with the new approach,
if it is canceled, it will return True on check_timer call without
executing its timer_ended_cb and will be removed from the heap.

@mdumandag mdumandag added this to the 4.0 milestone Oct 28, 2020
@mdumandag mdumandag self-assigned this Oct 28, 2020
@mdumandag mdumandag force-pushed the timer-race branch 2 times, most recently from 7e800e1 to a3ab7ad Compare October 28, 2020 15:39
Formerly, we were doing a logic like the following: Check the
timer in front of the priority queue of timers. If it is expired,
call the callback and pop it from the queue. However, this logic
is flawed because the timers queue was accessed from multiple threads.
So, it may be the case that, we check the timer in front of the queue,
run its callback, then, some other thread adds another timer to the
queue concurrently and it is put in front of the queue. Now, when the
reactor thread pops the item in front of the queue, it will be the newly
added timer, before its callback being executed.

To solve this, we use a double buffering approach. A thead-safe queue
is used to store newly added timers which can be efficiently modified
on both ends on multiple threads. Then, the reactor thread periodically
pops items from that queue in FIFO order and maintains a min heap
using a list with the help of heappush and heappop functions. Only if
the min heap has some elements, timers are popped from that and executed.

Also, the unneeded `timer_cancelled_cb` is field is removed. We were
removing the timer from the queue after canceling it. That was the
only usage of this field. We do the same with the new approach,
if it is canceled, it will return `True` on `check_timer` call without
executing its `timer_ended_cb` and will be removed from the heap.
Copy link

@puzpuzpuz puzpuzpuz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mdumandag mdumandag merged commit b404158 into hazelcast:master Nov 10, 2020
@mdumandag mdumandag deleted the timer-race branch November 10, 2020 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants