Fix the race on the timers #240

mdumandag · 2020-10-28T14:59:50Z

Formerly, the logic was as follows: check the timer at the front of
the priority queue of timers and, if it has expired, call its callback
and pop it from the queue. This logic is flawed because the timer
queue was accessed from multiple threads. The reactor thread could
check the timer at the front of the queue and run its callback while
another thread concurrently added a new timer that landed at the front
of the queue. When the reactor thread then popped the item at the
front of the queue, it removed the newly added timer before that
timer's callback had been executed.
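
The interleaving described above can be replayed deterministically in a
single thread. This is only an illustrative sketch: the tuple layout and
the names `timers`, `executed`, and `popped` are assumptions made for the
example, not code from hazelcast/reactor.py.

```python
import heapq

# Hypothetical stand-in for the old shared priority queue: (deadline, name).
timers = [(1.0, "A")]
heapq.heapify(timers)
executed = []

# Reactor thread, step 1: look at the front timer and run its callback.
deadline, name = timers[0]
executed.append(name)

# Another thread, interleaved here: adds a timer with an earlier deadline,
# so it becomes the new front of the queue.
heapq.heappush(timers, (0.5, "B"))

# Reactor thread, step 2: pop "the front", which is no longer timer A.
popped = heapq.heappop(timers)
```

After this sequence, timer B has been dropped without ever running, while
the already executed timer A is still in the queue.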

To solve this, we use a double buffering approach. Newly added timers
are stored in a thread-safe queue that can be efficiently modified on
both ends from multiple threads. The reactor thread periodically pops
items from that queue in FIFO order and pushes them onto a min heap,
which is kept in a list with the help of the heappush and heappop
functions. Timers are popped from the min heap and executed only when
the heap has elements.
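
A minimal sketch of the double buffering idea, assuming a
`collections.deque` as the thread-safe queue. The class name `TimerLoop`,
its methods, and the `(deadline, sequence, callback)` heap entries are
hypothetical and chosen for illustration; the actual code in
hazelcast/reactor.py may differ.

```python
import heapq
import itertools
import time
from collections import deque


class TimerLoop:
    """Hypothetical sketch; names are illustrative, not the client's API."""

    def __init__(self):
        self._new_timers = deque()  # written to by any thread
        self._heap = []             # touched only by the reactor thread
        self._seq = itertools.count()  # used only by the reactor thread

    def add_timer(self, delay, callback):
        # Safe to call from any thread: deque.append is thread-safe.
        self._new_timers.append((time.monotonic() + delay, callback))

    def check_timers(self, now=None):
        # Must be called from the reactor thread only.
        now = time.monotonic() if now is None else now
        # Drain newly added timers in FIFO order into the min heap.
        while True:
            try:
                deadline, callback = self._new_timers.popleft()
            except IndexError:
                break
            # The sequence number breaks ties so callbacks are never compared.
            heapq.heappush(self._heap, (deadline, next(self._seq), callback))
        # Pop and run expired timers; the heap is private to this thread,
        # so the front cannot change between the check and the pop.
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
```

Because other threads only ever append to the deque, the peek and the pop
on the heap always refer to the same timer.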

Also, the unneeded timer_cancelled_cb field is removed. Its only
usage was to remove the timer from the queue after canceling it. The
new approach achieves the same result: if a timer is canceled, its
check_timer call returns True without executing its timer_ended_cb,
and the timer is removed from the heap.
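
The cancellation behavior can be sketched as follows. The names
`check_timer` and `timer_ended_cb` come from the description above; the
`Timer` class layout, the `canceled` flag, the `now` parameter, and the
`process` helper are assumptions made for this example.

```python
import heapq


class Timer:
    """Hypothetical sketch of a cancellable timer."""

    def __init__(self, end, timer_ended_cb):
        self.end = end
        self.timer_ended_cb = timer_ended_cb
        self.canceled = False

    def __lt__(self, other):
        # Lets heapq order timers by their deadline.
        return self.end < other.end

    def cancel(self):
        self.canceled = True

    def check_timer(self, now):
        # Returning True tells the caller to remove this timer from the heap.
        if self.canceled:
            return True  # removed without running timer_ended_cb
        if now >= self.end:
            self.timer_ended_cb()
            return True
        return False


def process(heap, now):
    # Reactor-thread-only loop: pop timers that are canceled or expired.
    while heap and heap[0].check_timer(now):
        heapq.heappop(heap)
```

In this sketch a canceled timer stays in the heap until it reaches the
front, at which point it is discarded without its callback being run.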

puzpuzpuz

LGTM

hazelcast/reactor.py

mdumandag added the Type: Defect and Source: Internal labels Oct 28, 2020

mdumandag added this to the 4.0 milestone Oct 28, 2020

mdumandag self-assigned this Oct 28, 2020

mdumandag force-pushed the timer-race branch 2 times, most recently from 7e800e1 to a3ab7ad October 28, 2020 15:39

mdumandag force-pushed the timer-race branch from a3ab7ad to 67157cf November 10, 2020 07:30

mdumandag force-pushed the timer-race branch from 67157cf to 6186e09 November 10, 2020 12:18

puzpuzpuz approved these changes Nov 10, 2020

hazelcast/reactor.py Outdated

fix the typo

dc46e66

mdumandag merged commit b404158 into hazelcast:master Nov 10, 2020

mdumandag deleted the timer-race branch November 10, 2020 14:14
