Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reset kScheduleRemote when we pull the queue #311

Merged
merged 1 commit into from
Sep 8, 2024
Merged

fix: reset kScheduleRemote when we pull the queue #311

merged 1 commit into from
Sep 8, 2024

Conversation

romange
Copy link
Owner

@romange romange commented Sep 2, 2024

Before, we reset kScheduleRemote bit right after we pushed into the remote queue.
But if this thread is slow, it is possible that the notified fiber will wake up before we reset the bit,
and then it will go sleeping again and then another notifier will try to awake it and will check fail at line
scheduler.cc:238 when it checks that the bit is set.
If we reset it upon waking up, the fiber will reset it before suspending, so this situation can not happen.

In addition, we print the stacktrace when we reach unexpected state.

@@ -253,10 +253,6 @@ void Scheduler::ScheduleFromRemote(FiberInterface* cntx) {
cntx->DEBUG_remote_epoch = remote_epoch_.fetch_add(1, memory_order_relaxed);
remote_ready_queue_.Push(cntx);


// clear the bit after we pushed to the queue.
cntx->flags_.fetch_and(~FiberInterface::kScheduleRemote, memory_order_release);
Copy link
Contributor

@kostasrim kostasrim Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it slightly confusing that if isScheduledRemotely() is true above we clear the bit here whereas for this case we clear the bit when we pop the item in from the remote queue in the remote scheduler. What I had in mind is to clear the bit here and on the remote scheduler side to place back the item at the back of the remote_queue. That way this flow still remains in control of unseting the bit.

IMO I don't see a clear benefit in my suggestion so your change should be good as well.

Before, we reset kScheduleRemote bit right after we pushed into the remote queue.
But if this thread is slow, it is possible that the notified fiber will wake up before we reset the bit,
and then it will go sleeping again and then another notifier will try to awake it and will check fail at line
scheduler.cc:238 when it checks that the bit is set.
If we reset it upon waking up, the fiber will reset it before suspending, so this situation can not happen.

In addition, we print the stacktrace when we reach unexpected state.

Signed-off-by: Roman Gershman <romange@gmail.com>
@romange romange merged commit ff26222 into master Sep 8, 2024
5 checks passed
@romange romange deleted the Pr1 branch September 8, 2024 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants