Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_central: Break dispatch_to_cpu loop when running out of buffer slots #26

Merged
merged 1 commit into from
Dec 12, 2023

Conversation

kkdwivedi
Copy link
Contributor

@kkdwivedi kkdwivedi commented Dec 12, 2023

For the case where many tasks being popped from the central queue cannot be dispatched to the local DSQ of the target CPU, we will keep bouncing them to the fallback DSQ and continue the dispatch_to_cpu loop until we find one which can be dispatch to the local DSQ of the target CPU.

In a contrived case, it might be so that all tasks pin themselves to CPUs != target CPU, and due to their affinity cannot be dispatched to that CPU's local DSQ. If all of them are filling up the central queue, then we will keep looping in the dispatch_to_cpu loop and eventually run out of slots for the dispatch buffer. The nr_mismatched counter will quickly rise and the sched_ext core will notice the error and unload the BPF scheduler.

To remedy this, ensure that we break the dispatch_to_cpu loop when we can no longer perform a dispatch operation. The outer loop in central_dispatch for the central CPU should ensure the loop breaks when we run out of these slots and schedule a self-IPI to the local core, and allow the sched-ext core to consume the dispatch buffer before restarting the dispatch loop again.

A basic way to reproduce this scenario is to do:
taskset -c 0 perf bench sched messaging

The error in the kernel will be:
sched_ext: BPF scheduler "central" errored, disabling
sched_ext: runtime error (dispatch buffer overflow)
bpf_prog_6a473147db3cec67_dispatch_to_cpu+0xc2/0x19a
bpf_prog_c9e51ba75372a829_central_dispatch+0x103/0x1a5

Copy link
Contributor

@htejun htejun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me. A couple minor suggestions.

@@ -142,6 +142,13 @@ static bool dispatch_to_cpu(s32 cpu)
s32 pid;

bpf_repeat(BPF_MAX_LOOPS) {
/* We might run out of dispatch buffer slots if we continue dispatching
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use fully winged comment style to be consistent with other comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, will do.

* to the fallback DSQ, without dispatching to the local DSQ of the
* target CPU. In such a case, break the loop.
*/
if (!scx_bpf_dispatch_nr_slots())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we know that there are slots on entry to the function, would it be better to check nr_slots in the FALLBACK_DSQ block right before continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but I was thinking of a case where say you have 2 slots remaining, you pull out 2 tasks from the central_q which go to the fallback DSQ, and now you pull a 3rd which can go to the target CPU's local DSQ, but it will error out. So while it is true we have slots on entry it may not be true for the SCX_DSQ_LOCAL_ON case for later iterations.

Does that make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean, please ignore the above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... so, we have either:

while (true) {
  if (no slot)
    break;
  find task;
  if (cpu doesn't match) {
    dispatch(FALLBACK);
    continue;
  }
  dispatch(target_cpu);
  break;
}

or

while (true) {
  find task;
  if (cpu doesn't match) {
    dispatch(FALLBACK);
    if (more slots)
      continue;
    else
      break;
  }
  dispatch(target_cpu);
  break;
}

If we know that there are slots on entry, the behavior should be identical between the two, right? The only difference is that the former would do an extra check which will always indicate more slots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right, I was not parsing english properly. I've addressed both comments.

@kkdwivedi
Copy link
Contributor Author

I have addressed both comments. I was a bit confused about the second one, please ignore that.

For the case where many tasks being popped from the central queue cannot
be dispatched to the local DSQ of the target CPU, we will keep bouncing
them to the fallback DSQ and continue the dispatch_to_cpu loop until we
find one which can be dispatch to the local DSQ of the target CPU.

In a contrived case, it might be so that all tasks pin themselves to
CPUs != target CPU, and due to their affinity cannot be dispatched to
that CPU's local DSQ. If all of them are filling up the central queue,
then we will keep looping in the dispatch_to_cpu loop and eventually run
out of slots for the dispatch buffer. The nr_mismatched counter will
quickly rise and sched-ext will notice the error and unload the BPF
scheduler.

To remedy this, ensure that we break the dispatch_to_cpu loop when we
can no longer perform a dispatch operation. The outer loop in
central_dispatch for the central CPU should ensure the loop breaks when
we run out of these slots and schedule a self-IPI to the central core,
and allow sched-ext to consume the dispatch buffer before restarting the
dispatch loop again.

A basic way to reproduce this scenario is to do:
taskset -c 0 perf bench sched messaging

The error in the kernel will be:
sched_ext: BPF scheduler "central" errored, disabling
sched_ext: runtime error (dispatch buffer overflow)
bpf_prog_6a473147db3cec67_dispatch_to_cpu+0xc2/0x19a
bpf_prog_c9e51ba75372a829_central_dispatch+0x103/0x1a5

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
@htejun htejun merged commit fbb0164 into sched-ext:main Dec 12, 2023
@kkdwivedi kkdwivedi deleted the central-fix-nr-slots branch December 12, 2023 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants