scx_rustland: per-task cpumask generation counter #133

Merged — 1 commit merged into main from rustland-cpumask-gen-cnt on Feb 10, 2024

Conversation

arighi (Collaborator) commented on Feb 10, 2024

Introduce a per-task generation counter to check the validity of the cpumask at dispatch time.

The logic is the following:

  • the cpumask generation number is incremented every time a task calls .set_cpumask()

  • when a task is enqueued, the current generation number is stored in the queued_task_ctx and relayed to the user-space scheduler

  • the user-space scheduler can decide to dispatch the task on the CPU determined by the BPF layer in .select_cpu(), redirect the task to any other specific CPU, or redirect it to the first available CPU (using NO_CPU)

  • the task is then dispatched back to the BPF code along with its cpumask generation counter

  • at dispatch time, the BPF code checks whether the generation number is still the same and discards the dispatch attempt if the cpumask is no longer valid (the task will be automatically re-enqueued by the sched-ext core code, potentially selecting another CPU / cpumask)

  • if the cpumask is valid but the CPU selected by the user-space scheduler is not (according to the cpumask), the task is transparently bounced by the BPF code to the shared DSQ (this way the user-space code can remain completely abstracted, and dispatches that target invalid CPUs are automatically fixed by the BPF layer)

This solution can prevent stalls due to dispatches targeting invalid CPUs, and it also avoids redundant dispatch events, making the code more efficient and the cpumask interlocking more reliable.
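The flow above can be sketched as a minimal user-land model of the interlocking (hypothetical struct and function names; the real implementation lives in the scx_rustland BPF code and uses its own task-context maps and dispatch helpers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the per-task generation counter. */

struct task_ctx {
    uint64_t cpumask_cnt;   /* bumped on every .set_cpumask() */
};

struct queued_task {
    uint64_t cpumask_cnt;   /* snapshot taken at enqueue time */
    int      cpu;           /* CPU chosen by the user-space scheduler */
};

/* .set_cpumask(): invalidate any in-flight dispatch for this task. */
static void task_set_cpumask(struct task_ctx *tctx)
{
    tctx->cpumask_cnt++;    /* plain increment: callers are serialized */
}

/* .enqueue(): snapshot the counter to relay it to user space. */
static void task_enqueue(const struct task_ctx *tctx, struct queued_task *qt)
{
    qt->cpumask_cnt = tctx->cpumask_cnt;
}

/* Dispatch-time check: true if the snapshot still matches, i.e. the
 * cpumask has not changed while the task was in user space; otherwise
 * the dispatch is discarded and the task is re-enqueued by the core. */
static bool dispatch_is_valid(const struct task_ctx *tctx,
                              const struct queued_task *qt)
{
    return qt->cpumask_cnt == tctx->cpumask_cnt;
}
```

A dispatch that races with a .set_cpumask() call sees a stale snapshot and fails the check, which is exactly the stall-prevention property described above.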

htejun (Contributor) left a comment

LGTM. Left a minor comment. I suppose the next step is making the user-space CPU selection code consider the allowed cpumask?

    		return;

    	/* Read current cpumask generation counter */
    	curr_cpumask_cnt = __sync_fetch_and_add(&tctx->cpumask_cnt, 0);
htejun (Contributor):
nit: I don't think the __sync op is necessary here; curr_cpumask_cnt = tctx->cpumask_cnt should be sufficient. All the involved operations are already fully synchronized.
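The equivalence behind this nit can be illustrated with a toy example (hypothetical helper names): __sync_fetch_and_add(p, 0) is an atomic read-modify-write that returns the old value without changing it, so when all accesses to the counter are already serialized, a plain load reads the same value with less overhead:

```c
#include <stdint.h>

/* Atomic read via the GCC __sync builtin: fetch the old value, add 0. */
static uint64_t read_atomic(uint64_t *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Plain load: equivalent whenever accesses are already serialized, as
 * is the case for the per-task counter in this PR. */
static uint64_t read_plain(const uint64_t *p)
{
    return *p;
}
```

Neither read modifies the counter; the atomic form only matters when concurrent, unsynchronized writers exist.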

    	tctx = lookup_task_ctx(p);
    	if (!tctx)
    		return;
    	task->cpumask_cnt = __sync_fetch_and_add(&tctx->cpumask_cnt, 0);
htejun (Contributor):
Ditto.

    	tctx = lookup_task_ctx(p);
    	if (!tctx)
    		return;
    	__sync_fetch_and_add(&tctx->cpumask_cnt, 1);
htejun (Contributor):
And this can just be tctx->cpumask_cnt++.

scx_rustland: per-task cpumask generation counter

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
arighi (Collaborator, Author) commented on Feb 10, 2024

> LGTM. Left a minor comment. I suppose the next step is making the user-space CPU selection code consider the allowed cpumask?

Updated (you're right, we don't need __sync_fetch*(); the cpumask counter is per-task and it's all synchronized already).

And yes, the next step will be to expose the cpumask to user-space so that the scheduler can make better decisions (and at that point we could even make the auto-bounce to the shared DSQ optional, via a flag or similar).

Thanks for the review!

@arighi arighi merged commit 7ce0d03 into main Feb 10, 2024
1 check passed
@Byte-Lab Byte-Lab deleted the rustland-cpumask-gen-cnt branch March 14, 2024 18:20