scx_rustland: per-task cpumask generation counter #133

Merged — 1 commit merged into main from rustland-cpumask-gen-cnt on Feb 10, 2024

Conversation

arighi (Collaborator) commented on Feb 10, 2024

Introduce a per-task generation counter to check the validity of the cpumask at dispatch time.

The logic is the following:

  • the cpumask generation number is incremented every time a task calls .set_cpumask()

  • when a task is enqueued, the current generation number is stored in the queued_task_ctx and relayed to the user-space scheduler

  • the user-space scheduler can decide to dispatch the task on the CPU determined by the BPF layer in .select_cpu(), redirect the task to any other specific CPU, or redirect it to the first available CPU (using NO_CPU)

  • the task is then dispatched back to the BPF code along with its cpumask generation counter

  • at dispatch time, the BPF code checks whether the generation number is still the same and discards the dispatch attempt if the cpumask is no longer valid (the task will be automatically re-enqueued by the sched-ext core code, potentially selecting another CPU / cpumask)

  • if the cpumask is valid but the CPU selected by the user-space scheduler is not (according to the cpumask), the task is transparently bounced by the BPF code to the shared DSQ (this way the user-space code can remain completely abstracted, and dispatches that target invalid CPUs are automatically fixed by the BPF layer)

This solution can prevent stalls due to dispatches targeting invalid CPUs, and it also avoids redundant dispatch events, making the code more efficient and the cpumask interlocking more reliable.
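The flow above can be sketched as a minimal user-land model of the interlocking (hypothetical struct and function names; the real implementation lives in the scx_rustland BPF code and uses its own task-context maps and dispatch helpers):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the per-task generation counter. */

struct task_ctx {
    uint64_t cpumask_cnt;   /* bumped on every .set_cpumask() */
};

struct queued_task {
    uint64_t cpumask_cnt;   /* snapshot taken at enqueue time */
    int      cpu;           /* CPU chosen by the user-space scheduler */
};

/* .set_cpumask(): invalidate any in-flight dispatch for this task. */
static void task_set_cpumask(struct task_ctx *tctx)
{
    tctx->cpumask_cnt++;    /* plain increment: callers are serialized */
}

/* .enqueue(): snapshot the counter to relay it to user space. */
static void task_enqueue(const struct task_ctx *tctx, struct queued_task *qt)
{
    qt->cpumask_cnt = tctx->cpumask_cnt;
}

/* Dispatch-time check: true if the snapshot still matches, i.e. the
 * cpumask has not changed while the task was in user space; otherwise
 * the dispatch is discarded and the task is re-enqueued by the core. */
static bool dispatch_is_valid(const struct task_ctx *tctx,
                              const struct queued_task *qt)
{
    return qt->cpumask_cnt == tctx->cpumask_cnt;
}
```

A dispatch that races with a .set_cpumask() call sees a stale snapshot and fails the check, which is exactly the stall-prevention property described above.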

htejun (Contributor) left a comment

LGTM. Left a minor comment. I suppose the next step is making the user-space CPU selection code consider the allowed cpumask?

    		return;

    	/* Read current cpumask generation counter */
    	curr_cpumask_cnt = __sync_fetch_and_add(&tctx->cpumask_cnt, 0);
htejun (Contributor):
nit: I don't think the __sync op is necessary here; curr_cpumask_cnt = tctx->cpumask_cnt should be sufficient. All the involved operations are already fully synchronized.
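The equivalence behind this nit can be illustrated with a toy example (hypothetical helper names): __sync_fetch_and_add(p, 0) is an atomic read-modify-write that returns the old value without changing it, so when all accesses to the counter are already serialized, a plain load reads the same value with less overhead:

```c
#include <stdint.h>

/* Atomic read via the GCC __sync builtin: fetch the old value, add 0. */
static uint64_t read_atomic(uint64_t *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* Plain load: equivalent whenever accesses are already serialized, as
 * is the case for the per-task counter in this PR. */
static uint64_t read_plain(const uint64_t *p)
{
    return *p;
}
```

Neither read modifies the counter; the atomic form only matters when concurrent, unsynchronized writers exist.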

    	tctx = lookup_task_ctx(p);
    	if (!tctx)
    		return;
    	task->cpumask_cnt = __sync_fetch_and_add(&tctx->cpumask_cnt, 0);
htejun (Contributor):
Ditto.

    	tctx = lookup_task_ctx(p);
    	if (!tctx)
    		return;
    	__sync_fetch_and_add(&tctx->cpumask_cnt, 1);
htejun (Contributor):
And this can just be tctx->cpumask_cnt++.

scx_rustland: per-task cpumask generation counter

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
arighi (Collaborator, Author) commented on Feb 10, 2024

> LGTM. Left a minor comment. I suppose the next step is making the user-space CPU selection code consider the allowed cpumask?

Updated (you're right, we don't need __sync_fetch*(); the cpumask counter is per-task and it's all synchronized already).

And yes, the next step will be to expose the cpumask to user-space so that the scheduler can make better decisions (and at that point we could even make the auto-bounce to the shared DSQ optional, via a flag or similar).

Thanks for the review!

@arighi arighi merged commit 7ce0d03 into main Feb 10, 2024
1 check passed
@Byte-Lab Byte-Lab deleted the rustland-cpumask-gen-cnt branch March 14, 2024 18:20