scx_layered: Implement layer properties exclusive and min_exec_us #185
Conversation
Actual implementation isn't done yet.
exclusive: A task in an exclusive grouped or open layer occupies a whole core - the sibling CPU is kept idle.
min_exec_us: Minimum execution time a task in the layer is charged, which can be used to penalize tasks which wake up very frequently without doing much.
- Sometimes io_wait time goes in the wrong direction. Use saturating sub.
- GSTAT_TASK_CTX_FREE_FAILED should report total while EXCL_* should report delta pct. Fix them.
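The io_wait fix in the first commit can be illustrated with a plain-C saturating subtraction (this `saturating_sub` helper is a sketch, not the PR's actual code):

```c
#include <stdint.h>

/* Illustrative helper for the io_wait fix: when a sampled counter
 * occasionally appears to move backwards (e.g. due to racy reads), an
 * unsigned subtraction would wrap around to a huge value, so clamp the
 * delta at zero instead. */
static uint64_t saturating_sub(uint64_t cur, uint64_t prev)
{
	return cur >= prev ? cur - prev : 0;
}
```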
This all looks reasonable to me; just some of the BPF-side logic is a little tricky to parse. See questions inline.
```diff
@@ -399,7 +417,7 @@ s32 BPF_STRUCT_OPS(layered_select_cpu, struct task_struct *p, s32 prev_cpu, u64
 	/* not much to do if bound to a single CPU */
 	if (p->nr_cpus_allowed == 1) {
-		if (!bpf_cpumask_test_cpu(cpu, layer_cpumask))
+		if (!bpf_cpumask_test_cpu(prev_cpu, layer_cpumask))
```
I'm kind of surprised verifier didn't choke on this previously - isn't this fixing a call with an uninitialized variable?
Yes, it is fixing an uninitialized use bug, and yeah, no idea why BPF didn't bark on it. That said, clang was generating warnings on it. We just didn't see them because we only show clang messages when compilation fails. I don't know how but we really should fix that.
Could we make warnings fatal?
I'm generally not a big fan of making warnings fatal, because compilers sometimes aren't great and it makes the build flaky for silly reasons, like a compiler generating spurious warnings in some releases. But maybe clang is better in this regard? So, yeah, we can do that, but being able to report warnings as warnings would be better.
I've never had that experience myself (at least when using -Wall; -Weverything is a different story), but I expect that's because I haven't been using bleeding-edge compiler releases like we are for this project. That said, IMO it will end up saving us more time / issues in the long term to treat warnings as failures. Consider that even if a warning is spurious, we'll probably want to address it so that the build output is clean. Otherwise, they might just start to look like noise, no?
gcc more than once introduced warnings which triggered obviously spuriously and then got cleaned up in later point releases, and -Werror can turn issues which would otherwise just be nuisances into actual downstream breakages. It's not a big issue, just not ideal. So, if there's no good way to make the build process dump clang warnings, -Werror is better than the current situation.
```c
	if (sib_cctx && !sib_cctx->maybe_idle) {
		lstat_inc(LSTAT_EXCL_PREEMPT, layer, cctx);
		scx_bpf_kick_cpu(sib, SCX_KICK_PREEMPT);
	}
```
This whole logic is a bit hard to parse. Some comments could help:

- `sib_cctx` is only set if `layer->exclusive`, so this second condition is really the same as `layer->exclusive && ...`
- `!sib_cctx->maybe_idle` is still a bit hard for me to follow. I guess the idea here is to kick if there's any chance the sibling could be running something.
- Yeah, so testing for `sib_cctx` is, ignoring the tests which cannot fail but have to be there for the verifier, the same as testing `exclusive && the CPU has a sibling`.
- If this task is exclusive, we want to make sure the sibling CPU is idle. So if the other CPU is not idle, I kick it w/ PREEMPT so that it expires the current task and enters the dispatch path, where it will see that I'm running an exclusive task and return without dispatching anything, forcing it into idle.
Will add comments.
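The decision described above can be restated as a rough plain-C sketch (the `cpu_ctx` struct and `should_kick_sibling` helper here are hypothetical stand-ins, not the scheduler's actual code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the sibling-kick decision discussed above:
 * sib_cctx is non-NULL only when the layer is exclusive and the CPU
 * actually has an SMT sibling, so testing it stands in for
 * "exclusive && has a sibling". If the sibling might be running
 * something (!maybe_idle), it gets kicked with PREEMPT so that it
 * re-enters the dispatch path, sees the exclusive task, and goes idle. */
struct cpu_ctx {
	bool maybe_idle;	/* set when this CPU may have gone idle */
};

static bool should_kick_sibling(const struct cpu_ctx *sib_cctx)
{
	return sib_cctx && !sib_cctx->maybe_idle;
}
```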
```c
	/* scale the execution time by the inverse of the weight and charge */
	p->scx.dsq_vtime += used * 100 / p->scx.weight;
	cctx->maybe_idle = true;
```
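The inverse-weight scaling above can be checked in isolation; this standalone helper is illustrative only (100 is the default task weight in sched_ext):

```c
#include <stdint.h>

/* Illustrative standalone version of the vtime charge above: execution
 * time is scaled by the inverse of the task's weight (100 being the
 * default), so a heavier task accrues vtime more slowly and therefore
 * receives a proportionally larger share of CPU. */
static uint64_t vtime_charge(uint64_t used_ns, uint64_t weight)
{
	return used_ns * 100 / weight;
}
```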
Why is this maybe_idle? Just because it's racy to access from other cores?
Yeah, it's not just racy but also inaccurate, because it gets asserted between `stopping()` of the prev task and `running()` of the next task. It's a good-enough but not great approximation.
- `exclusive`: A task in an exclusive layer occupies the whole core when it executes.
- `min_exec_us`: Minimum execution time a task in a given layer is charged. This can be used to penalize tasks which keep waking up and going immediately back to sleep without doing much.
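A minimal sketch of how the `min_exec_us` charge described above could work (the `charge_exec_ns` helper name is hypothetical, not taken from the PR):

```c
#include <stdint.h>

/* Hypothetical sketch of min_exec_us: a task that ran for less than
 * the layer's configured minimum is charged the minimum anyway, which
 * penalizes tasks that wake up very frequently without doing much
 * work before going back to sleep. */
static uint64_t charge_exec_ns(uint64_t used_ns, uint64_t min_exec_us)
{
	uint64_t min_ns = min_exec_us * 1000;	/* us -> ns */

	return used_ns < min_ns ? min_ns : used_ns;
}
```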