Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scx_layered: Implement layer properties exclusive and min_exec_us #185

Merged
merged 9 commits into from
Mar 13, 2024

Conversation

htejun
Copy link
Contributor

@htejun htejun commented Mar 12, 2024

  • exclusive: A task in an exclusive layer occupies the whole core when it executes.
  • min_exec_us: Minimum execution time a task in a given layer is charged. This can be used to penalize tasks which keeps waking up and going immediately back to sleep without doing much.

A task in an exclusive grouped or open layer occupied a whole core - the
sibling CPU is kept idle.
which can be used to penalize tasks which wake up very frequently without
doing much.
… changes

Sometimes io_wait time goes in the wrong direction. Use saturating sub.
GSTAT_TASK_CTX_FREE_FAILED should report total while EXCL_* should report
delta pct. Fix them.
@htejun htejun requested a review from Byte-Lab March 12, 2024 23:00
Copy link
Contributor

@dschatzberg dschatzberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks reasonable to me, just some of the bpf-side logic is a little tricky to parse, see questions inline

@@ -399,7 +417,7 @@ s32 BPF_STRUCT_OPS(layered_select_cpu, struct task_struct *p, s32 prev_cpu, u64

/* not much to do if bound to a single CPU */
if (p->nr_cpus_allowed == 1) {
if (!bpf_cpumask_test_cpu(cpu, layer_cpumask))
if (!bpf_cpumask_test_cpu(prev_cpu, layer_cpumask))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of surprised verifier didn't choke on this previously - isn't this fixing a call with an uninitialized variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is fixing an uninitialized use bug, and yeah, no idea why BPF didn't bark on it. That said, clang was generating warnings on it. We just didn't see them because we only show clang messages when compilation fails. I don't know how but we really should fix that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make warnings fatal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally not a big fan of making warnings fatal because compilers sometimes aren't great and it makes build flaky for silly reasons like compiler generating spurious warnings in some releases but maybe clang is better in this regard? So, yeah, we can do that but being able to report warnings as warnings would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never had that experience myself (at least, when using -Wall; -Weverything is a different story), but I expect that's because I haven't been using bleeding-edge compiler releases like we are for this project. That said, IMO it will end up saving us more time / issues in the long term to treat warnings as failures. Consider that even if a warning is spurious that we'll probably want to address it so that the build output is clean. Otherwise, they might just start to look like noise, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcc more than once introduced warnings which triggered obviously spuriously which then got cleaned up in later point releases and -Werror can push issues which can just be nuisances to actual downstream breakages. It's not a big issue, just not ideal. So, if there's no good way to make the build process dump clang warnings, -Werror is better than the current situation.

if (sib_cctx && !sib_cctx->maybe_idle) {
lstat_inc(LSTAT_EXCL_PREEMPT, layer, cctx);
scx_bpf_kick_cpu(sib, SCX_KICK_PREEMPT);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole logic is a bit hard to parse. Some comments could help:

  1. sib_cctx is only set if layer->exclusive so this second condition is really the same as layer->exclusive && ...
  2. !sib_cctx->maybe_idle is still a bit hard for me to follow. I guess the idea here is to kick if there's any chance the sibling could be running something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yeah, so testing for sib_cctx is, ignoring the tests which cannot fail but have to be there for verifier, the same as testing exclusive && if the CPU has a sibling.
  2. If this task is exclusive, we want to make sure the sibling CPU is idle. so if the other CPU is not idle, I kick it w/ PREEMPT so that it expires the current task and enters the dispatch path, where it will see that I'm running an exclusive task and return without dispatching anything forcing it into idle.

Will add comments.

scheds/rust/scx_layered/src/bpf/main.bpf.c Show resolved Hide resolved

/* scale the execution time by the inverse of the weight and charge */
p->scx.dsq_vtime += used * 100 / p->scx.weight;
cctx->maybe_idle = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this maybe_idle? Just because it's racy to access from other cores?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not just racy but also inaccurate because it gets asserted between stopping() of the prev task and running() of the next task. It's a good enough but not great approximation.

@htejun htejun merged commit 6048992 into main Mar 13, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants