scx_layered: Implement layer properties exclusive and min_exec_us #185
Conversation
Actual implementation isn't done yet.
exclusive: A task in an exclusive grouped or open layer occupies a whole core - the sibling CPU is kept idle.
min_exec_us: Minimum execution time a task in the layer is charged, which can be used to penalize tasks which wake up very frequently without doing much.
- Sometimes io_wait time goes in the wrong direction. Use saturating sub.
- GSTAT_TASK_CTX_FREE_FAILED should report total while EXCL_* should report delta pct. Fix them.
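The io_wait fix in the first commit can be illustrated with a plain-C saturating subtraction (this `saturating_sub` helper is a sketch, not the PR's actual code):

```c
#include <stdint.h>

/* Illustrative helper for the io_wait fix: when a sampled counter
 * occasionally appears to move backwards (e.g. due to racy reads), an
 * unsigned subtraction would wrap around to a huge value, so clamp the
 * delta at zero instead. */
static uint64_t saturating_sub(uint64_t cur, uint64_t prev)
{
	return cur >= prev ? cur - prev : 0;
}
```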
This all looks reasonable to me; just some of the BPF-side logic is a little tricky to parse. See questions inline.
```diff
@@ -399,7 +417,7 @@ s32 BPF_STRUCT_OPS(layered_select_cpu, struct task_struct *p, s32 prev_cpu, u64
 	/* not much to do if bound to a single CPU */
 	if (p->nr_cpus_allowed == 1) {
-		if (!bpf_cpumask_test_cpu(cpu, layer_cpumask))
+		if (!bpf_cpumask_test_cpu(prev_cpu, layer_cpumask))
```
I'm kind of surprised verifier didn't choke on this previously - isn't this fixing a call with an uninitialized variable?
Yes, it is fixing an uninitialized use bug, and yeah, no idea why BPF didn't bark on it. That said, clang was generating warnings on it. We just didn't see them because we only show clang messages when compilation fails. I don't know how but we really should fix that.
Could we make warnings fatal?
I'm generally not a big fan of making warnings fatal, because compilers sometimes aren't great and it makes the build flaky for silly reasons, like a compiler generating spurious warnings in some releases. But maybe clang is better in this regard? So, yeah, we can do that, but being able to report warnings as warnings would be better.
I've never had that experience myself (at least when using -Wall; -Weverything is a different story), but I expect that's because I haven't been using bleeding-edge compiler releases like we are for this project. That said, IMO it will end up saving us more time / issues in the long term to treat warnings as failures. Consider that even if a warning is spurious, we'll probably want to address it so that the build output is clean. Otherwise, they might just start to look like noise, no?
gcc more than once introduced warnings which triggered obviously spuriously and then got cleaned up in later point releases, and -Werror can turn issues which would otherwise just be nuisances into actual downstream breakages. It's not a big issue, just not ideal. So, if there's no good way to make the build process dump clang warnings, -Werror is better than the current situation.
```c
	if (sib_cctx && !sib_cctx->maybe_idle) {
		lstat_inc(LSTAT_EXCL_PREEMPT, layer, cctx);
		scx_bpf_kick_cpu(sib, SCX_KICK_PREEMPT);
	}
```
This whole logic is a bit hard to parse. Some comments could help:

- `sib_cctx` is only set if `layer->exclusive`, so this second condition is really the same as `layer->exclusive && ...`
- `!sib_cctx->maybe_idle` is still a bit hard for me to follow. I guess the idea here is to kick if there's any chance the sibling could be running something.
- Yeah, so testing for `sib_cctx` is, ignoring the tests which cannot fail but have to be there for the verifier, the same as testing `exclusive && the CPU has a sibling`.
- If this task is exclusive, we want to make sure the sibling CPU is idle. So if the other CPU is not idle, I kick it w/ PREEMPT so that it expires the current task and enters the dispatch path, where it will see that I'm running an exclusive task and return without dispatching anything, forcing it into idle.
Will add comments.
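The decision described above can be restated as a rough plain-C sketch (the `cpu_ctx` struct and `should_kick_sibling` helper here are hypothetical stand-ins, not the scheduler's actual code):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of the sibling-kick decision discussed above:
 * sib_cctx is non-NULL only when the layer is exclusive and the CPU
 * actually has an SMT sibling, so testing it stands in for
 * "exclusive && has a sibling". If the sibling might be running
 * something (!maybe_idle), it gets kicked with PREEMPT so that it
 * re-enters the dispatch path, sees the exclusive task, and goes idle. */
struct cpu_ctx {
	bool maybe_idle;	/* set when this CPU may have gone idle */
};

static bool should_kick_sibling(const struct cpu_ctx *sib_cctx)
{
	return sib_cctx && !sib_cctx->maybe_idle;
}
```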
```c
	/* scale the execution time by the inverse of the weight and charge */
	p->scx.dsq_vtime += used * 100 / p->scx.weight;
	cctx->maybe_idle = true;
```
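The inverse-weight scaling above can be checked in isolation; this standalone helper is illustrative only (100 is the default task weight in sched_ext):

```c
#include <stdint.h>

/* Illustrative standalone version of the vtime charge above: execution
 * time is scaled by the inverse of the task's weight (100 being the
 * default), so a heavier task accrues vtime more slowly and therefore
 * receives a proportionally larger share of CPU. */
static uint64_t vtime_charge(uint64_t used_ns, uint64_t weight)
{
	return used_ns * 100 / weight;
}
```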
Why is this maybe_idle? Just because it's racy to access from other cores?
Yeah, it's not just racy but also inaccurate, because it gets asserted between `stopping()` of the prev task and `running()` of the next task. It's a good-enough but not great approximation.
- `exclusive`: A task in an exclusive layer occupies the whole core when it executes.
- `min_exec_us`: Minimum execution time a task in a given layer is charged. This can be used to penalize tasks which keep waking up and going immediately back to sleep without doing much.
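A minimal sketch of how the `min_exec_us` charge described above could work (the `charge_exec_ns` helper name is hypothetical, not taken from the PR):

```c
#include <stdint.h>

/* Hypothetical sketch of min_exec_us: a task that ran for less than
 * the layer's configured minimum is charged the minimum anyway, which
 * penalizes tasks that wake up very frequently without doing much
 * work before going back to sleep. */
static uint64_t charge_exec_ns(uint64_t used_ns, uint64_t min_exec_us)
{
	uint64_t min_ns = min_exec_us * 1000;	/* us -> ns */

	return used_ns < min_ns ? min_ns : used_ns;
}
```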