Fix SCHED_FLAG_KEEP_PARAMS use for sched_runtime#148
Conversation
SCHED_FLAG_KEEP_PARAMS is too coarse-grained for preserving only the sched_runtime value set by system managers since it will also prevent other parameters from being changed. The result is that the SCHED_OTHER nice value can now only be changed when sched_runtime != 0. Instead, when sched_runtime is not requested by the user, set it to the value read by sched_getattr. Signed-off-by: Petre Tudor <petre-ionut.tudor@arm.com>
| perror("sched_getattr: failed to get SCHED_OTHER attributes"); | ||
| exit(EXIT_FAILURE); | ||
| } | ||
| sa_params.sched_runtime = current_sa_params.sched_runtime; |
There was a problem hiding this comment.
You only update sched_runtime but what happen for sched_nice, sched_deadline ...
should we not always call sched_getattr() and then only apply the change
There was a problem hiding this comment.
Yeah, I guess this is the way to do this.
There was a problem hiding this comment.
I also agree, but won't it break a few scripts? At this point there probably is an expectation that when the user does not set a scheduling-related value, it gets a known default from rt-app.
Examples:
- user sets
SCHED_OTHERexpecting a default nice value of 0, but they get some other default. - user sets
SCHED_FIFOas the policy, but not priority, expecting to getSCHED_FIFOwith priority 10 (DEFAULT_THREAD_PRIORITY), but what they get is the system default priority from kernel or systemd. - user sets
SCHED_DEADLINEand a valid deadline triple, but the system sets a priority != 0 andsched_setattr()fails. Now they have to also set the priority to 0 on every deadline task
There was a problem hiding this comment.
I mean something like below.
`--- a/src/rt-app.c
+++ b/src/rt-app.c
@@ -969,6 +969,16 @@ static void __set_thread_sched_other_attrs(thread_data_t *data,
sched_data->runtime);
tid = gettid();
-
/* sched_runtime was not requested - keep the current value */ -
ret = sched_getattr(tid, &sa_params, -
sizeof(struct sched_attr), flags); -
if (ret) { -
log_critical("[%d] sched_getattr returned %d", -
data->ind, ret); -
errno = ret; -
perror("sched_getattr: failed to get SCHED_OTHER attributes"); -
exit(EXIT_FAILURE); -
} sa_params.size = sizeof(struct sched_attr); sa_params.sched_policy = sched_data->policy; sa_params.sched_priority = __sched_priority(data, sched_data);
@@ -982,8 +992,6 @@ static void __set_thread_sched_other_attrs(thread_data_t *data,
if(sched_data->runtime)
sa_params.sched_runtime = sched_data->runtime;
-
else -
sa_params.sched_flags = SCHED_FLAG_KEEP_PARAMS; ret = sched_setattr(tid, &sa_params, flags); if (ret) {
`
Beside this, we probably need to think about a way to describe that we don't want to make any change instead of applying default values. I think that adding the below and setting "default_policy": "SCHED_SAME" in the "global" node should be enough (not compiled, not tested)
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c
index 209d99176b3b..81d6ab1f228c 100644
--- a/src/rt-app_utils.c
+++ b/src/rt-app_utils.c
@@ -186,6 +186,8 @@ string_to_policy(const char *policy_name, policy_t *policy)
*policy = fifo;
else if (strcmp(policy_name, "SCHED_DEADLINE") == 0)
*policy = deadline;
-
else if (strcmp(policy_name, "SCHED_SAME") == 0) -
*policy = same; else return 1; return 0;
There was a problem hiding this comment.
I think this behaviour should affect the sched_attr we set regardless of which scheduling class the policy belongs to. So for v2 of this PR I propose:
- We introduce a new global parameter, called "defaults" which controls the source of default values and can take two values: "system" and "rt-app". By default it would be set to "rt-app" to preserve existing behaviour.
- Change
sched_setattr(or add a new function) to check for this flag and in the "system" case do what you showed in your comment above, only extended to all individualsched_attrfields rather than just priority/nice and runtime. In horrible pseudocode:
fn sched_setattr_2(..., sched_data_t *sched_data, ...)
{
if defaults == "system" {
sa_params = sched_getattr();
for each $field in struct sched_data_t {
if sched_data.$field is set {
// Need to find values which describe what it means to be "unset".
sa_params.$field = sched_data.$field;
}
}
} else {
// current behaviour: overwrite with union of rt-app defaults and user values
sa_params = sched_data;
}
return sched_setattr(..., sa_params, ...);
}
This shouldn't interfere with policy_t::same as it takes effect before set_thread_param starts applying scheduling parameters.
Does this make sense?
|
Do you now want to fix more than this nice or runtime overwrite in case sched_data doesn't have the appropriate value? I thought a fix like this will cure the issue just fine? 0001-rt-app.c-Allow-to-set-only-nice-or-runtime-value-for.patch Hope you can see the patch? The source code directly in the comments is rendered awfully. |
|
That's what I was proposing. IMO this behaviour should not be specific to fair policies, but extend to RT, deadline and idle as well. This plus a flag to make it more explicit means the user always knows what to expect from these values and we shouldn't break anything. With your fix, if the user explicitly sets priority to 0 they get the system value for nice. Same with the runtime value. Also, since rt-app's default values for nice and runtime are also both 0, it effectively eliminates rt-app defaults from the equation, so now you can only set nice and runtime to 0 if those are the system defaults. I would say this is not what users expect and it's also inconsistent with how the other set_thread_ functions behave. |
|
I should mention my patch also has the same problems, but only affecting the runtime value. Hence why I proposed the approach above. |
|
Right, 'priority == 0' doesn't work as unspecified. So then lets use THREAD_PRIORITY_UNCHANGED and enable it to be routed into __set_thread_sched_other_attrs(). I guess runtime == 0 is fine since it should only be allowed to be set between 0.5 and 100ms. Can you share you patch as a file here or as a patch I can fetch for testing? I attach my new version here (only Fair tasks): IMHO, you can't set nice or slice via sched_setattr for SCHED_IDLE: __sched_setscheduler() fair_policy() only returns true for SCHED_NORMAL (SCHED_EXT) & SCHED_BATCH. 0001-rt-app.c-Allow-to-set-only-nice-or-runtime-value-for.patch |
|
If I understand correctly, with your patch the user should set the prio to If
I was refering to the v1 patch I submitted with this PR, sorry for the confusion. |
|
Can't you see the patch I attached to my last comment? parse_sched_data() sets 'prior_def = THREAD_PRIORITY_UNCHANGED' in case 'policy == same'. My patch just makes sure we route this into__set_thread_sched_other_attrs(). |
Would be easier if you sent your proposal either in this PR or create a new one so we can comment directly. |
|
Yeah, you're right, let me do this. |
SCHED_FLAG_KEEP_PARAMS is too coarse-grained for preserving only the sched_runtime value set by system managers since it will also prevent other parameters from being changed. The result is that the SCHED_OTHER nice value can now only be changed when sched_runtime != 0.
Instead, when sched_runtime is not requested by the user, set it to the value read by sched_getattr.
As @deggeman pointed out on the commit, this is a very narrow fix. AFAIK systemd can also set
sched_policy,sched_nice,sched_priority, theRESET_ON_FORKfield insched_flagsand uclamp values with cgroup v2. And this is not accounting for other things systemd can control like affinity, NUMA etc.Maybe we should revisit the strategy for how we decide default values for scheduling parameters? Should everything behave like "if not set by the user, set it to an inherited value"?