scx_userland: allocate tasks array based on kernel.pid_max #29
Conversation
LGTM, thanks for the fix, Andrea. Just left one comment, but otherwise it's good to go.
/*
 * Maximum amount of tasks enqueued/dispatched between kernel and user-space.
 */
#define USERLAND_MAX_TASKS 8192
Is this macro still used? If not, let's just delete it.
It's still used in .bpf.c. There, it's just the max size of the ring buffer, and the overflow handling seems okay. Maybe it should be renamed for clarity?
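For context, this is roughly how the macro bounds the kernel/user-space queues in scx_userland.bpf.c. This is a hedged sketch, not the exact source: the map name `enqueued` and the value type are assumptions based on this thread, and the fragment needs the usual BPF build setup (vmlinux.h, bpf_helpers.h) to compile.

```c
/* Sketch only: map name and value type are assumed from the discussion. */
#define USERLAND_MAX_TASKS 8192

/* FIFO queue used to hand enqueued tasks to the user-space scheduler;
 * USERLAND_MAX_TASKS caps how many can be in flight at once. */
struct {
	__uint(type, BPF_MAP_TYPE_QUEUE);
	__uint(max_entries, USERLAND_MAX_TASKS);
	__type(value, struct scx_userland_enqueued_task);
} enqueued SEC(".maps");
```

If a push into this queue fails because it is full, the BPF side can fall back to dispatching the task directly, which is the overflow handling mentioned above.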
Yep, as @htejun says, it's still used in .bpf.c as the max amount of tasks enqueued/dispatched between kernel and user space; that's why I added the comment, to better clarify its usage. But I think it would be nice to also rename it at this point...
Gah, right, my bad. Yeah, maybe something like MAX_ENQUEUED_TASKS? And we can probably just bring it into scx_userland.bpf.c directly. It's also an option to set the max entries value from user space before the map is loaded, but keeping the macro seems preferable given the fallback logic, as you pointed out.
It's quite an internal parameter: basically it defines how many items can be pushed at most to the enqueued/dispatched maps, so I don't think there's much value in exporting it to user space as a configurable option. But I think we can definitely bring it into .bpf.c, since it's only used there at this point. And I like the name MAX_ENQUEUED_TASKS; it's definitely clearer.
Oh yeah, I meant that you could set the max entries in the map to the value of /proc/sys/kernel/pid_max (so not exposed to the end user, just set from user space). But yeah, I think bringing the macro into the bpf file and statically setting it there is fine / preferable.
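The alternative being discussed could look roughly like this with libbpf, which allows resizing a map after the skeleton is opened but before it is loaded. This is only a sketch: the skeleton type `struct scx_userland` and the map name `enqueued` are assumptions, and the scheduler ultimately kept the static macro instead.

```c
#include <bpf/libbpf.h>

/* Hypothetical helper: resize the (assumed) "enqueued" map to nr_entries.
 * Must run after scx_userland__open() but before scx_userland__load(),
 * since max_entries is fixed once the map is created in the kernel. */
static int resize_enqueued_map(struct scx_userland *skel, __u32 nr_entries)
{
	return bpf_map__set_max_entries(skel->maps.enqueued, nr_entries);
}
```

`bpf_map__set_max_entries()` is a real libbpf API; everything else here is illustrative.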
... and updated.
Setting pid_max entries in the bpf maps would be a bit of a waste IMHO; there's no way a system will have pid_max concurrent tasks running. :)
And if we fail to enqueue a task in the bpf part, the task will just be dispatched immediately. OTOH, if we fail to push the task to the dispatched map from the .c side, we error out and exit. Maybe this part can be improved a bit, i.e., by re-adding the pid to the vruntime_head list?
Agreed, sounds like another nice improvement!
Currently the array of enqueued tasks is statically allocated to a fixed size of USERLAND_MAX_TASKS to avoid potential deadlocks that could be introduced by performing dynamic allocations in the enqueue path. However, this also adds a limit on the maximum pid that the scheduler can handle, since the pid is used as the index to access the array.

In fact, it is quite easy to trigger the following failure on an average desktop system (making the scheduler pretty much unusable in such a scenario):

$ sudo scx_userland
...
Failed to enqueue task 33258: No such file or directory
EXIT: BPF scheduler unregistered

Prevent this by using sysctl's kernel.pid_max as the size of the tasks array (and still allocate it all at once during initialization).

The downside of this change is that scx_userland may require additional memory to start, and on small systems it could even trigger OOMs. For this reason, add an explicit message to the command help, suggesting to reduce kernel.pid_max in case of OOM conditions.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Force-pushed from ea1c182 to 1e9e677.
Looks great, thanks a lot @arighi