
run plugins and compression with I/O idle class #1844

Closed
aieri opened this issue Nov 1, 2019 · 6 comments

@aieri (Contributor) commented Nov 1, 2019

In at least versions 3.6-1ubuntu0.16.04.3 and 3.6-1ubuntu0.18.04.3, neither the sosreport process itself nor (more importantly) the compression binary is run with an explicit I/O scheduling class.
Both therefore fall into the default best-effort class, which can cause serious iowait on hosts with limited I/O capability:

$ ionice -p `pgrep xz`
none: prio 0
$ ionice -p `pgrep sosreport`
none: prio 0

I propose that any command sosreport runs be reniced to the idle I/O class, or at least to a low-priority best-effort level.
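As a stopgap until sos does this itself, the idle class can be applied to the already-running processes (assuming util-linux's ionice, as in the output above):

$ sudo ionice -c 3 -p `pgrep xz`
$ sudo ionice -c 3 -p `pgrep sosreport`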

@bmr-cymru (Member) commented
We generally leave things like this to the control of the administrator - they are in the best position to know what's appropriate for their system (and how urgent the data collection is). There's no reason users can't control this today with standard tools: like CPU priority and scheduling class, the ionice settings of the commands sos runs are inherited from the parent process.

We can add defaults and convenience settings for things like this if there is broad consensus, but that's difficult, since different distros and users may want to control things in different ways (e.g. ionice vs. running sos in a resource-constrained cgroup).
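For illustration, both approaches are possible today with stock tools (the limit values are arbitrary examples, not recommendations):

# Priorities are inherited by everything sos spawns:
$ sudo ionice -c 3 nice -n 19 sosreport

# Or a resource-constrained transient cgroup via systemd-run
# (IOWeight requires the cgroup v2 io controller):
$ sudo systemd-run --scope -p CPUQuota=25% -p MemoryMax=2G -p IOWeight=10 sosreport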

@TurboTurtle (Member) commented

I think policy defined settings would be a good approach here.

We could leave the default behavior the same as it is today - i.e. no restrictions or limitations on resource usage - and add a --low-priority (or similarly named) flag that causes Policy initialization to apply some distro-specific configuration that restricts resources, done entirely "behind the scenes".

In this scenario, users wouldn't have the ability to modify directly how sos restricts itself. You either run unrestricted, or run in a policy-defined way (e.g. RedHatPolicy could assign sos to a cgroup profile). That way policies can provide a pre-packaged, recommended and (hopefully) tested way to reduce a sos run's impact on a live system, without forcing the restriction outright.

If you as a user don't like how the policy does it, you still have the option of doing this configuration yourself "outside" of sos (just like today).
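A minimal sketch of what such a policy hook might look like - the apply_low_priority() method is hypothetical, not sos's actual Policy API:

# Hypothetical sketch only - not sos's actual Policy API.
import os
import subprocess

class RedHatPolicy:
    def apply_low_priority(self):
        """Distro-specific 'behind the scenes' restriction: drop CPU and
        I/O priority for the sos process. Plugins and the compression
        binary inherit both settings as child processes."""
        os.nice(19)  # lower CPU priority as far as permitted
        # Idle I/O class via util-linux's ionice, avoiding a psutil dependency:
        subprocess.run(["ionice", "-c", "3", "-p", str(os.getpid())], check=True)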

@TurboTurtle TurboTurtle added this to the 4.0 milestone Jan 28, 2020
@TurboTurtle TurboTurtle modified the milestones: 4.0, 4.1 Aug 17, 2020
@TurboTurtle (Member) commented

Cycling around on this, I'm not sure how realistic this is for us to implement in any meaningful way.

  • For cgroups control, we would need modules that, as far as I know, are not packaged outside of PyPI - and beyond that, sos is understandably hesitant to take runtime dependencies on modules outside the Python standard library. For collect we even go through the process of not requiring them for a base installation, but detecting missing dependencies at runtime. Policy-specific dependencies are not something I want to get the project involved with.
  • For ionice, this would likewise require a non-standard module, psutil. Same story here: we don't want to expand the dependency footprint of a "help! my system is broken!" diagnostic tool.
  • There is the resource module in the standard library, but it only limits total CPU time, not per-core usage or point-in-time utilization. E.g. we can say "sos can't run for more than X CPU-seconds" but we can't say "sos may only use one core at Y percent of total CPU time on that core".
    • resource can also be used for memory limits, but I don't think setting hard memory limits on ourselves really works out that well: we start generating MemoryError exceptions, which then have to be handled as well. (Both limits are sketched after this list.)
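To make the gap concrete, here is what resource can and cannot express (the limit values are arbitrary examples):

import resource

# Cap cumulative CPU seconds: the kernel sends SIGXCPU at the soft
# limit and SIGKILL at the hard limit. This says "no more than X
# CPU-seconds total", not "at most Y percent of one core".
resource.setrlimit(resource.RLIMIT_CPU, (600, 660))

# Cap the address space: allocations beyond this fail, surfacing in
# Python as MemoryError exceptions that would then need handling
# throughout the codebase.
resource.setrlimit(resource.RLIMIT_AS, (2 * 1024**3, 2 * 1024**3))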

There may be some merit in a --low-priority option that toggles a small set of tunables such as --threads, --log-size, and a resource-based memory limit, but it feels lacking if we can't also reliably place a CPU limit on ourselves (to say nothing of restricting disk utilization).
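(Roughly what such an option might toggle, expressed with today's flags - the bundling into one profile, not the flags themselves, is the hypothetical part:)

$ sos report --threads 1 --log-size 10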

I'm leaning towards closing this out due to being out of scope for the sos project to implement.

@TurboTurtle (Member) commented

I'm going to close this out at this point. The posts above explain the technical challenges in doing something like this for sos, even in a policy-controlled fashion. Beyond that, the typical expectation is that sysadmins control aspects like niceness when non-default behavior is desired.

If a method becomes available to control this easily sos-wide, without either distribution-specific dependencies or wrapping collections in more layers (e.g. running behind timeout) than we do today, I'm all for it.

@portante (Contributor) commented

Why not provide an option for a containerized version of sos - run from within a container with appropriate privileges, but resource-constrained by podman's own command-line limits?

@TurboTurtle (Member) commented

It's of course an option for end users to run sos in a container. There's even registry.redhat.io/rhel8/support-tools, which is purpose-built for it (podman container runlabel RUN support-tools). But resource constraints are still up to the end user, not something imposed by sos directly.
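For example, constraints can be attached at container-run time - the privileges and limits shown here are illustrative, and nothing about them comes from sos itself:

$ podman run --rm -it --privileged --net=host --pid=host \
    --cpus 1 --memory 2g \
    registry.redhat.io/rhel8/support-tools sosreport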
