Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting oom_score_adj while launching applications #115

Closed
wking opened this issue Aug 26, 2015 · 6 comments
Closed

Setting oom_score_adj while launching applications #115

wking opened this issue Aug 26, 2015 · 6 comments

Comments

@wking
Copy link
Contributor

wking commented Aug 26, 2015

This spins off a more tightly-scoped version of #114. The oom_score_adj is more of a host-side and/or multi-container-orchestration issue, and less of a bundle issue, which means it probably should be in runtime.json if/once #88 lands. Possible approaches for setting this include:

  1. The runtime writes to /proc/<pid>/oom_score_adj, which would need a config-side setting.
  2. The host injects a pre-start hook to write to /proc/<pid>/oom_score_adj. This would require a hook with sufficient permissions for the write.
  3. The application has a startup phase where it handles this sort of thing before execing the main process. This would require an application have sufficient permissions for the write, although it could drop them after writing and before execing the “real” application.

A number of attributes where you could use (2) currently have explicit, (1)-style configs for via hooks (e.g. setting up networking and creating cgroups and namespaces). I'd guess the balance involves “how easy is it to handle without (1)?” and “how frequently will folks be tweaking this attribute?”, with high-cost or high-frequency attributes being handled via (1). So which way do we think makes the most sense for this particular setting?

Is it easy to handle via (2) or (3)? It seems like (2) would be easy assuming sufficient hook permissions, but (3) is probably too annoying to be worth the trouble.

How frequently do we expect folks will use this? I can't weigh in here, since I haven't set this. And I expect most runtime managers that set this will be doing it automatically, so in that case it's a wash between (1) and (2) for difficulty.

If those assumptions are correct, then I think we should go with (2), since that is the least work on the spec/runtime-implementation side. If nobody chimes in with anti-(2) thoughts in the next few days, I'll merge opencontainers/runc#160 locally see whether I can get it working ;).

@wking
Copy link
Contributor Author

wking commented Sep 6, 2015

I merged opencontainers/runc#160 (at opencontainers/runc@4f7ff04) into runc's master (at opencontainers/runc@0f85e4e), reverted opencontainers/runc@cc232c47 (part of opencontainers/runc#232) to avoid clobbering my adjustment, and wrote a host-side hook to handle the adjustment:

$ cat /tmp/oom-score-adj.sh
#!/bin/sh
ADJ="$1"
#PID=$(jq -r .pid)
PID=$(jq -r .init_process_pid)
echo "${ADJ}" > "/proc/${PID}/oom_score_adj"

The init_process_pid vs. pid in the state JSON is a difference between runC's state as of opencontainers/runc#160 and the spec that landed in #87 (after the initial runC work).

Then I added the hook to my config.json:

$ cat config.json
{
  …
  "hooks": {
    "prestart": [
      {
        "path": "/tmp/oom-score-adj.sh",
        "args": ["oom-score-adj.sh", "+10"]
      }
    ],
    "poststop": null
  },
  …
}

Then launching the container gives the right score:

# runc
/ # cat /proc/1/oom_score
10

So the hook approach ((2) in my original post here) is valid, and not particularly difficult. I propose we document that approach in this spec (for the reasons outlined in my original post here) and roll back opencontainers/runc#232 in runC.

@vishh
Copy link
Contributor

vishh commented Oct 12, 2015

The init process is a runC thing. In my mind, pre-start hooks are launched when the container environment has been established. Whether there is an init process involved or not is an implementation detail.

Do we have other use cases which demand the need for giving access to the init process from a hook?

@wking
Copy link
Contributor Author

wking commented Oct 12, 2015

On Mon, Oct 12, 2015 at 01:18:28PM -0700, Vish Kannan wrote:

The init process is a runC thing.

Regardless of implementation, a runtime will almost certainly need to
call setns(2), unshare(2), pivot_root(2) or some such activity before
execing the user-specified container process. I don't know how you'd
do that without some runtime-specified code being run in the container
process before the user-specified code is executed.

In my mind, pre-start hooks are launched when the container
environment has been established.

Agreed. And the container isn't established until there's a process
in there holding open PID namespaces, mount namespaces, etc., etc.

Do we have other use cases which demand the need for giving access
to the init process from a hook?

What do you mean “giving access”? Just “listing the
container-process's PID in the pre-start state JSON”?

@wking
Copy link
Contributor Author

wking commented Oct 13, 2015

On Mon, Oct 12, 2015 at 01:33:20PM -0700, W. Trevor King wrote:

Mon, Oct 12, 2015 at 01:18:28PM -0700, Vish Kannan:

The init process is a runC thing.

Regardless of implementation, a runtime will almost certainly need
to call setns(2), unshare(2), pivot_root(2) or some such activity
before execing the user-specified container process. I don't know
how you'd do that without some runtime-specified code being run in
the container process before the user-specified code is executed.

@vishh pointed out that the unshare and pivot_root calls can live in a
separate container-side process that exits before the process that
will become the user-specified container process starts 1. You'll
still need runtime-specified setns calls in the process that will
become the user-specified container process, but if you allow for a
missing PID namespace at pre-start time those hooks may land in the
window between the two container processes:

  1. Launch container-side setup process to unshare, pivot_root, etc.
  2. Bind mount those new namespaces somewhere persistent.
  3. Container-side setup process exits.
  4. Run pre-start hooks
  5. Launch container-side process to unshare a PID namespace (if the
    user wanted a new one), configure uid/group mappings, setns,
    setuid, setgid, setgroups, drop caps, and exec user-specified code.

You can get a fair ways toward handling its unshare/setns bits with
util-linux's unshare(1) and nsenter(1). But by the time you add all
the rest of the logic I still think you're outside the realm of what's
easily possible with stock binaries, and we might as well just use the
same container process for (1) and (5).

So I don't think it's a particularly large win to support the “two
separate container process's” runtime implementation. And once you've
got a single container process, your pre-start hooks can rely on
having a PID namespace and a useful PID in their state JSON.

Am I missing something?

wking referenced this issue in duglin/runc Oct 29, 2015
While in there, to show why someone may want it, I also added support for:
runc run cmd args...
runc batch batchFileOfCommands | -

Signed-off-by: Doug Davis <dug@us.ibm.com>
@vbatts
Copy link
Member

vbatts commented Mar 16, 2016

with hooks being in the spec, the setting of this should be supported.
Closing this issue.

@vbatts vbatts closed this as completed Mar 16, 2016
@wking
Copy link
Contributor Author

wking commented Mar 16, 2016

On Wed, Mar 16, 2016 at 10:24:02AM -0700, Vincent Batts wrote:

with hooks being in the spec, the setting of this should be supported.

This issue was “how do we want to support things like oom_score_adj”,
with me arguing for folks to handle it in a hook instead (option 2 in
the topic post) of adding a new config-side setting (option 1) or the
user-configured application (option 3). The bigger picture issue is
which kernel APIs this spec wraps and how thin a wrapper it should be
(see also 1). I think those are still important questions to sort
out, but in this case #222 chose in favor of a new config-side
setting. @vishh had a few comments/links motivating that choice
[2,3,4], but I'm still not clear on why a config-side setting was
chosen over the hook-based approach 5.

Anyway, I'm happy to continue this discussion on the mailing list, or
as needed as new settings are proposed.

 Subject: removal of cgroups from the OCI Linux spec
 Date: Wed, 28 Oct 2015 17:01:59 +0000
 Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants