New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cgroup: add PIDs cgroup controller support #446

Merged
merged 4 commits into from Jan 12, 2016

Conversation

Projects
None yet
6 participants
@cyphar
Member

cyphar commented Dec 20, 2015

This is a fixed up version of the now-reverted #58. The main issue with that code was that some of the memory cgroup's limits weren't being set at all if you were using systemd. Unfortunately, I seem to be unable to test the systemd portion (and the systemd test cases for memory don't actually test that the limits cause a process to fail).

All other cgroups were unaffected by this bug (as their limits were set using .Set(), and were joined either explicitly or using systemd), regardless of whether they were unsupported, partial supported or fully supported by systemd.

Some testing has revealed that the old setup didn't work either (because getSubsystemPath doesn't appear to work as expected).

/cc @mrunalp @hqhq @vishh @crosbymichael @dqminh

@cyphar cyphar changed the title from Add PIDs cgroup controller support to cgroupAdd PIDs cgroup controller support Dec 20, 2015

@cyphar cyphar changed the title from cgroupAdd PIDs cgroup controller support to cgroup: add PIDs cgroup controller support Dec 20, 2015

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Dec 20, 2015

Member

@mrunalp The main change I made to #58 was to actually explicitly join the cgroups and set memory. In addition, there we a couple of subtle bugs (needed to move the CheckCpushares check). After building Docker with my patches, systemd support appears to work (it doesn't work on master right now).

It should be noted that the "fix" for getting systemd to "work" is to just use the fs driver for basically everything. Unfortunately, there's deeper systemd issues here. At least it all works now.

Member

cyphar commented Dec 20, 2015

@mrunalp The main change I made to #58 was to actually explicitly join the cgroups and set memory. In addition, there we a couple of subtle bugs (needed to move the CheckCpushares check). After building Docker with my patches, systemd support appears to work (it doesn't work on master right now).

It should be noted that the "fix" for getting systemd to "work" is to just use the fs driver for basically everything. Unfortunately, there's deeper systemd issues here. At least it all works now.

@cyphar

This comment has been minimized.

Show comment
Hide comment
@hqhq

This comment has been minimized.

Show comment
Hide comment
@hqhq

hqhq Dec 22, 2015

Contributor

What problems are you saying about systemd cgroup in docker upstream exactly? I'm sure we have some systemd cgroup users and we don't see reports from them. And what's the problem about getSubsystemPath in systemd cgroup? Can you elaborate?

Contributor

hqhq commented Dec 22, 2015

What problems are you saying about systemd cgroup in docker upstream exactly? I'm sure we have some systemd cgroup users and we don't see reports from them. And what's the problem about getSubsystemPath in systemd cgroup? Can you elaborate?

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Dec 22, 2015

Member

@hqhq Well, if you try to use the version on master and set something like --cpu-shares you get errors when the container tries to start. There's also a few issues we've seen with customers where systemd randomly starts moving processes around slices.

Member

cyphar commented Dec 22, 2015

@hqhq Well, if you try to use the version on master and set something like --cpu-shares you get errors when the container tries to start. There's also a few issues we've seen with customers where systemd randomly starts moving processes around slices.

@hqhq

This comment has been minimized.

Show comment
Hide comment
@hqhq

hqhq Dec 23, 2015

Contributor

@cyphar No I don't get any errors when setting --cpu-shares with the latest docker version on master, my box is centos 7.1 with systemd-219, but I do know there might be problems with some very new systemd versions, what systemd version are you using.

BTW, I don't have a box with latest kernel to test this PR out, but over all the codes in this PR looks good, if there are any systemd issues with docker, I can help diagnose.

Contributor

hqhq commented Dec 23, 2015

@cyphar No I don't get any errors when setting --cpu-shares with the latest docker version on master, my box is centos 7.1 with systemd-219, but I do know there might be problems with some very new systemd versions, what systemd version are you using.

BTW, I don't have a box with latest kernel to test this PR out, but over all the codes in this PR looks good, if there are any systemd issues with docker, I can help diagnose.

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Dec 23, 2015

Member

@hqhq I'm testing this on Arch (with a custom-compiled kernel). The systemd problems I'm referring to are not related to this PR (there are a few). I'll open a separate issue (or PR) for these problems if appropriate (a SUSE customer has seen issues with systemd on SLE).

Member

cyphar commented Dec 23, 2015

@hqhq I'm testing this on Arch (with a custom-compiled kernel). The systemd problems I'm referring to are not related to this PR (there are a few). I'll open a separate issue (or PR) for these problems if appropriate (a SUSE customer has seen issues with systemd on SLE).

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 6, 2016

Member

I've rebased (which was annoying, as I just discovered the git revert commits love to mess with git rebase). PTAL, the systemd issues have been resolved and we should merge this soon if we want PIDs cgroup support in Docker 1.10.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

Member

cyphar commented Jan 6, 2016

I've rebased (which was annoying, as I just discovered the git revert commits love to mess with git rebase). PTAL, the systemd issues have been resolved and we should merge this soon if we want PIDs cgroup support in Docker 1.10.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

@mrunalp

This comment has been minimized.

Show comment
Hide comment
@mrunalp

mrunalp Jan 6, 2016

Contributor

Do you have a docker branch with this integrated?

Sent from my iPhone

On Jan 6, 2016, at 5:38 AM, Aleksa Sarai notifications@github.com wrote:

PTAL /cc @mrunalp @hqhq @vishh @crosbymichael @dqminh


Reply to this email directly or view it on GitHub.

Contributor

mrunalp commented Jan 6, 2016

Do you have a docker branch with this integrated?

Sent from my iPhone

On Jan 6, 2016, at 5:38 AM, Aleksa Sarai notifications@github.com wrote:

PTAL /cc @mrunalp @hqhq @vishh @crosbymichael @dqminh


Reply to this email directly or view it on GitHub.

@LK4D4

This comment has been minimized.

Show comment
Hide comment
Contributor

LK4D4 commented Jan 6, 2016

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 6, 2016

Contributor

I tried with docker. It works perfectly with fs cgroups. But seems like systemd cgroups doesn't work at all in master, so I can't test it.

Contributor

LK4D4 commented Jan 6, 2016

I tried with docker. It works perfectly with fs cgroups. But seems like systemd cgroups doesn't work at all in master, so I can't test it.

@mrunalp

This comment has been minimized.

Show comment
Hide comment
@mrunalp

mrunalp Jan 6, 2016

Contributor

@LK4D4 Here is a fix for systemd in docker moby/moby#19149

Contributor

mrunalp commented Jan 6, 2016

@LK4D4 Here is a fix for systemd in docker moby/moby#19149

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 7, 2016

Member

@LK4D4 @mrunalp note that you'll have to re-run hack/vendor.sh, since @jfrazelle hasn't updated her branch yet.

Member

cyphar commented Jan 7, 2016

@LK4D4 @mrunalp note that you'll have to re-run hack/vendor.sh, since @jfrazelle hasn't updated her branch yet.

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 7, 2016

Contributor

I tested definitely with this branch :)

Contributor

LK4D4 commented Jan 7, 2016

I tested definitely with this branch :)

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 7, 2016

Member

Good to hear. 😸

Member

cyphar commented Jan 7, 2016

Good to hear. 😸

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 7, 2016

Contributor

I tested and it works with systemd.

Contributor

LK4D4 commented Jan 7, 2016

I tested and it works with systemd.

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Jan 8, 2016

Contributor

LGTM

Contributor

LK4D4 commented Jan 8, 2016

LGTM

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 11, 2016

Member

Due to some upstream kernel changes introduced in v4.4 (namely, allowing cgroups to track zombies) the tests that assume a small pids.max will work are broken. However, there also appears to be a difference in the functionality (there appears to be a near-constant overhead of about 3. This is probably caused by the v4.4 stuff, but I'm not sure why this overhead isn't resolved almost immediately).

Ultimately, we do the best we can. Anything else is either due to the fact we use Go (which isn't going to change soon) or are potential kernel bugs (gulp).

NOTE: This doesn't actually affect the driver code, just the tests.

/cc @LK4D4 @mrunalp

Member

cyphar commented Jan 11, 2016

Due to some upstream kernel changes introduced in v4.4 (namely, allowing cgroups to track zombies) the tests that assume a small pids.max will work are broken. However, there also appears to be a difference in the functionality (there appears to be a near-constant overhead of about 3. This is probably caused by the v4.4 stuff, but I'm not sure why this overhead isn't resolved almost immediately).

Ultimately, we do the best we can. Anything else is either due to the fact we use Go (which isn't going to change soon) or are potential kernel bugs (gulp).

NOTE: This doesn't actually affect the driver code, just the tests.

/cc @LK4D4 @mrunalp

@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 11, 2016

Member

In any case, PTAL so we can merge this soon.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

Member

cyphar commented Jan 11, 2016

In any case, PTAL so we can merge this soon.

/ping @mrunalp @hqhq @vishh @crosbymichael @dqminh

@mrunalp

This comment has been minimized.

Show comment
Hide comment
@mrunalp

mrunalp Jan 11, 2016

Contributor

@cyphar Yes, I am going to test this today. Thanks!

Contributor

mrunalp commented Jan 11, 2016

@cyphar Yes, I am going to test this today. Thanks!

Show outdated Hide outdated libcontainer/integration/exec_test.go
t.Fatalf("expected fork() to succeed with permissive pids limit")
}
// Enforce a restrictive limit limit (shell + 6 * true + 3).

This comment has been minimized.

@mrunalp

mrunalp Jan 11, 2016

Contributor

s/limit limit/limit/

@mrunalp

mrunalp Jan 11, 2016

Contributor

s/limit limit/limit/

Show outdated Hide outdated libcontainer/process_linux.go
// Programmer error.
panic("No error following JSON procError payload.")
default:
return newSystemError(fmt.Errorf("Invalid JSON synchronisation payload from child."))

This comment has been minimized.

@mrunalp

mrunalp Jan 11, 2016

Contributor

Start error string with lower case.

@mrunalp

mrunalp Jan 11, 2016

Contributor

Start error string with lower case.

This comment has been minimized.

@mrunalp

mrunalp Jan 11, 2016

Contributor

Also, no punctuation. We are following this convention https://github.com/golang/go/wiki/CodeReviewComments#error-strings

@mrunalp

mrunalp Jan 11, 2016

Contributor

Also, no punctuation. We are following this convention https://github.com/golang/go/wiki/CodeReviewComments#error-strings

Show outdated Hide outdated libcontainer/process_linux.go
}
}
if !sentRun {
return newSystemError(fmt.Errorf("Couldn't synchronise with container process."))

This comment has been minimized.

@mrunalp

mrunalp Jan 11, 2016

Contributor

Same here.

@mrunalp

mrunalp Jan 11, 2016

Contributor

Same here.

@mrunalp

This comment has been minimized.

Show comment
Hide comment
@mrunalp

mrunalp Jan 11, 2016

Contributor

I tested and it worked fine for me. Besides the nits, LGTM

Contributor

mrunalp commented Jan 11, 2016

I tested and it worked fine for me. Besides the nits, LGTM

cyphar added some commits Dec 14, 2015

libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
@cyphar

This comment has been minimized.

Show comment
Hide comment
@cyphar

cyphar Jan 11, 2016

Member

@mrunalp Nits addressed.

Member

cyphar commented Jan 11, 2016

@mrunalp Nits addressed.

@mrunalp

This comment has been minimized.

Show comment
Hide comment
@mrunalp

mrunalp Jan 12, 2016

Contributor

@cyphar Thanks! LGTM

Contributor

mrunalp commented Jan 12, 2016

@cyphar Thanks! LGTM

mrunalp added a commit that referenced this pull request Jan 12, 2016

Merge pull request #446 from cyphar/18-add-pids-controller
cgroup: add PIDs cgroup controller support

@mrunalp mrunalp merged commit 4c767d7 into opencontainers:master Jan 12, 2016

2 checks passed

docker/dco-signed All commits signed
Details
janky Jenkins build runc-PRs 906 has succeeded
Details

@cyphar cyphar deleted the cyphar:18-add-pids-controller branch Jan 12, 2016

@jimmidyson jimmidyson referenced this pull request Jan 12, 2016

Closed

New tag please #467

@yijia2413

This comment has been minimized.

Show comment
Hide comment

great!!

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Merge pull request #446 from hqhq/hq_remove_MUST_JSON
Remove one JSON related MUST requirement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment