Proposal: Support custom cgroups #8551

ibuildthecloud · 2014-10-14T05:39:00Z

Containers are mostly the combination of capabilities, namespaces, and cgroups. Docker already supports custom capabilities with --cap-add and --cap-drop. Custom namespace support is already halfway there: --net=* exists and --ipc=* is being worked on. The last piece of the puzzle is being able to control cgroups.

I propose that the cgroup paths be added to the HostConfig, so that on start, custom cgroup paths can optionally be used instead of the cgroups that Docker would set up. This would allow components outside of Docker to create, control, and manage the cgroups, while the Docker container would simply join them.

One could find many use cases for this, I assume, but initially this feature can be used to better tie together cgroups managed by systemd. The background is rooted in a hack (https://github.com/ibuildthecloud/systemd-docker) that I've put together to better manage Docker under systemd. systemd-docker does various useful things to better integrate Docker with systemd, and most of it should probably stay as a project outside of Docker. One critical piece that makes systemd-docker work today is that it moves the running processes from one cgroup to the service's cgroup. This is what makes it a hack and also not 100% reliable. If Docker could simply support using a custom cgroup, systemd-docker could become a production-worthy stopgap solution until a superior native integration between systemd and Docker exists.
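The process move that systemd-docker performs comes down to writing each PID into the target cgroup's cgroup.procs file. Below is a minimal Python sketch of that step, assuming cgroup v1 hierarchies mounted per controller under /sys/fs/cgroup; join_cgroup is a hypothetical helper, not code from Docker or systemd-docker. Because the container's processes start in Docker's cgroup and are only moved afterwards, this after-the-fact step is what the proposal wants to replace with joining the cgroup at start.

```python
from pathlib import Path

# Assumed mount point of the cgroup v1 hierarchies; override `root` for testing.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def join_cgroup(pid: int, controller: str, cgroup_path: str,
                root: Path = CGROUP_ROOT) -> Path:
    """Move process `pid` (all of its threads) into an existing cgroup.

    Hypothetical helper. In cgroup v1 every controller (memory, cpu, ...)
    has its own hierarchy, so the caller picks which one to join.
    """
    if pid <= 0:
        raise ValueError("pid must be positive")
    target = root / controller / cgroup_path.lstrip("/")
    if not target.is_dir():
        # Joining, not creating: the cgroup must already exist.
        raise FileNotFoundError(f"cgroup {target} does not exist")
    procs = target / "cgroup.procs"
    # Writing a PID to cgroup.procs asks the kernel to move the whole
    # thread group into this cgroup.
    with open(procs, "w") as f:
        f.write(f"{pid}\n")
    return procs
```

On a real system this needs root, and the write is per hierarchy, so a process has to be moved once for every controller it should be accounted under.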

Ulexus · 2014-10-14T19:41:43Z

+1

jonboulle · 2014-10-14T20:35:35Z

👍

j0hnsmith · 2014-12-01T13:58:33Z

+1

j0hnsmith · 2014-12-01T15:02:00Z

To elaborate on my use case: I have multiple containers working together to provide a service, and I want to limit the resources the service can consume.

I'd like to be able to create a cgroup (manually) and then tell Docker to run the containers for my service in that cgroup (e.g. multiple containers sharing the same cgroup).

Something like

docker run --cgroup my_cgroup ...
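Creating such a cgroup by hand is a mkdir in the controller's hierarchy followed by writes to its control files. A minimal Python sketch, assuming the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory; create_service_cgroup and the my_service name are hypothetical.

```python
from pathlib import Path

# Assumed cgroup v1 mount point; override `root` for testing.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def create_service_cgroup(name: str, memory_limit_bytes: int,
                          root: Path = CGROUP_ROOT) -> Path:
    """Create a memory cgroup meant to be shared by one service's containers.

    Hypothetical helper. On a real cgroupfs, mkdir() makes the kernel
    populate the control files; here we only set the ones we need.
    """
    if memory_limit_bytes <= 0:
        raise ValueError("memory limit must be positive")
    cg = root / "memory" / name
    cg.mkdir(parents=True, exist_ok=True)
    # Charge child cgroups to this one so the limit covers all of them.
    # The kernel only accepts this before any child cgroups exist.
    (cg / "memory.use_hierarchy").write_text("1\n")
    # Hard cap for the service as a whole.
    (cg / "memory.limit_in_bytes").write_text(f"{memory_limit_bytes}\n")
    return cg
```

With the option that was eventually merged in #11428, each container would then be started with something like docker run --cgroup-parent=/my_service ..., which creates a per-container cgroup under my_service rather than reusing my_service itself; with hierarchical accounting enabled, the parent's limit still caps the containers collectively.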

hustcat · 2014-12-02T06:37:27Z

+1

bgrant0607 · 2014-12-18T00:26:30Z

More use cases:

One use case is the pod-like scenario (#8781) -- multiple co-deployed containers.

Another is differentiated quality of service. Some workloads need a high degree of predictability, while others just want to use whatever resources are available. We'd like to protect the former from the latter by putting all of the latter into a bucket that is constrained such that it can't interfere with the predictable workload. This same approach can be applied hierarchically in order to support more than 2 QoS tiers. More discussion of this can be found in presentations and documentation about lmctfy.
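The hierarchical bucketing idea can be illustrated without touching the kernel. Assuming hierarchical accounting for a hard limit such as memory, a child's usage is charged to every ancestor, so its effective cap is the smallest limit along its path. A minimal Python model; the tier names (/besteffort, /besteffort/scavenger) are hypothetical, and the model ignores how the kernel actually reclaims or throttles.

```python
from typing import Dict, Optional


def effective_limit(path: str, limits: Dict[str, int]) -> Optional[int]:
    """Return the tightest limit on `path` and all of its ancestors.

    `limits` maps cgroup paths to a configured hard limit; paths without
    an entry are unlimited. Returns None if nothing on the path is capped.
    """
    parts = [p for p in path.split("/") if p]
    best: Optional[int] = limits.get("/")
    for i in range(1, len(parts) + 1):
        ancestor = "/" + "/".join(parts[:i])
        limit = limits.get(ancestor)
        if limit is not None and (best is None or limit < best):
            best = limit
    return best
```

For example, with limits = {"/besteffort": 4 GiB, "/besteffort/scavenger": 1 GiB}, anything under /besteffort is held to 4 GiB even if it asks for more, a third tier nested under /besteffort/scavenger is held to 1 GiB, and a latency-critical workload outside the bucket is not capped by either.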

We'd like to similarly protect Docker and other system agents/daemons from user containers. We've received a number of reports from users who bricked nodes when their containers used up all the memory.

Not all 3 of these cases necessarily need to be expressed in the same way in the API.

For the pod case, one might be tempted to apply the current pattern of referring to other containers, such as with VolumesFrom and NetworkMode=container:id, using something like CgroupParent=container-id. However, this approach is problematic for a number of reasons. One is due to the coupling of the container lifetime and process lifetime. In the case of a system OOM, for example, such processes can die, even if they use minimal resources, which creates complicated failure modes. Another is the lack of reasonable mechanisms for managing and introspecting groups of related containers.

For differentiated quality of service, I'd like to specify higher-level semantic intent rather than concrete slices or cgroup paths, but there needs to be a general way to pass extra options down to the exec driver, and such mechanisms keep getting shot down, or even removed after being added (e.g., #4833). Alternatively, we'd be happy to make a proposal for first-class support in the API.

Configuration to protect Docker and other system agents could be specified with flags when starting the daemon.

/cc @vishh @rjnagal @thockin @vmarmol @dchen1107

thockin · 2015-01-08T16:39:49Z

+1

rjnagal · 2015-01-08T16:51:58Z

Having a way to specify the parent cgroup under which new container cgroups are created would go a long way toward solving the issues @bgrant0607 pointed out. If container cgroups are not tied directly to the cgroup the Docker daemon runs in, critical system daemons can be protected much better. libcontainer already accepts the parent as a parameter. I think the actual work required to make this happen would be minimal.

vishh · 2015-01-15T20:29:05Z

+1. Ping @crosbymichael

crosbymichael · 2015-01-19T19:01:14Z

What do you think the user-facing API and flags should look like to solve your issues?

vishh · 2015-01-19T19:48:54Z

@crosbymichael: For many of the use cases mentioned above, adding a --parent_cgroup=/<cgroup_path> flag to docker run is what is needed. With this option set, docker would create container cgroups under the hierarchy mentioned in --parent_cgroup.
To provide differentiated quality of service, this option would let users group low priority containers into a bucket and cap the total amount of resources the low priority containers can consume. This option can be used to place resource restrictions across a Pod. In the case of systemd, the service cgroup hierarchy can be used to group docker containers into a logical systemd cgroup.
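The semantics described here amount to a simple path rule: without the flag, a container's cgroup sits under Docker's default parent; with it, the container's cgroup is created under the given parent. A minimal Python sketch; container_cgroup_path is hypothetical, the /docker default is assumed from the location mentioned later in this thread, and a real driver would apply the resulting path under each controller's mount point.

```python
import posixpath
from typing import Optional

# Assumed default parent for containers when no parent is given.
DEFAULT_PARENT = "/docker"


def container_cgroup_path(container_id: str,
                          parent: Optional[str] = None) -> str:
    """Return a container's cgroup path relative to a hierarchy root.

    Hypothetical model of the proposed flag: no parent means
    DEFAULT_PARENT, otherwise the container nests under `parent`.
    """
    if not container_id or "/" in container_id:
        raise ValueError("container id must be a single path component")
    base = parent or DEFAULT_PARENT
    if not base.startswith("/"):
        # This sketch only accepts absolute paths, as in --parent_cgroup=/<cgroup_path>.
        raise ValueError("parent cgroup must be an absolute path")
    # normpath drops a trailing slash so the join yields a clean path.
    return posixpath.join(posixpath.normpath(base), container_id)
```

The same rule covers the pod case: pointing every container of a pod at one parent groups them under a single cgroup that can carry the pod's limits. The flip side is that containers no longer all appear under the default parent, which is the trade-off raised later in this thread.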

crosbymichael · 2015-01-20T01:16:27Z

And that's it? Nothing else required?

thockin · 2015-01-20T02:37:34Z

I think that is pretty much right. What else were you expecting to see?

chakri-nelluri · 2015-01-20T06:39:23Z

+1

timothysc · 2015-01-20T14:20:07Z

+1, this simplifies process tracking for cluster managers.

vishh · 2015-01-20T17:35:00Z

@crosbymichael: We will need another flag to alter the oom_score_adj of containers to offer differentiated QoS.

bgrant0607 · 2015-01-20T18:19:00Z

Let's keep the oom_score_adj issue separate -- please file a separate issue, since I don't see one. The cgroup parent alone will solve several problems for us.

tnachen · 2015-01-20T19:00:10Z

+1 as well.
One nice-to-have for all systems integrating with Docker would be for a container's cgroup path to also be available via docker inspect, so there is a fixed place to look it up.
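Until docker inspect reports it, an integrating system can read a container's cgroup paths from /proc/<pid>/cgroup of the container's init process (the PID is available, for example, from the State.Pid field of docker inspect). Each line has the form hierarchy-ID:controller-list:cgroup-path. A minimal Python parser; parse_proc_cgroup is a hypothetical helper.

```python
from typing import Dict


def parse_proc_cgroup(text: str) -> Dict[str, str]:
    """Map each controller to a process's cgroup path.

    Input is the content of /proc/<pid>/cgroup, e.g. "4:memory:/docker/abc".
    Comma-separated controllers ("cpu,cpuacct") share one hierarchy, named
    hierarchies appear as "name=systemd", and a cgroup v2 line ("0::/path")
    has an empty controller list, stored here under the key "".
    """
    paths: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any ':' inside the path intact.
        _, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths
```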

thockin · 2015-01-20T19:34:23Z

FWIW, we should mention that this will remove the ability to examine the /docker cgroup and see all containers. I'm OK with that.

vishh · 2015-02-02T16:55:06Z

@crosbymichael: Can we get a +1 for this feature? I can send out a PR soon. This is an important feature that will significantly improve system reliability for Kubernetes.

crosbymichael · 2015-02-02T17:00:13Z

@vishh yes, I will bring it up with the other maintainers today

bytesandwich · 2015-02-20T06:05:32Z

+1

vishh · 2015-02-20T17:54:45Z

Ping @crosbymichael!

vishh · 2015-02-27T18:39:54Z

I plan to post a PR soon since no concerns have been expressed for this feature.

jessfraz · 2015-02-27T19:03:31Z

I think that is awesome @vishh I am +1 :)

jessfraz · 2015-02-27T19:03:49Z

anything to get rid of systemd cgroups :P

bgrant0607 · 2015-02-27T19:13:06Z

I discussed this with @crosbymichael at the last DGAB meeting, and my understanding is that we have the go-ahead for this.

crosbymichael · 2015-03-11T18:38:27Z

+1 for --cgroup-parent

ConnorDoyle · 2015-03-12T00:53:09Z

+1, this will strengthen the multi-tenancy story when running Kubernetes as a Mesos framework and --cgroup-parent should be sufficient for our use case.

mohitsoni · 2015-03-13T17:32:01Z

+1 for --cgroup-parent

jdef · 2015-03-14T04:41:48Z

+1 for --cgroup_parent

crosbymichael · 2015-03-19T21:42:24Z

merged in #11428

vishh · 2015-03-19T22:11:38Z

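Thanks for the quick review everyone :)
If the integration tests happen to be flaky, ping me and I can fix them. There should be enough logs now to identify any issue.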

thockin · 2015-03-19T22:13:53Z

w00t! This is a good one. Thanks everyone.

vmarmol · 2015-03-19T22:15:16Z

Yay cgroups! :D That was one of the fastest feature merges I've seen.

ConnorDoyle · 2015-03-19T22:40:59Z

👍

bgrant0607 · 2015-03-19T23:40:36Z

Awesome, thanks a lot.

jdef · 2015-03-19T23:44:45Z

+1

ibuildthecloud mentioned this issue Nov 1, 2014

Proposal: Make Pods (collections of containers) a first order container object. #8781

Open

jdef mentioned this issue Nov 9, 2014

restrict pods to the resource constraints declared in their manifest mesosphere/kubernetes-mesos#68

Closed

bgrant0607 mentioned this issue Dec 17, 2014

Add the possibility with docker to choose the parent slice of containers’ cgroup. #9436

Closed

thockin mentioned this issue Jan 8, 2015

Containers startup throttling kubernetes/kubernetes#3312

Closed

crosbymichael added the area/runtime label Jan 19, 2015

crosbymichael added the Proposal label Feb 6, 2015

jessfraz removed the kind/feature label Feb 26, 2015

jessfraz added the kind/feature label Feb 26, 2015

crosbymichael mentioned this issue Mar 16, 2015

feature: add option for changing docker daemon cpu shares #11273

Closed

This was referenced Mar 16, 2015

Adding '--cgroup-parent' option. #11428

Merged

Proposal: Add real-time priorities for processes #10459

Closed

crosbymichael closed this as completed Mar 19, 2015

thaJeztah mentioned this issue Apr 28, 2015

Improve documentation for custom cgroups (--cgroup-parent) #12849

Open