
Kubelet: Create a top-level container for pods #5671

Closed
vmarmol opened this issue Mar 19, 2015 · 24 comments
Assignees
Labels
area/isolation area/kubelet priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.
Milestone

Comments

@vmarmol (Contributor) commented Mar 19, 2015

Thanks to @vishh's awesome work, moby/moby#11428 went in with cgroup_parent support! With Docker 1.6 we'll be able to make top-level containers for pods.
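The flag in question is `--cgroup-parent` on `docker run`. A minimal sketch of how a per-pod parent cgroup could be derived and handed to Docker — the `/kubelet` prefix and the pod UID here are illustrative, not a settled Kubernetes convention:

```shell
# Sketch only: derive a per-pod parent cgroup path from a pod UID.
# The /kubelet prefix and the UID are illustrative.
POD_UID="1234-abcd"
CGROUP_PARENT="/kubelet/pod-${POD_UID}"
echo "${CGROUP_PARENT}"

# Each container in the pod would then be started along the lines of:
#   docker run -d --cgroup-parent="${CGROUP_PARENT}" <image> <cmd>
```

Starting every container in the pod under the same `--cgroup-parent` is what gives the pod a single point for resource accounting and limits.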

@vmarmol vmarmol added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 19, 2015
@vmarmol vmarmol added this to the v1.0 milestone Mar 19, 2015
@dchen1107 (Member)

Great! I assume you are talking about one top-level kubernetes container as the parent of all kubernetes containers, to protect our daemons, not a per-pod container here.

@vishh (Contributor) commented Mar 19, 2015

A top-level cgroup for placing a bounding bucket across all pods will most definitely help with node stability.


@vmarmol (Contributor, Author) commented Mar 19, 2015

I was referring to the per-pod containers :) I think if we have those, we won't need the container for all kubernetes containers.

@dchen1107 (Member)

Then this is not for v1.0. In v1.0 there is no resource limit on pods, so it doesn't help with node stability at all.

@dchen1107 dchen1107 removed this from the v1.0 milestone Mar 19, 2015
@dchen1107 dchen1107 added priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Mar 19, 2015
@vishh (Contributor) commented Mar 19, 2015 via email

@vmarmol (Contributor, Author) commented Mar 19, 2015

The bounding box is going to be a nightmare to migrate off of when we do implement per-pod limits. We can set aside space today for daemons and allocate the remaining space to the containers.

I don't disagree that this is not v1, but I don't think we need the bounding box for v1 either.

@bgrant0607 (Member)

@vmarmol Would we need to eliminate the bounding box when implementing per-pod limits?

I'd also like a bounding box to protect pods/containers with limits from those without, by putting the latter into a box.

@bgrant0607 (Member)

Ref #147, #168.

@vishh (Contributor) commented Mar 20, 2015

I discussed with @vmarmol offline and explained the rationale behind nested bounding boxes.


@vmarmol (Contributor, Author) commented Mar 20, 2015

For representing QoS tiers, the bounding boxes make sense. Having a top-level container for all containers doesn't buy us much in terms of protecting the node, and only partially aligns with our longer-term strategies. I just want us to be careful about rolling out a hierarchy plan and then having it change in the medium term.

I also spoke with @dchen1107 and we agreed that short term this doesn't buy us much. The model with limit and non-limit (batch) containers makes sense as our first step (the model you mentioned). As far as we know, that is not a v1 work item and there are a few things missing from the scheduler to enable it.

@bgrant0607 no we shouldn't need to remove the bounding boxes for pod-level limits. I think whether those are container or pod-level won't affect the overall plans.

@ConnorDoyle (Contributor)

Here's one more use case to consider. When Kubernetes is run as an Apache Mesos framework, we're currently spinning up at most one Kubelet per host in the same process as a custom Mesos executor (framework component responsible for interpreting and executing tasks). Mesos already manages a container that encloses the executor process, and that container's resource limits grow and shrink as tasks (pods) come and go. From our perspective, it would be great if we could pass the executor's cgroup to the Kubelet to use as its parent. That way the Kubelet could still manage a hierarchy of containers, and Mesos could provide accurate global resource accounting and isolation. Could this (or something like it) be worked into the design?

@jdef (Contributor) commented Apr 9, 2015

xref mesosphere/kubernetes-mesos#68

@vishh (Contributor) commented Apr 9, 2015

@ConnorDoyle: IIUC you want the kubelet to be the parent of all the docker containers it runs. As of now we were thinking of running all system daemons, including the kubelet, in a separate hierarchy from that of the docker containers.

@jdef (Contributor) commented Apr 9, 2015

@vishh @ConnorDoyle I think what would work really well for the kubernetes-mesos integration is to be able to specify a parent cgroup in the kubelet's configuration, so that the containers launched by the kubelet are parented under that cgroup. The current integration blends the kubelet and mesos executor implementations into a single executable: the mesos executor actually configures the Kubelet instance directly.

It would be great if we could pass in some sort of parentCgroup parameter there.
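As a sketch of what that knob could look like, a single kubelet flag taking a cgroup path would suffice — the flag name `--cgroup-root` is the one this eventually shipped as (defaulting to `/`), and the executor path below is illustrative:

```shell
# Sketch: parent all kubelet-managed containers under the Mesos executor's
# cgroup. The executor path is illustrative; --cgroup-root is the flag name
# this eventually shipped under.
EXECUTOR_CGROUP="/mesos/executor-1"
KUBELET_CMD="kubelet --cgroup-root=${EXECUTOR_CGROUP}"
echo "${KUBELET_CMD}"
```

With that in place, the kubelet still manages its own hierarchy of per-pod containers, but the whole subtree rolls up under the executor's cgroup for global accounting.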

@vmarmol (Contributor, Author) commented Apr 9, 2015

@jdef being able to specify where the Kubelet makes top-level containers seems perfectly reasonable (defaulting to root "/"). That may be the same cgroup the Kubelet runs in, or another one. That shouldn't be a problem.

@ConnorDoyle (Contributor)

@jdef, yes that's what I had in mind but your explanation is clearer.
@vishh, that would be OK as long as everything (k8s daemons, and the top-level container for the pod containers) could be in the executor's cgroup hierarchy.

@vishh (Contributor) commented Apr 9, 2015

@jdef @ConnorDoyle: SGTM

@dchen1107 dchen1107 added this to the v1.1 milestone Jul 31, 2015
@dchen1107 dchen1107 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. labels Jul 31, 2015
@jdef (Contributor) commented Aug 31, 2015

We've started implementing something along these lines. The currently exposed kubelet flags aren't quite sufficient, since not all kubelet-configured container paths (for example, DockerDaemonContainer) are exposed. We've hacked around this for now, but it would be nice if future changes to the kubelet kept in mind that the kubelet may not be the only service on the host, and that other management/provisioning tooling may want the ultimate say regarding host-level configuration changes.

@dchen1107 dchen1107 modified the milestones: v1.2, v1.1 Sep 25, 2015
@bgrant0607 bgrant0607 modified the milestones: v1.2, v1.2-candidate Oct 9, 2015
@vishh vishh self-assigned this May 9, 2016
@bgrant0607 (Member)

cc @derekwaynecarr

@vishh (Contributor) commented May 23, 2016

cc @dubstack Your pending proposal on pod-level cgroups is tied to this upstream issue.

@vishh vishh modified the milestones: v1.4, next-candidate Jun 21, 2016
k8s-github-robot pushed a commit that referenced this issue Jul 25, 2016
Automatic merge from submit-queue

Kubelet: Pod level Resource Management

This proposal outlines our plan for improving resource management in Kubernetes by having a Cgroup hierarchy with QoS and Pod level Cgroups. 

This is the initial proposal which broadly covers our goals and how we plan to achieve it. At this point we would really appreciate feedback from the community. 

This is tied to the upstream issue #5671, so I would request
@vishh @dchen1107 @bgrant0607 @jdef PTAL.
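For reference, the hierarchy this proposal led to (in later releases, with the cgroupfs driver and the default `/kubepods` root) nests pods under per-QoS parents, with Guaranteed pods directly under the root. A sketch of the path scheme, with an illustrative pod UID:

```shell
# Sketch of the QoS + pod cgroup path scheme as later implemented; the
# /kubepods root and the pod UID are illustrative of the default layout.
pod_cgroup() {  # usage: pod_cgroup <qos-class> <pod-uid>
  case "$1" in
    guaranteed) echo "/kubepods/pod$2" ;;      # Guaranteed pods sit directly under the root
    *)          echo "/kubepods/$1/pod$2" ;;   # Burstable/BestEffort get a QoS-level parent
  esac
}
pod_cgroup guaranteed abc-123   # → /kubepods/podabc-123
pod_cgroup burstable  abc-123   # → /kubepods/burstable/podabc-123
pod_cgroup besteffort abc-123   # → /kubepods/besteffort/podabc-123
```

Per-container cgroups then hang off each pod cgroup, giving the bounding buckets discussed above at the node, QoS, and pod levels.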

@goltermann (Contributor)

@vishh in plan for 1.4, or punt to 1.5?

@vishh vishh modified the milestones: v1.5, v1.4 Sep 7, 2016
@vishh (Contributor) commented Sep 7, 2016

@goltermann Yes. This feature has been punted to v1.5

@dims (Member) commented Nov 15, 2016

Looks like we have to punt to 1.6 :)

@dims dims modified the milestones: v1.6, v1.5 Nov 15, 2016
@vishh (Contributor) commented Mar 9, 2017

This feature is completed as of v1.6. cc @derekwaynecarr
