Container and pod resource limits #168

Closed
bgrant0607 opened this Issue Jun 19, 2014 · 49 comments

Comments

Owner

bgrant0607 commented Jun 19, 2014

Before we implement QoS tiers (#147), we need to support basic resource limits for containers and pods. All resource values should be integers.

For inspiration, see lmctfy:
https://github.com/google/lmctfy/blob/master/include/lmctfy.proto

Arguably we should start with pods first, to at least provide isolation between pods. However, that would require the ability to start Docker containers within cgroups. The support we need for individual containers already exists.

We should allow both minimum and maximum resource values to be provided, as lmctfy does. But let's not reuse lmctfy's limit and max_limit terminology. I like "requested" (amount scheduler will use for placement) and "limit" (hard limit beyond which the pod/container is throttled or killed).

Even without limit enforcement, the scheduler could use resource information for placement decisions.
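
For illustration only, a minimal Go sketch of the per-resource "requested"/"limit" pair described above (type and field names are hypothetical, not a proposed API):

package main

import "fmt"

// ResourceSpec is a hypothetical per-resource pair: Requested is what the
// scheduler uses for placement, Limit is the hard cap beyond which the
// container or pod is throttled or killed. Values are integers, per the
// proposal above (e.g. milli-cores for CPU, bytes for memory).
type ResourceSpec struct {
	Requested int64
	Limit     int64
}

// ContainerResources groups the common resources for one container.
type ContainerResources struct {
	CPU    ResourceSpec // milli-cores
	Memory ResourceSpec // bytes
}

func main() {
	c := ContainerResources{
		CPU:    ResourceSpec{Requested: 500, Limit: 1000},           // 0.5 cores requested, 1 core cap
		Memory: ResourceSpec{Requested: 64 << 20, Limit: 128 << 20}, // 64 MiB requested, 128 MiB cap
	}
	fmt.Printf("cpu: %+v, memory: %+v\n", c.CPU, c.Memory)
}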

@jbeda jbeda added the enhancement label Jun 19, 2014

This was referenced Jun 20, 2014

Member

timothysc commented Jul 8, 2014

+1.

Owner

bgrant0607 commented Jul 11, 2014

We have cpu and memory in the container manifest:
// Optional: Defaults to unlimited.
Memory int `yaml:"memory,omitempty" json:"memory,omitempty"`
// Optional: Defaults to unlimited.
CPU int `yaml:"cpu,omitempty" json:"cpu,omitempty"`

However, AFAICT, we don't do anything with them. Besides, I think we want something more similar to lmctfy's API (request, limit, qos for each resource).

Another consideration: We could make it fairly easy to add new resources. Kubelet needs to understand each individual resource's characteristics, for isolation, QoS, overcommitment, etc. OTOH, the scheduler could deal with resources entirely abstractly. It could get resources and their capacities from the machines. Similarly, we'd need to make it possible to request abstract resources in the container/pod manifest.

Owner

thockin commented Jul 11, 2014

What we described internally was that "common" resources like CPU, memory, disk, etc. were described as first-class things. Other resources are handled essentially as opaque counters. E.g. a node says "I have 5 resources with ID 12345", a client says "I need 2 resources with ID 12345". The scheduler maps them.
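
A rough sketch of that opaque-counter model, assuming both sides exchange only a resource ID and an integer count (all names hypothetical):

package main

import "fmt"

// ResourceID identifies an opaque resource the scheduler does not interpret.
type ResourceID string

type NodeCapacity map[ResourceID]int64
type PodRequest map[ResourceID]int64

// fits reports whether every requested counter is available on the node.
func fits(req PodRequest, avail NodeCapacity) bool {
	for id, n := range req {
		if avail[id] < n {
			return false
		}
	}
	return true
}

func main() {
	node := NodeCapacity{"12345": 5} // "I have 5 resources with ID 12345"
	pod := PodRequest{"12345": 2}    // "I need 2 resources with ID 12345"
	fmt.Println(fits(pod, node))     // true: the scheduler can map the request to this node
}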


Owner

erictune commented Jul 15, 2014

Consider that the resource types and units used for pod/container requests could also be used for describing how to subdivide cluster resources (see GoogleCloudPlatform#442 ). For example, if team A is limited to using 10GB RAM at the cluster level, then team A can run 10 pods x 1GB RAM; or 2 pods x 5GB per pod; or some combination, etc.

+1 to all of this. Mesos has a very similar model, with the scheduler/allocator able to work with any custom resource, but the slave/containerizer needs to know enough details to map it to an isolator. This would also be the appropriate separation for requested resource vs. resource limits.
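
A tiny sketch of that idea, assuming the team-level quota is expressed in the same integer units as pod requests (function and values are illustrative):

package main

import "fmt"

const gib = int64(1) << 30

// withinQuota reports whether the sum of pod memory requests stays within the
// team's cluster-level memory quota.
func withinQuota(podMemRequests []int64, teamQuota int64) bool {
	var sum int64
	for _, r := range podMemRequests {
		sum += r
	}
	return sum <= teamQuota
}

func main() {
	quota := 10 * gib
	fmt.Println(withinQuota([]int64{1 * gib, 1 * gib, 5 * gib}, quota)) // true: 7GB of 10GB
	fmt.Println(withinQuota([]int64{5 * gib, 5 * gib, 5 * gib}, quota)) // false: 15GB exceeds the quota
}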

@bgrant0607 bgrant0607 added this to the v1.0 milestone Aug 27, 2014

@brendandburns brendandburns modified the milestone: 0.7, v1.0 Sep 24, 2014

@bgrant0607 bgrant0607 modified the milestone: v0.8, v0.7 Sep 26, 2014

@bgrant0607 bgrant0607 added the area/api label Oct 2, 2014

Owner

bgrant0607 commented Oct 2, 2014

/cc @johnwilkes @davidopp @rjnagal @smarterclayton @brendandburns @thockin

The resource model doc has been created. We should align our API with it. v1beta3 leaves resource requests unchanged, though the ResourceList type was added in order to represent node capacity. We could either add the new fields in a backwards-compatible way, or replace the existing Container Memory and CPU fields in v1beta3 -- if we prefer to do the latter, we should add this issue to #1519.

I propose that we add an optional ResourceSpec struct containing optional Request and Limit ResourceList fields to both PodSpec and Container.
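
Roughly, the proposed shape might look like the following Go sketch (names approximate the proposal above, not the final v1beta3 types):

package main

import "fmt"

type ResourceName string

const (
	ResourceCPU    ResourceName = "cpu"    // milli-cores
	ResourceMemory ResourceName = "memory" // bytes
)

// ResourceList maps resource names to integer quantities, as already used to
// represent node capacity.
type ResourceList map[ResourceName]int64

// ResourceSpec carries the optional Request and Limit lists proposed above.
type ResourceSpec struct {
	Request ResourceList
	Limit   ResourceList
}

type Container struct {
	Name      string
	Resources ResourceSpec
}

type PodSpec struct {
	Resources  ResourceSpec // pod-level (deferred in the discussion below)
	Containers []Container
}

func main() {
	p := PodSpec{
		Containers: []Container{{
			Name: "app",
			Resources: ResourceSpec{
				Request: ResourceList{ResourceCPU: 250, ResourceMemory: 256 << 20},
				Limit:   ResourceList{ResourceCPU: 500, ResourceMemory: 512 << 20},
			},
		}},
	}
	fmt.Printf("%+v\n", p.Containers[0].Resources)
}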

@smarterclayton smarterclayton referenced this issue Oct 2, 2014

Closed

Implement v1beta3 api #1519

16 of 20 tasks complete
Owner

bgrant0607 commented Oct 2, 2014

Clarification: The separation of desired-state fields into a ResourceSpec struct was deliberate, conforming to the careful separation of desired and current state in v1beta3. Usage-related fields would go into a ResourceStatus struct, as would effective settings, such as soft or hard container limits. @johnwilkes agreed this made sense. At some point, we should clarify this in resources.md.
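
For illustration, a minimal sketch of that spec/status split; the usage and effective-setting field names here are hypothetical:

package main

import "fmt"

type ResourceList map[string]int64

// ResourceSpec holds desired state: what the user asked for.
type ResourceSpec struct {
	Request ResourceList
	Limit   ResourceList
}

// ResourceStatus holds current state: observed usage and the effective
// settings actually applied, such as soft or hard container limits.
type ResourceStatus struct {
	Usage          ResourceList
	EffectiveLimit ResourceList
	HardCapped     bool
}

func main() {
	spec := ResourceSpec{Request: ResourceList{"cpu": 250}, Limit: ResourceList{"cpu": 500}}
	status := ResourceStatus{Usage: ResourceList{"cpu": 180}, EffectiveLimit: spec.Limit, HardCapped: false}
	fmt.Printf("spec=%+v status=%+v\n", spec, status)
}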

Owner

thockin commented Oct 2, 2014

I don't think we want pod-level resources yet, or if we do then we accept EITHER pod resources OR container resources, but never both on a single pod. Not yet.


Owner

bgrant0607 commented Oct 2, 2014

Fair enough. We can't support pod limits until libcontainer and Docker do, so I'd be fine with omitting that for now.

Owner

bgrant0607 commented Dec 3, 2014

P1 to implement the model described by resources.md in v1beta3.

Member

timothysc commented Dec 3, 2014

I'm spinning back in again, and I read through the ResourceSpec but I did have some questions and comments.

  1. There does not appear to be a feedback mechanism for when a job was run, e.g. request and limit are specified, but usage is not there. This is a concern, b/c users are notoriously bad at this, ref: http://www.industry-academia.org/download/2014-asplos-quasar-Stanford-paper.pdf e.g. request, limit, used.
  2. Quantization and fragmentation. Without quantization users could request oddball values, which could lead to packing issues, which leads to fragmentation, which leads to the dark side ;-)
    e.g. request cpu: 3.14125... memory: "314125Mi"
    Quantization would force boundaries, which could be defined: memory rounds to 256MB intervals, cpu rounds to whole units, etc.
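
To make the quantization idea concrete, a small sketch that rounds requests up to the boundaries mentioned above (the 256MB/whole-core steps come from the comment, not a proposal):

package main

import "fmt"

const mib = int64(1) << 20

// roundUp returns v rounded up to the next multiple of step.
func roundUp(v, step int64) int64 {
	if v%step == 0 {
		return v
	}
	return (v/step + 1) * step
}

func main() {
	memRequest := 314125 * mib // an "oddball" memory request, in bytes
	cpuMilli := int64(3142)    // ~3.142 cores, in milli-cores
	fmt.Println(roundUp(memRequest, 256*mib) / mib) // 314368: snapped to a 256MiB boundary
	fmt.Println(roundUp(cpuMilli, 1000))            // 4000: snapped to whole cores
}
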
Owner

erictune commented Dec 5, 2014

We have had considerable success with unquantized values. The problem does come up with odd aspect ratios, but this can be addressed in a number of ways without quantization.

Owner

bgrant0607 commented Dec 5, 2014

@timothysc Welcome back!

The parts of Kubernetes that (should) deal with resource specification, cpu normalization, scheduling, resource monitoring, feedback, prediction, auto-scaling, isolation, aggressor and hot-spot mitigation, rescheduling/migration, oversubscription, differentiated quality of service, etc. are very rudimentary to non-existent right now. A lot more work needs to be done, to say the least.

Usage was pushed to the bottom of the doc because it was a more complex problem and requires a separate data path, not because it's unnecessary or unimportant. Other demand signals are important, too (e.g., cpu load, latency to acquire cpu, paging, OOMs).

Yes, many/most users are bad at estimating resources, and many/most don't even try. How could we deal with that? There's a wide spectrum of choices.

We generally break down the problem of whether a pod will fit on a node into 2 problems:

  1. How to determine the available capacity of a node
  2. How to determine how much capacity a pod will consume

We had an intern work on worst-fit scheduling based on usage from cAdvisor (#274, #471). That work should be completed.

Christina and Christos have presented about Quasar here (more papers at http://web.stanford.edu/~cdel/Publications.html). Quasar doesn't use any "reservation-based" approach. I think the main challenge is to figure out how to do it all online, without explicit training runs, and without requiring users to specify performance targets, because they're also bad at that, and it may require application instrumentation. If you want to try it, go for it. I'd be interested in seeing how well it worked.

That said, one can go a long way with the "reservation-based" approach, and I recommend starting with that. It's straightforward for users to understand, can start simple, and can be made better and better incrementally/iteratively. For instance, specified resource requests could be used as hints for how much capacity pods will consume, in the absence of history/predictions.
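
A minimal sketch of that reservation-based fit check, treating specified requests as the hint for how much capacity a pod will consume (types and numbers are illustrative):

package main

import "fmt"

type Resources struct {
	MilliCPU int64
	Memory   int64
}

// available subtracts the requests of already-placed pods from node capacity.
func available(capacity Resources, placed []Resources) Resources {
	avail := capacity
	for _, r := range placed {
		avail.MilliCPU -= r.MilliCPU
		avail.Memory -= r.Memory
	}
	return avail
}

// fits answers the second question: will the new pod's (hinted) consumption fit?
func fits(request, avail Resources) bool {
	return request.MilliCPU <= avail.MilliCPU && request.Memory <= avail.Memory
}

func main() {
	node := Resources{MilliCPU: 4000, Memory: 8 << 30}
	placed := []Resources{{MilliCPU: 1500, Memory: 2 << 30}}
	newPod := Resources{MilliCPU: 2000, Memory: 4 << 30}
	fmt.Println(fits(newPod, available(node, placed))) // true: 2500m CPU and 6GiB remain
}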

We're not at the point where we're trying to push the envelope on utilization. In fact, our goal is almost the opposite right now: make apps run reasonably well.

It's also the case that many other factors will affect the scheduling decision: failure-domain spreading, availability of non-fungible resources, administrative/architectural/software constraints, sole tenancy, ... Many of these factors affect whether a particular workload or scenario will work acceptably or not, and may be more urgent to address than modest utilization improvements.

/cc @davidopp @rjnagal

Member

rjnagal commented Dec 5, 2014

Our goal with scheduling right now is to make sure that the scheduler doesn't make unreasonable decisions that prevent apps from starting or running. Fragmentation and utilization are not on the radar right now.

The approach we are leaning towards is to treat jobs with specified limits differently from those with no limits. If a job has limits, we would stick to those limits and not try to guess or override them - these limits won't be hints. For jobs with no limits, we would start by scheduling them with 'large-enough' default limits and react to observed usage/failures.

Right now, jobs with no limits can take over the whole node and kill it. Our first few fixes are aimed at making the node more resilient to an unlimited container's resource usage.
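
A sketch of that policy, with made-up default values: specified limits are honored as-is, and pods with no limits get "large-enough" defaults so a single container cannot take over the node:

package main

import "fmt"

type Limits struct {
	MilliCPU int64
	Memory   int64
}

// Hypothetical "large-enough" defaults for pods that specify no limits.
var defaultLimits = Limits{MilliCPU: 2000, Memory: 4 << 30}

// effectiveLimits returns the limits the node would enforce for a pod.
func effectiveLimits(specified *Limits) Limits {
	if specified != nil {
		return *specified // user-specified limits are not guessed at or overridden
	}
	return defaultLimits
}

func main() {
	explicit := Limits{MilliCPU: 500, Memory: 1 << 30}
	fmt.Println(effectiveLimits(&explicit)) // {500 1073741824}
	fmt.Println(effectiveLimits(nil))       // {2000 4294967296}
}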

Contributor

smarterclayton commented Dec 8, 2014

I think because this is additive it does not block #1519 (we can add it afterwards).

Owner

bgrant0607 commented Jan 10, 2015

I'd really like to get the resource spec into v1beta3.
/cc @vmarmol

@vishh vishh self-assigned this Jan 28, 2015

@goltermann goltermann removed this from the v0.8 milestone Feb 6, 2015

@bgrant0607 bgrant0607 removed this from the v0.8 milestone Feb 6, 2015

@davidopp davidopp added the sig/node label Feb 8, 2015

Owner

bgrant0607 commented Feb 28, 2015

The remaining thing we're doing in the near term is taking advantage of the proposed Docker parent cgroup feature to permit bounding pod resources.

Owner

bgrant0607 commented Mar 19, 2015

--cgroup_parent is in: docker/docker#11428

@thockin thockin pushed a commit to thockin/kubernetes that referenced this issue Jun 3, 2015

@miekg miekg Suppress duplicate CNAMEs
f384738
Owner

erictune commented Jun 30, 2015

Docker appears to support a --cpu-quota flag now. Should we set that where it is supported? This would bring the behavior of the CPU limit closer to that of memory. https://docs.docker.com/reference/run/#runtime-constraints-on-resources
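
For reference, a sketch of how a CPU limit could map onto the kind of CFS quota --cpu-quota sets, assuming the common 100ms period (the exact mapping here is illustrative, not the implemented one):

package main

import "fmt"

const cfsPeriodUs = 100000 // 100ms CFS period, in microseconds

// milliCPUToQuota returns a CFS quota (microseconds of CPU per period) for a
// limit expressed in milli-cores: the limit's share of each period.
func milliCPUToQuota(milliCPU int64) int64 {
	return milliCPU * cfsPeriodUs / 1000
}

func main() {
	fmt.Println(milliCPUToQuota(500))  // 50000: half a core per period
	fmt.Println(milliCPUToQuota(2000)) // 200000: two full cores per period
}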

Owner

davidopp commented Jun 30, 2015

Would using that mean people couldn't burst over their requested (limit) CPU (at least over non-short timescales)? I'm not sure that's desirable. IMO it is better to distribute up to 100% of the CPU in proportion to what people have requested (limit) rather than limit people to what they requested.
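
A sketch of the shares-based behavior being argued for here: requests map to cgroup CPU shares, so contended CPU is split in proportion to requests while idle CPU can still be burst into (1024 shares per core is the usual cgroup convention; the mapping is illustrative):

package main

import "fmt"

const sharesPerCore = 1024

// milliCPUToShares converts a requested amount of CPU (milli-cores) into
// relative cgroup CPU shares.
func milliCPUToShares(milliCPU int64) int64 {
	return milliCPU * sharesPerCore / 1000
}

func main() {
	// Two containers requesting 1 core and 3 cores: under contention they get
	// CPU in a 1:3 ratio, but neither is hard-capped.
	fmt.Println(milliCPUToShares(1000)) // 1024
	fmt.Println(milliCPUToShares(3000)) // 3072
}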

Contributor

smarterclayton commented Jun 30, 2015

That matches our experience - and how we run today (fair, but allow bursting when it's available).


Owner

erictune commented Jun 30, 2015

@davidopp It's comforting to see that our positions remain unchanged and opposite after all these years. 😉

Owner

davidopp commented Jun 30, 2015

Heh. One thing we could consider is making it a cluster-level or even pod-level configuration option. I don't think this is a question where one answer is obviously wrong, I just think one answer is slightly more useful to people than the other. But I can definitely imagine that some people would want the behavior you like, and it doesn't seem hard to make it an option.

@davidopp davidopp added the team/master label Jul 4, 2015

Member

derekwaynecarr commented Jul 23, 2015

I am interested in also allowing cluster operators to enable --cpu-quota to provide a ceiling on the maximum amount of CPU a given container/pod can consume before it is actively throttled, so +1 to @erictune's suggestion to use it where it's supported.

I also agree that not all operators will want to use that instead of --cpu-shares, so I agree with @davidopp on that point as well.

At minimum, it should be a choice of the cluster operator. We have some scenarios where it's definitely more important not to use relative CPU shares, just to provide a more consistent end-user experience, and we have other scenarios where we want to use as much as is available.

I also agree that we have scenarios where we want to expose it as an end-user option, so a pod-configurable field is also appropriate. @davidopp when you suggest this, I assume you would still only have Limits in the resource model, but there would be a flag to say how those limits are converted on the Pod.Spec? If so, we would also want some measure of admission control to prohibit a bad flag choice if the cluster operator has configured the cluster not to support the option.

I think this is a near-term concern for us as part of general overcommit, so I am interested in picking up this discussion and getting others' opinions on what we would or would not expose as a flag to the cluster or in the end-user facing API.

Member

derekwaynecarr commented Jul 23, 2015

Is there a thought about a Burst field in addition to Limits that would allow you to exceed your Limit if there is excess capacity, but not up to 100% of the available node CPU?

Owner

bgrant0607 commented Jul 23, 2015

I don't want to be excessively explicit in this area, because we can gradually improve enforcement over time, by monitoring and throttling excessive usage, monitoring CPI to detect and kill (move) aggressors and/or scale usage, allowing some amount of bursting, etc.

Owner

davidopp commented Jul 23, 2015

I think it's sufficient to just have a flag that says whether the specified limit is a hard cap or a soft cap.

How multiple pods (or containers or whatever) are arbitrated, e.g. whether one process is allowed to starve another and if so under what circumstances, seems like it falls under the QoS proposal. But I think being able to tell the system to cap you at your limit (or not) can be considered a separate feature, since it makes sense even when you're the only process on a machine.

Owner

bgrant0607 commented Jul 23, 2015

I do not want to add a knob for hard/soft cap. The hard-cap option would be seldom used and is not intent-oriented, thereby impacting our ability to deliver any performance SLO. Options:
a) Use cpu limit to mean hard cap. Most pods/containers would likely specify only request.
b) Perform no hard capping by default. Gradually improve enforcement and isolation over time. We'd do this even in the case of (a).

Owner

erictune commented Jul 23, 2015

Regarding (a):
Under the limit/request proposal for QoS in #147, limit > request is a different QoS than limit == request.
How would request set and limit unset be interpreted in that case? Would we end up with different QoS for cpu vs memory?

Contributor

smarterclayton commented Jul 26, 2015

If we use limit to mean hard cap, we can still vertically autosize / default when a user specifies only a request. As Derek notes, we want to offer predictable behavior for end users and admins when pods run alongside multiple heterogeneous workloads (which shares can't guarantee); this negatively impacts utilization in some cases but offers a predictable experience for admins.

Do we have the tools to distinguish qos properly when an autosizer is present? Should the autosizer know enough about qos to reflect the appropriate use?

Owner

davidopp commented Jul 26, 2015

@erictune My understanding is that request set (say to value "R") and limit unset would be interpreted the way Borg handles memory limit = R with allow_overlimit_mem, and CPU limit = R.

Member

timothysc commented Jul 28, 2015

a) Use cpu limit to mean hard cap. Most pods/containers would likely specify only request.

Agreed, but only with hard limits as an option.

b) Perform no hard capping by default. Gradually improve enforcement and isolation over time. We'd do this even in the case of (a).

Disagree; not hard-capping will affect SLOs.

I believe what's missing in this conversation is the notion of history in proper sizing. Right now there is no job-history service to yield a best guess of actual usage.

Gentlemen,

I'm running Monte Carlo on a Google Cloud cluster now, and to emulate batch scheduling I had to set the CPU limit to some strange values.

@jlowdermilk jlowdermilk pushed a commit to jlowdermilk/kubernetes that referenced this issue Oct 27, 2015

@eparis eparis Merge pull request #168 from eparis/more-info
Display github e2e queue in submit queue web page
ae57f9d
Owner

vishh commented Dec 28, 2015

FYI: Docker now supports updates to cgroups #15078

@vishh vishh pushed a commit to vishh/kubernetes that referenced this issue Apr 6, 2016

@vmarmol vmarmol Merge pull request #168 from rjnagal/master
Factor out data comparator for storage tests.
494c63c
Member

andyxning commented Aug 21, 2016

Allowing memory limit over-committing may cause unpredictable process killing by triggering the kernel OOM killer.

I ran a program which allocates 50GB of memory in a pod whose memory limit is 118GB, on a node with 64GB. After the program had been running for several seconds, it was OOM-killed, and I could see the OOM killer log in /var/log/syslog.
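
A trivial sketch of the situation described: if a limit exceeds the node's physical memory (both taken here as plain byte counts), the limit cannot actually be honored and the kernel OOM killer steps in first.

package main

import "fmt"

const gib = int64(1) << 30

// limitOvercommitted reports whether a pod's memory limit exceeds what the
// node can physically provide.
func limitOvercommitted(podMemLimit, nodeMem int64) bool {
	return podMemLimit > nodeMem
}

func main() {
	fmt.Println(limitOvercommitted(118*gib, 64*gib)) // true: allocations past ~64GB get OOM-killed
}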

montanaflynn commented Sep 10, 2016

Hard limits for CPU are very important for the video transcoding pods we run on Google Container Engine. We need nodes with lots of cores for speed, but we also don't want a single pod greedily using up all the cores. It would be ideal to set their limit at 3/4 of the total node's CPU.

We can currently do this for scheduling with requests, so we don't put two transcoders on a single node, but the lack of hard limits means that when a pod is running it uses all the cores even with limits set. This has led us to having two clusters, one especially for transcoding large media and the other for small media and the rest of our services.

Owner

thockin commented Sep 11, 2016

I thought we used shares for "request" and quota for "limit", thereby providing true hard limits. Did I mis-comprehend?


Contributor

smarterclayton commented Sep 12, 2016

montanaflynn commented Sep 12, 2016 edited

It seems hard limits came with v1.2, based on the changelog. I remember when I first started with Kubernetes there was a warning saying that CPU limits were not enforced. Maybe it was my host OS that didn't support it. Looking at the compute resources documentation, it looks like Kubernetes does support hard limits by default now.

CPU hardcapping will be enabled by default for containers with CPU limit set, if supported by the kernel. You should either adjust your CPU limit, or set CPU request only, if you want to avoid hardcapping. If the kernel does not support CPU Quota, NodeStatus will contain a warning indicating that CPU Limits cannot be enforced.

Owner

thockin commented Sep 12, 2016

Note that CPU hard limits can be surprising. All they guarantee is that you can use X core-seconds per wall-second. Consider a 16-core machine and a pod that has an 8-core limit. If your app is multi-threaded or multi-process, and the number of executable threads/processes is larger than 8, you could use up all 8 cores of your limit in less than 1 wall second. If you used all 16 cores for 0.5 seconds, you would leave your pod ineligible to run for 0.5 seconds (that's a long time!), giving you terrible tail latency.

Now, in reality the time slice is smaller, but it is still in the tens or hundreds of milliseconds. If you're not careful, you really could find yourself with unexpected latency blips of 50 or 100 milliseconds or more.
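
The arithmetic in that example, written out (simplified to a 1-second period; real CFS periods are on the order of 100ms):

package main

import "fmt"

func main() {
	const (
		periodSec     = 1.0  // accounting period (simplified)
		quotaCoreSec  = 8.0  // core-seconds allowed per period (the 8-core limit)
		runnableCores = 16.0 // threads/processes able to run in parallel
	)
	busy := quotaCoreSec / runnableCores // wall time until the quota is exhausted
	throttled := periodSec - busy        // wall time spent throttled in each period
	fmt.Printf("busy %.1fs, throttled %.1fs per period\n", busy, throttled) // busy 0.5s, throttled 0.5s per period
}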


Member

timothysc commented Sep 12, 2016

If it's a hard constraint to not tolerate the blips, then you're likely looking for cpu affinity or cpu-sets. xref: #10570

I'm on Google Cloud's Container Engine and found that the warning I referenced above is still shown while running master and nodes at Kubernetes version 1.3.5.

The warning displayed by kubectl describe nodes is: WARNING: CPU hardcapping unsupported.

$ kubectl describe nodes
Name:           gke-cluster-1-default-pool-777adf16-an5j
Labels:         beta.kubernetes.io/arch=amd64
            beta.kubernetes.io/instance-type=n1-highcpu-8
            beta.kubernetes.io/os=linux
            cloud.google.com/gke-nodepool=default-pool
            failure-domain.beta.kubernetes.io/region=us-central1
            failure-domain.beta.kubernetes.io/zone=us-central1-b
            kubernetes.io/hostname=gke-cluster-1-default-pool-777adf16-an5j
Taints:         <none>
CreationTimestamp:  Wed, 12 Sep 2016 08:14:45 -0700
Phase:
Conditions:
  Type          Status  LastHeartbeatTime           LastTransitionTime          Reason              Message
  ----          ------  -----------------           ------------------          ------              -------
  NetworkUnavailable    False   Mon, 01 Jan 0001 00:00:00 +0000     Wed, 07 Sep 2016 18:15:58 -0700     RouteCreated            RouteController created a route
  OutOfDisk         False   Mon, 12 Sep 2016 14:15:13 -0700     Wed, 07 Sep 2016 18:14:45 -0700     KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure    False   Mon, 12 Sep 2016 14:15:13 -0700     Wed, 07 Sep 2016 18:14:45 -0700     KubeletHasSufficientMemory  kubelet has sufficient memory available
  Ready         True    Mon, 12 Sep 2016 14:15:13 -0700     Wed, 07 Sep 2016 18:15:21 -0700     KubeletReady            kubelet is posting ready status. WARNING: CPU hardcapping unsupported
Owner

vishh commented Sep 12, 2016

@montanaflynn On Google Container Engine, can you switch to GCI as the image type? You can upgrade your node-pool to GCI by setting --image-type=gci, or pass that flag while creating a new cluster.
GCI is the newer replacement for the existing Debian 7 based image on GKE. CPU limits are supported there.

@vishh where / how could I set --image-type=gci for an existing cluster?

Owner

vishh commented Sep 12, 2016

@montanaflynn Assuming you have only the default node-pool, run gcloud container clusters upgrade <your_cluster_name> --image-type gci --node-pool default-pool.
This change is disruptive since it restarts existing nodes.
Another option is to create a new node pool for your cluster that uses GCI and then slowly turn down the default node pool: gcloud container node-pools create --cluster <your_cluster_name> --image-type gci

Thanks! Will Container Engine be using that image by default in the future?

Owner

vishh commented Sep 12, 2016

Yes. That might happen as early as v1.4 on GKE.

Member

timothysc commented Dec 7, 2016

I think we should move to close this issue. The root topic has been addressed, but there are multiple side-threads on this issue that I believe would be better served by other issues.

@vishh thoughts?

Member

derekwaynecarr commented Dec 7, 2016

@timothysc timothysc closed this Dec 8, 2016

@metadave metadave pushed a commit to metadave/kubernetes that referenced this issue Feb 22, 2017

@prydonius prydonius Merge pull request #168 from mgoodness/spartakus
Spartakus - static container name
5c096f5