
Memory pressure shouldn't hard evict static pods #38322

Closed
bprashanth opened this issue Dec 7, 2016 · 47 comments · Fixed by #39114
@bprashanth (Contributor)

If a node runs out of memory, we kill something. Then, if there is memory pressure, we don't admit best-effort pods (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/eviction/eviction_manager.go#L110). Maybe we can admit static pods, even if they're best effort?

@kubernetes/sig-node if we can't differentiate static from non-static, maybe we can use the scheduler.alpha.kubernetes.io/critical-pod annotation?

bprashanth added the sig/node label on Dec 7, 2016
@MrHohn (Member) commented Dec 15, 2016

This issue is now actually causing a customer outage. We should bump it up to P0.

@thockin @bowei

@euank (Contributor) commented Dec 15, 2016

Is there any reason any static pods are best effort?

Shouldn't, ideally, all static pods be guaranteed? Rather than including a "scheduler.alpha.kubernetes.io/critical-pod" annotation, I'd rather modify the pod resource requests/limits to make them guaranteed.

@euank (Contributor) commented Dec 15, 2016

Two possible solutions (among others, I'm sure) and why I'd prefer guaranteed:

Make static pods 'special'

Having static pods skip all admittance checks is implemented in #38795.

I don't like this because of the special-casing, security implications, and because I think there's probably a more correct solution.

I think this solution opens up the chance for static pods to thrash if the node's in a messy state.

Make static pods guaranteed

I think that making all static pods guaranteed might be a better solution. It results in a lower total node-allocatable for users, but in reality users shouldn't have been allocating those resources anyway, because the static pods needed them.

The hardest part of this solution is picking the request/limit numbers for each static pod correctly, but honestly we need to do that anyway.
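
For context, "guaranteed" here means every container sets CPU and memory limits and any requests it sets equal those limits. A minimal sketch of that check, assuming core v1 types (helper and package names are illustrative, not the actual qos package code):

```go
package kubeletutil

import v1 "k8s.io/api/core/v1"

// isGuaranteed sketches the Guaranteed QoS condition: every container has
// CPU and memory limits, and any requests it sets equal those limits.
func isGuaranteed(pod *v1.Pod) bool {
	for _, c := range pod.Spec.Containers {
		for _, name := range []v1.ResourceName{v1.ResourceCPU, v1.ResourceMemory} {
			limit, hasLimit := c.Resources.Limits[name]
			if !hasLimit {
				return false // missing a limit: at best burstable
			}
			if request, hasRequest := c.Resources.Requests[name]; hasRequest && request.Cmp(limit) != 0 {
				return false // request != limit: burstable, not guaranteed
			}
		}
	}
	return true
}
```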

@bprashanth (Contributor, Author)

I think the issue to discuss big picture solutions is #22212

The suggested one-liner fix can go right after https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/eviction/eviction_manager.go#L110:

if notBestEffort || criticalPod {
 return lifecycle.PodAdmitResult{Admit: true}
}

where criticalPod is set based on

annotations:
  node.alpha.kubernetes.io/critical-pod: ''
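
A minimal sketch of how criticalPod could be derived from that annotation (constant, helper, and package names are illustrative, not the actual kubelet helpers):

```go
package kubeletutil

import v1 "k8s.io/api/core/v1"

// Key suggested in the comment above; the key that was ultimately reused
// across the system may differ (see the discussion below).
const criticalPodAnnotation = "node.alpha.kubernetes.io/critical-pod"

// isCriticalPod reports whether the pod carries the critical-pod annotation.
func isCriticalPod(pod *v1.Pod) bool {
	_, ok := pod.Annotations[criticalPodAnnotation]
	return ok
}
```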

@euank (Contributor) commented Dec 15, 2016

@bprashanth I'm not really happy with such an annotation though. The basically one-line change in #38795 is now "don't apply memory-admit failures to static pods", but I only see that as a temporary workaround until the proper solutions to this land.

euank added a commit to euank/kubernetes that referenced this issue Dec 15, 2016
Fixes kubernetes#38322

This assumes that all static pods are important to the point that
eviction denials do not apply to them.
@bprashanth (Contributor, Author)

Yeah, I'm fine doing that too. It's basically the recommended fix, but looking at the source instead of an annotation. I think we'll need the annotation though.

To recap various other bugs, there are actually multiple issues here:

  1. Admit all static pods, unless there's either a node condition or some other "hard" conflict (eg: hostport)
  2. Set a lower oom score on (some/all) static pods
  3. If a static pod fails admission because of node condition, keep retrying

Your PR fixes 1.

We still need something like an annotation to fix 2 in the short term (obviously fixable with preemption priority). We could use the existing podSource again, but that means something like fluentd running as burstable would also get a low oom score, which may not be the best tradeoff, because we know that thing is a pig.

3 involves treating the file source differently from the apiserver source in many more places. Right now, failing an admission check marks a pod Failed, and the kubelet assumes that some other controller will recreate the pod and the scheduler will re-assign it to a healthy node. The problem, of course, is that the kubelet is the controller for static pods, so it really needs to continuously reconcile static pods that have failed admission checks and process them as new creates instead of terminated pods.
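
A rough sketch of what that reconcile behavior for point 3 could look like, purely for illustration; the real kubelet sync loop is structured differently and all names here are hypothetical:

```go
package kubeletutil

import v1 "k8s.io/api/core/v1"

// retryFailedStaticPods illustrates point 3: because the kubelet itself is
// the "controller" for file-source pods, static pods that failed an earlier
// admission check are re-submitted as new creates once they would now pass,
// instead of being left in the Failed phase.
func retryFailedStaticPods(staticPods []*v1.Pod, admit func(*v1.Pod) bool, create func(*v1.Pod)) {
	for _, pod := range staticPods {
		if pod.Status.Phase == v1.PodFailed && admit(pod) {
			create(pod) // hypothetical: process as a fresh create
		}
	}
}
```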

@dchen1107 (Member)

@bprashanth I like your idea of using the scheduler.alpha.kubernetes.io/critical-pod annotation here, because:

  1. It is more aligned with our long-term fix (Preemption priority / scheme #22212), which is to have a generic scheme describing how critical and important a pod is.

  2. Once the annotation is specified for a pod, the local controller modules (admission at the Kubelet, the eviction manager, etc.) can all follow the same set of rules when making decisions. The Kubelet can also give critical pods a lower oom score so they are less likely to be killed.

  3. More importantly, later when we enhance the scheduler, rescheduler, daemonset_controller, etc., they can all follow the same rules, avoiding the ping-pong issue caused by conflicting protocols (the Kubelet keeps rejecting some pods, while the remote controller / scheduler keeps putting them back).

  4. This proposal also decouples static pods from the importance / criticalness of a pod. Not all static pods should be treated the same. For example, fluentd is a static pod today, but it has a known memory leak issue. Are we planning to treat the fluentd pod the same as kube-proxy? Are we saying that when a node has a resource starvation issue, losing some log messages is as bad as losing node availability?

@euank I don't like treating static pods as special, because only the kubelet knows those pods are file-based. There is no API field indicating it, so we would very likely run into the issues described above in 3) and 4).

@davidopp (Member)

+1, using the critical pod annotation is a good idea and (I think) takes pressure off of us to design a full solution for #22212 right away.

@davidopp (Member)

Just to be clear, though -- the "critical pod" annotation was never intended for this purpose. It is intended to indicate which pods rescheduler should work to get scheduled. Rescheduler only operates on pods that go through the normal default scheduler -- but static pods (and DaemonSet pods) never go through the scheduler. So, it may be confusing to use it here, and instead maybe we should use a different annotation. Reusing the annotation here won't break anything but it seems likely to cause confusion. Can you use a new annotation?

@bprashanth (Contributor, Author)

I had suggested a new annotation above (note the "node" prefix). Dawn's argument for reusing the existing one is that it future-proofs the pod template (e.g. if it's moved to a deployment or daemonset, it'll still work). I'm currently going to keep it the same.

@davidopp (Member)

I can understand the future-proofing argument, but realistically #22212 is almost certainly going to change (hopefully eliminate) that annotation anyway, so the future is really only until we do #22212.

@euank (Contributor) commented Dec 15, 2016

Thanks for explaining, @dchen1107; having a temporary annotation as a stop-gap for a proper solution makes more sense to me now.

@fgrzadkowski (Contributor)

The issue can also happen to other pods, like fluentd or other logging agents. We should apply a similar "fix" as in #38836 to fluentd.

@crassirostris Can you please patch the fluentd configuration so it is treated as critical? I think it should be patched into 1.5 together with #38836.

@crassirostris

@fgrzadkowski Sure

k8s-github-robot pushed a commit that referenced this issue Dec 22, 2016
Automatic merge from submit-queue (batch tested with PRs 39114, 36004)

assign -998 as the oom_score_adj for critical pods (e.g. kube-proxy)

I also validated this with a test cluster: freshly built cluster, kill the kube-proxy pod, etc.

```
root      2660  2643  0 Dec21 ?        00:00:00 /bin/sh -c kube-proxy --master=https://104.198.79.64 --kubeconfig=/var/lib/kube-proxy/kubeconfig  --cluster-cidr=10.180.0.0/14 --resource-container="" --v=4   1>>/var/log/kube-proxy.log 2>&1
root      2667  2660  0 Dec21 ?        00:03:14 kube-proxy --master=https://104.198.79.64 --kubeconfig=/var/lib/kube-proxy/kubeconfig --cluster-cidr=10.180.0.0/14 --resource-container= --v=4
# cat /proc/2660/oom_score_adj 
-998
# cat /proc/2667/oom_score_adj 
-998
```

In this PR, I also include a small fix for an import cycle issue. The right fix would remove the dependency on the qos package from pkg/apis/componentconfig/v1alpha1. But since we plan to cherry-pick this PR to both 1.5 and (possibly) 1.4, I want to touch the source as little as possible.

Partial fix: #38322
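
Roughly, the change gives critical pods the same oom_score_adj as guaranteed pods, so the kernel OOM killer targets them last. A hedged sketch of that selection (the constant values match the output above; the function shape is illustrative, not the exact policy package code):

```go
package kubeletutil

const (
	guaranteedOOMScoreAdj = -998 // shared by guaranteed and critical pods
	bestEffortOOMScoreAdj = 1000 // first to be OOM killed by the kernel
)

// oomScoreAdjFor sketches the selection: critical pods get the guaranteed
// score, everything else keeps its QoS-derived score (burstable scaling by
// memory request is omitted here for brevity).
func oomScoreAdjFor(critical, guaranteed, bestEffort bool) int {
	switch {
	case critical, guaranteed:
		return guaranteedOOMScoreAdj
	case bestEffort:
		return bestEffortOOMScoreAdj
	default:
		return 2 // lowest burstable adjustment in the real policy
	}
}
```
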
@vishh (Contributor) commented Jan 12, 2017

@davidopp 5 in #38322 (comment) is necessary for admitting static pods, for example. Let's discuss the eviction semantics in a different issue, though, as you suggested.

@thockin (Member) commented Jan 12, 2017 via email

@dchen1107 (Member) commented Jan 17, 2017

I didn't like the short-term solution initially, and tried to prioritize the proper fix through #22212 for more than a year. But we just have too much to carry as a team, and some things failed to be properly prioritized. In the end, I accepted the short-term fixes for two reasons:

  • The users are suffering. kube-proxy and fluentd pods were evicted without restart. kube-dns cannot be admitted even though it is very critical for overall health. @thockin
  • The long-term proper fix (Preemption priority / scheme #22212) is prioritized as P0 for the 1.6 release, so we can retire these short-term hacks very soon. @davidopp

Meanwhile, I did some risk analysis on the three PRs being cherry-picked into the release branch a couple of weeks ago, through a private email thread. I've copied and pasted it here:

From dawnchen@ to the customers:

The cherrypick pr(s) include the following fixes:

  • pr/38836: Kubelet admits critical pods, including kube-proxy, even under memory pressure:
    With pr/39059 below included, kube-proxy should already be covered since, as a static pod, it won't be evicted at all. But this PR helps other critical kube-system pods (for example kube-dns) get admitted. The risk of this fix is that when the node is already overcommitted on memory, it won't reject any critical pod, which could make the overcommit situation worse and affect the running processes (daemons and other workloads, etc.). But the risk should be bounded, since the PR only admits critical pods under memory pressure, not under the out-of-disk condition. If the starvation gets worse, the kubelet eviction manager will be invoked to recover from the situation.

One suggestion from me: please do not use the critical-pod annotation for your own pods; leave it only for kube-system pods for now.

  • pr/39114: Make critical pods have the same oom score as guaranteed pods.
    This shouldn't carry any risk; if anything, we should make it even better in the long term.

  • pr/39059: Stop evicting static pods.
    This one theoretically implies a bigger risk if the static pods are the ones causing the resource starvation, especially disk starvation. But the risk is bounded too, since a GKE node by default has only two static pods: kube-proxy and fluentd. Neither uses much disk space, and evicting them doesn't help with an out-of-disk situation anyway. Fluentd itself has a known memory leak issue, but since the 1.4 release we have made it a guaranteed pod with a resource limit; once memory usage hits the limit, the kernel OOM-kills the container itself. Based on this risk analysis, I rejected pr/38914 as the patch, even though it is more aligned with our long-term solution, due to its complexity.

The suggestion I have for this: please do not introduce new static pods on your nodes.

Obviously all of the above are short-term workarounds, and the long-term solution should be addressed through issue/22212. issues/39124 covers two issues: 1) one small part of issue/22212, and 2) a disk management issue. We have prioritized both as P0 for Q1. I hope this answers your questions and concerns; please let me know if you have more.

@vishh (Contributor) commented Jan 18, 2017

@thockin @davidopp @dchen1107 @erictune @janetkuo and I discussed this issue further today. The following is a summary:

The primary problems we addressed are as follows:

  1. Static system pods (kube-proxy & fluentd on certain deployments) get evicted and are not scheduled back on nodes.
  2. Static system pods are not "guaranteed" to run on a node and can fail feasibility checks.
  3. The existing alpha API for specifying pod priority using pod-level annotations causes security issues.

In the short term,

  1. Continue using "pod level annotations" for priority.
  2. Feature gate the "Critical Pod Annotation" across the system, tie it to a configurable namespace (or just to kube-system), and disable it by default. GKE will enable this feature.
  3. Static pods can be marked as "critical"
  4. Kubelet will not evict "critical pods" and instead restart them.
  5. Kubelet will guarantee that static pod manifests are processed before API pods are processed. This is to ensure that static pods can fit on a node.
  6. A static pod or a daemonset that gets updated might not be accepted by the node due to resource constraints, even if they are critical.

We decided to have only two levels of priority (Critical & QoS) instead of three (Static, Critical and QoS) for evictions. This is to align with the long term plan.

This short term solution will solve problems 1 & 3 mentioned above.
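
A rough sketch of the gating behavior in point 2 of the short-term plan above; the annotation key, namespace, and function shape are assumptions for illustration, not the final implementation:

```go
package kubeletutil

import v1 "k8s.io/api/core/v1"

// honorCriticalAnnotation sketches the proposed gating: the annotation only
// grants special node-local treatment when the feature gate is enabled and
// the pod lives in the privileged namespace (e.g. kube-system).
func honorCriticalAnnotation(pod *v1.Pod, gateEnabled bool, privilegedNamespace string) bool {
	if !gateEnabled || pod.Namespace != privilegedNamespace {
		return false
	}
	_, ok := pod.Annotations["scheduler.alpha.kubernetes.io/critical-pod"]
	return ok
}
```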

Now for the long term,

  1. A preemption & priority scheme will be designed that will allow us to deprecate the "Critical" pod-level annotation. @davidopp is working on this. This solves problem 3. ETA: Design in v1.6, Alpha (or beta?) in v1.7
  2. As part of this design, the Kubelet's role in preemptions will be finalized, based on which static pods can be admitted onto a fully committed node. Solves problem 2. ETA
  3. Critical static pods will use this new priority scheme to guarantee their availability. Solves problem 1. ETA: TBD (based on availability of the feature)
  4. [Closely related] Existing static pods (kube-proxy & fluentd) will switch to using DaemonSets once high-priority DaemonSet pods are "guaranteed" to run on a node. As of now, DaemonSet pods bypass the scheduler and can get "rejected" by the kubelet.

@derekwaynecarr @liggitt @smarterclayton does this satisfy your needs?

@smarterclayton (Contributor) commented Jan 19, 2017 via email

@derekwaynecarr (Member)

@vishh -- for the near-term plan, does a critical pod get a higher oom_score, or do we just let it be restarted as normal? For 1.6, I assume we are OK with critical pods being constrained on their memory usage relative to their QoS tier?

@vishh (Contributor) commented Jan 19, 2017

@derekwaynecarr I was thinking of not making "Critical" a new QoS class, but instead using it just for preemption and eviction decisions. So oom_score will not be impacted. A Critical pod can still get OOM-killed by the kernel, but will not be evicted. I think that is a good design, because even critical pods can have memory leaks.
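
A minimal sketch of the eviction-side behavior described above: critical pods are never chosen as eviction victims (they get restarted instead), while the kernel OOM killer remains the backstop. Names are illustrative:

```go
package kubeletutil

import v1 "k8s.io/api/core/v1"

// evictionCandidates sketches the proposal: critical pods are filtered out
// of the kubelet's eviction ranking entirely, so they are restarted rather
// than evicted; the kernel OOM killer remains the backstop for memory leaks.
func evictionCandidates(pods []*v1.Pod, isCritical func(*v1.Pod) bool) []*v1.Pod {
	var candidates []*v1.Pod
	for _, pod := range pods {
		if isCritical(pod) {
			continue
		}
		candidates = append(candidates, pod)
	}
	return candidates
}
```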

@dchen1107 (Member)

@vishh Thanks for summarizing the discussion.

@smarterclayton @derekwaynecarr @liggitt #38322 (comment) is aligned with the node team roadmap: https://docs.google.com/a/google.com/spreadsheets/d/1-hADEbGEUrW04QP4bVk7xf1CWuqbFU6ulMAH1jqZQrU/edit?usp=sharing (row 25). This is P0 for both the sig-scheduling and sig-node teams.

@liggitt (Member) commented Jan 19, 2017

Feature gate the "Critical Pod Annotation" across the system, tie it to a configurable namespace (or just to kube-system), and disable it by default. GKE will enable this feature.

that addresses my security concerns, thanks

@davidopp (Member) commented Jan 20, 2017

Feature gate the "Critical Pod Annotation" across the system, tie it to a configurable namespace (or just to kube-system), and disable it by default.

Can you clarify what "feature gating" means here? Does it mean critical pod annotation cannot be added when the feature is disabled, or that it can always be added but will be ignored for the purposes of the eviction policy when the feature is disabled? The second is fine but the first is not because rescheduler uses it. I assume you mean the second but I just want to be sure.

@derekwaynecarr (Member)

@davidopp -- I assume the feature gate does not restrict the presence of the annotation, but is instead just a kubelet flag that says whether the annotation's presence grants special node-local decisions. The annotation is free to be used by the rescheduler, assuming the rescheduler is enabled (which also appears to default to false).

@derekwaynecarr (Member)

@davidopp -- ignore my previous comment, I see it's on by default. BTW, I see the rescheduler also restricts the annotation to being meaningful only in a single namespace (which is good). Anyway, I assume it's the second option from your original comment.

@vishh (Contributor) commented Jan 20, 2017 via email

@derekwaynecarr (Member)

@vishh alpha features should not be on by default.

@derekwaynecarr (Member)

@vishh - to elaborate: in code, the default must be off. In kube-up deployments, the default can change, and I have no opinion there.

@vishh (Contributor) commented Jan 20, 2017 via email

@davidopp (Member)

Right -- it's fine to disable the eviction behavior of the annotation by default. Just don't prevent people from setting the annotation by default, since it is used by a completely unrelated subsystem (the "rescheduler").

@vishh (Contributor) commented Jan 22, 2017 via email

@smarterclayton (Contributor) commented Jan 23, 2017 via email

@davidopp (Member)

So we are saying that it's OK if upgrading to 1.6 breaks this feature for everyone who is currently relying on it to get their DNS, Heapster, etc. scheduled today on GCE, AWS, Azure, on-prem, etc., unless they take the additional action of re-enabling it?

@davidopp (Member)

cc/ @piosz @crassirostris

@erictune (Member)

It sounds like it would be okay if the behavior is flag-controllable and off by default in head, but enabled on GKE and in kube-up.sh. Not a great long-term solution, but it could bridge the gap until another solution is ready.

@dchen1107 (Member) commented Jan 23, 2017

I am confused here.

I thought we had a design meeting and reached agreement on the following action items:

  1. vishnuk@ finishes the feature gate for the "Critical Pod Annotation" across the system, and cherry-picks it to both 1.4 and 1.5. The feature gate should be off by default, but GKE can carry the burden of re-enabling it. As @erictune suggested above, we can also enable this by default through kube-up.sh, so that no regression is introduced for 1.5 users.

  2. dashpole@ (or vishnuk@) works on resolving the rest of the short-term issues for 1.6, as documented by Vish at #38322 (comment):

    • Kubelet will not evict "critical pods" and instead restart them.
    • Kubelet will guarantee that static pod manifests are processed before API pods are processed. This is to ensure that static pods can fit on a node. We talked about some corner cases:
      • A static pod or a daemonset that gets updated might not be accepted by the node due to resource constraints, even if they are critical.
      • Updates or additions of static pods after the initial node startup are not "guaranteed" acceptance by the Kubelet; this is a known issue / corner case.
  3. davidopp@ is working on the long-term design (Preemption priority / scheme #22212), and sig-node, sig-scheduling and sig-apps will be involved with the design since it affects the Kubelet, scheduler, rescheduler, daemonset-controller, etc.

Are we changing the timeline or reverting some of the decisions here?

@thockin (Member) commented Jan 24, 2017

Dawn, the part you're missing (I think) is that we should also make the critical annotation disabled, or at least disableable, in the rescheduler by default.

@davidopp (Member)

I think there are a lot of people who run Kubernetes without GKE or kube-up.sh. But I guess we can say the rescheduler is alpha, so it's OK if upgrading disables it... (The documentation doesn't say anything about it being alpha or beta or GA.)

@vishh (Contributor) commented Jan 27, 2017

Let's move this discussion to #40573
