
Kube-dns should not be setting CPU limits #33222

Closed
vishh opened this Issue Sep 21, 2016 · 47 comments

@vishh (Member) commented Sep 21, 2016

CPU limits are enforced on most distros, with the exception of Google's Debian-based ContainerVM. The current CPU limits for kube-dns are not based on empirical data (quoting @bprashanth).
The kube-dns pod is already in the Burstable class. I propose removing CPU limits altogether until we profile kube-dns in more detail.
This issue will impact users.
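For concreteness, a minimal sketch of what the change would look like in the kube-dns container spec; the resource values are illustrative, not the actual numbers from the addon manifest:

```yaml
# Proposed shape: keep the CPU request (which the scheduler guarantees),
# drop the CPU limit, keep memory capped. Values are illustrative.
resources:
  requests:
    cpu: 100m       # still reserved for the container
    memory: 70Mi
  limits:
    memory: 170Mi   # memory stays limited; no cpu limit => no CFS throttling
```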

@vishh vishh added this to the v1.4 milestone Sep 21, 2016

@vishh vishh self-assigned this Sep 21, 2016

@bprashanth (Member) commented Sep 21, 2016

It's based on some empirical data, just not very good data, because we keep running into issues. I think removing the limits should be fine, but this isn't a fire right now, so it probably won't make it into 1.4.0 as a cherry-pick unless @kubernetes/sig-network really wants to relax it.

@vishh (Member, Author) commented Sep 21, 2016

This issue will not be detected by our existing tests. What defines a fire?

@bprashanth (Member) commented Sep 21, 2016

Hmm, something that is causing problems for existing customers with today's deployment/OS, etc.?

@bprashanth (Member) commented Sep 21, 2016

Or will, if they get on the 1.4 release.

@vishh (Member, Author) commented Sep 21, 2016

There have been GKE customer issues in the past around kube-dns being slow on GCI-based ContainerVM, which has CPU limits enabled.


@bprashanth (Member) commented Sep 21, 2016

Sure, but we haven't prioritized it until now for 1.4; I don't see a sudden reason to do so.

@bprashanth (Member) commented Sep 21, 2016

When I say 1.4, I mean 1.4.0.

@bprashanth (Member) commented Sep 21, 2016

Also, I think the slowness was more because of CPU requests being low (even on GCE), per #33027. Though as described, it really can't hurt to relax the limits, especially if we were never honoring them. Do you have a bug for that I can look over, to understand the risk/impact?

@vishh (Member, Author) commented Sep 21, 2016

I can share the internal bug offline.


@bprashanth (Member) commented Sep 21, 2016

Oh, so it's still honored on other platforms? Then it's still a risk, right?

@vishh (Member, Author) commented Sep 21, 2016

Yes. CPU limits are honored on all distros other than the Debian-based ContainerVM.


@bprashanth (Member) commented Sep 21, 2016

Alright, let's assess the real risk/benefit here. There are currently 3 DNS containers (sketched in manifest form below):

  • kubedns: guaranteed cpu, burstable ram
  • healthz: guaranteed cpu/ram
  • dnsmasq: guaranteed cpu/ram

You're proposing that we lift CPU limits on all of them. The impact is:

  1. Debian based: suffer from stringent enforcement of CPU limits. On these platforms DNS lookups can simply fail because the limits are throttling the container.
  2. Non-Debian: moves all containers out of guaranteed CPU.

I'm trying to quantify the tradeoff here.

  • Do all Debian-based platforms suffer?
  • How stringent is the enforcement (or, please describe the problem so I can understand it better)?
  • What happens to !Debian platforms that are running other CPU hogs on the same node, now that the DNS containers are no longer guaranteed?

The case I'm really trying to understand is:

  1. If a user has a bunch of CPU hogs all in Burstable, in the !Debian case, does this mean DNS will suffer?
  2. If kubedns goes wild on CPU because of a skydns bug, does that mean all !guaranteed pods on all platforms will suffer?

I understand they'll all get at least what they requested in terms of CPU, but I'm asking about the impact to apps that might be running on allocated CPU that's, say, 2× requested. All this analysis only because we're so close to the release and we've had enough surprises in the past.
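For readers following along, roughly what those three per-container shapes look like in manifest form (a sketch; the names match the addon containers, but the numbers are made up):

```yaml
containers:
- name: kubedns
  resources:
    requests: {cpu: 100m, memory: 70Mi}
    limits:   {cpu: 100m, memory: 170Mi}  # cpu request == limit; ram limit > request
- name: healthz
  resources:
    requests: {cpu: 10m, memory: 20Mi}
    limits:   {cpu: 10m, memory: 20Mi}    # request == limit for both resources
- name: dnsmasq
  resources:
    requests: {cpu: 100m, memory: 50Mi}
    limits:   {cpu: 100m, memory: 50Mi}   # request == limit for both resources
```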

@bprashanth (Member) commented Sep 22, 2016

Spoke IRL; the impact is to all non-Debian systems. They could potentially get throttled on CPU; only Debian won't, because it ignores CPU limits.

Simply setting any container's CPU or memory limits above its requests puts the pod in Burstable QoS, so kube-dns is already only Burstable and the change doesn't degrade its QoS.
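A minimal sketch of that classification rule (the pod name and image are placeholders; on clusters that surface it, status.qosClass shows the computed class):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo                               # hypothetical name
spec:
  containers:
  - name: app
    image: gcr.io/google_containers/pause:2.0  # placeholder image
    resources:
      requests:
        cpu: 100m
        memory: 70Mi
      limits:
        memory: 170Mi   # limit > request on memory, no cpu limit => Burstable
# kubectl get pod qos-demo -o jsonpath='{.status.qosClass}'  ->  Burstable
```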

@thockin (Member) commented Sep 22, 2016

I think we DO set the CPU limit, so more modern distros will throttle CPU. We should disable that.


@bprashanth (Member) commented Sep 22, 2016

The only reason we set it in the first place was for QoS, but apparently there was a misunderstanding, because the pod is already Burstable on memory. The only reason it isn't infinite is that I thought QoS degraded by the amount limits exceed requests, but apparently that is also not true.

There really is very little motivation to set CPU limits on anything important, apparently.

@thockin (Member) commented Sep 22, 2016

The primary reason to set CPU limits is determinism, e.g. benchmarks. The Borg default is not to limit CPU. Request is guaranteed.


k8s-github-robot pushed a commit that referenced this issue Sep 22, 2016

Kubernetes Submit Queue
Merge pull request #33227 from vishh/remove-dns-limits
Automatic merge from submit-queue

Remove cpu limits for dns pod to avoid CPU starvation

The current limits are not based on usage profiles
Fixes #33222
@pendoragon (Contributor) commented Sep 30, 2016

I don't know if this is the right place to ask, but I was going through the QoS docs and had this (wrong?) impression that anything important should fall into the Guaranteed bucket. But then, as @bprashanth said, if something is really important there is little motivation to set a CPU limit, since the requested CPU is always guaranteed. Now I'm wondering: what kind of applications should be classified as Guaranteed? And why is it called Guaranteed?

@vishh (Member, Author) commented Sep 30, 2016

If an app is sensitive to resource availability and performance, and doesn't want to ever be starved of any compute resources, it should be in the Guaranteed class. If an app falls out of the Guaranteed class, then it can get evicted at any time once it exceeds its requested amount of compute resources.
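For contrast with the Burstable sketch above, a minimal Guaranteed-class pod: every container must set requests == limits for both CPU and memory (names and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo                        # hypothetical name
spec:
  containers:
  - name: app
    image: gcr.io/google_containers/pause:2.0  # placeholder image
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 250m       # equal to the request for every resource
        memory: 256Mi   # on every container => Guaranteed
```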


@thockin (Member) commented Sep 30, 2016

Vish,

It's important in these discussions to distinguish the behavior with respect to compressible and incompressible resources. If you go over your memory request, your OOM-probability and eviction-probability go up in the face of contention. If you go over your CPU request and there is contention, you should just get throttled back to request. Right?


@vishh (Member, Author) commented Sep 30, 2016

@thockin Yeah. I assume you are describing the behavior of Burstable pods. As for Guaranteed pods, they are (mostly) shielded from evictions and have to worry about CPU throttling only if the limit is lower than what the containers require.

@vishh (Member, Author) commented Sep 30, 2016

As for this specific issue, we'd ideally like to keep kube-dns in the Guaranteed class. To do that, though, we either need vertical scaling of compute resources or detailed profiling of kube-dns's resource requirements across different cluster sizes and performance requirements (QPS).

@thockin (Member) commented Sep 30, 2016

Vish, I think there's a disconnect, or maybe I am misunderstanding.

Guaranteed class means that you will never use more than your CPU limit. While this SOUNDS good, it's actually pretty useless outside of benchmarks.

We absolutely want memory request == limit. I do not think we want the same for CPU.


@vishh (Member, Author) commented Oct 1, 2016

> Guaranteed class means that you will never use more than your CPU limit.

I assume you say this because we choose to hard-limit CPU for Guaranteed now.

  1. It is an internal implementation detail, and we can choose to change it in the future.
  2. Ideally, if hard capping doesn't negatively impact apps, for example by having a short quota period, CPU limits wouldn't matter to most end users.

> While this SOUNDS good, it's actually pretty useless outside of benchmarks.

CPU isolation is one facet of QoS classes; the Guaranteed class happens to include CPU quota as of now. I don't see how that would make Guaranteed a candidate just for benchmarks. Can you clarify further?

@thockin (Member) commented Oct 1, 2016

For a benchmark or load-test, you want to know how your system performs at the limits. Request is guaranteed, so "the limit" is when you can't go above your request.

In real operation, there's almost always free cycles to use. Why would you NOT use them? CPU is a compressible resource. There's very little cost to using more CPU than you requested, as long as that CPU is available. This is different from memory or disk, which have a real cost to reclaim.

This is extra painful with highly-concurrent languages and low-request jobs. If I request 1 core of a 32-core machine, I can theoretically exhaust my quota in 34 milliseconds. Even with a 250 ms quota, I am sitting idle for 216 ms. That is going to look SUPER CRAPPY on a service that really needs to be < 2 ms latency, and happens to be at the center of everything (e.g. DNS).

So remind me why we are advising users to hurt themselves? Or am I misunderstanding?
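To spell out the mechanics behind that arithmetic: a CPU limit is enforced through the Linux CFS bandwidth controller as a quota per enforcement period. A rough sketch, assuming the default 100 ms period:

```yaml
# limits.cpu: "1" becomes cpu.cfs_quota_us = 100000 against
# cpu.cfs_period_us = 100000 (the default 100 ms period) in the cgroup.
# With 32 runnable threads, 100 ms of quota can be burned in roughly
# 100/32 ≈ 3 ms of wall time; the cgroup is then throttled for the rest
# of the period, which is exactly the tail-latency cliff described above.
resources:
  limits:
    cpu: "1"   # one core of quota per period, regardless of machine size
```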


@vishh (Member, Author) commented Oct 1, 2016

> So remind me why we are advising users to hurt themselves? Or am I misunderstanding?

IIUC, you are primarily focused on CPU quota, and not on QoS classes in general.

The real question, I guess, is why CPU quota is imposed on Guaranteed pods. This was suggested by @erictune, I think, because Guaranteed pods should not consume more than what they asked for. If we want to optimize for latency instead of burst capacity, then we can set the quota period to 20 ms, for example. Guaranteed pods are expected to be fine-tuned for consistent performance across homogeneous nodes; allowing bursting of CPU can result in inconsistent performance across nodes.

That said, admins can choose to allow Guaranteed pods to burst too, with a simple flag flip (see the sketch below). I'd like to wait for user feedback on QoS before moving further ahead.
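The flag in question is the kubelet's CFS-quota switch. A sketch of disabling it node-wide; the KubeletConfiguration file form shown here postdates this thread, and at the time the same switch was the kubelet's --cpu-cfs-quota command-line flag:

```yaml
# Turns off CFS quota enforcement on the node: CPU limits are no longer
# hard-capped, while requests still drive scheduling and cpu.shares.
# (Equivalent to running the kubelet with --cpu-cfs-quota=false.)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuCFSQuota: false
```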

@thockin (Member) commented Oct 1, 2016

Well, either way DNS should NOT be using quota. It's killing our tail latency.


@bprashanth (Member) commented Oct 1, 2016

It currently still uses requests, not limits. And if we're only guaranteeing requests, it's probably still dying when the wrong pods land on the node (it might've been worse on Debian, where we didn't enforce the other addons' limits).

@vishh (Member, Author) commented Oct 1, 2016

I suspect we can run kubedns in the Guaranteed class by running pod nanny alongside it. Pod nanny can resize the kubedns pod as and when required. Given that it will be a Guaranteed pod, it will most likely never get evicted.

@thockin (Member) commented Oct 1, 2016

I really do not think eviction should be tied to compressible-resource overages, or at least that is the VERY LOWEST factor.


@bprashanth (Member) commented Oct 2, 2016

Increasing requests for kube-dns has the awful downside of evicting user pods (the last time this happened we had an epic disaster: #23556). My first reaction, coming from a "process management" world, is: I just want cpulimits+nice.

@thockin (Member) commented Oct 2, 2016

We should dynamically increase request to what we think is a realistic value - this is not a free ride. But I don't think we need quota.


@bprashanth (Member) commented Oct 2, 2016

Yeah, but what's to prevent everyone else from doing the same? At which point, someone gets kicked out. Do we have a priority-level setting that says never kick DNS out?

@thockin (Member) commented Oct 2, 2016

DNS is marked as critical, but no, nobody should be kicked out for using spare cycles. We schedule on requests, and we don't overcommit. If everyone runs full tilt and eats as much CPU as they can, everyone gets their request. 99.999% of the time this is not the case and there are spare cycles.

Ideally, tooling would automatically adjust DNS to have more CPU request when it needs more on a consistent basis. We don't have that tooling yet.


@bprashanth (Member) commented Oct 2, 2016

It is easy to write a feedback loop that scrapes Heapster data (I think the nanny Vish mentioned does some form of this). My previous question was in this context: if everyone kept bumping up requests when they observed their pod hitting its request, what happens? Do we try to reschedule DNS, leave it running with too little CPU, or kick something else out and scale DNS? I want to kick things out based on priority.

@thockin (Member) commented Oct 2, 2016

DNS is critical, so it will kill other things. In theory. I have not tested this myself :)


@pendoragon (Contributor) commented Oct 8, 2016

So in terms of incompressible resources, what is the point of setting a limit? I ask this because pod eviction is WIP, and in the case of memory or disk contention someone will get kicked out and the node will remain stable. Is it because we can prioritize pods, and eviction can be done in order (BE ones first, then Bu...)? Or could scheduling be done based on limit in the future?

@thockin (Member) commented Oct 8, 2016

Setting a limit allows you to opportunistically use more memory than you have a guaranteed right to use. Think of it this way: if you have limit == request and you get a sudden spike, you will OOM. If you have limit > request, you might OOM, but you might survive. It's not so black and white, though, because having a limit means the kernel tries to keep you below the limit, so raising the limit means you're less likely to stay below request (pressure is applied wrt the limit, I am pretty sure).

There's a LOT of work we can do here, but a lot of it is very subtle, and upstream kernels have indicated reluctance to adopt some of the more nuanced semantics.
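As a sketch of that trade-off (numbers illustrative):

```yaml
# limit == request: a spike past 256Mi is an immediate OOM kill.
resources:
  requests: {memory: 256Mi}
  limits:   {memory: 256Mi}
---
# limit > request: a spike up to 512Mi may survive, but once usage exceeds
# the 256Mi request the container is a likelier OOM/eviction victim under
# node memory pressure.
resources:
  requests: {memory: 256Mi}
  limits:   {memory: 512Mi}
```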


@ddysher (Contributor) commented Oct 9, 2016

+1. But everywhere eviction is mentioned, we see QoS. And although QoS is said to be orthogonal to request & limit, we use request & limit exclusively to determine QoS. Further, there doesn't seem to be a plan to decouple them? I remember reading some docs on k8s.io saying that we deliberately rejected using numbered priority.

> I really do not think eviction should be tied to compressible-resource overages, or at least that is the VERY LOWEST factor

One of the assumptions for CPU is that there are always free cycles; but we run CPU-intensive apps on Kubernetes (TensorFlow). Haven't tried yet, but curious about what happens if we run them alongside normal apps.

@thockin (Member) commented Oct 10, 2016

My main point is that for compressible resources there is almost NO IMPACT on the system if I temporarily exceed my request. If everyone is using their request, I might not be able to get extra, and that might cause my app pain, but that's on me to figure out.


@pendoragon (Contributor) commented Oct 12, 2016

Does this mean that if an application is critical, I should place it in the Burstable class in terms of memory? E.g., instead of setting request & limit memory as 2G/2G, I can raise the memory limit to 4G if the application is critical, so that it can survive some memory spikes. Or can I make the statement that critical applications should be classified as Burstable rather than Guaranteed, because a Burstable app can survive memory spikes while a Guaranteed application will be killed once it tries to use more memory than its request?

> Think of it this way: if you have limit == request and you get a sudden spike, you will OOM. If you have limit > request, you might OOM, but you might survive.

@vishh (Member, Author) commented Oct 12, 2016

If a pod exceeds its requests, it becomes a highly likely candidate for OOM kills or evictions. Using more than request is opportunistic and can lead to container deaths.

If a pod is critical and cannot tolerate disruptions, I'd recommend setting requests (and limits) on all its containers to the amount of memory and CPU the respective containers will require at expected peak usage. Maybe give them 20% more, just to be safe.


@thockin (Member) commented Oct 12, 2016

Vish and I are just going to disagree on CPU, but we agree on memory.


@pendoragon (Contributor) commented Oct 12, 2016

So, to put it another way: QoS class has little, if any, relevance to the importance of the application. What matters is how much guaranteed resource (request) it can safely use and how much resource it can opportunistically use (limit), because it's not the QoS class that decides whether a pod should be evicted or killed; rather, it's whether a pod exceeds its requested resources. From a user's PoV, I should only care about the guaranteed and opportunistic resources I give to an application, and not be concerned about the QoS class it is in.

@ddysher (Contributor) commented Oct 12, 2016

Does this only hold true for Burstable QoS?

> Because it's not the QoS class that decides whether a pod should be evicted or killed; rather, it's whether a pod exceeds its requested resources.

We have users running quite a few apps on Kubernetes, and they are terrible at guessing request/limit; so they usually request a large amount of resources for important apps, which results in VERY bad utilization. We are investigating a couple of solutions, like initial resources and autopilot-style tooling. This really bites us.

@vishh (Member, Author) commented Oct 12, 2016

QoS classes are tied to requests and limits today, and so QoS classes do determine which pods get evicted under resource pressure on the nodes.

In the future, we hope to deliver pressure signals to k8s-native apps that can opportunistically use more resources and dial down their usage when there is resource pressure (primarily for memory).


@pendoragon (Contributor) commented Oct 12, 2016

@vishh Sorry, maybe I didn't convey it clearly. What I meant by "it's not the QoS class that decides whether a pod should be evicted or killed" is that request is guaranteed resource; being in the Guaranteed class does not buy you an extra level of guarantee on the requested resource over Burstable. E.g., if a Guaranteed pod A with 2G/2G (request/limit) and a Burstable pod B with 2G/4G are running on the same node with some other pods, A and B will have the same level of guarantee on their requested 2G of memory. If their actual memory usage is below request, theoretically neither of them will be evicted in any circumstances (I am not counting the kube reserve and system reserve), because scheduling is based on requests, so if everyone on the node stays below request there will not be any memory contention. And if some other pods go above their requests, those that exceed their requests will be killed/evicted first. If I am correct on the scenario above, then would it be any different if we only had request and limit defined but no QoS class at all?

@chrissound commented Apr 20, 2018

What is the conclusion to this?

#33777

This question has 4,000 views on Stack Overflow (in 6 months; that is about 20 views per day): https://stackoverflow.com/questions/45573825/pod-will-not-start-due-to-no-nodes-are-available-that-match-all-of-the-followin
