
DisruptionBudget object to define the max disruption that can be caused to pods #12611

Closed
nikhiljindal opened this issue Aug 12, 2015 · 24 comments
Labels: area/api, area/node-lifecycle, kind/design, priority/important-soon

Comments

@nikhiljindal
Contributor

Forked from #12236.
@davidopp suggested introducing a DisruptionBudget object in #12236 (comment):

will it be possible to construct enforceable SLOs if each controller that can cause disruptions has a separate disruption budget spec? (An example of another such controller is rescheduler, but I'm sure there will be others, e.g. maybe one that takes machines offline for kernel upgrades.)

It seems that we may want a new object that lets you define arbitrary collections of Pods (using label selectors) and specifies a corresponding disruption budget (maxUnavailable like you have here, max rate of churn, etc.). The Spec for this object would be the budget, and the Status would be the number currently down, the current churn rate, or whatever. This object could be owned by a service that controllers ask for permission before they initiate a disruption, or the logic for checking whether it is OK to disrupt (and updating the status) could be implemented in a library used by all the controllers that cause disruptions.
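The object sketched above might look roughly like the following. This is a hypothetical shape for illustration only, not the API that was eventually merged; all type and field names (`DisruptionBudget`, `MaxEvictionsPerHour`, `DisruptionAllowed`, etc.) are invented here:

```go
package main

import "fmt"

// DisruptionBudgetSpec is a hypothetical sketch of the budget itself:
// which pods it covers and how much disruption they can tolerate.
type DisruptionBudgetSpec struct {
	Selector            map[string]string // label selector for the covered pods
	MaxUnavailable      int               // max pods that may be down at once
	MaxEvictionsPerHour int               // max rate of churn
}

// DisruptionBudgetStatus tracks observed disruption, so controllers
// can consult it before initiating a new disruption.
type DisruptionBudgetStatus struct {
	CurrentUnavailable int // pods currently down
	RecentEvictions    int // evictions in the current window
}

type DisruptionBudget struct {
	Spec   DisruptionBudgetSpec
	Status DisruptionBudgetStatus
}

// DisruptionAllowed is the check a controller (or shared library)
// would make before voluntarily disrupting a covered pod.
func (db *DisruptionBudget) DisruptionAllowed() bool {
	return db.Status.CurrentUnavailable < db.Spec.MaxUnavailable &&
		db.Status.RecentEvictions < db.Spec.MaxEvictionsPerHour
}

func main() {
	db := DisruptionBudget{
		Spec:   DisruptionBudgetSpec{MaxUnavailable: 1, MaxEvictionsPerHour: 10},
		Status: DisruptionBudgetStatus{CurrentUnavailable: 1},
	}
	fmt.Println(db.DisruptionAllowed()) // false: one pod is already down
}
```

Either ownership model from the quote (a central service, or a library linked into every disrupting controller) would funnel disruptions through a check like `DisruptionAllowed` and then update the Status.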

How will DisruptionBudget in Deployment work? What if a DisruptionBudget (or whatever) object already exists that covers (some of) these Pods? Does it override the one specified in Deployment (i.e. do you only create a DisruptionBudget if one doesn't already exist)? Is it an error to specify a disruption budget in Deployment if there is a DisruptionBudget object for (some of) the Pods covered by the Deployment object?

In Deployment, we are currently using a maxUnavailable field. We can eventually switch to using a DisruptionBudget.

cc @bgrant0607

@nikhiljindal nikhiljindal added area/api Indicates an issue on api area. team/control-plane kind/design Categorizes issue or PR as related to design. labels Aug 12, 2015
@bgrant0607 bgrant0607 added team/api priority/backlog Higher priority than priority/awaiting-more-evidence. labels Aug 12, 2015
@davidopp
Member

Copying @bgrant0607 's responses to my questions:

General issue of how to deal with conflicting overlapping sets, similar to multiple controllers. Added to my list of API conventions to document.

Generation of the DisruptionBudget should be optional, though, in case someone deliberately wants to create it separately.

FWIW, internally, self-inflicted disruptions (updates, crash loops, etc.) are not counted against the infrastructure disruption budget.

Last sentence is an excellent point. I think that policy makes a lot of sense, so maybe we wouldn't want Deployment to ever consult the disruption budget. Anyway, not something we need to decide now.

@davidopp
Member

davidopp commented Sep 2, 2015

cc/ @thockin (you mentioned something related to this in a meeting today)

@bgrant0607
Member

cc @mml

@mml
Contributor

mml commented Nov 4, 2015

Re: not counting self-inflicted wounds against the budget: Yes, but we do still want to count unplanned failures, such as a machine panic or crash. We need to be able to distinguish these.

"Budget" is a good metaphor for one of the things we usually care about around SLO enforcement: evictions/time. We can model these with a token bucket.

"Budget" might be a bad metaphor for binary conditions. One example is shard strength.

Do we have a good handle on the kinds of SLOs users want to see, or cluster admins want to offer?

@davidopp
Member

davidopp commented Nov 7, 2015

Do we have a good handle on the kinds of SLOs users want to see, or cluster admins want to offer?

Not really. Deployment already has maxUnavailable, and I assume we want that. Presumably we want evictions/time as well, as you mentioned. I'm not sure we need anything else, at least for now.

@bgrant0607
Member

Forgiveness is related: #1574

So is service-level readiness, which was discussed in the Component Registration API context #13216.

@mml
Contributor

mml commented Nov 21, 2015

It occurred to me while working on #17393 that we should define and enforce an upper bound on the minimum time between evictions, not just so we can reasonably make progress, but also so we don't end up in a degenerate case where the drain controller is trying to establish/maintain two years' worth of disruption history.

There might be other cases where we need boundaries to make implementation feasible.

@thockin
Member

thockin commented Nov 21, 2015

The issue of control objects with selectors that overlap comes up again and again, and I am not really satisfied with (my understanding of) the answer.

It came up again on network policies: how do I apply policies to sets of pods and know that the policies are not in conflict? It seems easier to have pods declare which policy they belong to rather than policies declare which pods they apply to.

Obviously, this doesn't work for RCs and similar.

@bgrant0607
Member

@mml Forgiveness is similar; it also needs a rate.

@thockin
For controllers the tie-breaking proposal is controllerRef: #2210 (comment), #14961

Required label keys could also be used to ensure non-overlap: #15390

However, whether pod specifies policy or vice versa depends on which scenarios we want to facilitate. Related policy discussion: #17097

@roberthbailey
Contributor

/cc me & @ihmccreery

@davidopp davidopp added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 24, 2016
@davidopp davidopp added this to the next-candidate milestone Feb 24, 2016
@mml
Contributor

mml commented Feb 24, 2016

We really need to collect user feedback about how much flexibility is needed here. If a cluster-wide policy is adequate, it can probably be configured via flags, and it will be easier to implement than the fully general idea of a DisruptionBudget object with selectors.

@davidopp
Member

If these are global (or per-namespace) policies as you are advocating, then you presumably need to express them as a fraction of collection size, since a ReplicaSet or Job or whatever with 100 pods can probably tolerate more simultaneous pods down and higher eviction rate than one with 2 pods. Even as a fraction of collection size I'm not sure you can set a good global number, though. At the very least, I suspect some collections will want to opt out of the policy.

Also, we need to track the current state in order to enforce any kind of policy (e.g. how many pods have been disrupted from this collection during the last N minutes). Assuming we want that state to be persisted, we need an object anyway, so it seems reasonable to add to the controllers a pointer to a DisruptionBudget object that contains both the Spec (disruption rate limit and max simultaneously-down pods) and the Status (recent disruptions).

mml added a commit to mml/kubernetes that referenced this issue Aug 2, 2016
mml added a commit to mml/kubernetes that referenced this issue Aug 16, 2016
mml added a commit to mml/kubernetes that referenced this issue Aug 16, 2016
k8s-github-robot pushed a commit that referenced this issue Aug 17, 2016
Automatic merge from submit-queue

Implement DisruptionController.

Part of #12611

This currently also includes a pending commit from #25895
k8s-github-robot pushed a commit that referenced this issue Aug 21, 2016
Automatic merge from submit-queue

Followup fixes for disruption controller.

Part of #12611.
- Record an event when a pod does not have exactly 1 controller.
- Add TODO comment suggesting we simplify the two cases: integer and percentage.
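The two cases mentioned in that TODO (an integer count vs. a percentage of collection size, the fractional form @davidopp argued for above) both reduce to an absolute count once the number of replicas is known. A hypothetical sketch of that resolution; the rounding direction here is an illustrative choice, not a claim about what the merged controller does:

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// resolveMaxUnavailable turns either an integer ("2") or a percentage
// ("25%") into an absolute pod count for a collection of `replicas`
// pods. Percentages are rounded down in this sketch, treating the
// budget as a cap on disruption rather than a guarantee of progress.
func resolveMaxUnavailable(value string, replicas int) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, err
		}
		return int(math.Floor(float64(replicas) * float64(pct) / 100.0)), nil
	}
	return strconv.Atoi(value)
}

func main() {
	n, _ := resolveMaxUnavailable("25%", 10)
	fmt.Println(n) // prints: 2
	n, _ = resolveMaxUnavailable("3", 10)
	fmt.Println(n) // prints: 3
}
```

Normalizing to an integer at one boundary is the simplification the TODO suggests: everything downstream of the resolution only ever sees a count.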
mml added a commit to mml/kubernetes that referenced this issue Aug 22, 2016
@goltermann
Contributor

@davidopp can you take a look and bump to v1.5 if not needed for 1.4?

@davidopp
Copy link
Member

davidopp commented Sep 6, 2016

This is just waiting on docs.

@pwittrock pwittrock modified the milestones: v1.4-nonblocking, v1.4 Sep 8, 2016
@ingvagabund
Contributor

@davidopp what exactly is missing here to document? A devel doc describing how to implement other DB types (mentioning we handle MinAvailable for the moment), with an explanation of what a DB is (referring to the rescheduling docs plus additional notes) and how it works based on the current API & controller implementation? Plus comments from related issues/PRs about current limitations, how PodDisruptionAllowed is set, and how it is (about to be) consumed by the /eviction subresource?

Has anyone started working on that?

@bgrant0607
Member

@mml What remains to be done here?

@bgrant0607
Member

ref #35318, #38336

@bgrant0607 bgrant0607 added the area/node-lifecycle Issues or PRs related to Node lifecycle label Mar 23, 2017
@mml
Contributor

mml commented May 31, 2017

I think this is done.
