Add a KEP for graduating PodDisruptionBudget to stable #904

Merged
merged 1 commit on Jul 30, 2019

Conversation

bsalamat (Member):

/sig apps
/assign @kow3ns @liggitt

@k8s-ci-robot added labels on Mar 19, 2019: sig/apps, cncf-cla: yes, size/L, kind/kep
## Summary

[Pod Disruption Budget (PDB)](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
is a Kubernete API that limits the number of pods of a collection that are down simultaneously from voluntary disruptions.
Member:

typo: Kubernetes

Member Author:

Fixed
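
For readers skimming the excerpt above, here is a minimal sketch of the object under discussion, built with the policy/v1beta1 Go types this KEP proposes to graduate; the name, namespace, labels, and minAvailable value are illustrative only.

```go
package main

import (
	"fmt"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	minAvailable := intstr.FromInt(2)
	pdb := policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "zookeeper-pdb", Namespace: "default"},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			// Voluntary disruptions (e.g. node drains) are blocked once
			// fewer than MinAvailable matching pods would remain healthy.
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "zookeeper"},
			},
		},
	}
	fmt.Printf("%+v\n", pdb)
}
```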

#### Mutable PDBs

A mutable PDB object allows its `MinAvailable` and `MaxUnavailable` fields to be
modified by clients. Components that use PDB must watch such modifications and
Member:

what about selector fields? is there a good reason to limit which spec fields can be modified?

Member Author:

Added the selector as well. I am not closely familiar enough with the internal logic of the Disruption controller to say with confidence whether updating the selector (and/or other fields) could cause any issues there, but I don't see a concern after a quick look at the code.
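
As a concrete illustration of what mutability would enable, a minimal sketch of a client raising MinAvailable on an existing PDB, using the context-free client-go signatures of that era; the clientset, names, and values are assumed.

```go
// Assumed imports: metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// "k8s.io/apimachinery/pkg/util/intstr", "k8s.io/client-go/kubernetes".
func raiseMinAvailable(clientset kubernetes.Interface) error {
	pdb, err := clientset.PolicyV1beta1().PodDisruptionBudgets("default").Get("zookeeper-pdb", metav1.GetOptions{})
	if err != nil {
		return err
	}
	newMin := intstr.FromInt(3)
	// Rejected while the spec is immutable; permitted once mutable PDBs
	// land. The Selector field could be updated the same way.
	pdb.Spec.MinAvailable = &newMin
	_, err = clientset.PolicyV1beta1().PodDisruptionBudgets("default").Update(pdb)
	return err
}
```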

the rolling update spec and currently does not take PDB into account. We need to
change the implementation and use
`min(PDB.MinAvailable, RollingUpdate.MinAvailable)` and
`max(PDB.MaxUnavailable, RollingUpdate.MaxUnavailable)` instead.
Member:

Do we want deployments looking at PDB objects or using the evict subresource? Just looking at the PDB does not take into account PDB status for selectors that match across deployments, right?

Respecting PDB could easily lead to a situation in which a deployment could not make progress. Given that, is respecting PDB still desired? If so, describe how that state will be detected, communicated, and/or resolved (automatically or manually).

Member Author:

What I had in mind was to only look at the MinAvailable and MaxUnavailable values and treat them similarly to the MinAvailable and MaxUnavailable of a rolling update, without making further changes and without using the "evict" subresource. But given your points, and after reading some of the old comments, I think we should drop this requirement.
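
For clarity, a literal transcription of the combination rule quoted above, assuming both sides have already been resolved to absolute pod counts (IntOrString percentage handling omitted); per the reply, this requirement was ultimately dropped from the KEP.

```go
// Sketch only: combine PDB and rolling-update bounds as originally proposed.
func combineBudgets(pdbMin, ruMin, pdbMaxUnavail, ruMaxUnavail int32) (minAvailable, maxUnavailable int32) {
	minAvailable = pdbMin // min(PDB.MinAvailable, RollingUpdate.MinAvailable)
	if ruMin < minAvailable {
		minAvailable = ruMin
	}
	maxUnavailable = pdbMaxUnavail // max(PDB.MaxUnavailable, RollingUpdate.MaxUnavailable)
	if ruMaxUnavail > maxUnavailable {
		maxUnavailable = ruMaxUnavail
	}
	return minAvailable, maxUnavailable
}
```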

#### Needed Tests

If this is considered a non-optional feature, there should be a conformance test
for it (this needs to be an e2e test tagged as conformance).
Member:

This seems like a good candidate for conformance testing. With an ack from @kubernetes/cncf-conformance-wg I'd make this a more definitive statement

Member:

+1

Member Author:

Changed wording to make the test a requirement.
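
A rough sketch of what such a test might assert, written against client-go directly rather than the real e2e/conformance framework; ns, clientset, and waitForPDBStatus are hypothetical helpers assumed from a test harness.

```go
// Assumed imports: testing, plus policyv1beta1 / metav1 / intstr as in the
// sketches above.
func TestPDBBasicConformance(t *testing.T) {
	minAvailable := intstr.FromInt(1)
	pdb := &policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "test-pdb", Namespace: ns},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "test"}},
		},
	}
	if _, err := clientset.PolicyV1beta1().PodDisruptionBudgets(ns).Create(pdb); err != nil {
		t.Fatalf("creating PDB: %v", err)
	}
	// The disruption controller should eventually populate status fields
	// such as ExpectedPods and PodDisruptionsAllowed for matching pods.
	if err := waitForPDBStatus(clientset, ns, "test-pdb"); err != nil {
		t.Fatalf("PDB status was never populated: %v", err)
	}
}
```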

@bsalamat (Member Author) left a comment:

Thanks, @liggitt for your quick review!

## Summary

[Pod Disruption Budget (PDB)](https://kubernetes.io/docs/tasks/run-application/configure-pdb/)
@timothysc (Member), Mar 19, 2019:

@bsalamat @liggitt - I've been thinking a lot about this topic when working on ClusterAPI.

Because the object is very loosely coupled with Pods, and only via label selector, is there any reason why we couldn't have it be more generic and remove the 'Pod' from the resource kind? The link via label denotes the resource that it is tied to.

As a concrete example, we would like to tie disruption budgets to other resources for cluster management (machines).
/cc @ncdc @detiber

Member:

For the other scenarios, is "delete" the standard definition of a disruption?

Contributor:

Agree with tying disruption budgets to machines. There are two orthogonal dimensions and use cases: managing machine upgrades because of organizational or infrastructure policy (which today requires creating fake pods to represent with a PDB), and managing impact to applications.

Member:

Would just using a label selector cause confusion here? Would there be a case where we would want to match both Machines and Pods with the same selector? Or would it be on the consumer to differentiate the type to associate with the label selector?

As to the specific use case around Machines, would it make sense to have the disruption budget exist for the Machines or the underlying Nodes instead? If what we are striving for is to ensure a certain level of capacity is available during an upgrade, then it seems to me that we are more concerned with the Nodes rather than the Machines themselves.

Member:

We have protected a pool of machines by running a pod per node with anti-affinity rules set up, and then you can prevent disruption to more than N nodes by using that pod as a guard. It's a bit of a hack, but it works (see the sketch below). In general, I would prefer we have pod disruption budgets and machine disruption budgets as separate objects; the use cases may vary slightly in practice, similar to having machine sets and replica sets.
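
A sketch of the guard-pod hack described above, under the assumption of one placeholder pod per protected node enforced by required pod anti-affinity, plus a PDB capping simultaneous guard-pod disruptions; all names and values are illustrative.

```go
// Assumed imports: corev1 "k8s.io/api/core/v1", plus policyv1beta1 /
// metav1 / intstr as in earlier sketches.

// Guard pods spread one-per-node via required anti-affinity on hostname.
guardAffinity := &corev1.Affinity{
	PodAntiAffinity: &corev1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
			LabelSelector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "node-guard"},
			},
			TopologyKey: "kubernetes.io/hostname",
		}},
	},
}

// Draining a node must evict its guard pod first, so capping guard-pod
// disruptions caps how many nodes can be disrupted at once.
maxUnavailable := intstr.FromInt(1)
guardPDB := &policyv1beta1.PodDisruptionBudget{
	ObjectMeta: metav1.ObjectMeta{Name: "node-guard-pdb", Namespace: "kube-system"},
	Spec: policyv1beta1.PodDisruptionBudgetSpec{
		MaxUnavailable: &maxUnavailable,
		Selector:       &metav1.LabelSelector{MatchLabels: map[string]string{"app": "node-guard"}},
	},
}
```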

Member Author:

+1 to having different disruption budgets for pods and nodes, for a few reasons:

  1. PDB is enforced through the evict subresource of Pods, which is obviously not applicable to nodes (see the sketch after this list).
  2. If we used the same API for both objects, we would need to add a field to the API to specify the type of object that it applies to (node vs. pod).
  3. The implementation of a Node Disruption Budget shares little with that of PDB, so we wouldn't save much on the coding side by combining the two APIs either.
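
Regarding the first point, a minimal sketch of enforcement through the pods/eviction subresource; the clientset and pod name are assumed.

```go
// Assumed: clientset and fmt imports as above; policyv1beta1 is
// "k8s.io/api/policy/v1beta1".
eviction := &policyv1beta1.Eviction{
	ObjectMeta: metav1.ObjectMeta{Name: "zookeeper-0", Namespace: "default"},
}
// The API server consults matching PDBs and rejects the request with
// 429 Too Many Requests when the budget would be violated.
if err := clientset.CoreV1().Pods("default").Evict(eviction); err != nil {
	fmt.Printf("eviction denied: %v\n", err)
}
```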

Member:

> I would prefer we have pod disruption budgets and machine disruption budgets as separate objects; the use cases may vary slightly in practice, similar to having machine sets and replica sets

I'd tend to agree. The concepts are similar, but the application, audience, and scope are different.

PDBs are namespaced, and write access can be granted to users with write access in a namespace.

Machine or Node disruption budgets would be cluster-scoped, and should only be writeable by the cluster admin.

@timothysc (Member), Mar 20, 2019:

> Machine or Node disruption budgets would be cluster-scoped, and should only be writeable by the cluster admin.

That's not true; they would be namespace-scoped in a management cluster.

FWIW I'm OK if we decide not to rationalize now, but I do think this is an area in the API where there are overlapping concepts, and having (N) of the same type is also not good.

Member Author:

> FWIW I'm OK if we decide not to rationalize now,

If we think we may change the API at some point, discussing it now is probably better than after the API is GA'ed.

Member:

We have workarounds on our side.

@derekwaynecarr (Member):

I have no major issue with graduating PDB in its current form. Limitations we have encountered recently could be handled via a higher-level controller writing to a PDB, so making them mutable would help a lot. An example scenario was sizing a PDB relative to the number of control-plane machines in a cluster, which is easier to do if we support mutability. This is useful if you use DaemonSets for some types of workloads but still want to minimize potential disruption to the control-plane machines.
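
A sketch of that higher-level-controller scenario: size a PDB's MinAvailable from the current number of control-plane nodes. The node label, names, namespace, and sizing rule are hypothetical, and the context-free client-go signatures of that era are used.

```go
// Assumed imports: kubernetes, metav1, intstr as in earlier sketches.
func resizeControlPlanePDB(clientset kubernetes.Interface) error {
	nodes, err := clientset.CoreV1().Nodes().List(metav1.ListOptions{
		LabelSelector: "node-role.kubernetes.io/master",
	})
	if err != nil {
		return err
	}
	pdb, err := clientset.PolicyV1beta1().PodDisruptionBudgets("kube-system").Get("cp-daemon-pdb", metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Keep all but one of the per-control-plane-node pods available, so at
	// most one control-plane machine is disrupted at a time.
	min := intstr.FromInt(len(nodes.Items) - 1)
	pdb.Spec.MinAvailable = &min
	// Only possible once PDB specs are mutable, as proposed in this KEP.
	_, err = clientset.PolicyV1beta1().PodDisruptionBudgets("kube-system").Update(pdb)
	return err
}
```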

### Goals

* Plan to promote PDB API to stable version.
* Propose changes to the deployment controller to take PDB into account.
Contributor:

Where is "taking it into account" described?

Member Author:

I removed that section, as it turned out to create complexities that are not easy to address. I'll remove this goal.

- [This test](https://github.com/kubernetes/kubernetes/blob/ac56bd502ab96696682c66ebdff94b6e52471aa3/test/integration/scheduler/preemption_test.go#L731)
tests the effects of PDB on preemption (PDB is honored in a best-effort way)
* Eviction integration tests
- [These tests](https://github.com/kubernetes/kubernetes/blob/master/test/integration/evictions/evictions_test.go) test eviction logic and its interactions with PDB.

Member Author:

Added the test

- [ ] Ensure that components do not have any logic that relies on immutability
of PDBs. For example, if a component builds a cache of pods that match various
PDBs, it must add logic to invalidate the cache on updates.
- Action Item: sweep for in-tree uses to make sure we’re informer driven and

Reviewer:

This should include all repos in the kubernetes org, not just k/k. I know Cluster Autoscaler has logic to deal with PDBs (I don't think it will be impacted by this change, but it's worth double-checking), and I suspect VPA may also use it. Not sure about other projects under the kubernetes org.

Member Author:

Added
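
A sketch of the informer-driven pattern the checklist asks for: any component caching pods per PDB must drop that cache when a PDB's spec (selector, MinAvailable, MaxUnavailable) changes. The cache type and factory wiring are assumptions, not code from any real component.

```go
// Assumed: informerFactory is a started informers.SharedInformerFactory
// (k8s.io/client-go/informers), cache is k8s.io/client-go/tools/cache, and
// podCache is a hypothetical per-PDB pod cache owned by the component.
pdbInformer := informerFactory.Policy().V1beta1().PodDisruptionBudgets().Informer()
pdbInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	UpdateFunc: func(oldObj, newObj interface{}) {
		oldPDB := oldObj.(*policyv1beta1.PodDisruptionBudget)
		newPDB := newObj.(*policyv1beta1.PodDisruptionBudget)
		// With mutable PDBs the selector and thresholds can change, so any
		// cached pod-to-PDB mapping keyed on the old spec is now stale.
		if !reflect.DeepEqual(oldPDB.Spec, newPDB.Spec) {
			podCache.Invalidate(newPDB.Namespace, newPDB.Name)
		}
	},
})
```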

kow3ns (Member) commented Mar 22, 2019:

@janetkuo and @mortent, PTAL as well; I'd like your thoughts on this.

bsalamat (Member Author) commented Apr 8, 2019:

Ping @janetkuo and @mortent. Is there anything else that you want to see here?
Since PDB is owned by SIG Apps, I would expect someone from your SIG to take the graduation of PDB from here and perform the remaining action items.

mortent (Member) commented Apr 8, 2019:

I'm looking into adding support for the scale subresource with PDBs, but I'm not sure whether this would be a requirement for GA. I hope it will be ready for the next release of Kubernetes, together with mutable PDBs. Then both changes could soak for one release and we can push PDBs to GA in 1.16.

I am interested in taking on the task of getting this to GA.

kow3ns (Member) commented Jul 9, 2019:

So we've discussed this as something we want to do. It passes the criteria for approval.
/approve

@k8s-ci-robot added the approved label on Jul 9, 2019.
Kubernetes control plane evicts some of the existing pods, but keeps at least 10
around. The PDB is updated and states that at least 20 replicas must be kept
alive. To an observer who doesn't notice the PDB update, the evictions that
happened before the update may appear incorrect.
Member:

We already have this problem before PDBs are mutable: immutable PDBs can be deleted and re-created with a new spec (20 replicas in this example).

janetkuo (Member):

/lgtm

@k8s-ci-robot added the lgtm label on Jul 30, 2019.
k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, janetkuo, kow3ns

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit a206c03 into kubernetes:master on Jul 30, 2019
mortent added a commit to mortent/website that referenced this pull request Aug 14, 2019
This is a placeholder PR for the docs for the PDB to GA KEP kubernetes/enhancements#904
seh commented Sep 26, 2020:

See kubernetes/kubernetes#95083 for a potential problem with the debatable treatment of the PodDisruptionBudget.Selector field.
