
maxSurge for node draining or how to meet availability requirements when draining nodes by adding pods #114877

Open
txomon opened this issue Jan 6, 2023 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@txomon

txomon commented Jan 6, 2023

What would you like to be added?

A way to drain nodes by adding more pods elsewhere to meet PodDisruptionBudgets.

Why is this needed?

Currently, a Deployment can be configured with maxSurge to avoid going below the number of replicas it requires while a new release is rolled out. This parameter allows extra pods to be added before the old ones are removed, so the required "replicas" count is always met as a minimum.
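For reference, a minimal sketch of how maxSurge is configured on a Deployment today (the names and image below are illustrative, not from any real workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # illustrative name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # allow 1 extra pod above replicas during a rollout
      maxUnavailable: 0            # never dip below the desired replica count
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example.com/app:1.0   # illustrative image
```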

This feature is (to my knowledge) only available when releasing new versions of an application; however, it would be extremely useful when draining nodes as well.

Usual cluster maintenance is done by adding new nodes before removing old ones. This means all the pods on the old node need to be evicted, and there is usually space on the new node for one more of each pod from the old node. Current solutions such as PodDisruptionBudgets or the Eviction API try to make sure that subtracting pods from the current amount doesn't break anything, but the possibility of temporarily running one extra pod of each Deployment is not contemplated at the moment.

This request is asking for the ability to use a surplus of pods to meet all constraints for safe eviction.

A side note to stress the importance: evictions on large workloads generally work fine, with or without PDBs using minAvailable/maxUnavailable settings. However, when moving Deployments with a single replica, or HPA-controlled Deployments that are currently scaled down far enough, the problem is aggravated and can only be solved through a few inefficient means. This is exacerbated when node maintenance is done automatically (as on GKE and other cloud services).

To be clear, this limitation should only apply to Deployments with strategy.type=RollingUpdate.

Ways to deal with this situation currently:

  1. Set minReplicas/replicas to >1 and a PDB with maxUnavailable=1, when it's known that the autoscaler (if in use) usually keeps the workload scaled at the lower end (see the sketch after this list). Pros: there is no downtime. Cons: wasted resources.
  2. Do nothing and accept the occasional downtime. Pros: no wasted resources. Cons: the deployment experiences downtime.
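A hedged sketch of option 1 (the names are illustrative), assuming the Deployment or HPA minReplicas keeps at least 2 replicas running:

```yaml
# Workaround 1: keep the Deployment (or HPA minReplicas) at 2 or more,
# and allow exactly one pod to be evicted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb          # illustrative name
spec:
  maxUnavailable: 1              # a drain can always evict one pod
  selector:
    matchLabels:
      app: example-app           # must match the Deployment's pod labels
```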
@txomon txomon added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 6, 2023
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 6, 2023
@k8s-ci-robot
Contributor

@txomon: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 6, 2023
@txomon
Author

txomon commented Jan 6, 2023

/wg reliability
/sig scalability
/sig cluster-lifecycle
/sig autoscaling
/sig apps

@k8s-ci-robot k8s-ci-robot added wg/reliability Categorizes an issue or PR as relevant to WG Reliability sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 6, 2023
@txomon txomon changed the title maxSurge for node draining or how to meet availability requirements when draining nodes maxSurge for node draining or how to meet availability requirements when draining nodes by adding pods Jan 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 6, 2023
@runephilosof-karnovgroup

It seems @0xmichalis also advocated this on Mar 8, 2017, when drain was made to honor PodDisruptionBudgets, but it seems no one took notice of it.

It seems a lot of users are frustrated/confused by this: https://duckduckgo.com/?q=poddisruptionbudget+single+replica. I think it would help if maxSurge were utilized instead of the drain getting stuck.
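For concreteness, a minimal sketch (illustrative names) of the combination that gets stuck: with a single-replica Deployment and a PDB like the one below, disruptionsAllowed is always 0, so the Eviction API rejects voluntary evictions and kubectl drain blocks until it times out.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: single-replica-pdb       # illustrative name
spec:
  minAvailable: 1                # with replicas: 1, disruptionsAllowed is always 0
  selector:
    matchLabels:
      app: single-replica-app    # matches a Deployment with replicas: 1
```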

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 22, 2023
@james-callahan

This is still a huge obstacle for gracefully replacing nodes.
Is anyone trying to take this on?

@txomon
Author

txomon commented Jul 31, 2023

I don't think so; it seems like it isn't gathering much attention. I'm not sure the working group is even aware of it...

james-callahan added a commit to james-callahan/kyverno-kustomize that referenced this issue Aug 11, 2023
With a single replica and the PDB, it's not possible to cleanly evict the pod
from a node.

Relevant kubernetes issue: kubernetes/kubernetes#114877 (comment)
@neolit123
Member

/remove-wg reliability
/remove-sig scalability
/remove-sig cluster-lifecycle
/remove-sig autoscaling
/sig node

SIG Apps is the primary owner of this FR due to the maxSurge feature. You can join their Zoom meeting and present it:
https://github.com/kubernetes/community/tree/master/sig-apps
SIG Node can participate with respect to node draining (as the title indicates).

Other SIGs / WGs should only be added once maintainers have had a look at this ticket.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed wg/reliability Categorizes an issue or PR as relevant to WG Reliability sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. labels Sep 4, 2023
@soltysh
Contributor

soltysh commented Sep 5, 2023

It's important to understand that any controller (Deployment, or any other workload) and disruption are two distinct mechanisms with different roles, although their functionality is complementary when we're talking about ensuring high availability.
I'd like to stress the high availability factor here, which is front and center when talking about PDBs. I will admit it's challenging to have any further discussion without that prerequisite fulfilled.
In a similar vein, when talking about HPAs, there's an option for a minimum number of replicas, which ensures the application always maintains the HA prerequisites.

Moving on to the subject of configuring a rolling update (in the case of a Deployment) versus responding to external disruptions (here, a PDB is an external actor, just like a user invoking kubectl delete): the reason we have the ability to configure a detailed rollout is that it is a process fully operated by the owning controller. In all other situations (i.e. any external errors or disruptions), the controller's goal is to reach the desired state as quickly as possible.

Speaking from my own experience, we've had multiple instances of problems where PDBs were blocking upgrades because users had set the allowed replicas down to the bare minimum. We solved it by adding an alert that looks at PDBs and notifies administrators in those cases. Since then we haven't seen any problems, or cluster administrators were aware that problems might pop up during an upgrade and were able to resolve them even before initiating the upgrade.
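A sketch of what such an alert might look like, assuming kube-state-metrics exposes the kube_poddisruptionbudget_status_* metrics and the Prometheus operator's PrometheusRule CRD is available; the rule name, duration, and exact metric name are illustrative and depend on the kube-state-metrics version, not the specific alert described above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pdb-at-limit             # illustrative name
spec:
  groups:
    - name: pdb.rules
      rules:
        - alert: PodDisruptionBudgetAtLimit
          # Fires when a PDB allows zero voluntary disruptions, which would block node drains.
          expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
          for: 60m
          labels:
            severity: warning
          annotations:
            summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} allows no disruptions; node drains may block."
```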

Having said all of the above, the SIG Apps meeting happens every other Monday; the next occurrence is planned for September 18. If you're interested, I'd be happy to hear more about your use cases and the problems you're struggling with.

@txomon
Author

txomon commented Sep 5, 2023 via email

@james-callahan

Likewise: usually when I run into this, it's when a cluster with low availability requirements is running cluster-critical services (e.g. an admission controller), where running multiple replicas is a complete waste of resources, but the ability to drain without manual interaction (e.g. in response to a spot instance removal) is important.

I'm unable to join the SIG apps call, as the meeting time is incompatible with Australian timezones.

@txomon
Author

txomon commented Nov 27, 2023

For anyone following the thread, https://github.com/kubernetes/enhancements/pull/4213/files was brought forward during the sig-apps weekly to take this situation into account.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2024
@james-callahan

/remove-lifecycle stale

Where can I follow any ongoing discussion?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2024