
maxSurge for node draining or how to meet availability requirements when draining nodes by adding pods #114877

Open
txomon opened this issue Jan 6, 2023 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@txomon

txomon commented Jan 6, 2023

What would you like to be added?

A way to drain nodes by adding more pods elsewhere to meet PodDisruptionBudgets.

Why is this needed?

Currently, a Deployment can be configured with maxSurge to avoid going below the number of replicas it requires while a new release is rolled out. This parameter allows extra pods to be added before the old ones are removed, so the required "replicas" count is always met as a minimum.
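For reference, a minimal sketch of how maxSurge is configured on a Deployment today (the names and image below are illustrative, not from any real workload):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # illustrative name
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                  # allow 1 extra pod above replicas during a rollout
      maxUnavailable: 0            # never dip below the desired replica count
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: example.com/app:1.0   # illustrative image
```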

This feature is (to my knowledge) only available when releasing new versions of an application; however, it would be extremely useful when draining nodes as well.

Usual cluster maintenance is done by adding new nodes before removing old ones. This means all the pods on the old node need to be evicted, and there is usually space on the new node for one more of each pod from the old node. Current solutions such as PodDisruptionBudgets or the Eviction API try to make sure that subtracting pods from the current amount doesn't break anything, but the possibility of temporarily running one extra pod of each Deployment is not contemplated at the moment.

This request is asking for the ability to use a surplus of pods to meet all constraints for safe eviction.

A side note to stress the importance: evictions on large workloads generally work fine, with or without PDBs using minAvailable/maxUnavailable settings. However, when moving Deployments with a single replica, or HPA-controlled Deployments that are currently scaled down far enough, the problem is aggravated and can only be solved through a few inefficient means. This is exacerbated when node maintenance is done automatically (as on GKE and other cloud services).

To be clear, this limitation should only apply to Deployments with strategy.type=RollingUpdate.

Ways to deal with this situation currently:

  1. Set minReplicas/replicas to >1 and a PDB with maxUnavailable=1, when it's known that the autoscaler (if in use) usually keeps the workload scaled at the lower end (see the sketch after this list). Pros: there is no downtime. Cons: wasted resources.
  2. Do nothing and accept the occasional downtime. Pros: no wasted resources. Cons: the deployment experiences downtime.
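A hedged sketch of option 1 (the names are illustrative), assuming the Deployment or HPA minReplicas keeps at least 2 replicas running:

```yaml
# Workaround 1: keep the Deployment (or HPA minReplicas) at 2 or more,
# and allow exactly one pod to be evicted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb          # illustrative name
spec:
  maxUnavailable: 1              # a drain can always evict one pod
  selector:
    matchLabels:
      app: example-app           # must match the Deployment's pod labels
```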
@txomon txomon added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 6, 2023
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 6, 2023
@k8s-ci-robot
Contributor

@txomon: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 6, 2023
@txomon
Author

txomon commented Jan 6, 2023

/wg reliability
/sig scalability
/sig cluster-lifecycle
/sig autoscaling
/sig apps

@k8s-ci-robot k8s-ci-robot added wg/reliability Categorizes an issue or PR as relevant to WG Reliability sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 6, 2023
@txomon txomon changed the title maxSurge for node draining or how to meet availability requirements when draining nodes maxSurge for node draining or how to meet availability requirements when draining nodes by adding pods Jan 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 6, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 6, 2023
@runephilosof-karnovgroup

It seems @0xmichalis also advocated this on Mar 8, 2017, when drain was made to honor PodDisruptionBudgets, but it seems no one took notice of it.

It seems a lot of users are frustrated/confused by this: https://duckduckgo.com/?q=poddisruptionbudget+single+replica. I think it would help if maxSurge were utilized instead of the drain getting stuck.
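For concreteness, a minimal sketch (illustrative names) of the combination that gets stuck: with a single-replica Deployment and a PDB like the one below, disruptionsAllowed is always 0, so the Eviction API rejects voluntary evictions and kubectl drain blocks until it times out.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: single-replica-pdb       # illustrative name
spec:
  minAvailable: 1                # with replicas: 1, disruptionsAllowed is always 0
  selector:
    matchLabels:
      app: single-replica-app    # matches a Deployment with replicas: 1
```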

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 22, 2023
@james-callahan

This is still a huge obstacle for gracefully replacing nodes.
Is anyone trying to take this on?

@txomon
Author

txomon commented Jul 31, 2023

I don't think so; it seems like it isn't gathering much attention. I'm not sure the working group is even aware of it...

james-callahan added a commit to james-callahan/kyverno-kustomize that referenced this issue Aug 11, 2023
With a single replica and the PDB, it's not possible to cleanly evict the pod
from a node.

Relevant kubernetes issue: kubernetes/kubernetes#114877 (comment)
@neolit123
Member

/remove-wg reliability
/remove-sig scalability
/remove-sig cluster-lifecycle
/remove-sig autoscaling
/sig node

SIG Apps is the primary owner of this FR due to the maxSurge feature. You can join their Zoom meeting and present it:
https://github.com/kubernetes/community/tree/master/sig-apps
SIG Node can participate with respect to node draining (as the title indicates).

Other SIGs / WGs should only be added once maintainers have had a look at this ticket.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed wg/reliability Categorizes an issue or PR as relevant to WG Reliability sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. labels Sep 4, 2023
@soltysh
Contributor

soltysh commented Sep 5, 2023

It's important to understand that any controller (Deployment, or any other workload) and disruption are two distinct mechanisms with different roles, although their functionality is complementary when we're talking about ensuring high availability.
I'd like to stress the high availability factor here, which is front and center when talking about PDBs. I will admit it's challenging to have any further discussion without that prerequisite fulfilled.
In a similar vein, when talking about HPAs, there's an option for a minimum number of replicas, which ensures the application always maintains the HA prerequisites.

Moving on to the subject of configuring a rolling update (in the case of a Deployment) versus responding to external disruptions (here, a PDB is an external actor, just like a user invoking kubectl delete): the reason we have the ability to configure a detailed rollout is that it is a process fully operated by the owning controller. In all other situations (i.e. any external errors or disruptions), the controller's goal is to reach the desired state as quickly as possible.

Speaking from my own experience, we've had multiple instances of problems where PDBs were blocking upgrades because users had set the allowed replicas down to the bare minimum. We solved it by adding an alert that looks at PDBs and notifies administrators in those cases. Since then we haven't seen any problems, or cluster administrators were aware that problems might pop up during an upgrade and were able to resolve them even before initiating the upgrade.
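A sketch of what such an alert might look like, assuming kube-state-metrics exposes the kube_poddisruptionbudget_status_* metrics and the Prometheus operator's PrometheusRule CRD is available; the rule name, duration, and exact metric name are illustrative and depend on the kube-state-metrics version, not the specific alert described above:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pdb-at-limit             # illustrative name
spec:
  groups:
    - name: pdb.rules
      rules:
        - alert: PodDisruptionBudgetAtLimit
          # Fires when a PDB allows zero voluntary disruptions, which would block node drains.
          expr: kube_poddisruptionbudget_status_pod_disruptions_allowed == 0
          for: 60m
          labels:
            severity: warning
          annotations:
            summary: "PDB {{ $labels.namespace }}/{{ $labels.poddisruptionbudget }} allows no disruptions; node drains may block."
```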

Having said all of the above, the SIG Apps meeting happens every other Monday; the next occurrence is planned for September 18. If you're interested, I'd be happy to hear more about your use cases and the problems you're struggling with.

@txomon
Author

txomon commented Sep 5, 2023 via email

@james-callahan

Likewise: usually when I run into this, it's when a cluster with low availability requirements is running cluster-critical services (e.g. an admission controller), where running multiple replicas is a complete waste of resources, but the ability to drain without manual interaction (e.g. in response to a spot instance removal) is important.

I'm unable to join the SIG apps call, as the meeting time is incompatible with Australian timezones.

@txomon
Author

txomon commented Nov 27, 2023

For anyone following the thread, https://github.com/kubernetes/enhancements/pull/4213/files was brought forward during the sig-apps weekly to take this situation into account.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2024
@james-callahan

/remove-lifecycle stale

Where can I follow any ongoing discussion?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2024