Parallelize and improve rolling updates even more #1718
Trying to consolidate #1452 here. Rolling updates have some issues that should be handled.
To handle these issues, rolling-update could take a strategy to bring up and terminate nodes at the same time, N instances at a time.
Regarding being smarter and knowing whether there are enough resources to drain a node, the cluster autoscaler can be an inspiration: https://github.com/kubernetes/contrib/blob/master/cluster-autoscaler
The simple flow should be:
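The original list of steps isn't preserved here, but purely as an illustration, a batched roll along the lines described above might look like the sketch below. Everything in it is an assumption: the ASG name, the batch size, and the use of the raw AWS CLI are placeholders, and this is not something kops implements today.

```bash
#!/usr/bin/env bash
# Illustrative only: surge N extra nodes, wait, then drain and terminate N old ones.
# Assumes AWS, a single node ASG ("$ASG" is a placeholder), and that
# `kops update cluster --yes` has already pushed the new launch configuration.
set -euo pipefail

N=3
ASG="nodes.example.k8s.local"   # placeholder ASG name

# 1. Surge: raise the desired capacity by N so nodes with the new config come up.
current=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG" \
  --query 'AutoScalingGroups[0].DesiredCapacity' --output text)
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name "$ASG" --desired-capacity $((current + N))

# 2. Wait until the cluster reports the extra nodes as Ready.
until [ "$(kubectl get nodes --no-headers | grep -c ' Ready')" -ge $((current + N)) ]; do
  sleep 15
done

# 3. Drain the N oldest nodes, then terminate them, shrinking the ASG back.
#    (A real script would exclude masters, e.g. with a label selector, and may
#    also need --delete-emptydir-data / --delete-local-data for emptyDir pods.)
for node in $(kubectl get nodes --sort-by=.metadata.creationTimestamp -o name \
    | head -n "$N" | cut -d/ -f2); do
  kubectl drain "$node" --ignore-daemonsets --force
  instance=$(aws ec2 describe-instances \
    --filters "Name=private-dns-name,Values=$node" \
    --query 'Reservations[0].Instances[0].InstanceId' --output text)
  aws autoscaling terminate-instance-in-auto-scaling-group \
    --instance-id "$instance" --should-decrement-desired-capacity
done
# Repeat the batch until every old node has been replaced.
```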
Another idea that @hubt mentioned on chat is that some users may want to cordon all nodes and wait for the new nodes before the pods are rescheduled.
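A minimal sketch of that cordon-everything-first idea (purely illustrative; it just marks every current node unschedulable so rescheduled pods can only land on replacement nodes as they join):

```bash
# Cordon every existing node so evicted/rescheduled pods can only land on
# nodes that join afterwards. Uncordon (or replace) the old nodes later.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl cordon "$node"
done
```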
I realized that my algorithm will take forever if the number of nodes is more than a handful. Start as many new nodes as there are old nodes. The advantages of this algorithm are that it is simpler and quicker to implement, at least for a first cut. Thanks @vinayagg.
I also need to research the Pod Disruption Budget work upstream.
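For context, PodDisruptionBudgets are what `kubectl drain` honours when evicting pods, so they are the natural guard rail for any parallel roll. A hedged example with made-up names:

```bash
# Hypothetical budget: never evict below 2 available replicas of app=web.
# Drains (and therefore rolling updates) wait rather than violate the budget.
kubectl create poddisruptionbudget web-pdb --selector=app=web --min-available=2
kubectl get poddisruptionbudgets
```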
Prescaling the number of nodes seems like a good idea, but what about masters? It's my understanding that kops doesn't yet manage etcd membership.
I would not recommend prescaling masters, as we have already run into problems with HA upgrades. Headroom on the masters is usually not a huge issue, and the code running on the masters 'should' be built for failover. The practice for upgrading masters is one at a time.
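For example, masters can be rolled explicitly one instance group at a time; the group names below are assumptions based on the usual per-AZ kops naming:

```bash
# Roll one master instance group at a time (names are placeholders).
# Assumes KOPS_STATE_STORE and the cluster name are already configured.
kops rolling-update cluster --instance-group master-us-east-1a --yes
kops rolling-update cluster --instance-group master-us-east-1b --yes
kops rolling-update cluster --instance-group master-us-east-1c --yes
```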
I have put in a PR, #2818, that starts to address this issue. It adds two new strategies that influence how nodes are replaced (three options in total, counting the existing behaviour), gated behind a feature flag. In every case, all masters and bastions are rolled sequentially before the nodes, and the flag does not influence their replacement.
The first and default option is the original code path. The second option pre-creates a whole new set of instance groups first, while the third option only creates a new instance group as each instance group is rolled.
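The second strategy can be approximated by hand today, which may help picture it. This is a manual sketch, not what the PR implements; the instance group name, subnet, and node label are assumptions to check against your own cluster:

```bash
# 1. Create a new instance group alongside the old one and apply it.
kops create instancegroup nodes-new --subnet us-east-1a   # opens an editor
kops update cluster --yes

# 2. Once the new nodes are Ready, drain everything in the old group.
#    (kops labels nodes with kops.k8s.io/instancegroup; verify on your cluster.)
for node in $(kubectl get nodes -l kops.k8s.io/instancegroup=nodes \
    -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets --force
done

# 3. Delete the old instance group and apply again.
kops delete instancegroup nodes --yes
kops update cluster --yes
```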
@chrislovecnm We've been talking about rolling updates a lot here over the past week. We've had a great experience with a running kops cluster [thanks], but we've had less great experiences with rolling updates. It seems that it's usually cluster networking [weave]. We think it's due to weave's decentralized approach to storing IPs. Anyhow, we've concluded that rolling updates will take a long time no matter what, and we'd rather not risk the whole cluster at once. We would like to take rolling updates slow. Like, really slow. For example, update a couple of nodes a day, with the idea that it's always going on -- an 'evergreen cluster', if you will. This way, we never risk big downtime, and we never have more than a couple of nodes broken -- we can just cordon them and deal with it. Is this a stupid idea? If not, I think what we want is kops rolling update where the time between nodes is on the order of a day or more, which I think requires moving the state of the upgrade in progress to the server side, or maybe taking a whole other strategy. What are your thoughts?
@dcowden We personally don't use rolling restarts because the cluster is fronted by an ha-proxy, and by the end of a rolling restart we would end up with stale records. However, we considered it a bit and decided it was actually too slow to be useful. The restriction you pose makes perfect sense in the context of weave - but I would point out that your suggestion is to peg the performance to the 'lowest performing component' (not to hate on weave, it serves a purpose, but its peer-to-peer discovery makes it not so ideal for high-frequency node changes). You might want to look at calico and flannel, which use etcd for storing their state (disclaimer: I am nowhere close to being an expert in CNI technology). Instead, what you are suggesting could be an additional feature on top of rolling update: "Roll the cluster with X nodes over Y time".
Hi @toidiu. We are currently stuck with weave. Canal is our first choice, but unfortunately we have extremely high security requirements, and one of them is that our pod network is encrypted; weave supports this. Flannel is working on adding encryption, and weave is working on the problems associated with the decentralized model. For now, we're stuck until one of them moves.
As a side note, we're also looking into using WireGuard. Has anyone tried that with Kubernetes?
Ish ... we have some ideas. As much as it has been tested, rolling-update is not that risky. It uses the same HA principle as failover.
You can, but how many nodes? If you do it that slowly, just cordon and drain a node and delete it in EC2. Wash, rinse, repeat. You could build a simple controller that kills the oldest node :) It gets complicated managing the launch configurations :)
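A minimal sketch of such a controller as a daily cron job, assuming AWS, that `kops update cluster --yes` has already pushed the new launch configuration, and that the label and names below match your cluster (they are assumptions):

```bash
#!/usr/bin/env bash
# Replace the oldest worker node: drain it, then terminate it without
# decrementing the ASG, so a replacement comes up on the current launch config.
set -euo pipefail

# Oldest worker (kubernetes.io/role=node is the label kops applies to workers).
oldest=$(kubectl get nodes -l kubernetes.io/role=node \
  --sort-by=.metadata.creationTimestamp -o name | head -n1 | cut -d/ -f2)

kubectl drain "$oldest" --ignore-daemonsets --force

instance=$(aws ec2 describe-instances \
  --filters "Name=private-dns-name,Values=$oldest" \
  --query 'Reservations[0].Instances[0].InstanceId' --output text)

aws autoscaling terminate-instance-in-auto-scaling-group \
  --instance-id "$instance" --no-should-decrement-desired-capacity
```

Run from cron once or twice a day, this gives roughly the 'evergreen cluster' cadence described above, at the cost of keeping the launch configurations in sync yourself.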
I think you are overthinking it, personally. But I understand your concerns. This is another approach, which is kinda what I am going to code next:
@chrislovecnm thanks. In our experience, rolling updates have broken things quite frequently. I think @toidiu is right -- that's weave's fault, and the right solution is to fix weave, not to tiptoe around it. I like your ig-based rolling update approach. Would it be possible to have a manual step/trigger between steps 3 and 4?
Have you worked with the weave team? Where is the issue?
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with a /lifecycle frozen comment. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra, and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra, and/or fejta.
What is the latest here? After clicking through multiple tickets and MRs, maybe #4038 is the latest?
Instead of rolling-update, I am doing the following steps after updating:

This cuts the time spent on rolling updates by more than half.
Since #4038 has been closed, I'm wondering if there's any opposition to a smaller change: sticking with the current rolling-update strategy, but surging by 1 VM before beginning the node drainage? It's not ideal to temporarily shift more load onto the remaining VMs and then shortly take that extra-loaded VM down as well. If not, would someone kindly update this issue with how this is planned to be addressed?
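A rough manual version of that surge-by-one idea, with a placeholder ASG name (note that kops may reconcile the ASG back to the instance group spec on the next update, so this is only a stopgap sketch):

```bash
# Add one node's worth of headroom before rolling, so drained pods have
# somewhere to go immediately. Raise MaxSize (or the IG's maxSize) first if needed.
ASG="nodes.example.k8s.local"
current=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG" \
  --query 'AutoScalingGroups[0].DesiredCapacity' --output text)
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name "$ASG" --desired-capacity $((current + 1))

kops rolling-update cluster --yes
```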
Repeated from #1134 (comment)
It looks like rolling update only parallelizes by rolling one node from each instance group concurrently. In a medium or large cluster, that's going to take a while. Rather than having to wait for an hour or temporarily expand the ASG, it'd be nice if I could specify a roll concurrency parameter in absolute number or as a percentage, similar to how a rollingUpdateStrategy of a deployment has a maxUnavailable. It's true that in a heterogeneous cluster it could simultaneously roll all your big nodes, but that's an acceptable risk to save hours of waiting.
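For comparison, this is the Deployment knob being referenced; a node-roll concurrency flag would behave analogously. The deployment name is a placeholder:

```bash
# maxUnavailable bounds how many replicas may be down at once during a
# Deployment rollout, as an absolute count or a percentage.
kubectl patch deployment web -p \
  '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":"25%","maxSurge":1}}}}'
```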
If you want to be fancy, you could attempt to auto-detect an acceptable roll rate by making sure there are no Pending pods, but that's probably too tricky for right now.
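A crude version of that check, as an assumption about how such a gate might be scripted:

```bash
# Only roll the next batch if nothing is stuck Pending.
pending=$(kubectl get pods --all-namespaces \
  --field-selector=status.phase=Pending --no-headers 2>/dev/null | wc -l)
if [ "$pending" -eq 0 ]; then
  echo "no Pending pods; safe to roll the next batch"
else
  echo "$pending Pending pod(s); pausing the roll"
fi
```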
Another good suggestion was to pre-scale the ASG before rolling.