Upgrade Kubernetes Cluster
kimschles edited this page Mar 13, 2019 · 2 revisions
- Update the Kubernetes version and AMI image
  - In `cluster.yaml`, update the Kubernetes version.
  - In `nodes.yaml` and `masters.yaml`, update the AMI image ID.
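To script these edits instead of making them by hand, here is a minimal sketch using `sed`. It assumes the files are kops manifests, where the cluster spec carries a `kubernetesVersion:` key and each instance group spec carries an `image:` key. The function names are hypothetical, and each function leaves a `.bak` copy of the file it changes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that rewrite values in kops manifests in place.
# Each keeps the key's indentation and leaves a .bak copy of the file.

# usage: set_k8s_version cluster.yaml <new-version>
set_k8s_version() {
  local file="$1" version="$2"
  sed -i.bak -E "s|^([[:space:]]*kubernetesVersion:).*|\1 ${version}|" "$file"
}

# usage: set_ami_image nodes.yaml <new-ami>   (repeat for masters.yaml)
set_ami_image() {
  local file="$1" ami="$2"
  sed -i.bak -E "s|^([[:space:]]*image:).*|\1 ${ami}|" "$file"
}
```

Compare each file against its `.bak` copy (for example with `diff cluster.yaml.bak cluster.yaml`) before replacing anything in kops.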
- Replace those files using kops
  - Look at the `kops.sh` script. Make sure the verb `replace` is present, and comment out the secrets script if you don't need it.
  - Run `./kops.sh`
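`kops replace -f <file>` overwrites the spec stored in the kops state store with the contents of the file. The `kops.sh` script itself is not shown on this page, so the wrapper below is only a hypothetical sketch of the same idea: it runs the chosen verb against each manifest in order and stops at the first failure. The `KOPS` and `VERB` variables are assumptions that make the command easy to swap (for example, `KOPS=echo` prints the commands without running them).

```shell
#!/usr/bin/env bash
# Hypothetical kops.sh-style wrapper: apply each manifest with "kops <verb> -f".
# KOPS selects the binary (default: kops); VERB selects the verb (default: replace).
apply_manifests() {  # usage: apply_manifests cluster.yaml masters.yaml nodes.yaml
  local kops="${KOPS:-kops}" verb="${VERB:-replace}" file
  for file in "$@"; do
    "$kops" "$verb" -f "$file" || return 1   # stop on the first error
  done
}
```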
- Update the launch configuration
  - Run `kops update cluster` to preview the changes, then `kops update cluster --yes` to apply them.
  - After this, any new nodes that launch will use the updated Kubernetes version and AMI image.
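`kops update cluster` on its own is a dry run that prints the planned changes; adding `--yes` applies them. A minimal sketch that previews, asks for confirmation, and only then applies, assuming the cluster name is passed explicitly with kops' `--name` flag. The function name and the `KOPS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Preview the kops changes, ask for confirmation, then apply them.
update_cluster() {  # usage: update_cluster <cluster-name>
  local kops="${KOPS:-kops}" name="$1" answer
  "$kops" update cluster --name "$name" || return 1   # dry run: prints changes only
  read -r -p "Apply these changes? [y/N] " answer
  if [[ "$answer" != [yY] ]]; then
    echo "Aborted." >&2
    return 1
  fi
  "$kops" update cluster --name "$name" --yes          # writes the new launch configuration
}
```

Any answer other than `y` or `Y` (including an empty line) leaves the cluster untouched.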
- Upgrade just the masters
  - Run `kops rolling-update cluster` to see all instance groups and which ones need updating (without `--yes`, this only previews the update). Take note of the names of the master instance groups.
  - Run `kops rolling-update cluster --yes` and pass the `--instance-group` flag once per master instance group, like this: `kops rolling-update cluster --instance-group master-us-east-1a --instance-group master-us-east-1b --instance-group master-us-east-1c --yes`
  - This takes a while because each master is terminated and replaced one at a time.
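Rather than typing each `--instance-group` flag, you can build them from `kops get ig` output. This is a minimal sketch assuming the default table output, where the second column is the role and master groups show `Master` (newer kops releases may label them differently). The function names and the `KOPS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Read "kops get ig" table output on stdin; print the master instance group names.
master_igs() {
  awk '$2 == "Master" { print $1 }'
}

# Roll only the masters: one --instance-group flag per master group.
roll_masters() {  # usage: kops get ig --name <cluster> | roll_masters <cluster>
  local kops="${KOPS:-kops}" name="$1" igs ig
  local -a flags=()
  igs=$(master_igs)
  if [ -z "$igs" ]; then
    echo "No master instance groups found." >&2
    return 1
  fi
  for ig in $igs; do
    flags+=(--instance-group "$ig")
  done
  "$kops" rolling-update cluster --name "$name" "${flags[@]}" --yes
}
```

Filtering on the role column rather than on a `master-` name prefix keeps the sketch working if the groups are named differently.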
- Double the cluster size by doubling the mins and maxes of the node instance groups
  - Edit each kops instance group, for example `kops edit ig nodes-us-east-1a`, then apply the change with `kops update cluster --yes`.
  - If you are using ELBs, take a moment here to validate that your new nodes are healthy in the ELB.
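`kops edit ig` opens the spec in an editor. A scriptable alternative is to export the instance group with `kops get ig <name> -o yaml`, double its `minSize` and `maxSize`, and load it back with `kops replace -f`; the change reaches the AWS Auto Scaling group only after `kops update cluster --yes`. The second function checks instance health with `aws elb describe-instance-health`, assuming classic ELBs. This is a minimal sketch; the function names and the `AWS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Double minSize and maxSize in an InstanceGroup manifest (stdin -> stdout).
double_ig_size() {
  awk '
    $1 == "minSize:" || $1 == "maxSize:" {
      indent = substr($0, 1, index($0, $1) - 1)   # keep the original indentation
      printf "%s%s %d\n", indent, $1, $2 * 2
      next
    }
    { print }
  '
}

# Print the IDs of instances that a classic ELB does not report as InService.
unhealthy_elb_instances() {  # usage: unhealthy_elb_instances <elb-name>
  "${AWS:-aws}" elb describe-instance-health --load-balancer-name "$1" \
    --query 'InstanceStates[].[InstanceId,State]' --output text |
    awk '$2 != "InService" { print $1 }'
}
```

For example: `kops get ig nodes-us-east-1a -o yaml | double_ig_size > ig.yaml`, then `kops replace -f ig.yaml` and `kops update cluster --yes`. Empty output from `unhealthy_elb_instances` means every registered instance is `InService`.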
- Check your load balancer services and change `externalTrafficPolicy` from `Local` to `Cluster`. Here's an example of how to find and edit those services:
  - `kubectl get svc --all-namespaces | grep LoadBalancer`
  - `kubectl -n infra edit svc nginx-ingress-external-controller`
  - `kubectl -n infra edit svc nginx-ingress-internal-controller`
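Because only some services may use `Local`, it helps to record exactly which ones you change so you can restore them afterwards. A minimal sketch with `kubectl`'s `custom-columns` output and a merge patch; the function names, the `KUBECTL` override, and the `local-services.txt` file are hypothetical.

```shell
#!/usr/bin/env bash
# Print "namespace name" for every LoadBalancer service whose policy is Local.
local_lb_services() {
  "${KUBECTL:-kubectl}" get svc --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type,POLICY:.spec.externalTrafficPolicy |
    awk '$3 == "LoadBalancer" && $4 == "Local" { print $1, $2 }'
}

# Read "namespace name" pairs on stdin and set externalTrafficPolicy on each service.
set_traffic_policy() {  # usage: set_traffic_policy Cluster|Local < local-services.txt
  local policy="$1" ns svc
  while read -r ns svc; do
    "${KUBECTL:-kubectl}" -n "$ns" patch svc "$svc" --type merge \
      -p "{\"spec\":{\"externalTrafficPolicy\":\"$policy\"}}" < /dev/null
  done
}
```

For example: `local_lb_services > local-services.txt`, then `set_traffic_policy Cluster < local-services.txt` now, and `set_traffic_policy Local < local-services.txt` when changing things back.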
- Cordon the old nodes
  - `kubectl get nodes | grep <old version> | awk '{print $1}' | xargs kubectl cordon`
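`grep <old version>` also matches longer versions that share the prefix (for example, `v1.10.1` matches `v1.10.11`). A minimal sketch that compares the version column exactly instead, assuming the default `kubectl get nodes` output where `VERSION` is the last column; the function names and the `KUBECTL` override are hypothetical.

```shell
#!/usr/bin/env bash
# Read "kubectl get nodes --no-headers" output; print nodes on exactly this version.
nodes_with_version() {  # usage: ... | nodes_with_version v1.10.11
  awk -v v="$1" '$NF == v { print $1 }'
}

# Cordon every node that still runs the given (old) kubelet version.
cordon_version() {  # usage: cordon_version <old-version>
  local kubectl="${KUBECTL:-kubectl}" node
  for node in $("$kubectl" get nodes --no-headers | nodes_with_version "$1"); do
    "$kubectl" cordon "$node"
  done
}
```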
- Drain the old nodes
  - `kubectl get nodes | grep SchedulingDisabled | awk '{print $1}' | xargs kubectl drain --ignore-daemonsets --delete-local-data --force`
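Draining one node at a time and stopping at the first failure lets you notice a node that cannot be drained (for example, because a PodDisruptionBudget blocks an eviction) before moving on to the rest. A minimal sketch, assuming the default `kubectl get nodes` output where a cordoned node's `STATUS` column reads `Ready,SchedulingDisabled`; the function names and the `KUBECTL` override are hypothetical. On kubectl 1.20 and later, `--delete-local-data` is deprecated in favor of `--delete-emptydir-data`.

```shell
#!/usr/bin/env bash
# Read "kubectl get nodes --no-headers" output; print cordoned node names.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ { print $1 }'
}

# Drain cordoned nodes one at a time; stop at the first node that fails.
drain_cordoned() {
  local kubectl="${KUBECTL:-kubectl}" node
  for node in $("$kubectl" get nodes --no-headers | cordoned_nodes); do
    echo "Draining $node" >&2
    "$kubectl" drain "$node" --ignore-daemonsets --delete-local-data --force || return 1
  done
}
```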
- Change back what you changed
  - `externalTrafficPolicy: Local`
  - The mins and maxes, with `kops edit ig` (then `kops update cluster --yes`)
Note: instead of turning off the cluster-autoscaler, you can raise the minimum node count on the ASG. The autoscaler respects that minimum (especially since we use node-group auto-discovery), so it won't try to delete any nodes during the upgrade as long as the minimum equals the doubled cluster size.
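If you take the other route and turn the cluster-autoscaler off for the duration of the upgrade, scaling its Deployment to zero replicas is one way to do it. A minimal sketch, assuming the autoscaler runs as a Deployment named `cluster-autoscaler` in `kube-system` (adjust both for your install); the function names, the `CA_*` variables, and the `KUBECTL` override are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed location of the autoscaler; adjust for your cluster.
CA_NAMESPACE="${CA_NAMESPACE:-kube-system}"
CA_DEPLOYMENT="${CA_DEPLOYMENT:-cluster-autoscaler}"

# Stop the autoscaler so it cannot remove nodes during the upgrade.
pause_autoscaler() {
  "${KUBECTL:-kubectl}" -n "$CA_NAMESPACE" scale deployment "$CA_DEPLOYMENT" --replicas=0
}

# Start it again afterwards (default: 1 replica).
resume_autoscaler() {  # usage: resume_autoscaler [replicas]
  "${KUBECTL:-kubectl}" -n "$CA_NAMESPACE" scale deployment "$CA_DEPLOYMENT" --replicas="${1:-1}"
}
```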
- A rolling update replaces members one at a time (or at whatever cadence is configured) to avoid downtime: a Deployment rolling update replaces pods, and `kops rolling-update` replaces instances.
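- Cordoning a node: no new pods will be scheduled on the node; existing pods keep running.
- Draining a node: cordons the node, then evicts all pods on it (DaemonSet pods are skipped with `--ignore-daemonsets`).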
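Cordoning sets the node's `spec.unschedulable` field to `true`, and a drained node should be left running only DaemonSet-managed pods. A minimal sketch of two checks you could run after draining, using a JSONPath query and a `spec.nodeName` field selector; the function names and the `KUBECTL` override are hypothetical. `kubectl uncordon <node>` reverses a cordon if you need to back out.

```shell
#!/usr/bin/env bash
# Exit status 0 if the node is cordoned (spec.unschedulable is true).
is_cordoned() {  # usage: is_cordoned <node-name>
  [ "$("${KUBECTL:-kubectl}" get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

# List the pods still scheduled on a node; after a drain, expect only DaemonSet pods.
pods_on_node() {  # usage: pods_on_node <node-name>
  "${KUBECTL:-kubectl}" get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=$1"
}
```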