
Upgrade Kubernetes Cluster

kimschles edited this page Mar 13, 2019 · 2 revisions

For a cluster built with AWS, pentagon, and kops

  1. Update the Kubernetes version and AMI image
  • In the cluster.yaml, update the Kubernetes version
  • In the nodes.yaml and masters.yaml, update the AMI image ID
  1. Replace those files using kops
  • Look at the kops.sh script. Make sure the verb replace is present, and comment out the secrets script if you don't need it
  • ./kops.sh
  1. Update the launch configuration
  • Run kops update cluster to preview the changes, then kops update cluster --yes to apply them
  • After this, any new nodes that launch will use the updated Kubernetes version and AMI image
  1. Upgrade just the masters
  • Run kops rolling-update cluster (without --yes) to list the instance groups and see which ones need updating
  • Take note of the name of each master instance group.
  • Run kops rolling-update cluster again, passing the instance group flag once per master ig. Without --yes, kops only prints a preview; add --yes to apply the update, like this:
    • kops rolling-update cluster --instance-group master-us-east-1a --instance-group master-us-east-1b --instance-group master-us-east-1c --yes
  • This takes a while because each master is terminated and replaced one at a time
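
The repeated --instance-group flags above can be assembled from a list of names. The following is a minimal sketch, not part of the original procedure: master_rolling_update_cmd is a hypothetical helper that only prints the command so it can be reviewed before anything is run. The instance group names are the ones from the example above.

```shell
# Hypothetical helper: print (do not run) a kops rolling-update command
# that targets only the instance groups passed as arguments.
master_rolling_update_cmd() {
  cmd="kops rolling-update cluster"
  for ig in "$@"; do
    cmd="$cmd --instance-group $ig"
  done
  printf '%s\n' "$cmd"
}

# Example: the three master instance groups from the step above.
master_rolling_update_cmd master-us-east-1a master-us-east-1b master-us-east-1c
```

The printed command is a preview only. kops does not replace any instances unless --yes is added when the command is actually run.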

  1. Double the cluster size by doubling the min and max size of each node instance group
  • Edit the kops instance groups:
    • kops edit ig nodes-us-east-1a
  • Apply the change with kops update cluster --yes
  1. If you are using ELBs, take a moment here to validate that your new nodes are Healthy in the ELB.
  2. Check your load balancer services and change the externalTrafficPolicy from Local to Cluster. Here's an example of how to find and edit those services:
    • kubectl get svc --all-namespaces | grep LoadBalancer
    • kubectl -n infra edit svc nginx-ingress-external-controller
    • kubectl -n infra edit svc nginx-ingress-internal-controller
  3. Cordon the old nodes
    • kubectl get nodes | grep <old version> | awk '{print $1}' | xargs kubectl cordon
  4. Drain the old nodes
    • kubectl get nodes | grep SchedulingDisabled | awk '{print $1}' | xargs kubectl drain --ignore-daemonsets --delete-local-data --force
  5. Revert the temporary changes
    • Set externalTrafficPolicy back to Local
    • Restore the original mins and maxes with kops edit ig
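
The cordon step above selects old nodes with grep, which matches the version string anywhere on the line. A minimal sketch of a stricter filter follows. It assumes the default kubectl get nodes column layout (NAME, STATUS, ROLES, AGE, VERSION); nodes_with_version is a hypothetical helper name, and the sample node names and versions are made up for illustration.

```shell
# Hypothetical helper: read `kubectl get nodes` output on stdin and print the
# names of nodes whose VERSION column (assumed to be the fifth) equals $1.
nodes_with_version() {
  awk -v ver="$1" 'NR > 1 && $5 == ver { print $1 }'
}

# Made-up sample standing in for real `kubectl get nodes` output.
sample='NAME            STATUS   ROLES    AGE   VERSION
ip-10-0-1-10    Ready    master   2d    v1.11.7
ip-10-0-2-20    Ready    node     90d   v1.10.12
ip-10-0-2-21    Ready    node     1h    v1.11.7
ip-10-0-2-22    Ready    node     90d   v1.10.12'

# Prints the two nodes still on the old version.
printf '%s\n' "$sample" | nodes_with_version v1.10.12
```

Against a real cluster, the same filter could replace the grep and awk stages: kubectl get nodes | nodes_with_version <old version> | xargs kubectl cordon.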

Note: instead of turning off the cluster-autoscaler, you can raise the minimum node count on the ASG. The autoscaler respects that minimum (especially since we are using automatic node-group detection), so it won't try to delete any nodes during the upgrade while the minimum is set to the doubled cluster size.
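
To preview what the doubled sizes would look like before editing an instance group, the arithmetic can be sketched as a small filter. This is an illustration under assumptions, not the procedure used here: double_ig_sizes is a hypothetical helper, and the manifest fragment is made up. A real manifest could presumably be obtained with something like kops get ig nodes-us-east-1a -o yaml.

```shell
# Hypothetical helper: read an instance group manifest on stdin and print it
# with the minSize and maxSize values doubled. Other lines pass through as-is.
double_ig_sizes() {
  awk '
    /^[ \t]*(minSize|maxSize):[ \t]*[0-9]+[ \t]*$/ {
      doubled = $2 * 2
      sub(/[0-9]+[ \t]*$/, doubled)
    }
    { print }
  '
}

# Made-up manifest fragment for illustration only.
sample='spec:
  machineType: m4.large
  maxSize: 6
  minSize: 3
  role: Node'

printf '%s\n' "$sample" | double_ig_sizes
```

The helper only prints the modified text. Nothing changes in the cluster until the new values are saved with kops edit ig and applied.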


Vocabulary

  • A rolling update means that the deployment updates pods one at a time (or at whatever cadence is configured) so there is no downtime.
  • Cordoning a node: marks the node unschedulable, so no new pods will be scheduled on it
  • Draining a node: evicts all pods from the node