
not possible to rolling update cluster #9953

Closed
zetaab opened this issue Sep 16, 2020 · 9 comments · Fixed by #9998
Labels: blocks-next, kind/bug, priority/important-soon

Comments

@zetaab
Member

zetaab commented Sep 16, 2020

1. What kops version are you running? The command kops version will display
this information.

1.19 alpha 4

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.19.1

3. What cloud provider are you using?

openstack / aws

4. What commands did you run? What is the simplest way to reproduce this issue?

I am updating clusters from 1.17 / 1.18 -> 1.19.1

kops update cluster --yes && kops rolling-update --yes

5. What happened after the commands executed?

The bastion is rotated fine. However, after the bastion is rotated, something executes the new manifests for kops-controller. This leads to the following situation:

kops-controller-68fc4                                        1/1     Running             2          20d
kops-controller-9hq4s                                        1/1     Running             0          15d
kops-controller-b7jrx                                        0/1     ContainerCreating   0          20m
% kubectl describe pod kops-controller-b7jrx

  Warning  FailedMount  14s (x18 over 20m)   kubelet, master-zone-1-1-1-jannem-k8s-local  MountVolume.SetUp failed for volume "kops-controller-pki" : hostPath type check failed: /etc/kubernetes/kops-controller/ is not a directory

So the new kops-controller manifest is NOT backwards compatible, which means the only way to update kops cluster masters currently is to roll ALL masters at once (or skip cluster validation, or modify the kops-controller manifest and be fast). The folder does not exist (yet) on old masters installed by an older kops version; it is only created by the newest kops version. So the manifest should not be updated before the folder actually exists.
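
For reference, the failing mount corresponds roughly to the hostPath volume sketched below; this is reconstructed from the error message above, not copied from the actual kops-controller manifest:

volumes:
- name: kops-controller-pki
  hostPath:
    path: /etc/kubernetes/kops-controller/
    type: Directory   # this type check fails on masters built by older kops, where the path does not exist yet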

6. What did you expect to happen?

I expect the rolling update to work as usual.

@zetaab zetaab added the kind/bug and priority/important-soon labels on Sep 16, 2020
@zetaab
Member Author

zetaab commented Sep 16, 2020

@johngmyers as you authored #9653, which causes this bug, do you have any ideas how we could fix this properly? My fix is a dirty one and will not work in all cases: there might still be one old kops-controller left after a full kops rolling-update cluster.

This happens only if the cluster was created with a kops version before 1.19, e.g. kops 1.18.

@zetaab
Member Author

zetaab commented Sep 16, 2020

OK, we worked around this in the following way:

kubectl apply -f hack.yaml && sleep 20 && kubectl delete ds folderfix -n kube-system && kubectl delete pods -n kube-system -l k8s-app=kops-controller

where hack.yaml is the following:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: folderfix
  namespace: kube-system
  labels:
    app.kubernetes.io/name: folderfix
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: folderfix
  template:
    metadata:
      labels:
        app.kubernetes.io/name: folderfix
    spec:
      containers:
      - name: folderfix
        # one-shot container: create the directory the new kops-controller manifest expects
        image: busybox
        command: [ "sh", "-c", "mkdir -p /etc/kubernetes/kops-controller/" ]
        volumeMounts:
        - name: files
          mountPath: /etc/kubernetes
      volumes:
      - name: files
        hostPath:
          path: /etc/kubernetes
      # run only on control-plane nodes, which is where kops-controller is scheduled
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists

Warning: tested only on OpenStack.

@johngmyers
Member

Setting an updateStrategy of OnDelete seems appropriate. A static manifest would probably be even more appropriate, but there might be other issues with that.

The hack is likely to result in non-working (or only partially-working) kops-controllers on AWS, as it won't provision the keys/certs that the AWS bootstrap server needs. It might be enough to get cluster validation to pass long enough for the control plane to update. A simpler form of that hack would be to make the kops-controller-pki volume type: DirectoryOrCreate.
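
As an illustration, the two suggestions above (an OnDelete update strategy and a DirectoryOrCreate hostPath type) would look roughly like this in the kops-controller DaemonSet spec; this is a sketch, not the actual manifest:

spec:
  updateStrategy:
    type: OnDelete                  # pods are only replaced once the old pod (or its node) is deleted
  template:
    spec:
      volumes:
      - name: kops-controller-pki
        hostPath:
          path: /etc/kubernetes/kops-controller/
          type: DirectoryOrCreate   # kubelet creates the directory if it is missing instead of failing the mount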

I wonder if we could put an additional nodeSelector on the DaemonSet to keep it from scheduling on old nodes.

As a separate issue, we might want to make it so that bastions don't apply addons. I'm a bit concerned that they even have the credentials to be able to do that. Or is it that the old control plane nodes are picking up and applying the new set of addon manifests?

@johngmyers
Member

If we do go with OnDelete we might need to put a hash derived from the manifest in the NodeupConfig of masters in order to make sure rolling update will deploy any changes.

@zetaab
Member Author

zetaab commented Sep 17, 2020

@johngmyers I can confirm that this workaround does not work on AWS; it only works on OpenStack.

@olemarkus
Member

> If we do go with OnDelete we might need to put a hash derived from the manifest in the NodeupConfig of masters in order to make sure rolling update will deploy any changes.

The same "problem" can exist for DaemonSets too, so it is not only about things running on masters. We could add a nodeSelector to the channels API, and when that selector is set, the channels command would set the kops.k8s.io/needs-update annotation on the matching nodes after running its normal kubectl apply, as sketched below.
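
An illustrative sketch of what channels could do after its apply (the node selector here is just an example, and the annotation value is arbitrary):

kubectl annotate node -l node-role.kubernetes.io/master="" kops.k8s.io/needs-update="$(date +%s)" --overwrite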

@alesanmed

alesanmed commented Oct 8, 2020

@johngmyers So sorry to ask again, but I'm having this exact same problem after upgrading from 1.18.1 to 1.19.0... How can I fix this? :(

PS: Going back to v1.18.1 is definitely an option I tried, but I still get the same problem. Thanks

@olemarkus
Member

@alesanmed can you file a new issue about this?

@alesanmed

@olemarkus Sorry for not replying. I managed to roll back the cluster version and, since I've seen that v1.19 is still an alpha, I'll wait for a stable release. I don't want to bother with issues since I'm sure they're going to fix them. Thanks!!
