Unable to revert deployment to previous version #43948

Closed
justinsb opened this issue Apr 1, 2017 · 10 comments
Labels: area/workload-api/deployment · kind/bug · sig/apps

Comments

justinsb (Member) commented Apr 1, 2017

K8s 1.6.0: When I try setting a deployment back to a version that already existed (i.e. A -> B -> A), the following error is logged in the kube-controller-manager:

I0401 15:21:52.223300       5 deployment_controller.go:480] Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists
I0401 15:21:52.232857       5 deployment_controller.go:480] Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists
I0401 15:21:52.247466       5 deployment_controller.go:480] Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists
I0401 15:21:52.272142       5 deployment_controller.go:480] Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists
I0401 15:21:52.319308       5 deployment_controller.go:480] Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists
E0401 15:21:52.404181       5 deployment_controller.go:485] replicasets.extensions "kube-dns-1321724180" already exists
I0401 15:21:52.404279       5 deployment_controller.go:486] Dropping deployment "kube-system/kube-dns" out of the queue: replicasets.extensions "kube-dns-1321724180" already exists

(In my case, I was changing the kube-dns config map from optional: true -> optional: false -> optional: true, around a 1.5 -> 1.6 -> 1.5 upgrade / downgrade)

This is particularly problematic because the deployment never picks up the new configuration: the pods retain their old configuration.

I was able to reproduce this repeatedly, but once I deleted the ReplicaSet the system recovered, and I could no longer reproduce it.
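
For anyone hitting the same thing, the manual recovery was just finding and deleting the ReplicaSet whose name appears in the error (a sketch of the commands, using the RS name from the log above):

kubectl get rs -n kube-system -l k8s-app=kube-dns
kubectl delete rs kube-dns-1321724180 -n kube-system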

justinsb closed this as completed Apr 1, 2017
justinsb changed the title from "Unable to revert deployment" to "Unable to revert deployment to previous version" Apr 1, 2017
justinsb reopened this Apr 1, 2017
@0xmichalis (Contributor)

Is this from local testing? Can you post the full log from the controller manager? Can you also post the output of:

kubectl get deploy,rs,pods -l kube-dns_label_here -n kube-system -o yaml
kubectl get events -n kube-system

@0xmichalis (Contributor)

It seems that the controller manager was fast enough to process the kube-dns deployment in under 0.2 seconds, before the cache got updated with the new ReplicaSet. That can happen when your queues are empty. We need to verify that rate-limiting actually works and maybe revisit its settings. We probably also want to bump the retries to something higher; the deployment controller is the only one of the workload controllers that drops objects out of its queue after some retry count.
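
To make the retry/drop behaviour concrete, here is a simplified sketch of the pattern (illustrative only, not the controller's actual code; maxRetries and handleErr are stand-in names), built on client-go's workqueue:

```go
// Simplified sketch of the bounded-retry pattern described above,
// built on client-go's workqueue. Illustrative only: maxRetries and
// handleErr are stand-in names, not the controller's actual code.
package controller

import "k8s.io/client-go/util/workqueue"

const maxRetries = 5 // a small budget lets a hot loop exhaust it quickly

func handleErr(queue workqueue.RateLimitingInterface, err error, key interface{}) {
	if err == nil {
		queue.Forget(key) // success: reset this key's rate-limiter state
		return
	}
	if queue.NumRequeues(key) < maxRetries {
		queue.AddRateLimited(key) // requeue with exponential backoff
		return
	}
	// Budget exhausted: stop tracking the key -- this is the
	// "Dropping deployment ... out of the queue" log line above.
	queue.Forget(key)
}
```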

0xmichalis added the area/workload-api/deployment and sig/apps labels Apr 2, 2017
@0xmichalis (Contributor)

@kubernetes/sig-apps-bugs @deads2k

@0xmichalis (Contributor)

Rate-limiting works fine; we just need to add more retries.

justinsb (Member, Author) commented Apr 4, 2017

Thanks for looking into it - I was able to reproduce it. My suspicion is that it is not a race, but rather that it is caused by fields newly introduced across versions.

> kubectl get deploy,rs,pods -l k8s-addon=kube-dns.addons.k8s.io -n kube-system -o yaml

(I inserted some section breaks for readability)

apiVersion: v1
items:
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "4"
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{},"labels":{"k8s-addon":"kube-dns.addons.k8s.io","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true"},"name":"kube-dns","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"kube-dns"}},"strategy":{"rollingUpdate":{"maxSurge":"10%","maxUnavailable":0}},"template":{"metadata":{"annotations":{"scheduler.alpha.kubernetes.io/critical-pod":"","scheduler.alpha.kubernetes.io/tolerations":"[{\"key\":\"CriticalAddonsOnly\", \"operator\":\"Exists\"}]"},"labels":{"k8s-app":"kube-dns"}},"spec":{"containers":[{"args":["--domain=cluster.local.","--dns-port=10053","--config-dir=/kube-dns-config","--v=2"],"env":[{"name":"PROMETHEUS_PORT","value":"10055"}],"image":"gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthcheck/kubedns","port":10054,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"kubedns","ports":[{"containerPort":10053,"name":"dns-local","protocol":"UDP"},{"containerPort":10053,"name":"dns-tcp-local","protocol":"TCP"},{"containerPort":10055,"name":"metrics","protocol":"TCP"}],"readinessProbe":{"httpGet":{"path":"/readiness","port":8081,"scheme":"HTTP"},"initialDelaySeconds":3,"timeoutSeconds":5},"resources":{"limits":{"memory":"170Mi"},"requests":{"cpu":"100m","memory":"70Mi"}},"volumeMounts":[{"mountPath":"/kube-dns-config","name":"kube-dns-config"}]},{"args":["-v=2","-logtostderr","-configDir=/etc/k8s/dns/dnsmasq-nanny","-restartDnsmasq=true","--","-k","--cache-size=1000","--log-facility=-","--server=/cluster.local/127.0.0.1#10053","--server=/in-addr.arpa/127.0.0.1#10053","--server=/in6.arpa/127.0.0.1#10053"],"image":"gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/healthcheck/dnsmasq","port":10054,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"dnsmasq","ports":[{"containerPort":53,"name":"dns","protocol":"UDP"},{"containerPort":53,"name":"dns-tcp","protocol":"TCP"}],"resources":{"requests":{"cpu":"150m","memory":"20Mi"}},"volumeMounts":[{"mountPath":"/etc/k8s/dns/dnsmasq-nanny","name":"kube-dns-config"}]},{"args":["--v=2","--logtostderr","--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A","--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A"],"image":"gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1","livenessProbe":{"failureThreshold":5,"httpGet":{"path":"/metrics","port":10054,"scheme":"HTTP"},"initialDelaySeconds":60,"successThreshold":1,"timeoutSeconds":5},"name":"sidecar","ports":[{"containerPort":10054,"name":"metrics","protocol":"TCP"}],"resources":{"requests":{"cpu":"10m","memory":"20Mi"}}}],"dnsPolicy":"Default","serviceAccountName":"kube-dns","volumes":[{"configMap":{"name":"kube-dns","optional":true},"name":"kube-dns-config"}]}}}}
    creationTimestamp: 2017-04-03T15:27:40Z
    generation: 5
    labels:
      k8s-addon: kube-dns.addons.k8s.io
      k8s-app: kube-dns
      kubernetes.io/cluster-service: "true"
    name: kube-dns
    namespace: kube-system
    resourceVersion: "44633"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/deployments/kube-dns
    uid: 1323102f-1882-11e7-a2e0-0aced200add4
  spec:
    replicas: 2
    selector:
      matchLabels:
        k8s-app: kube-dns
    strategy:
      rollingUpdate:
        maxSurge: 10%
        maxUnavailable: 0
      type: RollingUpdate
    template:
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
          scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
            "operator":"Exists"}]'
        creationTimestamp: null
        labels:
          k8s-app: kube-dns
      spec:
        containers:
        - args:
          - --domain=cluster.local.
          - --dns-port=10053
          - --config-dir=/kube-dns-config
          - --v=2
          env:
          - name: PROMETHEUS_PORT
            value: "10055"
          image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/kubedns
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: kubedns
          ports:
          - containerPort: 10053
            name: dns-local
            protocol: UDP
          - containerPort: 10053
            name: dns-tcp-local
            protocol: TCP
          - containerPort: 10055
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readiness
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /kube-dns-config
            name: kube-dns-config
        - args:
          - -v=2
          - -logtostderr
          - -configDir=/etc/k8s/dns/dnsmasq-nanny
          - -restartDnsmasq=true
          - --
          - -k
          - --cache-size=1000
          - --log-facility=-
          - --server=/cluster.local/127.0.0.1#10053
          - --server=/in-addr.arpa/127.0.0.1#10053
          - --server=/in6.arpa/127.0.0.1#10053
          image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/dnsmasq
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: dnsmasq
          ports:
          - containerPort: 53
            name: dns
            protocol: UDP
          - containerPort: 53
            name: dns-tcp
            protocol: TCP
          resources:
            requests:
              cpu: 150m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/k8s/dns/dnsmasq-nanny
            name: kube-dns-config
        - args:
          - --v=2
          - --logtostderr
          - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
          - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
          image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: sidecar
          ports:
          - containerPort: 10054
            name: metrics
            protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: Default
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: kube-dns
        serviceAccountName: kube-dns
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: kube-dns
            optional: true
          name: kube-dns-config
  status:
    conditions:
    - lastTransitionTime: 2017-04-03T16:59:39Z
      lastUpdateTime: 2017-04-03T16:59:39Z
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    observedGeneration: 4
    replicas: 2
    unavailableReplicas: 2
    updatedReplicas: 2
- apiVersion: extensions/v1beta1
  kind: ReplicaSet
  metadata:
    annotations:
      deployment.kubernetes.io/desired-replicas: "2"
      deployment.kubernetes.io/max-replicas: "3"
      deployment.kubernetes.io/revision: "3"
    creationTimestamp: 2017-04-03T15:55:37Z
    generation: 5
    labels:
      k8s-app: kube-dns
      pod-template-hash: "1321724180"
    name: kube-dns-1321724180
    namespace: kube-system
    ownerReferences:
    - apiVersion: extensions/v1beta1
      controller: true
      kind: Deployment
      name: kube-dns
      uid: 1323102f-1882-11e7-a2e0-0aced200add4
    resourceVersion: "6967"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/replicasets/kube-dns-1321724180
    uid: fa88b571-1885-11e7-985a-0abf8cb3620a
  spec:
    replicas: 0
    selector:
      matchLabels:
        k8s-app: kube-dns
        pod-template-hash: "1321724180"
    template:
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
          scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
            "operator":"Exists"}]'
        creationTimestamp: null
        labels:
          k8s-app: kube-dns
          pod-template-hash: "1321724180"
      spec:
        containers:
        - args:
          - --domain=cluster.local.
          - --dns-port=10053
          - --config-dir=/kube-dns-config
          - --v=2
          env:
          - name: PROMETHEUS_PORT
            value: "10055"
          image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/kubedns
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: kubedns
          ports:
          - containerPort: 10053
            name: dns-local
            protocol: UDP
          - containerPort: 10053
            name: dns-tcp-local
            protocol: TCP
          - containerPort: 10055
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readiness
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /kube-dns-config
            name: kube-dns-config
        - args:
          - -v=2
          - -logtostderr
          - -configDir=/etc/k8s/dns/dnsmasq-nanny
          - -restartDnsmasq=true
          - --
          - -k
          - --cache-size=1000
          - --log-facility=-
          - --server=/cluster.local/127.0.0.1#10053
          - --server=/in-addr.arpa/127.0.0.1#10053
          - --server=/in6.arpa/127.0.0.1#10053
          image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/dnsmasq
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: dnsmasq
          ports:
          - containerPort: 53
            name: dns
            protocol: UDP
          - containerPort: 53
            name: dns-tcp
            protocol: TCP
          resources:
            requests:
              cpu: 150m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/k8s/dns/dnsmasq-nanny
            name: kube-dns-config
        - args:
          - --v=2
          - --logtostderr
          - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
          - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
          image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: sidecar
          ports:
          - containerPort: 10054
            name: metrics
            protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: Default
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: kube-dns
        serviceAccountName: kube-dns
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: kube-dns
          name: kube-dns-config
  status:
    observedGeneration: 5
    replicas: 0
- apiVersion: extensions/v1beta1
  kind: ReplicaSet
  metadata:
    annotations:
      deployment.kubernetes.io/desired-replicas: "2"
      deployment.kubernetes.io/max-replicas: "3"
      deployment.kubernetes.io/revision: "1"
    creationTimestamp: 2017-04-03T15:27:40Z
    generation: 4
    labels:
      k8s-app: kube-dns
      pod-template-hash: "141550303"
    name: kube-dns-141550303
    namespace: kube-system
    ownerReferences:
    - apiVersion: extensions/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Deployment
      name: kube-dns
      uid: 1323102f-1882-11e7-a2e0-0aced200add4
    resourceVersion: "3624"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/replicasets/kube-dns-141550303
    uid: 134bd95b-1882-11e7-a2e0-0aced200add4
  spec:
    replicas: 0
    selector:
      matchLabels:
        k8s-app: kube-dns
        pod-template-hash: "141550303"
    template:
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
          scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
            "operator":"Exists"}]'
        creationTimestamp: null
        labels:
          k8s-app: kube-dns
          pod-template-hash: "141550303"
      spec:
        containers:
        - args:
          - --domain=cluster.local.
          - --dns-port=10053
          - --config-dir=/kube-dns-config
          - --v=2
          env:
          - name: PROMETHEUS_PORT
            value: "10055"
          image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/kubedns
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: kubedns
          ports:
          - containerPort: 10053
            name: dns-local
            protocol: UDP
          - containerPort: 10053
            name: dns-tcp-local
            protocol: TCP
          - containerPort: 10055
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readiness
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /kube-dns-config
            name: kube-dns-config
        - args:
          - -v=2
          - -logtostderr
          - -configDir=/etc/k8s/dns/dnsmasq-nanny
          - -restartDnsmasq=true
          - --
          - -k
          - --cache-size=1000
          - --log-facility=-
          - --server=/cluster.local/127.0.0.1#10053
          - --server=/in-addr.arpa/127.0.0.1#10053
          - --server=/in6.arpa/127.0.0.1#10053
          image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/dnsmasq
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: dnsmasq
          ports:
          - containerPort: 53
            name: dns
            protocol: UDP
          - containerPort: 53
            name: dns-tcp
            protocol: TCP
          resources:
            requests:
              cpu: 150m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/k8s/dns/dnsmasq-nanny
            name: kube-dns-config
        - args:
          - --v=2
          - --logtostderr
          - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
          - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
          image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: sidecar
          ports:
          - containerPort: 10054
            name: metrics
            protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: Default
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: kube-dns
          name: kube-dns-config
  status:
    observedGeneration: 4
    replicas: 0
- apiVersion: extensions/v1beta1
  kind: ReplicaSet
  metadata:
    annotations:
      deployment.kubernetes.io/desired-replicas: "2"
      deployment.kubernetes.io/max-replicas: "3"
      deployment.kubernetes.io/revision: "4"
      deployment.kubernetes.io/revision-history: "2"
    creationTimestamp: 2017-04-03T15:32:47Z
    generation: 4
    labels:
      k8s-app: kube-dns
      pod-template-hash: "1443162385"
    name: kube-dns-1443162385
    namespace: kube-system
    ownerReferences:
    - apiVersion: extensions/v1beta1
      controller: true
      kind: Deployment
      name: kube-dns
      uid: 1323102f-1882-11e7-a2e0-0aced200add4
    resourceVersion: "8179"
    selfLink: /apis/extensions/v1beta1/namespaces/kube-system/replicasets/kube-dns-1443162385
    uid: ca3799d2-1882-11e7-a2e0-0aced200add4
  spec:
    replicas: 2
    selector:
      matchLabels:
        k8s-app: kube-dns
        pod-template-hash: "1443162385"
    template:
      metadata:
        annotations:
          scheduler.alpha.kubernetes.io/critical-pod: ""
          scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly",
            "operator":"Exists"}]'
        creationTimestamp: null
        labels:
          k8s-app: kube-dns
          pod-template-hash: "1443162385"
      spec:
        containers:
        - args:
          - --domain=cluster.local.
          - --dns-port=10053
          - --config-dir=/kube-dns-config
          - --v=2
          env:
          - name: PROMETHEUS_PORT
            value: "10055"
          image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/kubedns
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: kubedns
          ports:
          - containerPort: 10053
            name: dns-local
            protocol: UDP
          - containerPort: 10053
            name: dns-tcp-local
            protocol: TCP
          - containerPort: 10055
            name: metrics
            protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /readiness
              port: 8081
              scheme: HTTP
            initialDelaySeconds: 3
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          resources:
            limits:
              memory: 170Mi
            requests:
              cpu: 100m
              memory: 70Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /kube-dns-config
            name: kube-dns-config
        - args:
          - -v=2
          - -logtostderr
          - -configDir=/etc/k8s/dns/dnsmasq-nanny
          - -restartDnsmasq=true
          - --
          - -k
          - --cache-size=1000
          - --log-facility=-
          - --server=/cluster.local/127.0.0.1#10053
          - --server=/in-addr.arpa/127.0.0.1#10053
          - --server=/in6.arpa/127.0.0.1#10053
          image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthcheck/dnsmasq
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: dnsmasq
          ports:
          - containerPort: 53
            name: dns
            protocol: UDP
          - containerPort: 53
            name: dns-tcp
            protocol: TCP
          resources:
            requests:
              cpu: 150m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
          - mountPath: /etc/k8s/dns/dnsmasq-nanny
            name: kube-dns-config
        - args:
          - --v=2
          - --logtostderr
          - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
          - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
          image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /metrics
              port: 10054
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 5
          name: sidecar
          ports:
          - containerPort: 10054
            name: metrics
            protocol: TCP
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: Default
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        serviceAccount: kube-dns
        serviceAccountName: kube-dns
        terminationGracePeriodSeconds: 30
        volumes:
        - configMap:
            defaultMode: 420
            name: kube-dns
          name: kube-dns-config
  status:
    fullyLabeledReplicas: 2
    observedGeneration: 4
    replicas: 2
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-1443162385","uid":"ca3799d2-1882-11e7-a2e0-0aced200add4","apiVersion":"extensions","resourceVersion":"8149"}}
      scheduler.alpha.kubernetes.io/critical-pod: ""
      scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    creationTimestamp: 2017-04-03T16:59:59Z
    generateName: kube-dns-1443162385-
    labels:
      k8s-app: kube-dns
      pod-template-hash: "1443162385"
    name: kube-dns-1443162385-l485c
    namespace: kube-system
    ownerReferences:
    - apiVersion: extensions/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: kube-dns-1443162385
      uid: ca3799d2-1882-11e7-a2e0-0aced200add4
    resourceVersion: "43273"
    selfLink: /api/v1/namespaces/kube-system/pods/kube-dns-1443162385-l485c
    uid: f889cde2-188e-11e7-996e-0a2a5c43cf94
  spec:
    containers:
    - args:
      - --domain=cluster.local.
      - --dns-port=10053
      - --config-dir=/kube-dns-config
      - --v=2
      env:
      - name: PROMETHEUS_PORT
        value: "10055"
      image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /healthcheck/kubedns
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: kubedns
      ports:
      - containerPort: 10053
        name: dns-local
        protocol: UDP
      - containerPort: 10053
        name: dns-tcp-local
        protocol: TCP
      - containerPort: 10055
        name: metrics
        protocol: TCP
      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /readiness
          port: 8081
          scheme: HTTP
        initialDelaySeconds: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      resources:
        limits:
          memory: 170Mi
        requests:
          cpu: 100m
          memory: 70Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /kube-dns-config
        name: kube-dns-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    - args:
      - -v=2
      - -logtostderr
      - -configDir=/etc/k8s/dns/dnsmasq-nanny
      - -restartDnsmasq=true
      - --
      - -k
      - --cache-size=1000
      - --log-facility=-
      - --server=/cluster.local/127.0.0.1#10053
      - --server=/in-addr.arpa/127.0.0.1#10053
      - --server=/in6.arpa/127.0.0.1#10053
      image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /healthcheck/dnsmasq
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: dnsmasq
      ports:
      - containerPort: 53
        name: dns
        protocol: UDP
      - containerPort: 53
        name: dns-tcp
        protocol: TCP
      resources:
        requests:
          cpu: 150m
          memory: 20Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /etc/k8s/dns/dnsmasq-nanny
        name: kube-dns-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    - args:
      - --v=2
      - --logtostderr
      - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
      - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
      image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /metrics
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: sidecar
      ports:
      - containerPort: 10054
        name: metrics
        protocol: TCP
      resources:
        requests:
          cpu: 10m
          memory: 20Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    dnsPolicy: Default
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: kube-dns
    serviceAccountName: kube-dns
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.alpha.kubernetes.io/notReady
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.alpha.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - configMap:
        defaultMode: 420
        name: kube-dns
      name: kube-dns-config
    - name: kube-dns-token-pp102
      secret:
        defaultMode: 420
        secretName: kube-dns-token-pp102
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-04-03T16:59:59Z
      message: 'No nodes are available that match all of the following predicates::
        Insufficient cpu (1), PodToleratesNodeTaints (1).'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
    phase: Pending
    qosClass: Burstable
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      kubernetes.io/created-by: |
        {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"kube-dns-1443162385","uid":"ca3799d2-1882-11e7-a2e0-0aced200add4","apiVersion":"extensions","resourceVersion":"8111"}}
      scheduler.alpha.kubernetes.io/critical-pod: ""
      scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
    creationTimestamp: 2017-04-03T16:59:58Z
    generateName: kube-dns-1443162385-
    labels:
      k8s-app: kube-dns
      pod-template-hash: "1443162385"
    name: kube-dns-1443162385-vx74w
    namespace: kube-system
    ownerReferences:
    - apiVersion: extensions/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: kube-dns-1443162385
      uid: ca3799d2-1882-11e7-a2e0-0aced200add4
    resourceVersion: "43274"
    selfLink: /api/v1/namespaces/kube-system/pods/kube-dns-1443162385-vx74w
    uid: f83b6c3a-188e-11e7-996e-0a2a5c43cf94
  spec:
    containers:
    - args:
      - --domain=cluster.local.
      - --dns-port=10053
      - --config-dir=/kube-dns-config
      - --v=2
      env:
      - name: PROMETHEUS_PORT
        value: "10055"
      image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /healthcheck/kubedns
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: kubedns
      ports:
      - containerPort: 10053
        name: dns-local
        protocol: UDP
      - containerPort: 10053
        name: dns-tcp-local
        protocol: TCP
      - containerPort: 10055
        name: metrics
        protocol: TCP
      readinessProbe:
        failureThreshold: 3
        httpGet:
          path: /readiness
          port: 8081
          scheme: HTTP
        initialDelaySeconds: 3
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      resources:
        limits:
          memory: 170Mi
        requests:
          cpu: 100m
          memory: 70Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /kube-dns-config
        name: kube-dns-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    - args:
      - -v=2
      - -logtostderr
      - -configDir=/etc/k8s/dns/dnsmasq-nanny
      - -restartDnsmasq=true
      - --
      - -k
      - --cache-size=1000
      - --log-facility=-
      - --server=/cluster.local/127.0.0.1#10053
      - --server=/in-addr.arpa/127.0.0.1#10053
      - --server=/in6.arpa/127.0.0.1#10053
      image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /healthcheck/dnsmasq
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: dnsmasq
      ports:
      - containerPort: 53
        name: dns
        protocol: UDP
      - containerPort: 53
        name: dns-tcp
        protocol: TCP
      resources:
        requests:
          cpu: 150m
          memory: 20Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /etc/k8s/dns/dnsmasq-nanny
        name: kube-dns-config
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    - args:
      - --v=2
      - --logtostderr
      - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,A
      - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,A
      image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.1
      imagePullPolicy: IfNotPresent
      livenessProbe:
        failureThreshold: 5
        httpGet:
          path: /metrics
          port: 10054
          scheme: HTTP
        initialDelaySeconds: 60
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: sidecar
      ports:
      - containerPort: 10054
        name: metrics
        protocol: TCP
      resources:
        requests:
          cpu: 10m
          memory: 20Mi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-dns-token-pp102
        readOnly: true
    dnsPolicy: Default
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: kube-dns
    serviceAccountName: kube-dns
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.alpha.kubernetes.io/notReady
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.alpha.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - configMap:
        defaultMode: 420
        name: kube-dns
      name: kube-dns-config
    - name: kube-dns-token-pp102
      secret:
        defaultMode: 420
        secretName: kube-dns-token-pp102
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2017-04-03T16:59:58Z
      message: 'No nodes are available that match all of the following predicates::
        Insufficient cpu (1), PodToleratesNodeTaints (1).'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
    phase: Pending
    qosClass: Burstable

justinsb (Member, Author) commented Apr 4, 2017

The error was

Error syncing deployment kube-system/kube-dns: replicasets.extensions "kube-dns-1321724180" already exists

I am 95% sure (sadly the kubectl edit doesn't appear in my scrollback) that the problem is that the last-applied-configuration had ..."volumes":[{"configMap":{"name":"kube-dns","optional":true},... but optional: true did not appear in the volume spec itself, because it had been wiped by 1.5.

I then re-added optional: true and got the collision.

My hypothesis (though I don't understand the code) is that the hash (1321724180) is based on the configuration as applied, which might not accurately reflect the values actually set, either after a cluster downgrade & upgrade or when an older version of kubectl/k8s was used originally (I am unsure exactly which). So when I then set the manifest to re-add the missing field, it collided with the existing hashed value. Is that plausible?

@0xmichalis (Contributor)

The hash is based on the Deployment PodTemplateSpec. If the Deployment controller can't find a ReplicaSet with a semantically deep-equal PodTemplateSpec, it will create a new ReplicaSet, hashing the Deployment template to produce the new RS name. There is a slight chance you will hit a hash collision with the current algorithm, but so far that has seemed to be a problem only with 200s of old ReplicaSets: #29735
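
For illustration, a minimal sketch of how the RS name is derived (hypothetical code, not the upstream implementation; 1.6 hashed the PodTemplateSpec struct with adler32, fnv is shown as a stand-in):

```go
// Hypothetical sketch (not the upstream implementation): a new
// ReplicaSet is named by hashing the Deployment's pod template.
// Two different templates that happen to hash to the same 32-bit
// value yield the same name, and the create then fails with
// "already exists".
package main

import (
	"fmt"
	"hash/fnv"
)

func newRSName(deploymentName string, serializedTemplate []byte) string {
	h := fnv.New32a()
	h.Write(serializedTemplate)
	return fmt.Sprintf("%s-%d", deploymentName, h.Sum32())
}

func main() {
	// names have the shape "kube-dns-1321724180", as in the log above
	fmt.Println(newRSName("kube-dns", []byte("...serialized PodTemplateSpec...")))
}
```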

I then re-added optional: true and got the collision.

You re-added optional: true where? What happened before you re-added it?

Can you try to patch the Deployment controller to use more retries (15 is a good number) and retry the upgrade-downgrade?

0xmichalis (Contributor) commented Apr 4, 2017

Ok, this seems like a collision: https://www.diffchecker.com/E4CxPOdr
When was the 4th ReplicaSet created? After the downgrade? Unfortunately, hash collisions are more likely across Kubernetes versions because of changes to the PodSpec.

@0xmichalis (Contributor)

I want a concrete timeline here:

  • In which version was the 3rd RS created?
  • In which version was the 4th RS created?
  • And more importantly: where did you re-add optional: true, and what happened before you re-added it?

@0xmichalis (Contributor)

@kubernetes/sig-apps-bugs

0xmichalis added the kind/bug label Apr 4, 2017
k8s-github-robot pushed a commit that referenced this issue May 2, 2017
Automatic merge from submit-queue

[1.5] Update deployment retries to a saner count

Safe-guard for failures like #43948
k8s-github-robot pushed a commit that referenced this issue May 25, 2017
Automatic merge from submit-queue

Switch Deployments to new hashing algo w/ collision avoidance mechanism

Implements kubernetes/community#477

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews 

Fixes #29735
Fixes #43948

```release-note
Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore.
```
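
For readers landing here from search: the collision-avoidance mechanism works roughly as follows (simplified sketch; status.collisionCount is the real Deployment field, the rest is illustrative). The template hash is salted with a per-Deployment counter, and the controller bumps the counter whenever the generated name collides with a non-matching ReplicaSet, so the next sync produces a fresh name:

```go
// Simplified sketch of the collision-avoidance idea in the release
// note above: salt the fnv hash of the pod template with the
// Deployment's status.collisionCount, and bump the counter whenever
// the generated name collides with a non-matching ReplicaSet.
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

func templateHash(serializedTemplate []byte, collisionCount *int32) uint32 {
	h := fnv.New32a()
	h.Write(serializedTemplate)
	if collisionCount != nil {
		var salt [4]byte
		binary.LittleEndian.PutUint32(salt[:], uint32(*collisionCount))
		h.Write(salt[:])
	}
	return h.Sum32()
}

func main() {
	tmpl := []byte("...serialized PodTemplateSpec...")
	count := int32(0)
	fmt.Println(templateHash(tmpl, &count)) // name collided? then:
	count++                                 // controller bumps collisionCount
	fmt.Println(templateHash(tmpl, &count)) // next sync derives a fresh name
}
```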
perotinus pushed a commit to kubernetes-retired/cluster-registry that referenced this issue Sep 2, 2017
Automatic merge from submit-queue

Switch Deployments to new hashing algo w/ collision avoidance mechanism

Implements kubernetes/community#477

@kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews 

Fixes kubernetes/kubernetes#29735
Fixes kubernetes/kubernetes#43948

```release-note
Deployments are updated to use (1) a more stable hashing algorithm (fnv) than the previous one (adler) and (2) a hashing collision avoidance mechanism that will ensure new rollouts will not block on hashing collisions anymore.
```