The pod of the CVO cannot restart via the daemonset #620

Closed

jianzhangbjz opened this issue Nov 6, 2018 · 8 comments

@jianzhangbjz

Version

[jzhang@dhcp-140-18 installer]$ ./bin/openshift-install version
./bin/openshift-install v0.3.0-83-g0baec58239c67b607904e1ab82341cbab2ea5f7e-dirty
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

Platform (aws|libvirt|openshift):

libvirt

What happened?

The CVO's pod cannot be restarted.

What you expected to happen?

The CVO pod restarts successfully. Also, how can I make it work?

How to reproduce it (as minimally and precisely as possible)?

Not sure; I just deleted the pods:

$ oc delete pods --all -n openshift-cluster-version
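
A minimal way to observe whether the DaemonSet recreates the pod is to watch the namespace after the delete (-w is the standard oc watch flag; in my case no replacement pod ever appears):

$ oc get pods -n openshift-cluster-version -w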

Anything else we need to know?

[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
[jzhang@dhcp-140-18 installer]$ oc get daemonset
NAME                       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
cluster-version-operator   0         0         0         0            0           node-role.kubernetes.io/master=   5h
[jzhang@dhcp-140-18 installer]$ oc get daemonset cluster-version-operator -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2018-11-06T08:06:03Z
  generation: 2
  labels:
    k8s-app: cluster-version-operator
  name: cluster-version-operator
  namespace: openshift-cluster-version
  resourceVersion: "6148598"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-cluster-version/daemonsets/cluster-version-operator
  uid: cdf63174-e19a-11e8-9149-223557d3326b
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: cluster-version-operator
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: cluster-version-operator
      name: cluster-version-operator
    spec:
      containers:
      - args:
        - start
        - --release-image=registry.svc.ci.openshift.org/openshift/origin-release:v4.0
        - --enable-auto-update=true
        - --v=5
        env:
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"
        - name: KUBERNETES_SERVICE_HOST
          value: 127.0.0.1
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.svc.ci.openshift.org/openshift/origin-release:v4.0
        imagePullPolicy: Always
        name: cluster-version-operator
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: etc-ssl-certs
          readOnly: true
        - mountPath: /etc/cvo/updatepayloads
          name: etc-cvo-updatepayloads
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: ""
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /etc/ssl/certs
          type: ""
        name: etc-ssl-certs
      - hostPath:
          path: /etc/cvo/updatepayloads
          type: ""
        name: etc-cvo-updatepayloads
  templateGeneration: 7
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 0
  desiredNumberScheduled: 0
  numberMisscheduled: 0
  numberReady: 0
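Note that desiredNumberScheduled is 0 even though a master node exists (see oc get nodes below). One way to double-check is to verify that a node actually carries the nodeSelector label from the spec above; the empty-value selector matches the label exactly as it is set on masters:

$ oc get nodes -l node-role.kubernetes.io/master=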
[jzhang@dhcp-140-18 installer]$ oc get pods --all-namespaces
NAMESPACE                                NAME                                                              READY     STATUS             RESTARTS   AGE
kube-system                              kube-controller-manager-2c2lg                                     0/1       ImagePullBackOff   0          5d
kube-system                              kube-dns-787c975867-jtklm                                         3/3       Running            0          5d
kube-system                              kube-flannel-kg4hk                                                2/2       Running            0          5d
kube-system                              kube-flannel-rpl9s                                                2/2       Running            3          4d
kube-system                              kube-proxy-kkhb2                                                  1/1       Running            0          4d
kube-system                              kube-proxy-pz6mw                                                  1/1       Running            0          5d
kube-system                              kube-scheduler-zb2dx                                              0/1       ImagePullBackOff   0          5d
kube-system                              metrics-server-5767bfc576-sswfk                                   2/2       Running            0          5d
kube-system                              pod-checkpointer-hqsgz                                            1/1       Running            0          5d
kube-system                              pod-checkpointer-hqsgz-test1-master-0                             1/1       Running            0          5d
kube-system                              tectonic-network-operator-5w4q4                                   1/1       Running            1          5d
openshift-apiserver                      apiserver-6mq49                                                   1/1       Running            4          5d
openshift-cluster-api                    clusterapi-manager-controllers-84497fc6f5-h2dk9                   2/2       Running            4          5d
openshift-cluster-api                    machine-api-operator-74d6b7bbbb-xxrjz                             1/1       Running            1          5d
openshift-cluster-network-operator       cluster-network-operator-xbvft                                    1/1       Running            0          6h
openshift-cluster-node-tuning-operator   cluster-node-tuning-operator-676db8b6f4-hvfp4                     0/1       ImagePullBackOff   0          5d
openshift-cluster-samples-operator       cluster-samples-operator-5dbd88bffc-2glbj                         1/1       Running            0          5d
openshift-console                        console-operator-d7b4998-wrpcf                                    0/1       ImagePullBackOff   0          5d
openshift-controller-manager             controller-manager-dhxjg                                          1/1       Running            1          5d
openshift-core-operators                 openshift-cluster-kube-apiserver-operator-6cf9b49bb4-m4t7c        1/1       Running            2          6h
openshift-core-operators                 openshift-cluster-kube-controller-manager-operator-574884fpxzlb   1/1       Running            0          6h
openshift-core-operators                 openshift-cluster-kube-scheduler-operator-7489b84496-7cj5k        1/1       Running            1          6h
openshift-core-operators                 openshift-cluster-openshift-apiserver-operator-7566f664d7-gzlw2   1/1       Running            1          5d
openshift-core-operators                 openshift-cluster-openshift-controller-manager-operator-5558jj4   1/1       Running            1          5d
openshift-core-operators                 openshift-service-cert-signer-operator-5b8495c5bc-jnqf6           1/1       Running            1          6h
openshift-csi-operator                   csi-operator-5486cf97d9-jkvrw                                     0/1       ErrImagePull       0          5d
openshift-image-registry                 cluster-image-registry-operator-d874b9755-4wf6k                   1/1       Running            0          5d
openshift-ingress-operator               ingress-operator-57bbdcb764-dd52r                                 0/1       ImagePullBackOff   0          5d
openshift-kube-apiserver                 installer-1-test1-master-0                                        0/1       Completed          0          5d
openshift-kube-apiserver                 installer-2-test1-master-0                                        0/1       Completed          0          6h
openshift-kube-apiserver                 openshift-kube-apiserver-test1-master-0                           1/1       Running            0          6h
openshift-kube-controller-manager        installer-1-test1-master-0                                        0/1       Completed          0          5d
openshift-kube-controller-manager        openshift-kube-controller-manager-test1-master-0                  0/1       ImagePullBackOff   110        5d
openshift-kube-scheduler                 scheduler-6fd8fd5b46-57hhh                                        1/1       Running            0          5d
openshift-machine-config-operator        machine-config-controller-84674586b7-q7wbn                        1/1       Running            1          5d
openshift-machine-config-operator        machine-config-daemon-sbn6b                                       0/1       ImagePullBackOff   0          4d
openshift-machine-config-operator        machine-config-daemon-x7gxx                                       0/1       CrashLoopBackOff   1461       5d
openshift-machine-config-operator        machine-config-operator-697868664d-wshvs                          1/1       Running            1          5d
openshift-machine-config-operator        machine-config-server-jgfgr                                       1/1       Running            0          5d
openshift-monitoring                     cluster-monitoring-operator-54bf9b8bf8-2ndqh                      0/1       ImagePullBackOff   0          5d
openshift-service-cert-signer            apiservice-cabundle-injector-78db8784b4-sdkjb                     1/1       Running            1          6h
openshift-service-cert-signer            configmap-cabundle-injector-66db88bf67-cpvkj                      1/1       Running            1          6h
openshift-service-cert-signer            service-serving-cert-signer-7fc7666fdf-5w4d7                      1/1       Running            1          6h
tectonic-system                          kube-addon-operator-654588755f-bgv5q                              1/1       Running            0          5d
[jzhang@dhcp-140-18 installer]$ oc get nodes
NAME                   STATUS    ROLES     AGE       VERSION
test1-master-0         Ready     master    5d        v1.11.0+d4cacc0
test1-worker-0-gq5ql   Ready     worker    4d        v1.11.0+d4cacc0

@abhinavdahiya
Contributor

The kube-controller-manager running in the openshift-kube-controller-manager namespace should be the one recreating the lost DaemonSet pods. Can you check and paste the logs from that pod? Maybe the controller manager is not running.
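
For example (the pod name here is taken from the oc get pods --all-namespaces listing above; since that pod is stuck in ImagePullBackOff, oc describe is probably more informative than oc logs):

$ oc logs openshift-kube-controller-manager-test1-master-0 -n openshift-kube-controller-manager
$ oc describe pod openshift-kube-controller-manager-test1-master-0 -n openshift-kube-controller-manager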

Also, open an issue on https://github.com/openshift/cluster-kube-controller-manager-operator if the pods don't come back.

/cc @mfojtik @sttts

@wking
Member

wking commented Nov 7, 2018

[jzhang@dhcp-140-18 installer]$ ./bin/openshift-install version
./bin/openshift-install v0.3.0-83-g0baec58239c67b607904e1ab82341cbab2ea5f7e-dirty
Terraform v0.11.8

This is from last Wednesday:

$ git show --format='%h %aD %s' 0baec58
0baec58 Wed, 31 Oct 2018 15:35:09 -0700 Merge pull request #571 from sttts/sttts-remove-kube-core-secrets

31 pull requests have landed since, and things like #551 and #624 are important to keep up with the evolving release image. Can you try again with a fresh build from master?
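
Something like the following should do it (hack/build.sh is the installer repo's build script; adjust if your checkout is laid out differently):

$ git fetch origin
$ git checkout origin/master
$ ./hack/build.sh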

@jianzhangbjz
Author

jianzhangbjz commented Nov 7, 2018

@wking Thanks! I will give it a try!

@jianzhangbjz
Author

jianzhangbjz commented Nov 7, 2018

I tried it with the latest version but got the errors below:

[jzhang@dhcp-140-18 installer]$ openshift-install version
openshift-install v0.3.0-155-g25ceecc296bd020219967ae1258180df01acff0f
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

[jzhang@dhcp-140-18 installer]$ openshift-install create cluster --dir 1107
? Image https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.77/redhat-coreos-maipo-47.77-qemu.qcow2
INFO Fetching OS image...                         
INFO Using Terraform to create cluster...         
INFO Waiting for bootstrap completion...          
INFO API v1.11.0+d4cacc0 up                       


WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 1827 
INFO Destroying the bootstrap resources...        
INFO Using Terraform to destroy bootstrap resources... 
[jzhang@dhcp-140-18 installer]$ sudo virsh list
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 30    master0                        running
[jzhang@dhcp-140-18 installer]$ openshift-install destroy cluster --dir 1107
FATAL Error executing openshift-install: Failed while preparing to destroy cluster: no destroyers registered for "libvirt"

@wking
Member

wking commented Nov 7, 2018

FATAL Error executing openshift-install: Failed while preparing to destroy cluster: no destroyers registered for "libvirt"

To get the libvirt destroyer you need to build with TAGS.
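
For example (libvirt_destroy was the tag name at the time of writing; check the repo README in case it has changed):

$ TAGS=libvirt_destroy ./hack/build.sh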

@jianzhangbjz
Author

Yes, that did it, thanks! Sorry for the late reply.

@wking
Member

wking commented Nov 8, 2018

@jianzhangbjz, is everything working for you, then? Can you close if so?

@jianzhangbjz
Author

jianzhangbjz commented Nov 8, 2018

@wking Actually, no. We are still suffering from the OCP 4.0 crash, but that's unrelated to this issue. Closing it.
