The pod of the CVO cannot restart via the daemonset #620

Closed

jianzhangbjz opened this issue Nov 6, 2018 · 8 comments

@jianzhangbjz

Version

[jzhang@dhcp-140-18 installer]$ ./bin/openshift-install version
./bin/openshift-install v0.3.0-83-g0baec58239c67b607904e1ab82341cbab2ea5f7e-dirty
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

Platform (aws|libvirt|openshift):

libvirt

What happened?

The CVO's pod cannot be restarted.

What you expected to happen?

The CVO pod restarts successfully. Also, how can I make it work?

How to reproduce it (as minimally and precisely as possible)?

Not sure; I just deleted the pods:

$ oc delete pods --all -n openshift-cluster-version
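
A minimal way to observe whether the DaemonSet recreates the pod is to watch the namespace after the delete (-w is the standard oc watch flag; in my case no replacement pod ever appears):

$ oc get pods -n openshift-cluster-version -w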

Anything else we need to know?

[jzhang@dhcp-140-18 installer]$ oc get pods
No resources found.
[jzhang@dhcp-140-18 installer]$ oc get daemonset
NAME                       DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR                     AGE
cluster-version-operator   0         0         0         0            0           node-role.kubernetes.io/master=   5h
[jzhang@dhcp-140-18 installer]$ oc get daemonset cluster-version-operator -o yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  creationTimestamp: 2018-11-06T08:06:03Z
  generation: 2
  labels:
    k8s-app: cluster-version-operator
  name: cluster-version-operator
  namespace: openshift-cluster-version
  resourceVersion: "6148598"
  selfLink: /apis/extensions/v1beta1/namespaces/openshift-cluster-version/daemonsets/cluster-version-operator
  uid: cdf63174-e19a-11e8-9149-223557d3326b
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: cluster-version-operator
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: cluster-version-operator
      name: cluster-version-operator
    spec:
      containers:
      - args:
        - start
        - --release-image=registry.svc.ci.openshift.org/openshift/origin-release:v4.0
        - --enable-auto-update=true
        - --v=5
        env:
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"
        - name: KUBERNETES_SERVICE_HOST
          value: 127.0.0.1
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: registry.svc.ci.openshift.org/openshift/origin-release:v4.0
        imagePullPolicy: Always
        name: cluster-version-operator
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/ssl/certs
          name: etc-ssl-certs
          readOnly: true
        - mountPath: /etc/cvo/updatepayloads
          name: etc-cvo-updatepayloads
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      nodeSelector:
        node-role.kubernetes.io/master: ""
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - operator: Exists
      volumes:
      - hostPath:
          path: /etc/ssl/certs
          type: ""
        name: etc-ssl-certs
      - hostPath:
          path: /etc/cvo/updatepayloads
          type: ""
        name: etc-cvo-updatepayloads
  templateGeneration: 7
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 0
  desiredNumberScheduled: 0
  numberMisscheduled: 0
  numberReady: 0
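Note that desiredNumberScheduled is 0 even though a master node exists (see oc get nodes below). One way to double-check is to verify that a node actually carries the nodeSelector label from the spec above; the empty-value selector matches the label exactly as it is set on masters:

$ oc get nodes -l node-role.kubernetes.io/master=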
[jzhang@dhcp-140-18 installer]$ oc get pods --all-namespaces
NAMESPACE                                NAME                                                              READY     STATUS             RESTARTS   AGE
kube-system                              kube-controller-manager-2c2lg                                     0/1       ImagePullBackOff   0          5d
kube-system                              kube-dns-787c975867-jtklm                                         3/3       Running            0          5d
kube-system                              kube-flannel-kg4hk                                                2/2       Running            0          5d
kube-system                              kube-flannel-rpl9s                                                2/2       Running            3          4d
kube-system                              kube-proxy-kkhb2                                                  1/1       Running            0          4d
kube-system                              kube-proxy-pz6mw                                                  1/1       Running            0          5d
kube-system                              kube-scheduler-zb2dx                                              0/1       ImagePullBackOff   0          5d
kube-system                              metrics-server-5767bfc576-sswfk                                   2/2       Running            0          5d
kube-system                              pod-checkpointer-hqsgz                                            1/1       Running            0          5d
kube-system                              pod-checkpointer-hqsgz-test1-master-0                             1/1       Running            0          5d
kube-system                              tectonic-network-operator-5w4q4                                   1/1       Running            1          5d
openshift-apiserver                      apiserver-6mq49                                                   1/1       Running            4          5d
openshift-cluster-api                    clusterapi-manager-controllers-84497fc6f5-h2dk9                   2/2       Running            4          5d
openshift-cluster-api                    machine-api-operator-74d6b7bbbb-xxrjz                             1/1       Running            1          5d
openshift-cluster-network-operator       cluster-network-operator-xbvft                                    1/1       Running            0          6h
openshift-cluster-node-tuning-operator   cluster-node-tuning-operator-676db8b6f4-hvfp4                     0/1       ImagePullBackOff   0          5d
openshift-cluster-samples-operator       cluster-samples-operator-5dbd88bffc-2glbj                         1/1       Running            0          5d
openshift-console                        console-operator-d7b4998-wrpcf                                    0/1       ImagePullBackOff   0          5d
openshift-controller-manager             controller-manager-dhxjg                                          1/1       Running            1          5d
openshift-core-operators                 openshift-cluster-kube-apiserver-operator-6cf9b49bb4-m4t7c        1/1       Running            2          6h
openshift-core-operators                 openshift-cluster-kube-controller-manager-operator-574884fpxzlb   1/1       Running            0          6h
openshift-core-operators                 openshift-cluster-kube-scheduler-operator-7489b84496-7cj5k        1/1       Running            1          6h
openshift-core-operators                 openshift-cluster-openshift-apiserver-operator-7566f664d7-gzlw2   1/1       Running            1          5d
openshift-core-operators                 openshift-cluster-openshift-controller-manager-operator-5558jj4   1/1       Running            1          5d
openshift-core-operators                 openshift-service-cert-signer-operator-5b8495c5bc-jnqf6           1/1       Running            1          6h
openshift-csi-operator                   csi-operator-5486cf97d9-jkvrw                                     0/1       ErrImagePull       0          5d
openshift-image-registry                 cluster-image-registry-operator-d874b9755-4wf6k                   1/1       Running            0          5d
openshift-ingress-operator               ingress-operator-57bbdcb764-dd52r                                 0/1       ImagePullBackOff   0          5d
openshift-kube-apiserver                 installer-1-test1-master-0                                        0/1       Completed          0          5d
openshift-kube-apiserver                 installer-2-test1-master-0                                        0/1       Completed          0          6h
openshift-kube-apiserver                 openshift-kube-apiserver-test1-master-0                           1/1       Running            0          6h
openshift-kube-controller-manager        installer-1-test1-master-0                                        0/1       Completed          0          5d
openshift-kube-controller-manager        openshift-kube-controller-manager-test1-master-0                  0/1       ImagePullBackOff   110        5d
openshift-kube-scheduler                 scheduler-6fd8fd5b46-57hhh                                        1/1       Running            0          5d
openshift-machine-config-operator        machine-config-controller-84674586b7-q7wbn                        1/1       Running            1          5d
openshift-machine-config-operator        machine-config-daemon-sbn6b                                       0/1       ImagePullBackOff   0          4d
openshift-machine-config-operator        machine-config-daemon-x7gxx                                       0/1       CrashLoopBackOff   1461       5d
openshift-machine-config-operator        machine-config-operator-697868664d-wshvs                          1/1       Running            1          5d
openshift-machine-config-operator        machine-config-server-jgfgr                                       1/1       Running            0          5d
openshift-monitoring                     cluster-monitoring-operator-54bf9b8bf8-2ndqh                      0/1       ImagePullBackOff   0          5d
openshift-service-cert-signer            apiservice-cabundle-injector-78db8784b4-sdkjb                     1/1       Running            1          6h
openshift-service-cert-signer            configmap-cabundle-injector-66db88bf67-cpvkj                      1/1       Running            1          6h
openshift-service-cert-signer            service-serving-cert-signer-7fc7666fdf-5w4d7                      1/1       Running            1          6h
tectonic-system                          kube-addon-operator-654588755f-bgv5q                              1/1       Running            0          5d
[jzhang@dhcp-140-18 installer]$ oc get nodes
NAME                   STATUS    ROLES     AGE       VERSION
test1-master-0         Ready     master    5d        v1.11.0+d4cacc0
test1-worker-0-gq5ql   Ready     worker    4d        v1.11.0+d4cacc0

@abhinavdahiya
Contributor

The kube-controller-manager running in the openshift-kube-controller-manager namespace should be the one recreating the lost DaemonSet pods. Can you check and paste the logs from that pod? Maybe the controller manager is not running.
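
For example (the pod name here is taken from the oc get pods --all-namespaces listing above; since that pod is stuck in ImagePullBackOff, oc describe is probably more informative than oc logs):

$ oc logs openshift-kube-controller-manager-test1-master-0 -n openshift-kube-controller-manager
$ oc describe pod openshift-kube-controller-manager-test1-master-0 -n openshift-kube-controller-manager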

Also, open an issue on https://github.com/openshift/cluster-kube-controller-manager-operator if the pods don't come back.

/cc @mfojtik @sttts

@wking
Member

wking commented Nov 7, 2018

[jzhang@dhcp-140-18 installer]$ ./bin/openshift-install version
./bin/openshift-install v0.3.0-83-g0baec58239c67b607904e1ab82341cbab2ea5f7e-dirty
Terraform v0.11.8

This is from last Wednesday:

$ git show --format='%h %aD %s' 0baec58
0baec58 Wed, 31 Oct 2018 15:35:09 -0700 Merge pull request #571 from sttts/sttts-remove-kube-core-secrets

31 pull requests have landed since, and things like #551 and #624 are important to keep up with the evolving release image. Can you try again with a fresh build from master?
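
Something like the following should do it (hack/build.sh is the installer repo's build script; adjust if your checkout is laid out differently):

$ git fetch origin
$ git checkout origin/master
$ ./hack/build.sh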

@jianzhangbjz
Author

jianzhangbjz commented Nov 7, 2018

@wking Thanks! I will give it a try!

@jianzhangbjz
Author

jianzhangbjz commented Nov 7, 2018

I tried it with the latest version but got the errors below:

[jzhang@dhcp-140-18 installer]$ openshift-install version
openshift-install v0.3.0-155-g25ceecc296bd020219967ae1258180df01acff0f
Terraform v0.11.8

Your version of Terraform is out of date! The latest version
is 0.11.10. You can update by downloading from www.terraform.io/downloads.html

[jzhang@dhcp-140-18 installer]$ openshift-install create cluster --dir 1107
? Image https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/47.77/redhat-coreos-maipo-47.77-qemu.qcow2
INFO Fetching OS image...                         
INFO Using Terraform to create cluster...         
INFO Waiting for bootstrap completion...          
INFO API v1.11.0+d4cacc0 up                       


WARNING RetryWatcher - getting event failed! Re-creating the watcher. Last RV: 1827 
INFO Destroying the bootstrap resources...        
INFO Using Terraform to destroy bootstrap resources... 
[jzhang@dhcp-140-18 installer]$ sudo virsh list
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 30    master0                        running
[jzhang@dhcp-140-18 installer]$ openshift-install destroy cluster --dir 1107
FATAL Error executing openshift-install: Failed while preparing to destroy cluster: no destroyers registered for "libvirt"

@wking
Member

wking commented Nov 7, 2018

FATAL Error executing openshift-install: Failed while preparing to destroy cluster: no destroyers registered for "libvirt"

To get the libvirt destroyer you need to build with TAGS.
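
For example (libvirt_destroy was the tag name at the time of writing; check the repo README in case it has changed):

$ TAGS=libvirt_destroy ./hack/build.sh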

@jianzhangbjz
Author

Yes, that did it, thanks! Sorry for the late reply.

@wking
Member

wking commented Nov 8, 2018

@jianzhangbjz, is everything working for you, then? Can you close if so?

@jianzhangbjz
Author

jianzhangbjz commented Nov 8, 2018

@wking Actually, no. We are still suffering from the OCP 4.0 crash, but that's unrelated to this issue. Closing it.
