Libvirt install: failed to initialize the cluster: Multiple errors.... #1928

Closed
tux1980 opened this issue Jul 1, 2019 · 28 comments

tux1980 commented Jul 1, 2019

Version

./openshift-install unreleased-master-1207-g45f81e1d950cbc80e46e0b230e11b0032de8c868
built from commit 45f81e1
release image registry.svc.ci.openshift.org/origin/release:4.2

Platform:

RHEL 7.6 with libvirt 5.0.0

What happened?

After setting up the Kube API service successfully (at least that is what the bootstrap log claims...), I'm running into several errors with the installer.

What you expected to happen?

The installer would finish the deployment successfully.

How to reproduce it (as minimally and precisely as possible)?

./openshift-install create cluster --log-level=debug

Error Message in .openshift_install.log

  • Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (374 of 398): the server does not recognize this resource, check extension API servers
    FATAL failed to initialize the cluster: Multiple errors are preventing progress:
  • Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (393 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (362 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (397 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-image-registry/image-registry" (368 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (378 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (382 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (386 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-machine-api/cluster-autoscaler-operator" (138 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (387 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (371 of 398): the server does not recognize this resource, check extension API servers
  • Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (374 of 398): the server does not recognize this resource, check extension API servers
@abhinavdahiya (Contributor)

/assign @zeenix

zeenix (Contributor) commented Jul 2, 2019

I think this is likely a duplicate of #1893.

@tux1980 Could you please check whether you get the same output as there? You can find the exact pod name with oc get pods -A | grep approver.
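For reference, a minimal sketch of how the approver logs can be pulled once the pod name is known (run from outside the cluster; YOUR_CLUSTER_DIR and the pod-name suffix are placeholders):

KUBECONFIG=YOUR_CLUSTER_DIR/auth/kubeconfig oc get pods -A | grep approver
KUBECONFIG=YOUR_CLUSTER_DIR/auth/kubeconfig oc -n openshift-cluster-machine-approver logs machine-approver-&lt;pod-suffix&gt;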

zeenix (Contributor) commented Jul 2, 2019

/label platform/libvirt

tux1980 (Author) commented Jul 2, 2019

@zeenix When I log on to the master and execute any oc command I now receive:

[core@okd-dz7vc-master-0 ~]$ oc get pods -A | grep approver
error: Missing or incomplete configuration info. Please login or point to an existing, complete config file:

  1. Via the command-line flag --config
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

[core@okd-dz7vc-master-0 ~]$ oc get csr
error: Missing or incomplete configuration info. Please login or point to an existing, complete config file:

  1. Via the command-line flag --config
  2. Via the KUBECONFIG environment variable
  3. In your home directory as ~/.kube/config

tux1980 (Author) commented Jul 2, 2019

Whatever I do with libvirt 5 / QEMU 2.12 / RHEL 7.6, I don't succeed; at some point I always run into errors with the installer. It is a bit exhausting... Is there any recommended platform with libvirt that is thoroughly tested for a demo environment with OCP 4.1? Fedora / CentOS / RHEL 8?

zeenix (Contributor) commented Jul 2, 2019

@zeenix When I log on to the master and execute any oc command I now receive:

Not on the master; you run oc from outside the cluster with KUBECONFIG defined: KUBECONFIG=YOUR_CLUSTER_DIR/auth/kubeconfig oc get pods -A | grep approver
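For example (a sketch; YOUR_CLUSTER_DIR is the asset directory that openshift-install created):

export KUBECONFIG=YOUR_CLUSTER_DIR/auth/kubeconfig
oc get pods -A | grep approver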

tux1980 (Author) commented Jul 2, 2019

Ok. When I execute the command from the outside world I get the following result:
[root@ovirt auth]# KUBECONFIG=kubeconfig oc get pods -A | grep approver

openshift-cluster-machine-approver machine-approver-8dffd968f-2cvrf 1/1 Running 0 21h

zeenix (Contributor) commented Jul 2, 2019

@tux1980 Thanks. Is this after the error/timeout? If not, please wait for that first and see if the logs you see there have the same error as we see on our nested virt machine.

mkumatag (Member) commented Jul 3, 2019

I have also seen this error with Fedora; I will update with more details from my findings.

zeenix (Contributor) commented Jul 3, 2019

I have also seen this error with Fedora; I will update with more details from my findings.

Yeah same here but it was fixed recently. It still occurs on nested virt setup though, which is tracked in #1893.

mkumatag (Member) commented Jul 3, 2019

I have also seen this error with Fedora; I will update with more details from my findings.

Yeah same here but it was fixed recently. It still occurs on nested virt setup though, which is tracked in #1893.

I'm not using nested virt btw, mine is just a kvm host running on Fedora 30

zeenix (Contributor) commented Jul 3, 2019

I'm not using nested virt btw, mine is just a kvm host running on Fedora 30

That is nested virt. :) Unless you mean the (cluster) host is baremetal running F30?

mkumatag (Member) commented Jul 4, 2019

I'm not using nested virt btw, mine is just a kvm host running on Fedora 30

That is nested virt. :) Unless you mean the (cluster) host is baremetal running F30?

That's right, I'm running openshift-installer on an F30 bare-metal machine.

@mrniranjan

I am trying the installer on RHEL 8 and could not get a successful installation. I see the following errors:

[root@hp-dl380-gen9-5 installer]# bin/openshift-install create --dir=mydata cluster
? SSH Public Key /root/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name titan
? Pull Secret [? for help] ***************************************************************************************************************************
INFO Fetching OS image: rhcos-420.8.20190624.0-qemu.qcow2
INFO Creating infrastructure resources...         
INFO Waiting up to 30m0s for the Kubernetes API at https://api.titan.tt.testing:6443... 
INFO API v1.14.0+dd87d3b up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 30m0s for the cluster at https://api.titan.tt.testing:6443 to initialize... 
FATAL failed to initialize the cluster: Working towards 4.2.0-0.okd-2019-07-04-005736: 99% complete, waiting on authentication, monitoring

The logs show the following:

time="2019-07-04T00:07:03-04:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update oauthclient \"console\" (244 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (395 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (364 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (399 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (370 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (380 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (384 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (388 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (139 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (389 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (373 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (376 of 400): the server does not recognize this resource, check extension API servers"
time="2019-07-04T00:09:06-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 90% complete"
time="2019-07-04T00:09:12-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 91% complete"
time="2019-07-04T00:11:06-04:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (395 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (364 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (399 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (370 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (380 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (384 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (388 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (139 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (389 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (373 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (376 of 400): the server does not recognize this resource, check extension API servers"
time="2019-07-04T00:15:48-04:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (395 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (364 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (399 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (370 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (380 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (384 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (388 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (139 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (389 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (373 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (376 of 400): the server does not recognize this resource, check extension API servers"
time="2019-07-04T00:16:06-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 96% complete"
time="2019-07-04T00:17:51-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 99% complete"
time="2019-07-04T00:18:21-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 99% complete, waiting on authentication, monitoring"
time="2019-07-04T00:20:04-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-005736: 99% complete, waiting on authentication, monitoring"
time="2019-07-04T00:20:44-04:00" level=fatal msg="failed to initialize the cluster: Working towards 4.2.0-0.okd-2019-07-04-005736: 99% complete, waiting on authentication, monitoring"

I am running the installer on a bare-metal system.

zeenix (Contributor) commented Jul 4, 2019

@tux1980 @mkumatag @mrniranjan Thanks. Could you folks kindly let me know if you see the same symptoms as here?

zeenix (Contributor) commented Jul 4, 2019

Also, see if assigning appropriate resources helps:

export TF_VAR_libvirt_master_memory=8192
export TF_VAR_libvirt_master_vcpu=4

Thanks.
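For example, a sketch of a full run with those variables exported first (installer invocation as shown earlier in this issue):

export TF_VAR_libvirt_master_memory=8192
export TF_VAR_libvirt_master_vcpu=4
./openshift-install create cluster --log-level=debug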

@mrniranjan

@zeenix I see this:

[root@titan mydata3]# KUBECONFIG=/root/go/src/github.com/openshift/installer/mydata3/auth/kubeconfig oc get csr
NAME        AGE       REQUESTOR                                                                   CONDITION
csr-jfwbn   26m       system:node:pnq-wdb6v-master-0                                              Approved,Issued
csr-n44bj   26m       system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-npbfc   7m14s     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-xrcpb   22m       system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
[root@titan mydata3]# KUBECONFIG=/root/go/src/github.com/openshift/installer/mydata3/auth/kubeconfig oc get pods
No resources found.

openshift-install.log:

time="2019-07-04T07:18:47-04:00" level=info msg="Waiting up to 30m0s for the cluster at https://api.pnq.tt.testing:6443 to initialize..."
time="2019-07-04T07:34:22-04:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (395 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (364 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (399 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (370 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (380 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (384 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (388 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (139 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (389 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (373 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (376 of 400): the server does not recognize this resource, check extension API servers"
time="2019-07-04T07:37:18-04:00" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (395 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (364 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (399 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (370 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (380 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (384 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (388 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (139 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (389 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (373 of 400): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (376 of 400): the server does not recognize this resource, check extension API servers"
time="2019-07-04T07:39:50-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-095206: 91% complete"
time="2019-07-04T07:40:20-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-07-04-095206: 92% complete"

zeenix (Contributor) commented Jul 4, 2019

@mrniranjan Thanks, that's oc get pods -A | grep approver. No need to paste the output here if it's the same as what I got in #1893; just say so and I'll close this one, since it's likely a duplicate.

@mrniranjan

@zeenix When I type oc get pods -A | grep approver it just prints the oc help message. I am using the version below:

[root@titan mydata4]# oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

zeenix (Contributor) commented Jul 4, 2019

@mrniranjan right, the -A shortcut was only recently added. Please use --all-namespaces instead.
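For example (the same check with the long flag; the kubeconfig path follows the pattern from the earlier output and is illustrative):

KUBECONFIG=/root/go/src/github.com/openshift/installer/mydata4/auth/kubeconfig oc get pods --all-namespaces | grep approver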

@mrniranjan

@zeenix I am pasting the output because I think this is different from the one mentioned in #1893:

[root@titan installer]# bin/openshift-install create --dir=mydata5 cluster
? SSH Public Key /root/.ssh/id_rsa.pub
? Platform libvirt
? Libvirt Connection URI qemu+tcp://192.168.122.1/system
? Base Domain tt.testing
? Cluster Name pnq
? Pull Secret [? for help] *************************************************************************************************************************************************************************************
INFO Fetching OS image: rhcos-420.8.20190624.0-qemu.qcow2
INFO The installer no longer uses "/root/.cache/openshift-install/libvirt/http", it can be deleted
INFO Creating infrastructure resources...
INFO Waiting up to 30m0s for the Kubernetes API at https://api.pnq.tt.testing:6443...
INFO API v1.14.0+276c1b3 up
INFO Waiting up to 30m0s for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 30m0s for the cluster at https://api.pnq.tt.testing:6443 to initialize...
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console 
[root@titan mydata5]# KUBECONFIG=/root/go/src/github.com/openshift/installer/mydata5/auth/kubeconfig oc get csr
NAME        AGE       REQUESTOR                                                                   CONDITION
csr-m84f2   36m       system:node:pnq-kxsq7-master-0                                              Approved,Issued
csr-s52wg   36m       system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-t7qpk   30m       system:node:pnq-kxsq7-worker-0-twgzh                                        Approved,Issued
csr-vnj7p   32m       system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
[root@titan mydata5]# KUBECONFIG=/root/go/src/github.com/openshift/installer/mydata5/auth/kubeconfig oc get pods --all-namespaces  | grep approver
openshift-cluster-machine-approver                      machine-approver-5f5f7cc4d5-nbsm4                                 1/1       Running            0          37m

mrniranjan commented Jul 4, 2019

With every run it seems to fail differently; in the last run the installer got much further than in previous runs. I have tried the installer many times and never had success.

zeenix (Contributor) commented Jul 5, 2019

@mrniranjan Thanks. It seems you're facing multiple different issues. If your CSRs are all in the approved state and you don't see the warnings/errors from the description here, I'd say it's a different problem and you should file a separate issue to track it. Having said that, what you saw last could well be #1428, so do try openshift-install wait-for install-complete after the installer fails, to see if it's just taking longer than expected. Also, I assume you tried giving more resources, as I suggested above.
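A sketch of that follow-up, assuming the asset directory from the last run (--dir and --log-level are the same flags used elsewhere in this thread):

./bin/openshift-install wait-for install-complete --dir=mydata5 --log-level=debug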

As for the issue being tracked here, given that the CSRs are pending, I'm pretty sure this is a duplicate of #1893, so I'm closing this one. Please re-open if you can reproduce the symptoms from the description but the CSRs are not pending, or if they are pending but you don't see the same in the logs as there.

/close

@openshift-ci-robot (Contributor)

@zeenix: Closing this issue.


TuranTimur commented Aug 9, 2019

Hi, sorry to reopen, but I'm also experiencing almost the same error.

version
openshift-install version

./bin/openshift-install unreleased-master-1545-gd292db908fe206872f646eda48aa3739dc69f015
built from commit d292db9
release image registry.svc.ci.openshift.org/origin/release:4.2

The cluster creation is looping: it gets created, fails, and gets created again.

openshift-install wait-for install-complete

time="20
19-08-10T03:04:43+08:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
time="2019-08-10T03:15:24+08:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
time="2019-08-10T03:17:22+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836"
time="2019-08-10T03:17:23+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: downloading update"
time="2019-08-10T03:17:23+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836"
time="2019-08-10T03:17:25+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 5% complete"
time="2019-08-10T03:17:25+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 10% complete"
time="2019-08-10T03:17:25+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 14% complete"
time="2019-08-10T03:17:25+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 19% complete"
time="2019-08-10T03:17:26+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 23% complete"
time="2019-08-10T03:17:26+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 27% complete"
time="2019-08-10T03:17:38+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 74% complete"
time="2019-08-10T03:17:39+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 77% complete"
time="2019-08-10T03:17:43+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 78% complete"
time="2019-08-10T03:17:43+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 80% complete"
time="2019-08-10T03:17:54+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 98% complete"
time="2019-08-10T03:18:08+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.okd-2019-08-09-170836: 99% complete"
time="2019-08-10T03:20:10+08:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0
-0.okd-2019-08-09-170836: 99% complete"

KUBECONFIG=./auth/kubeconfig ./bin/oc get pods --all-namespaces | grep -v "Running|Completed"

NAMESPACE NAME READY STATUS RESTARTS AGE
openshift-apiserver-operator openshift-apiserver-operator-66d77b64f6-npldc 0/1 CrashLoopBackOff 6 93m
openshift-cluster-node-tuning-operator tuned-fsw9h 0/1 CrashLoopBackOff 5 57m
openshift-ingress router-default-6767fb9974-hrs6w 0/1 Pending 0 60m
openshift-kube-controller-manager kube-controller-manager-test1-pxh29-master-0 1/2 CrashLoopBackOff 12 59m
openshift-machine-config-operator etcd-quorum-guard-855d994f67-985bs 0/1 Pending 0 90m
openshift-machine-config-operator etcd-quorum-guard-855d994f67-sd2fh 0/1 Pending 0 90m
openshift-machine-config-operator machine-config-controller-7fdf44bdc-gmx4r 0/1 Error 4 90m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-75bd4d7c6c-7ktzc 0/1 CrashLoopBackOff 5 64m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-bccfkqmp2 0/1 CrashLoopBackOff 5 64m

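To dig into the CrashLoopBackOff pods above, one option is to pull the previous container logs and the pod events; a sketch using one of the pod names from the listing:

KUBECONFIG=./auth/kubeconfig ./bin/oc -n openshift-apiserver-operator logs openshift-apiserver-operator-66d77b64f6-npldc --previous
KUBECONFIG=./auth/kubeconfig ./bin/oc -n openshift-apiserver-operator describe pod openshift-apiserver-operator-66d77b64f6-npldc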
@TuranTimur

Here is the log of the credential operator:

time="2019-08-09T19:31:45Z" level=debug msg="0 cred requests"
time="2019-08-09T19:31:45Z" level=debug msg="set ClusterOperator condition" message="No credentials requests reporting errors." reason=NoCredentialsFailing status=False type=Degraded
time="2019-08-09T19:31:45Z" level=debug msg="set ClusterOperator condition" message="0 of 0 credentials requests provisioned and reconciled." reason=ReconcilingComplete status=False type=Progressing
time="2019-08-09T19:31:45Z" level=debug msg="set ClusterOperator condition" message= reason= status=True type=Available
time="2019-08-09T19:31:45Z" level=debug msg="set ClusterOperator condition" message= reason= status=True type=Upgradeable
time="2019-08-09T19:31:46Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-openstack
time="2019-08-09T19:31:46Z" level=error msg="failed to determine cloud platform type" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-openstack error="unsupported platorm type: Libvirt" secret=openshift-machine-api/openstack-cloud-credentials
time="2019-08-09T19:31:46Z" level=debug msg="syncing cluster operator status"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=azure-openshift-ingress error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=cloud-credential-operator-iam-ro error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=gcp-openshift-ingress error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-image-registry error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-image-registry-azure error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-image-registry-gcs error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-image-registry-openstack error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-ingress error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-machine-api-aws error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-machine-api-azure error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-machine-api-gcp error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=warning msg="ignoring for status condition because could not decode provider spec" credentialsRequest=openshift-machine-api-openstack error="unsupported platorm type: Libvirt"
time="2019-08-09T19:31:46Z" level=debug msg="0 cred requests"
time="2019-08-09T19:31:46Z" level=debug msg="set ClusterOperator condition" message="No credentials requests reporting errors." reason=NoCredentialsFailing status=False type=Degraded
time="2019-08-09T19:31:46Z" level=debug msg="set ClusterOperator condition" message="0 of 0 credentials requests provisioned and reconciled." reason=ReconcilingComplete status=False type=Progressing
time="2019-08-09T19:31:46Z" level=debug msg="set ClusterOperator condition" message= reason= status=True type=Available
time="2019-08-09T19:31:46Z" level=debug msg="set ClusterOperator condition" message= reason= status=True type=Upgradeable
time="2019-08-09T19:31:48Z" level=error msg="leader election lostunable to run the manager"

@TuranTimur

And the console log:

KUBECONFIG=./auth/kubeconfig ./bin/oc --namespace openshift-console logs console-7b469cfb7f-vbdg9
2019/08/9 19:33:12 cmd/main: cookies are secure!
2019/08/9 19:33:12 auth: error contacting auth provider (retrying in 10s): Get https://172.30.0.1:443/.well-known/oauth-authorization-server: dial tcp 172.30.0.1:443: connect: connection refused
2019/08/9 19:33:22 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:33:32 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:33:42 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:33:52 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:02 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:12 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:22 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:32 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:42 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:34:52 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:02 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:12 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:22 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:32 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:42 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:35:52 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
2019/08/9 19:36:02 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.test1.tt.testing/oauth/token failed: Head https://oauth-openshift.apps.test1.tt.testing: dial tcp: lookup oauth-openshift.apps.test1.tt.testing on 172.30.0.10:53: no such host
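The repeated "no such host" lookups for oauth-openshift.apps.test1.tt.testing suggest the libvirt network's DNS isn't serving a wildcard *.apps record for the cluster. A sketch of how that is often inspected on the virt host (the network name and ingress IP below are assumptions for illustration, not values taken from this cluster):

virsh net-list                          # find the cluster's libvirt network (named after the cluster/infra ID)
virsh net-dumpxml <cluster-network>     # check whether a dnsmasq entry like the following is present
# inside the <network> element (requires the dnsmasq XML namespace):
#   <dnsmasq:options>
#     <dnsmasq:option value="address=/.apps.test1.tt.testing/<ingress-VM-IP>"/>
#   </dnsmasq:options>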

TuranTimur commented Aug 9, 2019

It turned out that #1007 is what I can follow.
