
[libvirt] Default time for 30m for the cluster to initialize not enough #1428

Closed
praveenkumar opened this issue Mar 18, 2019 · 35 comments
Labels
platform/libvirt priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@praveenkumar
Contributor

Version

$ openshift-install version
openshift-install unreleased-master-569-g660f2bbf74431775c0677449bc5487572b069d26
built from commit 660f2bbf74431775c0677449bc5487572b069d26

Platform (aws|libvirt|openstack):

libvirt

What happened?

Cluster initialization times out after 30m; on the libvirt provider the cluster takes around 45-50 minutes to become healthy.

What you expected to happen?

The cluster should initialize within 30 minutes on libvirt as well, as it does on aws.

How to reproduce it (as minimally and precisely as possible)?

$ openshift-install create cluster
[...]
level=info msg="Waiting up to 30m0s for the cluster at https://api.test1.openshift.testing:6443 to initialize..."
level=debug msg="Still waiting for the cluster to initialize..."
[...]
level=debug msg="Still waiting for the cluster to initialize: Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (299 of 306): the server does not recognize this resource, check extension API servers"
[...]
level=fatal msg="failed to initialize the cluster: timed out waiting for the condition"

$ oc get co
NAME                                  VERSION                           AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                        4.0.0-0.alpha-2019-03-17-071624   True        False         False     112s
cluster-autoscaler                    4.0.0-0.alpha-2019-03-17-071624   True        False         False     2m3s
console                               4.0.0-0.alpha-2019-03-17-071624   True        False         False     116s
dns                                   4.0.0-0.alpha-2019-03-17-071624   True        False         False     63m
image-registry                        4.0.0-0.alpha-2019-03-17-071624   True        False         False     55m
ingress                               4.0.0-0.alpha-2019-03-17-071624   True        False         False     54m
kube-apiserver                        4.0.0-0.alpha-2019-03-17-071624   True        True          True      106s
kube-controller-manager               4.0.0-0.alpha-2019-03-17-071624   True        False         False     109s
kube-scheduler                        4.0.0-0.alpha-2019-03-17-071624   True        False         False     109s
machine-api                           4.0.0-0.alpha-2019-03-17-071624   True        False         False     64m
machine-config                        4.0.0-0.alpha-2019-03-17-071624   True        False         False     63m
marketplace-operator                  4.0.0-0.alpha-2019-03-17-071624   True        False         False     58m
monitoring                            4.0.0-0.alpha-2019-03-17-071624   True        False         False     12m
network                               4.0.0-0.alpha-2019-03-17-071624   True        True          False     64m
node-tuning                           4.0.0-0.alpha-2019-03-17-071624   True        False         False     58m
openshift-apiserver                   4.0.0-0.alpha-2019-03-17-071624   True        False         False     7m48s
openshift-cloud-credential-operator   4.0.0-0.alpha-2019-03-17-071624   True        False         False     62m
openshift-controller-manager          4.0.0-0.alpha-2019-03-17-071624   True        False         False     116s
openshift-samples                     4.0.0-0.alpha-2019-03-17-071624   True        False         False     53m
operator-lifecycle-manager            4.0.0-0.alpha-2019-03-17-071624   True        False         False     64m
service-ca                                                              True        False         False     99s
service-catalog-apiserver             4.0.0-0.alpha-2019-03-17-071624   True        False         False     105s
service-catalog-controller-manager    4.0.0-0.alpha-2019-03-17-071624   True        False         False     111s
storage                                                                 True        False         False     54m

$ oc logs kube-apiserver-operator-7d7b576bd6-xbtrh -n openshift-kube-apiserver-operator
E0318 06:42:28.447175       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Network: Get https://172.30.0.1:443/apis/config.openshift.io/v1/networks?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0318 06:42:28.447380       1 reflector.go:134] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to list *v1.Infrastructure: Get https://172.30.0.1:443/apis/config.openshift.io/v1/infrastructures?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0318 06:42:28.474647       1 reflector.go:134] k8s.io/client-go/informers/factory.go:131: Failed to list *v1.ServiceAccount: Get https://172.30.0.1:443/api/v1/namespaces/openshift-kube-apiserver/serviceaccounts?limit=500&resourceVersion=0: dial tcp 172.30.0.1:443: connect: connection refused
E0318 06:42:35.703268       1 event.go:259] Could not construct reference to: '&v1.ConfigMap{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Data:map[string]string(nil), BinaryData:map[string][]uint8(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'e320aa8f-4948-11e9-9b3e-0a580a80000d stopped leading'
I0318 06:42:35.703428       1 leaderelection.go:249] failed to renew lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock: failed to tryAcquireOrRenew context deadline exceeded
F0318 06:42:35.703502       1 leaderelection.go:65] leaderelection lost
I0318 06:42:35.758575       1 backing_resource_controller.go:155] Shutting down BackingResourceController
I0318 06:42:35.758751       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "KubeSchedulerClient"
I0318 06:42:35.758769       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "KubeControllerManagerClient"
I0318 06:42:35.758793       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "ServiceNetworkServing"
I0318 06:42:35.758805       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "LocalhostServing"
I0318 06:42:35.758817       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "AggregatorProxyClientCert"
I0318 06:42:35.758827       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "LoadBalancerServing"
I0318 06:42:35.758840       1 client_cert_rotation_controller.go:168] Shutting down CertRotationController - "KubeAPIServerCertSyncer"
I0318 06:42:35.758850       1 resourcesync_controller.go:211] Shutting down ResourceSyncController
I0318 06:42:35.758885       1 monitoring_resource_controller.go:163] Shutting down MonitoringResourceController
I0318 06:42:35.758893       1 config_observer_controller.go:159] Shutting down ConfigObserver
I0318 06:42:35.758902       1 unsupportedconfigoverrides_controller.go:164] Shutting down UnsupportedConfigOverridesController
I0318 06:42:35.758911       1 node_controller.go:133] Shutting down NodeController
I0318 06:42:35.759139       1 targetconfigcontroller.go:317] Shutting down TargetConfigController
I0318 06:42:35.759149       1 prune_controller.go:288] Shutting down PruneController
I0318 06:42:35.759156       1 staticpodstate_controller.go:178] Shutting down StaticPodStateController
I0318 06:42:35.759163       1 installer_controller.go:742] Shutting down InstallerController
I0318 06:42:35.759172       1 revision_controller.go:296] Shutting down RevisionController
I0318 06:42:35.759400       1 status_controller.go:189] Shutting down StatusSyncer-kube-apiserver
F0318 06:42:35.759664       1 builder.go:240] stopped

$ oc get pods --all-namespaces | grep -i crash
openshift-authentication-operator                       openshift-authentication-operator-5cdb858474-mscb8                0/1     CrashLoopBackOff   11         63m
openshift-console-operator                              console-operator-6969dc4d65-plgbj                                 0/1     CrashLoopBackOff   11         63m
openshift-controller-manager-operator                   openshift-controller-manager-operator-d54df77d7-9bw5t             0/1     CrashLoopBackOff   14         68m
openshift-controller-manager                            controller-manager-wgd5x                                          0/1     CrashLoopBackOff   9          59m
openshift-kube-apiserver-operator                       kube-apiserver-operator-7d7b576bd6-xbtrh                          0/1     CrashLoopBackOff   14         68m
openshift-kube-controller-manager-operator              kube-controller-manager-operator-7fd685cf44-7v7rb                 0/1     CrashLoopBackOff   14         68m
openshift-machine-api                                   cluster-autoscaler-operator-5dfb8c9c49-wpxrw                      0/1     CrashLoopBackOff   13         68m
openshift-sdn                                           sdn-controller-t5q9b                                              0/1     CrashLoopBackOff   13         68m
openshift-service-ca-operator                           openshift-service-ca-operator-7b56d9576d-vhfvd                    0/1     CrashLoopBackOff   13         68m
openshift-service-ca                                    apiservice-cabundle-injector-5f4598456-vmkgn                      0/1     CrashLoopBackOff   13         67m
openshift-service-ca                                    configmap-cabundle-injector-6db5fd6746-6dr8h                      0/1     CrashLoopBackOff   13         67m
openshift-service-ca                                    service-serving-cert-signer-7f54646c5-znlst                       0/1     CrashLoopBackOff   13         67m
openshift-service-catalog-controller-manager-operator   openshift-service-catalog-controller-manager-operator-596b4mphw   0/1     CrashLoopBackOff   11         63m
@ghost

ghost commented Mar 18, 2019

@praveenkumar,

with a slow internet connection, much time is lost in podman pull

Regards,
Fábio Sbano

@praveenkumar
Contributor Author

with a slow internet connection, much time is lost in podman pull

@ssbano I am not on a slow internet connection; I am using GCE nested virt for this, and image pulls are quite fast. Have you tried the latest master of the installer today/yesterday with success?

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

I changed the timeout in create.go to 60 minutes.

Regards,
Fábio Sbano

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

I'll try tonight and give you some feedback.

Regards,
Fabio Sbano

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

I was waiting for the certificate problem (#1394) to be solved.

@praveenkumar
Contributor Author

Check the pods' behaviour: some of the components keep retrying continuously (note the restart counts below).

$ oc get pods --all-namespaces
NAMESPACE                                               NAME                                                              READY   STATUS      RESTARTS   AGE
kube-system                                             etcd-member-test1-dl2sn-master-0                                  1/1     Running     0          4h41m
openshift-apiserver-operator                            openshift-apiserver-operator-6cc4546d78-lztg7                     1/1     Running     48         4h42m
openshift-apiserver                                     apiserver-7xqg9                                                   1/1     Running     0          4h35m
openshift-authentication-operator                       openshift-authentication-operator-5cdb858474-mscb8                1/1     Running     37         4h36m
openshift-authentication                                openshift-authentication-c9c5f79ff-98zdf                          1/1     Running     0          92m
openshift-authentication                                openshift-authentication-c9c5f79ff-zkwcl                          1/1     Running     0          93m
openshift-cloud-credential-operator                     cloud-credential-operator-6575487785-7fq5j                        1/1     Running     0          4h40m
openshift-cluster-machine-approver                      machine-approver-7c7c9d9686-cnbpx                                 1/1     Running     0          4h42m
openshift-cluster-node-tuning-operator                  cluster-node-tuning-operator-74d66ffb55-fhcjq                     1/1     Running     0          4h36m
openshift-cluster-node-tuning-operator                  tuned-42hj9                                                       1/1     Running     0          4h35m
openshift-cluster-node-tuning-operator                  tuned-kd6tw                                                       1/1     Running     0          4h35m
openshift-cluster-samples-operator                      cluster-samples-operator-7d69dc98c-g69fj                          1/1     Running     0          4h36m
openshift-cluster-storage-operator                      cluster-storage-operator-7499867ff9-dkjrc                         1/1     Running     5          4h36m
openshift-cluster-version                               cluster-version-operator-847975454d-tnbv4                         1/1     Running     0          4h41m
openshift-console-operator                              console-operator-6969dc4d65-plgbj                                 1/1     Running     37         4h36m
openshift-console                                       console-547c7b8846-29z89                                          1/1     Running     2          4h33m
openshift-console                                       console-547c7b8846-8l8gm                                          1/1     Running     2          4h33m
openshift-controller-manager-operator                   openshift-controller-manager-operator-d54df77d7-9bw5t             1/1     Running     47         4h42m
openshift-controller-manager                            controller-manager-4b7sc                                          1/1     Running     8          39m
openshift-dns-operator                                  dns-operator-58b496f677-ckfh2                                     1/1     Running     0          4h42m
openshift-dns                                           dns-default-4d7km                                                 2/2     Running     0          4h38m
openshift-dns                                           dns-default-q6htd                                                 2/2     Running     0          4h41m
openshift-image-registry                                cluster-image-registry-operator-bff67847f-zq5p7                   1/1     Running     0          4h36m
openshift-image-registry                                image-registry-5d85d6c8d9-kgl28                                   1/1     Running     0          4h35m
openshift-image-registry                                node-ca-p7q26                                                     1/1     Running     0          4h35m
openshift-image-registry                                node-ca-sc4h9                                                     1/1     Running     0          4h35m
openshift-ingress-operator                              ingress-operator-84fcfc454d-rwvd2                                 1/1     Running     0          4h36m
openshift-ingress                                       router-default-875d76bbf-bzsbx                                    0/1     Pending     0          4h35m
openshift-ingress                                       router-default-875d76bbf-xbhb9                                    1/1     Running     0          4h35m
openshift-kube-apiserver-operator                       kube-apiserver-operator-7d7b576bd6-xbtrh                          1/1     Running     47         4h42m
openshift-kube-apiserver                                installer-32-test1-dl2sn-master-0                                 0/1     Completed   0          142m
openshift-kube-apiserver                                installer-41-test1-dl2sn-master-0                                 0/1     OOMKilled   0          51m
openshift-kube-apiserver                                installer-42-test1-dl2sn-master-0                                 0/1     Completed   0          40m
openshift-kube-apiserver                                installer-43-test1-dl2sn-master-0                                 0/1     Completed   0          39m
openshift-kube-apiserver                                installer-44-test1-dl2sn-master-0                                 0/1     Completed   0          37m
openshift-kube-apiserver                                installer-45-test1-dl2sn-master-0                                 0/1     Completed   0          36m
openshift-kube-apiserver                                installer-46-test1-dl2sn-master-0                                 0/1     Completed   0          34m
openshift-kube-apiserver                                installer-47-test1-dl2sn-master-0                                 0/1     Completed   0          32m
openshift-kube-apiserver                                installer-48-test1-dl2sn-master-0                                 0/1     Completed   0          28m
openshift-kube-apiserver                                installer-49-test1-dl2sn-master-0                                 0/1     Completed   0          22m
openshift-kube-apiserver                                kube-apiserver-test1-dl2sn-master-0                               2/2     Running     0          21m
openshift-kube-apiserver                                revision-pruner-32-test1-dl2sn-master-0                           0/1     Completed   0          136m
openshift-kube-apiserver                                revision-pruner-41-test1-dl2sn-master-0                           0/1     OOMKilled   0          50m
openshift-kube-apiserver                                revision-pruner-42-test1-dl2sn-master-0                           0/1     Completed   0          39m
openshift-kube-apiserver                                revision-pruner-43-test1-dl2sn-master-0                           0/1     Completed   0          37m
openshift-kube-apiserver                                revision-pruner-44-test1-dl2sn-master-0                           0/1     Completed   0          36m
openshift-kube-apiserver                                revision-pruner-45-test1-dl2sn-master-0                           0/1     Completed   0          34m
openshift-kube-apiserver                                revision-pruner-46-test1-dl2sn-master-0                           0/1     Completed   0          32m
openshift-kube-apiserver                                revision-pruner-47-test1-dl2sn-master-0                           0/1     Completed   0          28m
openshift-kube-apiserver                                revision-pruner-48-test1-dl2sn-master-0                           0/1     Completed   0          22m
openshift-kube-apiserver                                revision-pruner-49-test1-dl2sn-master-0                           0/1     Completed   0          16m
openshift-kube-controller-manager-operator              kube-controller-manager-operator-7fd685cf44-7v7rb                 1/1     Running     47         4h42m
openshift-kube-controller-manager                       installer-2-test1-dl2sn-master-0                                  0/1     Completed   0          4h40m
openshift-kube-controller-manager                       installer-22-test1-dl2sn-master-0                                 0/1     Completed   0          160m
openshift-kube-controller-manager                       installer-3-test1-dl2sn-master-0                                  0/1     Completed   0          4h39m
openshift-kube-controller-manager                       installer-37-test1-dl2sn-master-0                                 0/1     Completed   0          40m
openshift-kube-controller-manager                       installer-38-test1-dl2sn-master-0                                 0/1     Completed   0          40m
openshift-kube-controller-manager                       installer-39-test1-dl2sn-master-0                                 0/1     Completed   0          32m
openshift-kube-controller-manager                       installer-40-test1-dl2sn-master-0                                 0/1     Completed   0          28m
openshift-kube-controller-manager                       installer-41-test1-dl2sn-master-0                                 0/1     Completed   0          22m
openshift-kube-controller-manager                       installer-42-test1-dl2sn-master-0                                 0/1     Completed   0          16m
openshift-kube-controller-manager                       installer-5-test1-dl2sn-master-0                                  0/1     Completed   0          4h34m
openshift-kube-controller-manager                       kube-controller-manager-test1-dl2sn-master-0                      1/1     Running     3          15m
openshift-kube-controller-manager                       revision-pruner-2-test1-dl2sn-master-0                            0/1     Completed   0          4h39m
openshift-kube-controller-manager                       revision-pruner-22-test1-dl2sn-master-0                           0/1     Completed   0          159m
openshift-kube-controller-manager                       revision-pruner-3-test1-dl2sn-master-0                            0/1     Completed   0          4h38m
openshift-kube-controller-manager                       revision-pruner-37-test1-dl2sn-master-0                           0/1     Completed   0          40m
openshift-kube-controller-manager                       revision-pruner-38-test1-dl2sn-master-0                           0/1     Completed   0          38m
openshift-kube-controller-manager                       revision-pruner-39-test1-dl2sn-master-0                           0/1     Completed   0          28m
openshift-kube-controller-manager                       revision-pruner-40-test1-dl2sn-master-0                           0/1     Completed   0          22m
openshift-kube-controller-manager                       revision-pruner-41-test1-dl2sn-master-0                           0/1     Completed   0          16m
openshift-kube-controller-manager                       revision-pruner-42-test1-dl2sn-master-0                           0/1     Completed   0          14m
openshift-kube-controller-manager                       revision-pruner-5-test1-dl2sn-master-0                            0/1     Completed   0          4h34m
openshift-kube-scheduler-operator                       openshift-kube-scheduler-operator-7fb87bf449-g4pvv                1/1     Running     47         4h42m
openshift-kube-scheduler                                installer-20-test1-dl2sn-master-0                                 0/1     Error       0          159m
openshift-kube-scheduler                                installer-21-test1-dl2sn-master-0                                 0/1     Error       0          157m
openshift-kube-scheduler                                installer-37-test1-dl2sn-master-0                                 0/1     Error       0          37m
openshift-kube-scheduler                                installer-38-test1-dl2sn-master-0                                 0/1     Completed   0          36m
openshift-kube-scheduler                                installer-39-test1-dl2sn-master-0                                 0/1     Completed   0          32m
openshift-kube-scheduler                                installer-40-test1-dl2sn-master-0                                 0/1     Completed   0          28m
openshift-kube-scheduler                                installer-41-test1-dl2sn-master-0                                 0/1     Completed   0          22m
openshift-kube-scheduler                                installer-42-test1-dl2sn-master-0                                 0/1     Completed   0          15m
openshift-kube-scheduler                                openshift-kube-scheduler-test1-dl2sn-master-0                     1/1     Running     3          15m
openshift-kube-scheduler                                revision-pruner-20-test1-dl2sn-master-0                           0/1     Completed   0          158m
openshift-kube-scheduler                                revision-pruner-21-test1-dl2sn-master-0                           0/1     Completed   0          156m
openshift-kube-scheduler                                revision-pruner-37-test1-dl2sn-master-0                           0/1     Completed   0          36m
openshift-kube-scheduler                                revision-pruner-38-test1-dl2sn-master-0                           0/1     OOMKilled   0          36m
openshift-kube-scheduler                                revision-pruner-39-test1-dl2sn-master-0                           0/1     Completed   0          28m
openshift-kube-scheduler                                revision-pruner-40-test1-dl2sn-master-0                           0/1     Completed   0          28m
openshift-kube-scheduler                                revision-pruner-41-test1-dl2sn-master-0                           0/1     Completed   0          16m
openshift-kube-scheduler                                revision-pruner-42-test1-dl2sn-master-0                           0/1     Completed   0          14m
openshift-machine-api                                   cluster-autoscaler-operator-5dfb8c9c49-wpxrw                      1/1     Running     46         4h42m
openshift-machine-api                                   clusterapi-manager-controllers-787f5f5dbf-t2dpb                   4/4     Running     0          4h41m
openshift-machine-api                                   machine-api-operator-7958456954-vw2bf                             1/1     Running     0          4h41m
openshift-machine-config-operator                       machine-config-controller-55f66dcdb4-qjgvr                        1/1     Running     0          4h41m
openshift-machine-config-operator                       machine-config-daemon-rwpx5                                       1/1     Running     0          4h38m
openshift-machine-config-operator                       machine-config-daemon-xhkmj                                       1/1     Running     0          4h40m
openshift-machine-config-operator                       machine-config-operator-6d55fdf7b9-kf9w5                          1/1     Running     0          4h42m
openshift-machine-config-operator                       machine-config-server-n29s4                                       1/1     Running     0          4h40m
openshift-marketplace                                   certified-operators-64cfd5dfc-x5pf4                               1/1     Running     2          4h32m
openshift-marketplace                                   community-operators-5dd94977d7-9925k                              1/1     Running     2          4h32m
openshift-marketplace                                   marketplace-operator-5957c59b55-t7hqt                             1/1     Running     2          4h36m
openshift-marketplace                                   redhat-operators-86f64c75b5-9gdz7                                 1/1     Running     2          4h32m
openshift-monitoring                                    alertmanager-main-0                                               3/3     Running     0          4h31m
openshift-monitoring                                    alertmanager-main-1                                               3/3     Running     0          4h30m
openshift-monitoring                                    alertmanager-main-2                                               3/3     Running     0          4h30m
openshift-monitoring                                    cluster-monitoring-operator-556478479c-xhgwx                      1/1     Running     0          4h36m
openshift-monitoring                                    grafana-6df47448b-9fc7l                                           2/2     Running     0          4h31m
openshift-monitoring                                    kube-state-metrics-c886c4d49-4qxqg                                3/3     Running     0          4h36m
openshift-monitoring                                    node-exporter-rvjdg                                               2/2     Running     0          4h35m
openshift-monitoring                                    node-exporter-wv2jk                                               2/2     Running     0          4h35m
openshift-monitoring                                    prometheus-adapter-7867c57f4f-7xd77                               1/1     Running     0          51m
openshift-monitoring                                    prometheus-adapter-7867c57f4f-9mhrg                               1/1     Running     0          51m
openshift-monitoring                                    prometheus-k8s-0                                                  6/6     Running     1          4h29m
openshift-monitoring                                    prometheus-k8s-1                                                  6/6     Running     1          4h29m
openshift-monitoring                                    prometheus-operator-654865bfd9-5kbln                              1/1     Running     2          4h32m
openshift-multus                                        multus-dqpv5                                                      1/1     Running     0          4h41m
openshift-multus                                        multus-qpss2                                                      1/1     Running     0          4h38m
openshift-network-operator                              network-operator-8474b95564-vhbd2                                 1/1     Running     0          4h42m
openshift-operator-lifecycle-manager                    catalog-operator-7555554bb6-vd8k2                                 1/1     Running     0          4h40m
openshift-operator-lifecycle-manager                    olm-operator-7554b549f9-zcxgg                                     1/1     Running     0          4h40m
openshift-operator-lifecycle-manager                    olm-operators-qmrdj                                               1/1     Running     0          4h41m
openshift-operator-lifecycle-manager                    packageserver-586d546f99-dtlnj                                    1/1     Running     0          33m
openshift-operator-lifecycle-manager                    packageserver-586d546f99-rxwxx                                    1/1     Running     0          33m
openshift-sdn                                           ovs-4bf5l                                                         1/1     Running     0          4h38m
openshift-sdn                                           ovs-ldddl                                                         1/1     Running     0          4h41m
openshift-sdn                                           sdn-4vjvb                                                         1/1     Running     0          4h38m
openshift-sdn                                           sdn-controller-t5q9b                                              1/1     Running     46         4h41m
openshift-sdn                                           sdn-wbvpw                                                         1/1     Running     1          4h41m
openshift-service-ca-operator                           openshift-service-ca-operator-7b56d9576d-vhfvd                    1/1     Running     45         4h42m
openshift-service-ca                                    apiservice-cabundle-injector-5f4598456-vmkgn                      1/1     Running     46         4h40m
openshift-service-ca                                    configmap-cabundle-injector-6db5fd6746-6dr8h                      1/1     Running     46         4h40m
openshift-service-ca                                    service-serving-cert-signer-7f54646c5-znlst                       1/1     Running     46         4h40m
openshift-service-catalog-apiserver-operator            openshift-service-catalog-apiserver-operator-7798bcdbc7-ggrsr     1/1     Running     37         4h36m
openshift-service-catalog-controller-manager-operator   openshift-service-catalog-controller-manager-operator-596b4mphw   1/1     Running     37         4h36m

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

I changed the bootstrap and master memory values to 32 GB.

Are you using the default settings?

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

What problem does oc describe report?

@praveenkumar
Contributor Author

@ssbano I use 12 GB for the masters, not the default, and the bootstrap node doesn't run workloads; it's only there until etcd ownership is transferred to the masters.

@ghost

ghost commented Mar 18, 2019

@praveenkumar,

I know, but by adjusting the values the bootstrap phase completes faster, and the bootstrap node is then destroyed by the installer afterwards.

@zeenix
Contributor

zeenix commented Mar 18, 2019

Cluster initialisation time out after 30m and cluster is taking around 45-50 mins to become healthy in libvirt provider.

FWIW, I can verify that this is usually the case, although I haven't checked how long it really takes for the cluster to be fully up. It's at least more than 30 mins, since the installer gives up before that.

@ghost

ghost commented Mar 20, 2019

@praveenkumar,

I've been busy, but I'll do a git fetch soon.

Did you have any progress?

Regards,
Fabio Sbano

@wking
Member

wking commented Mar 20, 2019

On libvirt, this may be due to delays pulling all the images down to your local machines. But it can happen for other reasons too. For example, here we timed out on AWS because of the same CVO-roll-out delays discussed in openshift/cluster-authentication-operator#95.

@praveenkumar
Contributor Author

praveenkumar commented Mar 20, 2019

@ssbano So today, with the latest master, I am not seeing this except for the known issue openshift/cluster-storage-operator#19. Something might have improved regarding the CVO roll-out, but I will keep this open till we have consistent success with the libvirt provider.

@ghost

ghost commented Mar 20, 2019

@praveenkumar,

All right!

Regards,
Fabio Sbano

@jsm84

jsm84 commented Mar 28, 2019

I'm also hitting the 30m timeout when using the libvirt install method:

time="2019-03-27T23:47:43-04:00" level=debug msg="Destroy complete! Resources: 3 destroyed."
time="2019-03-27T23:47:43-04:00" level=info msg="Waiting up to 30m0s for the cluster at https://api.test.openshift.local:6443 to initialize..."
time="2019-03-27T23:47:43-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 87% complete"
time="2019-03-27T23:47:53-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 90% complete"
time="2019-03-27T23:48:11-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 90% complete"
time="2019-03-27T23:49:14-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 90% complete"
time="2019-03-27T23:50:38-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 92% complete"
time="2019-03-27T23:51:57-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 92% complete"
time="2019-03-27T23:53:10-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 92% complete"
time="2019-03-27T23:55:23-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 93% complete"
time="2019-03-27T23:55:38-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-03-28-033217: 94% complete"
time="2019-03-28T00:04:23-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator image-registry is still updating"
time="2019-03-28T00:17:43-04:00" level=fatal msg="failed to initialize the cluster: Cluster operator image-registry is still updating: timed out waiting for the condition"

I'll try changing the timeout in create.go to 60m and will report back.

@praveenkumar
Contributor Author

@jsm84 Sure. Also, can you tell which version of the installer you are using? Is it built from master or from a tag?

@jsm84

jsm84 commented Mar 30, 2019

I was using the master branch. I just ran the install again, after rebasing against the latest changes to master and changing the cluster initialization timeout to 60m, and it still timed out. It reached 94% completion. I'm running this on a Lenovo P50 laptop (4 cores, 8 threads, 32 GB RAM), which isn't exactly low-end hardware, with the Fedora 29 Xfce Spin (a lighter-than-usual DE means more free resources).

stbenjam added a commit to stbenjam/installer that referenced this issue Apr 8, 2019
We frequently see issues where 30m isn't enough for the cluster to come
up. Waiting another 10-15 minutes would probably resolve the issue for
us. This bumps the cluster timeout to 60m, and lets users override all
timeouts with environment variables.

Fixes openshift#1428
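The override idea in that commit can be sketched as follows. This is a hypothetical Python illustration, not the installer's actual Go code, and the environment-variable name is invented for the example.

```python
import os
from datetime import timedelta

DEFAULT_CLUSTER_TIMEOUT = timedelta(minutes=30)

def cluster_init_timeout():
    """Return the cluster-initialization timeout, allowing an env override.

    OPENSHIFT_INSTALL_CLUSTER_TIMEOUT_MINUTES is a made-up variable name
    used only for this sketch; the real change may use different names.
    """
    raw = os.environ.get("OPENSHIFT_INSTALL_CLUSTER_TIMEOUT_MINUTES", "")
    try:
        minutes = int(raw)
    except ValueError:
        # Unset or malformed: fall back to the default discussed here.
        return DEFAULT_CLUSTER_TIMEOUT
    return timedelta(minutes=minutes) if minutes > 0 else DEFAULT_CLUSTER_TIMEOUT
```

With the variable unset the default 30m is kept; setting it to 60 would give the 60m many people in this thread resorted to.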
@cgwalters
Member

One thing that helped a lot for me in libvirt was to provide more vcpus. See https://github.com/cgwalters/xokdinst/blob/d45f422bed5a6cfaa78a1efb67aaecf5f77d6b37/src/xokdinst.rs#L146

The installer should probably detect the number of physical cores and match that.
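That suggestion could look something like this minimal sketch (in Python rather than the installer's Go, and with suggested_vcpus as a hypothetical helper; the installer does not currently do this):

```python
import os

def suggested_vcpus(minimum=4):
    # Hypothetical sizing helper: give each libvirt guest as many vCPUs as
    # the host reports, but never fewer than a sensible floor. os.cpu_count()
    # can return None on some platforms, hence the fallback.
    host_cpus = os.cpu_count() or minimum
    return max(minimum, host_cpus)
```

On an 8-thread laptop like the one mentioned below, this would hand each guest 8 vCPUs instead of a small fixed default.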

@jsm84

jsm84 commented Apr 8, 2019

Still getting a timeout, even at 3 hrs 20 min (200 min). I went digging further into the code, beyond simply adjusting the timeout value in create.go, and found that the installer is simply watching the Status: Conditions field of the ClusterVersion CR and waiting until it reads as complete. I think I've found why this is occurring, at least in my case.

Here are the contents of the ClusterVersion CR for my libvirt cluster:

oc describe clusterversion
Name:         version
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2019-04-08T18:44:38Z
  Generation:          1
  Resource Version:    82281
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 5cb8ccc6-5a2e-11e9-8397-52fdfc072182
Spec:
  Channel:     stable-4.0
  Cluster ID:  655b7588-49af-4697-891c-d3e0e20b5cf6
  Upstream:    https://api.openshift.com/api/upgrades_info/v1/graph
Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2019-04-08T18:44:56Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-04-08T21:45:48Z
    Status:                False
    Type:                  Failing
    Last Transition Time:  2019-04-08T18:44:56Z
    Message:               Working towards 4.0.0-0.alpha-2019-04-08-172826: 93% complete
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-04-08T18:44:56Z
    Message:               Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-04-08-172826
    Reason:                RemoteFailed
    Status:                False
    Type:                  RetrievedUpdates
  Desired:
    Image:    registry.svc.ci.openshift.org/openshift/origin-release@sha256:3e213b7638925940ddd1337f607809d022db840f1d58624d7a46eab2c5b3bbea
    Version:  4.0.0-0.alpha-2019-04-08-172826
  History:
    Completion Time:    <nil>
    Image:              registry.svc.ci.openshift.org/openshift/origin-release@sha256:3e213b7638925940ddd1337f607809d022db840f1d58624d7a46eab2c5b3bbea
    Started Time:       2019-04-08T18:44:56Z
    State:              Partial
    Version:            4.0.0-0.alpha-2019-04-08-172826
  Observed Generation:  1
  Version Hash:         a5X_hbkL0n4=
Events:                 <none>

It appears to be attempting to fetch updates and is failing due to an unknown version. This version tag appears to be created at compile time, which might explain why it fails to find anything. I just don't know the best course of action to circumvent this. It appears as though the cluster operator is stuck in a "never completed" state since it can't find updates for the version it's running.
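The wait logic described here boils down to scanning status.conditions for the Available condition. A rough Python equivalent of that check (the installer itself does this in Go) might be:

```python
def cluster_available(conditions):
    # Condition dicts mirror the shape of `oc get clusterversion -o json`
    # under .status.conditions: each has a "type" and a string "status".
    for cond in conditions:
        if cond.get("type") == "Available":
            return cond.get("status") == "True"
    return False

# With the conditions shown above, Available is False, so the installer
# keeps polling until its timeout fires:
stuck = [
    {"type": "Available", "status": "False"},
    {"type": "Failing", "status": "False"},
    {"type": "Progressing", "status": "True"},
]
# cluster_available(stuck) evaluates to False
```

This is why a stuck operator (image-registry, monitoring, authentication) keeps the whole install in "still waiting": Available never flips to True.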

@praveenkumar
Contributor Author

@jsm84 What is the output of oc get co when it times out? You also need to consider #1371 when trying to start on libvirt.

@jsm84

jsm84 commented Apr 9, 2019

Below is the pasted output:

$ oc get co
NAME                                 VERSION                           AVAILABLE   PROGRESSING   FAILING   SINCE
authentication                                                         False       False         True      3h21m
cloud-credential                     4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
cluster-autoscaler                   4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
console                              4.0.0-0.alpha-2019-04-09-141413   True        True          True      3h28m
dns                                  4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h47m
image-registry                                                         False       True          False     3h27m
ingress                                                                False       True          False     3h27m
kube-apiserver                       4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h44m
kube-controller-manager              4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h29m
kube-scheduler                       4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h44m
machine-api                          4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
machine-config                       4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h47m
marketplace                          4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h27m
monitoring                                                             False       True          True      3h17m
network                              4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
node-tuning                          4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h26m
openshift-apiserver                  4.0.0-0.alpha-2019-04-09-141413   True        False         False     2m39s
openshift-controller-manager         4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h23m
openshift-samples                    4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h22m
operator-lifecycle-manager           4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
operator-lifecycle-manager-catalog   4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h48m
service-ca                           4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h21m
service-catalog-apiserver            4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h27m
service-catalog-controller-manager   4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h27m
storage                              4.0.0-0.alpha-2019-04-09-141413   True        False         False     3h27m

I can mostly confirm #1371, as I noticed that I was unable to log in as kubeadmin, since I couldn't resolve the openshift-authentication route (the apps.test.openshift.local wildcard entry is missing from libvirt's net-config for the cluster network). I can't completely confirm it, as I'm not getting to the "Waiting on Console" part of the installation. My timeout is occurring during the phase prior to that: INFO Waiting up to 3h20m0s for the cluster at https://api.test.openshift.local:6443 to initialize.

Here's the log snippet:

time="2019-04-09T11:10:05-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2019-04-09T11:11:35-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T11:27:35-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator image-registry is still updating"
time="2019-04-09T11:29:35-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T11:46:05-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator image-registry is still updating"
time="2019-04-09T11:50:05-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T12:06:16-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator image-registry is still updating"
time="2019-04-09T12:10:22-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T12:26:30-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2019-04-09T12:30:37-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T12:47:06-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2019-04-09T12:51:13-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T13:07:35-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2019-04-09T13:11:50-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T13:28:14-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator authentication is still updating"
time="2019-04-09T13:32:33-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T13:49:00-04:00" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is still updating"
time="2019-04-09T13:52:55-04:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete"
time="2019-04-09T13:55:39-04:00" level=fatal msg="failed to initialize the cluster: Working towards 4.0.0-0.alpha-2019-04-09-141413: 94% complete: timed out waiting for the condition"

And finally, here's the output of oc describe clusterversion:

Name:         version
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2019-04-09T14:31:21Z
  Generation:          1
  Resource Version:    148258
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 24f83687-5ad4-11e9-8cfc-664f163f5f0f
Spec:
  Channel:         fast
  Cluster ID:      b943f276-4d58-4e89-90eb-def70325b1c1
  Desired Update:  <nil>
  Upstream:        https://api.openshift.com/api/upgrades_info/v1/graph
Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2019-04-09T14:31:21Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-04-09T18:50:13Z
    Message:               Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (298 of 308): the server does not recognize this resource, check extension API servers
    Reason:                UpdatePayloadResourceTypeMissing
    Status:                True
    Type:                  Failing
    Last Transition Time:  2019-04-09T14:31:21Z
    Message:               Unable to apply 4.0.0-0.alpha-2019-04-09-141413: a required extension is not available to update
    Reason:                UpdatePayloadResourceTypeMissing
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-04-09T14:31:21Z
    Message:               Unable to retrieve available updates: unknown version 4.0.0-0.alpha-2019-04-09-141413
    Reason:                RemoteFailed
    Status:                False
    Type:                  RetrievedUpdates
  Desired:
    Image:    registry.svc.ci.openshift.org/openshift/origin-release@sha256:567e704a6017e1588e350203416c86fb06e33fb062b23817b1f62cfa73a35bfd
    Version:  4.0.0-0.alpha-2019-04-09-141413
  History:
    Completion Time:    <nil>
    Image:              registry.svc.ci.openshift.org/openshift/origin-release@sha256:567e704a6017e1588e350203416c86fb06e33fb062b23817b1f62cfa73a35bfd
    Started Time:       2019-04-09T14:31:21Z
    State:              Partial
    Version:            4.0.0-0.alpha-2019-04-09-141413
  Observed Generation:  1
  Version Hash:         SBoe2693NXs=
Events:                 <none>

@rsriniva

I think increasing timeouts may not be the ideal solution. What I found out is that after the bootstrap VM sets up the masters, it waits for the API to come up by pinging the public DNS entry, which can take a very long time to propagate depending on where you are located and what DNS you are using. I got 100% failure when installing from within the office network because we have our own corporate DNS server. Next I tried from home, and my ISP's DNS is also slow to see the new DNS entry.

Finally, I changed /etc/resolv.conf to point to google DNS and finally the install was successful.

I think we are going to have a problem with enterprise customers trying to install from within the corporate network and failing. The further away from US-based AWS clusters, the worse the problem gets.

Increasing the timeout to 1 hr or more does not make sense and is a band-aid solution. I am just a newcomer trying out the install, but is it possible to separate the install from the Route 53 DNS config? Maybe do a staged install: ask users to manually verify that the DNS is resolvable from their machine, and only then continue with tearing down the bootstrap, etc.

Also, instead of pinging the public DNS, why not ping the private AWS DNS entry and finish the install? The user can wait for DNS to propagate and then do oc login once the entry is publicly visible. Just my 2c.
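The manual-verification step proposed here could be a simple pre-flight resolvability check before the installer is allowed to proceed. A sketch, with api_resolvable as a hypothetical helper:

```python
import socket

def api_resolvable(host, port=6443):
    # True when the cluster API hostname resolves from this machine's
    # configured resolver. A staged install could poll this before moving
    # past the bootstrap teardown.
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True
    except socket.gaierror:
        return False
```

For example, api_resolvable("api.mycluster.example.com") would stay False until the Route 53 record has propagated to the local resolver, which is exactly the delay described above.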

@wking
Member

wking commented Apr 24, 2019

Also, instead of pinging the public DNS, why not ping the private AWS DNS entry...

That should only be an option from within the installer-created VPC, unless you're following a UPI flow. But you could certainly create an installer machine in your AWS account (in your target region or not, probably wouldn't matter) and run openshift-installer from there to avoid any issues with your local network.

@zeenix
Contributor

zeenix commented Apr 24, 2019

@rsriniva Have you set up the DNS overlay as instructed here and then used the workaround for the console issue?

@wking wking changed the title [libvirt] Default time for 30m for the cluste to initialize not enough [libvirt] Default time for 30m for the cluster to initialize not enough Apr 24, 2019
@schmaustech

When using the kni-installer for bare metal in a virtualized setting, as in CI environments or classroom labs, the apiTimeout wait of 30m is not enough for the bootstrap VM to instantiate, due to the reduced speed of nested virtualization.

I have noticed that in these nested virtualized environments the job:

A start job is running for Ignition (disks)
[ **] A start job is running for Ignition (disks) (14min 11s / no limit)

can take anywhere from 12m to 22m to complete, depending on the underlying disk in the nested environment. By that time RHCOS has already been laid down on the master nodes, which have been waiting for the API, but the bootstrap VM is still in the process of coming up and the 30m timer is already running.

I used to make the timeout adjustment myself in create.go, but now that we are working with release payloads where the installer is bundled, I no longer have that ability.

This lab is in Westford.

@zeenix
Contributor

zeenix commented Jun 12, 2019

/label platform/libvirt

@zeenix
Contributor

zeenix commented Jun 27, 2019

/priority critical-urgent

@openshift-ci-robot openshift-ci-robot added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 27, 2019
@fgiloux

fgiloux commented Jul 4, 2019

I had the state that jsm84 described in his comment.
This was due to my worker node not joining the cluster. Looking into it, the kubelet with the service account system:serviceaccount:openshift-machine-config-operator:node-bootstrapper was not allowed to query the API server.
I had plenty of CSRs pending, so I approved them: for c in $(oc get csr -o name); do oc adm certificate approve $c; done
My worker node then joined the cluster and the remaining operators completed.
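That one-liner approves every CSR, pending or not. The selection it relies on can be sketched as pure logic over `oc get csr -o json` output (the helper name is made up for illustration):

```python
def pending_csr_names(csr_items):
    # A CSR is still pending when its status carries no conditions:
    # it has been neither Approved nor Denied yet. Input items mirror
    # the .items list of `oc get csr -o json`.
    return [
        item["metadata"]["name"]
        for item in csr_items
        if not item.get("status", {}).get("conditions")
    ]
```

Each returned name would then be passed to oc adm certificate approve, as in the shell loop above.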

@Sarang-Sangram

I am facing the timeout issue too.
Error log:

DEBUG module.bootstrap.libvirt_volume.bootstrap: Destruction complete after 6s
DEBUG module.bootstrap.libvirt_ignition.bootstrap: Still destroying... [id=/var/lib/libvirt/images/os4c12-ptj9b-bo...n;5d3593d7-6059-b20a-544e-b2e7f3773083, 10s elapsed]
DEBUG module.bootstrap.libvirt_ignition.bootstrap: Destruction complete after 11s
DEBUG Destroy complete! Resources: 3 destroyed.
INFO Waiting up to 30m0s for the cluster at https://api.os4c12.hc.os4:6443 to initialize...
FATAL failed to initialize the cluster: timed out waiting for the condition

OS image: rhcos-420.8.20190708.2-qemu.qcow2

OpenShift Installer version:
# openshift-install version
openshift-install unreleased-master-1258-g1a78ab5e12aa9295b20c789c0e42c0760bb753c1
built from commit 1a78ab5e12aa9295b20c789c0e42c0760bb753c1
release image registry.svc.ci.openshift.org/origin/release:4.2

I have tried all the alternatives, from modifying the system resource values to increasing the timeout value, but no luck.

@zeenix
Contributor

zeenix commented Jul 22, 2019

@Sarang-Sangram Sorry to hear that you are facing the same problem, but if waiting longer doesn't help and the cluster never comes up, this is not the relevant issue. You might be facing #1893.

@Sarang-Sangram

After destroying the cluster I re-initiated the cluster creation, which again timed out. But I noticed the states below in the oc command output:

1. The following pods are in Pending state:
core@os4c12-79q6j-master-0 ~]$ oc get pods --all-namespaces --config=kubeconfig |grep -v Completed | grep -v Running
NAMESPACE                                               NAME                                                              READY   STATUS      RESTARTS   AGE
openshift-ingress                                       router-default-565b58b856-vtx6m                                   0/1     Pending     0          12m
openshift-machine-config-operator                       etcd-quorum-guard-7774ffcfc9-hcmzg                                0/1     Pending     0          36m
openshift-machine-config-operator                       etcd-quorum-guard-7774ffcfc9-x64sc                                0/1     Pending     0          36m
2. All cluster operators are up and True except for authentication:
[core@os4c12-79q6j-master-0 ~]$ oc get co --config=kubeconfig
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                                             Unknown     Unknown       True       15m
cloud-credential                           4.2.0-0.okd-2019-07-30-090300   True        False         False      36m
3. In oc describe clusterversion, I noticed the error "unable to retrieve available updates":
[core@os4c12-79q6j-master-0 ~]$ oc describe clusterversion --config=kubeconfig
Name:         version
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterVersion
Metadata:
  Creation Timestamp:  2019-07-30T09:49:19Z
  Generation:          1
  Resource Version:    17367
  Self Link:           /apis/config.openshift.io/v1/clusterversions/version
  UID:                 4d67f930-b2af-11e9-b5b0-52540082171a
Spec:
  Channel:     stable-4.2
  Cluster ID:  09d2b77e-c935-4167-8561-9ebbb385dcfb
  Upstream:    https://api.openshift.com/api/upgrades_info/v1/graph
Status:
  Available Updates:  <nil>
  Conditions:
    Last Transition Time:  2019-07-30T09:49:20Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-07-30T10:27:45Z
    Message:               Some cluster operators are still updating: authentication, console
    Reason:                ClusterOperatorsNotAvailable
    Status:                True
    Type:                  Failing
    Last Transition Time:  2019-07-30T09:49:20Z
   -->Message:               Unable to apply 4.2.0-0.okd-2019-07-30-090300: some cluster operators have not yet rolled out
    Reason:                ClusterOperatorsNotAvailable
    Status:                True
    Type:                  Progressing
    Last Transition Time:  2019-07-30T09:49:20Z
   --->Message:               Unable to retrieve available updates: currently installed version 4.2.0-0.okd-2019-07-30-090300 not found in the "stable-4.2" channel 
    Reason:                RemoteFailed
    Status:                False
    Type:                  RetrievedUpdates
  Desired:
    Force:    false
    Image:    registry.svc.ci.openshift.org/origin/release@sha256:c2a18d8b435f21b007150d0cf06fc1181916a30b52c7379d12f1a9adc09ed268
    Version:  4.2.0-0.okd-2019-07-30-090300
  History:
    Completion Time:    <nil>
    Image:              registry.svc.ci.openshift.org/origin/release@sha256:c2a18d8b435f21b007150d0cf06fc1181916a30b52c7379d12f1a9adc09ed268
    Started Time:       2019-07-30T09:49:20Z
    State:              Partial
    Verified:           false
    Version:            4.2.0-0.okd-2019-07-30-090300
  Observed Generation:  1
  Version Hash:         5GapOZTd4e0=
Events:                 <none>
 


@abhinavdahiya
Contributor

We already support wait-for <> subcommands for people who are outliers even with the 30 min timeout.

And as for customizing the 30 min timeout: we run CI for most of our deployment models and they always complete in around ~25-35 mins total. We do accept bugs when there is a slowdown because of some bug, but not based on user environment restrictions.

/close

@openshift-ci-robot
Contributor

@abhinavdahiya: Closing this issue.

In response to this:

We already support wait-for <> subcommands for people who are outliers even with the 30 min timeout.

And as for customizing the 30 min timeout: we run CI for most of our deployment models and they always complete in around ~25-35 mins total. We do accept bugs when there is a slowdown because of some bug, but not based on user environment restrictions.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
