Cluster deployment does not finish on vsphere #1884
This looks to be an occurrence of a bug that happens occasionally with the installation. The memory requirements look to be sufficient. Would you be willing to retry the installation?
I retried the installation. I used the https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.1.3/openshift-install-linux-4.1.3.tar.gz installer. But obviously I shouldn't have:
I'll try again now with 4.1.2.
OK. Retrying with 4.1.2 comes up to this stage:
I'll try and dig a little deeper. If you have any hints for me, feel free :-)
Now this:
What does |
Not even 4.1.0 works:
Why is even 4.1.0 pulled from the "fast" channel? I'd have expected "channel: stable-4.1". Where is this channel selected?
Have you set a storage backend for the image registry? See https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/installing/installing-on-vsphere#installation-operators-config_installing-vsphere. For the cluster operators that are not available, you can get the reason for the failure from the yaml for the operator. For example, |
I haven't set up a storage backend because the cluster did not complete its setup yet. Should I proceed anyway? The prerequisite seems to be a "provisioned persistent volume (PV) with ReadWriteMany access mode, such as NFS." How do I provision such a volume (and why do I have to do it when I gave the vsphere user the required rights beforehand)? "Verify you do not have a registry pod:"
I already have one, obviously. "Check the registry configuration:"
You need to set the storage backend for the image registry in order for the installation to complete. The image-registry operator will not become available until the storage backend has been configured. For the production use case, you need to provision your own storage because the vSphere cloud provider does not support ReadWriteMany access for its storage. If this is for non-production purposes, you can set the storage backend to emptyDir.
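For the non-production route just described, the registry operator's config object is pointed at ephemeral storage. A minimal sketch of what that config would look like (resource names as documented for the image registry operator; applied via `oc edit` or an `oc patch`):

```yaml
# Sketch (non-production only): back the image registry with emptyDir.
# Data is lost whenever the registry pod restarts.
apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  storage:
    emptyDir: {}
```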
For the errors in the other operators, do you have a wildcard DNS entry for the Ingress router pods? This is by default a |
I have NFS available in that environment (I guess, there's a NetApp somewhere). How/Where would I configure the mountpoint for that nfs volume?
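In general terms, the provisioning mentioned earlier is done by creating a PersistentVolume object that describes the NFS export (created with `oc create -f`). A sketch, in which the server address, export path, name, and size are all hypothetical placeholders:

```yaml
# Sketch of an NFS-backed PV for the image registry.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv          # hypothetical name
spec:
  capacity:
    storage: 100Gi           # hypothetical size
  accessModes:
  - ReadWriteMany            # the access mode the registry prerequisite asks for
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com  # hypothetical NFS server (e.g. the NetApp)
    path: /exports/registry  # hypothetical export path
```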
Regarding the DNS wildcard record: yes, it exists and resolves to a VIP that balances across the worker nodes.
OK. From the co list, it looks like the ingress operator is having problems. What is the output of |
What about |
Do you have worker VMs that are not getting added as nodes? Or do you not have any worker VMs? A handful of operators will not function without worker nodes.
Also, it looks like your machines do not have hostnames that are resolvable by the other machines. It looks like the hostnames are using the default
The hostname is used as the node name for the machine. If all of the machines have the same hostname of |
I have worker nodes, and a kubelet gets deployed on them.
We have changed that now. Is there a recommendation for the names? Does it have to be an FQDN, or is "worker1" sufficient?
You can use any name you like, so long as the other machines can resolve the name to an IP address. The node name has a limit of 64 characters. If your FQDN fits within that limit, then you can use that. If you use a shortname, then you can configure your machines to have a DNS search domain for your domain. Personally, I use shortnames of control-plane-0, control-plane-1, compute-0, etc.
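The search-domain approach mentioned above boils down to resolver configuration on each machine. A sketch of the relevant lines, with placeholder domain and nameserver values (on RHCOS this file is generated, so in practice the search domain would normally be delivered via DHCP or the node's network configuration):

```
# /etc/resolv.conf (illustrative placeholder values)
search example.com
nameserver 192.0.2.53
```

With this in place, a lookup of the shortname "worker1" is retried as "worker1.example.com".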
That was a big part of the problem.
authentication gives this error now:
console this:
and obviously:
The kube-apiserver operator is still progressing. Things may settle down some after it completes its progression. If that operator does not progress fully, please add the yaml for that operator.
I seem to have a similar issue on Azure (#1817). Simply put, the Prometheus operator is not installing the CRDs (servicemonitors.monitoring.coreos.com):
Can you point me to the place in the code where this gets executed?
Everything (except console and authentication) is AVAILABLE now:
OK - console problem was a loadbalancer configuration issue - current status:
That fixed authentication as well:
Thank you for your relentless support :-)
Whew! I'm glad that it all worked out for you in the end. Sorry that it wasn't a smoother journey. I will take some of the pitfalls that you ran into as a cause for improving areas of the docs.
Thanks for the info!! This type of info I'd expect to be in the docs, as I bet folks will enable it and then... surprise surprise.
It's in the docs: https://docs.openshift.com/container-platform/4.1/storage/understanding-persistent-storage.html#pv-access-modes_understanding-persistent-storage
I like to provide an nfs-share for the registry - but I have to use a second network interface for this. Will that work with the provided ignition file or do I have to modify it? If yes, would that be along the lines of https://coreos.com/os/docs/latest/network-config-with-networkd.html? Can this be used with RHCOS?
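Along the lines of the linked networkd doc, a second interface would get its own unit file. A sketch with a hypothetical interface name and addressing (whether RHCOS honors networkd units the same way Container Linux does is not confirmed here, which is part of the question above):

```
# /etc/systemd/network/25-storage.network (hypothetical unit)
[Match]
Name=ens224                # hypothetical second NIC

[Network]
Address=192.168.50.10/24   # hypothetical storage-network address
```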
I've transferred this to another issue (#1943). This issue can be closed.
Version
Platform (aws|libvirt|openstack):
Platform is vsphere
[root@console ~]# govc about
Name: VMware vCenter Server
Vendor: VMware, Inc.
Version: 6.7.0
Build: 13007421
OS type: linux-x64
API type: VirtualCenter
API version: 6.7.2
Product ID: vpx
UUID: 1d884c6e-a1ac-4daf-9e25-b197e7f6bd91
VMs have a hw version of 15.
What happened?
The cluster does not finish deployment.
[root@console]# oc --config=/root/sgl-1/auth/kubeconfig get clusterversion -oyaml
apiVersion: v1
items:
- kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-21T10:11:17Z"
    generation: 1
    name: version
    resourceVersion: "180651"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: e8d08059-940c-11e9-91f9-005056acf7ce
  spec:
    channel: stable-4.1
    clusterID: 8ff0010a-1f47-4f05-a555-3e7ef1321d70
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - status: "False"
      type: Available
    - status: "False"
      type: Failing
    - message: 'Working towards 4.1.2: 80% complete'
      status: "True"
      type: Progressing
    - status: "True"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      version: 4.1.2
    history:
    - image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      startedTime: "2019-06-21T10:11:47Z"
      state: Partial
      verified: false
      version: 4.1.2
    observedGeneration: 1
    versionHash: CGRQCirWw8Y=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@console sgl-1]# oc --config=/root/sgl-1/auth/kubeconfig get clusterversion version -o=jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.message}{"\n"}{end}'
Available False
Failing False
Progressing True Working towards 4.1.2: 73% complete
RetrievedUpdates True
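The jsonpath one-liner above just flattens the status conditions into "TYPE STATUS MESSAGE" lines. The same summary can be sketched in Python against `-o json` output parsed into a dict; the sample data below mirrors the conditions shown in this issue:

```python
# Sketch: summarize ClusterVersion status conditions the way the
# jsonpath query above does. Input is the parsed JSON/YAML of
# `oc get clusterversion version -o json` (sample dict used here).
def summarize_conditions(clusterversion):
    """Return one 'TYPE STATUS [MESSAGE]' line per status condition."""
    lines = []
    for cond in clusterversion.get("status", {}).get("conditions", []):
        parts = [cond.get("type", ""), cond.get("status", "")]
        if cond.get("message"):
            parts.append(cond["message"])
        lines.append(" ".join(parts))
    return lines

sample = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "False"},
            {"type": "Failing", "status": "False"},
            {"type": "Progressing", "status": "True",
             "message": "Working towards 4.1.2: 73% complete"},
            {"type": "RetrievedUpdates", "status": "True"},
        ]
    }
}
print("\n".join(summarize_conditions(sample)))
```

This is just a convenience for reading the conditions at a glance; it does not replace inspecting the full operator yaml when something is stuck.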
[root@console sgl-1]# oc --config=/root/sgl-1/auth/kubeconfig get clusteroperator
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                     4.1.2     True        False         False      102m
cluster-autoscaler                   4.1.2     True        False         False      102m
dns                                  4.1.2     True        False         False      11m
kube-apiserver                       4.1.2     False       True          False      101m
kube-controller-manager              4.1.2     False       True          False      102m
kube-scheduler                       4.1.2     False       True          False      101m
machine-api                          4.1.2     True        False         False      102m
machine-config                       4.1.2     False       True          False      102m
network                              4.1.2     True        False         False      102m
openshift-apiserver                  4.1.2     Unknown     Unknown       False      102m
openshift-controller-manager                   False       True          False      101m
operator-lifecycle-manager           4.1.2     True        False         False      98m
operator-lifecycle-manager-catalog   4.1.2     True        False         False      98m
service-ca                                     True        True          False      101m
The master nodes reboot pretty often for reasons unknown to me. They have the required resources regarding memory and CPU according to the docs. Those VMs are the only ones on the vsphere cluster. Basically each VM runs on its own ESX host.
What you expected to happen?
I expect the openshift cluster setup to succeed.
Anything else we need to know?
The vcenter is flagging the VMs "red" because of their memory consumption. Is more memory (than 16 GB) needed on the master nodes?
As you see, I could not execute the above commands fast enough before the master nodes rebooted themselves. That's why we once see "Working towards 4.1.2: 80% complete" and the second time "Progressing True Working towards 4.1.2: 73% complete". I guess we do not reach 100%. The question is why (and what progress is indicated here)?
I'm available for installation debugging on vsphere as this is a non-production cluster.
Thanx for your help.