Cluster deployment does not finish on vsphere #1884

Closed
neuroserve opened this issue Jun 21, 2019 · 38 comments

@neuroserve

Version

../openshift-installer/openshift-install v4.1.1-201906040019-dirty
built from commit fb776038a1d90b2b83839ab5deb8579287972e11
release image quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b

Platform (aws|libvirt|openstack):

Platform is vsphere

[root@console ~]# govc about
Name: VMware vCenter Server
Vendor: VMware, Inc.
Version: 6.7.0
Build: 13007421
OS type: linux-x64
API type: VirtualCenter
API version: 6.7.2
Product ID: vpx
UUID: 1d884c6e-a1ac-4daf-9e25-b197e7f6bd91

The VMs have hardware version 15.

What happened?

The cluster does not finish deployment.

[root@console]# oc --config=/root/sgl-1/auth/kubeconfig get clusterversion -oyaml
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-21T10:11:17Z"
    generation: 1
    name: version
    resourceVersion: "180651"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: e8d08059-940c-11e9-91f9-005056acf7ce
  spec:
    channel: stable-4.1
    clusterID: 8ff0010a-1f47-4f05-a555-3e7ef1321d70
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-21T10:11:47Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-21T10:55:41Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2019-06-21T10:11:47Z"
      message: 'Working towards 4.1.2: 80% complete'
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-21T10:11:47Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      version: 4.1.2
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      startedTime: "2019-06-21T10:11:47Z"
      state: Partial
      verified: false
      version: 4.1.2
    observedGeneration: 1
    versionHash: CGRQCirWw8Y=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@console sgl-1]# oc --config=/root/sgl-1/auth/kubeconfig get clusterversion version -o=jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.message}{"\n"}{end}'
Available False
Failing False
Progressing True Working towards 4.1.2: 73% complete
RetrievedUpdates True

[root@console sgl-1]# oc --config=/root/sgl-1/auth/kubeconfig get clusteroperator
NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
cloud-credential                     4.1.2     True        False         False      102m
cluster-autoscaler                   4.1.2     True        False         False      102m
dns                                  4.1.2     True        False         False      11m
kube-apiserver                       4.1.2     False       True          False      101m
kube-controller-manager              4.1.2     False       True          False      102m
kube-scheduler                       4.1.2     False       True          False      101m
machine-api                          4.1.2     True        False         False      102m
machine-config                       4.1.2     False       True          False      102m
network                              4.1.2     True        False         False      102m
openshift-apiserver                  4.1.2     Unknown     Unknown       False      102m
openshift-controller-manager                   False       True          False      101m
operator-lifecycle-manager           4.1.2     True        False         False      98m
operator-lifecycle-manager-catalog   4.1.2     True        False         False      98m
service-ca                                     True        True          False      101m

The master nodes reboot pretty often for reasons unknown to me. They have the required memory and CPU resources according to the docs. Those VMs are the only ones on the vSphere cluster; basically each VM runs on its own ESX host.

What did you expect to happen?

I expect the openshift cluster setup to succeed.

Anything else we need to know?

vCenter is flagging the VMs "red" because of their memory consumption. Is more memory (than 16 GB) needed on the master nodes?

As you can see, I could not execute the above commands fast enough before the master nodes rebooted themselves. That's why the output shows "Working towards 4.1.2: 80% complete" the first time and "73% complete" the second. I guess we never reach 100%. The question is why (and what progress is actually being indicated here)?

I'm available for installation debugging on vsphere as this is a non-production cluster.

Thanks for your help.

@staebler
Contributor

This looks to be an occurrence of a bug that happens occasionally with the installation. The memory requirements look to be sufficient. Would you be willing to retry the installation?

@staebler
Contributor

@neuroserve
Author

I re-tried the installation, this time with the https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.1.3/openshift-install-linux-4.1.3.tar.gz installer. But obviously I shouldn't have:

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-24T06:39:59Z"
    generation: 1
    name: version
    resourceVersion: "17285"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: e335216a-964a-11e9-9551-005056acf7ce
  spec:
    channel: stable-4.1
    clusterID: 0045c01b-5a84-4e98-b55b-b8a6aa399c2a
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-24T06:40:29Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-24T06:45:44Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2019-06-24T06:40:29Z"
      message: 'Working towards 4.1.3: 82% complete'
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-24T06:40:29Z"
      message: 'Unable to retrieve available updates: currently installed version
        4.1.3 not found in the "stable-4.1" channel'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:f852f9d8c2e81a633e874e57a7d9bdd52588002a9b32fc037dba12b67cf1f8b0
      version: 4.1.3
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:f852f9d8c2e81a633e874e57a7d9bdd52588002a9b32fc037dba12b67cf1f8b0
      startedTime: "2019-06-24T06:40:29Z"
      state: Partial
      verified: false
      version: 4.1.3
    observedGeneration: 1
    versionHash: aGxoncMkq9U=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I'll try again now with 4.1.2.

@neuroserve
Author

OK. Re-trying with 4.1.2 gets to this stage:

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-24T07:05:19Z"
    generation: 1
    name: version
    resourceVersion: "10443"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 6d54be78-964e-11e9-a683-005056acf7ce
  spec:
    channel: stable-4.1
    clusterID: 507b020b-7b3f-4088-8f53-f117b48d19c0
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-24T07:05:53Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-24T07:09:23Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operator authentication has not yet reported success
        * Cluster operator cloud-credential has not yet reported success
        * Cluster operator cluster-autoscaler has not yet reported success
        * Cluster operator image-registry has not yet reported success
        * Cluster operator ingress has not yet reported success
        * Cluster operator kube-apiserver has not yet reported success
        * Cluster operator kube-controller-manager has not yet reported success
        * Cluster operator kube-scheduler has not yet reported success
        * Cluster operator machine-api has not yet reported success
        * Cluster operator machine-config has not yet reported success
        * Cluster operator marketplace has not yet reported success
        * Cluster operator monitoring has not yet reported success
        * Cluster operator network has not yet reported success
        * Cluster operator node-tuning has not yet reported success
        * Cluster operator openshift-apiserver has not yet reported success
        * Cluster operator openshift-controller-manager has not yet reported success
        * Cluster operator operator-lifecycle-manager has not yet reported success
        * Cluster operator service-ca has not yet reported success
        * Cluster operator service-catalog-apiserver has not yet reported success
        * Cluster operator service-catalog-controller-manager has not yet reported success
        * Cluster operator storage has not yet reported success
        * Could not update deployment "openshift-cluster-version/cluster-version-operator" (5 of 350)
        * Could not update deployment "openshift-dns-operator/dns-operator" (305 of 350)
        * Could not update oauthclient "console" (220 of 350): the server does not recognize this resource, check extension API servers
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (182 of 350): resource may have been deleted
        * Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (346 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (321 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (349 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-image-registry/image-registry" (327 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (337 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (340 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (343 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (330 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (333 of 350): the server does not recognize this resource, check extension API servers
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-06-24T07:05:53Z"
      message: 'Unable to apply 4.1.2: an unknown error has occurred'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-24T07:05:53Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      version: 4.1.2
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      startedTime: "2019-06-24T07:05:53Z"
      state: Partial
      verified: false
      version: 4.1.2
    observedGeneration: 1
    versionHash: CGRQCirWw8Y=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

I'll try and dig a little deeper. If you have any hints for me, feel free :-)

@neuroserve
Author

neuroserve commented Jun 24, 2019

Now this:

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-24T08:47:00Z"
    generation: 1
    name: version
    resourceVersion: "9883"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: a160e17c-965c-11e9-92f5-005056acf7ce
  spec:
    channel: fast
    clusterID: a094c84b-c7e9-4809-90d8-1ce87fd8bbd0
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-24T08:47:00Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-24T08:50:30Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operator authentication has not yet reported success
        * Cluster operator image-registry has not yet reported success
        * Cluster operator ingress has not yet reported success
        * Cluster operator kube-apiserver is still updating: missing version information for kube-apiserver
        * Cluster operator kube-controller-manager is still updating: missing version information for kube-controller-manager
        * Cluster operator kube-scheduler has not yet reported success
        * Cluster operator machine-config has not yet reported success
        * Cluster operator marketplace has not yet reported success
        * Cluster operator monitoring has not yet reported success
        * Cluster operator node-tuning has not yet reported success
        * Cluster operator openshift-apiserver has not yet reported success
        * Cluster operator openshift-controller-manager is still updating
        * Cluster operator service-catalog-apiserver has not yet reported success
        * Cluster operator service-catalog-controller-manager has not yet reported success
        * Cluster operator storage has not yet reported success
        * Could not update oauthclient "console" (220 of 350): the server does not recognize this resource, check extension API servers
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (182 of 350): resource may have been deleted
        * Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (346 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (321 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (349 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-image-registry/image-registry" (327 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (337 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (340 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (343 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (267 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (330 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (333 of 350): the server does not recognize this resource, check extension API servers
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-06-24T08:47:00Z"
      message: 'Unable to apply 4.1.2: an unknown error has occurred'
      message: |-
        Multiple errors are preventing progress:
        * Cluster operator authentication has not yet reported success
        * Cluster operator image-registry has not yet reported success
        * Cluster operator ingress has not yet reported success
        * Cluster operator kube-apiserver is still updating: missing version information for kube-apiserver
        * Cluster operator kube-controller-manager is still updating: missing version information for kube-controller-manager
        * Cluster operator kube-scheduler has not yet reported success
        * Cluster operator machine-config has not yet reported success
        * Cluster operator marketplace has not yet reported success
        * Cluster operator monitoring has not yet reported success
        * Cluster operator node-tuning has not yet reported success
        * Cluster operator openshift-apiserver has not yet reported success
        * Cluster operator openshift-controller-manager is still updating
        * Cluster operator service-catalog-apiserver has not yet reported success
        * Cluster operator service-catalog-controller-manager has not yet reported success
        * Cluster operator storage has not yet reported success
        * Could not update oauthclient "console" (220 of 350): the server does not recognize this resource, check extension API servers
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (182 of 350): resource may have been deleted
        * Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (346 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (321 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (349 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-image-registry/image-registry" (327 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (337 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (340 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (343 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (267 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (330 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (333 of 350): the server does not recognize this resource, check extension API servers
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-06-24T08:47:00Z"
      message: 'Unable to apply 4.1.2: an unknown error has occurred'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-24T08:47:00Z"
      message: 'Unable to retrieve available updates: currently installed version
        4.1.2 not found in the "fast" channel'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      version: 4.1.2
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      startedTime: "2019-06-24T08:47:00Z"
      state: Partial
      verified: false
      version: 4.1.2
    observedGeneration: 1
    versionHash: CGRQCirWw8Y=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

What does "Unable to retrieve available updates: currently installed version 4.1.2 not found in the "fast" channel" mean? Should I use the 4.1.0 installer instead? The link on the OpenShift Infrastructure Providers page refers to https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest/ - and that is 4.1.2 atm.

@neuroserve
Author

Not even 4.1.0 works:

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-24T09:35:29Z"
    generation: 1
    name: version
    resourceVersion: "9767"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: 67c8ec6b-9663-11e9-a435-005056acf7ce
  spec:
    channel: fast
    clusterID: b67c0aba-094d-4279-99ae-980d746b49e2
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates: null
    conditions:
    - lastTransitionTime: "2019-06-24T09:35:29Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-24T09:39:00Z"
      message: |-
        Multiple errors are preventing progress:
        * Cluster operator authentication has not yet reported success
        * Cluster operator image-registry has not yet reported success
        * Cluster operator ingress has not yet reported success
        * Cluster operator kube-apiserver is still updating: missing version information for kube-apiserver
        * Cluster operator kube-controller-manager is still updating: missing version information for kube-controller-manager
        * Cluster operator kube-scheduler has not yet reported success
        * Cluster operator machine-config has not yet reported success
        * Cluster operator marketplace has not yet reported success
        * Cluster operator monitoring has not yet reported success
        * Cluster operator node-tuning has not yet reported success
        * Cluster operator openshift-apiserver has not yet reported success
        * Cluster operator openshift-controller-manager is still updating
        * Cluster operator service-catalog-apiserver has not yet reported success
        * Cluster operator service-catalog-controller-manager has not yet reported success
        * Cluster operator storage has not yet reported success
        * Could not update oauthclient "console" (220 of 350): the server does not recognize this resource, check extension API servers
        * Could not update rolebinding "openshift/cluster-samples-operator-openshift-edit" (182 of 350): resource may have been deleted
        * Could not update servicemonitor "openshift-apiserver-operator/openshift-apiserver-operator" (346 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-authentication-operator/authentication-operator" (321 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-controller-manager-operator/openshift-controller-manager-operator" (349 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-image-registry/image-registry" (327 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-apiserver-operator/kube-apiserver-operator" (337 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-controller-manager-operator/kube-controller-manager-operator" (340 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-kube-scheduler-operator/kube-scheduler-operator" (343 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-operator-lifecycle-manager/olm-operator" (267 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator" (330 of 350): the server does not recognize this resource, check extension API servers
        * Could not update servicemonitor "openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator" (333 of 350): the server does not recognize this resource, check extension API servers
      reason: MultipleErrors
      status: "True"
      type: Failing
    - lastTransitionTime: "2019-06-24T09:35:29Z"
      message: 'Unable to apply 4.1.0: an unknown error has occurred'
      reason: MultipleErrors
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-24T09:35:29Z"
      message: 'Unable to retrieve available updates: currently installed version
        4.1.0 not found in the "fast" channel'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:b8307ac0f3ec4ac86c3f3b52846425205022da52c16f56ec31cbe428501001d6
      version: 4.1.0
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:b8307ac0f3ec4ac86c3f3b52846425205022da52c16f56ec31cbe428501001d6
      startedTime: "2019-06-24T09:35:29Z"
      state: Partial
      verified: false
      version: 4.1.0
    observedGeneration: 1
    versionHash: 7arisRJErYo=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Why is even 4.1.0 pulled from the "fast" channel? I'd have expected "channel: stable-4.1". Where is this channel selected?
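
For what it's worth, the channel is selected in the ClusterVersion object itself - the spec.channel field visible in the yaml above, which the installer lays down at bootstrap; install-config.yaml does not expose it. It can be changed after the fact with a merge patch - a sketch, reusing the kubeconfig path from earlier in this thread:

oc --config=/root/sgl-1/auth/kubeconfig patch clusterversion version --type merge --patch '{"spec":{"channel":"stable-4.1"}}'

The RetrievedUpdates condition should flip back to True once the installed version exists in the selected channel.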

@neuroserve
Author

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       True       63m
cloud-credential                     4.1.0     True        False         False      72m
cluster-autoscaler                   4.1.0     True        False         False      72m
console                              4.1.0     Unknown     True          False      46m
dns                                  4.1.0     True        False         False      16m
image-registry                                 False       False         True       50m
ingress                              unknown   False       True          False      52m
kube-apiserver                       4.1.0     True        False         False      65m
kube-controller-manager              4.1.0     True        True          False      65m
kube-scheduler                       4.1.0     True        False         False      49m
machine-api                          4.1.0     True        False         False      72m
machine-config                       4.1.0     True        False         False      66m
marketplace                          4.1.0     True        False         False      16m
monitoring                                     False       True          True       49m
network                              4.1.0     True        False         False      72m
node-tuning                          4.1.0     True        False         False      16m
openshift-apiserver                  4.1.0     False       False         False      84s
openshift-controller-manager         4.1.0     True        False         False      15m
openshift-samples                    4.1.0     True        False         False      16m
operator-lifecycle-manager           4.1.0     True        False         False      71m
operator-lifecycle-manager-catalog   4.1.0     True        False         False      71m
service-ca                           4.1.0     True        False         False      71m
service-catalog-apiserver            4.1.0     True        False         False      63m
service-catalog-controller-manager   4.1.0     True        False         False      63m
storage                              4.1.0     True        False         False      51m

@staebler
Contributor

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       True       63m
cloud-credential                     4.1.0     True        False         False      72m
cluster-autoscaler                   4.1.0     True        False         False      72m
console                              4.1.0     Unknown     True          False      46m
dns                                  4.1.0     True        False         False      16m
image-registry                                 False       False         True       50m
ingress                              unknown   False       True          False      52m
kube-apiserver                       4.1.0     True        False         False      65m
kube-controller-manager              4.1.0     True        True          False      65m
kube-scheduler                       4.1.0     True        False         False      49m
machine-api                          4.1.0     True        False         False      72m
machine-config                       4.1.0     True        False         False      66m
marketplace                          4.1.0     True        False         False      16m
monitoring                                     False       True          True       49m
network                              4.1.0     True        False         False      72m
node-tuning                          4.1.0     True        False         False      16m
openshift-apiserver                  4.1.0     False       False         False      84s
openshift-controller-manager         4.1.0     True        False         False      15m
openshift-samples                    4.1.0     True        False         False      16m
operator-lifecycle-manager           4.1.0     True        False         False      71m
operator-lifecycle-manager-catalog   4.1.0     True        False         False      71m
service-ca                           4.1.0     True        False         False      71m
service-catalog-apiserver            4.1.0     True        False         False      63m
service-catalog-controller-manager   4.1.0     True        False         False      63m
storage                              4.1.0     True        False         False      51m

Have you set a storage backend for the image registry? See https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/installing/installing-on-vsphere#installation-operators-config_installing-vsphere.

For the cluster operators that are not available, you can get the reason for the failure from the yaml for the operator. For example, oc get co openshift-apiserver -oyaml.
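
The same jsonpath trick used above for the clusterversion also works for pulling just the conditions out of a clusteroperator, for example:

oc --config=/root/sgl-1/auth/kubeconfig get co openshift-apiserver -o=jsonpath='{range .status.conditions[*]}{.type}{" "}{.status}{" "}{.message}{"\n"}{end}'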

@neuroserve
Author

neuroserve commented Jun 24, 2019

I haven't set up a storage backend because the cluster hasn't completed its setup yet. Should I proceed anyway? The prerequisite seems to be a "provisioned persistent volume (PV) with ReadWriteMany access mode, such as NFS." How do I provision such a volume (and why do I have to do it myself when I gave the vSphere user the required rights beforehand)?

"Verify you do not have a registry pod:"

NAME                                               READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-5fc86678cf-7wv5d   1/1     Running   0          3h59m

I already have one, obviously.

"Check the registry configuration:"

apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  creationTimestamp: "2019-06-24T09:58:22Z"
  finalizers:
  - imageregistry.operator.openshift.io/finalizer
  generation: 1
  name: cluster
  resourceVersion: "121535"
  selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/cluster
  uid: 9a02c38d-9666-11e9-ac41-005056ac20a2
spec:
  defaultRoute: false
  httpSecret: somesecret
  logging: 2
  managementState: Managed
  proxy:
    http: ""
    https: ""
    noProxy: ""
  readOnly: false
  replicas: 1
  requests:
    read:
      maxInQueue: 0
      maxRunning: 0
      maxWaitInQueue: 0s
    write:
      maxInQueue: 0
      maxRunning: 0
      maxWaitInQueue: 0s
  storage: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-24T09:58:22Z"
    message: The deployment does not exist
    reason: DeploymentNotFound
    status: "False"
    type: Available
  - lastTransitionTime: "2019-06-24T09:58:22Z"
    message: 'Unable to apply resources: storage backend not configured'
    reason: Error
    status: "False"
    type: Progressing
  - lastTransitionTime: "2019-06-24T09:58:22Z"
    message: storage backend not configured
    reason: StorageNotConfigured
    status: "True"
    type: Degraded
  - lastTransitionTime: "2019-06-24T09:58:22Z"
    status: "False"
    type: Removed
  observedGeneration: 1
  readyReplicas: 0
  storage: {}
  storageManaged: false

Should I add
storage:
  pvc:
    claim:
here?

For the other operators - the error messages are not too helpful:
openshift-apiserver:
    message: 'Available: apiservice/v1.security.openshift.io: not available: no response
      from https://10.128.0.41:8443: Get https://10.128.0.41:8443: dial tcp 10.128.0.41:8443:
      connect: connection refused'
    reason: AvailableAPIServiceNotAvailable
monitoring:
  - lastTransitionTime: "2019-06-24T14:05:29Z"
    message: 'Failed to rollout the stack. Error: running task Updating configuration
      sharing failed: failed to retrieve Prometheus host: getting Route object failed:
      the server is currently unable to handle the request (get routes.route.openshift.io
      prometheus-k8s)'

@staebler
Contributor

You need to set the storage backend for the image registry in order for the installation to complete. The image-registry operator will not become available until the storage backend has been configured.

For the production use case, you need to provision your own storage because the vSphere cloud provider does not support ReadWriteMany access for its storage.

If this is for non-production purposes, you can set the storage backend to emptyDir.
oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'
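
After that patch, the empty storage: {} stanza in the registry config above should become:

spec:
  storage:
    emptyDir: {}    # ephemeral - images are lost when the registry pod is rescheduled

That ephemerality is why emptyDir is only suggested for non-production use.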

@staebler
Contributor

For the errors in the other operators, do you have a wildcard DNS entry for the Ingress router pods? This is by default a *.apps entry. See https://access.redhat.com/documentation/en-us/openshift_container_platform/4.1/html/installing/installing-on-vsphere#installation-infrastructure-user-infra_installing-vsphere.
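
As an illustration (hypothetical domain and VIP), such a record looks like:

*.apps.mycluster.example.com.  IN  A  192.0.2.100   ; hypothetical ingress VIP

where the address is the VIP that load balances onto the routers running on the worker nodes.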

@neuroserve
Author

I have NFS available in that environment (I guess there's a NetApp somewhere). How/where would I configure the mountpoint for that NFS volume?
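
For reference, a minimal sketch of how this usually works (hypothetical server, export path, and size - not something confirmed in this thread): the mountpoint is not configured on the nodes at all, but in a PersistentVolume object that the cluster mounts on demand:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: image-registry-pv
spec:
  capacity:
    storage: 100Gi              # hypothetical size
  accessModes:
  - ReadWriteMany               # the access mode the registry prerequisite asks for
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: netapp.example.com  # hypothetical NFS server
    path: /exports/ocp-registry # hypothetical export path

A claim bound to such a PV can then be referenced from the registry config via the spec.storage.pvc.claim field quoted above.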

@neuroserve
Author

@neuroserve
Author

Regarding the DNS wildcard record: yes - exists and resolves to a vip that balances on the worker nodes.

@staebler
Contributor

Regarding the DNS wildcard record: yes - exists and resolves to a vip that balances on the worker nodes.

OK. From the co list, it looks like the ingress operator is having problems. What is the output of oc get co ingress -oyaml?

@neuroserve
Author

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-06-24T09:56:53Z"
  generation: 1
  name: ingress
  resourceVersion: "110244"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/ingress
  uid: 650d3133-9666-11e9-ac41-005056ac20a2
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-24T09:56:54Z"
    message: operand namespace exists
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-06-24T09:56:54Z"
    message: |-
      Not all ingress controllers are available.
      Moving to release version "4.1.0".
      Moving to ingress-controller image version "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7afd1b6aace6db643532680ca61761cf66ded116f8f673ec89121dbd424b2a15".
    reason: Reconciling
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-06-24T09:56:54Z"
    message: Not all ingress controllers are available.
    reason: IngressUnavailable
    status: "False"
    type: Available
  extension: null
  relatedObjects:
  - group: ""
    name: openshift-ingress-operator
    resource: namespaces
  - group: ""
    name: openshift-ingress
    resource: namespaces
  versions:
  - name: operator
    version: unknown
  - name: ingress-controller
    version: unknown

@staebler
Contributor

What about oc get nodes?

@neuroserve
Author

NAME                    STATUS     ROLES    AGE   VERSION
localhost.localdomain   NotReady   master   44h   v1.13.4+cb455d664

@staebler
Contributor

Do you have worker VMs that are not getting added as nodes? Or do you not have any worker VMs? A handful of operators will not function without worker nodes.

@staebler
Contributor

staebler commented Jun 26, 2019

Also, it looks like your machines do not have hostnames that are resolvable by the other machines. They appear to be using the default localhost.localdomain.

https://docs.openshift.com/container-platform/4.1/installing/installing_vsphere/installing-vsphere.html#installation-network-user-infra_installing-vsphere

You must configure the network connectivity between machines to allow cluster components to communicate. Each machine must be able to resolve the host names of all other machines in the cluster.

The hostname is used as the node name for the machine. If all of the machines have the same hostname of localhost.localdomain, then there will be conflicts in creating the nodes.

@neuroserve
Author

Do you have worker VMs that are not getting added as nodes? Or do you not have any worker VMs? A handful of operators will not function without worker nodes.

I have worker nodes, and a kubelet gets deployed on them.

@neuroserve
Author

Also, it looks like your machines do not have hostnames that are resolvable by the other machines. They appear to be using the default localhost.localdomain.

https://docs.openshift.com/container-platform/4.1/installing/installing_vsphere/installing-vsphere.html#installation-network-user-infra_installing-vsphere

You must configure the network connectivity between machines to allow cluster components to communicate. Each machine must be able to resolve the host names of all other machines in the cluster.

The hostname is used as the node name for the machine. If all of the machines have the same hostname of localhost.localdomain, then there will be conflicts in creating the nodes.

We have changed that now. Is there a recommendation for the names? Does it have to be an FQDN or is "worker1" sufficient?

@staebler
Contributor

staebler commented Jun 26, 2019

You can use any name you like, so long as the other machines can resolve the name to an IP address. The node name has a limit of 64 characters. If your FQDN fits within that limit, then you can use it. If you use a shortname, then you can configure your machines with a DNS search domain for your domain.

Personally, I use shortnames of control-plane-0, control-plane-1, compute-0, etc.
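
As an illustration (hypothetical zone and addresses), forward records along these lines make every node name resolvable cluster-wide:

control-plane-0.mycluster.example.com.  IN  A  192.0.2.10   ; hypothetical addresses
control-plane-1.mycluster.example.com.  IN  A  192.0.2.11
control-plane-2.mycluster.example.com.  IN  A  192.0.2.12
compute-0.mycluster.example.com.        IN  A  192.0.2.20

combined, if shortnames are used, with mycluster.example.com as the DNS search domain on the machines.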

@neuroserve
Author

That was a big part of the problem.

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       True       5m23s
cloud-credential                     4.1.0     True        False         False      12m
cluster-autoscaler                   4.1.0     True        False         False      12m
console                              4.1.0     False       True          False      5m4s
dns                                  4.1.0     True        False         False      12m
image-registry                       4.1.0     True        False         False      2m21s
ingress                              4.1.0     True        False         False      4m40s
kube-apiserver                       4.1.0     True        True          False      11m
kube-controller-manager              4.1.0     True        False         False      10m
kube-scheduler                       4.1.0     True        False         False      10m
machine-api                          4.1.0     True        False         False      12m
machine-config                       4.1.0     True        False         False      12m
marketplace                          4.1.0     True        False         False      4m48s
monitoring                           4.1.0     True        False         False      3m44s
network                              4.1.0     True        False         False      13m
node-tuning                          4.1.0     True        False         False      5m18s
openshift-apiserver                  4.1.0     True        False         False      8m6s
openshift-controller-manager         4.1.0     True        False         False      11m
openshift-samples                    4.1.0     True        False         False      6m23s
operator-lifecycle-manager           4.1.0     True        True          False      11m
operator-lifecycle-manager-catalog   4.1.0     True        True          False      11m
service-ca                           4.1.0     True        False         False      12m
service-catalog-apiserver            4.1.0     True        False         False      5m25s
service-catalog-controller-manager   4.1.0     True        False         False      5m23s
storage                              4.1.0     True        False         False      5m28s

@neuroserve
Author

authentication gives this error now:

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-06-26T13:47:56Z"
  generation: 1
  name: authentication
  resourceVersion: "11399"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/authentication
  uid: 0094a0b6-9819-11e9-80d6-005056ac79d6
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-26T13:48:59Z"
    message: 'Degraded: error checking current version: unable to check route health:
      failed to GET route: tls: oversized record received with length 20527'
    reason: DegradedOperatorSyncLoopError
    status: "True"
    type: Degraded
  - lastTransitionTime: "2019-06-26T13:47:55Z"
    reason: NoData
    status: Unknown
    type: Progressing
  - lastTransitionTime: "2019-06-26T13:47:55Z"
    reason: NoData
    status: Unknown
    type: Available
  - lastTransitionTime: "2019-06-26T13:47:55Z"
    reason: NoData
    status: Unknown
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: authentications
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: config.openshift.io
    name: cluster
    resource: oauths
  - group: ""
    name: openshift-config
    resource: namespaces
  - group: ""
    name: openshift-config-managed
    resource: namespaces
  - group: ""
    name: openshift-authentication
    resource: namespaces
  - group: ""
    name: authentication-operator
    resource: namespaces
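
As an aside, "tls: oversized record received" usually means the client expected a TLS handshake but got a plaintext response - typically a load balancer or proxy forwarding the request to the wrong port. One way to probe the OAuth route by hand (hypothetical cluster domain):

curl -kIv https://oauth-openshift.apps.mycluster.example.com/healthz   # hypothetical *.apps domain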

@neuroserve
Author

console gives this:

apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  creationTimestamp: "2019-06-26T13:45:54Z"
  generation: 1
  name: console
  resourceVersion: "10689"
  selfLink: /apis/config.openshift.io/v1/clusteroperators/console
  uid: b7e70ec9-9818-11e9-b886-005056ac7d5e
spec: {}
status:
  conditions:
  - lastTransitionTime: "2019-06-26T13:45:54Z"
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2019-06-26T13:45:54Z"
    message: 'Progressing: Moving to version 4.1.0'
    reason: ProgressingSyncLoopProgressing
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-06-26T13:48:14Z"
    message: 'Available: 0 pods available for console deployment'
    reason: AvailableNoPodsAvailable
    status: "False"
    type: Available
  - lastTransitionTime: "2019-06-26T13:45:54Z"
    reason: AsExpected
    status: "True"
    type: Upgradeable
  extension: null
  relatedObjects:
  - group: operator.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: consoles
  - group: config.openshift.io
    name: cluster
    resource: infrastructures
  - group: oauth.openshift.io
    name: console
    resource: oauthclients
  - group: ""
    name: openshift-console-operator
    resource: namespaces
  - group: ""
    name: openshift-console
    resource: namespaces
  - group: ""
    name: console-public
    namespace: openshift-config-managed
    resource: configmaps
  versions:
  - name: operator
    version: 4.1.0

@neuroserve
Author

and obviously:

apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2019-06-26T13:38:45Z"
    generation: 1
    name: version
    resourceVersion: "15030"
    selfLink: /apis/config.openshift.io/v1/clusterversions/version
    uid: b86e03eb-9817-11e9-bccb-005056acf7ce
  spec:
    channel: stable-4.1
    clusterID: ea408e02-3d6d-46d3-b30e-f7099b69e6b6
    upstream: https://api.openshift.com/api/upgrades_info/v1/graph
  status:
    availableUpdates:
    - force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:9c5f0df8b192a0d7b46cd5f6a4da2289c155fd5302dec7954f8f06c878160b8b
      version: 4.1.2
    conditions:
    - lastTransitionTime: "2019-06-26T13:39:19Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2019-06-26T13:48:06Z"
      status: "False"
      type: Failing
    - lastTransitionTime: "2019-06-26T13:39:19Z"
      message: 'Working towards 4.1.0: 99% complete, waiting on authentication, console'
      reason: ClusterOperatorsNotAvailable
      status: "True"
      type: Progressing
    - lastTransitionTime: "2019-06-26T13:39:19Z"
      status: "True"
      type: RetrievedUpdates
    desired:
      force: false
      image: quay.io/openshift-release-dev/ocp-release@sha256:b8307ac0f3ec4ac86c3f3b52846425205022da52c16f56ec31cbe428501001d6
      version: 4.1.0
    history:
    - completionTime: null
      image: quay.io/openshift-release-dev/ocp-release@sha256:b8307ac0f3ec4ac86c3f3b52846425205022da52c16f56ec31cbe428501001d6
      startedTime: "2019-06-26T13:39:19Z"
      state: Partial
      verified: false
      version: 4.1.0
    observedGeneration: 1
    versionHash: 7arisRJErYo=
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@staebler
Contributor

The kube-apiserver operator is still progressing. Things may settle down some after it completes its progression. If that operator does not progress fully, please add the yaml for that operator.

@ams0

ams0 commented Jun 26, 2019

I seem to have a similar issue on Azure (#1817). Simply put, the Prometheus operator is not installing the CRDs (servicemonitors.monitoring.coreos.com):

$> oc --config=${INSTALL_DIR}/auth/kubeconfig describe clusteroperator monitoring
Name:         monitoring
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  config.openshift.io/v1
Kind:         ClusterOperator
Metadata:
  Creation Timestamp:  2019-06-26T22:38:51Z
  Generation:          1
  Resource Version:    41535
  Self Link:           /apis/config.openshift.io/v1/clusteroperators/monitoring
  UID:                 2b81f371-9863-11e9-80ca-000d3a2756ce
Spec:
Status:
  Conditions:
    Last Transition Time:  2019-06-26T22:49:07Z
    Message:               Failed to rollout the stack. Error: running task Updating Cluster Monitoring Operator failed: reconciling Cluster Monitoring Operator ServiceMonitor failed: creating ServiceMonitor object failed: the server could not find the requested resource (post servicemonitors.monitoring.coreos.com)
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-06-26T22:44:01Z
    Status:                False
    Type:                  Available
    Last Transition Time:  2019-06-26T23:40:06Z
    Message:               Rolling out the stack.
    Status:                True
    Type:                  Progressing
  Extension:               <nil>
Events:                    <none>

Can you point me to the place in the code where this gets executed?
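
As a first check, it may help to confirm whether the CRD from the error message exists at all (same kubeconfig variable as above):

oc --config=${INSTALL_DIR}/auth/kubeconfig get crd servicemonitors.monitoring.coreos.com

If that comes back NotFound, the ServiceMonitor objects cannot be created no matter what the monitoring operator does.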

@neuroserve
Author

neuroserve commented Jun 27, 2019

The kube-apiserver operator is still progressing. Things may settle down some after it completes its progression. If that operator does not progress fully, please add the yaml for that operator.

Everything (except console and authentication) is AVAILABLE now:

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       True       17h
cloud-credential                     4.1.0     True        False         False      17h
cluster-autoscaler                   4.1.0     True        False         False      17h
console                              4.1.0     False       True          False      17h
dns                                  4.1.0     True        False         False      17h
image-registry                       4.1.0     True        False         False      17h
ingress                              4.1.0     True        False         False      17h
kube-apiserver                       4.1.0     True        False         False      17h
kube-controller-manager              4.1.0     True        False         False      17h
kube-scheduler                       4.1.0     True        False         False      17h
machine-api                          4.1.0     True        False         False      17h
machine-config                       4.1.0     True        False         False      17h
marketplace                          4.1.0     True        False         False      17h
monitoring                           4.1.0     True        False         False      17h
network                              4.1.0     True        False         False      17h
node-tuning                          4.1.0     True        False         False      17h
openshift-apiserver                  4.1.0     True        False         False      17h
openshift-controller-manager         4.1.0     True        False         False      17h
openshift-samples                    4.1.0     True        False         False      17h
operator-lifecycle-manager           4.1.0     True        False         False      17h
operator-lifecycle-manager-catalog   4.1.0     True        False         False      17h
service-ca                           4.1.0     True        False         False      17h
service-catalog-apiserver            4.1.0     True        False         False      17h
service-catalog-controller-manager   4.1.0     True        False         False      17h
storage                              4.1.0     True        False         False      17h

@neuroserve
Author

OK - the console problem turned out to be a load balancer configuration issue - current status:

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                 Unknown     Unknown       True       11m
cloud-credential                     4.1.3     True        False         False      15m
cluster-autoscaler                   4.1.3     True        False         False      15m
console                              4.1.3     True        False         False      89s
dns                                  4.1.3     True        False         False      15m
image-registry                       4.1.3     True        False         False      10m
ingress                              4.1.3     True        False         False      10m
kube-apiserver                       4.1.3     True        False         False      13m
kube-controller-manager              4.1.3     True        False         False      12m
kube-scheduler                       4.1.3     True        False         False      13m
machine-api                          4.1.3     True        False         False      15m
machine-config                       4.1.3     True        False         False      14m
marketplace                          4.1.3     True        False         False      10m
monitoring                           4.1.3     True        False         False      9m15s
network                              4.1.3     True        False         False      15m
node-tuning                          4.1.3     True        False         False      11m
openshift-apiserver                  4.1.3     True        False         False      12m
openshift-controller-manager         4.1.3     True        False         False      14m
openshift-samples                    4.1.3     True        False         False      10m
operator-lifecycle-manager           4.1.3     True        False         False      14m
operator-lifecycle-manager-catalog   4.1.3     True        False         False      14m
service-ca                           4.1.3     True        False         False      15m
service-catalog-apiserver            4.1.3     True        False         False      11m
service-catalog-controller-manager   4.1.3     True        False         False      11m
storage                              4.1.3     True        False         False      11m

@neuroserve
Author

That fixed authentication as well:

NAME                                 VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                       4.1.3     True        False         False      2m25s
cloud-credential                     4.1.3     True        False         False      18m
cluster-autoscaler                   4.1.3     True        False         False      18m
console                              4.1.3     True        False         False      4m15s
dns                                  4.1.3     True        False         False      18m
image-registry                       4.1.3     True        False         False      12m
ingress                              4.1.3     True        False         False      13m
kube-apiserver                       4.1.3     True        False         False      15m
kube-controller-manager              4.1.3     True        False         False      15m
kube-scheduler                       4.1.3     True        False         False      15m
machine-api                          4.1.3     True        False         False      18m
machine-config                       4.1.3     True        False         False      17m
marketplace                          4.1.3     True        False         False      13m
monitoring                           4.1.3     True        False         False      12m
network                              4.1.3     True        False         False      18m
node-tuning                          4.1.3     True        False         False      14m
openshift-apiserver                  4.1.3     True        False         False      14m
openshift-controller-manager         4.1.3     True        False         False      17m
openshift-samples                    4.1.3     True        False         False      13m
operator-lifecycle-manager           4.1.3     True        False         False      17m
operator-lifecycle-manager-catalog   4.1.3     True        False         False      17m
service-ca                           4.1.3     True        False         False      18m
service-catalog-apiserver            4.1.3     True        False         False      14m
service-catalog-controller-manager   4.1.3     True        False         False      14m
storage                              4.1.3     True        False         False      14m

Thank you for your relentless support :-)

@staebler
Contributor

Whew! I'm glad that it all worked out for you in the end. Sorry that it wasn't a smoother journey. I will take some of the pitfalls you ran into as impetus to improve those areas of the docs.

@DanyC97
Contributor

DanyC97 commented Jun 28, 2019

because the vSphere cloud provider does not support ReadWriteMany access for its storage

Thanks for the info!! I'd expect this type of info to be in the docs, as I bet folks will enable it and then... surprise, surprise.

@neuroserve
Author

because the vSphere cloud provider does not support ReadWriteMany access for its storage

thanks for the info !! this type of info i'd expect to be in the docs as i bet folks will enable it and then .. surprise surprise.

It's in the docs: https://docs.openshift.com/container-platform/4.1/storage/understanding-persistent-storage.html#pv-access-modes_understanding-persistent-storage

@neuroserve
Author

I'd like to provide an NFS share for the registry - but I have to use a second network interface for this. Will that work with the provided ignition file, or do I have to modify it? If so, would that be along the lines of https://coreos.com/os/docs/latest/network-config-with-networkd.html? Can this be used with RHCOS?

@neuroserve
Author

I've transferred this to another issue (#1943). This issue can be closed.
