USHIFT-168: MicroShift cannot be installed on CentOS Stream 8 (hardcoded rhel volume group) #880


Description


What happened:

  • MicroShift expects a rhel volume group to be available, but CentOS Stream creates a cs volume group
  • the router-default-6795657dbc-nnqmv pod in the openshift-ingress namespace fails to start, for (to me) unexplained reasons

What you expected to happen:

🦄 (that is, a clean install with every pod Running)

How to reproduce it (as minimally and precisely as possible):

CRI-O needs to be installed first (covered at https://microshift.io/docs/getting-started/):

sudo dnf module enable -y cri-o:1.21
sudo dnf install -y cri-o cri-tools
sudo systemctl enable crio --now
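
A quick sanity check that the runtime is actually up before moving on (optional; crictl comes from the cri-tools package installed above):

sudo systemctl is-active crio
sudo crictl version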

The OpenStack repos also need to be enabled so that the openvswitch2.16 package required by microshift-networking is available (this is not mentioned in the RHEL 8 instructions, or really anywhere):

sudo dnf config-manager --set-enabled powertools
sudo dnf install -y epel-release centos-release-openstack-yoga
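
With the repos enabled, the dependency should now be resolvable (an optional check before installing the RPMs):

sudo dnf info openvswitch2.16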

Now I can install the MicroShift RPMs:

sudo dnf install -y \
    ./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
    ./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-networking-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
    ./packaging/rpm/_rpmbuild/RPMS/noarch/microshift-selinux-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.noarch.rpm
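
To confirm what landed:

rpm -qa | grep -i microshift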

Following the install instructions further:

sudo firewall-cmd --zone=trusted --add-source=10.42.0.0/16 --permanent
sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5353/udp --permanent
sudo firewall-cmd --reload
sudo systemctl enable microshift --now
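
Watching the service come up is useful here, since pod problems tend to show up in its journal first (optional):

sudo systemctl status microshift --no-pager
sudo journalctl -u microshift -f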

Copy the pull secret (https://github.com/openshift/microshift/blob/main/docs/devenv_rhel8.md)
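
A sketch of that step; the destination path below is hypothetical, the linked doc has the location this build actually expects:

# hypothetical destination; see docs/devenv_rhel8.md for the real path
sudo cp pull-secret.txt /etc/crio/openshift-pull-secret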

Then fetch the oc/kubectl clients and set up the kubeconfig:

curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
sudo tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl

mkdir ~/.kube
sudo cat /var/lib/microshift/resources/kubeadmin/kubeconfig > ~/.kube/config
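
At this point the API server should answer:

/usr/local/bin/oc get nodes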

I end up with all pods up, with the exception of topolvm-node-26t6g (and, as I later noticed, router-default-6795657dbc-nnqmv):

/usr/local/bin/oc get pods --all-namespaces
NAMESPACE                  NAME                                  READY   STATUS             RESTARTS        AGE
openshift-dns              dns-default-5tsdh                     2/2     Running            0               15m
openshift-dns              node-resolver-cxb2v                   1/1     Running            0               15m
openshift-ingress          router-default-6795657dbc-nnqmv       0/1     Running            2 (80s ago)     15m
openshift-ovn-kubernetes   ovnkube-master-zcbsl                  4/4     Running            0               15m
openshift-ovn-kubernetes   ovnkube-node-h22g5                    1/1     Running            0               15m
openshift-service-ca       service-ca-76649665b5-thmh8           1/1     Running            0               15m
openshift-storage          topolvm-controller-8479455f95-pvqls   4/4     Running            0               15m
openshift-storage          topolvm-node-26t6g                    2/4     CrashLoopBackOff   10 (2m6s ago)   14m

and these errors in the log of topolvm-node-26t6g:

/usr/local/bin/oc logs topolvm-node-26t6g -n openshift-storage
Defaulted container "lvmd" out of: lvmd, topolvm-node, csi-registrar, liveness-probe, file-checker (init)
2022-08-23T09:18:49.459028Z topolvm-node-26t6g lvmd info: "configuration file loaded: " device_classes="[0xc00059c9b0]" file_name="/etc/topolvm/lvmd.yaml" socket_name="/run/lvmd/lvmd.sock"
2022-08-23T09:18:49.498429Z topolvm-node-26t6g lvmd error: "Volume group not found:" volume_group="rhel"
Error: not found
not found
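
The name lvmd looks for comes from the config file named in the first log line, so it can be checked directly (path taken from that log line; inside the pod it may be mounted from a ConfigMap):

sudo cat /etc/topolvm/lvmd.yaml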

It looks like my volume group is named cs, not rhel:

vgdisplay 
  --- Volume group ---
  VG Name               cs
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <237.46 GiB
  PE Size               4.00 MiB
  Total PE              60789
  Alloc PE / Size       60789 / <237.46 GiB
  Free  PE / Size       0 / 0   
  VG UUID               RQ1RHa-33vc-XEFs-lY9c-LZj5-YeAz-W91cfz

So I renamed the VG, following https://forums.centos.org/viewtopic.php?t=62406.
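
Roughly, the rename boils down to the following (a sketch of what the linked post walks through, not verified on this exact box; double-check /etc/fstab and the grub config before rebooting, since a missed reference can leave the system unbootable):

sudo vgrename cs rhel
# update every reference to the old VG name before rebooting
sudo sed -i 's|/dev/mapper/cs-|/dev/mapper/rhel-|g; s|/dev/cs/|/dev/rhel/|g' /etc/fstab
sudo sed -i 's|rd.lvm.lv=cs/|rd.lvm.lv=rhel/|g; s|/dev/mapper/cs-|/dev/mapper/rhel-|g' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo dracut -f    # regenerate the initramfs so early boot finds the renamed VG

A less invasive fix would presumably be to point lvmd at the existing group instead of renaming it, e.g. (assuming the file is read from the host as-is and follows upstream lvmd's device-class schema; in this build it may come from a ConfigMap, in which case that object is what needs editing):

sudo tee /etc/topolvm/lvmd.yaml <<'EOF'
socket-name: /run/lvmd/lvmd.sock
device-classes:
  - name: default
    volume-group: cs    # match the VG that actually exists
    default: true
    spare-gb: 10
EOF
sudo systemctl restart microshift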

Now I noticed the CrashLoopBackOff in the router-default-6795657dbc-nnqmv pod:

/usr/local/bin/oc logs router-default-6795657dbc-nnqmv -n openshift-ingress
[-]has-synced failed: Router not synced
W0823 09:26:50.017716       1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.017746       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:26:50.191462       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:26:50.357748       1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.357872       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host


[...]


I0823 09:27:25.191607       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:26.192461       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:27.192082       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:28.192125       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:29.192098       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:30.190609       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
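
The repeated dial tcp 10.43.0.1:443: connect: no route to host points at host networking rather than the router itself (the pod runs with host networking, per the describe output below). A few guesses at narrowing it down; the extra trusted-zone rule just mirrors the 10.42.0.0/16 rule from the install steps and is untested here:

ip route get 10.43.0.1
sudo firewall-cmd --get-active-zones
sudo firewall-cmd --zone=trusted --list-sources

# guess: the service network may need the same treatment as the pod network
sudo firewall-cmd --zone=trusted --add-source=10.43.0.0/16 --permanent
sudo firewall-cmd --reload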

Anything else we need to know?:

Environment:

  • Microshift version (use microshift version): MicroShift Version: 4.10.0-0.microshift-2022-08-08-151458-61-g2d7df45a Base OCP Version: 4.10.18
  • Hardware configuration:
  • OS (e.g: cat /etc/os-release):

NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

  • Kernel (e.g. uname -a): Linux localhost.jiridanek.net 4.18.0-373.el8.x86_64 #1 SMP Tue Mar 22 15:11:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Others:

Relevant Logs

/usr/local/bin/oc describe pod router-default-6795657dbc-nnqmv -n openshift-ingress
Name:                 router-default-6795657dbc-nnqmv
Namespace:            openshift-ingress
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 localhost.jiridanek.github.beta.tailscale.net/10.40.2.205
Start Time:           Tue, 23 Aug 2022 11:03:29 +0200
Labels:               ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
                      pod-template-hash=6795657dbc
Annotations:          openshift.io/scc: hostnetwork
                      target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
                      unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status:               Running
IP:                   10.40.2.205
IPs:
  IP:           10.40.2.205
Controlled By:  ReplicaSet/router-default-6795657dbc
Containers:
  router:
    Container ID:  cri-o://4eb7dd3852f97c2178545721b365d4783f1d5d13c4e61b960b590cd8e9982215
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
    Ports:         80/TCP, 443/TCP, 1936/TCP
    Host Ports:    80/TCP, 443/TCP, 1936/TCP
    State:         Waiting
      Reason:      CrashLoopBackOff
    Last State:    Terminated
      Reason:      Error
      Message:      reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:37.910504       1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:34:38.191606       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:39.192003       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:40.192476       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:41.192337       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:42.192030       1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:34:45.886374       1 reflector.go:324] github.com/openshift/router/pkg/router/template/service_lookup.go:33: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:45.886502       1 reflector.go:138] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host

      Exit Code:    137
      Started:      Tue, 23 Aug 2022 11:32:41 +0200
      Finished:     Tue, 23 Aug 2022 11:34:52 +0200
    Ready:          False
    Restart Count:  8
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
    Startup:    http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
    Environment:
      STATS_PORT:                                1936
      ROUTER_SERVICE_NAMESPACE:                  openshift-ingress
      DEFAULT_CERTIFICATE_DIR:                   /etc/pki/tls/private
      DEFAULT_DESTINATION_CA_PATH:               /var/run/configmaps/service-ca/service-ca.crt
      ROUTER_CIPHERS:                            TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
      ROUTER_DISABLE_HTTP2:                      true
      ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK:  false
      ROUTER_METRICS_TLS_CERT_FILE:              /etc/pki/tls/private/tls.crt
      ROUTER_METRICS_TLS_KEY_FILE:               /etc/pki/tls/private/tls.key
      ROUTER_METRICS_TYPE:                       haproxy
      ROUTER_SERVICE_NAME:                       default
      ROUTER_SET_FORWARDED_HEADERS:              append
      ROUTER_THREADS:                            4
      SSL_MIN_VERSION:                           TLSv1.2
    Mounts:
      /etc/pki/tls/private from default-certificate (ro)
      /var/run/configmaps/service-ca from service-ca-bundle (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5w76x (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-certificate:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  router-certs-default
    Optional:    false
  service-ca-bundle:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      service-ca-bundle
    Optional:  false
  kube-api-access-5w76x:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  33m                    default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         33m                    default-scheduler  Successfully assigned openshift-ingress/router-default-6795657dbc-nnqmv to localhost.jiridanek.github.beta.tailscale.net
  Warning  FailedMount       28m (x2 over 30m)      kubelet            Unable to attach or mount volumes: unmounted volumes=[default-certificate service-ca-bundle], unattached volumes=[default-certificate service-ca-bundle kube-api-access-5w76x]: timed out waiting for the condition
  Warning  FailedMount       26m (x11 over 33m)     kubelet            MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
  Warning  FailedMount       26m (x11 over 33m)     kubelet            MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
  Warning  FailedMount       26m                    kubelet            Unable to attach or mount volumes: unmounted volumes=[service-ca-bundle default-certificate], unattached volumes=[service-ca-bundle kube-api-access-5w76x default-certificate]: timed out waiting for the condition
  Warning  FailedMount       24m (x5 over 24m)      kubelet            MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
  Warning  FailedMount       24m (x5 over 24m)      kubelet            MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
  Normal   Pulling           24m                    kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b"
  Normal   Pulled            24m                    kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b" in 4.824691275s
  Normal   Created           24m                    kubelet            Created container router
  Normal   Started           24m                    kubelet            Started container router
  Warning  DNSConfigForming  24m (x5 over 24m)      kubelet            Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openshift-ingress.svc.cluster.local svc.cluster.local cluster.local meerkat-justice.ts.net jiridanek.github.beta.tailscale.net brq.redhat.com
  Warning  Unhealthy         24m (x3 over 24m)      kubelet            Startup probe failed: HTTP probe failed with statuscode: 500
  Warning  ProbeError        4m24s (x935 over 24m)  kubelet            Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed
