What happened:
- MicroShift expects to have a rhel volume group available; CentOS Stream creates a cs volume group
- the router-default-6795657dbc-nnqmv pod in the openshift-ingress namespace fails to start, for (to me) unexplained reasons
What you expected to happen:
🦄
How to reproduce it (as minimally and precisely as possible):
Need to install cri-o (covered at https://microshift.io/docs/getting-started/):
sudo dnf module enable -y cri-o:1.21
sudo dnf install -y cri-o cri-tools
sudo systemctl enable crio --now
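To sanity-check the runtime before going further (crictl comes with cri-tools; the socket path below is the usual cri-o default, my assumption rather than something from the docs):

sudo systemctl is-active crio
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock version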
Need to enable the OpenStack repos to make the openvswitch2.16 package, required by microshift-networking, available (not mentioned in the RHEL 8 instructions, or really anywhere):
sudo dnf config-manager --set-enabled powertools
sudo dnf install -y epel-release centos-release-openstack-yoga
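A quick way to confirm the package is now resolvable before touching MicroShift itself:

dnf info openvswitch2.16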
Now I can install the MicroShift RPMs:
sudo dnf install -y \
./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
./packaging/rpm/_rpmbuild/RPMS/x86_64/microshift-networking-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.x86_64.rpm \
./packaging/rpm/_rpmbuild/RPMS/noarch/microshift-selinux-4.10.0-2022_08_08_151458_61_g2d7df45a.el8.noarch.rpm
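(verifying what landed:)

rpm -qa 'microshift*'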
Following the install instructions further:
sudo firewall-cmd --zone=trusted --add-source=10.42.0.0/16 --permanent
sudo firewall-cmd --zone=public --add-port=80/tcp --permanent
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5353/udp --permanent
sudo firewall-cmd --reload
sudo systemctl enable microshift --now
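To double-check what actually got applied (nothing MicroShift-specific here, just standard firewalld/systemd queries):

sudo firewall-cmd --list-all --zone=public
sudo firewall-cmd --list-sources --zone=trusted
systemctl status microshift --no-pager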
Copy the pull secret (https://github.com/openshift/microshift/blob/main/docs/devenv_rhel8.md)
Do the other things: install the oc client and set up the kubeconfig
curl -O https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/ocp/stable/openshift-client-linux.tar.gz
sudo tar -xf openshift-client-linux.tar.gz -C /usr/local/bin oc kubectl
mkdir ~/.kube
sudo cat /var/lib/microshift/resources/kubeadmin/kubeconfig > ~/.kube/config
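Quick sanity check that the client can talk to the cluster:

/usr/local/bin/oc get nodes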
And I end up with all pods up, with the exception of topolvm-node-26t6g (and, as I later noticed, router-default-6795657dbc-nnqmv):
/usr/local/bin/oc get pods --all-namespaces
NAMESPACE                  NAME                                  READY   STATUS             RESTARTS        AGE
openshift-dns              dns-default-5tsdh                     2/2     Running            0               15m
openshift-dns              node-resolver-cxb2v                   1/1     Running            0               15m
openshift-ingress          router-default-6795657dbc-nnqmv       0/1     Running            2 (80s ago)     15m
openshift-ovn-kubernetes   ovnkube-master-zcbsl                  4/4     Running            0               15m
openshift-ovn-kubernetes   ovnkube-node-h22g5                    1/1     Running            0               15m
openshift-service-ca       service-ca-76649665b5-thmh8           1/1     Running            0               15m
openshift-storage          topolvm-controller-8479455f95-pvqls   4/4     Running            0               15m
openshift-storage          topolvm-node-26t6g                    2/4     CrashLoopBackOff   10 (2m6s ago)   14m
and these errors in the log of topolvm-node-26t6g:
/usr/local/bin/oc logs topolvm-node-26t6g -n openshift-storage
Defaulted container "lvmd" out of: lvmd, topolvm-node, csi-registrar, liveness-probe, file-checker (init)
2022-08-23T09:18:49.459028Z topolvm-node-26t6g lvmd info: "configuration file loaded: " device_classes="[0xc00059c9b0]" file_name="/etc/topolvm/lvmd.yaml" socket_name="/run/lvmd/lvmd.sock"
2022-08-23T09:18:49.498429Z topolvm-node-26t6g lvmd error: "Volume group not found:" volume_group="rhel"
Error: not found
not found
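The expected VG name comes from /etc/topolvm/lvmd.yaml, the file named in the log above. I'd guess it looks roughly like the upstream topolvm lvmd config (the keys below are my assumption from the topolvm docs, not checked against what MicroShift ships):

cat /etc/topolvm/lvmd.yaml
socket-name: /run/lvmd/lvmd.sock
device-classes:
  - name: ssd
    volume-group: rhel
    default: true
    spare-gb: 10

If that's right, pointing volume-group at cs would presumably be an alternative to renaming the VG.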
Looks like my volume group is named cs and not rhel:
vgdisplay
  --- Volume group ---
  VG Name               cs
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               <237.46 GiB
  PE Size               4.00 MiB
  Total PE              60789
  Alloc PE / Size       60789 / <237.46 GiB
  Free  PE / Size       0 / 0
  VG UUID               RQ1RHa-33vc-XEFs-lY9c-LZj5-YeAz-W91cfz
So I renamed the vg following https://forums.centos.org/viewtopic.php?t=62406
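(for the record, the rename itself is a one-liner, but because the root LV lives in this VG, /etc/fstab, the rd.lvm.lv=... kernel arguments and the initramfs all have to be updated too, which is what the forum post walks through; the grub config path below assumes a BIOS install, on EFI it lives under /boot/efi)

sudo vgrename cs rhel
# edit /etc/fstab and /etc/default/grub, replacing the cs VG name with rhel
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo dracut -f
sudo reboot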
And now I noticed the CrashLoopBackOff in the router-default-6795657dbc-nnqmv pod
/usr/local/bin/oc logs router-default-6795657dbc-nnqmv -n openshift-ingress
[-]has-synced failed: Router not synced
W0823 09:26:50.017716 1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.017746 1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:26:50.191462 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:26:50.357748 1 reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:26:50.357872 1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
[...]
I0823 09:27:25.191607 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:26.192461 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:27.192082 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:28.192125 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:29.192098 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:27:30.190609 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
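Every failing call targets 10.43.0.1:443, which I take to be the kubernetes service VIP (the service CIDR being 10.43.0.0/16 is my reading of the logs, not something I've confirmed), so this smells more like host firewall/OVN than the router itself. Checks I'd start from:

/usr/local/bin/oc get svc kubernetes -n default
ip route get 10.43.0.1
sudo firewall-cmd --list-all --zone=trusted
curl -k https://10.43.0.1:443/version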
Anything else we need to know?:
Environment:
- MicroShift version (use microshift version): MicroShift Version: 4.10.0-0.microshift-2022-08-08-151458-61-g2d7df45a, Base OCP Version: 4.10.18
- Hardware configuration:
- OS (e.g. cat /etc/os-release):
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
- Kernel (e.g. uname -a): Linux localhost.jiridanek.net 4.18.0-373.el8.x86_64 #1 SMP Tue Mar 22 15:11:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- Others:
Relevant Logs
/usr/local/bin/oc describe pod router-default-6795657dbc-nnqmv -n openshift-ingress
Name: router-default-6795657dbc-nnqmv
Namespace: openshift-ingress
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: localhost.jiridanek.github.beta.tailscale.net/10.40.2.205
Start Time: Tue, 23 Aug 2022 11:03:29 +0200
Labels: ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
pod-template-hash=6795657dbc
Annotations: openshift.io/scc: hostnetwork
target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds: 10
Status: Running
IP: 10.40.2.205
IPs:
IP: 10.40.2.205
Controlled By: ReplicaSet/router-default-6795657dbc
Containers:
router:
Container ID: cri-o://4eb7dd3852f97c2178545721b365d4783f1d5d13c4e61b960b590cd8e9982215
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b
Ports: 80/TCP, 443/TCP, 1936/TCP
Host Ports: 80/TCP, 443/TCP, 1936/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: reflector.go:324] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:37.910504 1 reflector.go:138] github.com/openshift/router/pkg/router/controller/factory/factory.go:125: Failed to watch *v1.Route: failed to list *v1.Route: Get "https://10.43.0.1:443/apis/route.openshift.io/v1/routes?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
I0823 09:34:38.191606 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:39.192003 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:40.192476 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:41.192337 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0823 09:34:42.192030 1 healthz.go:257] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
W0823 09:34:45.886374 1 reflector.go:324] github.com/openshift/router/pkg/router/template/service_lookup.go:33: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
E0823 09:34:45.886502 1 reflector.go:138] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.43.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: connect: no route to host
Exit Code: 137
Started: Tue, 23 Aug 2022 11:32:41 +0200
Finished: Tue, 23 Aug 2022 11:34:52 +0200
Ready: False
Restart Count: 8
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://localhost:1936/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://localhost:1936/healthz/ready delay=10s timeout=1s period=10s #success=1 #failure=3
Startup: http-get http://:1936/healthz/ready delay=0s timeout=1s period=1s #success=1 #failure=120
Environment:
STATS_PORT: 1936
ROUTER_SERVICE_NAMESPACE: openshift-ingress
DEFAULT_CERTIFICATE_DIR: /etc/pki/tls/private
DEFAULT_DESTINATION_CA_PATH: /var/run/configmaps/service-ca/service-ca.crt
ROUTER_CIPHERS: TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ROUTER_DISABLE_HTTP2: true
ROUTER_DISABLE_NAMESPACE_OWNERSHIP_CHECK: false
ROUTER_METRICS_TLS_CERT_FILE: /etc/pki/tls/private/tls.crt
ROUTER_METRICS_TLS_KEY_FILE: /etc/pki/tls/private/tls.key
ROUTER_METRICS_TYPE: haproxy
ROUTER_SERVICE_NAME: default
ROUTER_SET_FORWARDED_HEADERS: append
ROUTER_THREADS: 4
SSL_MIN_VERSION: TLSv1.2
Mounts:
/etc/pki/tls/private from default-certificate (ro)
/var/run/configmaps/service-ca from service-ca-bundle (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5w76x (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-certificate:
Type: Secret (a volume populated by a Secret)
SecretName: router-certs-default
Optional: false
service-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: service-ca-bundle
Optional: false
kube-api-access-5w76x:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 33m default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 33m default-scheduler Successfully assigned openshift-ingress/router-default-6795657dbc-nnqmv to localhost.jiridanek.github.beta.tailscale.net
Warning FailedMount 28m (x2 over 30m) kubelet Unable to attach or mount volumes: unmounted volumes=[default-certificate service-ca-bundle], unattached volumes=[default-certificate service-ca-bundle kube-api-access-5w76x]: timed out waiting for the condition
Warning FailedMount 26m (x11 over 33m) kubelet MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
Warning FailedMount 26m (x11 over 33m) kubelet MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
Warning FailedMount 26m kubelet Unable to attach or mount volumes: unmounted volumes=[service-ca-bundle default-certificate], unattached volumes=[service-ca-bundle kube-api-access-5w76x default-certificate]: timed out waiting for the condition
Warning FailedMount 24m (x5 over 24m) kubelet MountVolume.SetUp failed for volume "default-certificate" : secret "router-certs-default" not found
Warning FailedMount 24m (x5 over 24m) kubelet MountVolume.SetUp failed for volume "service-ca-bundle" : configmap references non-existent config key: service-ca.crt
Normal Pulling 24m kubelet Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b"
Normal Pulled 24m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5f9ee3afa744e790dbb61d08f44e30370c9a5ff041054bf99dc1afe58792cd7b" in 4.824691275s
Normal Created 24m kubelet Created container router
Normal Started 24m kubelet Started container router
Warning DNSConfigForming 24m (x5 over 24m) kubelet Search Line limits were exceeded, some search paths have been omitted, the applied search line is: openshift-ingress.svc.cluster.local svc.cluster.local cluster.local meerkat-justice.ts.net jiridanek.github.beta.tailscale.net brq.redhat.com
Warning Unhealthy 24m (x3 over 24m) kubelet Startup probe failed: HTTP probe failed with statuscode: 500
Warning ProbeError 4m24s (x935 over 24m) kubelet Startup probe error: HTTP probe failed with statuscode: 500
body: [-]backend-http failed: reason withheld
[-]has-synced failed: reason withheld
[+]process-running ok
healthz check failed