
[BUG] When running on cloud providers ingress and service-ca will not stay online #270

Closed
cooktheryan opened this issue Sep 14, 2021 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@cooktheryan
Contributor

cooktheryan commented Sep 14, 2021

What happened:

Repeated restarts of openshift dns, ingress, and service-ca pods.

NOTE: Observed before the SELinux modifications were merged

What you expected to happen:

Pods do not restart

How to reproduce it (as minimally and precisely as possible):

  1. Make microshift
  2. Start the microshift service
  3. kubectl get po -wA

Environment:

  • Microshift version (use microshift version):
[root@microshift ~]# microshift version
Microshift Version: 4.7.0-0.microshift-2021-08-31-224727-17-gb1fb489
Base OKD Version: 4.7.0-0.okd-2021-06-13-090745
  • Hardware configuration:
    RAM
    16GB
    VCPUs
    32 VCPU

  • OS (e.g: cat /etc/os-release): Red Hat Enterprise Linux release 8.4 (Ootpa)

  • Kernel (e.g. uname -a): 4.18.0-305.el8.x86_64

  • Others:

Relevant Logs

Router

I0914 21:06:11.814726       1 template.go:433] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 9cc0c8fc\nversionFromGit: v0.0.0-unknown\ngitTreeState: dirty\nbuildDate: 2021-06-11T16:32:09Z\n"
I0914 21:06:11.817520       1 metrics.go:154] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I0914 21:06:11.822374       1 router.go:191] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0914 21:06:11.822468       1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0914 21:06:11.823069       1 router.go:332] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0914 21:06:11.823134       1 router.go:262] router "msg"="router is including routes in all namespaces"  
E0914 21:06:11.929475       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0914 21:06:11.965842       1 router.go:579] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0914 21

service ca

I0914 21:10:48.891644       1 reflector.go:530] k8s.io/kube-aggregator/pkg/client/informers/externalversions/factory.go:117: Watch close - *v1.APIService total 0 items received
I0914 21:10:48.891907       1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.MutatingWebhookConfiguration total 0 items received
I0914 21:10:48.891938       1 reflector.go:530] k8s.io/apiserver/pkg/server/dynamiccertificates/configmap_cafile_content.go:206: Watch close - *v1.ConfigMap total 0 items received
E0914 21:10:48.891966       1 leaderelection.go:325] error retrieving resource lock openshift-service-ca/service-ca-controller-lock: Get "https://10.43.0.1:443/api/v1/namespaces/openshift-service-ca/configmaps/service-ca-controller-lock?timeout=35s": read tcp 10.42.0.3:34648->10.43.0.1:443: read: connection timed out
I0914 21:10:48.891462       1 reflector.go:530] k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/factory.go:117: Watch close - *v1.CustomResourceDefinition total 0 items received
I0914 21:10:58.859985       1 leaderelection.go:278] failed to renew lease openshift-service-ca/service-ca-controller-lock: timed out waiting for the condition
I0914 21:10:58.860110       1 event.go:282] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"", Name:"", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' service-ca-64547678c6-h8smv_c0bf546f-4fa0-4449-b99c-9e9dd548d1fe stopped leading
I0914 21:10:58.861208       1 request.go:844] Error in request: resource name may not be empty
E0914 21:10:58.861261       1 leaderelection.go:301] Failed to release lock: resource name may not be empty
W0914 21:10:58.861309       1 leaderelection.go:75] leader election lost
@cooktheryan cooktheryan added the kind/bug Categorizes issue or PR as related to a bug. label Sep 14, 2021
@rootfs
Member

rootfs commented Sep 14, 2021

10.42.0.3:34648->10.43.0.1:443: read: connection timed out

This looks like a security group config issue. Maybe open up port 443 and try again? Also, the service CIDR 10.43.0.0/16 may collide with the OpenStack VPC CIDR.
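The suspected CIDR collision can be checked programmatically. A minimal sketch using Python's ipaddress module; the 172.31.0.0/16 VPC range below is only an example, substitute the range your cloud provider actually assigned:

```python
import ipaddress

# Service CIDR from the logs above (10.43.0.1 is the apiserver service IP)
service_cidr = ipaddress.ip_network("10.43.0.0/16")

# Example cloud VPC CIDR; replace with your provider's actual range
vpc_cidr = ipaddress.ip_network("172.31.0.0/16")

if service_cidr.overlaps(vpc_cidr):
    print(f"collision: {service_cidr} overlaps {vpc_cidr}")
else:
    print(f"no overlap between {service_cidr} and {vpc_cidr}")
```

If the two ranges overlap, traffic destined for service IPs can be routed into the VPC instead of the cluster, which would produce exactly the kind of timeouts shown in the service-ca log.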

@cooktheryan
Contributor Author

For my issue there is a service that needs to be restarted, but I haven't been able to identify which one yet. Once the pods are created and I reboot, everything operates as expected after the reboot, so it has to be some service.

@ianzhang366
Contributor

Hi @rootfs @cooktheryan

I tried the following settings:

  1. Enabled port 443 on the EC2 instance
  2. Disabled the firewall
  3. Started microshift with the following config:
cluster:
  url: https://127.0.0.1:6443
  clusterCIDR: 172.17.1.0/24
  serviceCIDR: 172.17.2.0/24
  dns: 172.17.2.10

However, I'm still getting the same error:

$ kubectl logs -n kubevirt-hostpath-provisioner   kubevirt-hostpath-provisioner-4lhx8
F0915 21:13:59.846380       1 hostpath-provisioner.go:270] Error getting server version: Get https://172.17.2.1:443/version?timeout=32s: dial tcp 172.17.2.1:443: i/o timeout

Also, I tried to change the hostname on my EC2 instance but got the same error.

Any idea how to make progress on this one?

@rootfs
Member

rootfs commented Sep 15, 2021

@cooktheryan @ianzhang366 I'll spin up an EC2 instance to investigate. In the meantime, can you check whether the all-in-one microshift container works for you on EC2 and OpenStack?

sudo setsebool -P container_manage_cgroup true
sudo podman volume create ushift-vol
sudo podman run -d --rm --name ushift --privileged -v /lib/modules:/lib/modules -v ushift-vol:/var/lib --hostname ushift -p 6443:6443 quay.io/microshift/microshift:4.7.0-0.microshift-2021-08-31-224727-aio-linux-amd64

@ianzhang366
Contributor

Tried on an EC2 instance:

[ec2-user@ip-172-31-33-139 ~]$ sudo setsebool -P container_manage_cgroup true
[ec2-user@ip-172-31-33-139 ~]$ sudo podman volume create ushift-vol
ushift-vol
[ec2-user@ip-172-31-33-139 ~]$ sudo podman run -d --rm --name ushift --privileged -v /lib/modules:/lib/modules -v ushift-vol:/var/lib --hostname ushift -p 6443:6443 quay.io/microshift/microshift:4.7.0-0.microshift-2021-08-31-224727-aio-linux-amd64
Trying to pull quay.io/microshift/microshift:4.7.0-0.microshift-2021-08-31-224727-aio-linux-amd64...
Getting image source signatures
Copying blob dfd8c625d022 done
Copying blob d90f0305a1ba done
Copying blob 6990ea5645c7 done
Copying blob 893dc2344ef6 done
Copying blob 624513fea370 done
Copying blob b63aa9a6b5a1 done
Copying blob c11223d418c7 done
Copying blob 441ee9c555e7 done
Copying blob 00ad89df3ad9 done
Copying config b9465f0252 done
Writing manifest to image destination
Storing signatures
703032913c86ef9ce78c5976218706d8e14c522bfa9ee252544655cf6af207cf
[ec2-user@ip-172-31-33-139 ~]$ podman ps
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
[ec2-user@ip-172-31-33-139 ~]$ date
Thu Sep 16 01:29:00 UTC 2021
[ec2-user@ip-172-31-33-139 ~]$ podman ps -a
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
[ec2-user@ip-172-31-33-139 ~]$ date
Thu Sep 16 01:33:44 UTC 2021
[ec2-user@ip-172-31-33-139 ~]$ podman ps -a
CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
[ec2-user@ip-172-31-33-139 ~]$ podman exec -it ushift bash
Error: no container with name or ID "ushift" found: no such container

It seems RHEL can't bring up the container...

@rootfs
Member

rootfs commented Sep 16, 2021

This looks like a known issue on RHEL 8.4 that is resolved in 8.5.

A temporary workaround is to disable the nm-cloud-setup service:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
reboot
