[BUG] Getting "webhook configurations error" #2179

Closed
om3171991 opened this issue Jul 21, 2021 · 39 comments
Labels: bug (Something isn't working), end user (This label is used to track the issue that is raised by the end user.)


om3171991 commented Jul 21, 2021

Hi Team,

We are facing an issue with the Kind Kubernetes cluster, where we are seeing the following:

  1. Readiness & liveness probes are not passing at all, and the pod is getting restarted again and again because of this.

On checking the pod logs, we found that the Helm installation is not able to create the webhook configurations:

E0721 16:03:38.718669 1 monitor.go:160] WebhookMonitor/registerWebhookIfNotPresent "msg"="missing webhooks" "error"="mutatingwebhookconfigurations.admissionregistration.k8s.io "kyverno-verify-mutating-webhook-cfg" not found"
E0721 16:03:38.808018 1 monitor.go:91] WebhookMonitor "msg"="" "error"="failed to register webhooks: Endpoint not ready"
I0721 16:03:39.078243 1 status.go:75] WebhookMonitor/WebhookStatusControl "msg"="updating deployment annotation" "name"="kyverno" "namespace"="kyverno" "key"="kyverno.io/webhookActive" "val"="true"
E0721 16:03:50.758993 1 main.go:363] setup "msg"="Timeout registering admission control webhooks" "error"=null

➜ kyverno git: kubectl --context kind-test get validatingwebhookconfigurations,mutatingwebhookconfigurations
No resources found

whereas we have installed kyverno on the local docker desktop and AKS cluster and it is running fine as expected - Kyverno pod is started successfully without failing any probes.

kind cluster details -
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
prod-control-plane Ready control-plane,master 7d4h v1.21.1 172.18.0.3 Ubuntu 21.04 5.10.25-linuxkit containerd://1.5.2

kyverno installation details -

➜ kyverno git: helm history --kube-context kind-test kyverno --namespace kyverno
REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
1 Wed Jul 21 21:27:39 2021 superseded kyverno-1.4.1 v1.4.1 Install complete
2 Wed Jul 21 21:30:03 2021 deployed kyverno-1.4.1 v1.4.1 Upgrade complete

I am not sure if I missed any step, as the same Helm chart repo is used to install in all clusters - Docker Desktop, AKS & Kind.

Please help us here, as local development of policies is becoming difficult; also let me know if any other details are required.

cc: @realshuting

@om3171991 om3171991 added the bug Something isn't working label Jul 21, 2021
@om3171991 om3171991 changed the title [BUG] Getting "admission control configuration error" [BUG] Getting "webhook configurations error" Jul 21, 2021
@chipzoller
Contributor

What version of KinD on what type of OS? I use KinD myself nearly every day and don't see this.


om3171991 commented Jul 22, 2021

@chipzoller -
kind version: 0.11.1
Host OS: Mac OS 10.15.7

I have created 3 clusters and observing the same issue - no webhook for kyverno.
Helm Chart - https://github.com/kyverno/kyverno/tree/main/charts/kyverno


realshuting commented Jul 22, 2021

Hi @om3171991 - can you share the command that was used to install Kyverno?

Also, can you please attach the output of the command kubectl -n kyverno get ep?

@om3171991
Author

Hi @realshuting

For installing Kyverno, we have uploaded the Helm chart to our internal ChartMuseum from the below URL:
https://github.com/kyverno/kyverno/tree/main/charts/kyverno

and the command used to install:

helm install --kube-context kind-test kyverno kyverno --namespace kyverno

➜ kyverno git: kubectl --context kind-test -n kyverno get ep
NAME ENDPOINTS AGE
kyverno-svc 15h
kyverno-svc-metrics 15h

@realshuting
Member

Thanks @om3171991!

It looks like there's no endpoint registered for the Kyverno Pod. Can you please describe the endpoints and inspect the issue? Here's what I have; you can see that the Pod's IP is listed:

 ✗ k get ep -n kyverno
NAME                  ENDPOINTS         AGE
kyverno-svc           172.17.0.4:9443   4h52m
kyverno-svc-metrics   172.17.0.4:8000   4h52m

@om3171991
Author

@realshuting - Could it be because the pod is not in a ready state? (It's in CrashLoopBackOff because of the error mentioned in the issue.)

kubectl --context kind-test -n kyverno get all
NAME READY STATUS RESTARTS AGE
pod/kyverno-78ff67fbc9-4l7jp 0/1 CrashLoopBackOff 42 13h

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kyverno-svc ClusterIP 10.96.78.141 443/TCP 13h
service/kyverno-svc-metrics ClusterIP 10.96.108.251 8000/TCP 13h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kyverno 0/1 1 0 13h

NAME DESIRED CURRENT READY AGE
replicaset.apps/kyverno-78ff67fbc9 1 1 0 13h

@realshuting
Member

No, the Pod crashed because the Pod IP was not registered as the endpoint successfully. Can you provide the following output?

kubectl -n kyverno get pod -o wide

kubectl -n kyverno describe endpoint

@om3171991
Author

➜ kyverno git : kubectl -n kyverno describe endpoint --context kind-test
error: the server doesn't have a resource type "endpoint"
➜ kyverno git : kubectl --context kind-test -n kyverno get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kyverno-565755c84c-zqmv8 0/1 Running 2 2m16s 10.244.0.11 test-control-plane

@realshuting
Member

Sorry, it should be kubectl -n kyverno describe endpoints.

@om3171991
Author

➜ kyverno git : kubectl -n kyverno describe endpoints --context kind-test
Name: kyverno-svc
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=1.4.5
helm.sh/chart=kyverno-1.4.5
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-07-21T15:57:41Z
Subsets:
Addresses:
NotReadyAddresses: 10.244.0.10,10.244.0.11
Ports:
Name Port Protocol
---- ---- --------
https 9443 TCP

Events:
Type Reason Age From Message


Warning FailedToUpdateEndpoint 23m endpoint-controller Failed to update endpoint kyverno/kyverno-svc: Operation cannot be fulfilled on endpoints "kyverno-svc": the object has been modified; please apply your changes to the latest version and try again

Name: kyverno-svc-metrics
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=1.4.5
helm.sh/chart=kyverno-1.4.5
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-07-21T15:57:41Z
Subsets:
Addresses:
NotReadyAddresses: 10.244.0.10,10.244.0.11
Ports:
Name Port Protocol
---- ---- --------
metrics-port 8000 TCP

Events:
Type Reason Age From Message


Warning FailedToUpdateEndpoint 23m endpoint-controller Failed to update endpoint kyverno/kyverno-svc-metrics: Operation cannot be fulfilled on endpoints "kyverno-svc-metrics": the object has been modified; please apply your changes to the latest version and try again

@realshuting
Member

Interesting - there are two addresses in the endpoints, but your previous output only shows one Pod.

NotReadyAddresses: 10.244.0.10,10.244.0.11
➜ kyverno git : kubectl --context kind-test -n kyverno get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kyverno-565755c84c-zqmv8 0/1 Running 2 2m16s 10.244.0.11 test-control-plane

Do you currently have 2 Kyverno pods running?

If possible, can you re-install Kyverno and increase initialDelaySeconds of ReadinessProbe? Say 30 for example.
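For reference, one way to try a longer delay without a full reinstall is to patch the running Deployment directly. This is only a sketch (not from the thread), and it assumes the main kyverno container is the first entry in the containers list:

# Bump the readiness probe's initialDelaySeconds to 30 on the live Deployment
kubectl -n kyverno patch deployment kyverno --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds","value":30}]'

# Watch the replacement pod come up
kubectl -n kyverno get pods -w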


om3171991 commented Jul 22, 2021

Reason for the 2 IPs - I ran the Kyverno helm upgrade command and the old pod is still there in CrashLoopBackOff:

➜ kyverno git : kubectl --context kind-test -n kyverno get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kyverno-565755c84c-zqmv8 0/1 Running 2 2m16s 10.244.0.11 test-control-plane
kyverno-78ff67fbc9-4l7jp 0/1 CrashLoopBackOff 45 13h 10.244.0.10 test-control-plane

Increasing initialDelaySeconds - I already tried that with 60 seconds for both the ReadinessProbe and LivenessProbe, but it is still failing.

Below is the config which we are using:

livenessProbe:
  httpGet:
    path: /health/liveness
    port: 9443
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 2
  successThreshold: 1

readinessProbe:
  httpGet:
    path: /health/readiness
    port: 9443
    scheme: HTTPS
  initialDelaySeconds: 60
  periodSeconds: 60
  timeoutSeconds: 30
  failureThreshold: 6
  successThreshold: 1

@realshuting
Member

Were these Probes configured with a clean installation? I'll have to dig into the code and check if there's any dependency between the endpoints' status and the pod's.
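As a quick cluster-side check (a sketch, not part of the original exchange), you can see whether the pod IP is stuck under notReadyAddresses rather than addresses on the Endpoints object:

# IPs the Service would route to (ready pods)
kubectl -n kyverno get endpoints kyverno-svc -o jsonpath='{.subsets[*].addresses[*].ip}'

# IPs backing the Service that are not ready yet
kubectl -n kyverno get endpoints kyverno-svc -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}'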

@om3171991
Author

Yes. To clear up any confusion, I have reinstalled everything again. Please find the output of all the commands below:

➜ kyverno git: kubectl --context kind-test -n kyverno get all
NAME READY STATUS RESTARTS AGE
pod/kyverno-66548b7b4f-rlvhz 0/1 Running 1 6m52s

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kyverno-svc ClusterIP 10.96.161.152 443/TCP 8m56s
service/kyverno-svc-metrics ClusterIP 10.96.178.68 8000/TCP 8m57s

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kyverno 0/1 1 0 8m56s

NAME DESIRED CURRENT READY AGE
replicaset.apps/kyverno-66548b7b4f 1 1 0 6m55s

➜ kubectl --context kind-test -n kyverno get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kyverno-66548b7b4f-rlvhz 0/1 Running 1 6m59s 10.244.0.12 test-control-plane

➜ kyverno git: kubectl -n kyverno describe endpoints --context kind-test
Name: kyverno-svc
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=1.4.5
helm.sh/chart=kyverno-1.4.5
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-07-22T06:38:53Z
Subsets:
Events:

Name: kyverno-svc-metrics
Namespace: kyverno
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=1.4.5
helm.sh/chart=kyverno-1.4.5
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-07-22T06:38:52Z
Subsets:
Events:

➜ kyverno git: kubectl -n kyverno --context kind-test get ep -n kyverno
NAME ENDPOINTS AGE
kyverno-svc 7m56s
kyverno-svc-metrics 7m56s

➜ kyverno git: kubectl -n kyverno --context kind-test describe pod/kyverno-66548b7b4f-rlvhz
Name: kyverno-66548b7b4f-rlvhz
Namespace: kyverno
Priority: 0
Node: test-control-plane/172.18.0.4
Start Time: Thu, 22 Jul 2021 12:15:45 +0530
Labels: app=kyverno
app.kubernetes.io/component=kyverno
app.kubernetes.io/instance=kyverno
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=kyverno
app.kubernetes.io/part-of=kyverno
app.kubernetes.io/version=1.4.5
helm.sh/chart=kyverno-1.4.5
pod-template-hash=66548b7b4f
Annotations:
Status: Running
IP: 10.244.0.12
IPs:
IP: 10.244.0.12
Controlled By: ReplicaSet/kyverno-66548b7b4f
Init Containers:
kyverno-pre:
Container ID: containerd://6396f74e6ad9b77a674c297e08fbf76bb08ded8a8d7c76b5c89af65aa70f2b5e
Image: ghcr.io/kyverno/kyvernopre:v1.4.1
Image ID: ghcr.io/kyverno/kyvernopre@sha256:d81c7caee2c0f7b0f8a10f57d4041d51602003b658ce75b5815018a44d746e04
Port:
Host Port:
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 22 Jul 2021 12:15:56 +0530
Finished: Thu, 22 Jul 2021 12:16:16 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 256Mi
Requests:
cpu: 10m
memory: 64Mi
Environment:
KYVERNO_NAMESPACE: kyverno (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cglmz (ro)
Containers:
kyverno:
Container ID: containerd://c0cf182b82cd577dfb5cb49b0e37d2f66d9aa60e0e95eeb54c0b76e08007d243
Image: ghcr.io/kyverno/kyverno:v1.4.1
Image ID: ghcr.io/kyverno/kyverno@sha256:24107c0eb18d43ee137b30306d35c160be613dd9ee4126dd59ef6c6ebe581b37
Ports: 9443/TCP, 8000/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Thu, 22 Jul 2021 12:18:30 +0530
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 22 Jul 2021 12:16:59 +0530
Finished: Thu, 22 Jul 2021 12:18:14 +0530
Ready: False
Restart Count: 2
Limits:
memory: 256Mi
Requests:
cpu: 100m
memory: 50Mi
Liveness: http-get https://:9443/health/liveness delay=60s timeout=30s period=60s #success=1 #failure=2
Readiness: http-get https://:9443/health/readiness delay=60s timeout=30s period=60s #success=1 #failure=6
Environment:
INIT_CONFIG: kyverno
KYVERNO_NAMESPACE: kyverno (v1:metadata.namespace)
KYVERNO_SVC: kyverno-svc
KYVERNO_DEPLOYMENT: kyverno
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-cglmz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-cglmz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

➜ kyverno git: kubectl -n kyverno --context kind-test logs pod/kyverno-66548b7b4f-rlvhz
I0722 06:52:27.409466 1 version.go:17] "msg"="Kyverno" "Version"="v1.4.1"
I0722 06:52:27.410357 1 version.go:18] "msg"="Kyverno" "BuildHash"="(HEAD/f9a89c467289506ae296c46ef114a2e37adb80c4"
I0722 06:52:27.410733 1 version.go:19] "msg"="Kyverno" "BuildTime"="2021-06-24_10:15:40PM"
I0722 06:52:27.423905 1 config.go:92] CreateClientConfig "msg"="Using in-cluster configuration"
I0722 06:52:27.425198 1 main.go:120] setup "msg"="Enable exposure of metrics, see details at https://github.com/kyverno/kyverno/wiki/Metrics-Kyverno-on-Kubernetes" "port"="8000"
I0722 06:52:28.092384 1 util.go:86] "msg"="CRD found" "gvr"="kyverno.io/v1, Resource=clusterpolicies"
I0722 06:52:28.099509 1 util.go:86] "msg"="CRD found" "gvr"="wgpolicyk8s.io/v1alpha1, Resource=clusterpolicyreports"
I0722 06:52:28.100818 1 util.go:86] "msg"="CRD found" "gvr"="wgpolicyk8s.io/v1alpha1, Resource=policyreports"
I0722 06:52:28.102902 1 util.go:86] "msg"="CRD found" "gvr"="kyverno.io/v1alpha1, Resource=clusterreportchangerequests"
I0722 06:52:28.111192 1 util.go:86] "msg"="CRD found" "gvr"="kyverno.io/v1alpha1, Resource=reportchangerequests"
I0722 06:52:28.133041 1 reflector.go:219] Starting reflector unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:28.241466 1 reflector.go:219] Starting reflector unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:28.498873 1 reflector.go:219] Starting reflector unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:28.600046 1 reflector.go:219] Starting reflector unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:28.703147 1 dynamicconfig.go:316] ConfigData "msg"="Init resource " "excludeRoles"=""
I0722 06:52:28.713808 1 leaderelection.go:243] attempting to acquire leader lease kyverno/webhook-register...
I0722 06:52:28.758728 1 leaderelection.go:113] webhookRegister/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-66548b7b4f-rlvhz_1f405e16-9e38-40dc-805c-878f90d096fa" "leader"="kyverno-66548b7b4f-rlvhz_fa64c7a5-950b-44dd-bba4-1505484d4e66"
I0722 06:52:35.917118 1 certmanager.go:108] CertManager "msg"="read TLS pem pair from the secret"
I0722 06:52:36.003019 1 leaderelection.go:243] attempting to acquire leader lease kyverno/kyverno...
I0722 06:52:36.052421 1 reportrequest.go:180] ReportChangeRequestGenerator "msg"="start"
I0722 06:52:36.056139 1 controller.go:112] EventGenerator "msg"="start"
I0722 06:52:36.072657 1 reflector.go:219] Starting reflector unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.073681 1 informer.go:109] PolicyCacheController "msg"="starting"
I0722 06:52:36.058623 1 reflector.go:219] Starting reflector v1alpha1.ReportChangeRequest (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.058879 1 reflector.go:219] Starting reflector v1alpha1.ClusterPolicyReport (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059083 1 reflector.go:219] Starting reflector v1alpha1.PolicyReport (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059161 1 reflector.go:219] Starting reflector v1.GenerateRequest (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059241 1 reflector.go:219] Starting reflector v1.ClusterPolicy (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059340 1 reflector.go:219] Starting reflector v1.Policy (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059425 1 reflector.go:219] Starting reflector v1alpha1.ClusterReportChangeRequest (1h0m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059502 1 reflector.go:219] Starting reflector v1.Secret (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059581 1 reflector.go:219] Starting reflector v1.Role (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059658 1 reflector.go:219] Starting reflector v1.ClusterRole (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059736 1 reflector.go:219] Starting reflector v1.Namespace (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059816 1 reflector.go:219] Starting reflector v1.ConfigMap (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.059884 1 reflector.go:219] Starting reflector v1.RoleBinding (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.060029 1 reflector.go:219] Starting reflector v1.ClusterRoleBinding (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:36.122469 1 leaderelection.go:113] kyverno/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-66548b7b4f-rlvhz_db734a82-95fa-4150-b819-185a9f1018bb" "leader"="kyverno-66548b7b4f-rlvhz_23f0d0ae-99d4-4d98-9562-743a0ae2eebb"
I0722 06:52:39.374578 1 dynamicconfig.go:241] ConfigData "msg"="Updated resource filters" "name"="kyverno" "namespace"="kyverno" "newFilters"=[{"Kind":"Event","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"kube-system","Name":"*"},{"Kind":"*","Namespace":"kube-public","Name":"*"},{"Kind":"*","Namespace":"kube-node-lease","Name":"*"},{"Kind":"Node","Namespace":"*","Name":"*"},{"Kind":"APIService","Namespace":"*","Name":"*"},{"Kind":"TokenReview","Namespace":"*","Name":"*"},{"Kind":"SubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"SelfSubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"kyverno","Name":"*"},{"Kind":"Binding","Namespace":"*","Name":"*"},{"Kind":"ReplicaSet","Namespace":"*","Name":"*"},{"Kind":"ReportChangeRequest","Namespace":"*","Name":"*"},{"Kind":"ClusterReportChangeRequest","Namespace":"*","Name":"*"}] "oldFilters"=null
I0722 06:52:41.771850 1 server.go:526] WebhookServer "msg"="starting service"
I0722 06:52:47.287623 1 leaderelection.go:253] successfully acquired lease kyverno/webhook-register
I0722 06:52:47.401479 1 leaderelection.go:94] webhookRegister/LeaderElection "msg"="started leading" "id"="kyverno-66548b7b4f-rlvhz_1f405e16-9e38-40dc-805c-878f90d096fa"
I0722 06:52:48.177439 1 certRenewer.go:78] CertRenewer/InitTLSPemPair "msg"="using existing TLS key/certificate pair"
I0722 06:52:54.510509 1 leaderelection.go:253] successfully acquired lease kyverno/kyverno
I0722 06:52:54.570715 1 leaderelection.go:94] kyverno/LeaderElection "msg"="started leading" "id"="kyverno-66548b7b4f-rlvhz_db734a82-95fa-4150-b819-185a9f1018bb"
I0722 06:52:54.595575 1 controller.go:222] GenerateCleanUpController "msg"="starting"
I0722 06:52:54.597575 1 validate_controller.go:545] PolicyController "msg"="starting"
I0722 06:52:54.617173 1 reportcontroller.go:195] PolicyReportGenerator "msg"="start"
I0722 06:52:54.618529 1 certmanager.go:129] CertManager "msg"="start managing certificate"
I0722 06:52:54.633994 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="restrict-sysctls" "uid"="f1047c4e-2f00-46c3-88b3-8af5ccb9d9a3"
I0722 06:52:56.095920 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-host-path" "uid"="4f2b404b-9877-4817-91ea-77b506b27aee"
I0722 06:52:56.241455 1 reflector.go:219] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:52:57.484371 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-host-ports" "uid"="ce6e8f51-7518-4cec-b981-beac525a1196"
I0722 06:52:58.980920 1 reflector.go:219] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:53:01.713255 1 reflector.go:219] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:53:02.846041 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-privileged-containers" "uid"="48e0274e-efef-43c2-a443-99974eb102c6"
I0722 06:53:03.305598 1 reflector.go:219] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:53:03.468442 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="restrict-apparmor-profiles" "uid"="51da047b-884b-4b16-a1d9-ba9109b7ffae"
I0722 06:53:03.546633 1 reflector.go:219] Starting reflector *unstructured.Unstructured (15m0s) from pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167
I0722 06:53:04.003946 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="require-default-proc-mount" "uid"="68e00ef4-0390-458e-9a44-98c2e7688840"
E0722 06:53:09.352341 1 monitor.go:160] WebhookMonitor/registerWebhookIfNotPresent "msg"="missing webhooks" "error"="mutatingwebhookconfigurations.admissionregistration.k8s.io "kyverno-verify-mutating-webhook-cfg" not found"
I0722 06:53:10.318399 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-add-capabilities" "uid"="341b08ee-35f7-4267-a54c-48ebe70c5f86"
I0722 06:53:12.003975 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-host-namespaces" "uid"="8b0887c9-3cd3-4934-9233-623263a3122e"
E0722 06:53:12.658831 1 monitor.go:91] WebhookMonitor "msg"="" "error"="failed to register webhooks: Endpoint not ready"
I0722 06:53:19.314834 1 validate_controller.go:261] PolicyController "msg"="policy created" "kind"="ClusterPolicy" "name"="disallow-root-user-access" "uid"="3247e522-8b05-456f-989b-409681ff6d87"
E0722 06:53:19.735772 1 main.go:363] setup "msg"="Timeout registering admission control webhooks" "error"=null

@om3171991
Author

@realshuting - were you able to look into this and find anything?

@realshuting realshuting self-assigned this Jul 22, 2021
@realshuting realshuting added this to the Kyverno Release 1.5.0 milestone Jul 22, 2021
@realshuting
Member

No @om3171991, I'm currently focusing on completing the 1.4.2 milestone and will revisit this issue after that.

@chipzoller
Contributor

@om3171991 I just installed KinD 0.11.1 and Kyverno 1.4.1 from the upstream Helm repo on macOS 10.14.6 running Docker Desktop for Mac v3.5.2, and I do not see an issue running Kyverno.

$ k get validatingwebhookconfigurations,mutatingwebhookconfigurations
NAME                                                                                                  WEBHOOKS   AGE
validatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-policy-validating-webhook-cfg     1          5m35s
validatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-resource-validating-webhook-cfg   1          5m34s

NAME                                                                                              WEBHOOKS   AGE
mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-policy-mutating-webhook-cfg     1          5m34s
mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-resource-mutating-webhook-cfg   1          5m34s
mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-verify-mutating-webhook-cfg     1          5m35s
$ k -n kyverno get po
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-6578fddf4c-z6sfb   1/1     Running   0          6m35s
$ k -n kyverno logs -l app.kubernetes.io/name=kyverno
I0723 20:04:49.196644       1 registration.go:342] Register "msg"="created webhook"  "kind"="MutatingWebhookConfiguration" "name"="kyverno-policy-mutating-webhook-cfg"
I0723 20:04:49.410920       1 validate_controller.go:312] PolicyController "msg"="updating policy"  "name"="disallow-add-capabilities"
I0723 20:04:49.598505       1 registration.go:296] Register "msg"="created webhook" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-resource-validating-webhook-cfg"
I0723 20:04:49.799486       1 registration.go:270] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-resource-mutating-webhook-cfg"
I0723 20:04:49.799962       1 registration.go:162] Register/UpdateWebhookConfigurations "msg"="received the signal to update webhook configurations"
I0723 20:04:49.996846       1 registration.go:184] Register/UpdateWebhookConfigurations "msg"="successfully updated mutatingWebhookConfigurations"  "name"="kyverno-resource-mutating-webhook-cfg"
I0723 20:04:50.397525       1 registration.go:191] Register/UpdateWebhookConfigurations "msg"="successfully updated validatingWebhookConfigurations"  "name"="kyverno-resource-validating-webhook-cfg"
I0723 20:05:10.857697       1 monitor.go:184] WebhookMonitor/lastRequestTimeFromAnnotation "msg"="timestamp not set in the annotation, setting"
I0723 20:05:10.885382       1 monitor.go:100] WebhookMonitor "msg"="initialized lastRequestTimestamp"  "time"="2021-07-23T20:05:10.880171794Z"
I0723 20:05:40.844075       1 status.go:75] WebhookMonitor/WebhookStatusControl "msg"="updating deployment annotation" "name"="kyverno" "namespace"="kyverno" "key"="kyverno.io/webhookActive" "val"="true"

@om3171991
Author

@chipzoller - I really appreciate that you tried this on a local setup, but now I am totally confused as to why it is not working on my system. Any pointers on what else I can check to fix this?

@chipzoller
Contributor

I would clean up your setup by removing the KinD node image(s) you have and updating or reinstalling whatever you're using as an underlying engine (like Docker Desktop for Mac). Because Kyverno definitely does work on this combination, it's something specific to your setup.
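A rough outline of that clean-up, assuming the kind cluster is named test and using the upstream Kyverno Helm repo (a sketch only; adjust names to your setup):

# Remove the existing kind cluster and recreate it
kind delete cluster --name test
kind create cluster --name test

# Install Kyverno from the upstream chart repository
helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace

# Verify the webhook configurations were created
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations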

@realshuting realshuting added the end user This label is used to track the issue that is raised by the end user. label Aug 4, 2021
@nlamirault
Contributor

I've got the same error on my RPI cluster. I changed the initialDelaySeconds of the ReadinessProbe to 30, as @realshuting said.
I've got this crash afterwards:

❯ kubectl -n kyverno logs kyverno-6cddd6dbc8-t4vtl -f                                                                                                                                                               
I1008 10:08:27.980483       1 version.go:17]  "msg"="Kyverno"  "Version"="v1.4.2"                                                                                                                                   
I1008 10:08:27.980564       1 version.go:18]  "msg"="Kyverno"  "BuildHash"="(HEAD/fb6e0f18ea89c9b60c604e5135f38040fafbc1e4"                                                                                         
I1008 10:08:27.980603       1 version.go:19]  "msg"="Kyverno"  "BuildTime"="2021-08-11_08:24:18PM"                                                                                                                  
I1008 10:08:28.067535       1 config.go:92] CreateClientConfig "msg"="Using in-cluster configuration"                                                                                                               
I1008 10:08:28.069027       1 main.go:122] setup "msg"="enabling metrics service"  "address"=":8000"                                                                                                                
I1008 10:08:30.272070       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1, Resource=clusterpolicies"                                                                                                       
I1008 10:08:30.273264       1 util.go:86]  "msg"="CRD found"  "gvr"="wgpolicyk8s.io/v1alpha2, Resource=clusterpolicyreports"                                                                                        
I1008 10:08:30.274480       1 util.go:86]  "msg"="CRD found"  "gvr"="wgpolicyk8s.io/v1alpha2, Resource=policyreports"                                                                                               
I1008 10:08:30.367840       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1alpha2, Resource=clusterreportchangerequests"                                                                                     
I1008 10:08:30.369094       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1alpha2, Resource=reportchangerequests"                                                                                            
I1008 10:08:32.774341       1 dynamicconfig.go:343] ConfigData "msg"="Init resource "  "excludeRoles"=""                                                                                                            
I1008 10:08:32.780043       1 leaderelection.go:243] attempting to acquire leader lease kyverno/webhook-register...                                                                                                 
I1008 10:08:33.271678       1 leaderelection.go:113] webhookRegister/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-6cddd6dbc8-t4vtl_ed1ecdc6-1c09-4ab2-9e65-c53c35033e15"
 "leader"="kyverno-56bbb45b4b-8hwmp_4bd0c4e9-b112-4dbb-805f-5d3953db62c7"                                                                                                                                           
I1008 10:08:56.263093       1 leaderelection.go:253] successfully acquired lease kyverno/webhook-register                                                                                                           
I1008 10:08:56.263581       1 leaderelection.go:94] webhookRegister/LeaderElection "msg"="started leading" "id"="kyverno-6cddd6dbc8-t4vtl_ed1ecdc6-1c09-4ab2-9e65-c53c35033e15"                                     
I1008 10:08:56.398355       1 certRenewer.go:78] CertRenewer/InitTLSPemPair "msg"="using existing TLS key/certificate pair"                                                                                         
I1008 10:09:08.668370       1 certmanager.go:108] CertManager "msg"="read TLS pem pair from the secret"                                                                                                             
I1008 10:09:08.673908       1 leaderelection.go:243] attempting to acquire leader lease kyverno/kyverno...                                                                                                          
I1008 10:09:08.674705       1 reportrequest.go:178] ReportChangeRequestGenerator "msg"="start"                                                                                                                      
I1008 10:09:08.674825       1 controller.go:118] EventGenerator "msg"="start"                                                                                                                                       
I1008 10:09:08.674937       1 informer.go:109] PolicyCacheController "msg"="starting"                                                                                                                               
I1008 10:09:08.967987       1 leaderelection.go:113] kyverno/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-6cddd6dbc8-t4vtl_866f3d36-7b8e-4464-9ce5-7eb431a155c2" "leader
"="kyverno-56bbb45b4b-txh4j_7e203fad-f061-41de-a84b-56f93bbad915"                                                                                                                                                   
I1008 10:09:09.874722       1 server.go:581] WebhookServer "msg"="starting service"                                                                                                                                 
I1008 10:09:10.375871       1 dynamicconfig.go:251] ConfigData "msg"="Updated resource filters" "name"="kyverno" "namespace"="kyverno" "newFilters"=[{"Kind":"Event","Namespace":"*","Name":"*"},{"Kind":"*","Namesp
ace":"kube-system","Name":"*"},{"Kind":"*","Namespace":"kube-public","Name":"*"},{"Kind":"*","Namespace":"kube-node-lease","Name":"*"},{"Kind":"Node","Namespace":"*","Name":"*"},{"Kind":"APIService","Namespace":"
*","Name":"*"},{"Kind":"TokenReview","Namespace":"*","Name":"*"},{"Kind":"SubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"SelfSubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"ky
verno","Name":"*"},{"Kind":"Binding","Namespace":"*","Name":"*"},{"Kind":"ReplicaSet","Namespace":"*","Name":"*"},{"Kind":"ReportChangeRequest","Namespace":"*","Name":"*"},{"Kind":"ClusterReportChangeRequest","Na
mespace":"*","Name":"*"}] "oldFilters"=null                                                                                                                                                                         
I1008 10:09:23.570264       1 registration.go:607] Register "msg"="Endpoint ready"  "name"="kyverno-svc" "ns"="kyverno"                                                                                             
I1008 10:09:24.368088       1 registration.go:364] Register "msg"="created webhook"  "kind"="MutatingWebhookConfiguration" "name"="kyverno-verify-mutating-webhook-cfg"                                             
I1008 10:09:24.568140       1 registration.go:319] Register "msg"="created webhook"  "kind"="ValidatingWebhookConfiguration" "name"="kyverno-policy-validating-webhook-cfg"                                         
I1008 10:09:24.749578       1 registration.go:342] Register "msg"="created webhook"  "kind"="MutatingWebhookConfiguration" "name"="kyverno-policy-mutating-webhook-cfg"                                             
I1008 10:09:25.168188       1 registration.go:296] Register "msg"="created webhook" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-resource-validating-webhook-cfg"                                        
I1008 10:09:25.368005       1 registration.go:270] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-resource-mutating-webhook-cfg"                                            
I1008 10:09:25.368267       1 registration.go:162] Register/UpdateWebhookConfigurations "msg"="received the signal to update webhook configurations"                                                                
E1008 10:09:25.368428       1 registration.go:181] Register/UpdateWebhookConfigurations "msg"="unable to update mutatingWebhookConfigurations" "error"="unable to get mutatingWebhookConfigurations: mutatingwebhook
configurations.admissionregistration.k8s.io \"kyverno-resource-mutating-webhook-cfg\" not found"  "name"="kyverno-resource-mutating-webhook-cfg"                                                                    
I1008 10:09:25.568934       1 registration.go:191] Register/UpdateWebhookConfigurations "msg"="successfully updated validatingWebhookConfigurations"  "name"="kyverno-resource-validating-webhook-cfg"              
I1008 10:09:25.569183       1 registration.go:162] Register/UpdateWebhookConfigurations "msg"="received the signal to update webhook configurations"  
I1008 10:09:25.898400       1 registration.go:184] Register/UpdateWebhookConfigurations "msg"="successfully updated mutatingWebhookConfigurations"  "name"="kyverno-resource-mutating-webhook-cfg"
I1008 10:09:26.168905       1 registration.go:191] Register/UpdateWebhookConfigurations "msg"="successfully updated validatingWebhookConfigurations"  "name"="kyverno-resource-validating-webhook-cfg"
I1008 10:09:28.368534       1 leaderelection.go:253] successfully acquired lease kyverno/kyverno
I1008 10:09:28.369935       1 leaderelection.go:94] kyverno/LeaderElection "msg"="started leading" "id"="kyverno-6cddd6dbc8-t4vtl_866f3d36-7b8e-4464-9ce5-7eb431a155c2" 
I1008 10:09:28.370259       1 controller.go:247] GenerateCleanUpController "msg"="starting"  
I1008 10:09:28.467643       1 validate_controller.go:553] PolicyController "msg"="starting"  
I1008 10:09:28.467995       1 reportcontroller.go:195] PolicyReportGenerator "msg"="start"  
I1008 10:09:28.468806       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-m0001" "uid"="6f2e9c8d-4681-4ac8-b3c0-5e0328b0c3bb"                     
I1008 10:09:36.027050       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="disallow-host-namespaces" "uid"="c13bb892-1dfc-4e9a-b276-7b844aa048ba"            
I1008 10:09:37.769703       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-c0005" "uid"="7054243f-4092-406a-9d8d-567cb6e46487"                     
I1008 10:09:39.874510       1 trace.go:205] Trace[340007387]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.21.3/tools/cache/reflector.go:167 (08-Oct-2021 10:09:08.768) (total time: 31105ms):
Trace[340007387]: ---"Objects listed" 31104ms (10:09:00.873)
Trace[340007387]: [31.105488292s] [31.105488292s] END 
I1008 10:09:40.068197       1 certmanager.go:129] CertManager "msg"="start managing certificate"  
I1008 10:09:40.882178       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-p0004" "uid"="33550802-d4eb-4b05-b5dc-7834211cad26"
I1008 10:09:43.470697       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="restrict-apparmor-profiles" "uid"="da392e66-13c8-44fc-94d8-b7b90bb9891d"
E1008 10:09:44.871737       1 runtime.go:78] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x14d6ac0), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x144e900), missingMeth
od:""} (interface conversion: interface {} is nil, not string)
goroutine 5676 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x153e4c0, 0x4007508ed0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:74 +0x84
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:48 +0x80
panic(0x153e4c0, 0x4007508ed0)
        /usr/local/go/src/runtime/panic.go:965 +0x154 
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0x40073f5570, 0x1, 0x1, 0x40005b3a28)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x768
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0x40072c7f20, 0x40072c7f20, 0x0, 0x0, 0x40073ce2d0, 0x4006458ab8, 0x4003e66e90, 0x12b71c8)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x1a8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0x400031c1e0, 0x40072a5950, 0x8, 0x4003e66e90, 0x13f6140, 0x40073f2b10, 0x0, 0x0, 0x400091eb40, 0x144e900)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0x84
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0x400031c1e0, 0x40072a5950, 0x8, 0xaaf91b1d08bb4600, 0x400118d618, 0x1e0d8, 0x400118d678)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x1e8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0x400031c1e0, 0x1c400)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1bc
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40044cafa0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x64
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x40044cafa0, 0x1faf098, 0x40044ce510, 0x1, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x74
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x40044cafa0, 0x3b9aca00, 0x0, 0x40004f6201, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x88
k8s.io/apimachinery/pkg/util/wait.Until(0x40044cafa0, 0x3b9aca00, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x48
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x4c8
panic: interface conversion: interface {} is nil, not string [recovered]
        panic: interface conversion: interface {} is nil, not string
goroutine 5676 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:55 +0x108
panic(0x153e4c0, 0x4007508ed0)
        /usr/local/go/src/runtime/panic.go:965 +0x154 
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0x40073f5570, 0x1, 0x1, 0x40005b3a28)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x768
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0x40072c7f20, 0x40072c7f20, 0x0, 0x0, 0x40073ce2d0, 0x4006458ab8, 0x4003e66e90, 0x12b71c8)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x1a8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0x400031c1e0, 0x40072a5950, 0x8, 0x4003e66e90, 0x13f6140, 0x40073f2b10, 0x0, 0x0, 0x400091eb40, 0x144e900)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0x84
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0x400031c1e0, 0x40072a5950, 0x8, 0xaaf91b1d08bb4600, 0x400118d618, 0x1e0d8, 0x400118d678)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x1e8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0x400031c1e0, 0x1c400)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1bc
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40044cafa0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x64
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x40044cafa0, 0x1faf098, 0x40044ce510, 0x1, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x74
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x40044cafa0, 0x3b9aca00, 0x0, 0x40004f6201, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x88
k8s.io/apimachinery/pkg/util/wait.Until(0x40044cafa0, 0x3b9aca00, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x48
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x4c8


awoodobvio commented Dec 22, 2021

We have run into this problem ("Timeout registering admission control webhooks") on startup. After becoming the leader, the pod seems to execute the policies and then fails with this message and shuts down. This means that Kyverno never gets to a stable state.

We aren't 100% sure (we aren't Go experts), but the evidence points to our issue being related to having Failed pods with reason "Shutdown". K8s doesn't clean these pods up automatically, so you can examine them. When we have pods in this state for Kyverno, the new pods that are trying to start seem to get this error. Once we remove these Shutdown pods from the Kyverno namespace, Kyverno comes up and stabilizes.
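A minimal sketch of that clean-up step (not from the original comment), which lists and removes pods left in the Failed phase in the kyverno namespace:

# Show pods stuck in the Failed phase (e.g. Reason: Shutdown on preemptible nodes)
kubectl -n kyverno get pods --field-selector=status.phase=Failed

# Delete them so new Kyverno pods can start cleanly
kubectl -n kyverno delete pods --field-selector=status.phase=Failed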

My guess is that Kyverno is doing something with the list of pods and is failing to account for shutdown pods. For instance, I see evidence in the code that it is trying to get the IP of the pods, but shutdown pods will not have an IP since they are no longer scheduled on a node.

In our case, we are using Kyverno 1.5.1.

We are in GKE using preemptible nodes, which means our nodes shut down and recycle at least once a day and evict any pods that were scheduled on them. This causes those pods to enter this "Failed, Shutdown" state.

Pods in this state have the following information in them:

Status: Failed Reason: Shutdown Message: Node is shutting, evicting pods

Again, this is only circumstantial evidence so far - but it seems to be getting stronger every time we need to restart Kyverno (which is basically daily at this point).


realshuting commented Dec 23, 2021

After becoming the leader, the pod seems to execute on the policies and then fails with this message, shutting down.

@awoodsprim - can you share logs for all Kyverno pods? I need to understand why the pods shut down. Was Kyverno functioning before?

@realshuting
Member

I've got the same error on my RPI cluster. I change the initialDelaySeconds of ReadinessProbe to 30, like @realshuting said. I've got this crash after :

❯ kubectl -n kyverno logs kyverno-6cddd6dbc8-t4vtl -f                                                                                                                                                               
I1008 10:08:27.980483       1 version.go:17]  "msg"="Kyverno"  "Version"="v1.4.2"                                                                                                                                   
I1008 10:08:27.980564       1 version.go:18]  "msg"="Kyverno"  "BuildHash"="(HEAD/fb6e0f18ea89c9b60c604e5135f38040fafbc1e4"                                                                                         
I1008 10:08:27.980603       1 version.go:19]  "msg"="Kyverno"  "BuildTime"="2021-08-11_08:24:18PM"                                                                                                                  
I1008 10:08:28.067535       1 config.go:92] CreateClientConfig "msg"="Using in-cluster configuration"                                                                                                               
I1008 10:08:28.069027       1 main.go:122] setup "msg"="enabling metrics service"  "address"=":8000"                                                                                                                
I1008 10:08:30.272070       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1, Resource=clusterpolicies"                                                                                                       
I1008 10:08:30.273264       1 util.go:86]  "msg"="CRD found"  "gvr"="wgpolicyk8s.io/v1alpha2, Resource=clusterpolicyreports"                                                                                        
I1008 10:08:30.274480       1 util.go:86]  "msg"="CRD found"  "gvr"="wgpolicyk8s.io/v1alpha2, Resource=policyreports"                                                                                               
I1008 10:08:30.367840       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1alpha2, Resource=clusterreportchangerequests"                                                                                     
I1008 10:08:30.369094       1 util.go:86]  "msg"="CRD found"  "gvr"="kyverno.io/v1alpha2, Resource=reportchangerequests"                                                                                            
I1008 10:08:32.774341       1 dynamicconfig.go:343] ConfigData "msg"="Init resource "  "excludeRoles"=""                                                                                                            
I1008 10:08:32.780043       1 leaderelection.go:243] attempting to acquire leader lease kyverno/webhook-register...                                                                                                 
I1008 10:08:33.271678       1 leaderelection.go:113] webhookRegister/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-6cddd6dbc8-t4vtl_ed1ecdc6-1c09-4ab2-9e65-c53c35033e15"
 "leader"="kyverno-56bbb45b4b-8hwmp_4bd0c4e9-b112-4dbb-805f-5d3953db62c7"                                                                                                                                           
I1008 10:08:56.263093       1 leaderelection.go:253] successfully acquired lease kyverno/webhook-register                                                                                                           
I1008 10:08:56.263581       1 leaderelection.go:94] webhookRegister/LeaderElection "msg"="started leading" "id"="kyverno-6cddd6dbc8-t4vtl_ed1ecdc6-1c09-4ab2-9e65-c53c35033e15"                                     
I1008 10:08:56.398355       1 certRenewer.go:78] CertRenewer/InitTLSPemPair "msg"="using existing TLS key/certificate pair"                                                                                         
I1008 10:09:08.668370       1 certmanager.go:108] CertManager "msg"="read TLS pem pair from the secret"                                                                                                             
I1008 10:09:08.673908       1 leaderelection.go:243] attempting to acquire leader lease kyverno/kyverno...                                                                                                          
I1008 10:09:08.674705       1 reportrequest.go:178] ReportChangeRequestGenerator "msg"="start"                                                                                                                      
I1008 10:09:08.674825       1 controller.go:118] EventGenerator "msg"="start"                                                                                                                                       
I1008 10:09:08.674937       1 informer.go:109] PolicyCacheController "msg"="starting"                                                                                                                               
I1008 10:09:08.967987       1 leaderelection.go:113] kyverno/LeaderElection "msg"="another instance has been elected as leader" "current id"="kyverno-6cddd6dbc8-t4vtl_866f3d36-7b8e-4464-9ce5-7eb431a155c2" "leader
"="kyverno-56bbb45b4b-txh4j_7e203fad-f061-41de-a84b-56f93bbad915"                                                                                                                                                   
I1008 10:09:09.874722       1 server.go:581] WebhookServer "msg"="starting service"                                                                                                                                 
I1008 10:09:10.375871       1 dynamicconfig.go:251] ConfigData "msg"="Updated resource filters" "name"="kyverno" "namespace"="kyverno" "newFilters"=[{"Kind":"Event","Namespace":"*","Name":"*"},{"Kind":"*","Namesp
ace":"kube-system","Name":"*"},{"Kind":"*","Namespace":"kube-public","Name":"*"},{"Kind":"*","Namespace":"kube-node-lease","Name":"*"},{"Kind":"Node","Namespace":"*","Name":"*"},{"Kind":"APIService","Namespace":"
*","Name":"*"},{"Kind":"TokenReview","Namespace":"*","Name":"*"},{"Kind":"SubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"SelfSubjectAccessReview","Namespace":"*","Name":"*"},{"Kind":"*","Namespace":"ky
verno","Name":"*"},{"Kind":"Binding","Namespace":"*","Name":"*"},{"Kind":"ReplicaSet","Namespace":"*","Name":"*"},{"Kind":"ReportChangeRequest","Namespace":"*","Name":"*"},{"Kind":"ClusterReportChangeRequest","Na
mespace":"*","Name":"*"}] "oldFilters"=null                                                                                                                                                                         
I1008 10:09:23.570264       1 registration.go:607] Register "msg"="Endpoint ready"  "name"="kyverno-svc" "ns"="kyverno"                                                                                             
I1008 10:09:24.368088       1 registration.go:364] Register "msg"="created webhook"  "kind"="MutatingWebhookConfiguration" "name"="kyverno-verify-mutating-webhook-cfg"                                             
I1008 10:09:24.568140       1 registration.go:319] Register "msg"="created webhook"  "kind"="ValidatingWebhookConfiguration" "name"="kyverno-policy-validating-webhook-cfg"                                         
I1008 10:09:24.749578       1 registration.go:342] Register "msg"="created webhook"  "kind"="MutatingWebhookConfiguration" "name"="kyverno-policy-mutating-webhook-cfg"                                             
I1008 10:09:25.168188       1 registration.go:296] Register "msg"="created webhook" "kind"="ValidatingWebhookConfiguration" "name"="kyverno-resource-validating-webhook-cfg"                                        
I1008 10:09:25.368005       1 registration.go:270] Register "msg"="created webhook" "kind"="MutatingWebhookConfiguration" "name"="kyverno-resource-mutating-webhook-cfg"                                            
I1008 10:09:25.368267       1 registration.go:162] Register/UpdateWebhookConfigurations "msg"="received the signal to update webhook configurations"                                                                
E1008 10:09:25.368428       1 registration.go:181] Register/UpdateWebhookConfigurations "msg"="unable to update mutatingWebhookConfigurations" "error"="unable to get mutatingWebhookConfigurations: mutatingwebhook
configurations.admissionregistration.k8s.io \"kyverno-resource-mutating-webhook-cfg\" not found"  "name"="kyverno-resource-mutating-webhook-cfg"                                                                    
I1008 10:09:25.568934       1 registration.go:191] Register/UpdateWebhookConfigurations "msg"="successfully updated validatingWebhookConfigurations"  "name"="kyverno-resource-validating-webhook-cfg"              
I1008 10:09:25.569183       1 registration.go:162] Register/UpdateWebhookConfigurations "msg"="received the signal to update webhook configurations"  
I1008 10:09:25.898400       1 registration.go:184] Register/UpdateWebhookConfigurations "msg"="successfully updated mutatingWebhookConfigurations"  "name"="kyverno-resource-mutating-webhook-cfg"
I1008 10:09:26.168905       1 registration.go:191] Register/UpdateWebhookConfigurations "msg"="successfully updated validatingWebhookConfigurations"  "name"="kyverno-resource-validating-webhook-cfg"
I1008 10:09:28.368534       1 leaderelection.go:253] successfully acquired lease kyverno/kyverno
I1008 10:09:28.369935       1 leaderelection.go:94] kyverno/LeaderElection "msg"="started leading" "id"="kyverno-6cddd6dbc8-t4vtl_866f3d36-7b8e-4464-9ce5-7eb431a155c2" 
I1008 10:09:28.370259       1 controller.go:247] GenerateCleanUpController "msg"="starting"  
I1008 10:09:28.467643       1 validate_controller.go:553] PolicyController "msg"="starting"  
I1008 10:09:28.467995       1 reportcontroller.go:195] PolicyReportGenerator "msg"="start"  
I1008 10:09:28.468806       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-m0001" "uid"="6f2e9c8d-4681-4ac8-b3c0-5e0328b0c3bb"                     
I1008 10:09:36.027050       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="disallow-host-namespaces" "uid"="c13bb892-1dfc-4e9a-b276-7b844aa048ba"            
I1008 10:09:37.769703       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-c0005" "uid"="7054243f-4092-406a-9d8d-567cb6e46487"                     
I1008 10:09:39.874510       1 trace.go:205] Trace[340007387]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.21.3/tools/cache/reflector.go:167 (08-Oct-2021 10:09:08.768) (total time: 31105ms):
Trace[340007387]: ---"Objects listed" 31104ms (10:09:00.873)
Trace[340007387]: [31.105488292s] [31.105488292s] END 
I1008 10:09:40.068197       1 certmanager.go:129] CertManager "msg"="start managing certificate"  
I1008 10:09:40.882178       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="portefaix-p0004" "uid"="33550802-d4eb-4b05-b5dc-7834211cad26"
I1008 10:09:43.470697       1 validate_controller.go:262] PolicyController "msg"="policy created"  "kind"="ClusterPolicy" "name"="restrict-apparmor-profiles" "uid"="da392e66-13c8-44fc-94d8-b7b90bb9891d"
E1008 10:09:44.871737       1 runtime.go:78] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(0x14d6ac0), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x144e900), missingMethod:""} (interface conversion: interface {} is nil, not string)
goroutine 5676 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x153e4c0, 0x4007508ed0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:74 +0x84
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:48 +0x80
panic(0x153e4c0, 0x4007508ed0)
        /usr/local/go/src/runtime/panic.go:965 +0x154 
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0x40073f5570, 0x1, 0x1, 0x40005b3a28)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x768
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0x40072c7f20, 0x40072c7f20, 0x0, 0x0, 0x40073ce2d0, 0x4006458ab8, 0x4003e66e90, 0x12b71c8)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x1a8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0x400031c1e0, 0x40072a5950, 0x8, 0x4003e66e90, 0x13f6140, 0x40073f2b10, 0x0, 0x0, 0x400091eb40, 0x144e900)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0x84
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0x400031c1e0, 0x40072a5950, 0x8, 0xaaf91b1d08bb4600, 0x400118d618, 0x1e0d8, 0x400118d678)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x1e8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0x400031c1e0, 0x1c400)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1bc
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40044cafa0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x64
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x40044cafa0, 0x1faf098, 0x40044ce510, 0x1, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x74
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x40044cafa0, 0x3b9aca00, 0x0, 0x40004f6201, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x88
k8s.io/apimachinery/pkg/util/wait.Until(0x40044cafa0, 0x3b9aca00, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x48
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x4c8
panic: interface conversion: interface {} is nil, not string [recovered]
        panic: interface conversion: interface {} is nil, not string
goroutine 5676 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/runtime/runtime.go:55 +0x108
panic(0x153e4c0, 0x4007508ed0)
        /usr/local/go/src/runtime/panic.go:965 +0x154 
github.com/kyverno/kyverno/pkg/policyreport.updateSummary(0x40073f5570, 0x1, 0x1, 0x40005b3a28)
        /kyverno/pkg/policyreport/policyreport.go:184 +0x768
github.com/kyverno/kyverno/pkg/policyreport.updateResults(0x40072c7f20, 0x40072c7f20, 0x0, 0x0, 0x40073ce2d0, 0x4006458ab8, 0x4003e66e90, 0x12b71c8)
        /kyverno/pkg/policyreport/policyreport.go:101 +0x1a8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).createReportIfNotPresent(0x400031c1e0, 0x40072a5950, 0x8, 0x4003e66e90, 0x13f6140, 0x40073f2b10, 0x0, 0x0, 0x400091eb40, 0x144e900)
        /kyverno/pkg/policyreport/reportcontroller.go:314 +0x84
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).syncHandler(0x400031c1e0, 0x40072a5950, 0x8, 0xaaf91b1d08bb4600, 0x400118d618, 0x1e0d8, 0x400118d678)
        /kyverno/pkg/policyreport/reportcontroller.go:294 +0x1e8
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).processNextWorkItem(0x400031c1e0, 0x1c400)
        /kyverno/pkg/policyreport/reportcontroller.go:250 +0x1bc
github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).runWorker(...)
        /kyverno/pkg/policyreport/reportcontroller.go:232
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x40044cafa0)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:155 +0x64
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x40044cafa0, 0x1faf098, 0x40044ce510, 0x1, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:156 +0x74
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x40044cafa0, 0x3b9aca00, 0x0, 0x40004f6201, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:133 +0x88
k8s.io/apimachinery/pkg/util/wait.Until(0x40044cafa0, 0x3b9aca00, 0x40000bc600)
        /go/pkg/mod/k8s.io/apimachinery@v0.21.3/pkg/util/wait/wait.go:90 +0x48
created by github.com/kyverno/kyverno/pkg/policyreport.(*ReportGenerator).Run
        /kyverno/pkg/policyreport/reportcontroller.go:225 +0x4c8

@nlamirault - this issue was fixed in 1.5.2 via #2757.
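
For anyone hitting this panic, a minimal upgrade sketch, assuming the chart is installed from the official Helm repo (added as "kyverno") with release name "kyverno" in the kyverno namespace; adjust names to your setup:

helm repo update
# list chart versions and pick one that ships app version v1.5.2 or later
helm search repo kyverno/kyverno --versions
helm upgrade kyverno kyverno/kyverno -n kyverno --version <chart-version>
# confirm the running image tag
kubectl -n kyverno get deploy kyverno -o jsonpath='{.spec.template.spec.containers[0].image}'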

@realshuting
Copy link
Member

realshuting commented Dec 30, 2021

Hi @awoodsprim - I did a few tests and was able to reproduce the same issue using the Kind cluster. The issue is related to how the node shuts down. Here are the tests I did:

Test 1: running 1 Kyverno instance, remove a k8s node with "kubectl delete node <node-name>"

With this test, the old pod was terminated and the new pod was rescheduled to kind-worker5 by k8s. Kyverno was functioning after it came up.
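
A rough sketch of the commands behind this test (node names are assumptions taken from the output below; adjust to your cluster):

kubectl get nodes
kubectl delete node kind-worker4          # node hosting the Kyverno pod
kubectl -n kyverno get pod -o wide -w     # watch the pod get rescheduled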

I1230 11:07:30.160148       1 client.go:243] dclient/Poll "msg"="stopping registered resources sync"
I1230 11:07:30.160402       1 certmanager.go:175] CertManager "msg"="stopping cert renewer"
I1230 11:07:30.160415       1 informer.go:118] PolicyCacheController "msg"="shutting down"
I1230 11:07:30.160426       1 reportrequest.go:192] ReportChangeRequestGenerator "msg"="shutting down"
I1230 11:07:30.160440       1 controller.go:129] EventGenerator "msg"="shutting down"
I1230 11:07:30.160643       1 generate_controller.go:150] GenerateController "msg"="shutting down"
I1230 11:07:30.160664       1 reportcontroller.go:267] PolicyReportGenerator "msg"="shutting down"
I1230 11:07:30.160756       1 policy_controller.go:460] PolicyController "msg"="shutting down"
I1230 11:07:30.160799       1 controller.go:274] GenerateCleanUpController "msg"="shutting down"
I1230 11:07:30.202363       1 leaderelection.go:103] kyverno/LeaderElection "msg"="leadership lost, stopped leading" "id"="kyverno-7bc64dc96b-vvczf_9c951dae-5fd4-475f-a4a1-9250b8794506"
I1230 11:07:30.206415       1 leaderelection.go:103] webhookRegister/LeaderElection "msg"="leadership lost, stopped leading" "id"="kyverno-7bc64dc96b-vvczf_5283dc55-549a-4057-872e-2a05f9003f3b"
I1230 11:07:30.212881       1 registration.go:269] Register/cleanupKyvernoResource "msg"="updating Kyverno Pod, won't clean up Kyverno resources"
I1230 11:07:30.212993       1 main.go:536] setup "msg"="Kyverno shutdown successful"
rpc error: code = NotFound desc = an error occurred when try to find container "283924b9bf5e2eaeeaf273ba900f88f8cb7d6e05e5ee55b1a5d0a17b993f2100": not found%
 ✗ kg -n kyverno pod -o wide -w
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
kyverno-7bc64dc96b-vvczf   1/1     Running   0          2m56s   10.244.2.2   kind-worker4   <none>           <none>
kyverno-7bc64dc96b-vvczf   1/1     Terminating   0          4m1s    10.244.2.2   kind-worker4   <none>           <none>
kyverno-7bc64dc96b-vvczf   1/1     Terminating   0          4m1s    10.244.2.2   kind-worker4   <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     Pending       0          0s      <none>       <none>         <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     Pending       0          0s      <none>       kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     Init:0/1      0          0s      <none>       kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     Init:0/1      0          15s     10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     PodInitializing   0          33s     10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   0/1     Running           0          49s     10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   1/1     Running           0          60s     10.244.5.2   kind-worker5   <none>           <none>

Test 2: running 1 Kyverno instance, remove a k8s node with "docker kill <node-container>" (a Kind node is a container)

There was no log after the container was killed since docker kill kills the container immediately.
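
For reference, a sketch of how this test can be driven, assuming a default Kind cluster where each node container is named after the node:

docker ps --filter name=kind-worker       # list Kind node containers
docker kill kind-worker5                  # kill the node hosting the Kyverno pod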

After a few seconds the node became NotReady and the pod remained in Running status. This is because the kubelet on that node also died after docker kill, so it could no longer update the pod status. Any operation on matching resources (resources matched by configured policies) was rejected; see the "connection refused" error message below. The only way to bring up a new Kyverno instance is to delete the mutatingwebhookconfigurations and validatingwebhookconfigurations so that requests can pass through.
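
A sketch of that cleanup, using the webhook configuration names from the registration logs above (verify the exact names in your cluster first):

kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
kubectl delete mutatingwebhookconfiguration kyverno-verify-mutating-webhook-cfg kyverno-policy-mutating-webhook-cfg kyverno-resource-mutating-webhook-cfg
kubectl delete validatingwebhookconfiguration kyverno-policy-validating-webhook-cfg kyverno-resource-validating-webhook-cfg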

Events:
  Type     Reason        Age    From               Message
  ----     ------        ----   ----               -------
  Normal   Scheduled     4m5s   default-scheduler  Successfully assigned kyverno/kyverno-7bc64dc96b-btxmg to kind-worker5
  Normal   Pulling       4m5s   kubelet            Pulling image "ghcr.io/kyverno/kyvernopre:v1.5.2"
  Normal   Pulled        3m51s  kubelet            Successfully pulled image "ghcr.io/kyverno/kyvernopre:v1.5.2" in 13.650407899s
  Normal   Created       3m51s  kubelet            Created container kyverno-pre
  Normal   Started       3m51s  kubelet            Started container kyverno-pre
  Normal   Pulling       3m32s  kubelet            Pulling image "ghcr.io/kyverno/kyverno:v1.5.2"
  Normal   Pulled        3m17s  kubelet            Successfully pulled image "ghcr.io/kyverno/kyverno:v1.5.2" in 15.096770042s
  Normal   Created       3m17s  kubelet            Created container kyverno
  Normal   Started       3m17s  kubelet            Started container kyverno
  Warning  NodeNotReady  10s    node-controller    Node is not ready
✗ k run nginx --image=nginx:latest --image-pull-policy=IfNotPresent
Error from server (InternalError): Internal error occurred: failed calling webhook "mutate.kyverno.svc-fail": failed to call webhook: Post "https://kyverno-svc.kyverno.svc:443/mutate?timeout=10s": dial tcp 10.96.242.150:443: connect: connection refused

After I removed all webhook configurations, the pod got rescheduled but went into CrashLoopBackOff due to "Timeout registering admission control webhooks". In this case, the Pod's IP was never registered in the endpoint (not sure why).

The endpoint check was added to solve #1740.
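
To see what that check looks at, you can inspect the service endpoints directly; when no ready pod is behind the service, the endpoints object has no addresses:

kubectl -n kyverno get endpoints kyverno-svc -o wide
kubectl -n kyverno describe endpoints kyverno-svc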

✗ kg -n kyverno pod -o wide -w
NAME                       READY   STATUS    RESTARTS   AGE     IP           NODE           NOMINATED NODE   READINESS GATES
kyverno-7bc64dc96b-btxmg   1/1     Running   0          3m18s   10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   1/1     Running   0          3m55s   10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-btxmg   1/1     Terminating   0          6m8s    10.244.5.2   kind-worker5   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Pending       0          0s      <none>       <none>         <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Pending       0          0s      <none>       kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Init:0/1      0          0s      <none>       kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Init:0/1      0          12s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     PodInitializing   0          30s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Running           0          48s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   1/1     Running           0          60s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Error             0          94s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Running           1 (2s ago)   95s     10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   1/1     Running           1 (7s ago)   100s    10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Running           2 (0s ago)   2m1s    10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   1/1     Running           2 (9s ago)   2m10s   10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     Error             2 (31s ago)   2m32s   10.244.1.2   kind-worker3   <none>           <none>
kyverno-7bc64dc96b-zdh6h   0/1     CrashLoopBackOff   2 (9s ago)    2m40s   10.244.1.2   kind-worker3   <none>           <none>

The workaround is to run Kyverno with multiple replicas. After I scaled Kyverno up to 2 replicas, the issue was gone for test 2 (killing the node where the Kyverno leader or a non-leader pod was scheduled) and the cluster stayed healthy. Can you try the same and verify whether it solves your issue?
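
A sketch of scaling up, either directly or through the chart; the Helm value name (replicaCount) is an assumption here, so check it against your chart version's values:

kubectl -n kyverno scale deployment kyverno --replicas=2
# or, via Helm
helm upgrade kyverno kyverno/kyverno -n kyverno --reuse-values --set replicaCount=2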

@awoodobvio
Copy link

awoodobvio commented Jan 2, 2022 via email

@realshuting
Copy link
Member

Unfortunately, in my case multiple replicas don't help. All nodes are shut down by GKE simultaneously, which results in the same behavior. (No replicas are left standing.)

Again, the issue relates to the shutdown process. Was Kyverno scaled down to zero before the shutdown?

If Kyverno shuts down properly, it automatically cleans up all the webhook configurations it manages and won't impact the cluster.
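
If the platform allows a pre-shutdown step, scaling to zero first gives Kyverno a chance to remove its webhook configurations gracefully; a sketch (label selector is an assumption and may differ by chart version):

# before the nodes shut down
kubectl -n kyverno scale deployment kyverno --replicas=0
kubectl -n kyverno wait --for=delete pod -l app=kyverno --timeout=120s
# after the cluster comes back
kubectl -n kyverno scale deployment kyverno --replicas=2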

@awoodobvio
Copy link

awoodobvio commented Jan 3, 2022 via email

@realshuting
Copy link
Member

In my test 2, the container was deleted immediately, so the webhook configurations were not garbage collected by the Kyverno instance, which then blocked all subsequent admission requests.

Scaling down to 0 and then back to any number of replicas does not fix this situation after this happens - the pods come back up and immediately start to fail again.

I'm asking if it's possible to scale Kyverno down to 0 before nodes shut down.

The only thing that seems to fix it is removing the pods that are still present in the evicted status (Failed, Shutdown). Once that happens, Kyverno recovers.

If there's no way to terminate Kyverno gracefully, the alternative is to delete the existing Kyverno pods when the cluster restarts, as you mentioned above.
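
A sketch of that cleanup step, removing pods left in a terminal phase so the ReplicaSet can create fresh ones:

kubectl -n kyverno get pod
# delete pods stuck in the Failed phase (e.g. Shutdown / Evicted)
kubectl -n kyverno delete pod --field-selector=status.phase=Failed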

@awoodobvio
Copy link

awoodobvio commented Jan 3, 2022 via email

@realshuting
Copy link
Member

are you just verifying that at least one of the endpoints is pointing to the first pod from the listed pods?

Yes, the endpoint check was added to solve #1740.

I see there's an improvement that can be made when getting the pod's IP: I'll send a PR that checks the pod's status when collecting the IPs. Thanks for pointing this out.
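
Until that fix lands, a client-side check (not the in-code fix itself) of which Kyverno pods are actually Running and what IPs they report:

kubectl -n kyverno get pod -o jsonpath='{range .items[?(@.status.phase=="Running")]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}'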

@awoodobvio
Copy link

@realshuting Sounds great, happy we were able to track it down. Hopefully that additional check (along with us cleaning up shutdown pods) will fix our issue.

@realshuting realshuting mentioned this issue Jan 4, 2022
@realshuting
Copy link
Member

Hi @awoodsprim - I sent the improvement via #2902; here's a test image you can use to verify: ghcr.io/realshuting/kyverno:fix-endpoint. I also verified with test 2 and Kyverno works properly.

Can you test the fix and let me know if it works? If not, can you attach the kubelet's log?
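
For anyone else verifying, a sketch of swapping in the test image through the chart; the value names (image.repository / image.tag) are assumptions, so check them against your chart version:

helm upgrade kyverno kyverno/kyverno -n kyverno --reuse-values \
  --set image.repository=ghcr.io/realshuting/kyverno \
  --set image.tag=fix-endpoint
kubectl -n kyverno rollout status deployment kyverno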

@awoodobvio
Copy link

Hi @realshuting, we are going to run the Docker image in our dev cluster. It can take up to 24 hours to hit the race condition, but we will let you know by tomorrow if it happens again.

@awoodobvio
Copy link

@realshuting Preliminary analysis suggests it solved the problem. Earlier we saw the Kyverno pod start up with a shut-down pod still in the pod list returned by the API, and it was able to recover (once the leadership lease expired).

We want it to go through one more cycle to be sure, but right now it looks good.

@realshuting
Copy link
Member

Great! The fix will be out in 1.5.3 in the next few days. Let me know if you see any other issues.

@realshuting
Copy link
Member

Closing the issue; feel free to reopen if needed.

@roldyxoriginal
Copy link

Hi Team, I was using Kyverno 1.7.3 and upgraded to 1.10.3. I removed absolutely everything and installed it from scratch, and I still get the same error as the subject of this thread. I don't know how to fix it, and I don't know why, but some policies were installed.

@chipzoller
Copy link
Contributor

Please open a new issue or discussion.

@soni-kanishk
Copy link

@chipzoller observing the same issue after upgrading from 1.7.5 to 1.10.5.

@roldyxoriginal can you please share the issue link if you got a chance to open a new issue?
