Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

knative-serving v1.10.6 webhook pods failing due to "server key missing" missing error #15255

Closed
mukulgit123 opened this issue May 27, 2024 · 3 comments · Fixed by #15256 or knative-extensions/net-istio#1325
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)
Milestone

Comments

@mukulgit123
Copy link
Contributor

What version of Knative?

1.10.6

Expected Behavior

Expected webhook-certs secret to get populated with certificate data successfully and webhook pods to start without failing.

Actual Behavior

Webhook-certs do not get populated with certificate data and webhook pods are failing with errors as mentioned in below details.

kubernetes version: v1.26.15-eks
net-certmanager version is 1.10.4 since that apparently is the only compliant version.
knative-serving version is 1.10.6
cert-manager running on the cluster has version v1.13.2

The logs in domainmapping-webhook are reporting below:

domainmapping-webhook-798f675bbb-lb57j {"severity":"warn","timestamp":"2024-05-24T07:15:50.365Z","logger":"domainmapping-webhook","caller":"webhook/webhook.go:176","message":"server key missing","commit":"60ee8de","[knative.dev/pod](http://knative.dev/pod)":"domainmapping-webhook-798f675bbb-lb57j"}    
domainmapping-webhook-798f675bbb-lb57j 2024/05/24 07:15:50 http: TLS handshake error from 10.98.203.196:36666: tls: no certificates configured 
domainmapping-webhook-798f675bbb-lb57j 2024/05/24 07:15:50 http: TLS handshake error from 10.98.203.196:36664: tls: no certificates configured 

Same is the case with webhook logs:

webhook-cc7d45fc9-kfdfz {"severity":"warn","timestamp":"2024-05-24T07:20:50.664Z","logger":"webhook","caller":"webhook/webhook.go:176","message":"server key missing"," 
webhook-cc7d45fc9-kfdfz 2024/05/24 07:20:50 http: TLS handshake error from 10.98.198.110:44652: tls: no certificates configured                       
webhook-cc7d45fc9-kfdfz 2024/05/24 07:20:52 http: TLS handshake error from 10.98.198.110:44660: tls: no certificates configured

Steps to Reproduce the Problem

This issue is only happening in one specific environment.

@mukulgit123 mukulgit123 added the kind/bug Categorizes issue or PR as related to a bug. label May 27, 2024
@mukulgit123
Copy link
Contributor Author

@dprotaso helped with debugging this issue and we tried below:

Enabled debug logging on webhook deployment by following this

We also added below argument to the webhook deployment.

      - args:
        - -v=4

We checked the logs and found below:

I0527 12:39:17.441659       1 merged_client_builder.go:121] Using in-cluster configuration
2024/05/27 12:39:17 Registering 2 clients
2024/05/27 12:39:17 Registering 2 informer factories
2024/05/27 12:39:17 Registering 3 informers
2024/05/27 12:39:17 Registering 3 controllers
{"severity":"debug","timestamp":"2024-05-27T12:39:17.540Z","caller":"logging/config.go:116","message":"Successfully created the logger."}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.540Z","caller":"logging/config.go:117","message":"Logging level set to: debug"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.540Z","logger":"net-istio-webhook","caller":"profiling/server.go:65","message":"Profiling enabled: false","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.545Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:47","message":"Running with Standard leader election","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/metrics_worker.go:76","message":"Flushing the existing exporter before setting up the new exporter.","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/prometheus_exporter.go:52","message":"Created Prometheus exporter with config: &{knative.dev/net-istio net_istio_webhook prometheus 5000000000 <nil>  false 41478 0.0.0.0}. Start the server for Prometheus exporter.","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.549Z","logger":"net-istio-webhook","caller":"metrics/metrics_worker.go:91","message":"Successfully updated the metrics exporter; old config: <nil>; new config &{knative.dev/net-istio net_istio_webhook prometheus 5000000000 <nil>  false 41478 0.0.0.0}","commit":"fcad131-dirty"}
I0527 12:39:17.560657       1 reflector.go:221] Starting reflector *v1.ConfigMap (0s) from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.560673       1 reflector.go:257] Listing and watching *v1.ConfigMap from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
{"severity":"info","timestamp":"2024-05-27T12:39:17.560Z","logger":"net-istio-webhook","caller":"sharedmain/main.go:283","message":"Starting configuration manager...","commit":"fcad131-dirty"}
I0527 12:39:17.660674       1 shared_informer.go:285] caches populated
{"level":"info","ts":1716813557.660793,"logger":"fallback","caller":"injection/injection.go:63","msg":"Starting informers..."}
I0527 12:39:17.660953       1 reflector.go:221] Starting reflector *v1.MutatingWebhookConfiguration (10h0m0s) from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.660965       1 reflector.go:257] Listing and watching *v1.MutatingWebhookConfiguration from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.661140       1 reflector.go:221] Starting reflector *v1.ValidatingWebhookConfiguration (10h0m0s) from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.661148       1 reflector.go:257] Listing and watching *v1.ValidatingWebhookConfiguration from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.661218       1 reflector.go:221] Starting reflector *v1.Secret (10h0m0s) from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
I0527 12:39:17.661226       1 reflector.go:257] Listing and watching *v1.Secret from k8s.io/client-go@v0.26.0/tools/cache/reflector.go:169
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.665Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:420","message":"Adding to queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty","knative.dev/key":"knative-serving/net-istio-webhook-certs"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.838Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:420","message":"Adding to queue config.webhook.istio.networking.internal.knative.dev (depth: 2)","commit":"fcad131-dirty","knative.dev/key":"/config.webhook.istio.networking.internal.knative.dev"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.843Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:420","message":"Adding to queue webhook.istio.networking.internal.knative.dev (depth: 2)","commit":"fcad131-dirty","knative.dev/key":"/webhook.istio.networking.internal.knative.dev"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:220","message":"Informers have been synced, unblocking admission webhooks.","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"sharedmain/main.go:311","message":"Starting controllers...","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.configmapwebhook.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_bf929e35-d818-4a6f-9d9e-573e7924657e\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.ConfigMapWebhook","caller":"controller/controller.go:513","message":"Processing from queue config.webhook.istio.networking.internal.knative.dev (depth: 0)","commit":"fcad131-dirty"}
I0527 12:39:17.861619       1 shared_informer.go:285] caches populated
I0527 12:39:17.861650       1 shared_informer.go:285] caches populated
I0527 12:39:17.861654       1 shared_informer.go:285] caches populated
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.webhookcertificates.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_5fc9e68f-b2d8-4446-9aa8-c157f8ec4f29\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.861Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.WebhookCertificates","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 0)","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook","caller":"leaderelection/context.go:149","message":"net-istio-webhook.defaultingwebhook.00-of-01 will run in leader-elected mode with id \"ip-10-68-203-196_190f0c2f-86ff-41e2-a16e-9ff5d2faeb92\"","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:486","message":"Starting controller and workers","commit":"fcad131-dirty"}
{"severity":"info","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:496","message":"Started workers","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:513","message":"Processing from queue knative-serving/net-istio-webhook-certs (depth: 1)","commit":"fcad131-dirty"}
{"severity":"debug","timestamp":"2024-05-27T12:39:17.862Z","logger":"net-istio-webhook.DefaultingWebhook","caller":"controller/controller.go:513","message":"Processing from queue webhook.istio.networking.internal.knative.dev (depth: 0)","commit":"fcad131-dirty"}
I0527 12:39:17.862139       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01...
I0527 12:39:17.862375       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01...
I0527 12:39:17.862445       1 leaderelection.go:248] attempting to acquire leader lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01...
I0527 12:39:17.938291       1 leaderelection.go:352] lock is held by ip-10-68-193-123_023c8900-8001-4a0e-8071-6244c6313cdd and has not yet expired
I0527 12:39:17.938312       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01
I0527 12:39:17.938404       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:17.938412       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
I0527 12:39:17.938469       1 leaderelection.go:352] lock is held by ip-10-68-201-34_dddd68b1-eccc-47b9-9a6b-9d718a628bfa and has not yet expired
I0527 12:39:17.938479       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01
I0527 12:39:30.132824       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:30.132851       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
I0527 12:39:34.633445       1 leaderelection.go:352] lock is held by ip-10-68-193-123_023c8900-8001-4a0e-8071-6244c6313cdd and has not yet expired
I0527 12:39:34.633471       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.configmapwebhook.00-of-01
I0527 12:39:35.971552       1 leaderelection.go:352] lock is held by ip-10-68-201-34_dddd68b1-eccc-47b9-9a6b-9d718a628bfa and has not yet expired
I0527 12:39:35.971571       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.defaultingwebhook.00-of-01
I0527 12:39:46.912546       1 leaderelection.go:352] lock is held by ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87 and has not yet expired
I0527 12:39:46.912568       1 leaderelection.go:253] failed to acquire lease knative-serving/net-istio-webhook.webhookcertificates.00-of-01
{"severity":"warn","timestamp":"2024-05-27T12:39:48.649Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:176","message":"server key missing","commit":"fcad131-dirty"}
2024/05/27 12:39:48 http: TLS handshake error from 10.68.203.196:46328: tls: no certificates configured
{"severity":"warn","timestamp":"2024-05-27T12:39:48.649Z","logger":"net-istio-webhook","caller":"webhook/webhook.go:176","message":"server key missing","commit":"fcad131-dirty"}
2024/05/27 12:39:48 http: TLS handshake error from 10.68.203.196:46312: tls: no certificates configured

After we figured out there is something off with the lease we ran below command to check for respective lease status during pod restarts:

watch -d -n1 kubectl get leases -n net-istio-webhook.configmapwebhook.00-of-01 -o yaml

This brought us to the conclusion that lease is having an older timestamp is not getting renewed.

apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2023-06-01T14:18:27Z"
  name: net-istio-webhook.webhookcertificates.00-of-01
  namespace: knative-serving
  resourceVersion: "1950716655"
  uid: 26f8b702-2896-4d69-b69c-87c469bc05cf
spec:
  acquireTime: "2024-05-23T11:44:34.479253Z"
  holderIdentity: ip-10-68-201-34_e7a20d66-7954-4017-8283-a7cf3c3ebd87
  leaseDurationSeconds: 60
  leaseTransitions: 2244
  renewTime: "2024-05-23T11:44:44.575179Z"

Dave then pointed out that this is probably a bug with go client and pointed to this fix which has already been introduced in the recent version of knative.

Below workaround was suggested which resolved the issue for us.

  1. Delete the lease which is not getting renewed.
  2. Restart the webhook pods/deployment which is failing

Result:

  1. webhook-certs get successfully populated
  2. MutatingWebhookConfiguration recieves the caBundle for the same
  3. Webhook pods start successfully.

@dprotaso Thanks for the great work and help on this so far!! 🥇 🏆

@dprotaso
Copy link
Member

What I discovered in slack is that the webhook crashes after the logs - thus our liveness probes are failing.

From kubernetes/kubernetes#114872 (comment)

The delay on start protects against a skewed clock component taking over an active lease because it has a wrong view of the current time (or because the active lease component has a wrong view of the current time). This PR appears to drop that protection and make us vulnerable to that again. Is that right?

So essentially we need to adjust our probing to ensure we accomodate that when we have an expired lease we won't have a new leader until the second retry and the lease duration has passed since the elector has started.

Since our default lease duration is 60s that means we'll need to wait at least that long.

@dprotaso
Copy link
Member

/triage accepted

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Issues which should be fixed (post-triage)
Projects
None yet
2 participants