knative-serving v1.10.6 webhook pods failing due to "server key missing" error #15255
Comments
@dprotaso helped debug this issue. We enabled debug logging on the webhook deployment by following this, and also added the argument below to the webhook deployment.
We checked the logs and found the following:
Once we figured out that something was off with the lease, we ran the command below to check the status of the respective lease during pod restarts:
This led us to conclude that the lease has an older timestamp and is not being renewed.
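As a rough illustration of what "an older timestamp that is not being renewed" means, here is a minimal sketch that checks whether a Lease looks stale from its `spec`. It assumes the spec has already been loaded into a dict (for example from the lease's JSON output) and uses the standard `renewTime` and `leaseDurationSeconds` fields of a `coordination.k8s.io/v1` Lease. The function names and the sample values are hypothetical. Note that the real leader elector compares against its own locally observed time rather than the wall clock, so treat this purely as a diagnostic approximation.

```python
from datetime import datetime, timedelta, timezone


def parse_micro_time(value: str) -> datetime:
    """Parse a Kubernetes MicroTime string such as 2024-05-27T10:15:30.000000Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat, so
    # rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lease_is_expired(spec: dict, now: datetime) -> bool:
    """Return True if the lease was last renewed more than its duration ago."""
    renew_time = parse_micro_time(spec["renewTime"])
    duration = timedelta(seconds=spec["leaseDurationSeconds"])
    return now > renew_time + duration


if __name__ == "__main__":
    # Hypothetical values, not taken from the affected cluster.
    sample_spec = {
        "renewTime": "2024-05-27T10:15:30.000000Z",
        "leaseDurationSeconds": 60,
    }
    print(lease_is_expired(sample_spec, datetime.now(timezone.utc)))
```

If this returns `True` across several pod restarts while `renewTime` never moves, that matches the symptom described above.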
Dave then pointed out that this is probably a bug in the Go client and pointed to this fix, which has already been included in recent versions of Knative. He suggested the workaround below, which resolved the issue for us.
Result:
@dprotaso Thanks for the great work and help on this so far!! 🥇 🏆
What I discovered in Slack is that the webhook crashes after the logs, thus our liveness probes are failing. From kubernetes/kubernetes#114872 (comment):
So essentially we need to adjust our probing to accommodate the fact that, when the lease has expired, we won't have a new leader until the second retry, once the lease duration has passed since the elector started. Since our default lease duration is 60s, we'll need to wait at least that long.
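To make the arithmetic concrete, the sketch below estimates how many consecutive probe failures a pod has to tolerate before a new leader can be expected. It is only an approximation under stated assumptions: the wait is modelled as the lease duration plus one retry period, the 10s retry period and 10s probe period are hypothetical values, and `min_failure_threshold` is an illustrative helper, not part of Knative or Kubernetes. The result maps onto the standard probe fields `periodSeconds`, `failureThreshold` and `initialDelaySeconds`.

```python
import math


def min_failure_threshold(lease_duration_s, retry_period_s, probe_period_s,
                          initial_delay_s=0):
    """Smallest failureThreshold that covers the wait for a new leader."""
    # Assumption: a new leader appears no later than one retry period
    # after the expired lease's duration has elapsed.
    wait_s = lease_duration_s + retry_period_s
    # Any initial probe delay already covers part of that wait.
    remaining_s = max(0, wait_s - initial_delay_s)
    return max(1, math.ceil(remaining_s / probe_period_s))


if __name__ == "__main__":
    # 60s is the default lease duration mentioned above; the 10s values
    # for retry period and probe period are assumed for illustration.
    print(min_failure_threshold(60, 10, 10))
    print(min_failure_threshold(60, 10, 10, initial_delay_s=20))
```

With these assumed numbers, a probe running every 10s would need to tolerate 7 failures, or 5 if probing starts after a 20s initial delay.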
/triage accepted
What version of Knative?
Expected Behavior
Expected webhook-certs secret to get populated with certificate data successfully and webhook pods to start without failing.
Actual Behavior
The webhook-certs secret is not populated with certificate data, and the webhook pods fail with the errors detailed below.
kubernetes version: v1.26.15-eks
net-certmanager version is 1.10.4 since that apparently is the only compliant version.
knative-serving version is 1.10.6
cert-manager running on the cluster has version v1.13.2
The domainmapping-webhook logs report the following:
The webhook logs show the same:
Steps to Reproduce the Problem
This issue is only happening in one specific environment.