TLS handshake error from xx: read tcp xx -> xx: read: connection reset by peer #718
Comments
Confirming the issue. From initial testing, the cert seems fine.
|
We're seeing this too, we recently upgraded from |
In my case I see invalid certificate verification:
This is on EKS 1.25 and karpenter v0.27.5 UPDATE: Regarding the
in which case the cert is expected to be a self-signed CA, which is true for |
Can you confirm you successfully uninstalled the webhook
Are there any relevant logs that suggest the certificate is failing to rotate? |
Yes we've uninstalled
Not that I have found yet, but I can dig in further if necessary. I've checked the certificate and it is valid. I wonder if it is related to this? kubernetes/kubernetes#109022 |
Yes, I can see this in my logs:
|
Yep, something is not right here. I've noticed the same. For example, this happened in the case of
notice how the cert has no |
With a fresh install of karpenter v0.29.2, I can see the following errors in the logs
|
Karpenter 0.29 with EKS 1.24 |
any update for this one? |
The current plan of record is to deprecate our webhooks and replace them with CEL. Can you confirm that this is just causing log spam, and no other negative side effects? |
so far yes |
I'm observing the same issue. I do not see these errors (or tell me if they are found somewhere else except karpenter pods)
But, not that often, I do see the error messages reported by others
So far Karpenter is functioning just fine, and I have not noticed any critical issues. |
I am also seeing this issue.
we recently migrated from v0.27 to v0.30. |
EKS: 1.24 |
Indeed, looks like "fresh reinstall" has helped. 2 hours and no error messages so far. |
I also tried deleting the CA cert (aws/karpenter-provider-aws#1398 (comment)) and reinstalling the helm release, but these errors didn't go away for me. I am still seeing them with the same frequency. |
Unrelated to the core of the issue, is it also for everyone else that |
Is this something that is likely to go away with the move to the new beta in v0.32.x? |
I have completely cleaned up karpenter (deleted helm charts, crds and the namespace) and reinstalled. I still see the TLS errors from the webhook. My bigger concern is that the pods fail the health check and restart every few minutes. I am running 0.32.1 |
After upgrade to Don't be afraid, deleting the helm chart does not delete worker nodes =) |
I was using Fargate to run Karpenter, but I gave up on that and created a three-node NodeGroup so that Karpenter has two different nodes to run on and one extra node for updates. That has gotten rid of all of my Karpenter issues for now. |
We should be able to close this now that we have released v0.33. Webhooks are fully dropped so we should stop seeing these TLS handshake errors if you enable Closing this one for now since the newest version of Karpenter won't run into this problem by default. Please feel free to continue discussion on this one if you are running an older version of Karpenter with the webhooks and need more support. |
Version
Karpenter Version: v0.27.0
Kubernetes Version: v1.24.0
Expected Behavior
Karpenter should not produce these TLS handshake errors generated by a bad connection from the control-plane kube-apiserver to the Karpenter defaulting/validating webhooks.
Actual Behavior
Karpenter logs TLS handshake errors on some calls that the control-plane kube-apiserver routes to Karpenter's validating/defaulting webhooks. These TLS errors appear to be related to kubernetes/kubernetes#109022, which states that such handshake errors may be caused by a caching mechanism in the standard library that produces TLS errors on a cert rotation.
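To gauge how often these errors occur and which peers they come from, a rough tally over the Karpenter pod logs can help. The following is a minimal sketch, not part of Karpenter: it assumes the log lines follow the shape shown in the issue title, and the function name `count_handshake_resets` and the sample addresses are made up for illustration.

```python
import re
from collections import Counter

# Matches lines shaped like the one in the issue title:
#   TLS handshake error from <peer>: read tcp <a>-><b>: read: connection reset by peer
HANDSHAKE_ERROR = re.compile(
    r"TLS handshake error from (?P<peer>\S+): .*connection reset by peer"
)


def count_handshake_resets(lines):
    """Return a Counter mapping peer host -> number of reset TLS handshakes."""
    counts = Counter()
    for line in lines:
        match = HANDSHAKE_ERROR.search(line)
        if match:
            # Drop the ephemeral port so repeated connections from
            # the same host are grouped together.
            host = match.group("peer").rsplit(":", 1)[0]
            counts[host] += 1
    return counts


# Hypothetical sample lines; the addresses and ports are invented.
SAMPLE_LOG = [
    "http: TLS handshake error from 10.0.1.5:41234: read tcp 10.0.2.9:8443->10.0.1.5:41234: read: connection reset by peer",
    "http: TLS handshake error from 10.0.1.5:41310: read tcp 10.0.2.9:8443->10.0.1.5:41310: read: connection reset by peer",
    "http: TLS handshake error from 10.0.1.7:50022: read tcp 10.0.2.9:8443->10.0.1.7:50022: read: connection reset by peer",
    "some unrelated controller log line",
]

if __name__ == "__main__":
    for host, n in count_handshake_resets(SAMPLE_LOG).most_common():
        print(f"{host}\t{n}")
```

In practice the input would be the saved output of the Karpenter pods' logs rather than `SAMPLE_LOG`. A steady count from control-plane addresses with no failed admission requests would be consistent with the errors being log noise only.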
Steps to Reproduce the Problem
Unable to reproduce this personally, but users have reported instances of it happening.
Resource Specs and Logs
N/A
Community Note