Handler does not fire on an Azure cluster #1042
Comments
That might be caused by a known issue with no known solution yet: Kubernetes sometimes “loses” the connection without closing it. Since the connection is technically still open, Kopf does not reconnect and believes that nothing is happening in the cluster. If you search through the issues, Azure is mentioned several times as especially affected by this. My guess is that the problem lies in the load balancers and their connections to the real control plane (in the chain kopf -> lb -> k8s), though this is unconfirmed. Often, setting the client-side (i.e. Kopf-side) connection timeout helps (see the settings). It is not the best solution, but it works: the operator might not notice changes for the configured timeout (e.g. 10 minutes), or it will have to reconnect too often (if you set it to 1 minute). The “good” value depends on your individual case; there is no good default. I see no way to fix this on the Kopf side, unless there is some kind of ping-pong machinery in k8s above low-level TCP.
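A minimal sketch of that timeout workaround, assuming the standard Kopf settings API (the concrete values are illustrative, not recommended defaults):

```python
import kopf

# A sketch of the workaround described above: force the watch-streams to be
# re-established periodically, so that a silently dropped connection
# (e.g. behind a load balancer) is noticed within this window at the latest.
@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    settings.watching.server_timeout = 600   # server-side watch timeout, seconds
    settings.watching.client_timeout = 610   # overall client-side timeout, seconds
    settings.watching.connect_timeout = 60   # time to establish a connection, seconds
```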
Hi, thanks for the quick response. I'm not quite sure how I'm going to proceed yet, but I'll try to look into something more related to Azure and analyze the traffic further. I will also try changing the connection timeout. Thanks.
We are facing a similar problem with Azure too. I was thinking about configuring a liveness probe that would send a request to the k8s API to keep the connection alive, in case the LB kills connections after some time when they are not used. @nolar, what do you think?
@francescotimperi The key problem is the existing connection, not a new one. The probe would show success, since new TCP connections land fine. It is the existing connection that remains connected but dysfunctional. You would need a response from k8s on that connection to validate it: this is what I meant by ping-ponging. But I have seen no such feature in k8s.
Long story short
I have a handler that listens for events from deployments that contain a certain annotation I declared.
The events are: on creation and on update.
The logic is that when the hash of the application image changes during a deploy, the operator detects it, triggers the handler, and executes the programmed tasks, which are the deletion and re-creation of a secret (see the sketch below).
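A rough sketch of such a handler, assuming Kopf's annotation filters; the annotation name is a hypothetical placeholder, and the actual secret deletion/re-creation is omitted:

```python
import kopf

# Fires on creation and update of deployments carrying the (hypothetical)
# annotation; stacked decorators register the same function for both causes.
@kopf.on.create('apps', 'v1', 'deployments',
                annotations={'example.com/rotate-secret': kopf.PRESENT})
@kopf.on.update('apps', 'v1', 'deployments',
                annotations={'example.com/rotate-secret': kopf.PRESENT})
def on_deployment_change(spec, name, namespace, logger, **kwargs):
    # The image (and hence its hash/tag) of the first container; a change
    # here during a deploy is what should trigger the secret rotation.
    image = spec['template']['spec']['containers'][0]['image']
    logger.info(f"Deployment {namespace}/{name} now uses image {image}")
    # ...delete and re-create the secret here (e.g. via the kubernetes client)...
```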
When the deployment that contains the annotation I defined gets a new image, nothing happens. There are no logs in the operator pod, and I don't know how to identify why the handler doesn't fire.
However, if I delete the operator pod, then on startup it detects the change in the deployment and triggers the handler, which is what I expect it to do.
The weird thing is that the problem happens on an Azure cluster, while on some GCP clusters it works successfully.
Is there anything I can analyze to debug this problem? I don't know where to start.
Below is my RBAC file, in case it could be something related to permissions.
Kopf version
1.36.1
Kubernetes version
1.24.10
Python version
3.10.6
Code
Logs
Additional information
No response