
Handler does not fire on an Azure cluster #1042

Open
aristidesneto opened this issue Jul 24, 2023 · 4 comments

Labels
bug Something isn't working

Comments

@aristidesneto

Long story short

I have a handler that listens for events on deployments that carry a specific annotation.

The events are creation and update.

The logic is: when the hash of the application image changes during a deploy, the operator detects it, triggers the handler, and runs the programmed tasks, which are the deletion and re-creation of a secret.

However, when the image of a deployment that carries my annotation is changed, nothing happens. There are no logs on the operator pod, and I don't know how to identify why the handler does not fire.

In this case, if I delete the operator pod, it detects the change in the deployment on startup and triggers the handler, which is what I expect it to do.

The weird part is that the problem only happens on an Azure cluster. On some GCP clusters it works successfully.

Is there anything I can analyze or debug to track this problem down? I don't know where to start.

Below is my RBAC file, in case it is something related to permissions.

Kopf version

1.36.1

Kubernetes version

1.24.10

Python version

3.10.6

Code

# main.py
import kopf
import ... # and others

@kopf.on.field(
    'deployments',
    field='spec.template.spec.containers',
    annotations={'my-custom-annotation': 'true'},
    old=kopf.PRESENT,
    new=kopf.PRESENT
)
def image_changed(name, spec, old, new, logger, namespace, annotations, **kwargs):
    # My code here
    # ...
    pass

----------------------
# my-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-dp
  namespace: default
  annotations:
    my-custom-annotation: "true"
...
----------------------
# RBAC File YAML
---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: operator
  name: my-operator-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-operator-role-cluster
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/status"]
  verbs: [get, list, watch, patch]
- apiGroups: ["*"]
  resources: ["secrets"]
  verbs: [get, list, create, delete]
- apiGroups: [""]
  resources: [events]
  verbs: [create]
- apiGroups: [""]
  resources: [namespaces]
  verbs: [list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-operator-cluster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: my-operator-role-cluster
subjects:
- kind: ServiceAccount
  name: my-operator-sa
  namespace: operator

Logs

/usr/local/lib/python3.11/site-packages/kopf/_core/reactor/running.py:176: FutureWarning: Absence of either namespaces or cluster-wide flag will become an error soon. For now, switching to the cluster-wide mode for backward compatibility.
  warnings.warn("Absence of either namespaces or cluster-wide flag will become an error soon."
[2023-07-24 16:40:52,753] kopf._core.engines.a [INFO    ] Initial authentication has been initiated.
[2023-07-24 16:40:52,756] kopf.activities.auth [INFO    ] Activity 'login_via_client' succeeded.
[2023-07-24 16:40:52,756] kopf._core.engines.a [INFO    ] Initial authentication has finished.
[2023-07-24 16:40:53,264] kopf._core.reactor.o [WARNING ] Not enough permissions to watch for resources: changes (creation/deletion/updates) will not be noticed; the resources are only refreshed on operator restarts.
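
To cross-check the permission warning above, a SelfSubjectAccessReview can be run from inside the operator pod; a hypothetical snippet with the official kubernetes Python client would look roughly like this:

# Hypothetical check: can the operator's service account watch deployments?
from kubernetes import client, config

config.load_incluster_config()  # or config.load_kube_config() outside the cluster

review = client.V1SelfSubjectAccessReview(
    spec=client.V1SelfSubjectAccessReviewSpec(
        resource_attributes=client.V1ResourceAttributes(
            group="apps",
            resource="deployments",
            verb="watch",
        )
    )
)
result = client.AuthorizationV1Api().create_self_subject_access_review(review)
print("watch deployments allowed:", result.status.allowed)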

Additional information

No response

aristidesneto added the bug label on Jul 24, 2023
@nolar
Owner

nolar commented Jul 24, 2023

That might be caused by a known issue with a yet unknown solution: Kubernetes sometimes “loses” a connection without closing it. Since the connection is technically still open, Kopf does not reconnect and believes that nothing is happening in the cluster.

If you search through the issues, Azure is mentioned several times as especially affected by this. My guess is that the problem is in the load balancers and their connections to the real control plane (in the chain: kopf->lb->k8s). Unconfirmed, though.

Often, setting the client-side (i.e. Kopf-side) connection timeout helps (see the settings). Not the best solution, but it works: the operator might not notice changes for up to the configured timeout (e.g. 10 minutes), or will have to reconnect too often (if you set it to 1 minute). The “good” value depends on your individual case; there is no good default.
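
For example, something along these lines (an untested sketch; the setting names are from Kopf's watching configuration options, and the exact values need tuning for your cluster):

# Rough sketch of the client-side timeout idea.
import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Ask the API server to close each watch stream after this many seconds,
    # so a silently dead connection is re-established at most this late.
    settings.watching.server_timeout = 600  # e.g. 10 minutes
    # Client-side cap on the whole watch request; keep it above the server timeout.
    settings.watching.client_timeout = 610
    # Fail fast if a new connection cannot even be established.
    settings.watching.connect_timeout = 5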

I see no way to fix this on the Kopf side unless there is some kind of ping-pong machinery in k8s above low-level TCP.

@aristidesneto
Author

Hi, thanks for the quick response.

I'm not quite sure how I'm going to proceed yet, but I'll look more into the Azure side and analyze the traffic further.

I will also validate changes to the connection timeout.

Thanks

@francescotimperi

We are facing a similar problem with Azure too. I was thinking about configuring a liveness probe that somehow makes a request to the k8s API, to keep the connection alive in case the LB kills connections that stay unused for some time. @nolar, what do you think?
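
Something along these lines, for example (a rough, untested sketch using Kopf's built-in probe handlers, run with kopf run --liveness=http://0.0.0.0:8080/healthz ...):

# Sketch of the liveness-probe idea.
import datetime
import kopf

@kopf.on.probe(id='now')
def get_current_timestamp(**kwargs):
    # The returned value is exposed on the /healthz endpoint; the idea would be to
    # additionally make a lightweight request to the k8s API here.
    return datetime.datetime.now(datetime.timezone.utc).isoformat()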

@nolar
Owner

nolar commented Jul 25, 2023

@francescotimperi The key problem is the existing connection, not a new one. The probe would show success, since new TCP connections are established just fine; it is the existing connection that remains open but dysfunctional. You need a response from k8s on that connection to validate it; this is what I meant by ping-ponging. But I saw no such feature in k8s.
