
serviceAnnounced for no reason and excessive ServiceReconciler logs after update to 0.13.7 from 0.12.1 #1770

Closed
pupseba opened this issue Jan 10, 2023 · 2 comments · Fixed by #1791

Comments

pupseba commented Jan 10, 2023

After updating MetalLB in a few Kubernetes clusters (v1.23.13 with Calico CNI and kube-proxy using iptables), moving from 0.12.1 to 0.13.7, we started getting these messages for no apparent reason. We only use L2 mode:

{"caller":"main.go:344","event":"serviceAnnounced","ips":["10.147.53.6"],"level":"info","msg":"service has IP, announcing","pool":"kafka-platform","protocol":"layer2","ts":"2023-01-10T13:41:25Z"}

They come in bursts, where all the IPs associated with one particular Kubernetes node get announced. They are not moved from one server to another; they simply get (re)announced by the same node that was already announcing them.

There is no restart, no configuration change, nothing we can see that explains why the IPs are announced out of the blue. Yet this is what shows up in the logs (l2advertisement as an example; other CRs are also observed logging before the forced SyncStateReprocessAll that calls ForceReload):

{"caller":"config_controller.go:139","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2023-01-17T12:17:38Z"}
{"caller":"config_controller.go:51","controller":"ConfigReconciler","level":"info","start reconcile":"metallb-system/l2advertisement1","ts":"2023-01-17T12:17:38Z"}
{"caller":"config_controller.go:151","controller":"ConfigReconciler","end reconcile":"metallb-system/l2advertisement1","level":"info","ts":"2023-01-17T12:17:38Z"}

From one day to the next, during which these sorts of events are seen in the logs, we are unable to see any change in the "under watch" resources (addresspools, bfdprofiles, bgpadvertisements, bgppeers, communities, ipaddresspools, l2advertisements). We did this comparison by running "get $i -oyaml" on different days and comparing the results, which include the .metadata.resourceVersion.
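
For reference, the comparison was done roughly like this (namespace and file names are just illustrative of our setup):

    for i in addresspools bfdprofiles bgpadvertisements bgppeers communities ipaddresspools l2advertisements; do
      kubectl -n metallb-system get $i -o yaml > "day1-$i.yaml"
    done
    # run again on a later day into day2-*.yaml, then diff the pairs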

It is hard to send full logs since, after the update, the amount of logging (info level) is scary. You can easily see in this graph from the last 7 days when MetalLB was updated, just from the number of log entries registered: https://imgur.com/a/6Iqle6R

Most of those log entries are from "ServiceReconciler", happening over and over again. During the serviceAnnounced events I can also see events of type "config reloaded" and "force service reload".

fedepaol (Member) commented

I think I know what this is, or at least I suspect I do.
The current behavior is like this: we have a controller listening to all the events contributing to MetalLB's configuration.
This includes the secrets in MetalLB's namespace.
Whenever there's a configuration change, we reprocess all the services because some of them may be affected (think about new l2advertisements, for example).

Now, even if you didn't change anything, the cert controller that we use to rotate the CA for the webhooks might be causing this.

We can certainly reduce it by adding a caching layer, skipping secrets we don't care about, and so on.
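
In case it helps verify this on your side, watching the secrets in MetalLB's namespace should show whether any of them gets updated around the time of the reloads; something along these lines (namespace and output columns are just an example):

    kubectl -n metallb-system get secrets --watch \
      -o custom-columns=NAME:.metadata.name,RESOURCEVERSION:.metadata.resourceVersion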


pupseba commented Jan 19, 2023

Hi!

For testing purposes, the validatingwebhookconfiguration for MetalLB was deleted and the controller is running with these args:

  spec:
    containers:
      - args:
          - '--port=7472'
          - '--log-level=info'
          - '--cert-service-name=metallb-webhook-service'
          - '--disable-cert-rotation=true'
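
For completeness, the webhook was removed with something along these lines (the exact object name may differ depending on how MetalLB was installed):

    kubectl delete validatingwebhookconfiguration metallb-webhook-configuration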

Sadly, after 18 hours with this test configuration in place, we still see "serviceAnnounced" events around "force service reload" events logged by the "config_controller".

Hope this info helps.

Regards,
Seba
