
serviceAnnounced for no reason and excessive ServiceReconciler logs after update to 0.13.7 from 0.12.1 #1770

Closed
pupseba opened this issue Jan 10, 2023 · 2 comments · Fixed by #1791

Comments

pupseba commented Jan 10, 2023

After updating MetalLB in a few Kubernetes clusters (v1.23.13 with Calico CNI and kube-proxy using iptables), moving from 0.12.1 to 0.13.7, we started getting these messages for no apparent reason. We only use L2 mode:

{"caller":"main.go:344","event":"serviceAnnounced","ips":["10.147.53.6"],"level":"info","msg":"service has IP, announcing","pool":"kafka-platform","protocol":"layer2","ts":"2023-01-10T13:41:25Z"}

They come in bursts, where all the IPs associated with one particular Kubernetes node get announced. They are not moved from one server to another; they simply get (re)announced by the same node that was already announcing them.

There is no restart, no configuration change, nothing we can see that explains why the IPs are announced out of the blue. Yet this is what shows up in the logs (l2advertisement as an example; other CRs are also observed logging before the forced SyncStateReprocessAll that calls ForceReload):

{"caller":"config_controller.go:139","controller":"ConfigReconciler","event":"force service reload","level":"info","ts":"2023-01-17T12:17:38Z"}
{"caller":"config_controller.go:51","controller":"ConfigReconciler","level":"info","start reconcile":"metallb-system/l2advertisement1","ts":"2023-01-17T12:17:38Z"}
{"caller":"config_controller.go:151","controller":"ConfigReconciler","end reconcile":"metallb-system/l2advertisement1","level":"info","ts":"2023-01-17T12:17:38Z"}

From one day to the next, during which these sorts of events are seen in the logs, we are unable to see any change in the "under watch" resources (addresspools, bfdprofiles, bgpadvertisements, bgppeers, communities, ipaddresspools, l2advertisements). We did this comparison by running "get $i -oyaml" on different days and comparing the results, which include the .metadata.resourceVersion.
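
For reference, the comparison was done roughly like this (namespace and file names are just illustrative of our setup):

    for i in addresspools bfdprofiles bgpadvertisements bgppeers communities ipaddresspools l2advertisements; do
      kubectl -n metallb-system get $i -o yaml > "day1-$i.yaml"
    done
    # run again on a later day into day2-*.yaml, then diff the pairs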

It is hard to send full logs since, after the update, the amount of logging (info level) is scary. You can easily see in this graph from the last 7 days when MetalLB was updated, just from the number of log entries registered: https://imgur.com/a/6Iqle6R

Most of those log entries are from "ServiceReconciler", happening over and over again. During the serviceAnnounced events I can also see events of type "config reloaded" and "force service reload".

fedepaol (Member) commented

I think I know what this is, or at least I suspect I do.
The current behavior is like this: we have a controller listening to all the events contributing to MetalLB's configuration.
This includes the secrets in MetalLB's namespace.
Whenever there's a configuration change, we reprocess all the services because some of them may be affected (think about new l2advertisements, for example).

Now, even if you didn't change anything, the cert controller that we use to rotate the CA for the webhooks might be causing this.

We can certainly reduce it by adding a caching layer, skipping secrets we don't care about, and so on.
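
In case it helps verify this on your side, watching the secrets in MetalLB's namespace should show whether any of them gets updated around the time of the reloads; something along these lines (namespace and output columns are just an example):

    kubectl -n metallb-system get secrets --watch \
      -o custom-columns=NAME:.metadata.name,RESOURCEVERSION:.metadata.resourceVersion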


pupseba commented Jan 19, 2023

Hi!

For testing purposes, the validatingwebhookconfiguration for MetalLB was deleted and the controller is running with these args:

  spec:
    containers:
      - args:
          - '--port=7472'
          - '--log-level=info'
          - '--cert-service-name=metallb-webhook-service'
          - '--disable-cert-rotation=true'
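
For completeness, the webhook was removed with something along these lines (the exact object name may differ depending on how MetalLB was installed):

    kubectl delete validatingwebhookconfiguration metallb-webhook-configuration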

Sadly, after 18 hours with this test configuration in place, we still see "serviceAnnounced" events around "force service reload" events logged by the "config_controller".

Hope this info helps.

Regards,
Seba
