Allow setting passive mode for BGP peers #1603
In metallb/metallb#114, I explored how to make my k8s BGP load-balancer interoperate gracefully with Calico clusters that peer with external BGP routers. I've documented my findings at https://master--metallb.netlify.com/configuration/calico/ and metallb/metallb#114 (comment)
In my setup, I'm trying to peer Calico with another BGP speaker running on localhost. The peer does not listen on any ports, so Calico should just wait for an incoming session. Currently, there is no way to tell Calico to treat a peer passively, so Calico always eagerly tries to connect to 127.0.0.1:179, which is itself. This causes repeated session establishment failures, and BIRD goes into error backoff. That makes it increasingly hard, if not impossible, for the real peer to connect: there is only a short window of a few seconds when the error backoff resets, before the failed connection attempts force BIRD back into backoff.
Calico should have a way to specify that a
BIRD supports this, with the
I am trying to make Calico and MetalLB integrate nicely with each other, by setting up a BGP topology like the one I documented for Romana integration. Basically, I want Calico to peer with the outside world, but also with another node agent that pushes routes into Calico for redistribution.
Setting up BGP sessions to/from localhost is notoriously tricky, but with the right set of options, it's possible. Lack of passive mode is one problem I encountered with Calico.
We'd need to:
Hey @danderson, we're revisiting this issue and trying to reproduce the problem (i.e. error backoff due to repeated session establishment failures) using the latest versions of Calico and MetalLB with the minikube setup from the MetalLB tutorial.
The goal is to reproduce the problem first, as a validation step, before adding a "passive" mode to the BGP peer configuration.
However, using the setup below, I wasn't able to reproduce the error backoff that leads to difficulty with real peer connection establishment.
The sequence for the setup was:
Before configuring the MetalLB speaker to peer with 127.0.0.1, I had calico-node peer with 127.0.0.1 to simulate the problem of calico-node peering with itself, i.e. "repeated session establishment failures".
However, I couldn't seem to reproduce an error backoff in the calico-node. Connection retries were evident from the calico-node logs:
However, nothing in the logs indicated that BIRD had entered error backoff.
Afterwards, initiating a peering from the MetalLB speaker resulted in an established connection with calico-node without any issues.
Let me know if I'm missing something important in reproducing this.
Still looking for someone to work on this! Would love to review.
The first PR to add the new configuration option would look similar to this one: https://github.com/projectcalico/libcalico-go/pull/1262/files
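As a rough illustration of what such an option might look like at the API level, here is a Go sketch that assumes the option is a plain boolean on the peer spec. `PeerSpec` is a simplified, hypothetical stand-in and does not reproduce the real libcalico-go type; the field name `Passive` and its JSON tag are assumptions too.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PeerSpec is a simplified, hypothetical stand-in for a BGP peer spec.
// The Passive field is the assumed new option; omitempty keeps existing
// resources unchanged when the option is left at its default (false).
type PeerSpec struct {
	PeerIP   string `json:"peerIP"`
	ASNumber uint32 `json:"asNumber"`
	Passive  bool   `json:"passive,omitempty"`
}

// encodeSpec serializes a spec to JSON, returning "" on error.
func encodeSpec(s PeerSpec) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Example values only.
	fmt.Println(encodeSpec(PeerSpec{PeerIP: "127.0.0.1", ASNumber: 64512, Passive: true}))
	fmt.Println(encodeSpec(PeerSpec{PeerIP: "127.0.0.1", ASNumber: 64512}))
}
```

Defaulting to false and omitting the field when unset would keep the current eager-connect behavior for every peer that does not opt in.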