
IP-Masquerading for outgoing BGP traffic #2355

Closed
2 tasks done
Zappelphilipp opened this issue Apr 5, 2024 · 7 comments

Comments

@Zappelphilipp

Zappelphilipp commented Apr 5, 2024

Is your feature request related to a problem?

Running Kubernetes nodes with dynamic/changing VM IPs (as in Rancher, where nodes are regularly redeployed during updates, etc.) results in non-deterministic peering partner IP setups on the router. Apart from the Cisco-specific "BGP Peer Group" feature (which allows a whole subnet to be set as potential BGP peers), there is currently no known clean solution for using BGP with dynamic IPs.

Describe the solution you'd like

I have partially found a solution to the problem: MetalLB allows running in BGP and L2 mode simultaneously. So I set up one or two virtual IPs in L2 mode, which bind to the speaker service of the MetalLB instance. These two IPs are then configured on the router as BGP peers. Thus, traffic flowing from the firewall over BGP to MetalLB would consistently go through these two fixed virtual IPs, directly to the speaker pods (in this case, two pods on two worker nodes).
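
Roughly, that L2 side could look like this (just a sketch; the pool and Service names are made up, the speaker selector labels depend on how MetalLB was installed, and the .6/.7 addresses are the VIPs that come up later in this thread):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: bgp-peering-vips        # hypothetical name
  namespace: metallb-system
spec:
  addresses:
  - 10.200.10.6/32
  - 10.200.10.7/32
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: bgp-peering-l2          # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
  - bgp-peering-vips
---
# One LoadBalancer Service per peering VIP, pointing at the speaker pods
# on the BGP port; the selector labels are an assumption.
apiVersion: v1
kind: Service
metadata:
  name: bgp-peer-vip-1          # hypothetical name
  namespace: metallb-system
spec:
  type: LoadBalancer
  loadBalancerIP: 10.200.10.6
  selector:
    app: metallb
    component: speaker
  ports:
  - name: bgp
    protocol: TCP
    port: 179
    targetPort: 179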

The second problem arises with all outgoing peering traffic:

MetalLB --> BGP --> Router

This traffic always uses the source IPs of the Kubernetes nodes, which change regularly. It would be beneficial if the source IP of the BGP traffic could be masqueraded to statically defined IPs. These would be the same two peering addresses used in L2 mode for incoming BGP traffic.

Additional context

No response

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.
@fedepaol
Member

fedepaol commented Apr 5, 2024

> Is your feature request related to a problem?
>
> Running Kubernetes nodes with dynamic/changing VM IPs (as in Rancher, where nodes are regularly redeployed during updates, etc.) results in non-deterministic peering partner IP setups on the router. Apart from the Cisco-specific "BGP Peer Group" feature (which allows a whole subnet to be set as potential BGP peers), there is currently no known clean solution for using BGP with dynamic IPs.
>
> Describe the solution you'd like
>
> I have partially found a solution to the problem: MetalLB allows running in BGP and L2 mode simultaneously. So I set up one or two virtual IPs in L2 mode, which bind to the speaker service of the MetalLB instance. These two IPs are then configured on the router as BGP peers. Thus, traffic flowing from the firewall over BGP to MetalLB would consistently go through these two fixed virtual IPs.

Whoa, this is pretty creative!

> The second problem arises with all outgoing peering traffic:
>
> MetalLB --> BGP --> Router
>
> This traffic always uses the source IPs of the Kubernetes nodes, which change regularly. It would be beneficial if the source IP of the BGP traffic could be masqueraded to statically defined IPs. These would be the same two peering addresses used in L2 mode for incoming BGP traffic.

I think you could use the sourceAddress field of the BGPPeer. This gets translated to the update-source parameter of FRR:
https://docs.frrouting.org/en/latest/bgp.html#clicmd-neighbor-PEER-update-source-IFNAME-ADDRESS

The downside is, you have to do some node selector gymnastics to have exactly one peer per node.
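
For reference, on the FRR side that ends up as something along these lines (placeholder addresses and ASNs):

router bgp 65001
 neighbor 10.200.10.1 remote-as 65000
 neighbor 10.200.10.1 update-source 10.200.10.6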

@Zappelphilipp
Author

Zappelphilipp commented Apr 5, 2024

Wow, that was a quick response! :D Thank you!

I've already considered the sourceAddress parameter (I believe you're referring to this: MetalLB - Advanced BGP Configuration), but I wasn't certain if the L2 VIP on the given node is detected as "existing locally on one of the host's network interfaces." This uncertainty stems from the fact that the VIP isn't explicitly shown on the node itself as an IP address, as is the case with, for example, keepalived, which adds its VIPs directly onto interfaces. With keepalived, this can be confirmed using ip a, but for MetalLB, I haven't been able to ascertain which node has which L2 VIP, aside from inspecting the speaker logs.

Do you know if the sourceAddress parameter would actually function if I ensure that the speaker is running on the correct node?

And for the bonus question: How is the L2 address actually bound to the node, and how can this be confirmed on the node itself without relying on inspecting the speaker logs?

@fedepaol
Member

fedepaol commented Apr 5, 2024

> Wow, that was a quick response! :D Thank you!
>
> I've already considered the sourceAddress parameter (I believe you're referring to this: MetalLB - Advanced BGP Configuration), but I wasn't certain if the L2 VIP on the given node is detected as "existing locally on one of the host's network interfaces."
>
> Do you know if the sourceAddress parameter would actually function if I ensure that the speaker is running on the correct node?

It should, although IIRC we don't have e2e tests covering it at the moment. Also, the check for the existence of the IP on a node is related to the native BGP implementation, not the FRR one (but I think it is necessary). In any case, it should be trivial to test for one node (and to add the VIP to the lo interface of the node).
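
Concretely, something like this on the node (a /32 on lo so it stays local to the node; the address is just the example VIP):

ip addr add 10.200.10.6/32 dev lo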

> And for the bonus question: How is the L2 address actually bound to the node, and how can this be confirmed on the node itself without relying on inspecting the speaker logs?

MetalLB doesn't assign the IP to any interface of the node. But if you wait, #2351 should land soon, and with it support for exposing the node that announces the VIP as a status CR.
Otherwise, I think both metrics and events would work (but events are ephemeral).
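
For example, L2 mode emits a nodeAssigned event on the Service when a node takes over announcing it, so something like this should reveal the current owner (the service name is a placeholder):

kubectl describe svc bgp-peer-vip-1 -n metallb-system | grep -i announcing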

@Zappelphilipp
Author

Thanks, I am going to test that.

What I found out in the meantime, in case somebody cares: what also works (but would be a hassle to maintain) is to add POSTROUTING rules on the nodes themselves. For example, if you have two worker nodes and the two L2 VIP peering services get 10.200.10.6 and .7, then one node gets

iptables -t nat -A POSTROUTING -o ens192 -p tcp --dport 179 -j SNAT --to-source 10.200.10.6

and the other gets

iptables -t nat -A POSTROUTING -o ens192 -p tcp --dport 179 -j SNAT --to-source 10.200.10.7
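
One refinement would be to also match the router's address, so that other traffic to port 179 is left untouched, e.g.:

iptables -t nat -A POSTROUTING -o ens192 -d 10.200.10.1 -p tcp --dport 179 -j SNAT --to-source 10.200.10.7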

More of a proof of concept, but I will report back in case I find a suitable (Kubernetes-native) solution for the problem.

I am honestly pretty surprised that nobody else seems to have this kind of use case.

@Zappelphilipp
Author

Zappelphilipp commented Apr 6, 2024

I added some label magic and MetalLB config so that the L2 addresses for the peers (the .6 and .7 IP addresses) are bound to two fixed nodes, and I also added sourceAddress for the BGPPeers, bound to the same nodes, but it seems like this won't work.

My config looks like this:

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: peer-1
  namespace: metallb-system
spec:
  myASN: 65110
  peerASN: 65210
  peerAddress: 10.200.10.1
  sourceAddress: 10.200.10.6
  bfdProfile: testbfdprofile
  nodeSelectors:
  - matchLabels:
      bgp-peer: "1"
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: peer-2
  namespace: metallb-system
spec:
  myASN: 65110
  peerASN: 65210
  peerAddress: 10.200.10.1
  sourceAddress: 10.200.10.7
  bfdProfile: testbfdprofile
  nodeSelectors:
  - matchLabels:
      bgp-peer: "2"

but MetalLB complains:

Error from server (Forbidden): error when creating "metallb-config.yaml": admission webhook "bgppeersvalidationwebhook.metallb.io" denied the request: peer 10.200.10.1 already exists, FRR mode doesn't support duplicate BGPPeers

Have I misunderstood something here or is this the end of my creative solution?

@fedepaol
Member

Talking (at least) about the ASN and the source address: if we don't allow different properties on different nodes for the same peer, I don't see the point of having those fields.

On the other hand, relying on the label selector is fragile: even though we can check today that those two peers (same IP, different source address) are not conflicting because they apply to different nodes, a new node might come in and match both label selectors.

I am keen to say we must rethink the API, either allowing a per-node configuration by adding a node name field to the BGPPeer, or having another structure that contains the per-node configuration of a given FRR instance.
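
Purely as an illustration of the first option (this field does not exist today, it is hypothetical):

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: peer-worker-1
  namespace: metallb-system
spec:
  myASN: 65110
  peerASN: 65210
  peerAddress: 10.200.10.1
  sourceAddress: 10.200.10.6
  nodeName: worker-1   # hypothetical field pinning this peer to one node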

cc @oribon to open the discussion.

@fedepaol
Member

@Zappelphilipp FWIW, I filed #2367. After discussing offline with @oribon, we think it'd make sense to allow the configuration.

I am closing this as I think we can refer to the other issue to track the implementation (which will happen on a best-effort basis).
Feel free to reopen if you think there's something missing.
