BGP not announcing LoadBalancer services with externalTrafficPolicy Local for prefixes /32 and using route-reflectors #8162

Closed
AMacedoP opened this issue Oct 25, 2023 · 6 comments · Fixed by #8358

Comments

@AMacedoP
Contributor

When announcing LoadBalancer services using BGP and route-reflectors, services using externalTrafficPolicy: Local are not announced if the pods are not running in the same nodes as the route-reflectors.

This looks like an edge case not covered in #6074, because we are also using /32 prefixes in BGPConfiguration.

Expected Behavior

LoadBalancer service IPs with externalTrafficPolicy: Local are announced by the route-reflectors to external network devices

Current Behavior

LoadBalancer service IPs are not announced by the route-reflectors, unless the route-reflector node has a pod that matches the service selectors

Possible Solution

Route reflector nodes should announce all service IPs with externalTrafficPolicy: Local, regardless of whether a pod is scheduled on them

Steps to Reproduce (for bugs)

  1. Set up a cluster with a node as a route reflector and disable the node-to-node mesh according to the documentation
  2. Configure the default BGPConfiguration to announce LoadBalancer IPs with /32 prefixes:
    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      asNumber: 65501
      listenPort: 179
      logSeverityScreen: Info
      nodeToNodeMeshEnabled: false
      serviceLoadBalancerIPs:
      - cidr: "10.100.17.223/32"
      - cidr: "10.100.17.72/32"
      - cidr: "10.100.18.30/32"
  3. Add an external network device as a BGPPeer only for the route-reflectors
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: external-device
    spec:
      peerIP: 20.0.0.6
      asNumber: 65501
      nodeSelector: has(route-reflector)
  4. Create a deployment and service with externalTrafficPolicy: Local and check that the pod was scheduled to a node not configured as a route reflector
  5. Check that the new service is not present in the routing table of the external network device
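The detail that makes this different from #6074 is step 2: every serviceLoadBalancerIPs entry is a /32, so the configured block is the service IP itself. The Go sketch below (the function name `matchingBlock` is hypothetical, not part of Calico) finds which configured block covers a service IP and whether that block is a single host, using only the standard `net/netip` package.

```go
package main

import (
	"fmt"
	"net/netip"
)

// serviceLoadBalancerIPs mirrors the CIDRs from the BGPConfiguration in step 2.
var serviceLoadBalancerIPs = []string{
	"10.100.17.223/32",
	"10.100.17.72/32",
	"10.100.18.30/32",
}

// matchingBlock (hypothetical helper) returns the configured prefix that
// contains ip, and whether that prefix covers exactly one host address.
func matchingBlock(ip string) (netip.Prefix, bool, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return netip.Prefix{}, false, err
	}
	for _, c := range serviceLoadBalancerIPs {
		p, err := netip.ParsePrefix(c)
		if err != nil {
			return netip.Prefix{}, false, err
		}
		if p.Contains(addr) {
			// A /32 (IPv4) block is the service IP itself.
			return p, p.Bits() == addr.BitLen(), nil
		}
	}
	return netip.Prefix{}, false, fmt.Errorf("%s is not in any serviceLoadBalancerIPs block", ip)
}

func main() {
	p, host, err := matchingBlock("10.100.17.72")
	if err != nil {
		panic(err)
	}
	fmt.Println(p, host) // prints "10.100.17.72/32 true"
}
```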

Your Environment

  • Calico version: v3.25.1
  • Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
  • Operating System and version: Ubuntu 20.04
@louygan

louygan commented Nov 6, 2023

I have a similar problem on Azure Operator Nexus (AON), without route-reflectors, using a /32 loadBalancer IP and externalTrafficPolicy: Local. If the service pod restarts, the loadBalancer IP is not announced again, and traffic is only restored once the client detects a TCP retransmission timeout (about 15 minutes). With MetalLB, the IP is announced in the same use case.

@AMacedoP
Contributor Author

Hi @mgleung @sridhartigera, is there a fix planned for this issue? If not, we could try solving it, but we need pointers on where to find the error in the source code.

@caseydavenport
Member

I think the first thing to identify would be why the route isn't being advertised by the route reflector.

I am guessing that this is because the filters we create in the BIRD configuration aren't tuned correctly for the RR case, where the RR needs to allow export of routes that it learned from other speakers even if it doesn't have the route locally.
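To make the suspected gap concrete, here is a small Go model of the two behaviours, written only from what the issue describes. The `Node` type and both functions are hypothetical simplifications, not Calico code: with externalTrafficPolicy: Local only nodes hosting an endpoint originate the service /32, and in the repro only route reflectors peer with the external device.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a cluster node (not a Calico type).
type Node struct {
	Name             string
	RouteReflector   bool
	HasLocalEndpoint bool // a pod backing the service runs on this node
}

// originates reports whether a node advertises the service /32 itself.
// With externalTrafficPolicy: Local, only nodes hosting an endpoint do.
func originates(n Node) bool { return n.HasLocalEndpoint }

// currentExport models the reported behaviour: the RR only sends the /32
// to the external peer when it originates the route itself.
func currentExport(rr Node) bool {
	return rr.RouteReflector && originates(rr)
}

// expectedExport models the expected behaviour: the RR also re-exports
// a /32 it learned from any client node that originates it.
func expectedExport(rr Node, cluster []Node) bool {
	if !rr.RouteReflector {
		return false // in the repro, only RRs peer with external-device
	}
	for _, n := range cluster {
		if originates(n) {
			return true
		}
	}
	return false
}

func main() {
	rr := Node{Name: "rr-1", RouteReflector: true}
	worker := Node{Name: "worker-1", HasLocalEndpoint: true}
	cluster := []Node{rr, worker}
	fmt.Println("current:", currentExport(rr), "expected:", expectedExport(rr, cluster))
	// prints "current: false expected: true"
}
```

The mismatch between the two functions for an RR without a local pod is exactly the symptom in "Current Behavior".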

Suspect the fix is either in the route logic here: https://github.com/projectcalico/calico/blob/master/confd/pkg/backends/calico/routes.go

Or in the BIRD configuration files: https://github.com/projectcalico/calico/tree/master/confd/etc/calico/confd/templates

Or, both. Depending on what the simplest / most elegant solution seems to be 😁

@AMacedoP
Copy link
Contributor Author

I recreated my lab and indeed the problem is in the filters bird uses to export bgp routes.

In the calico_export_to_bgp_peers filter in /etc/calico/confd/config/bird_ipam.cfg, the route for the externalTrafficPolicy: Local service is not present. When I manually add it on the RR node, BIRD announces it without problems.

I'll try and send a PR to solve it

@AMacedoP
Contributor Author

@caseydavenport I've sent a PR, can you review it please?

@caseydavenport
Member

@AMacedoP yep, I saw it and will take a look soon. It may be a few days as the holidays are a bit hectic and folks are taking time off.

5 participants