-
Notifications
You must be signed in to change notification settings - Fork 1k
Metallb FRR integration #1014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metallb FRR integration #1014
Conversation
This commit implements a hook to apply the latest BGP configuration to FRR. It does so by sending a SIGUP to the FRR reloader process which in turn will reprogram FRR via the frr-reload.py tool. Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
49ff189 to
b88845f
Compare
|
Double checked the artifacts, at least the frr-ipv4 lane is really running in frr mode |
tasks.py
Outdated
| "GO111MODULE": "on", | ||
| } | ||
| if "speaker" in binaries: | ||
| run("cp frr-reloader/frr-reloader.sh build/{arch}/speaker/".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how much it matters but the rest of the script uses shutil.copy/copyfile to copy
manifests/metallb_frr.yaml
Outdated
| @@ -0,0 +1,611 @@ | |||
| # This is a WIP temporary file and will be cleaned or changed until FRR is integrated. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a request to add a bit more info #935 (comment) (this could maybe help oribon@76ef2e3)
internal/bgp/frr/config.go
Outdated
|
|
||
| return nil | ||
| var reloadConfig = func(sf *singleflight.Group) error { | ||
| _, err, _ := sf.Do("reload", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
| address-family ipv4 unicast | ||
| neighbor 10.2.2.254 activate | ||
| address-family ipv6 unicast | ||
| neighbor 10.2.2.254 activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't right to active v4 neigh from v6 AF ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure about this, but I think it's possible to advertise an ip6 family to a neighbour reachable via ipv4 (and viceversa) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be the same as https://github.com/metallb/metallb/pull/1014/files/b88845f28a5aa8ac430c75678d8414dec856ea14#diff-1d9a33c302812585b1b2d1897f84126492129d1f9415786a73dcfa26603686c9R53 (but we need to validate it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the config just I doubt if it will work as is i have seen some doc about how to get this to work
Enable
Configure terminal
Route-Map IPV6 permit 10
Set ipv6 next-hop FE80::F816:3EFF:FE40:2985
Exit
Router bgp 1
Neighbor 2.0.0.2 remote-as 2
Address-family ipv6 unicast
Neighbor 2.0.0.2 activate
Neighbor 2.0.0.2 route-map IPV6 out
Network 2002:2::2/128
might be way more involved than u think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is for enabling the session only (address-family without routes) to avoid the issue described here #1014 (comment)
I think we'll need to hammer our way through v6 / dual stack anyway, so I can either remove it or leave it as it is since it's not harming v4 e2e tests.
| neighbor {{$n.Addr}} activate | ||
| address-family ipv6 unicast | ||
| neighbor {{$n.Addr}} activate | ||
| exit-address-family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this for testing ? we can't have bgp peer with having addresspool config if that happened it should have caught eariler or should have been caught if that isn't the case ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's to mimic the same behaviour we have now.
Took me a while to find it. Without a service, frr won't establish a connection with the peer because of the no bgp default ipv4-unicast.
"By default, only the IPv4 unicast address family is announced to all neighbors. Using the ‘no bgp default ipv4-unicast’ configuration overrides this default so that all address families need to be enabled explicitly."
So without this, FRR will complain of no address families configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that was my question I did noticed too if no service no bgp neighbors I couldn't answer if this is an ok thing or not why do we care of having bgp peering if we don't have any network to advertise , I don't know hence asked that queston
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure it's right or not, but the current implementation does that and it did not seem bad to replicate that.
Also, that would be the default frr behaviour if the no bgp default ipv4-unicast parameter (which is needed to allow both ipv4 and ipv6 to co-exist) wasn't added.
internal/bgp/frr/config.go
Outdated
| } | ||
| return nil, nil | ||
| }) | ||
| return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is interesting package so the trick here is the sleep 2s u wanted to prevent doing back2back reload for 2sec window ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single flight squash all the calls that happen while one is in progress.
Adding the sleep will make the call last longer and will squash all the configuration requests into a single one (with the drawback of 2 seconds timeout). As a second thought, I can make the timeout even shorter since we care about squashing the calls caused by the same config change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point about blocking makes total sense and I did not think about it. This will be likely to be called many subsequent times from the single goroutine, so the effect will be of delaying instead of really squashing them.
Let me put a goroutine in the middle so it will sort out the desired effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any possibility for this working async now that we could fire frr reload with hlaf baked config ? from what I have seen FRR will drop the entire config if one line is incorrect, just thinking loud here since we don't have any idea when the reload took effect debugging invalid config will be interesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configuration updates happen in discrete steps, and each step is supposed to provide a valid config (not meaning that we won't have bugs). What I am saying is that we won't produce a half baked config, we'll produce a config with less stuff that the caller expects:
What could happen here, is that the timeout falls in the middle (i.e. we are looping here
metallb/speaker/bgp_controller.go
Line 172 in 9cb1653
| func (c *bgpController) syncPeers(l log.Logger) error { |
metallb/speaker/bgp_controller.go
Line 269 in 9cb1653
| if err := peer.session.Set(allAds...); err != nil { |
With that, we would have less addresses announced, but the configuration (bugs aside) will still be valid.
In any case, I am torn about this because as it is now it's just introducing a delay and not working asynchronously and I don't want this to hold the integration to get in. Will remove the commit and rework it in a separate PR, using goroutines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the reloading optimization, will handle it in a separate PR
tasks.py
Outdated
| import sys | ||
| import yaml | ||
| import tempfile | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please re-do this commit without all of the unrelated formatting changes? @cgoncalves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's actually my fault :-)
The "no bgp default ipv4-unicast" prevents the pairing with the remote peer if no address families are defined. The session remains in IDLE for "No AFI/SAFI activated for peer". To prevent this, we activate ipv4 unicast for those peers with no advertisements. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
In order to reload the frr configuration, we add the frr-reloader.sh script to the container. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
This change adds a new --bgp-type/-b option to dev-env to define which BGP type implementation should be deployed. Options are "native" (default) and "frr". $ inv dev -p bgp -b native $ inv dev -p bgp -b frr Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
cb475b0 to
b6010b3
Compare
Adding manifests that integrate FRR and the reloader with the current speaker deployment. The manifests were created using the current manifests, adding a sidecar FRR and reloader containers. They also include the ConfigMap for the initial FRR files and also an initContainer to set everything up. Furthermore, shareProcessNamespace: true was added to allow the speaker container to signal the reloader container, the speaker and reloader share a volume where it would place the rendered configuration file, while the reloader and FRR share the FRR configuration and sockets volumes so the reloader can communicate with the FRR container via vtysh and update its configuration. Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
| if [ "<< parameters.ip-family >>" = "ipv6" ]; then SKIP="--skip IPV4\|BGP"; fi # TODO: we are currently skipping BGP for ipv6, re enable when we enable the support | ||
| inv e2etest $SKIP -e /tmp/kind_logs | ||
| SKIP="none" | ||
| if [ "<< parameters.bgp-type >>" = "frr" ]; then SKIP="$SKIP\|metrics"; fi # TODO: enable metrics tests when implementing in FRR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced with this SKIP approach for two reasons:
- hard to human-parse and likely not scaling that well
- it will run CI jobs that all they do is setup the environment and skip all tests, while reporting as "succeed" giving wrong idea that they work and are supported.
Can't we add invalid/unsupported topologies to the exclude list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are temporary. My expectation is that we are gonna remove the skips in a matter of days, and for the metric thing it does not make any sense to add a new combination to skip.
|
Ugh, PR approved while reviewing it. |
This PR connects the dots in FRR integration by collecting the remaining bits from #969 and: