Skip to content

Conversation

@fedepaol
Copy link
Member

@fedepaol fedepaol commented Oct 30, 2021

This PR connects the dots in FRR integration by collecting the remaining bits from #969 and:

  • adding the reloader to the speaker image
  • implementing the reload function
  • adding a metallb_frr.yaml manifest to deploy metallb in frr experimental mode
  • adding a CI lane that tests bgp in frr mode

This commit implements a hook to apply the latest BGP configuration to
FRR. It does so by sending a SIGUP to the FRR reloader process which in
turn will reprogram FRR via the frr-reload.py tool.

Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
@fedepaol fedepaol force-pushed the integrate branch 2 times, most recently from 49ff189 to b88845f Compare October 30, 2021 14:55
@fedepaol
Copy link
Member Author

Double checked the artifacts, at least the frr-ipv4 lane is really running in frr mode

tasks.py Outdated
"GO111MODULE": "on",
}
if "speaker" in binaries:
run("cp frr-reloader/frr-reloader.sh build/{arch}/speaker/".format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how much it matters but the rest of the script uses shutil.copy/copyfile to copy

@@ -0,0 +1,611 @@
# This is a WIP temporary file and will be cleaned or changed until FRR is integrated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a request to add a bit more info #935 (comment) (this could maybe help oribon@76ef2e3)


return nil
var reloadConfig = func(sf *singleflight.Group) error {
_, err, _ := sf.Do("reload",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

address-family ipv4 unicast
neighbor 10.2.2.254 activate
address-family ipv6 unicast
neighbor 10.2.2.254 activate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't right to active v4 neigh from v6 AF ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure about this, but I think it's possible to advertise an ip6 family to a neighbour reachable via ipv4 (and viceversa) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the config just I doubt if it will work as is i have seen some doc about how to get this to work

Enable
Configure terminal
Route-Map IPV6 permit 10
Set ipv6 next-hop FE80::F816:3EFF:FE40:2985
Exit
Router bgp 1
Neighbor 2.0.0.2 remote-as 2
Address-family ipv6 unicast
Neighbor 2.0.0.2 activate
Neighbor 2.0.0.2 route-map IPV6 out
Network 2002:2::2/128

might be way more involved than u think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is for enabling the session only (address-family without routes) to avoid the issue described here #1014 (comment)

I think we'll need to hammer our way through v6 / dual stack anyway, so I can either remove it or leave it as it is since it's not harming v4 e2e tests.

neighbor {{$n.Addr}} activate
address-family ipv6 unicast
neighbor {{$n.Addr}} activate
exit-address-family
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for testing ? we can't have bgp peer with having addresspool config if that happened it should have caught eariler or should have been caught if that isn't the case ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's to mimic the same behaviour we have now.
Took me a while to find it. Without a service, frr won't establish a connection with the peer because of the no bgp default ipv4-unicast.

"By default, only the IPv4 unicast address family is announced to all neighbors. Using the ‘no bgp default ipv4-unicast’ configuration overrides this default so that all address families need to be enabled explicitly."
So without this, FRR will complain of no address families configured.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was my question I did noticed too if no service no bgp neighbors I couldn't answer if this is an ok thing or not why do we care of having bgp peering if we don't have any network to advertise , I don't know hence asked that queston

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure it's right or not, but the current implementation does that and it did not seem bad to replicate that.
Also, that would be the default frr behaviour if the no bgp default ipv4-unicast parameter (which is needed to allow both ipv4 and ipv6 to co-exist) wasn't added.

}
return nil, nil
})
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting package so the trick here is the sleep 2s u wanted to prevent doing back2back reload for 2sec window ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single flight squash all the calls that happen while one is in progress.
Adding the sleep will make the call last longer and will squash all the configuration requests into a single one (with the drawback of 2 seconds timeout). As a second thought, I can make the timeout even shorter since we care about squashing the calls caused by the same config change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point about blocking makes total sense and I did not think about it. This will be likely to be called many subsequent times from the single goroutine, so the effect will be of delaying instead of really squashing them.
Let me put a goroutine in the middle so it will sort out the desired effect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any possibility for this working async now that we could fire frr reload with hlaf baked config ? from what I have seen FRR will drop the entire config if one line is incorrect, just thinking loud here since we don't have any idea when the reload took effect debugging invalid config will be interesting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configuration updates happen in discrete steps, and each step is supposed to provide a valid config (not meaning that we won't have bugs). What I am saying is that we won't produce a half baked config, we'll produce a config with less stuff that the caller expects:

What could happen here, is that the timeout falls in the middle (i.e. we are looping here

func (c *bgpController) syncPeers(l log.Logger) error {
and the threshold happens half way through
if err := peer.session.Set(allAds...); err != nil {
)

With that, we would have less addresses announced, but the configuration (bugs aside) will still be valid.

In any case, I am torn about this because as it is now it's just introducing a delay and not working asynchronously and I don't want this to hold the integration to get in. Will remove the commit and rework it in a separate PR, using goroutines.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the reloading optimization, will handle it in a separate PR

tasks.py Outdated
import sys
import yaml
import tempfile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please re-do this commit without all of the unrelated formatting changes? @cgoncalves

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually my fault :-)

fedepaol and others added 4 commits November 2, 2021 17:29
The "no bgp default ipv4-unicast" prevents the pairing with the remote
peer if no address families are defined. The session remains in IDLE for
"No AFI/SAFI activated for peer". To prevent this, we activate ipv4
unicast for those peers with no advertisements.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
In order to reload the frr configuration, we add the frr-reloader.sh
script to the container.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
This change adds a new --bgp-type/-b option to dev-env to define which
BGP type implementation should be deployed. Options are "native"
(default) and "frr".

  $ inv dev -p bgp -b native
  $ inv dev -p bgp -b frr

Signed-off-by: Carlos Goncalves <cgoncalves@redhat.com>
@fedepaol fedepaol force-pushed the integrate branch 2 times, most recently from cb475b0 to b6010b3 Compare November 2, 2021 16:31
Adding manifests that integrate FRR and the reloader with the
current speaker deployment.

The manifests were created using the current manifests, adding a sidecar
FRR and reloader containers. They also include the ConfigMap for the initial
FRR files and also an initContainer to set everything up.
Furthermore, shareProcessNamespace: true was added to allow the speaker container
to signal the reloader container, the speaker and reloader share a volume where
it would place the rendered configuration file, while the reloader and FRR share
the FRR configuration and sockets volumes so the reloader can communicate with
the FRR container via vtysh and update its configuration.

Signed-off-by: Ori Braunshtein <obraunsh@redhat.com>
@russellb russellb merged commit ca2f0da into metallb:main Nov 2, 2021
if [ "<< parameters.ip-family >>" = "ipv6" ]; then SKIP="--skip IPV4\|BGP"; fi # TODO: we are currently skipping BGP for ipv6, re enable when we enable the support
inv e2etest $SKIP -e /tmp/kind_logs
SKIP="none"
if [ "<< parameters.bgp-type >>" = "frr" ]; then SKIP="$SKIP\|metrics"; fi # TODO: enable metrics tests when implementing in FRR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced with this SKIP approach for two reasons:

  1. hard to human-parse and likely not scaling that well
  2. it will run CI jobs that all they do is setup the environment and skip all tests, while reporting as "succeed" giving wrong idea that they work and are supported.

Can't we add invalid/unsupported topologies to the exclude list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are temporary. My expectation is that we are gonna remove the skips in a matter of days, and for the metric thing it does not make any sense to add a new combination to skip.

@cgoncalves
Copy link
Contributor

Ugh, PR approved while reviewing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants