Refactor the frr configuration template #1714

Merged: 7 commits merged on Nov 25, 2022

Conversation

@fedepaol fedepaol commented Nov 16, 2022

This removes duplication and isolates separate parts into separate subtemplates, to make the template a bit easier to navigate.

We also change the way the prefixes are added. The previous rendering was misleading: it made the reader think that the prefixes are advertised per neighbor, while they are advertised per router (and we have per-neighbor filters to choose which prefixes go to a given neighbor).

Fixes #1715

@fedepaol
Member Author

Fixes #1715

@fedepaol fedepaol force-pushed the refactorff branch 2 times, most recently from cd5e4f9 to c11274f, November 18, 2022 09:16

Comment on lines -347 to -356
err = metrics.ValidateCounterValue(metrics.GreaterThan(0), "metallb_bgp_notifications_sent", map[string]string{"peer": addr}, speakerMetrics)
if err != nil {
	return err
}

err = metrics.ValidateOnPrometheus(promPod, fmt.Sprintf(`metallb_bgp_notifications_sent{peer="%s"} >= 1`, addr), metrics.There)
if err != nil {
	return err
}

Contributor

How is it non-deterministic?
When we configure the BGP peers and advertise the IP, don't we always get a BGP notification sent?
I don't remember seeing flakes here.

Member Author

Before, we always had notifications because of #1715.
With this new layout, that change no longer happens, so the notification is not sent in that scenario (whereas before it was ALWAYS sent), but it is still sent sometimes.

Member

@oribon oribon left a comment

minor nits, other than that looks good :)

}).Parse(configTemplate)
"dict": func(values ...interface{}) (map[string]interface{}, error) {
if len(values)%2 != 0 {
return nil, errors.New("invalid dict call")
Member

nit: wdyt about having the err be a little bit more detailed? something like "dict expects even amount of values, got %d"?

Member Author

Won't hurt (even though this kind of mistake will be easily caught by CI), will add

for i := 0; i < len(values); i += 2 {
	key, ok := values[i].(string)
	if !ok {
		return nil, errors.New("dict keys must be strings")
Member

nit: same here, something like "got a non-string key: %v %T"

Member Author

Done
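
For readers following the two nits above, here is a minimal sketch of what a `dict` template helper with more detailed errors could look like. The error wording follows the reviewer's suggestions; the `route-map` sub-template and the `main` function are hypothetical and only there to show the helper in use, they are not taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// dict builds a map from alternating key/value arguments, so that a template
// can pass several named values to a sub-template in a single call.
func dict(values ...interface{}) (map[string]interface{}, error) {
	if len(values)%2 != 0 {
		return nil, fmt.Errorf("dict expects an even number of values, got %d", len(values))
	}
	res := make(map[string]interface{}, len(values)/2)
	for i := 0; i < len(values); i += 2 {
		key, ok := values[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict got a non-string key: %v (%T)", values[i], values[i])
		}
		res[key] = values[i+1]
	}
	return res, nil
}

// demoTemplate is a hypothetical template pair, not the one from the PR.
const demoTemplate = `{{define "routemap"}}neighbor {{.addr}} route-map {{.addr}}-{{.dir}} {{.dir}}
{{end}}{{template "routemap" dict "addr" "10.2.2.254" "dir" "in"}}`

func main() {
	funcs := template.FuncMap{"dict": dict}
	tmpl := template.Must(template.New("demo").Funcs(funcs).Parse(demoTemplate))
	if err := tmpl.Execute(os.Stdout, nil); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because `text/template` accepts functions that return `(value, error)`, a malformed `dict` call aborts rendering with the detailed message instead of producing a partial file.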

}
return dict, nil
},
}).ParseFS(templates, "templates/*")
Member

neat

Removing duplication and isolating separate parts in separate
subtemplates to make it a bit easier to navigate it.

We also change the way the prefixes are being added, because the
previous version was misleading in how the configuration was rendered,
making the reader think that the prefixes are advertised per neighbor
while they are per router (and we have per neighbor specific filters to
choose what prefixes should go to a given neighbor).

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
We move the files to a "templates" dir to make navigability easier and
to leverage syntax highlighting. We leverage the go embed feature to
ship them as part of the binary.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
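
As a rough illustration of the commit above, the sketch below parses a directory of templates with `ParseFS`. To stay self-contained it uses `fstest.MapFS` as a stand-in for an `embed.FS` variable populated by a go:embed directive for `templates/*`; the file names, their contents, and the `config` type are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"testing/fstest"
	"text/template"
)

// templates stands in for an embed.FS that a go:embed directive would fill
// from the "templates" directory. File names and contents are hypothetical.
var templates fs.FS = fstest.MapFS{
	"templates/frr.tmpl": {Data: []byte("address-family ipv4 unicast\n" +
		`{{template "neighbors.tmpl" .}}` + "exit-address-family\n")},
	"templates/neighbors.tmpl": {Data: []byte(
		"{{range .Neighbors}} neighbor {{.}} activate\n{{end}}")},
}

// config is a hypothetical, minimal input for the templates above.
type config struct {
	Neighbors []string
}

// render parses every file under templates/ and executes frr.tmpl.
// ParseFS names each template after the base name of its file.
func render(c config) (string, error) {
	t, err := template.New("frr.tmpl").ParseFS(templates, "templates/*")
	if err != nil {
		return "", err
	}
	var b bytes.Buffer
	if err := t.Execute(&b, c); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(config{Neighbors: []string{"10.2.2.254"}})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```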
The metallb_bgp_notifications_sent metric is not deterministic and
there's no way to trigger it from configuring metallb. Because of this,
we stop checking the metric.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
Member

@oribon oribon left a comment

lgtm

Contributor

@liornoy liornoy left a comment

Left some comments after taking a closer look at the .golden files

network 172.16.1.10/24
network 172.16.1.10/24
network 172.16.1.11/24
network 172.16.1.11/24
Contributor

Do we really need the IP address duplication here?

Member Author

nope, I can improve it

address-family ipv4 unicast
network 172.16.1.10/24
network 172.16.1.11/24
network 172.16.1.11/24
Contributor

Duplication here as well.

exit-address-family
address-family ipv6 unicast
neighbor 10.4.4.255 activate
neighbor 10.4.4.255 route-map 10.4.4.255-in in
neighbor 10.4.4.255 route-map 10.4.4.255-out out
Contributor

If we had this extra indentation in the past, did FRR parse it as usual and function as required,
or did we miss some use cases in the tests and miss that bug?

Member Author

That was a mistake that has been fixed here. It was working, but the file was rendered poorly.

Comment on lines 294 to 296
router.IPV4Prefixes = append(router.IPV4Prefixes, prefix)
case ipfamily.IPv6:
router.IPV6Prefixes = append(router.IPV6Prefixes, prefix)
Contributor

Maybe we should check here if the prefix already exists in the slice, to prevent duplications.

Member Author

There's the if that was supposed to prevent dupes. Need to check why it didn't work
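
To make the suggestion concrete, a duplicate check on append could look roughly like the sketch below. `appendUnique` is a hypothetical helper, and representing prefixes as plain strings is an assumption made to keep the example short; it is not the code from the PR.

```go
package main

import "fmt"

// appendUnique appends prefix to prefixes only if it is not already present.
// It is a linear scan, which is fine for the short lists in a router config.
func appendUnique(prefixes []string, prefix string) []string {
	for _, p := range prefixes {
		if p == prefix {
			return prefixes
		}
	}
	return append(prefixes, prefix)
}

func main() {
	// The same duplicated input as in the .golden file discussed above.
	input := []string{"172.16.1.10/24", "172.16.1.10/24", "172.16.1.11/24", "172.16.1.11/24"}

	var v4 []string
	for _, p := range input {
		v4 = appendUnique(v4, p)
	}
	fmt.Println(v4) // [172.16.1.10/24 172.16.1.11/24]
}
```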

Comment on lines +37 to +41
address-family ipv6 unicast
neighbor 10.2.2.254 activate
neighbor 10.2.2.254 route-map 10.2.2.254-in in
neighbor 10.2.2.254 route-map 10.2.2.254-out out
exit-address-family
Contributor

@liornoy liornoy Nov 24, 2022

From looking at the TestSingleAdvertisementChange test, it shouldn't have an IPv6 family here.

Member Author

This is the only effective change introduced by this PR. We now always add the peers to both the IPv4 and IPv6 families. Previously, we added them to both families only when there were no advertisements, and only to the family that had advertisements when at least one advertisement was present (which caused the route flapping), as explained in #1715.
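
The behaviour described in this reply can be sketched with a small template that activates every neighbor in both address families, whether or not a family has prefixes. The `router` type is a simplified, hypothetical stand-in (only the `IPV4Prefixes` and `IPV6Prefixes` field names come from the diff above), and the template is not the one shipped in the PR.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// router is a simplified, hypothetical view of the data fed to the template.
type router struct {
	Neighbors    []string
	IPV4Prefixes []string
	IPV6Prefixes []string
}

// familiesTmpl activates every neighbor in both families unconditionally, so
// adding the first advertisement does not change the neighbor's families.
const familiesTmpl = `address-family ipv4 unicast
{{range .Neighbors}}neighbor {{.}} activate
{{end}}{{range .IPV4Prefixes}}network {{.}}
{{end}}exit-address-family
address-family ipv6 unicast
{{range .Neighbors}}neighbor {{.}} activate
{{end}}{{range .IPV6Prefixes}}network {{.}}
{{end}}exit-address-family
`

func render(r router) (string, error) {
	t, err := template.New("families").Parse(familiesTmpl)
	if err != nil {
		return "", err
	}
	var b bytes.Buffer
	if err := t.Execute(&b, r); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render(router{
		Neighbors:    []string{"10.2.2.254"},
		IPV4Prefixes: []string{"172.16.1.10/24"},
	})
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Print(out)
}
```

With only an IPv4 prefix configured, the IPv6 block still contains the `activate` line, which matches what the .golden files in this thread show.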

Contributor

ok, now I get it. thanks.

Comment on lines +52 to +56
address-family ipv6 unicast
neighbor 10.2.2.254 activate
neighbor 10.2.2.254 route-map 10.2.2.254-in in
neighbor 10.2.2.254 route-map 10.2.2.254-out out
exit-address-family
Contributor

From looking at the TestSingleAdvertisement test, I think we shouldn't have this here.

Member Author

same

Comment on lines +36 to +40
address-family ipv6 unicast
neighbor 10.2.2.254 activate
neighbor 10.2.2.254 route-map 10.2.2.254-in in
neighbor 10.2.2.254 route-map 10.2.2.254-out out
exit-address-family
Contributor

Same as above.

Member Author

same

Comment on lines +52 to +56
address-family ipv6 unicast
neighbor 10.2.2.254 activate
neighbor 10.2.2.254 route-map 10.2.2.254-in in
neighbor 10.2.2.254 route-map 10.2.2.254-out out
exit-address-family
Contributor

Same as above.

Member Author

same

We have a few config fields (or subfields) that are maps, which makes
the rendering non deterministic because ranging over a map doesn't give
guarantee over the order. So, we decouple the internal representation of
the config from the one we feed the template with, and we feed the
template with slices instead of maps.

Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
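
The decoupling described in this commit can be pictured as follows: the internal representation keeps a map (which also prevents duplicates), and a conversion step produces a sorted slice for rendering. `sortedPrefixes` and the `map[string]string` shape are assumptions inferred from the diff hunks in this thread, not the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedPrefixes converts a set-like map into a sorted slice, so that the
// code consuming it always sees the prefixes in the same order.
func sortedPrefixes(set map[string]string) []string {
	res := make([]string, 0, len(set))
	for _, p := range set {
		res = append(res, p)
	}
	sort.Strings(res)
	return res
}

func main() {
	// Internal representation: writing the same key twice keeps one entry.
	internal := map[string]string{}
	for _, p := range []string{"172.16.1.11/24", "172.16.1.10/24", "172.16.1.11/24"} {
		internal[p] = p
	}

	// Representation fed to the template: a slice with a stable order.
	fmt.Println(sortedPrefixes(internal)) // [172.16.1.10/24 172.16.1.11/24]
}
```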
@fedepaol
Member Author

So, this also turned out to be the occasion to fix the fact that we iterate over maps, which means the rendering is not always consistent. I still wonder why we never hit this in CI...

Comment on lines +304 to +306
rout.ipV4Prefixes[prefix] = prefix
case ipfamily.IPv6:
rout.ipV6Prefixes[prefix] = prefix
Contributor

Can't we make rout.ipV4Prefixes and rout.ipV6Prefixes slices?
I don't think maps are required here.

Contributor

On top of that, I think that we should refactor the whole createConfig() function in the sense
of separating into smaller functions.

Member Author

The maps are to avoid duplicates

Member Author

@fedepaol fedepaol Nov 24, 2022

On top of that, I think that we should refactor the whole createConfig() function in the sense of separating into smaller functions.

I am not sure it would be better. I kind of like the fact that the rendering/conversion mechanism is clear. The length is mostly because of literals that take multiple lines.

Contributor

Extracting to separate, smaller, and more atomic functions can only improve clarity IMO.
But anyways, I see you addressed all the comments I wrote above and it's LGTM from me. :)

Member Author

I am generally in favour of hiding complexity behind functions with meaningful names, but "long function = not clear" is not necessarily true. If you need to jump between the loop and the subfunctions to understand the behaviour, clarity suffers.

This function is easy to follow. It's long but, as I wrote, the biggest reason for the length is the literal initializations that consume multiple rows.

@fedepaol fedepaol merged commit 0e0952a into metallb:main Nov 25, 2022
Development

Successfully merging this pull request may close these issues.

FRR: Publishing a new service causes the bgp session to flake
3 participants