New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor the frr configuration template #1714
Conversation
a7831d1
to
def6e92
Compare
Fixes #1715 |
cd5e4f9
to
c11274f
Compare
err = metrics.ValidateCounterValue(metrics.GreaterThan(0), "metallb_bgp_notifications_sent", map[string]string{"peer": addr}, speakerMetrics) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
err = metrics.ValidateOnPrometheus(promPod, fmt.Sprintf(`metallb_bgp_notifications_sent{peer="%s"} >= 1`, addr), metrics.There) | ||
if err != nil { | ||
return err | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it non-deterministic?
When we configure the bgp peers and advertise the ip don't we always get bgp notification sent?
I don't remember seeing flakes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we always had notifications because of this #1715
With this new layout, we don't have that change anymore, so the notification is not sent in that scenario (while before, it was ALWAYS been sent), but it still sent sometimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits, other than that looks good :)
internal/bgp/frr/config.go
Outdated
}).Parse(configTemplate) | ||
"dict": func(values ...interface{}) (map[string]interface{}, error) { | ||
if len(values)%2 != 0 { | ||
return nil, errors.New("invalid dict call") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wdyt about having the err be a little bit more detailed? something like "dict expects even amount of values, got %d"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't hurt (even though this kind of mistake will be easily caught by CI), will add
internal/bgp/frr/config.go
Outdated
for i := 0; i < len(values); i += 2 { | ||
key, ok := values[i].(string) | ||
if !ok { | ||
return nil, errors.New("dict keys must be strings") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same here, something like "got a non-string key: %v %T"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
return dict, nil | ||
}, | ||
}).ParseFS(templates, "templates/*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat
Removing duplication and isolating separate parts in separate subtemplates to make it a bit easier to navigate it. We also change the way the prefixes are being added, because the previous version was misleading in how the configuration was rendered, making the reader think that the prefixes are advertised per neighbor while they are per router (and we have per neighbor specific filters to choose what prefixes should go to a given neighbor). Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
We move the files to a "templates" dir to make navigability easier and to leverage syntax highlighting. We leverage the go embed feature to ship them as part of the binary. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
The metallb_bgp_notifications_sent metric is not deterministic and there's no way to trigger it from configuring metallb. Because of this, we stop checking the metric. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments after taking a closer look at the .golden files
network 172.16.1.10/24 | ||
network 172.16.1.10/24 | ||
network 172.16.1.11/24 | ||
network 172.16.1.11/24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need the IP address duplication here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, I can improve it
address-family ipv4 unicast | ||
network 172.16.1.10/24 | ||
network 172.16.1.11/24 | ||
network 172.16.1.11/24 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplication here as well.
exit-address-family | ||
address-family ipv6 unicast | ||
neighbor 10.4.4.255 activate | ||
neighbor 10.4.4.255 route-map 10.4.4.255-in in | ||
neighbor 10.4.4.255 route-map 10.4.4.255-out out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had this extra indentation in the past, did FRR parse it as usual and functioned as required,
or did we miss some use cases in the tests and missed that bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a mistake that has been fixed here. It was working but the file was rendered poorly
internal/bgp/frr/frr.go
Outdated
router.IPV4Prefixes = append(router.IPV4Prefixes, prefix) | ||
case ipfamily.IPv6: | ||
router.IPV6Prefixes = append(router.IPV6Prefixes, prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should check here if the prefix already exists in the slice, to prevent duplications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's the if that was supposed to prevent dupes. Need to check why it didn't work
address-family ipv6 unicast | ||
neighbor 10.2.2.254 activate | ||
neighbor 10.2.2.254 route-map 10.2.2.254-in in | ||
neighbor 10.2.2.254 route-map 10.2.2.254-out out | ||
exit-address-family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the TestSingleAdvertisementChange
test It shouldn't have an ipv6 family here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This the only effective change introduced by this PR. We always add the peers to both IPv4 and IPv6 families instead of adding to both if there are no advertisements, and adding them only to the family where we have advertisements in case at least one advertisement is present (which causes the route flapping), as explained in #1715
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I get it. thanks,
address-family ipv6 unicast | ||
neighbor 10.2.2.254 activate | ||
neighbor 10.2.2.254 route-map 10.2.2.254-in in | ||
neighbor 10.2.2.254 route-map 10.2.2.254-out out | ||
exit-address-family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From looking at the TestSingleAdvertisement
test, I think we shouldn't have this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
address-family ipv6 unicast | ||
neighbor 10.2.2.254 activate | ||
neighbor 10.2.2.254 route-map 10.2.2.254-in in | ||
neighbor 10.2.2.254 route-map 10.2.2.254-out out | ||
exit-address-family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
address-family ipv6 unicast | ||
neighbor 10.2.2.254 activate | ||
neighbor 10.2.2.254 route-map 10.2.2.254-in in | ||
neighbor 10.2.2.254 route-map 10.2.2.254-out out | ||
exit-address-family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
We have a few config fields (or subfields) that are maps, which makes the rendering non deterministic because ranging over a map doesn't give guarantee over the order. So, we decouple the internal representation of the config from the one we feed the template with, and we feed the template with slices instead of maps. Signed-off-by: Federico Paolinelli <fpaoline@redhat.com>
So, this turned out to be the occasion also to refactor the fact that we have maps we iterate on and so the rendering is not always consistent. I still wonder why we never hit this in CI.... |
rout.ipV4Prefixes[prefix] = prefix | ||
case ipfamily.IPv6: | ||
rout.ipV6Prefixes[prefix] = prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we make rout.ipV4Prefixes
and rout.ipV6Prefixes
to be slices?
I don't think maps are required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On top of that, I think that we should refactor the whole createConfig() function in the sense
of separating into smaller functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The maps are to avoid duplicates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On top of that, I think that we should refactor the whole createConfig() function in the sense of separating into smaller functions.
I am not sure it would be better. I kind of like the fact the rendering/conversion mechanism is clear. The lenght is mostly because of literals that take multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracting to separate, smaller, and more atomic functions can only improve clarity IMO.
But anyways, I see you addressed all the comments I wrote above and it's LGTM from me. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am generally in favour of hiding complexity behind funcitons with meaningful names, but long function = not clear is not always necessarily true. If you need to jump between the loop and the subfunctions to understand the behaviour, the clarity is less.
This function is easy to follow, it's long but as I wrote, the biggest reason for the length is the literal initializiatoins that consume multiple rows.
Removing duplication and isolating separate parts in separate subtemplates to make it a bit easier to navigate it.
We also change the way the prefixes are being added, because the previous version was misleading in how the configuration was rendered, making the reader think that the prefixes are advertised per neighbor while they are per router (and we have per neighbor specific filters to choose what prefixes should go to a given neighbor).
Fixes #1715