Skip to content

Conversation

@mlguerrero12
Copy link

These changes ensure that the local preference is unique among BGP ADVs with the same type of BGP update. It is still possible to have BGP ADVs with different values of localPref if the set of BGP peers are different. BGP ADVs with common BGP peers must not have common nodes. Checks are performed for BGP ADVs that have the same aggregation length.

Fixed #1739

@mlguerrero12
Copy link
Author

/retest

@mlguerrero12 mlguerrero12 force-pushed the localprefwebhook branch 4 times, most recently from 9645e39 to 8459c3b Compare February 6, 2023 14:29
return nil
}

func uniqueConfig(newAdv, adv *BGPAdvertisement) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this "advertisementsAreCompatible"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// BGP ADVs with different set of nodes do not collide.
for node := range newAdv.Nodes {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's handle the case when one of the two maps is empty. I don't remember if when we get here, the nodes are already unrolled in the map but I wouldn't rely on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is totally sane and makes sense.
I am a bit afraid of a scenario where a new node is added (or a label is changed to a node), and that change makes the configuration invalid. That won't be caught by the webhook (because we don't have webhooks on nodes and we can't have them), so from a user's point of view the configuration will be invalid without an explicit reason.
I think that the most common use case is to have the localpref be differentiated on peer basis rather than on node basis, so I am tempted to say let's be more restrictive and not allow advertisements on different nodes to have different prefixes.

Thoughts? Also invoking @oribon

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nodes are already unrolled and it already covers the case when one of the maps is empty. It will simply not collide.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you in your concern about adding a node. We need to come up with a solution here.

Regarding the use of localPref, I think it is all the opposite. The most common use case is with nodes. What's the point of setting the same localPref for all speakers against a specific peer? We could choose any value for localPref since it won't add anything to the updates. It will be useless in my opinion.

The way I've used localPref before was with the aim of giving preference to some updates in iBGP. In this case, users might select a set of nodes with higher preference to receive traffic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late response
I'm a bit concerned about this approach, mainly because a change in the cluster which we can't listen for that suddenly makes the config bad feels wrong to me, especially when we (still) don't have an accessible way to expose if the running configuration is good or not.

there's a thing I'd like to get your opinion that I'm still not sure of - for all of the other fields that overlap between different resources targeting the same service (like communities) we union/add them together and suddenly with localpref we'd take an "intersection" approach, saying that they all have to be the same value for a given advertisement.
is it wrong to treat overlapping localprefs similarly, where "adding" them together means choosing the highest value for the advertisement?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we shouldn't go this way if we can't expose the state of the running configuration easily.

Indeed, community is another (and I think the only other) attribute that should ideally be unique. I think that our design was simply applying what it was present in the BGP Adv regardless of the attributes' values. It fit perfect for communities since it can only have 2 states (present or not), but not so well for localpref, as you saw it in the issue you reported. We could choose the highest value but I think we will need to implement some additional logic in the speaker (bgp controller). It might get complicated there. See the TODO in updateAds.

Another option is to act as a mutating webhook by simply overriding the value of localpref if there is another one present (or using the one with the highest value). This way we won't need changes in the speaker. We can return a warning saying that the localpref was changed so that users can be aware of it.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fedepaol, any thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even if we had a mutating webhook, we wouldn't be able to be always on the safe side because we might have concurrent add operations that sneak in together.
We'd have to invalidate the config anyway in this scenario.

I am almost convinced that the "newly added node" invalidating the config is not that terrible.

If the current configuration is valid, it means that the user took care of labelling the nodes in such a way that they are partiotioned wrt localpref (or that they just don't care).
If a new node comes in and invalidates the configuration, it means it wasn't labeled properly,

What is going to happen is:

  • the existing speakers will keep the old configuration, without the extra node, but they will keep running
  • the speaker running on the new node won't advertise

also, any new config change won't have any effect.
Having the same node aimed by two localprefs is not right, the current behaviour is not predictable and resorting to priorities will just hide a misconfiguration under the carpet, because what a user really wants is to apply one single localpref from one node. They will find metallb is behaving, it's just that the overridden localpref might be more subtle to find, compared to metallb not working at all on one node (plus we have metrics for stale configs and we will land https://github.com/metallb/metallb/pull/1685/files soon or later)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so we continue with this approach then. I fixed the unit tests.

// Verify that BGP ADVs set a unique local preference value per BGP update.
for _, bgpAdv := range pool.BGPAdvertisements {
if adv.LocalPref != bgpAdv.LocalPref {
if adv.AggregationLength == bgpAdv.AggregationLength || adv.AggregationLengthV6 == bgpAdv.AggregationLengthV6 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's embed this if in the "are compatible" function

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

- get
- list
- watch
- apiGroups:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case we decide we want to validate against nodes, we'll have to add this to the charts too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I saw the tests failing due to this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create a PR on the operator for the same


// Verify that BGP ADVs set a unique local preference value per BGP update.
for _, bgpAdv := range pool.BGPAdvertisements {
if adv.LocalPref != bgpAdv.LocalPref {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can embed this one in the function I guess

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. No need to insert the attribute in the function. We might use it in the future for something else.

}

func advertisementsAreCompatible(newAdv, adv *BGPAdvertisement) bool {
if adv.AggregationLength == newAdv.AggregationLength || adv.AggregationLengthV6 == newAdv.AggregationLengthV6 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be flipped and return (I hope :D )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mlguerrero12 mlguerrero12 force-pushed the localprefwebhook branch 2 times, most recently from f6e0b7c to 2cef742 Compare February 7, 2023 07:59
@mlguerrero12 mlguerrero12 force-pushed the localprefwebhook branch 2 times, most recently from 6d9cee5 to 1d02c50 Compare March 14, 2023 11:19
@mlguerrero12
Copy link
Author

@fedepaol, is there a way to link a pr from the operator to this pr so that the ci doesn't fail?

@fedepaol
Copy link
Member

lgtm, thanks and sorry again for the delay!

@fedepaol fedepaol added this pull request to the merge queue Mar 20, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 20, 2023
@mlguerrero12
Copy link
Author

@fedepaol, e2e-use-operator will always fail

@fedepaol fedepaol added this pull request to the merge queue Mar 20, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 20, 2023
These changes ensure that the local preference is unique among BGP ADVs
with the same type of BGP update. It is still possible to have
BGP ADVs with different values of localPref if the set of BGP peers are different.
BGP ADVs with common BGP peers must not have common nodes. Checks are performed
for BGP ADVs that have the same aggregation length.

Fixed metallb#1739

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
@fedepaol fedepaol enabled auto-merge March 21, 2023 16:00
@fedepaol fedepaol added this pull request to the merge queue Mar 21, 2023
@fedepaol fedepaol merged commit 7b6d91d into metallb:main Mar 21, 2023
mlguerrero12 pushed a commit to mlguerrero12/metallb-operator that referenced this pull request Mar 22, 2023
Bump metallb to the latest version mainly due to changes
in the cluster role of the controller. See metallb/metallb#1820.

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
mlguerrero12 pushed a commit to mlguerrero12/metallb-operator that referenced this pull request Mar 22, 2023
Bump metallb to the latest version mainly due to changes
in the cluster role of the controller. See metallb/metallb#1820.

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
fedepaol pushed a commit to metallb/metallb-operator that referenced this pull request Mar 23, 2023
Bump metallb to the latest version mainly due to changes
in the cluster role of the controller. See metallb/metallb#1820.

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
mlguerrero12 pushed a commit to mlguerrero12/metallb-operator that referenced this pull request Mar 28, 2023
Bump metallb to the latest version mainly due to changes
in the cluster role of the controller. See metallb/metallb#1820.

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
mlguerrero12 pushed a commit to mlguerrero12/metallb-operator that referenced this pull request Mar 28, 2023
Bump metallb to the latest version mainly due to changes
in the cluster role of the controller. See metallb/metallb#1820.

Signed-off-by: Marcelo Guerrero Viveros <marguerr@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

localPref behavior not uniform

3 participants