
Conversation

@lxin
Contributor

@lxin lxin commented Aug 7, 2025

Some Cisco switches disable peer ports upon receiving LACP packets with agg=0 during negotiation, disrupting the network. Cisco attributes this behavior to ambiguity in the IEEE spec.

Such packets are sent when slaves are removed during NetworkManager bond reactivation. To prevent this, set slaves down before reactivation. This is a safe change, as NetworkManager bond reactivation will re-add and bring them up.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
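
A minimal sketch of the idea (the bond and interface names are hypothetical; the actual change lives in configure-ovs.sh):

    # Before NetworkManager reactivates the bond, force each slave link down
    # so that detaching it during reactivation emits no agg=0 LACPDU.
    for slave in $(cat /sys/class/net/bond0/bonding/slaves); do
      ip link set dev "$slave" down
    done
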
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 7, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Aug 7, 2025

@lxin: This pull request references RHEL-97088 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.20.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 7, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 7, 2025

Hi @lxin. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Aug 7, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lxin
Once this PR has been reviewed and has the lgtm label, please assign cheesesashimi for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lxin
Contributor Author

lxin commented Aug 19, 2025

@jcaamano

@jcaamano
Contributor

So throughout the existing code there is a very clear intention not to deactivate bond slaves. The reason is that we have had to deal with races in NetworkManager when we inadvertently caused multiple slaves to recycle at the same time and the bond ended up inactive.

Forcefully deactivating a slave, while achievable with a one-line change as shown in this PR, renders much of what the surrounding code says and does meaningless:

  • For example, the prefacing comment expects slaves to already be active, but it doesn't make any sense to keep that comment if we are bringing them down forcefully.
  • Later on we also wait on the connection to become implicitly active, but that doesn't make sense if we are bringing it down ourselves, because then we could just synchronously recycle the connection ourselves.

Other comments that I have:

  • Why are we bringing the link down behind the back of NetworkManager instead of using NetworkManager to do it?
  • When is that link brought back up? Is that done by NetworkManager of its own accord or is it done by us a few lines down?
  • What NetworkManager profile is going to be used to bring the link back up after it has been brought down behind NetworkManager's back? The old one or the cloned one?
  • What is the impact that it is going to have on the other slave? If a slave reactivates with a different profile, it might reactivate the master as well, which in turn reactivates the other slave. And then if two slaves recycle at the same time we are in that potential NM race territory.

This feels like a hack (just like what we already have in place, which makes me even less inclined to change it) to fix a specific problem; it might have an overall impact and does not consider the general approach of the surrounding code. We have other approaches available for customers to configure their own network (instead of relying on this script) that seem more appropriate for addressing customer-specific circumstances. The primary reason we are providing that alternative is to avoid overloading this script, which has clearly reached its limits and which we are looking to phase out. Continuing to address specific circumstances as if we could actually manage them for a customer here is not our preferred way forward either.

I know that we have some test coverage for bonds, but I am not sure how good it is or how we would go about verifying this type of change. That is another reason I don't feel inclined to change this just to address a specific manufacturer issue. But @rbbratta knows more about this, so I will let him speak up.

So my recap is:

  • This change seems to do the opposite of what the surrounding code is trying to achieve, which leaves the overall state of things not making much sense.
  • I am not so sure this change is as safe as we think it is in general.

cc @cybertron

@bengal

bengal commented Aug 19, 2025

Hi,

I'm not very familiar with this script; I am describing what I have found and discussed with Xin, and bringing the NM perspective. Sorry for the long comment, but I wanted to try to explain clearly how (I think) the patch works.

--

What happens without this patch is the following. Initially there is e.g. a bond bond0 with a port (slave) eth0 attached. During the execution of the script, the bond gets moved under an OVS bridge. Then at the end activate_nm_connections() gets called to ensure that all the needed NM connections are active.

Eventually, activate_nm_connections() activates the OVS bridge. Since all connections have property autoconnect-slaves enabled, all dependent connections are also (re-)activated by NM.
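
For reference, that property can be inspected with nmcli (the br-ex connection name is hypothetical here):

    nmcli -g connection.autoconnect-slaves connection show br-ex
    # prints 1 when NM (re-)activates dependent slave connections together with the master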

When NM is asked to reactivate bond0, it first detaches the port from the bond. When doing this, the kernel sends a LACP packet to the switch to signal that the port is no longer part of a channel aggregation. Then NM proceeds to reconfigure bond0 and eth0, and reattaches the port to the bond.

As mentioned by Xin, the switch reacts to the LACP packet by completely disabling the port, which is a behavior we want to prevent.

--

So the proposed solution is this. Before (re-)activating the OVS bridge (and so, also the bond and its port), we manually set the port link down with iproute2. Note that this doesn't have any effect on NM. The old profile stays up according to NM; it's only a change that affects the kernel link.

The only difference is that when NM reactivates the bond and detaches the ethernet port, now the kernel doesn't send the LACP packet because the interface is down. The rest of the procedure is the same, i.e. NM brings up the port and reattaches it to the bond.
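
A rough sketch of the sequence, with hypothetical interface and connection names:

    ip link set dev eth0 down   # kernel link down: eth0 stops transmitting, no LACPDU is sent
    nmcli connection up bond0   # NM detaches the already-down port (no agg=0 frame),
                                # reconfigures bond0 and eth0, then reattaches and brings the port back up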

Why are we bringing the link down behind the back of NetworkManager instead of using NetworkManager to do it?

NM is a high-level tool that allows users to configure interfaces in terms of connection profiles. When doing so, it performs the needed low-level operations (e.g. netlink, sysfs) in a predefined order. It doesn't allow users to control the order of such operations the way iproute2 does.

When is that link brought back up? Is that done by NetworkManager of its own accord or is it done by us a few lines down?

This is a good question. NM doesn't bring the link back automatically. The patch expects that some action later in activate_nm_connections will re-activate the connection on the bond slaves. I see that there are different actions that can bring up the slave again in the function:

  • setting autoconnect yes activates the slave if it was not active before. However, it doesn't do anything if the slave is already active;

  • if the master gets activated below via the explicit activation (nmcli conn up) or via the previous point, then the slave also gets reactivated.
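
In nmcli terms, the two actions above are roughly (connection names hypothetical):

    nmcli connection modify eth0-slave connection.autoconnect yes   # activates the slave only if it was not active
    nmcli connection up bond0                                       # with autoconnect-slaves, re-activates its slaves too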

However, given my limited understanding of the script, I am not sure the two scenarios above cover all the possibilities. @jcaamano, @cybertron, do you know? Are all slave connections supposed to be brought up again in the function? If not, the patch can possibly break things because the link will stay down forever.

What NetworkManager profile is going to be used to bring the link back up after it has been brought down behind NetworkManager's back? The old one or the cloned one?

As I said before, bringing down the interface with ip link doesn't have any effect on NM. The previously active profile stays up.

What is the impact that it is going to have on the other slave? If a slave reactivates with a different profile, it might reactivate the master as well, which in turn reactivates the other slave. And then if two slaves recycle at the same time we are in that potential NM race territory.

See the previous reply, there is no change of profile.

@jcaamano
Contributor

Thanks for chiming in @bengal.

Note that this doesn't have any effect on NM. The old profile stays up according to NM; it's only a change that affects the kernel link.

We are not really sure of the state of a connection when the change on this PR brings the link down. Let's say it can be in any state (not active yet but to be activated, activating, active); you don't foresee any internal issues with NM regardless of the state NM might be in with respect to this connection when the link is set down?

  • if the master gets activated below via the explicit activation (nmcli conn up) or via the previous point, then the slave also gets reactivated.

Master would be inactive when this script runs on boot, but the script could be run at any other time when master is already active, and then this wouldn't happen.
So if you run this script on a live system, which it is prepared for, then I guess it would leave the slave links down.

As previously mentioned, the general tone of this script is to be as minimally disruptive to NM as possible, because we have faced issues in the past that were difficult for us to understand and fix.

@jcaamano
Contributor

When NM is asked to reactivate bond0

Are you referring to us re-activating the bond after the activation of the OVS bridge has already activated it? This shouldn't be happening.

Or are you referring to the activation of a bond with a different profile that happens when the OVS bridge is activated?

@bengal

bengal commented Aug 20, 2025

Thanks for chiming in @bengal.

Note that this doesn't have any effect on NM. The old profile stays up according to NM; it's only a change that affects the kernel link.

We are not really sure of the state of a connection when the change on this PR brings the link down. Let's say it can be in any state (not active yet but to be activated, activating, active); you don't foresee any internal issues with NM regardless of the state NM might be in with respect to this connection when the link is set down?

If the master is going to be activated later and has "autoconnect-slaves=yes", I don't foresee any issues. The master will always bring up the slave, no matter what state it is in.

However, from the discussion that follows, it seems that there is no guarantee that the master will go through another activation if it's already connected. I think this can make the patch unreliable, because the slave's link will stay down.

  • if the master gets activated below via the explicit activation (nmcli conn up) or via the previous point, then the slave also gets reactivated.

Master would be inactive when this script runs on boot, but the script could be run at any other time when master is already active, and then this wouldn't happen. So if you run this script on a live system, which it is prepared for, then I guess it would leave the slave links down.

Yes, I think that's a problem.

@jcaamano
Contributor

Aside from the problems here, this script might not be the only place where the state of the bond is changed. A user could use knmstate to reconfigure different aspects of the bridge interface (DNS, MTU, ...) that could potentially cause a similar issue.

Another problem that I thought about: what if one of the slaves ends up not being active for whatever other reason? How would the external switch notice if that packet is not being sent?

@cybertron
Member

Before (re-)activating the OVS bridge (and so, also the bond and its port), we manually set the port link down with iproute2.

Why does this not trigger the kernel to send the agg 0 packet?

A user could use knmstate to reconfigure different aspects of the bridge interface (DNS, MTU, ...) that could potentially cause a similar issue.

I don't think we need to worry about that. We don't allow knmstate modification of anything associated with br-ex unless the NMState custom br-ex feature is used, in which case configure-ovs does not run.

@cybertron
Member

I think I can answer my own question. Based on this:

When NM is asked to reactivate bond0, it first detaches the port from the bond.

I'm guessing that setting the link down with ip doesn't actually detach it from the bond, which I suppose raises the question of whether the switch will correctly route traffic. But since the switch seems to be misbehaving when we do tell it the interface is no longer part of the bond, I guess it's a wash?
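
One way to check that, assuming the standard bonding proc interface (interface names hypothetical):

    ip link set dev eth0 down
    grep 'Slave Interface' /proc/net/bonding/bond0   # eth0 is still listed as a slave of the bond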

@bengal

bengal commented Aug 29, 2025

Before (re-)activating the OVS bridge (and so, also the bond and its port), we manually set the port link down with iproute2.

Why does this not trigger the kernel to send the agg 0 packet?

When the port is detached from the bond, the kernel sends a LACPDU frame to indicate that it is no longer part of the aggregation.

Instead, if the port is brought down, it stops transmitting any frames immediately, without sending any LACPDU.
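
For anyone who wants to observe the difference: LACPDUs use the Slow Protocols ethertype, so they can be watched with tcpdump while detaching vs. downing the port (interface name hypothetical):

    tcpdump -e -i eth0 ether proto 0x8809   # shows LACP frames leaving/arriving on the port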

@lxin
Contributor Author

lxin commented Sep 4, 2025

Closing this, as a better fix has been created: #5274

@lxin lxin closed this Sep 4, 2025