Add the mii monitoring during bond creation #6598
This commit adds the miimon polling rate when the bond interface is created. The missing miimon configuration prevents the interface from failing over. This fix can be improved: the monitoring type and monitoring rate are currently static, so an addition to the UI could be useful for selecting the desired monitoring type (ARP or MII), the polling rate, and/or the ARP target.
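As a rough illustration of the selectable monitoring described above, the sketch below builds an iproute2 `ip link add ... type bond` command line with either MII or ARP monitoring. It is not the code in this PR, and the function name `bond_create_argv` and its parameters are hypothetical; it only assumes the standard iproute2 bond options `mode`, `miimon`, `arp_interval`, and `arp_ip_target`.

```python
from typing import List, Sequence


def bond_create_argv(
    name: str,
    mode: str = "active-backup",
    miimon_ms: int = 100,
    arp_interval_ms: int = 0,
    arp_ip_targets: Sequence[str] = (),
) -> List[str]:
    """Build (but do not run) an `ip link add` command for a bond.

    Hypothetical helper: exactly one of MII or ARP monitoring may be enabled.
    """
    if miimon_ms < 0 or arp_interval_ms < 0:
        raise ValueError("intervals must be non-negative")
    if miimon_ms and arp_interval_ms:
        raise ValueError("choose either MII or ARP monitoring, not both")
    if arp_interval_ms and not arp_ip_targets:
        raise ValueError("ARP monitoring needs at least one target")

    argv = ["ip", "link", "add", "name", name, "type", "bond", "mode", mode]
    if miimon_ms:
        # MII/carrier polling interval in milliseconds.
        argv += ["miimon", str(miimon_ms)]
    if arp_interval_ms:
        # ARP probing interval plus a comma-separated list of targets.
        argv += [
            "arp_interval", str(arp_interval_ms),
            "arp_ip_target", ",".join(arp_ip_targets),
        ]
    return argv


if __name__ == "__main__":
    print(" ".join(bond_create_argv("bond0")))
```

The command is only assembled, not executed, so the choice of monitoring type can be checked before anything touches the network configuration.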
Can one of the admins verify this patch? |
ok to test |
@amotin asking for your help here: what is FreeBSD behavior? I don't know much about the subject, but it seems that we should detect and use either ARP, or MII monitoring, depending on the driver and hardware. |
I'm curious how this effects |
Hi @yocalebo, miimon can be used with LACP, and doing so is recommended. In that case the LACP slave goes down if MII monitoring reports the interface as down or if the LACP PDUs aren't received. The difference is that an LACP PDU is sent every 30 s (slow rate) or every 1 s (fast rate), whereas the MII status is polled every 100 ms. The following is the output of an example LACP bond with miimon (under Proxmox, Debian-based), from cat /proc/net/bonding/bondX:
|
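For readers who want to check the same fields on their own system, here is a minimal sketch that parses the text of /proc/net/bonding/bondX into bond-level and per-slave key/value pairs. The `SAMPLE` text is a hand-written, abbreviated excerpt for illustration only (it is not the output posted in the comment above), and the function name `parse_bonding_status` is hypothetical.

```python
from typing import Dict, List

# Hand-written, abbreviated example; real output contains more fields.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bonding_status(text: str) -> Dict[str, object]:
    """Split bonding status text into bond-level fields and per-slave fields."""
    bond: Dict[str, str] = {}
    slaves: List[Dict[str, str]] = []
    current = bond  # fields before the first "Slave Interface" belong to the bond
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        else:
            # Keep the first occurrence of a key within each section.
            current.setdefault(key, value)
    return {"bond": bond, "slaves": slaves}


if __name__ == "__main__":
    status = parse_bonding_status(SAMPLE)
    print(status["bond"].get("MII Polling Interval (ms)"))
    for slave in status["slaves"]:
        print(slave["name"], slave.get("MII Status"))
```

On a live system the text would come from reading the file for the bond in question, for example `open("/proc/net/bonding/bond0").read()`, where `bond0` is an assumed interface name.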
@MatteoManzoni thanks for the detailed information. Apologies, as I should have clarified my vague question. However, my main question is does the |
No problem @yocalebo, I'll try to articulate my answer better. MII monitoring works in conjunction with the LACP PDUs; they operate at different levels. MII checks whether the carrier is present (Layer 1). The LACP PDU, on the other hand, communicates how the peers are load balancing the traffic, along with other aggregation data (Layer 2). E.g.: if the link is up (miimon senses the carrier) but misbehaving (wrong LACP PDU or no LACP PDU), the resulting LACP slave is down. Let me know if I was clearer this time.
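The two-level check described in this comment can be sketched as a simple conjunction: a slave is usable only when both the Layer 1 carrier check and the Layer 2 LACP PDU check pass. This is a conceptual model only, not how the kernel bonding driver is implemented; the names `PortState` and `slave_state` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PortState:
    carrier_up: bool   # Layer 1: what MII/carrier monitoring reports
    lacp_pdu_ok: bool  # Layer 2: valid LACP PDUs are still being received


def slave_state(port: PortState) -> str:
    """Return a human-readable state for one LACP slave (conceptual only)."""
    if not port.carrier_up:
        return "down: no carrier"
    if not port.lacp_pdu_ok:
        return "down: link up but LACP PDUs missing or wrong"
    return "up"


if __name__ == "__main__":
    print(slave_state(PortState(carrier_up=True, lacp_pdu_ok=False)))
```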
@MatteoManzoni very much appreciate the clear answer. Anyways, this sounds like |
@yocalebo I think that LACP is more like BFD, while MII is more like me going into the datacenter and looking at the cable to see if it is connected to my server (10 times a second, or even more); there is no session establishment between the peers. |
Hi all, |
@MatteoManzoni I agree with adding this. However, what happens in the rare circumstance that the underlying NIC does not support it? Am I right in understanding that this option depends on support in the NIC's driver? I realize that most modern NICs will support this, but TrueNAS is installed on a large assortment of devices in the wild 😄 I'm mostly just curious what will happen if we try to enable this on a NIC that doesn't support MII link monitoring. |
Hi @yocalebo, I did some quick research into MII support across in-tree kernel drivers and didn't find any driver that does not support MII. I'll try disabling MII at the firmware level on an old ConnectX NIC and report back what happens. |
FreeBSD NIC drivers do not expose hardware MII interfaces directly (MII/GMII/etc. are really hardware interfaces, and there are plenty of NICs not using them), so there is no concept of an MII monitoring interval in FreeBSD LAGG. Instead, every NIC driver that is able to detect link presence and speed just reports that information to the network stack in an abstracted way. The LAGG code uses those abstracted link status change reports. LACP uses them in combination with PDU receipt, since an immediate report from the NIC hardware via interrupt is always faster than a PDU timeout or even link status polling. According to https://www.kernel.org/doc/Documentation/networking/bonding.txt, the use_carrier option controls whether MII or the abstracted netif_carrier_ok() interface should be used; the latter should be used by default, while MII is deprecated. I'd investigate whether the NIC really doesn't implement the proper KPI. I don't like the idea of polling in general. |
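To see which link-monitoring settings a Linux bond is actually running with, the bonding driver exposes its options under sysfs. The sketch below reads `miimon`, `use_carrier`, and `arp_interval` from `/sys/class/net/<bond>/bonding/`; the function name `read_bond_monitoring` and the `sysfs_root` parameter are hypothetical, and an option file that is missing (for example on a kernel that does not provide it) is reported as `None` rather than guessed.

```python
from pathlib import Path
from typing import Dict, Optional


def read_bond_monitoring(
    bond: str, sysfs_root: str = "/sys/class/net"
) -> Dict[str, Optional[int]]:
    """Read link-monitoring options of a bond from sysfs.

    Returns None for any option whose file is missing or unparsable.
    """
    base = Path(sysfs_root) / bond / "bonding"
    result: Dict[str, Optional[int]] = {}
    for option in ("miimon", "use_carrier", "arp_interval"):
        try:
            # Each file holds a single value, e.g. "100\n".
            result[option] = int((base / option).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            result[option] = None
    return result


if __name__ == "__main__":
    # "bond0" is an assumed interface name.
    print(read_bond_monitoring("bond0"))
```

A `miimon` value of 0 together with an `arp_interval` of 0 would indicate that no link monitoring is configured, which is the situation this PR is trying to avoid.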
Given what Alexander said, I am inclined to close this PR, unless you have different thoughts @MatteoManzoni |
Hi all, sorry for the delay; after some time off I forgot about the issue. I've reimaged my cluster (with the newly released 21.04) and the problem is still present. Only adding miimon 100 and installing ifenslave with apt made my failover bond work. Unrelated: I've noticed that on 2 different machines the computed MAC address of the bond is the same, causing collisions, as attached
Do you mean the ifenslave package is enough? Or do you need that and the mii change? Perhaps more investigation is needed upstream into why it doesn't work without mii. Have you ever installed other OSes? I wonder how they handle it. |
Hi, I've run some tests on a dev server with a ConnectX-4 and Ubuntu 20.04. With netplan I created an active-backup bond and it didn't work either. Once I installed the ifenslave package, the bond became operational even without setting a miimon rate. I'll try reimaging my TrueNAS SCALE cluster to test this theory. |
I've reimaged the cluster; the problem is resolved after installing the ifenslave package. |
To make it clear, all we need to do is have that package in the base system and this change is no longer required? |
After the tests I've done, yes |
Sorry to bother you guys, it seems the problem still exists. I went through the Internet, but there was little information. So I decided to leave a comment here, continuing the discussion. System info How to reproduce In linux shell, run command: My temporary workaround And as for
I don't know the detail. But, when using netplan to create bond interface, if However, using |
The problem still exists which makes the LACP Failover mechanism (as used by Scale) unusable. |
This PR adds the miimon polling rate when the bond interface is created.
The missing miimon configuration prevents the interface from failing
over.
This fix can be improved: the monitoring type and monitoring rate are
currently static, so an addition to the UI could be useful for selecting
the desired monitoring type (ARP or MII), polling rate, and/or ARP target.
This issue is referenced here in the forum