net: bridge: avoid uselessly making offloaded ports promiscuous
The bridge driver's intention by making ports promiscuous is to turn off their RX filters such that these ports receive packets with any MAC DA.

A quick survey of the kernel drivers that call switchdev_bridge_port_offload() shows that these either do not implement ndo_change_rx_flags() at all, or explicitly ignore changes to IFF_PROMISC (am65_cpsw_slave_set_promisc, cpsw_set_promiscious, ocelot_set_rx_mode).

This makes sense, because hardware that is purpose-built to do L2 forwarding generally already knows it should accept any MAC DA on its ports.

That is not to say that IFF_PROMISC makes no sense for switchdev drivers. For example, DSA has the concept of multiple address databases (this is achieved by effectively partitioning the FDB: reserve a database - FID - for each port operating as standalone, a FID for each VLAN-unaware bridge, a FID for each bridge VLAN).

The address database of a standalone port is managed through the standard dev->uc and dev->mc lists and is used to filter towards the host the addresses required for local termination. The bridge-related address databases are managed using switchdev (SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE).

IFF_PROMISC is intrinsically connected to dev->uc and dev->mc (see the implementation of __dev_set_rx_mode(), which puts the interface in promiscuous mode if the unicast list isn't empty but the device doesn't support IFF_UNICAST_FLT), and therefore to what DSA implements as the standalone port address database (there, an entry in dev->uc means "forward it to CPU", the absence of it means "drop it", and promiscuity means "put the CPU in the flood mask of packets with unknown MAC DA").

In contrast, there is no IFF_PROMISC equivalent for the FDB entries notified through switchdev (and therefore for the bridge-related address databases), because none is needed.
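To make the standalone port semantics above concrete, here is a minimal userspace sketch of them. It is not kernel code: struct standalone_port, needs_promisc_fallback() and host_receives() are hypothetical names invented for illustration, and the model only mirrors the behavior as described in this message (an address list, IFF_UNICAST_FLT support, and a promiscuity flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical model of a standalone port's host filtering state. */
struct standalone_port {
	const unsigned char (*uc)[ETH_ALEN]; /* models the dev->uc list */
	size_t uc_count;
	bool supports_unicast_flt;           /* models IFF_UNICAST_FLT */
	bool promisc;                        /* models IFF_PROMISC */
};

/*
 * Models the fallback described above: a non-empty unicast list on a
 * device without unicast filtering forces promiscuous mode.
 */
static bool needs_promisc_fallback(const struct standalone_port *p)
{
	assert(p != NULL);
	return p->uc_count > 0 && !p->supports_unicast_flt;
}

/* Would a packet with destination address 'da' reach the CPU? */
static bool host_receives(const struct standalone_port *p,
			  const unsigned char da[ETH_ALEN])
{
	assert(p != NULL);
	for (size_t i = 0; i < p->uc_count; i++)
		if (memcmp(p->uc[i], da, ETH_ALEN) == 0)
			return true; /* entry present: forward it to CPU */

	/* Unknown MAC DA: reaches the CPU only through the flood mask. */
	return p->promisc;
}
```

In this model, a port with one known address and promisc == false passes that address to the host and drops everything else; setting promisc to true additionally lets unknown addresses through.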
In this model, the bridge driver, which is only trying to secure its reception of packets, is in fact overstepping, because it manages something which is outside of its competence: the host flooding of the standalone port database, when in fact that database will not be the one used by packets handled by the bridging service.

In turn, this prevents further optimizations from being applied in particular to DSA, and in general to any switchdev driver. A desirable goal is to eliminate host flooding of packets which are known to be unnecessary and only dropped later in software [1].

In an ideal world with ideal hardware:

(a) flooding would be controlled per FID rather than per port
(b) egress flooding towards a certain port could be controlled independently depending on the actual ingress port, rather than globally, regardless of ingress port

When (a) does not hold true, the bridge will force the port to keep host flooding enabled, even if this is not otherwise needed (there is no station behind a "foreign interface" that requires software forwarding; the only packets sent by the accelerator to the CPU are for termination purposes).

When (b) does not hold true, it means that a 4-port switch where 1 port is standalone and 3 are bridged (again with no foreign interface) will have host flooding enabled for all 4 ports (including the standalone port, because the bridge is keeping host flooding enabled, and all ports are serviced by the same CPU port).

Since DSA is a framework and not just a driver for a single device, these nonidealities do hold true, and the bridge unnecessarily setting IFF_PROMISC on its ports is a real roadblock towards disabling host flooding in practical scenarios.

The proposed solution is to make the bridge driver stop touching port promiscuity for offloaded switchdev ports, and let them manage promiscuity by themselves as they see fit.
The bridge driver can achieve this by looking at net_bridge_port :: offload_count, which is updated voluntarily by switchdev drivers using switchdev_bridge_port_offload(). br_manage_promisc() is already called by nbp_update_port_count() on a port join/leave, and the implicit assumption is that switchdev_bridge_port_offload() has already been called by that time (from netdev_master_upper_dev_link).

[1] https://www.youtube.com/watch?v=B1HhxEcU7Jg

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
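
For illustration only, the idea of skipping offloaded ports can be sketched in userspace C as below. This is not the actual patch: struct model_port and model_manage_promisc() are hypothetical names, wants_promisc stands in for whatever the bridge's existing logic would decide, and only the offload_count check reflects what this message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct net_bridge_port. */
struct model_port {
	int offload_count;  /* nonzero once a switchdev driver offloaded the port */
	bool wants_promisc; /* what the bridge's existing logic would choose */
	bool promisc;       /* promiscuity as last set on the port */
};

/*
 * Apply the bridge's promiscuity decision to every port that is not
 * offloaded, and return how many ports were actually touched.
 */
static size_t model_manage_promisc(struct model_port *ports, size_t n)
{
	size_t touched = 0;

	assert(ports != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Offloaded port: leave promiscuity to the switchdev driver. */
		if (ports[i].offload_count > 0)
			continue;

		ports[i].promisc = ports[i].wants_promisc;
		touched++;
	}
	return touched;
}
```

With one offloaded port and two software ports, only the two software ports have their promiscuity updated; the offloaded port keeps whatever state its driver chose.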