Filter IPv4 anycast specific ARP packets from/to bat0 and always use anycast MAC address for them #1490
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The commit b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)") moves the IPv4 prefix from the local-port interface to br-client. A client requesting an IPv4 connection to the IPv4 anycast address of the node (the device running gluon) will create following packets:
The step 4 is extremely problematic here. ARP replies with the anycast IPv4 address must not be submitted or received via bat0 - expecially not when it contains an node specific MAC address as source. When it is still done then the wrong MAC address is stored in the batadv DAT cache and ARP packet is maybe even forwarded to clients. This latter is especially true for ARP requests which are broadcast and will be flooded to the complete mesh.
Clients will see these ARP packets and change their own neighbor IP (translation) table. They will then try to submit the packets for IPv4 anycast addresses to the complete wrong device in the mesh. This will for example break the access to the status page to the connected device or the anycast DNS forwarder implementation. Especially the latter causes extreme latency when clients try to connect to server using a domain name or even breaks the connection setup process completely. Both are caused by the unanswered DNS requests which at first glance look like packet loss.
An node must therefore take care of:
The step 4 and 5 are problematic here because packets use the node specific MAC addresses from br-client instead of the anycast MAC address. The client will receive the ARP packet with the node specific MAC address and change their own neighbor IP (translation) table. This will for example break the access to the status page to the connected device or the anycast DNS forwarder implementation when the client roams to a different node.
This is for the master branch. If you don't want to revert b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)") then please only accept the gluon-mesh-batman-adv change and implement a better solution which works better for you and still uses only the anycast MAC in all communication from/to the anycast IPv4 address.
See #1488 for the related ticket.