Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter IPv4 anycast specific ARP packets from/to bat0 and always use anycast MAC address for them #1490

Merged
merged 2 commits into from
Jul 22, 2018

Conversation

ecsv
Copy link
Contributor

@ecsv ecsv commented Jul 22, 2018

The commit b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)") moves the IPv4 prefix from the local-port interface to br-client. A client requesting an IPv4 connection to the IPv4 anycast address of the node (the device running gluon) will create following packets:

  1. ARP packet from client to get the MAC of the mac address of the anycast IPv4 address
  2. ARP reply from node to client with the anycast MAC address for the IPv4 anycast address
  3. IPv4 packet from client which requires reply (for example ICMP echo request)
  4. ARP request for the client MAC address for its IPv4 address in prefix4 (done with the mac address of br-client and transmitted over br-client)
  5. IPv4 packet from node (transmitted over br-client with br-client MAC address) as reply for the client IPv4 packet (for example ICMP echo reply)

The step 4 is extremely problematic here. ARP replies with the anycast IPv4 address must not be submitted or received via bat0 - expecially not when it contains an node specific MAC address as source. When it is still done then the wrong MAC address is stored in the batadv DAT cache and ARP packet is maybe even forwarded to clients. This latter is especially true for ARP requests which are broadcast and will be flooded to the complete mesh.

Clients will see these ARP packets and change their own neighbor IP (translation) table. They will then try to submit the packets for IPv4 anycast addresses to the complete wrong device in the mesh. This will for example break the access to the status page to the connected device or the anycast DNS forwarder implementation. Especially the latter causes extreme latency when clients try to connect to server using a domain name or even breaks the connection setup process completely. Both are caused by the unanswered DNS requests which at first glance look like packet loss.

An node must therefore take care of:

  • not transmitting ARP packets related to the anycast IPv4 address over bat0
  • drop ARP packets related to the anycast IPv4 when they are received on bat0 from a still broken node
  • don't accept ARP packets related to the anycast IPv4 replies on local node when it comes from bat0

The step 4 and 5 are problematic here because packets use the node specific MAC addresses from br-client instead of the anycast MAC address. The client will receive the ARP packet with the node specific MAC address and change their own neighbor IP (translation) table. This will for example break the access to the status page to the connected device or the anycast DNS forwarder implementation when the client roams to a different node.


This is for the master branch. If you don't want to revert b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (#1312)") then please only accept the gluon-mesh-batman-adv change and implement a better solution which works better for you and still uses only the anycast MAC in all communication from/to the anycast IPv4 address.

See #1488 for the related ticket.

ecsv added 2 commits July 22, 2018 11:21
The commit b3762fc ("gluon-client-bridge: move IPv4 local subnet route
to br-client (freifunk-gluon#1312)") moves the IPv4 prefix from the local-port interface
to br-client. A client requesting an IPv4 connection to the IPv4 anycast
address of the node (the device running gluon) will create following
packets:

1. ARP packet from client to get the MAC of the mac address of the anycast
   IPv4 address
2. ARP reply from node to client with the anycast MAC address for the IPv4
   anycast address
3. IPv4 packet from client which requires reply (for example ICMP echo
   request)
4. ARP request for the client MAC address for its IPv4 address in prefix4
   (done with the mac address of br-client and transmitted over br-client)
5. IPv4 packet from node (transmitted over br-client with br-client MAC
   address) as reply for the client IPv4 packet (for example ICMP echo
   reply)

The step 4 is extremely problematic here. ARP replies with the anycast IPv4
address must not be submitted or received via bat0 - expecially not when it
contains an node specific MAC address as source. When it is still done then
the wrong MAC address is stored in the batadv DAT cache and ARP packet is
maybe even forwarded to clients. This latter is especially true for ARP
requests which are broadcast and will be flooded to the complete mesh.

Clients will see these ARP packets and change their own neighbor IP
(translation) table. They will then try to submit the packets for IPv4
anycast addresses to the complete wrong device in the mesh. This will for
example break the access to the status page to the connected device or the
anycast DNS forwarder implementation. Especially the latter causes extreme
latency when clients try to connect to server using a domain name or even
breaks the connection setup process completely. Both are caused by the
unanswered DNS requests which at first glance look like packet loss.

An node must therefore take care of:

* not transmitting ARP packets related to the anycast IPv4 address over
  bat0
* drop ARP packets related to the anycast IPv4 when they are received on
  bat0 from a still broken node
* don't accept ARP packets related to the anycast IPv4 replies on local
  node when it comes from bat0

Fixes: b3762fc ("gluon-client-bridge: move IPv4 local subnet route to br-client (freifunk-gluon#1312)")
freifunk-gluon#1312)"

The commit b3762fc ("gluon-client-bridge: move IPv4 local subnet route
to br-client (freifunk-gluon#1312)") moves the IPv4 prefix from the local-port interface
to br-client. A client requesting an IPv4 connection to the IPv4 anycast
address of the node (the device running gluon) will create following
packets:

1. ARP packet from client to get the MAC of the mac address of the anycast
   IPv4 address
2. ARP reply from node to client with the anycast MAC address for the IPv4
   anycast address
3. IPv4 packet from client which requires reply (for example ICMP echo
   request)
4. ARP request for the client MAC address for its IPv4 address in prefix4
   (done with the mac address of br-client and transmitted over br-client)
5. IPv4 packet from node (transmitted over br-client with br-client MAC
   address) as reply for the client IPv4 packet (for example ICMP echo
   reply)

The step 4 and 5 are problematic here because packets use the node specific
MAC addresses from br-client instead of the anycast MAC address. The client
will receive the ARP packet with the node specific MAC address and change
their own neighbor IP (translation) table. This will for example break the
access to the status page to the connected device or the anycast DNS
forwarder implementation when the client roams to a different node.

This reverts commit b3762fc and adds an
upgrade code to remove local_node_route on on existing installations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0. type: bug This is a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants