Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DHCP Failover Peer IP is not correctly synced over XMLRPC if larger matching subnet appears before correct one in route table #5002

Closed
bimbar opened this issue May 20, 2021 · 16 comments
Assignees
Labels
cleanup Low impact changes
Milestone

Comments

@bimbar
Copy link

bimbar commented May 20, 2021

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

I have multiple DHCP networks that sync over XMLRPC.

On network 1, primary IP is 10.0.0.2, secondary ip is 10.0.0.3, network is 10/24, I have configured 10.0.0.3 on primary, and the secondary syncs that as 10.0.0.2, which is ok.
On network 2, primary IP is 10.0.2.130, secondary ip is 10.0.2.131, network is 10.0.2.128/25, I have configured 10.0.2.131 on primary, and the secondary syncs that as 127.0.0.1, which is wrong.
On network 3, primary IP is 10.0.2.18, secondary ip is 10.0.2.19, network is 10.0.2.16/28, I have configured 10.0.2.19 on primary, and the secondary syncs that as 127.0.0.1, which is also wrong.

So it seems to me that it doesn't find the correct interface if the subnet mask is not /24 or something like that.

Tip: to validate your setup was working with the previous version, use opnsense-revert (https://docs.opnsense.org/manual/opnsense_tools.html#opnsense-revert)

To Reproduce

Steps to reproduce the behavior:

  1. Create non /24 networks
  2. Enable XMLRPC
  3. Create a redundant DHCP Server for those networks
  4. In the XMLRPC Slave, the peer IP will be 127.0.0.1

Expected behavior

The DHCP peer IP on the slave should be the IP of the primary DHCP

Describe alternatives you considered

Correct it manually and disable XMLRPC

Screenshots

DHCP peer IP on primary
image

DHCP peer IP on slave
image

Relevant log files

If applicable, information from log files supporting your claim.

Additional context

Add any other context about the problem here.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 21.1.5-amd64
FreeBSD 12.1-RELEASE-p16-HBSD
OpenSSL 1.1.1k 25 Mar 2021

@AdSchellevis AdSchellevis added the support Community support label May 20, 2021
@AdSchellevis
Copy link
Member

this likely needs more info, 127.0.0.1 is quite weird considering how the sync determines the address.

if (is_array($transport_data['dhcpd'])) {
foreach($transport_data['dhcpd'] as $dhcpif => $dhcpifconf) {
if (isset($dhcpifconf['failover_peerip']) && $dhcpifconf['failover_peerip'] != '') {
$int = guess_interface_from_ip($dhcpifconf['failover_peerip']);
$intip = find_interface_ip($int);
$transport_data['dhcpd'][$dhcpif]['failover_peerip'] = $intip;
}
}
}

Relevant parts are the interface configuration and logical mapping (lan, wan, opt) of the master machine.

@bimbar
Copy link
Author

bimbar commented May 20, 2021

Does that help?

image

image

image

@AdSchellevis
Copy link
Member

not really, the matching issue might be related to one of the other interfaces or routes. At a first glance guess_interface_from_ip first checks netstat and if that doesn't work asks local routing.

/usr/bin/netstat -rnWf inet
/sbin/route -n get 10.0.2.131

Might shed more light on it.

@bimbar
Copy link
Author

bimbar commented May 20, 2021

root@scylla:~ # /usr/bin/netstat -rnWf inet
Routing tables

Internet:
Destination Gateway Flags Use Mtu Netif Expire
default 10.0.1.2 UGS 300756642 1500 igb2_vlan999
10.0.0.0/24 link#2 U 174349246 1500 igb1
10.0.0.0/20 127.0.0.1 UGSB 0 16384 lo0
10.0.0.0/8 127.0.0.1 UGSB 0 16384 lo0
10.0.0.1 link#2 UHS 0 16384 lo0
10.0.0.2 link#2 UHS 26860 16384 lo0
10.0.1.0/28 link#8 U 9043308 1500 igb2_vlan999
10.0.1.12 link#8 UHS 0 16384 lo0
10.0.1.14 link#8 UHS 0 16384 lo0
10.0.2.0/28 link#11 U 2025 1500 igb2_vlan900
10.0.2.1 link#11 UHS 0 16384 lo0
10.0.2.2 link#11 UHS 0 16384 lo0
10.0.2.16/28 link#12 U 1303251 1500 igb2_vlan901
10.0.2.17 link#12 UHS 0 16384 lo0
10.0.2.18 link#12 UHS 2 16384 lo0
10.0.2.128/25 link#14 U 7941848 1500 igb2_vlan902
10.0.2.129 link#14 UHS 0 16384 lo0
10.0.2.130 link#14 UHS 0 16384 lo0
10.0.3.0/28 link#13 U 3555 1500 igb2_vlan800
10.0.3.2 link#13 UHS 0 16384 lo0
10.0.4.0/24 link#9 U 1399389 1500 igb2_vlan104
10.0.4.1 link#9 UHS 0 16384 lo0
10.0.4.2 link#9 UHS 0 16384 lo0
10.0.5.0/24 link#10 U 650 1500 igb2_vlan105
10.0.5.1 link#10 UHS 0 16384 lo0
10.0.5.2 link#10 UHS 0 16384 lo0
127.0.0.1 link#5 UH 1866017 16384 lo0
172.16.0.0/12 127.0.0.1 UGSB 0 16384 lo0
172.26.0.0/16 link#15 U 0 2800 ztanv9hnl3ppnq9
172.26.0.2 link#15 UHS 0 16384 lo0
192.168.0.0/16 127.0.0.1 UGSB 0 16384 lo0
root@scylla:~ # /sbin/route -n get 10.0.2.131
route to: 10.0.2.131
destination: 10.0.2.128
mask: 255.255.255.128
fib: 0
interface: igb2_vlan902
flags: <UP,DONE,PINNED>
recvpipe sendpipe ssthresh rtt,msec mtu weight expire
0 0 0 0 1500 1 0

@AdSchellevis
Copy link
Member

AdSchellevis commented May 20, 2021

it doesn't have anything todo with the netmask, 10.0.0.0/8 127.0.0.1 UGSB 0 16384 lo0 just looks like a close enough match for the sync. using a routing daemon? (frr/bird/...)

@bimbar
Copy link
Author

bimbar commented May 21, 2021

No, but I did Nullroute private networks - should I not do that?

image

@mimugmail
Copy link
Member

No, only if you don't use them

@bimbar
Copy link
Author

bimbar commented May 21, 2021

So, it's a bug because the first match wins, instead of the most specific match?

@AdSchellevis
Copy link
Member

It's an omission, but also quite a specific edge case. Guessing networks will never be perfect, I'm not sure we should try to add more logic here to be honest.

@bimbar
Copy link
Author

bimbar commented May 21, 2021

I'm not sure it should be an edge case.
In my opinion it should be best practice to null-route private networks. That would be especially relevant in more enterprise deployments, where you would also mostly find clusters, so I don't think it's that unusual. Even more so if you use routing protocols.

@jasonpcrowley
Copy link
Contributor

I don't see this as an edge case. We have a number of firewalls with a 172.16.0.0/13 route related to an OpenVPN tunnel. We have others with static routes for 172.16.0.0/12 or other RFC1918 supernets. Because these show up in the routing table before the appropriate route and the DHCP server's IP is within the larger subnet, we're getting the wrong IP in our DHCP server configuration on the secondary DHCP server.

I think logic should be added to allow guess_interface_from_ip() to parse the route table in a more accurate manner.

@jasonpcrowley
Copy link
Contributor

Swapping the existing guess_interface_from_ip() function in interfaces.inc with this function seems to fix the problem for me. @AdSchellevis, does this look acceptable?

function guess_interface_from_ip($ipaddress)
{
    if (is_ipaddrv4($ipaddress)) {
        $family = "inet";
    } elseif (is_ipaddrv6($ipaddress)) {
        $family = "inet6";
    } else {
        return false;
    }

    /* create a route table we can search */
    exec("/usr/bin/netstat -rnWf " . $family, $output, $ret);

    /* search for the route with the largest subnet mask */
    $largest_mask = 0;
    $best_if = null;
    foreach ($output as $line) {
        $fields = preg_split("/\s+/", $line);
        if (is_subnet($fields[0])) {
            if (ip_in_subnet($ipaddress, $fields[0])) {
                list($ip, $mask) = explode('/', $subnet);
		if ($mask > $largest_mask) {
                    $best_if = $fields[5];
                    $largest_mask = $mask;
                }
            }
        }
    }
    if isset($best_if) {
      return $best_if;
    }

    $ret = exec_command("/sbin/route -n get {$ipaddress} | /usr/bin/awk '/interface/ { print \$2; };'");
    if (empty($ret)) {
        return false;
    }
    return $ret;
}

@jasonpcrowley
Copy link
Contributor

I also suggest the title of this issue be changed to:

DHCP Failover Peer IP is not correctly synced over XMLRPC if larger matching subnet appears before correct one in route table

@AdSchellevis
Copy link
Member

@jasonpcrowley just open a PR (mind the typo in explode('/', $subnet)) and we can probably accept it. The results look good on my end.

@bimbar bimbar changed the title DHCPv4 Failover Peer IP is not correctly synced over XMLRPC if subnet mask is not /24 DHCP Failover Peer IP is not correctly synced over XMLRPC if larger matching subnet appears before correct one in route table Oct 13, 2021
@jasonpcrowley
Copy link
Contributor

I have submitted PR #5281 to resolve this issue.

@AdSchellevis
Copy link
Member

In case anyone wants to try @jasonpcrowley's work, on OPNsense 21.7.3 you should be able to pull the patch using

opnsense-patch ab5cbcd

@AdSchellevis AdSchellevis added cleanup Low impact changes and removed support Community support labels Oct 14, 2021
@AdSchellevis AdSchellevis self-assigned this Oct 14, 2021
@fichtner fichtner added this to the 22.1 milestone Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cleanup Low impact changes
Development

No branches or pull requests

5 participants