IPv6 outbound NAT doesn't update translation target when interface address changes #7412

Open
2 tasks done
maurice-w opened this issue Apr 25, 2024 · 5 comments
Labels
cleanup Low impact changes

@maurice-w
Member

Describe the bug

When an IPv6 outbound NAT rule exists on a SLAAC WAN interface and the interface address changes, the translation target isn't updated. Instead, the outbound NAT rule keeps using the deprecated address. As a result, all NATed connections fail.
Even a "disable - apply - enable - apply" cycle of the outbound NAT rule does not fix this; the rule keeps using the deprecated address.

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'Interfaces: [WAN]', set the IPv6 Configuration Type to SLAAC, save & apply
  2. Go to 'Firewall: NAT: Outbound', add a rule (interface WAN, TCP/IP version IPv6), save & apply
  3. Test the NAT, e.g. by performing a ping test with the source address set to the LAN interface's address
  4. Wait for the WAN address to change (upstream router advertises a new prefix, WAN interface autoconfigures a new address and marks the old address as deprecated)
  5. Repeat the test from step 3: the test now fails
  6. Perform a packet capture to verify that the source address of NATed outbound packets is indeed the old, deprecated address
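Steps 3 and 6 can also be done from a shell on the router. The sketch below is only an illustration: `hn5` is the WAN interface from this report, while `LAN_ADDR` and `TARGET` are hypothetical placeholders (documentation prefix) to replace with a real LAN address and a reachable IPv6 host. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper for steps 3 and 6; adjust the three variables below.
WAN_IF=hn5                  # WAN interface from the report
LAN_ADDR=2001:db8:2::1      # hypothetical LAN interface address (NAT source)
TARGET=2001:db8:ffff::1     # hypothetical IPv6 host reachable via the WAN

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 6: capture a few ICMPv6 packets to TARGET leaving the WAN interface;
# -n keeps addresses numeric so the translated source address is visible.
capture_cmd() {
    run tcpdump -n -i "$WAN_IF" -c 5 "icmp6 and dst host $TARGET"
}

# Step 3: ping with the LAN address as source so the packet is subject to
# the outbound NAT rule on the WAN interface.
ping_cmd() {
    run ping6 -c 3 -S "$LAN_ADDR" "$TARGET"
}
```

Start `capture_cmd` in one shell, then run `ping_cmd` in another; after the prefix change, the source address in the capture shows which address the NAT rule translates to.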

Expected behavior

IPv6 outbound NAT rules should update the translation target when the address is deprecated and the interface has a new, valid address.

Describe alternatives you considered

Trigger a link down / up event on the WAN interface.

Relevant log files

root@router:~ # ifconfig hn5
hn5: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: WAN_LTE (opt8)
        options=80018<VLAN_MTU,VLAN_HWTAGGING,LINKSTATE>
        ether 00:12:34:56:78:9a
        inet6 fe80::212:34ff:fe56:789a%hn5 prefixlen 64 scopeid 0xa
        inet6 2001:db8:1:a:212:34ff:fe56:789a prefixlen 64 deprecated autoconf
        inet6 2001:db8:1:b:212:34ff:fe56:789a prefixlen 64 autoconf
        media: Ethernet autoselect (10Gbase-T <full-duplex>)
        status: active
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>

(Packet capture shows all NATed outbound packets have deprecated source address 2001:db8:1:a:212:34ff:fe56:789a.)

Environment

OPNsense 24.1.6 (amd64)
Hyper-V Gen2

@fichtner
Member

fichtner commented Apr 26, 2024

The relevant rule can be found in /tmp/rules.debug, but I suspect this is an oddity of the ":0" syntax. SLAAC addresses are stateless, so we have no real way to track them... Ideally, deprecated addresses would be flushed promptly instead of lingering when a new autoconf address is available. This might also be due to the ordering of the addresses in the kernel, BTW.

For some reason I doubt the "deprecation" process, because a deprecated address should still be usable?! I thought that was the whole point of having it.

Cheers,
Franco

@fichtner fichtner added the support Community support label Apr 26, 2024
@maurice-w
Member Author

root@router:~ # cat /tmp/rules.debug | grep "nat on hn5"
nat on hn5 inet6 from !(hn5) to any -> (hn5:0) port 1024:65535 # IPv6 NAT for LTE WAN

You're right, this is probably a kernel / pf issue. According to pf.conf(5), "the rule is automatically updated whenever the interface changes its address". I'd say deprecating an address while simultaneously adding a new non-deprecated one qualifies as an address change, but apparently pf doesn't think so. Not sure whether this is intentional or pf just isn't aware of the deprecation status.
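As we read pf.conf(5), putting the interface name in parentheses, as in `(hn5:0)`, makes pf track the interface address dynamically, and the `:0` modifier excludes interface aliases, so a single address is used as the translation target. Which address that is cannot be seen from the rule itself. The sketch below assumes (unverified) that it corresponds to the first non-link-local `inet6` entry in `ifconfig` order, which would fit the suspicion about kernel address ordering. The function name `nat_target_candidates` is hypothetical; it prints that first address and the first non-deprecated one, and returns 1 when they differ.

```shell
#!/bin/sh
# Reads `ifconfig <if>` output on stdin and prints:
#   line 1: the first non-link-local inet6 address
#           (ASSUMED to be what "(if:0)" resolves to)
#   line 2: the first such address that is not flagged "deprecated"
# Returns 1 when the two differ, i.e. the assumed NAT target is not preferred.
nat_target_candidates() {
    awk '
        $1 == "inet6" && $2 !~ /^fe80:/ {
            if (first == "") first = $2
            if (preferred == "" && $0 !~ / deprecated( |$)/) preferred = $2
        }
        END {
            print first
            print preferred
            if (first != preferred) exit 1
        }'
}
```

Fed the `ifconfig hn5` output from the report, the first line is the deprecated `2001:db8:1:a:...` address and the second is the new `2001:db8:1:b:...` address, so `ifconfig hn5 | nat_target_candidates` returns 1.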

There also seems to be an issue with my upstream router. When its LTE modem reconnects, the router sends RAs with both the old (now invalid) and the new prefix. The old prefix is advertised with a zero preferred lifetime (which deprecates it), but both prefixes keep getting advertised with a one hour valid lifetime. This indeed indicates that the old prefix is still usable, which is not the case. I will raise this issue with the vendor of the upstream router.
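For reference, RFC 4862 defines the address states the two lifetimes produce: once the preferred lifetime has expired the address is deprecated, meaning it may still be used for existing communications but should not be used to start new ones when a non-deprecated address is available; once the valid lifetime has expired the address is invalid and removed. The function below is a minimal sketch of that mapping (the name `slaac_state` is hypothetical). On FreeBSD, `ndp -p` lists the prefixes together with the lifetimes currently applied to them.

```shell
#!/bin/sh
# Classify an autoconfigured address from its remaining lifetimes (seconds),
# following the SLAAC address states of RFC 4862:
#   valid lifetime expired     -> invalid (address is removed)
#   preferred lifetime expired -> deprecated (still valid, not for new use)
#   otherwise                  -> preferred
slaac_state() {
    pltime=$1
    vltime=$2
    if [ "$vltime" -le 0 ]; then
        echo invalid
    elif [ "$pltime" -le 0 ]; then
        echo deprecated
    else
        echo preferred
    fi
}
```

The old prefix described above, advertised with a preferred lifetime of 0 and a valid lifetime of 3600 seconds, maps to `slaac_state 0 3600`, i.e. `deprecated`: still valid from the host's point of view, which is why the address lingers.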

@fichtner fichtner self-assigned this May 6, 2024
@fichtner fichtner added cleanup Low impact changes and removed support Community support labels May 6, 2024
@fichtner fichtner added this to the 24.7 milestone May 6, 2024
@fichtner
Member

fichtner commented May 6, 2024

@maurice-w Thanks for confirming. I'll take a look but can't make any promises.

While I have your attention: https://forum.opnsense.org/index.php?topic=37813.msg197098#msg197098

Would you mind leaving your opinion? Removing the code would be easy, but it should be for the right reason.

Cheers,
Franco

@maurice-w
Member Author

Thanks @fichtner. I performed additional testing. This affects not only deprecated addresses, but invalid / removed addresses, too:

When the upstream router stops advertising the old prefix, the old autoconf address eventually expires and gets removed. The interface then only has the new, valid address. But pf keeps using the old, non-existing address as the translation target.
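To confirm this second case, the address seen in the packet capture can be checked against the interface's current address list. A minimal sketch (the function name `addr_status` is hypothetical); it compares addresses as strings, so the address must be given in the same compressed form that `ifconfig` prints.

```shell
#!/bin/sh
# Reads `ifconfig <if>` output on stdin and reports whether the address given
# as $1 is configured: prints "absent", "deprecated" or "present".
addr_status() {
    awk -v addr="$1" '
        $1 == "inet6" && $2 == addr {
            found = 1
            for (i = 3; i <= NF; i++) if ($i == "deprecated") dep = 1
        }
        END {
            if (!found) print "absent"
            else if (dep) print "deprecated"
            else print "present"
        }'
}
```

`ifconfig hn5 | addr_status 2001:db8:1:a:212:34ff:fe56:789a` would print `deprecated` while the old address lingers and `absent` once it has expired; NATed packets still carrying it at that point confirm the stale translation target.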

We might be able to work around this, but since it seems to be a pf bug, I think this is where it should get fixed. Before I raise this issue with the pf folks, do you have any thoughts?

I'll respond to the other topic on the forum.

Cheers
Maurice

@fichtner
Member

fichtner commented May 7, 2024

> When the upstream router stops advertising the old prefix, the old autoconf address eventually expires and gets removed. The interface then only has the new, valid address. But pf keeps using the old, non-existing address as the translation target.

This could mean two things:

  1. Does a filter reload fix it?
  2. If the answer to 1.) is no, this could also be a sticky state issue.
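The two checks could be run roughly as follows. This is a sketch under assumptions: `configctl filter reload` is used to reload the ruleset through OPNsense's configd, `OLD_ADDR` is the stale address from this report, and the state check only looks for that address anywhere in the `pfctl -ss` output instead of parsing the state format. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the two checks; OLD_ADDR is the stale address from this report.
OLD_ADDR=2001:db8:1:a:212:34ff:fe56:789a

# Check 1: regenerate and reload the firewall ruleset through configd.
reload_filter() {
    configctl filter reload
}

# Check 2: count pf states that still mention the old address.
# Reads `pfctl -ss` output on stdin; prints 0 when nothing matches.
count_sticky_states() {
    grep -c -F -- "$OLD_ADDR" || true
}
```

If, after `reload_filter`, `pfctl -ss | count_sticky_states` still prints a non-zero count, existing states are pinned to the old address; `pfctl -F states` clears them, at the cost of dropping every connection.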

> We might be able to work around this, but since it seems to be a pf bug, I think this is where it should get fixed. Before I raise this issue with the pf folks, do you have any thoughts?

Don't tell them you found the bug on OPNsense. The pf maintainer is notorious for blocking bug reports and even some bugfixes from getting into FreeBSD. Yes, we reached that low point a while ago already.

Cheers,
Franco
