Adding IP Alias to tracking LAN interface breaks tracking #3310

Closed
maurice-w opened this issue Mar 10, 2019 · 92 comments

@maurice-w
Member

maurice-w commented Mar 10, 2019

Describe the bug
TL;WR: Adding an IPv6 Alias to a LAN interface which is configured to track an IPv6 WAN interface causes the LAN interface to stop tracking after a reboot. It subsequently only uses the IP Alias and hosts in the LAN lose Internet connectivity.

To Reproduce
Steps to reproduce the behavior:

  1. Set the WAN interface IPv6 configuration type to DHCPv6.
  2. Configure the LAN interface to track the WAN interface and enable Manual configuration. (The ULA prefix which we add in the next step won't be advertised unless we "allow the manual adjustment of DHCPv6 and Router Advertisements", although we don't actually adjust anything manually. This might be considered another bug, but let's not get into this here.)
  3. Add a Virtual IP to the LAN interface: Type IP Alias, address fd01:2:3:4::1/64 (ULA)
  4. On the Dashboard, restart the radvd service. This seems to be required to make it pick up the additional interface address and advertise the additional prefix.
  5. Check that hosts in the LAN have both GUAs and ULAs and working Internet connectivity. In my tests this worked fine at this point.
  6. Reload DHCP on WAN interface or reboot the router.

Expected behavior
LAN interface has an auto-generated GUA as well as the manually added ULA and advertises both prefixes. LAN hosts have GUAs and ULAs and working Internet connectivity.

Actual behavior
After a WAN DHCP Reload or a router reboot, the LAN interface doesn't track the WAN interface, uses the IP Alias as its only address and only advertises the ULA prefix. Hosts in the LAN only have ULAs and lose Internet connectivity.

Additional context
This has been mentioned on the forum here and here but I couldn't find a matching bug report.
This is important because using an IP Alias seems to be the only way to add ULAs to a tracking LAN interface. Using both GUAs and ULAs is recommended when the prefix delegated by the ISP isn't static. (For example, OpenWrt adds ULAs to all LANs by default.)
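To make the GUA/ULA distinction concrete, here is a minimal sketch using Python's standard `ipaddress` module. It assumes ULAs come from fc00::/7 and global unicast addresses from 2000::/3. The helper name `classify` is hypothetical, and the GUA is a documentation address used only for illustration.

```python
import ipaddress

ULA = ipaddress.ip_network("fc00::/7")   # unique local addresses
GUA = ipaddress.ip_network("2000::/3")   # global unicast range

def classify(addr: str) -> str:
    """Label an IPv6 address as ULA, GUA or other (hypothetical helper)."""
    ip = ipaddress.ip_address(addr)
    if ip in ULA:
        return "ULA"
    if ip in GUA:
        return "GUA"
    return "other"

print(classify("fd01:2:3:4::1"))    # the IP Alias from the steps above
print(classify("2001:db8:1:2::1"))  # illustrative tracked address
```

A LAN host in the expected state would hold one address of each kind.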

Environment
OPNsense 19.1.3-amd64

Update
I initially mixed up advertising prefixes and routes. I updated the steps to reproduce the behavior accordingly. This doesn't change anything about the bug itself.

@fichtner fichtner self-assigned this Apr 25, 2019
@fichtner fichtner added the bug Production bug label Apr 25, 2019
@fichtner fichtner added this to the 19.7 milestone Apr 25, 2019
@fichtner
Member

fichtner commented May 2, 2019

@maurice-w would you mind testing this on 19.1.7's development version (19.7.a_701)?

@maurice-w
Member Author

@fichtner, thanks for looking into this. I've tested with the development version and the issue has shifted: Tracking seems to keep working after a DHCP reload or reboot. But the IP Alias doesn't seem to be applied correctly. I don't get a response when pinging the IP Alias, the web interface isn't reachable via the IP Alias etc. Also, I got this error message (which I've never seen in 19.1.x); not sure whether it is related:

PHP Warning: escapeshellarg() expects exactly 1 parameter, 2 given in /usr/local/etc/inc/interfaces.inc on line 1667

After rolling back to 19.1.7 the IP Alias works again (but of course the original issue is back).

@fichtner
Member

fichtner commented May 4, 2019

@maurice-w uhh, my bad, can you try devel again with 8427198 on top?

# opnsense-patch 8427198

Cheers,
Franco

@maurice-w
Member Author

maurice-w commented May 4, 2019

@fichtner, with devel + patch it's back to the original issue: No GUA on the tracking LAN interface after a DHCP reload on the WAN. Also, new error messages (right after DHCP reload):

[04-May-2019 15:43:28 Europe/Berlin] PHP Warning:  vsprintf(): Too few arguments in /usr/local/etc/inc/util.inc on line 984
[04-May-2019 15:43:39 Europe/Berlin] PHP Fatal error:  Uncaught Error: Call to undefined function lookup_gateway_interface_by_name() in /usr/local/etc/rc.dyndns:46
Stack trace:
#0 {main}
  thrown in /usr/local/etc/rc.dyndns on line 46
[04-May-2019 15:43:45 Europe/Berlin] PHP Warning:  vsprintf(): Too few arguments in /usr/local/etc/inc/util.inc on line 984

(BTW, in 19.1.7 the "no GUA on LAN" issue is 100% reproducible on WAN DHCP reloads, but not on reboots. After a reboot it often works, but sometimes doesn't. Seems like some kind of race condition.)

Cheers
Maurice

@fichtner
Member

fichtner commented May 4, 2019

the error is from not using os-dyndns-devel, but can be neglected for the purpose of this ticket. let me see if I can reproduce this locally...

EDIT: OK I don't have tracking at home. Need to try Monday at the office.

@fichtner
Member

Note: I wasn't at work this week due to sick leave.

@maurice-w
Member Author

No worries and get well soon!
(Prefix tracking + ULAs essentially works as long as the delegated prefix doesn't change and you don't reboot or make configuration changes which cause a DHCP reload. For me it's currently mostly an extra step of sometimes having to remove and re-add the ULA IP Alias after reboots. It probably becomes a much bigger issue if the delegated prefix actually changes regularly.)

@fichtner fichtner modified the milestones: 19.7, 20.1 Jul 11, 2019
@fichtner
Member

I have to move this to the next version due to time constraints. My day job away from OPNsense is quite challenging at the moment. Sorry. :(

EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
EugenMayer pushed a commit to KontextWork/opnsense_core that referenced this issue Jul 22, 2019
@maurice-w
Member Author

Is there something I can do to help find the root cause of this issue? Where to start?
I'm aware priorities differ. For me, this currently is the single most annoying bug. The issue shifted from "sometimes breaks after reboots" to "always breaks after reboots" (I think since the 19.7 upgrade). Which unfortunately means having to manually reconfigure a production OPNsense instance after every single reboot.

@fichtner fichtner modified the milestones: 20.1, 20.7 Jan 24, 2020
@marjohn56
Member

Morning all. I had a look at this and I have it behaving now. I was able to replicate the issue that @maurice-w gave in his bug report, namely that on reboot the VIP comes up on the interface before the dhcp6 address; of course this will also happen when the prefix changes. What I've done to cure it (at least it works on my test system) is this: in the call to interface_track6_configure() I remove the VIP from the interface, carry on with the dhcp6c routine and then, before it returns, re-apply the VIP to the interface. This appears to work OK: the dhcp6 server is now showing the proper address and the VIP is shown on the interface. I cannot fully test this, as I need someone who has a silly ISP that doesn't do static prefixes to check it. I guess we overlooked this issue when we refactored the dhcp6c stuff last year.

@fichtner - thoughts, is this a valid solution?

@marjohn56
Member

@maurice-w do you want to try this to see if it fixes your issues and has no side effects?

@maurice-w
Member Author

@marjohn56, thanks a lot for picking this up! Looks like a sensible workaround to me.

I performed opnsense-patch cb7af9b on OPNsense 19.7.10 and then did a reboot as well as a DHCP reload on the WAN. Looking good! Don't consider this in-depth testing, but another data point that your patch seems to work as intended. I will keep you updated if any side effects should come up.

Again, thanks a lot! This was a big PITA for a long time.

@marjohn56
Member

Thanks @maurice-w.

Let's run it for a while and see how it behaves.

@maurice-w
Member Author

maurice-w commented Feb 1, 2020

As expected, the issue was back after upgrading to 20.1. I re-applied the patch and now it's working again. Still no side effects.

@marjohn56
Member

Yes, as expected. Glad that it's still behaving with zero side effects. @fichtner is aware of this, it's just that his 'to do' list is never ending.

@fichtner
Member

fichtner commented Feb 3, 2020

Side effects will happen eventually. I'm still not convinced this is the minimum impact solution, especially since aliases are marked as such but seemingly ignored for what they are elsewhere.

@marjohn56
Member

Can I throw something into the mud pit here? The issue appears to be this: when dhcp6c removes its address, or there is no dhcp6c-assigned address on the interface yet (when dhcp6c is in use on that interface), i.e. during boot or an address change, the dhcp6 server is configured by picking up the address directly from the interface. Because the alias address is now at the top of the list on the interface, the server gets configured with the alias. That leaves two options: either we remove the alias from the interfaces during a dhcp6c configure, or we force a reconfigure of the dhcp6 server after the GUA has been assigned by dhcp6c, making sure it ignores any alias IPv6 address already active on the interfaces. Unless you know of a way of rearranging the order of the addresses on the interfaces when dhcp6c assigns its address.
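The "ignore any alias address" option can be sketched roughly as follows. This is not OPNsense code: the `Addr` record, its `is_alias` flag and the `primary_address` helper are all hypothetical, and the addresses are illustrative. It only shows why picking the first address on the interface goes wrong when the alias is listed first.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Addr:
    address: str
    is_alias: bool  # True if the address came from a Virtual IP (IP Alias) entry

def primary_address(addrs: List[Addr]) -> Optional[str]:
    """Return the first non-alias address, whatever its position in the list."""
    for a in addrs:
        if not a.is_alias:
            return a.address
    return None  # only aliases present, e.g. before dhcp6c has assigned the GUA

# Order as described above: the alias sits before the tracked GUA.
lan = [Addr("fd01:2:3:4::1", True), Addr("2001:db8:1:2::1", False)]
print(lan[0].address)        # naive "first address" choice picks the alias
print(primary_address(lan))  # alias-aware choice picks the tracked address
```

Returning `None` while only the alias is present would let a caller defer configuring the server until the GUA exists, instead of falling back to the alias.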

fichtner added a commit that referenced this issue Feb 10, 2020
@fichtner
Member

and how's _35 ?

@marjohn56
Member

Don't know, I cannot remember...for me it was a long time ago.

@fichtner
Member

haha :P in general the idea should work now... we already have a concept of a primary address in the interface stats subframework, but we can't directly manipulate ifconfig (interfaces.lib.inc) because there we shouldn't know about the config.xml... so we need to merge somewhere, interfaces.inc seems like the appropriate spot

@marjohn56
Member

That appears to have sorted it...:)

@fichtner
Member

fichtner commented Feb 10, 2020

good, now the only thing left on my list is to fix that just-discovered bug in the interface stats regarding separate IPv6 interfaces such as stf/6RD. Anything else you guys see in the scope of this ticket?

@marjohn56
Member

I think this one is put to bed, dhcpd6.conf and radvd.conf look good too.

@marjohn56
Member

Is this stf/6rd bug the one mentioned in the German forum - IPv6 radvd config (Telekom VDSL)?

@maurice-w
Member Author

_35 fixed it in the dashboard widget, but it's now a little weird on status_interfaces.php: The order is correct (IP Alias is second), but the primary (tracked) address is displayed like this: 2001:db8:1:2:234:56ff:fe78:9abc / 2001:db8:1:2::/64

Still IP Alias only on the console (banner after login).

Regarding other issues:

  • What are your thoughts about letting the user choose which prefix to use for the DHCPv6 server?
  • When "allow manual adjustment of DHCPv6 and Router Advertisements" is disabled, radvd.conf is still missing the IP Alias prefix.

Probably out of scope for this ticket, but I noticed some other radvd.conf oddities in the "automatic" mode:

  • AdvManagedFlag is not set although a range6 is specified in dhcpdv6.conf.
  • MinRtrAdvInterval and MaxRtrAdvInterval are set to very low values (3 and 10).
  • DeprecatePrefix is not set.

Should we clean this up?
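For reference, a rough sketch of where these options sit in a radvd.conf interface stanza, written as a small Python renderer. This is not what OPNsense generates: the function `radvd_stanza`, the interface name `em1` and the interval values are assumptions chosen for illustration only.

```python
def radvd_stanza(ifname, prefixes, managed=False, min_interval=200, max_interval=600):
    """Render an illustrative radvd.conf interface block (hypothetical helper)."""
    lines = [
        f"interface {ifname} {{",
        "    AdvSendAdvert on;",
        f"    MinRtrAdvInterval {min_interval};",
        f"    MaxRtrAdvInterval {max_interval};",
        # Managed flag tells hosts that addresses are available via DHCPv6.
        f"    AdvManagedFlag {'on' if managed else 'off'};",
    ]
    for prefix in prefixes:
        lines += [
            f"    prefix {prefix} {{",
            "        DeprecatePrefix on;",
            "    };",
        ]
    lines.append("};")
    return "\n".join(lines)

# One stanza advertising both the tracked prefix and the ULA prefix.
print(radvd_stanza("em1", ["2001:db8:1:2::/64", "fd01:2:3:4::/64"], managed=True))
```

AdvManagedFlag and the two interval settings are interface-level options, while DeprecatePrefix belongs inside each prefix block.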

@maurice-w
Member Author

The diag tools (ping, port probe, trace route) use the IP Alias as the source address...

@maurice-w
Member Author

Should we automatically restart radvd when a VIP is added / modified / deleted?

@marjohn56
Member

marjohn56 commented Feb 11, 2020

Hmm, sorry I only did a quick look at the interfaces widget and very cursory look at the overview. It appears it's showing the primary address and a sort of prefix, that would be cool if the prefix was correct, but it isn't. It's just showing the first 64 bits of the address.
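The "sort of prefix" in that display can be reproduced by masking the address to /64. Below is a small sketch with Python's standard `ipaddress` module, using the documentation address quoted earlier in this thread. The /56 delegated prefix is an assumption added purely for illustration.

```python
import ipaddress

iface = ipaddress.ip_interface("2001:db8:1:2:234:56ff:fe78:9abc/64")
print(iface.ip)       # the interface address itself
print(iface.network)  # the first 64 bits of that address, as a /64 network

# Assumed delegated prefix, only to show the on-link /64 is a different thing.
delegated = ipaddress.ip_network("2001:db8:1::/56")
print(iface.network.subnet_of(delegated))
```

The /64 derived this way is the interface's on-link network, not the prefix delegated by the ISP.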

Just noticed something else too, dhcpv6 is showing the available prefix size as /57, it should be /56. To top that, I cannot see the 'dhcp6c added a prefix *****' log entry either. I'll go take a look at that and find out why that has vanished. I cannot check it on my primary router as that's full static.

@fichtner
Member

fichtner commented Feb 11, 2020

Now we are wading into esoteric territory... I'd like to wrap up this ticket, split off some tasks if so be it. But we can't classify everything as a bug especially if we want to avoid work for things that nobody needed in almost two decades worth of time. We easily have the same amount of time to make IPv6 just right. ;)

I also indicated that with b8beea435d it is just the beginning and it is applicable virtually everywhere.

@marjohn56
Member

OK, well if you can clean up that 'stutter' of half an address being shown in the interfaces info page and the radvd config thing then that resolves this ticket.

I'm still going off to have a look at why the prefixes log entry is missing from dhcp6c, I used to use that a lot for debugging. :)

@fichtner
Member

have you tried d21780177b yet

@fichtner
Member

just arrived at the office, OPNsense 20.7.a_36-amd64 looks good even on status page

@marjohn56
Member

Nuts, posted my last comment on the commit. Yes, it's looking fine; I was looking at my live system, which is on 35... :(

@marjohn56
Member

It seems that the issue with dhcp6c is that the d_printf entry for the prefix is using INFO where the one for the IA is using DEBUG. I guess we need to change the one for the prefix to DEBUG. I'll issue a PR for that.

@marjohn56
Member

@maurice-w is correct, tools is using the Alias.

@fichtner
Member

for tools please create a new feature ticket. I don't think we should add dhcp reload to VIP pages, basically we start to restart everything on every minor change and this affects operation and could cause new side effects.

> When "allow manual adjustment of DHCPv6 and Router Advertisements" is disabled, radvd.conf is still missing the IP Alias prefix.

Also a feature request, not a bug.

> What are your thoughts about letting the user choose which prefix to use for the DHCPv6 server?

IMO this should only work with the primary address. Tying DHCPd to VIPs will only lead to more validation and complexity we do not wish to support from a core perspective.

> Probably out of scope for this ticket, but I noticed some other radvd.conf oddities in the "automatic" mode [...] Should we clean this up?

Sure, please create a ticket.

I'll try to include this particular fix in 20.1.2.

@maurice-w
Member Author

maurice-w commented Feb 11, 2020

  • 20.7.a_36 indeed fixed the interfaces overview, thanks.
  • I'll create a ticket for the tools issue. Would classify it as a bug, but this debate is as old as software.
  • I'm okay with having to manually reload radvd after VIP changes. Will create a PR with some help text on firewall_virtual_ip.php which explains that.
  • I'm also okay with having to enable manual configuration if you want to advertise IP Alias prefixes. Will create a PR with help text for that, too.
  • About dhcpdv6 not being able to use a VIP prefix, I'm not so sure. I have no immediate need for this so let's leave it at that until someone else also requests it.
  • I'll create a PR for the out of scope radvd.conf oddities in auto mode.

@marjohn56, an available prefix delegation size of /57 is correct. If you get a /56 from upstream, you can delegate no more than a /57 to downstream.
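One way to see why, sketched with Python's standard `ipaddress` module: as soon as a single /64 out of the /56 is in use locally, the largest contiguous block left over is a /57. Both prefixes below are documentation prefixes assumed for illustration.

```python
import ipaddress

delegated = ipaddress.ip_network("2001:db8:1::/56")  # assumed prefix from upstream
lan = ipaddress.ip_network("2001:db8:1:2::/64")      # one /64 in use on the tracking LAN

# Everything in the /56 that is not the LAN /64, as a set of aligned blocks.
remaining = list(delegated.address_exclude(lan))
largest = min(remaining, key=lambda n: n.prefixlen)

print(len(remaining))  # number of leftover blocks, one each for /57 through /64
print(largest)         # the biggest block still free for downstream delegation
```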

@marjohn56
Member

Whilst playing with the combined WAN dhcp6c stuff I noticed there is one more cleanup needed: the console only shows the first v6 address, so if there is an alias it's only showing that one. Can we make it show all aliases and GUAs?
