After reboot - dpinger not starting on secondary IPv6 WAN interface. #7400
The message seems to originate from:
... in /usr/local/etc/inc/plugins.inc.d/dpinger.inc, but as you can see in the last screenshot, the IPv6 gateway isn't empty for WAN02.
It seems like a timing issue, where the information isn't available quickly enough when the dpinger configuration kicks in (in /usr/local/etc/inc/plugins.inc.d/dpinger.inc). Hope this helps you find a way to fix this race condition.
Could you try adding a tunable "net.inet6.ip6.dad_count" with value "0" to see if the problem goes away? To be frank, we already wait for this to settle in rc.newwanipv6, but it's been known to still fail: Lines 71 to 76 in 3f184a6
Debugging this is impossible without a setup where this happens for "reasons". Cheers,
I think you broke a record by replying on a rainy Sunday. 😄 Anyway, can I help you out with a more permanent fix, seeing as I have a "special setup where this happens for reasons"? (Or should we document this behaviour in the OPNsense documentation?)
Hmm, you said dual WAN. Are both DHCPv6? If yes, does it work when you disable the primary one?
(Just the IPv6 part, I mean.)
I'll give it a try! Both are DHCPv6, yes.
I removed the dad_count tunable and disabled the IPv6 gateway on WAN01. Upon reboot, the IPv6 gateway on WAN02 still doesn't come up. For what it's worth, this is a dual-port copper NIC based on igb.
Let me get back to you tomorrow with a fresh idea. Thanks for the help!
It seems Linux suffers from this as well.
If FreeBSD has implemented this in a similar fashion, then yes, we'll run into this if we go "too fast". I'm not sure if there's a way on FreeBSD to check that "tentative" status. In any case, many thanks for the lightning-fast response 💯
https://reviews.freebsd.org/D40103 Apparently, the address flag "IN6_IFF_TENTATIVE" could be checked. https://redmine.pfsense.org/projects/pfsense/repository/2/revisions/3a335f0798cae05f86d61a43148fd0efc83408d7/diff
Let's not get carried away here. Tentative is what it is, and we exclude it when we look for viable addresses because it wouldn't work anyway: core/src/etc/inc/interfaces.inc Line 4164 in 3f184a6
The problem is that when we look for a dynamic "primary" IP address, we don't know what we are looking for. We also can't pass tentative addresses to the caller, because that would mean patching every spot that then tries to use a non-usable address; that's why we exclude them from the lookup. rc.newwanipv6 is supposed to be started by dhcp6c after addresses are assigned, which means we hit the tentative phase and the initial wait should do the trick, as mentioned. There could be complications, however:
Netlink could help with this in the future, but ideally we don't want individual code spots looping until they have a viable address (as is done elsewhere), because that just clogs up subsystems.
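To illustrate the exclusion described above, here is a minimal Python sketch (not OPNsense's PHP in interfaces.inc) that parses FreeBSD `ifconfig` output and ignores addresses still flagged `tentative` (or `duplicated`/`detached`). It assumes `ifconfig` prints these flags as keywords on the `inet6` line; the function names and the sample output are hypothetical. If the only global address is still tentative, the lookup returns nothing, which would match the "interface address could not be found, skipping" symptom from the report.

```python
import ipaddress
from typing import List, Optional

# DAD-related flags (as printed by FreeBSD ifconfig) that make an address unusable.
# Assumption: this set illustrates the idea; it is not OPNsense's exact filter.
UNUSABLE_FLAGS = {"tentative", "duplicated", "detached"}

# Hypothetical ifconfig excerpt: the only global address has not finished DAD yet.
SAMPLE_TENTATIVE = (
    "igb1: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\tinet6 fe80::1%igb1 prefixlen 64 scopeid 0x2\n"
    "\tinet6 2001:db8:0:1::1 prefixlen 64 tentative\n"
)


def viable_ipv6_addresses(ifconfig_output: str) -> List[str]:
    """Return non-link-local IPv6 addresses that carry no unusable DAD flag."""
    viable = []
    for line in ifconfig_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "inet6":
            continue
        # Drop the "%ifname" scope suffix that link-local addresses carry.
        addr = ipaddress.IPv6Address(fields[1].split("%", 1)[0])
        if addr.is_link_local:
            continue
        # Any unusable flag among the remaining keywords disqualifies the address.
        if UNUSABLE_FLAGS.intersection(fields[2:]):
            continue
        viable.append(str(addr))
    return viable


def primary_ipv6(ifconfig_output: str) -> Optional[str]:
    """Hypothetical lookup: first viable address, or None so the caller can skip."""
    candidates = viable_ipv6_addresses(ifconfig_output)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    print(primary_ipv6(SAMPLE_TENTATIVE))  # None while DAD is still running
    print(primary_ipv6(SAMPLE_TENTATIVE.replace(" tentative", "")))  # 2001:db8:0:1::1
```

Returning `None` rather than the tentative address keeps callers simple, which is the trade-off described in the thread: the cost is that a caller running too early sees no address at all.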
That being said, if the sleep needs to be "+ 2" instead of "+ 1" in order to work, I'm not against it, I guess.
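The arithmetic behind the "+ 1" versus "+ 2" discussion can be sketched as follows. This is a minimal Python illustration, not the PHP in rc.newwanipv6. It assumes the wait is computed as `net.inet6.ip6.dad_count` plus a fixed margin in seconds, on the reasoning that DAD sends roughly one probe per second by default (RFC 4861's RetransTimer of 1 s). The function names are hypothetical.

```python
import subprocess
import time


def read_dad_count(default: int = 1) -> int:
    """Read net.inet6.ip6.dad_count via sysctl(8); fall back to `default` if unavailable."""
    try:
        out = subprocess.run(
            ["sysctl", "-n", "net.inet6.ip6.dad_count"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Not FreeBSD, sysctl missing, or unexpected output: use the fallback.
        return default


def dad_settle_seconds(dad_count: int, margin: int = 1) -> int:
    """Seconds to wait so that DAD (roughly one probe per second) has finished.

    margin=1 corresponds to the original "+ 1"; the thread reports "+ 2" fixing it.
    """
    return max(dad_count, 0) + margin


def wait_for_dad(margin: int = 2) -> None:
    """Hypothetical helper: block until tentative addresses should have settled."""
    time.sleep(dad_settle_seconds(read_dad_count(), margin))


if __name__ == "__main__":
    print(dad_settle_seconds(1))            # 2 seconds with the original "+ 1"
    print(dad_settle_seconds(1, margin=2))  # 3 seconds with "+ 2"
```

Setting the tunable to `0` (as suggested earlier in the thread) disables DAD probes entirely, so only the margin remains; a larger margin trades a slightly slower WAN bring-up for fewer races.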
Which address does it find on WAN, by the way? An IA_NA address, one from the prefix, or a SLAAC address on WAN?
Setting Interfaces: Settings: Log level mode to "Info" might help, but it needs a reboot.
It gets one from the prefix in this case. That said, I don't think there's an issue with setting net.inet6.ip6.dad_count to "0" for an "authoritative" device on your network, since it should be the one holding that specific IP. But maybe we should document this case better? I've set the log level and will reboot as soon as possible.
OK, that means it already "swings" to a tracking LAN interface, because there is no GUA IPv6 address on WAN itself. Theoretically speaking, that is the slowest form of address acquisition in the chain.
DHCP log level is set to info. Any specific log output you would like? 👍🏻
The full general ("system") log on reboot, for dad_count both unset and set to zero (set it to debug level, or just grab the file from disk). You can send it to franco AT OPNsense DOT org, thanks!
Log sent, many thanks! 👍🏻
As a test, we updated the following line to "+ 2". That made the system behave correctly. I'll monitor how the system behaves over the course of a week, with some interface reloads and simulated link loss.
The + 1 was completely arbitrary to begin with (derived from FreeBSD scripting), but if part of the system needs longer to cope with the tentative state, then this is an easy way to make it more reliable. Whether + 3 makes sense for the next person is something I doubt, however. Special thanks go to @Wireheadbe for pursuing and testing this.
Closing, f2e60c1 fixes this. 👍🏻
(cherry picked from commit f2e60c1)
Describe the bug
After a reboot or a complete link loss, dpinger for the secondary IPv6 gateway fails to restart. This also happened on previous versions (24.x).
To Reproduce
Steps to reproduce the behavior:
- Reboot the system
- Log in
- dpinger is not started for the secondary IPv6 gateway
Expected behavior
- dpinger is started for the secondary IPv6 gateway
Describe alternatives you considered
- Restarting dpinger manually works (naturally, this is only a workaround)
- Going to the gateways page and saving reconfigures dpinger, after which all dpinger instances start correctly
Screenshots
Relevant log files
See the screenshots above. Upon reboot, dpinger_configure is executed for all gateways except WAN02_DHCP6.
It's seemingly skipped because "WAN02_DHCP6 IPv6 interface address could not be found, skipping."
I also see a mention of "Skipping gateway WAN02_DHCP6 due to empty 'gateway' property."
But it does have an IPv6 address, as shown under Interfaces -> Overview.
Additional context
The OPNsense system uses a LAGG towards a switch. If the switch goes completely down (as mentioned above), the same behaviour occurs: on WAN01 (IPv4 and IPv6), dpinger starts correctly; on WAN02, only the IPv4 dpinger starts.
Environment
OPNsense 24.1.6 (amd64).
Intel(R) Core(TM) i5-7400 CPU
NIC: Intel igb -> LAGG to Switch.