radvd stops announcing IPv6 prefix after a while #4338
Comments
Thank you for the details. Just for more data points... the following fixes the issue temporarily?
Cheers, |
I'm also seeing this issue. The next time it pops up, I'll run that command and report back. |
Also PPPoE? It looks a bit like the adapter is gone and a new one with the same name exists but radvd doesn’t know and still references the old one which obviously doesn’t work. |
Sorry, nope. My WAN is a normal ethernet connection (Verizon FiOS). |
Exactly. As soon as I run this command IPv6 is working again. Before running this command radvd does not send any RAs and does not react to solicit requests. For some reason today it just took ~1 day before radvd started to hang. As soon as the process is restarted through the command provided above it's working again (temporarily). Edit: I don't see any events in ppps.log since the last reboot (5 days ago). I guess this means that the PPPoE link remains unchanged (it should in my case). |
There seems to be an issue with the list management code in the kernel regarding multicast. radvd reload behaviour didn't change as far as I can see, and you guys are right that the interfaces do not change either. Initially we added the join/leave to cope with updates in radvd 2.x, which worked well on 11.x, but 12.x seems to be allergic to too many iterations. I don't see any readily available commit to cherry-pick, so it will take a while to find the cause. Cheers, |
I wonder if a custom cronjob to restart radvd periodically would be sufficient as a workaround. |
It should be. I have a CRON job in place right now as losing internet connectivity every other day is not an option (it's required for emergency calls over here). I don't know if you guys want to ship such an uber-ugly hack in the default distribution though. Does anybody know which upstream issue we are talking about? I guess it's an issue within HardenedBSD kernel's, or am I wrong here? I could not find a bug tracker from them which is kinda weird. Does anybody have a link to the upstream issue? |
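A minimal sketch of the periodic-restart workaround discussed above, assuming Python 3 is available on the firewall. The restart command suggested earlier in this thread is not preserved here, so `RESTART_CMD` is only a placeholder and not a confirmed OPNsense command; substitute whatever restart command works on your system. Note that a simple "is the process alive" check would not help, because the hung radvd keeps running, which is why the sketch restarts unconditionally.

```python
import subprocess
import sys

# ASSUMPTION: placeholder restart command, not taken from this thread.
RESTART_CMD = ["service", "radvd", "restart"]


def restart_radvd(cmd=RESTART_CMD, runner=subprocess.run):
    """Run the restart command and return True when it exits with status 0.

    `runner` is injectable so the function can be exercised without
    touching a real service.
    """
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Report the failure on stderr so cron can mail or log it.
        print(
            f"radvd restart failed ({result.returncode}): {result.stderr.strip()}",
            file=sys.stderr,
        )
    return result.returncode == 0


def main():
    """Entry point for a cron-invoked script: exit code 0 on success, 1 on failure."""
    return 0 if restart_radvd() else 1
```

To use it as a standalone script, add `raise SystemExit(main())` at the bottom and schedule it from cron at an interval shorter than the time the hang usually takes to appear (the reports here range from about one day to 60 hours).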
The issue likely comes from HardenedBSD's upstream, FreeBSD. HardenedBSD has only a few changes to kernel networking code--changes that wouldn't cause this behavior (like enabling IP ID randomization by default). |
I experienced this issue under 20.1, too after uptime of 30 days. |
Asserting the same issue in 20.1 is speculation without the appropriate data points to support this. While it’s annoying, please refrain from telling how annoying this is for the sake of keeping this technical and on point. Cheers, |
The data points are:
|
Hm, upon user requests we moved from radvd version 1 to version 2 with 20.1.6. Could it be that 20.1 - 20.1.5 were not showing this issue? Given this is true the kernel bug always existed but moving from 11.2 to 12.1 operating system version made this worse. Quite a bit of coincidence. I'm not sure if this is the same issue or two separate issues that look the same. |
On 20.1.9 (two instances, APU2, Config: Unmanaged or Stateless), I can't confirm the bug. Even after a long uptime (> 30 days) everything is fine. |
@robgnu yes this definitively involves some sort of dynamic interface creation. I wrote a POC yesterday that wouldn't trigger the bug on hardware interfaces, but here it seems that GIF and PPPoE can interfere. |
For everyone affected please try this version based on a debug patch that realigns the interface index with every invoke in case it changed...
It also has an additional debug line should the interface not be available at all during connectivity issues / reconfigure. Cheers, |
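The idea behind the debug patch described above (re-resolving the interface index on every invocation instead of trusting a cached value) can be illustrated roughly as follows. This is not the actual patch, which lives in the radvd port; the class name and its behaviour are assumptions made for illustration only.

```python
import socket


class InterfaceIndexCache:
    """Track an interface index by name and notice when it changes.

    An interface that is destroyed and recreated under the same name
    (as can happen with PPPoE or GIF) may come back with a new index.
    """

    def __init__(self, name, resolver=socket.if_nametoindex):
        self.name = name
        self._resolver = resolver  # injectable for testing
        self.index = None

    def refresh(self):
        """Re-resolve the index; return True if it differs from the last known one."""
        try:
            current = self._resolver(self.name)
        except OSError:
            # Interface is not present right now (e.g. during a reconfigure).
            current = None
        changed = current != self.index
        self.index = current
        return changed
```

Calling `refresh()` before each send means a stale index is replaced as soon as the interface reappears, and a `None` index marks the case the extra debug line is meant to report.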
Installed.. it's a very intermittent bug, I've not seen it for a while then yesterday it appeared again whilst I was setting up a new FreeBSD HyperV instance... bloody odd. |
I mean by that that the new instance was not getting an address! |
The problem @marjohn56 describes is similar to what I had (which I described above) but only on a cold boot...a reboot was never a problem with not getting an ipv6 connection for hosts. I noticed it after we had two power outages where I was offline for about an hour each time. I don't have a separate test system unfortunately...but will test when I can. Installed the patch...thanks. |
Thanks @fichtner for providing this version. I've occasionally run into the same issue as described here, but I haven't found a way to replicate it. I've installed your patched version and maybe it will yield some new information. 👍 |
@fichtner - I think to debug this a little more we need to be able to get debug logs. Man says it needs to run foreground and have the level and the logfile specified. I think we should create a specific patch for this, thoughts? |
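As a rough sketch of what such a debug invocation could look like, the helper below assembles a foreground radvd command line with a debug level and a log file. The `-n`, `-d`, `-m`, `-l`, `-C` and `-p` options are the ones documented in radvd(8); the binary, config and pid file paths are taken from the command logged later in this thread, while the log file path is a made-up example.

```python
def radvd_debug_argv(
    level=3,
    logfile="/var/log/radvd-debug.log",  # ASSUMPTION: example path only
    config="/var/etc/radvd.conf",
    pidfile="/var/run/radvd.pid",
    binary="/usr/local/sbin/radvd",
):
    """Build an argv list that runs radvd in the foreground with debug logging."""
    if not 1 <= level <= 5:
        raise ValueError("radvd debug level must be between 1 and 5")
    return [
        binary,
        "-n",                # stay in the foreground (do not daemonize)
        "-d", str(level),    # debug level
        "-m", "logfile",     # log to a file instead of syslog
        "-l", logfile,       # where the log file goes
        "-C", config,
        "-p", pidfile,
    ]
```

The resulting list can be passed to `subprocess.run()` or joined into a shell command for a one-off debugging session; the regular radvd instance would need to be stopped first.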
I have also seen this issue. |
I tend to agree with @fichtner that it would be preferable to keep radvd; as he said, it served well for many years without an issue. Although the patch for rtadvd works and is stable, it's not a finished article. There are features in radvd that do not have an equivalent in rtadvd, though no one using rtadvd has missed them yet; on the flip side, rtadvd can be left running, and updates to interfaces and config are just signalled to it by its control program, which is useful. Something I'm about to look at is whether a change in the interface address (such as what would happen on a dhcp6c lease change) will be reflected by a deprecation and re-advertisement of the new lease just by a config change. |
Isn't that the same case for radvd? For anything that is not bug-bound beyond the documented purposes?
We are weighing whether or not a long running service works as expected (as per the previous question). Wikipedia says radvd is 25 years old and still maintained. That is roughly the same time frame, isn't it? In these regards I see no proof that rtadvd is better than radvd other than rtadvd is probably a better fit for FreeBSD, but that is more a BSD thing than bad for Linux in general. So FreeBSD needs to put functional code in the port to make it work but that doesn't mean it doesn't work fine elsewhere? Every other year we feel our expectations broken by non-functional states in FreeBSD-centric software/implementations. If we switch to rtadvd we have to know it is actually a lot better, but personally I have no data other than this non-representative thread. Cheers, |
For me on US Spectrum cable, radvd 2.18 would crash immediately. I only have 1 interface. With rtadvd patch 9a4a908: 10+ days, no problems. Could the problem simply be radvd 2.18? |
@agh1701 changes from 2.18 to 2.19 are minimal upstream, but we changed the FreeBSD patching to avoid an alleged issue with FreeBSD 12... |
radvd-2.19 has been rocksolid here |
9+ days solid radvd 2.19 |
I also can confirm: 5+ days and no issues anymore with radvd. |
Thanks for the feedback! So we will be shipping the updated radvd in 20.7.8 and 21.1-RC1 and if that at least improves the situation also provide a patch to FreeBSD ports. Cheers, |
When is 20.7.8? |
Same here with 2.19, work perfectly ;) So thanks ;) |
Likely next week. |
Running Radvd 2.19 for 5 hours, so far so good! Everything seems to be fine so far but there are some errors reported in the log file
|
So to understand.. the good people in this thread are reporting that 2.19 behaves well. Does radvd-2.19 alone make the difference? This almost sounds too good to be true. (Though I really hope it is true.) Looking at radvd-project/radvd@v2.18...v2.19 there’re only three slightly relevant changes, namely commits 9644266, dec4402 and 0d891e8, and the issues they address hardly manifest as stochastically as what we’ve seen. 9644266 maybe, but still..🤔 Fingers crossed🤞 Also, what’s the alleged FreeBSD 12 issue? Thanks a lot! |
@ivwang the relevant parts are FreeBSD specific, see opnsense/ports@a5ace74ef2273eeb7 and opnsense/ports@54152320fa817 I don't think any changes of upstream are at play here. Note the absence of |
By the way, the patch has been working flawlessly. IPv6 has been working for around 8-11 days now. With the old radvd, it only worked for 3 days and then stopped announcing and working like clockwork. Thanks! |
just an update: Radvd has been behaving since I swapped back via the patch. About a week of uptime. |
Great to hear, looks like radvd-2.19 does improve it for almost everyone.. getting my hopes up for the next release :) |
I upgraded from 20.1 to 20.7.7. After about 1 or 2 days my problems with radvd popped up again. I decided to manually install radvd 2.19 via the above pkg -f command. This I found in syslog:
This is my radvd.conf:
Even after I downgraded radvd version corresponding to the 20.7 release it stays broken. |
Please don’t hijack this thread. As I said you have a different issue and you need to provide proper amount of details (see bug report template).
… On 17. Jan 2021, at 23:23, pmisch ***@***.***> wrote:
I upgraded from 20.1 to 20.7.7. After about 1 or 2 days my problems with radvd popped up again. I decided to manually install radvd 2.19 via the above pkg -f command.
Now my IPv6 connectivity is completely broken, even my upstream isn't getting an IPv6 address any more... thus no router advertisements are being sent. I don't know what's happening, but I think I will be downgrading to 20.1 once again.
This I found in syslog:
Jan 17 11:45:52 uprouter1 opnsense[67855]: /usr/local/etc/rc.newwanipv6: The command '/usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog' returned exit code '255', the output was ''
This is my radvd.conf:
# Automatically generated, do not edit
# Generated for DHCPv6 server opt5
interface lagg0_vlan51 {
    AdvSendAdvert on;
    MinRtrAdvInterval 10;
    MaxRtrAdvInterval 30;
    AdvLinkMTU 1500;
    AdvDefaultPreference medium;
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    prefix 2001:1234:1234:6000::/64 {
        DeprecatePrefix on;
        AdvOnLink on;
        AdvAutonomous off;
    };
    RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
    };
    DNSSL mydom.ain {
    };
};
# Generated for DHCPv6 server opt1
interface lagg0_vlan5 {
    AdvSendAdvert on;
    MinRtrAdvInterval 200;
    MaxRtrAdvInterval 600;
    AdvLinkMTU 1492;
    AdvDefaultPreference medium;
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    prefix 2001:1234:1234:1b::/64 {
        DeprecatePrefix on;
        AdvOnLink on;
        AdvAutonomous off;
    };
    RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
    };
    DNSSL mydom.ain {
    };
};
# Generated for DHCPv6 server opt2
interface lagg0_vlan13 {
    AdvSendAdvert on;
    MinRtrAdvInterval 10;
    MaxRtrAdvInterval 30;
    AdvLinkMTU 1500;
    AdvDefaultPreference medium;
    AdvManagedFlag on;
    AdvOtherConfigFlag on;
    prefix 2001:1234:1234:5000::/64 {
        DeprecatePrefix on;
        AdvOnLink on;
        AdvAutonomous off;
    };
    RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
    };
    DNSSL mydom.ain {
    };
};
# Generated for DHCPv6 server opt4
interface lagg0_vlan10 {
    AdvSendAdvert on;
    MinRtrAdvInterval 200;
    MaxRtrAdvInterval 600;
    AdvLinkMTU 1500;
    AdvDefaultPreference medium;
    prefix 2001:1234:1234:1001::/64 {
        DeprecatePrefix on;
        AdvOnLink on;
        AdvAutonomous on;
    };
    RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
    };
    DNSSL mydom.ain {
    };
};
Even after I downgraded the radvd version back to the one corresponding to the 20.7 release, it stays completely broken.
I downgraded to 20.1 and everything is working fine at least so far.
|
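For anyone wanting to rule out one class of configuration error in a generated radvd.conf like the one quoted above, here is a small, hedged sketch that checks the advertisement intervals of each interface block against the limits in RFC 4861 (MaxRtrAdvInterval between 4 and 1800 seconds, MinRtrAdvInterval between 3 seconds and 0.75 × MaxRtrAdvInterval). It is a simplistic regex-based check, not a radvd.conf parser, it assumes integer values, and it says nothing about why radvd returned exit code 255 in that report.

```python
import re

INTERFACE_RE = re.compile(r"^\s*interface\s+(\S+)\s*\{", re.MULTILINE)
OPTION_RE = re.compile(
    r"^\s*(MinRtrAdvInterval|MaxRtrAdvInterval)\s+(\d+)\s*;", re.MULTILINE
)


def check_intervals(conf_text):
    """Return a list of (interface, problem) tuples; an empty list means no issue found."""
    problems = []
    starts = [(m.start(), m.group(1)) for m in INTERFACE_RE.finditer(conf_text)]
    for i, (pos, name) in enumerate(starts):
        # Each interface block runs until the next "interface" line or end of file.
        end = starts[i + 1][0] if i + 1 < len(starts) else len(conf_text)
        opts = {key: int(value) for key, value in OPTION_RE.findall(conf_text[pos:end])}
        max_i = opts.get("MaxRtrAdvInterval", 600)  # RFC 4861 default
        min_i = opts.get("MinRtrAdvInterval")       # skip the check when unset
        if not 4 <= max_i <= 1800:
            problems.append((name, f"MaxRtrAdvInterval {max_i} outside 4..1800"))
        if min_i is not None and not 3 <= min_i <= 0.75 * max_i:
            problems.append(
                (name, f"MinRtrAdvInterval {min_i} outside 3..{0.75 * max_i:g}")
            )
    return problems
```

The interval pairs in the quoted configuration (10/30 and 200/600) satisfy these bounds, so this particular check would not flag them.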
20.7.8 addressed this issue. If you have issues with IPv6 please open tickets with enough relevant info. Thanks, |
Hi! I'm currently using pfSense and the same defect is driving me nuts (the pfSense devs are failing to investigate it). I want to move to OPNsense, but I'm not familiar with the OPNsense release cycle. Can I assume that if I install the latest version of OPNsense, this defect will be fixed with the proposed patch? Thanks in advance. |
Hi @robel, Install 20.7 and update to 20.7.8, where this is fixed, or install 21.1-RC1, where this is already fixed (and which directly updates to 21.1 later this week). Cheers, |
I'm currently running 21.1.1 and still seem to be experiencing this same issue where radvd stops announcing prefixes. I have router advertisements set to assisted and after several hours, it stops working. Restarting radvd from the web ui fixes it for a few hours, but this problem then comes back after a few hours. |
I can also confirm this issue with my current OPNsense install:
Data:
|
Unfortunately this issue has cost us a lot of community support time and we do not see any easy way forward chasing a kernel bug we can't reproduce any longer. The same issue also affects ISC dhcpd in IPv6 mode, but the radvd code was vastly different, and even though there was controversy over BSD support I don't see a reason to blame radvd and its patching any longer. If anything, we just made the bug harder to trigger. For more details see #4691. We will target 13.0, which is currently being planned for 22.1, and along the way I hope that this issue simply disappears. If anyone is hoping to have this fixed sooner, please find a reliable way to trigger it and/or confirm there is a specific patch available in FreeBSD that addresses this. We are happy to be of further assistance, but as I said, not on community support time. Cheers, |
A good candidate for this issue is opnsense/src@93e9cefd053b, and from the original bug report you can see this affects 12.0-RELEASE (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233683). We've been using version 12.1 since OPNsense 20.7, which is when this bug started to happen... ;) In that report, the bug was triggered by adding the existing IPv6 address again via the ifconfig utility, so I suppose some renew situation trashed the ability of radvd to act in the multicast group. Cheers, |
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
I have searched the existing issues and I'm convinced that mine is new.
Describe the bug
After upgrading from 20.1 to 20.7.2 I am losing IPv6 internet connectivity after ~50-60 hours. This happens because radvd stops announcing the prefix and does not reply to solicit messages any more.
This has nothing to do with changing IPv6 prefixes in my case, as there is no PPPoE reconnect and no prefix change request from my ISP (my ISP enforces this every 180 days).
Restarting radvd from the web GUI fixes this.
To Reproduce
Possibly related: #4282 (this issue mentions reconnects, which do not apply in my case)
Possibly related forum threads:
https://forum.opnsense.org/index.php?topic=19032.0
https://forum.opnsense.org/index.php?topic=18868.0
https://forum.opnsense.org/index.php?topic=18549.0
Expected behavior
radvd should always announce the IPv6 prefix without hanging after a while :)
Relevant log files
truss output of a defective radvd, which looks very interesting:
Defective truss output on radvd process
truss output of working radvd (still advertising routes)
I am not a BSD guy but the following lines in the output of the broken radvd instance look very suspicious:
The "Can't assign requested address" error is also present in the working radvd truss output from time to time. The "Cannot allocate memory" error sounds very fishy though. Maybe it's an issue with setting up the multicast group?
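To make the multicast-group hypothesis more concrete, here is a hedged sketch of how a router advertisement daemon typically joins the link-local all-routers group (`ff02::2`) on an interface, and where the two error messages above would surface. "Can't assign requested address" is the FreeBSD text for `EADDRNOTAVAIL` and "Cannot allocate memory" is `ENOMEM`. The function names are made up for illustration and this is not radvd's actual code.

```python
import errno
import socket
import struct

ALL_ROUTERS = "ff02::2"  # link-local all-routers multicast group


def build_ipv6_mreq(group, ifindex):
    """Pack a struct ipv6_mreq: 16-byte group address followed by the interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def join_all_routers(sock, ifindex):
    """Join ff02::2 on the given interface.

    Returns None on success, or the symbolic errno name (for example
    "ENOMEM" or "EADDRNOTAVAIL") when the kernel rejects the join.
    """
    try:
        sock.setsockopt(
            socket.IPPROTO_IPV6,
            socket.IPV6_JOIN_GROUP,
            build_ipv6_mreq(ALL_ROUTERS, ifindex),
        )
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

If the join fails, the socket is not a member of the group on that interface and would not receive router solicitations sent to `ff02::2`, which would match the "does not reply to solicit messages" symptom. Whether that is what actually happens here is speculation.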
Environment
Software version used and hardware type if relevant.
OPNsense 20.7.2-amd64, openssl
APU2C4
Network Intel® I210-AT
PPPoE-connected fiber modem (DHCPv6-PD)
I did not experience the issue in OPNsense 20.1.