radvd stops announcing IPv6 prefix after a while #4338

Closed
2 tasks done
klada opened this issue Sep 9, 2020 · 219 comments
@klada

klada commented Sep 9, 2020

Important notices
Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

After upgrading from 20.1 to 20.7.2 I am losing IPv6 internet connectivity after ~50-60 hours. This happens because radvd stops announcing the prefix and does not reply to solicit messages any more.

This has nothing to do with changing IPv6 prefixes in my case, as there is no PPPoE reconnect and no prefix change request from my ISP (my ISP enforces this every 180 days).

Restarting radvd from the web GUI fixes this.

To Reproduce

  1. Connect to PPPoE network with DHCPv6-PD
  2. LAN interface with IPv6 tracking on WAN
  3. IPv6 will be working in the LAN for a while (round about two days)
  4. After a while IPv6 connectivity is lost. The reason is that the prefix is no longer announced. It looks like radvd is hanging (see logs down below which support this theory).
  5. Restart radvd from web GUI and have a working IPv6 network again for the next ~50-60 hours

Possibly related: #4282 (this issue mentions reconnects, which do not apply in my case)

Possibly related forum threads:

https://forum.opnsense.org/index.php?topic=19032.0
https://forum.opnsense.org/index.php?topic=18868.0
https://forum.opnsense.org/index.php?topic=18549.0

Expected behavior

radvd should always announce the IPv6 prefix without hanging after a while :)

Relevant log files

  • radvd does not crash. The process remains running and there are no error logs.
  • There are no relevant log entries which show any issues with interfaces/networks/reconnects/...
  • I have checked the truss output of a defective radvd and it looks very interesting:
Defective truss output on radvd process
truss -p 14675
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0)        = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00)             = 0 (0x0)
close(8)                                         = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8)              = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8)                = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid()                                         = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:24:53.557337"...,116,0,NULL,0) = 116 (0x74)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0)        = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00)             = 0 (0x0)
close(8)                                         = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8)              = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8)                = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid()                                         = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:25:01.135191"...,110,0,NULL,0) = 110 (0x6e)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0)        = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00)             = 0 (0x0)
close(8)                                         = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8)              = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8)                = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'
__sysctl(0x6f78eb0083a0,0x2,0x6f78eb00a8f0,0x6f78eb008398,0x0,0x0) = 0 (0x0)
getpid()                                         = 14675 (0x3953)
sendto(5,"<27>1 2020-09-09T12:25:08.924928"...,117,0,NULL,0) = 117 (0x75)

truss output of working radvd (still advertising routes)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8)              = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8)                = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) = 0 (0x0)
sendmsg(6,{{ AF_INET6 [ff02::1]:58 },28,[{"\M^F\0\0\0@\0\0\M-4\0\0\0\0\0\0"...,120}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xfe,0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x0d,0xb9,0xff,0xfe,0x4a,0x7c,0x02,0x13,0x00,0x00,0x00}}},40,0},0) = 120 (0x78)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 1 (0x1)
recvmsg(6,{{ AF_INET6 [fe80::20d:b9ff:fe4a:7c02]:0 },28,[{"\M^F\0\M^KI@\0\0\M-4\0\0\0\0\0\0"...,1500}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x13,0x00,0x00,0x00}},{level=IPPROTO_IPV6,type=IPV6_HOPLIMIT,data={0xff,0x00,0x00,0x00}}},64,0},0) = 120 (0x78)
__sysctl(0x6f78eb00ac20,0x6,0x0,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ac20,0x6,0x64c2d333000,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x0,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x64c2d333000,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0)        = 8 (0x8)
ioctl(8,SIOCGIFINDEX,0x6f78eb00ac00)             = 0 (0x0)
close(8)                                         = 0 (0x0)
ioctl(6,SIOCGIFFLAGS,0x64c6da007f8)              = 0 (0x0)
ioctl(6,SIOCGIFMTU,0x64c6da007f8)                = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x0,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abd0,0x6,0x64c2d333000,0x6f78eb00abc8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x0,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ab40,0x6,0x64c2d333000,0x6f78eb00ab38,0x0,0x0) = 0 (0x0)
setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) = 0 (0x0)
sendmsg(6,{{ AF_INET6 [ff02::1]:58 },28,[{"\M^F\0\0\0@\0\0\M-4\0\0\0\0\0\0"...,120}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xfe,0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x02,0x0d,0xb9,0xff,0xfe,0x4a,0x7c,0x02,0x03,0x00,0x00,0x00}}},40,0},0) = 120 (0x78)
ppoll(0x64c6da008a0,0x2,0x380a3796c28,0x64c6da00880) = 1 (0x1)
recvmsg(6,{{ AF_INET6 [fe80::20d:b9ff:fe4a:7c02]:0 },28,[{"\M^F\0\M^K\M-i@\0\0\M-4\0\0\0\0"...,1500}],1,{{level=IPPROTO_IPV6,type=IPV6_PKTINFO,data={0xff,0x02,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x03,0x00,0x00,0x00}},{level=IPPROTO_IPV6,type=IPV6_HOPLIMIT,data={0xff,0x00,0x00,0x00}}},64,0},0) = 120 (0x78)
__sysctl(0x6f78eb00ac20,0x6,0x0,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00ac20,0x6,0x64c2d333000,0x6f78eb00ac18,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x0,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
__sysctl(0x6f78eb00abe0,0x6,0x64c2d333000,0x6f78eb00abd8,0x0,0x0) = 0 (0x0)
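
A note for readers decoding the sendmsg() payload in the working trace: assuming truss prints bytes in vis(3) style (so \M^F is 0x86 and \M-4 is 0xb4), the first eight bytes are the fixed part of an ICMPv6 Router Advertisement header as laid out in RFC 4861. A minimal Python sketch of that decoding follows; `parse_ra_header` is a hypothetical helper name and not part of radvd.

```python
import struct

RA_TYPE = 134  # ICMPv6 Router Advertisement (RFC 4861)


def parse_ra_header(data: bytes) -> dict:
    """Decode the fixed leading 8 bytes of an ICMPv6 Router Advertisement."""
    if len(data) < 8:
        raise ValueError("need at least 8 bytes")
    # type, code, checksum, cur hop limit, flags, router lifetime (network byte order)
    icmp_type, code, checksum, hop_limit, flags, lifetime = struct.unpack(
        "!BBHBBH", data[:8]
    )
    if icmp_type != RA_TYPE:
        raise ValueError(f"not a Router Advertisement (type {icmp_type})")
    return {
        "hop_limit": hop_limit,
        "managed": bool(flags & 0x80),  # M flag
        "other": bool(flags & 0x40),    # O flag
        "router_lifetime": lifetime,    # seconds
    }


# Assumed decoding of the first bytes shown in the working sendmsg() call.
sample = bytes([0x86, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0xB4])
print(parse_ra_header(sample))
```

Under that decoding assumption, the working daemon was sending advertisements with hop limit 64, both flags clear, and a router lifetime of 180 seconds.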

I am not a BSD guy but the following lines in the output of the broken radvd instance look very suspicious:

setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'
setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'

The "Can't assign requested address" error is also present in the working radvd truss output from time to time. The "Cannot allocate memory" error sounds very fishy though. Maybe it's an issue with setting up the multicast group?
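
To compare a healthy and a hung trace more systematically, the failed membership calls can be tallied per option and errno. The following is a rough Python sketch that assumes the truss line format shown above; `count_multicast_failures` is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Matches failed membership calls as printed by truss, for example:
# setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x...,20) ERR#12 '...'
FAILED_SETSOCKOPT = re.compile(
    r"^setsockopt\(\d+,IPPROTO_IPV6,(IPV6_(?:JOIN|LEAVE)_GROUP),[^)]*\)\s+ERR#(\d+)"
)


def count_multicast_failures(lines):
    """Return a Counter keyed by (option name, errno) for failed calls."""
    counts = Counter()
    for line in lines:
        match = FAILED_SETSOCKOPT.match(line.strip())
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Three lines copied from the traces in this report.
trace = [
    "setsockopt(6,IPPROTO_IPV6,IPV6_LEAVE_GROUP,0x64c6da00800,20) ERR#49 'Can't assign requested address'",
    "setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) ERR#12 'Cannot allocate memory'",
    "setsockopt(6,IPPROTO_IPV6,IPV6_JOIN_GROUP,0x64c6da00800,20) = 0 (0x0)",
]
for (option, err), n in sorted(count_multicast_failures(trace).items()):
    print(option, err, n)
```

A trace where IPV6_JOIN_GROUP shows up with errno 12 on every cycle would match the hung state described here, while errno 49 on IPV6_LEAVE_GROUP alone also occurs in the working trace.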

Environment
Software version used and hardware type if relevant.

OPNsense 20.7.2-amd64, openssl
APU2C4
Network Intel® I210-AT
PPPoE-connected fiber modem (DHCPv6-PD)

I did not experience the issue in OPNsense 20.1.

@fichtner fichtner added the support Community support label Sep 9, 2020
@fichtner
Member

fichtner commented Sep 9, 2020

Thank you for the details. Just for more data points... the following fixes the issue temporarily?

# pluginctl -s radvd restart

Cheers,
Franco

@lattera
Contributor

lattera commented Sep 9, 2020

I'm also seeing this issue. The next time it pops up, I'll run that command and report back.

@fichtner
Member

fichtner commented Sep 9, 2020

Also PPPoE? It looks a bit like the adapter is gone and a new one with the same name exists but radvd doesn’t know and still references the old one which obviously doesn’t work.

@lattera
Contributor

lattera commented Sep 9, 2020

Sorry, nope. My WAN is a normal ethernet connection (Verizon FiOS).

@lattera
Contributor

lattera commented Sep 9, 2020

I'm using TunnelBroker to get IPv6 support. Relevant screenshots attached.

2020-09-09_ipv6-01
2020-09-09_ipv6-02

@klada
Author

klada commented Sep 10, 2020

@fichtner

Thank you for the details. Just for more data points... the following fixes the issue temporarily?

# pluginctl -s radvd restart

Exactly. As soon as I run this command IPv6 is working again. Before running this command radvd does not send any RAs and does not react to solicit requests. For some reason, today it took only ~1 day before radvd started to hang.

As soon as the process is restarted through the command provided above it's working again (temporarily).

Edit: I don't see any events in ppps.log since the last reboot (5 days ago). I guess this means that the PPPoE link remains unchanged (it should in my case).

@fichtner
Member

There seems to be an issue with the list management code in the kernel regarding multicast. Radvd reload behaviour didn't change as far as I can see, and you guys are right that the interfaces do not change either. Initially we added the join/leave to cope with updates in radvd 2.x, which worked well on 11.x, but 12.x seems to be allergic to too many iterations. I don't see any readily available commit to cherry-pick, so this will take a while to find the cause.

Cheers,
Franco

@fichtner fichtner added upstream Third party issue and removed support Community support labels Sep 11, 2020
@fichtner fichtner self-assigned this Sep 11, 2020
@fichtner fichtner added this to the 21.1 milestone Sep 11, 2020
@lattera
Contributor

lattera commented Sep 11, 2020

I wonder if a custom cronjob to restart radvd periodically would be sufficient as a workaround.

@klada
Author

klada commented Sep 11, 2020

I wonder if a custom cronjob to restart radvd periodically would be sufficient as a workaround.

It should be. I have a cron job in place right now, as losing internet connectivity every other day is not an option (it's required for emergency calls over here). I don't know if you guys want to ship such an uber-ugly hack in the default distribution though.

Does anybody know which upstream issue we are talking about? I guess it's an issue within the HardenedBSD kernel, or am I wrong here? I could not find a bug tracker from them, which is kinda weird. Does anybody have a link to the upstream issue?
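
Because the hung radvd process stays alive, a plain process check does not help. A cron-driven workaround would have to decide based on whether advertisements are still being seen. A rough Python sketch of that decision follows. How `last_ra_seen` is obtained (packet capture, log parsing) is left open, the factor of 3 is an arbitrary assumption, and only the `pluginctl -s radvd restart` command comes from this thread.

```python
import subprocess
import time

# Restart command quoted earlier in this thread.
RESTART_CMD = ["pluginctl", "-s", "radvd", "restart"]


def ra_overdue(last_ra_seen, now, max_rtr_adv_interval, factor=3.0):
    """True if no RA has been seen for longer than factor * MaxRtrAdvInterval."""
    return (now - last_ra_seen) > factor * max_rtr_adv_interval


def restart_if_overdue(last_ra_seen, max_rtr_adv_interval, now=None,
                       run=subprocess.run):
    """Restart radvd when advertisements are overdue; return True if restarted."""
    now = time.time() if now is None else now
    if ra_overdue(last_ra_seen, now, max_rtr_adv_interval):
        # 'run' is injectable so the logic can be exercised without root.
        run(RESTART_CMD, check=False)
        return True
    return False
```

Run from cron every few minutes, this only restarts the daemon when it appears stuck, rather than restarting unconditionally on a timer.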

@lattera
Contributor

lattera commented Sep 11, 2020

The issue likely comes from HardenedBSD's upstream, FreeBSD. HardenedBSD has only a few changes to kernel networking code, and they are changes that wouldn't cause this behavior (like enabling IP ID randomization by default).

@greggitter

greggitter commented Sep 11, 2020

I wonder if a custom cronjob to restart radvd periodically would be sufficient as a workaround.

I have a similar issue where radvd is not responding to solicitations directly, but doesn't seem to fail at sending unsolicited advertisements. So the situation is that a host solicits and gets nothing back; once the unsolicited interval finally triggers, the host can establish its IPv6 address and connect. That creates the delay others reported in how soon a host can establish its connection.

The other symptom is on a cold boot, there is no ipv6 connectivity until re-saving both the wan interface settings followed by re-saving lan interface settings (this happened recently when I noticed nagios was failing an ipv6 ping for 90 minutes). This all seems to be related. What I haven't tested is after cold start whether restarting radvd also solves the issue.

@imp1sh

imp1sh commented Sep 22, 2020

I experienced this issue under 20.1 too, after an uptime of 30 days.
Here's another thread that seems to be related:
https://forum.opnsense.org/index.php?topic=18663.0
I hope this gets fixed soon. It's really annoying.

@fichtner
Member

Asserting the same issue in 20.1 is speculation without the appropriate data points to support this.

While it’s annoying, please refrain from telling how annoying this is for the sake of keeping this technical and on point.

Cheers,
Franco

@imp1sh

imp1sh commented Sep 22, 2020

The data points are:

  • OPNSense 20.1.9_1-amd64
  • radvd suddenly stops sending router advertisements (in my case after 30 days of uptime)
  • radvd keeps running
  • stateless IPv6 fails for all clients, default gateway vanishes
  • restarting radvd fixes it

Actually I downgraded to 20.1 because of that bug in 20.7. As far as I can tell it takes a lot more time for this problem to show up in 20.1 but the symptoms are exactly the same.
If you need anything else, just ask me, glad to help.

@fichtner
Member

Hm, upon user requests we moved from radvd version 1 to version 2 with 20.1.6. Could it be that 20.1 - 20.1.5 were not showing this issue?

Given this is true the kernel bug always existed but moving from 11.2 to 12.1 operating system version made this worse. Quite a bit of coincidence. I'm not sure if this is the same issue or two separate issues that look the same.

fichtner added a commit to opnsense/ports that referenced this issue Sep 23, 2020
@robgnu

robgnu commented Sep 24, 2020

On 20.1.9 (two instances, APU2, Config: Unmanaged or Stateless), I can't confirm the bug. Even after a long uptime (> 30 days) everything is fine.

@fichtner
Member

@robgnu yes, this definitely involves some sort of dynamic interface creation. I wrote a POC yesterday that wouldn't trigger the bug on hardware interfaces, but here it seems that GIF and PPPoE can interfere.

@fichtner
Member

fichtner commented Sep 24, 2020

For everyone affected, please try this version based on a debug patch that realigns the interface index on every invocation in case it changed...

# pkg add -f https://pkg.opnsense.org/FreeBSD:12:amd64/20.7/misc/radvd-2.18_2.txz

It also has an additional debug line should the interface not be available at all during connectivity issues / reconfigure.

Cheers,
Franco
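
The idea described for the debug build, re-resolving the interface index by name on every use instead of trusting a stored value, can be sketched in Python. This is an illustration only and not the patch itself; `IfIndexCache` and its `resolver` parameter are hypothetical names.

```python
import socket
from typing import Callable, Dict, Optional


class IfIndexCache:
    """Tracks name -> index and notices when the kernel hands out a new index."""

    def __init__(self, resolver: Callable[[str], int] = socket.if_nametoindex):
        self._resolver = resolver
        self._cache: Dict[str, int] = {}

    def refresh(self, name: str) -> Optional[int]:
        """Re-resolve name; return the current index, or None if it is gone."""
        try:
            index = self._resolver(name)
        except OSError:
            # Interface currently absent, e.g. during a reconnect.
            self._cache.pop(name, None)
            print(f"{name}: interface not available")
            return None
        previous = self._cache.get(name)
        if previous is not None and previous != index:
            print(f"{name}: interface index changed {previous} -> {index}")
        self._cache[name] = index
        return index
```

Calling `refresh()` right before each multicast join or send means a recreated interface with the same name is picked up with its new index instead of the stale one.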

@marjohn56
Member

Installed.. it's a very intermittent bug, I've not seen it for a while then yesterday it appeared again whilst I was setting up a new FreeBSD HyperV instance... bloody odd.

@marjohn56
Member

I mean by that that the new instance was not getting an address!

@greggitter

The problem @marjohn56 describes is similar to what I had (which I described above) but only on a cold boot...a reboot was never a problem with not getting an ipv6 connection for hosts. I noticed it after we had two power outages where I was offline for about an hour each time. I don't have a separate test system unfortunately...but will test when I can. Installed the patch...thanks.

@9numbernine9

Thanks @fichtner for providing this version. I've occasionally run into the same issue as described here, but I haven't found a way to replicate it. I've installed your patched version and maybe it will yield some new information. 👍

@marjohn56
Member

@fichtner - I think to debug this a little more we need to be able to get debug logs. Man says it needs to run foreground and have the level and the logfile specified. I think we should create a specific patch for this, thoughts?

@m-schaeffler

I have also seen this issue.

@marjohn56
Member

I tend to agree with @fichtner that it would be preferable to keep radvd; as he said, it served well for many years without an issue. Although the patch for rtadvd works and is stable, it's not a finished article. There are features in radvd that do not have an equivalent within rtadvd, though no one using rtadvd has yet missed them. On the flip side, rtadvd can be left running, and updates to interfaces and config are just signalled to it by its control program, which is useful. Something I'm about to look at is whether a change in the interface address (such as what would happen on a dhcp6c lease change) will be reflected by a deprecation of the old lease and a re-advertisement of the new one just by a config change.

@fichtner
Member

fichtner commented Jan 2, 2021

  • rtadvd works as documented

Isn't that the same case for radvd? For anything that is not bug-bound beyond the documented purposes?

  • rtadvd was in FreeBSD since KAME IPv6 was first integrated

We are weighing whether or not a long running service works as expected (as per the previous question). Wikipedia says radvd is 25 years old and still maintained. That is roughly the same time frame, isn't it?

In these regards I see no proof that rtadvd is better than radvd other than rtadvd is probably a better fit for FreeBSD, but that is more a BSD thing than bad for Linux in general. So FreeBSD needs to put functional code in the port to make it work but that doesn't mean it doesn't work fine elsewhere?

Every other year we feel our expectations broken by non-functional states in FreeBSD-centric software/implementations. If we switch to rtadvd we have to know it is actually a lot better, but personally I have no data other than this non-representative thread.

Cheers,
Franco

@agh1701

agh1701 commented Jan 4, 2021

For me on US Spectrum cable, radvd 2.18 would crash immediately. Only have 1 interface.

rtadvd Patch 9a4a908 – 10+ days no problems
radvd 2.19 – 5+ days no problems

Could the problem simply be radvd 2.18?
Thanks @marjohn56 and @fichtner

@fichtner
Member

fichtner commented Jan 4, 2021

@agh1701 changes from 2.18 to 2.19 are minimal upstream, but we changed the FreeBSD patching to avoid an alleged issue with FreeBSD 12...

@Wireheadbe

radvd-2.19 has been rock solid here

@agh1701

agh1701 commented Jan 9, 2021

9+ days solid radvd 2.19

@robgnu

robgnu commented Jan 9, 2021

I also can confirm: 5+ days and no issues anymore with radvd.

@fichtner
Member

Thanks for the feedback! So we will be shipping the updated radvd in 20.7.8 and 21.1-RC1 and if that at least improves the situation also provide a patch to FreeBSD ports.

Cheers,
Franco

@agh1701

agh1701 commented Jan 11, 2021

When is 20.7.8?

@zzyonn

zzyonn commented Jan 11, 2021

Same here with 2.19, work perfectly ;)
It fixed another problem for me: with 2.18, changing any parameter seemed to restart radvd (even changing the theme, for example).

So thanks ;)

@fichtner
Member

When is 20.7.8?

Likely next week.

@Staticznld

Staticznld commented Jan 11, 2021

Running Radvd 2.19 for 5 hours, so far so good!

Everything seems to be fine so far but there are some errors reported in the log file

  • radvd[9177] our AdvPreferredLifetime on igb2 for xxxx:xxxx:839:bbbb:: doesn't agree with fe80::4262:31ff:fe02:cb19
  • radvd[9177] our AdvValidLifetime on igb2 for xxxx:xxxx:839:bbbb:: doesn't agree with fe80::4262:31ff:fe02:cb19

@ivwang

ivwang commented Jan 13, 2021

So, to understand: this goes to the good people in this thread reporting that 2.19 behaves well.

Does radvd-2.19 alone make the difference? This almost sounds too good to be true. (Though I really hope it is true)

Looking at radvd-project/radvd@v2.18...v2.19 there’re only three slightly relevant changes, namely commits 9644266, dec4402 and 0d891e8, but the issues they address hardly manifest as stochastically as what we’ve seen. 9644266 maybe, but still..🤔

Fingers crossed🤞

Also what’s the alleged FreeBSD12 issue?

Thanks a lot!

@fichtner
Member

fichtner commented Jan 13, 2021

@ivwang the relevant parts are FreeBSD specific, see opnsense/ports@a5ace74ef2273eeb7 and opnsense/ports@54152320fa817

I don't think any changes of upstream are at play here. Note the absence of setsockopt(sock, IPPROTO_IPV6, IPV6_LEAVE_GROUP, &mreq, sizeof(mreq)); in version 2.19 and how people say now it works on FreeBSD 12 even though the code worked with it fine on FreeBSD 11.
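
For reference, the membership calls in the traces pass a 20-byte argument, which matches `struct ipv6_mreq`: a 16-byte group address followed by a 4-byte interface index. A hedged Python sketch of a join with no preceding leave follows. The all-routers group `ff02::2`, the helper names, and the handling of `EADDRINUSE` are assumptions for illustration and are not taken from the radvd port.

```python
import errno
import socket
import struct

ALL_ROUTERS = "ff02::2"  # assumption: link-local all-routers group


def build_ipv6_mreq(group: str, ifindex: int) -> bytes:
    """Pack struct ipv6_mreq: 16-byte address plus unsigned int interface index."""
    addr = socket.inet_pton(socket.AF_INET6, group)
    return struct.pack("@16sI", addr, ifindex)


def join_group(sock: socket.socket, group: str, ifindex: int) -> None:
    """Join the group without leaving first."""
    mreq = build_ipv6_mreq(group, ifindex)
    try:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)
    except OSError as exc:
        # Assumption: the kernel reports EADDRINUSE for an existing
        # membership, which is harmless here.
        if exc.errno != errno.EADDRINUSE:
            raise


print(len(build_ipv6_mreq(ALL_ROUTERS, 19)))  # size of the packed structure
```

Dropping the leave call avoids repeatedly tearing down and recreating the kernel's membership state, which is the path the traces show failing.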

@Aidan16

Aidan16 commented Jan 13, 2021

By the way, the patch has been working flawlessly. IPv6 has been working for around 8-11 days now. With the old radvd, it only worked for 3 days and then, like clockwork, stopped announcing.

Thanks!

@maxfield-allison
Contributor

just an update: Radvd has been behaving since I swapped back via the patch. About a week of uptime.

@ivwang

ivwang commented Jan 16, 2021

Great to hear, looks like radvd-2.19 does improve it for almost everyone.. getting my hopes up for the next release :)

@imp1sh

imp1sh commented Jan 17, 2021

I upgraded from 20.1 to 20.7.7. After about 1 or 2 days my problems with radvd popped up again. I decided to manually install radvd 2.19 via the above pkg add -f command.
Now my IPv6 connectivity is completely broken, even my upstream isn't getting an IPv6 address any more... thus no router advertisements are being sent. I don't know what's happening actually, I see that a /48 prefix is being assigned to me but even from the firewall itself I cannot ping6 into the internet even though I'm having a default route assigned.

This I found in syslog:

Jan 17 11:45:52 uprouter1 opnsense[67855]: /usr/local/etc/rc.newwanipv6: The command '/usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog' returned exit code '255', the output was '' 

This is my radvd.conf:

# Automatically generated, do not edit
# Generated for DHCPv6 server opt5
interface lagg0_vlan51 {
        AdvSendAdvert on;
        MinRtrAdvInterval 10;
        MaxRtrAdvInterval 30;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        AdvManagedFlag on;
        AdvOtherConfigFlag on;
        prefix 2001:1234:1234:6000::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous off;
        };
        RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
        };
        DNSSL mydom.ain {
        };
};
# Generated for DHCPv6 server opt1
interface lagg0_vlan5 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1492;
        AdvDefaultPreference medium;
        AdvManagedFlag on;
        AdvOtherConfigFlag on;
        prefix 2001:1234:1234:1b::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous off;
        };
        RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
        };
        DNSSL mydom.ain {
        };
};
# Generated for DHCPv6 server opt2
interface lagg0_vlan13 {
        AdvSendAdvert on;
        MinRtrAdvInterval 10;
        MaxRtrAdvInterval 30;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        AdvManagedFlag on;
        AdvOtherConfigFlag on;
        prefix 2001:1234:1234:5000::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous off;
        };
        RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
        };
        DNSSL mydom.ain {
        };
};
# Generated for DHCPv6 server opt4
interface lagg0_vlan10 {
        AdvSendAdvert on;
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
        AdvLinkMTU 1500;
        AdvDefaultPreference medium;
        prefix 2001:1234:1234:1001::/64 {
                DeprecatePrefix on;
                AdvOnLink on;
                AdvAutonomous on;
        };
        RDNSS 2001:1234:1234:5000::12 2a00:fe0:3f:3::6 {
        };
        DNSSL mydom.ain {
        };
};

Even after I downgraded radvd version corresponding to the 20.7 release it stays broken.
I downgraded to 20.1 and everything is working fine at least so far.
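
Exit code 255 with empty output gives little to go on. One thing that can be ruled out offline is the timer ranges. The Python sketch below checks MinRtrAdvInterval and MaxRtrAdvInterval against the limits in RFC 4861 section 6.2.1 (Max between 4 and 1800 seconds, Min between 3 seconds and 0.75 × Max). `check_intervals` is a hypothetical helper that only understands the simple layout shown above; it is not a replacement for radvd's own parser, and it is not a claim about what caused this failure.

```python
import re

INTERFACE_RE = re.compile(r"^\s*interface\s+(\S+)\s*\{")
OPTION_RE = re.compile(r"^\s*(MinRtrAdvInterval|MaxRtrAdvInterval)\s+(\d+)\s*;")


def check_intervals(conf_text: str) -> list:
    """Return a list of human-readable problems; empty if the timers look sane."""
    timers = {}     # interface name -> {option name: value}
    current = None
    for line in conf_text.splitlines():
        match = INTERFACE_RE.match(line)
        if match:
            current = match.group(1)
            timers[current] = {}
            continue
        match = OPTION_RE.match(line)
        if match and current is not None:
            timers[current][match.group(1)] = int(match.group(2))

    problems = []
    for name, opts in timers.items():
        max_i = opts.get("MaxRtrAdvInterval")
        min_i = opts.get("MinRtrAdvInterval")
        if max_i is not None and not 4 <= max_i <= 1800:
            problems.append(f"{name}: MaxRtrAdvInterval {max_i} outside 4..1800")
        if min_i is not None and max_i is not None and not 3 <= min_i <= 0.75 * max_i:
            problems.append(f"{name}: MinRtrAdvInterval {min_i} outside 3..0.75*Max")
    return problems


# Two interface blocks reduced to the timer lines from the config above.
sample_conf = """
interface lagg0_vlan51 {
        MinRtrAdvInterval 10;
        MaxRtrAdvInterval 30;
};
interface lagg0_vlan5 {
        MinRtrAdvInterval 200;
        MaxRtrAdvInterval 600;
};
"""
print(check_intervals(sample_conf))
```

Both timer pairs that appear in the posted configuration (10/30 and 200/600) fall inside those limits, so the intervals themselves are unlikely to be what radvd rejected.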

@fichtner
Member

fichtner commented Jan 18, 2021 via email

@fichtner
Member

20.7.8 addressed this issue. If you have issues with IPv6 please open tickets with enough relevant info.

Thanks,
Franco

@robel

robel commented Jan 27, 2021

Hi! I'm currently using pfSense and the same defect is driving me nuts (and the pfSense devs are failing to investigate it). I want to move to OPNsense, but I'm not familiar with the OPNsense release cycle. Can I assume that if I install the latest version of OPNsense this defect will be fixed with the proposed patch? Thanks in advance.

@fichtner
Member

Hi @robel,

Install 20.7 and update to 20.7.8, where this is fixed, or install 21.1-RC1, where it is already fixed (and which directly updates to 21.1 later this week).

Cheers,
Franco

@alexyao2015

I'm currently running 21.1.1 and still seem to be experiencing this same issue where radvd stops announcing prefixes. I have router advertisements set to assisted and after several hours, it stops working. Restarting radvd from the web ui fixes it for a few hours, but this problem then comes back after a few hours.

@QNimbus

QNimbus commented Mar 7, 2021

I can also confirm this issue with my current OPNsense install:

I'm currently running 21.1.1 and still seem to be experiencing this same issue where radvd stops announcing prefixes. I have router advertisements set to assisted and after several hours, it stops working. Restarting radvd from the web ui fixes it for a few hours, but this problem then comes back after a few hours.

Data:

  • OPNsense 21.1.2-amd64
  • FreeBSD 12.1-RELEASE-p13-HBSD
  • radvd 2.19_1 (Linux/BSD IPv6 router advertisement daemon)
  • radvd suddenly stops sending router advertisements (within 24 hours of uptime)
  • radvd process keeps running (so my monit service doesn't restart it automatically)
  • radvd is running in 'managed' mode, so no SLAAC only DHCPv6
  • radvd restart fixes the issue (temporarily)

@fichtner
Member

fichtner commented Mar 7, 2021

Unfortunately this issue has cost us a lot of community support time and we do not see any easy way forward chasing a kernel bug we can't reproduce any longer. The same issue also affects ISC dhcpd in IPv6 mode but the radvd code was vastly different and even though there was controversy over BSD support I don't see a reason to blame radvd and its patching any longer. If anything we just made the bug harder to trigger. For more details see #4691

We will target 13.0 which is currently being planned for 22.1 and along the way I hope that this issue simply disappears. If anyone is hoping to have this fixed sooner please find a reliable way to trigger and/or confirm there is a specific patch available in FreeBSD that addresses this.

We are happy to be of further assistance, but as I said not on community support time.

Cheers,
Franco

@fichtner
Member

A good candidate for this issue is opnsense/src@93e9cefd053b and from the original bug report you can see this affects 12.0-RELEASE, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233683

We've been using version 12.1 since OPNsense 20.7 which is when this bug started to happen... ;)

In the original report, the bug was triggered by adding the already existing IPv6 address again via the ifconfig utility, so I suppose some renew situation trashed the ability of radvd to act in the multicast group.

Cheers,
Franco
