pf: delayed responses on ICMP6 neighbor solicitation packets since 24.7.1 (redux) #218
Comments
It was introduced in 24.7.1 by FreeBSD and there is a ticket for it. #217 |
didn't mean to imply this is not useful... after reading the details this is very helpful indeed, thanks! |
How to scrape that info from pfctl verbatim:
|
I only looked for bugs in the core-repo, so I didn't find it. Did you want to have the output of your command? Let me know if I can check anything else.
|
You mentioned out6_block_packets so I had to chase down the raw output source for the FreeBSD ticket. I’m seeing the same. Testing against the original 24.7 kernel now, where the problem wasn’t present, to see if significantly fewer drops are visible. |
I see the discussion in the FreeBSD ticket has sadly gone fubar. I have reverted to the last known good kernel now. If anyone else has this issue:
|
If I give you two test kernels would you test them and tell me which one doesn't work as expected regarding this particular issue? |
Sure, but also how to install them, please. |
Ok first things first let me make this the active ticket to track it since you've been so helpful already. ❤️ |
I can also try out some test kernels, if it would be helpful. |
Thanks all! Here are two kernels to test. This is not a black-box test by any means, but for now I will just say that, on the surface, at least one of these should work as expected. If you could tell me which one works and which one doesn't for this particular issue, that would be very helpful for locating the problematic code.
or
Cheers, |
The testing method is nice, since I can reproduce this on a standalone Proxmox VM instance without annoying my wife. With the standard kernel on 24.7.2, the behaviour was the same as reported (i.e. most NDs time out). Was that a double-blind study? ;-) |
This is at least an interesting turn of events although within the defined test parameters. I'll throw in two more kernels:
and
It's a treasure hunt! :) |
To double check we need this one anyway:
Happy hunting and thanks in advance. |
24.7.1_13 works |
Ok that would mean our winner as determined by @meyergru _14 is: https://cgit.freebsd.org/src/commit/?id=46755f5224 OpenBSD commit for reference openbsd/src@ef4bccd7509e Independent confirmation is appreciated. |
_11 and _12 work for me as well. For some reason I can't download the other ones:
|
@Crazyachmed not synced to leaseweb yet, try default mirror |
|
Uh oh, this has progressed even more than I imagined. Anyway, I just tested 24.7.1_11 and 24.7.1_12 and had interesting results.
So, some packet captures follow after my "analysis". I have grouped the request+response pairs with surrounding newlines for easier reading.
With the 24.7.1_11 kernel:
With the 24.7.1_12 kernel:
|
Okay, now I tested with three more kernels, 24.7.1_13-15. With kernel 24.7.1_13
With kernel 24.7.1_14
With kernel 24.7.1_15
|
With everyone at home being very much p***ed off with the constant reboots, I'll make it short. It mostly matches what @sjm42 posted anyway. Here's a completely unscientific
Will try to do something more extensive tomorrow. Now - 🍺 🍻 badly needed. |
I've checked my WAN connection after looking at sjm42's analysis, but my internet is GPON fibre + PPP, so all of that fun stuff is negotiated beforehand and exactly once. The only thing I see are Prefix (Router) Advertisements every ~17 minutes. I guess that also explains the limited impact on my setup. I currently run the 24.7.1_13 kernel, is that "okay" or should I revert to one of the other versions or 24.7? |
Well there's no need to revert if it's working for you. 😉 |
Use a kernel that works from the _1x series for now. That's the closest to 24.7.2 without the issue. |
@sjm42 thanks, your analysis in particular is interesting: it suggests this gets worse with each commit, although as I understand it not all commits affect IPv6 directly. _11 is without the SA commits. _12 is the first commit and so on... I'll look at the code again with this in mind. Note to self: _11 -> no SA |
Ok so Commit: ee7b012c5 Kernel to test:
Looks like Also I found this. 2012 greeting us. openbsd/src@2633ae8c4c8a64 And 2023? openbsd/src@49f39043a02d6 |
^^^ updated the previous comment, but bumping for awareness: test kernel inside! |
Funny that openbsd/src@49f39043a02d6 adds a |
I wonder what role this port omission plays in the grand scheme of things: Lines 1891 to 1896 in c61a3c2
Since Lines 7177 to 7191 in c61a3c2
|
testing:
Will update this post when done. Edit: seems to be behaving, will keep monitoring. |
opnsense-update -zkr 24.7.2-nd
Looks promising. Traceroute from the "Net Analyzer" Android app on an S22 looks OK to me. Also, my RIPE Atlas probe came right back online. I'm not able to check neighbor solicitation right now. |
@Staticznld traceroutes were supposedly fixed in 24.7.2 already but thanks for taking a peek so quickly! |
I thought traceroute from Windows was working fine. `Asus-PN50 (2a02:a450:xxxx:xxxx::2000) -> google.com (2a00:1450:400e:811::200e) 2024-08-24T07:27:49+0200
|
testing:
Very fast online after applying DHCPv6 on the WAN interface. This would take a while and lock up the interface on the previous kernels. Previous config for internal DHCPv6 & RA was active instantly as well. 15 minutes in and no loss to report. |
Well, for me 24.7.2-nd seems to behave well. I ran tcpdump for 20 minutes and it was all like this excerpt below.
|
Haven't had my coffee yet, but 24.7.2-nd looks good on my side as well. @fichtner: One note from when I had the issue: from time to time a single host would randomly behave for a couple of minutes. That smells very much like a state is created for some reason. |
Just chiming in here - Maybe it still allows carefully crafted packets from attackers to trigger extremely dangerous echo reply even when it should be blocked by |
No this is locked to ND_NEIGHBOR_SOLICIT/ND_NEIGHBOR_ADVERT |
Thanks to all chiming in BTW. We will keep this on top of the agenda regardless of what FreeBSD's stance is. ❤️ |
24.7.2-nd looking good for the synthetic test on a VM instance for me as well. Also applied it to my bare-metal box to verify traceroute and mtr and still looks good there, too. |
Hello, the 24.7.2-nd works perfectly. My network is almost ipv6-only and I have no more packet loss. Thank you very much. 😁 |
I have been troubleshooting an issue where my Comcast router responds with This patch is likely going to fix a lot of intermittent IPv6 issues. |
24.7.2-nd also works without issues (that I've noticed or seen) on my apu2. |
@aque @Slashic - I've borrowed your pictures for the upstream bug report, hopefully it makes it more clear how bad things may be for some people with these bad upstream patches applied. Things have not been nearly that bad for me in most of my setups, I'm probably lucky. |
I've added some more things there that have been on my mind for most of the weekend including the ping statistics which are a good indicator indeed. Let's focus on a shippable improvement in OPNsense for 24.7.3 tomorrow. The release will likely land later in this week so we still have time to test a bit more. :) Cheers, |
Describe the bug
Android devices drop WiFi connectivity every couple of minutes and reconnect immediately. A capture showed that opnsense does not reply to IPv6 neighbor solicitation for its link-local address right before the disconnections.
Using multiple wired Linux devices and the tool ndisc6, I can replicate the issue by sending about one solicitation per second; the failure rate is above 90%.
To Reproduce
Change the IP to the link-local address of the firewall and change the interface to the name of the interface on the sending station.
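A minimal sketch of such a reproduction loop, assuming `ndisc6` is installed on the sending station and exits non-zero when no neighbor advertisement arrives. The function names, the `ND_INTERVAL` variable, and the address and interface in the usage note are illustrative placeholders, not values from this report.

```shell
#!/bin/sh
# Sketch: send COUNT neighbor solicitations, about one per second,
# and report how many went unanswered.

# probe_once TARGET IFACE
# One solicitation, no retries, 1000 ms wait.
# Assumption: ndisc6 returns non-zero if no advertisement is received.
probe_once() {
    ndisc6 -q -r 1 -w 1000 "$1" "$2" >/dev/null 2>&1
}

# nd_failure_rate COUNT TARGET IFACE [PROBE_CMD]
# PROBE_CMD defaults to probe_once; it can be swapped for a dry run.
# ND_INTERVAL (hypothetical knob) sets the pause between probes in seconds.
nd_failure_rate() {
    count=$1
    target=$2
    iface=$3
    probe=${4:-probe_once}
    fail=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$probe" "$target" "$iface" || fail=$((fail + 1))
        i=$((i + 1))
        sleep "${ND_INTERVAL:-1}"
    done
    echo "$fail/$count failed ($((fail * 100 / count))%)"
}
```

For example, `nd_failure_rate 60 fe80::1 eth0` would probe the placeholder address `fe80::1` via `eth0` for roughly a minute and print the share of unanswered solicitations.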
Expected behavior
Every neighbor solicitation should be answered, although a reasonable rate limit may apply (much higher than observed here).
Describe alternatives you considered
I suspected my Proxmox, switch or APs to be at fault, but the tcpdump on OPNsense nails it down for me.
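A capture like this can be narrowed to neighbor discovery traffic with a pcap filter. The sketch below assumes no IPv6 extension headers precede the ICMPv6 header, so the ICMPv6 type byte sits at offset 40; the helper name `nd_filter` and the interface in the usage comment are placeholders.

```shell
# nd_filter: print a pcap filter matching ICMPv6 neighbor solicitation
# (type 135) and neighbor advertisement (type 136) packets.
# ip6[40] is the ICMPv6 type only when no extension headers are present.
nd_filter() {
    echo 'icmp6 and (ip6[40] == 135 or ip6[40] == 136)'
}

# Usage (vtnet0 is a placeholder interface name):
#   tcpdump -n -i vtnet0 "$(nd_filter)"
```

Solicitations that are not followed by a matching advertisement in the output are the unanswered ones.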
Relevant log files
Additional context
I think this behavior was introduced in the update to 24.7 or 24.7.1. No other config changes at the time except the update.
Environment
OPNsense 24.7.2-amd64 on Proxmox 8.2.4, Kernel 6.8.12-1-pve
Virtual NIC is virtio with 4 queues and MTU 9000