Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pf+route-to: panic when shared forwarding is enabled #52

Closed
AdSchellevis opened this issue Jan 9, 2020 · 4 comments
Closed

pf+route-to: panic when shared forwarding is enabled #52

AdSchellevis opened this issue Jan 9, 2020 · 4 comments
Assignees
Labels
bug Production bug

Comments

@AdSchellevis
Copy link
Member

When shared forwarding is enabled and traffic is send to a non existing interface it can lead to erratic panics it seems.

Tested on both HardenedBSD 11.2 and 12.1, where the latter seems to crash sooner (easier to reproduce).

Steps below are executed on the 12.1 version.


Test setup

[client 1] ---> [igb0 ----OPNsense--- igb4] <---- [client 2]

client 1 has a couple of tcp sessions (ssh, https) to igb0, while client 2 keeps pinging igb4's address.

Relevant rules:

pass in quick on igb4 route-to ( wg0 192.168.2.1 ) inet from {any} to {!(igb4:network)} keep state 
pass in quick on igb0 inet from {any} to {any} keep state

Interface wg0 doesn't exist during test.

Result

After some time, the machine panics, although the messages are not always exactly alike, I've added some partial textdumps below:

Example from 11.2:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cc19a0
stack pointer           = 0x28:0xfffffe0231e793b0
frame pointer           = 0x28:0xfffffe0231e793e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq278: igb0:que 2)

Tracing pid 12 tid 100070 td 0xfffff800190b0000
sbcut_internal() at sbcut_internal+0x40/frame 0xfffffe0231e793e0
tcp_do_segment() at tcp_do_segment+0xfeb/frame 0xfffffe0231e794e0
tcp_input() at tcp_input+0xf5b/frame 0xfffffe0231e79640
ip_input() at ip_input+0x141/frame 0xfffffe0231e796a0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0231e796f0
ether_demux() at ether_demux+0x140/frame 0xfffffe0231e79720
ether_nh_input() at ether_nh_input+0x336/frame 0xfffffe0231e79780
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0231e797d0
ether_input() at ether_input+0x26/frame 0xfffffe0231e797f0
igb_rxeof() at igb_rxeof+0x721/frame 0xfffffe0231e79890
igb_msix_que() at igb_msix_que+0x117/frame 0xfffffe0231e798e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe0231e79920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe0231e79970
fork_exit() at fork_exit+0x83/frame 0xfffffe0231e799b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0231e799b0

Example from 12.1:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d70680
stack pointer           = 0x28:0xfffffe003faf3300
frame pointer           = 0x28:0xfffffe003faf3330
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (if_io_tqg_2)
trap number             = 12
panic: page fault

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003faf2fb0
vpanic() at vpanic+0x1a2/frame 0xfffffe003faf3000
panic() at panic+0x43/frame 0xfffffe003faf3060
trap_fatal() at trap_fatal+0x39c/frame 0xfffffe003faf30c0
trap_pfault() at trap_pfault+0x49/frame 0xfffffe003faf3120
trap() at trap+0x29f/frame 0xfffffe003faf3230
calltrap() at calltrap+0x8/frame 0xfffffe003faf3230
--- trap 0xc, rip = 0xffffffff80d70680, rsp = 0xfffffe003faf3300, rbp = 0xfffffe003faf3330 ---
sbcut_internal() at sbcut_internal+0x40/frame 0xfffffe003faf3330
tcp_do_segment() at tcp_do_segment+0x22d5/frame 0xfffffe003faf3420
tcp_input() at tcp_input+0xdf6/frame 0xfffffe003faf35a0
ip_input() at ip_input+0x175/frame 0xfffffe003faf3650
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe003faf36a0
ether_demux() at ether_demux+0x139/frame 0xfffffe003faf36d0
ether_nh_input() at ether_nh_input+0x346/frame 0xfffffe003faf3730
netisr_dispatch_src() at netisr_dispatch_src+0xcf/frame 0xfffffe003faf3780
ether_input() at ether_input+0x4b/frame 0xfffffe003faf37b0
iflib_rxeof() at iflib_rxeof+0xaae/frame 0xfffffe003faf38a0
_task_fn_rx() at _task_fn_rx+0x75/frame 0xfffffe003faf38e0
gtaskqueue_run_locked() at gtaskqueue_run_locked+0x144/frame 0xfffffe003faf3940
gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0x98/frame 0xfffffe003faf3970
fork_exit() at fork_exit+0x83/frame 0xfffffe003faf39b0

Changing the route-to to a non existing address on an existing interface seems to be stable, which is expected.

@AdSchellevis
Copy link
Member Author

AdSchellevis commented Feb 17, 2020

Just for my own reference, so I won't forget.

pf_route_shared() should probably drop the packet when not a being send to a valid target next hop, pf_route() seems to be doing that in pf_test() if I'm understanding the flow correctly

src/sys/netpfil/pf/pf.c

Lines 5585 to 5587 in a7a4a06

if (oifp != ifp) {
if (pf_test(PF_OUT, 0, ifp, &m0, inp) != PF_PASS)
goto bad;

Since we don't want to send the packets back out directly, we likely only want to test if it would be possible to send it out on approx the same spot.

@fichtner
Copy link
Member

pf_test() is the dangerous fast path that evaluates outgoing direction on the spot and if we call it we cannot undo it. The goal of shared forwarding is to land in outgoing pf_test() through the proper channels. Besides this != PF_PASS is first and foremost rule evaluation, not specifically sanity checking for higher level TCP input behaviour: here we are receiving an incoming packet that is supposed to hit the local network stack which shouldn't be routed in this direction.

Maybe a proper debug trace with line numbers would help prevent this?

I'm still not a fan of WireGuard's tendency to break the network stack in general terms. :)

@AdSchellevis
Copy link
Member Author

I'm not exactly sure where we should fix this, but the problem is introduced on the inbound package via pf if I'm not mistaken. Since the normal function discards it on the outbound path, it apparently doesn't matter if the interface doesn't exist. I haven't looked at ipfw by the way, not sure if something similar exists there.

It's not per-se an issue with wireguard, sending traffic to an interface that doesn't exist will likely result in this issue when using shared forwarding.

There's no rush here, just wanted to drop some notes while reading through the code, if a proper core dump would be better, I can probably generate that later, although I rather wait for 12.1 to dig into this deeper, since it tends to crash faster it seems.

AdSchellevis added a commit that referenced this issue May 12, 2020
when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.
AdSchellevis added a commit that referenced this issue May 12, 2020
when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.
@AdSchellevis AdSchellevis self-assigned this May 12, 2020
AdSchellevis added a commit that referenced this issue May 12, 2020
when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.
fichtner pushed a commit that referenced this issue May 12, 2020
when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.
@AdSchellevis
Copy link
Member Author

fixed in #59 fixes the issue

fichtner pushed a commit that referenced this issue May 12, 2020
when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.
lattera pushed a commit to BlackhawkNest/opnsense-src that referenced this issue Oct 30, 2020
…se#52)

when ifp is null packets are marked bad, which means *m points to a freed address. reset *m and return in that case.

(cherry picked from commit 923c95c)
Signed-off-by: Shawn Webb <swebb@blackhawknest.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Production bug
Development

No branches or pull requests

2 participants