generic_rx_handler panic with netmap+realtek driver #1481

Closed
oparoz opened this issue Mar 18, 2017 · 20 comments

@oparoz
Contributor

oparoz commented Mar 18, 2017

Suricata crashes hard when using the emulated netmap mode, so admins should not be allowed to turn DPI on when using such drivers, unless they are offered a choice between FreeBSD and Realtek drivers.

See: https://redmine.openinfosecfoundation.org/issues/1688

@AdSchellevis
Member

I don't think we can easily distinguish between them. The problem is that if we do this, we could end up adding far more magic than we would like (we're not going to try to teach our system about all the different drivers that exist, in whatever state they're in).
If there's a very easy and predictable way to determine the state of the driver, we could consider probing, but given our earlier experiences I don't expect a good fix here other than fixing the actual problem (which is upstream).

@fichtner
Member

Did you check that we have this issue in OPNsense?

Netmap emulation broken on FreeBSD 11.0, not in OPNsense 17.1.

Official Realtek driver instead of FreeBSD driver since 17.1.2.

No issues with Suricata in IPS mode, definitely no netmap crashes.

@oparoz
Contributor Author

oparoz commented Mar 18, 2017

> (we're not going to try to teach our system about all the different drivers that exist, in whatever state they're in).

Makes sense (:+1:)

> Did you check that we have this issue in OPNsense?

That's how I discovered that the issue had not been solved upstream.
I upgraded to 17.1.2, rebooted and got the kernel panics as soon as the router tried to do too much on the Internet.

> Official Realtek driver instead of FreeBSD driver since 17.1.2.

Those drivers are not patched with netmap support, but Suricata is forcing netmap mode nonetheless, which causes the kernel panics.

Tested on Zotac ci323 with 8168 NIC

@fichtner
Member

Well, the emulation mode has been run successfully on vmxnet3 on 17.1, and 16.7 used emulation mode for the e1000 driver. There is not much wrong with emulation mode per se.

We'll need the actual kernel panic to start with.

@oparoz
Contributor Author

oparoz commented Mar 18, 2017

> Well, the emulation mode has been run successfully on vmxnet3 on 17.1, 16.7 used emulation mode for the e1000 driver. There is not much wrong with emulation mode per se.

Indeed, it works fine when using software which tries to use it. Suricata is not trying to use it. Instead it forces hardware mode because it doesn't recognise that the driver does not have a hardware mode.

> We'll need the actual kernel panic to start with.

It's the same as the one found here:
https://redmine.openinfosecfoundation.org/issues/1688

@fichtner
Member

I need the 17.1.2 panic; the 16.1.x panic is not going to help at all. The Realtek driver is also newer than the one used there because it came out in February 2017.

Several users on the forum use Zotac, maybe we need to poll for people using theirs as IPS.

Software cannot force netmap hardware mode. It's on or off.
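
For readers trying to tell which adapter type is in play: on FreeBSD, netmap(4) documents a `dev.netmap.admode` sysctl that selects between native and emulated (generic) adapters. The sketch below only turns that value into words; the function name `describe_admode` is made up for illustration, and whether the sysctl behaves identically on the OPNsense 17.1 kernel is an assumption.

```shell
# Hypothetical helper: describe a dev.netmap.admode value as documented in netmap(4).
describe_admode() {
    case "$1" in
        0) echo "best available: native if the driver supports it, else emulated" ;;
        1) echo "native only: fails if the driver has no netmap support" ;;
        2) echo "emulated (generic) only" ;;
        *) echo "unknown admode value: $1" ;;
    esac
}

# On the firewall itself (assumes a kernel built with netmap):
#   describe_admode "$(sysctl -n dev.netmap.admode)"
```

With the default of 0, a driver without netmap support would be expected to fall back to the emulated adapter rather than fail.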

@oparoz
Contributor Author

oparoz commented Mar 18, 2017

> I need the 17.1.2 panic, the 16.1.x panic is not going to help at all

Unfortunately, /var/crash is empty

> Software cannot force netmap hardware mode. It's on or off.

The tools provided by netmap work fine in software mode. No crash. It's just Suricata which does the wrong thing.
Suricata seems to be sending hardware netmap commands instead of software ones. That's all I know.

> Several users on the forum use Zotac, maybe we need to poll for people using theirs as IPS.

Good idea!

@oparoz
Contributor Author

oparoz commented Mar 18, 2017

That's the only trace I have

(attached image: 20170318_151902)

@L1ghtn1ng

@inliniac what are your thoughts on this?

@fichtner
Member

luigirizzo/netmap#189

@fichtner
Member

I have a newer test kernel with these latest netmap changes if you are interested. The bug wasn't fixed before June 2016, so the fix is not in FreeBSD 11.0.

I would really appreciate it if others did this research before reporting an issue and presented compiled evidence, rather than forcing others to assume the worst.

Thanks,
Franco

@oparoz
Contributor Author

oparoz commented Mar 18, 2017

I'm definitely interested in testing the fix, yes.

# opnsense-update -bkr 17.1.3-next ?

@fichtner
Member

I need to rebuild this and will upload as 17.1.3-netmap, but must wait till tomorrow

@fichtner self-assigned this Mar 18, 2017
@fichtner added the bug (Production bug) and upstream (Third party issue) labels Mar 18, 2017
@fichtner added this to the 17.7 milestone Mar 18, 2017
@oparoz
Contributor Author

oparoz commented Mar 18, 2017

OK, many thanks @fichtner

@fichtner changed the title from "Do not turn on DPI when using re driver" to "generic_rx_handler panic with netmap+realtek driver" Mar 18, 2017
@fichtner
Member

@oparoz The netmap code is from the official git repository, I think around early January 2017:

# opnsense-update -kr 17.1.3-netmap
# /usr/local/etc/rc.reboot

Cheers,
Franco

@oparoz
Contributor Author

oparoz commented Mar 19, 2017

No dice.

356.481189 [1244] generic_netmap_attach     Created generic NA 0xfffff80007a47400 (prev 0)
356.481240 [1172] generic_netmap_dtor       Restored native NA 0
re0: permanently promiscuous mode enabled
356.506058 [1244] generic_netmap_attach     Created generic NA 0xfffff8000d130800 (prev 0)
356.506105 [1172] generic_netmap_dtor       Restored native NA 0
356.506156 [1244] generic_netmap_attach     Created generic NA 0xfffff8000d130800 (prev 0)
356.715517 [ 442] generic_netmap_register   Generic adapter 0xfffff8000d130800 goes on
356.715579 [ 487] generic_netmap_register   RX ring 0 of generic adapter 0xfffff8000d130800 goes on
356.715618 [ 494] generic_netmap_register   TX ring 0 of generic adapter 0xfffff8000d130800 goes on
356.715776 [ 487] generic_netmap_register   RX ring 1 of generic adapter 0xfffff8000d130800 goes on
356.715816 [ 494] generic_netmap_register   TX ring 1 of generic adapter 0xfffff8000d130800 goes on


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address	= 0xc
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff8073963d
stack pointer	        = 0x28:0xfffffe0231c091c0
frame pointer	        = 0x28:0xfffffe0231c09210
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 91675 (W#01-re0)

FreeBSD host.localdomain 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0 582844267(stable/17.1): Sun Mar 19 09:04:32 CET 2017 root@sensey64:/usr/obj/usr/src/sys/SMP amd64
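
Two things can be read off the output above without a debugger: the `generic_netmap_attach ... Created generic NA` lines show netmap attaching its emulated adapter to `re0`, and a fault virtual address of `0xc` with "page not present" usually points to a field being read through a NULL pointer. A minimal sketch for pulling both out of saved console text follows; the helper names are hypothetical, and the 4096-byte first-page threshold is an assumption about the platform.

```shell
# Hypothetical helper: succeed if the log on stdin shows an emulated (generic) adapter being attached.
uses_generic_adapter() {
    grep -q 'generic_netmap_attach.*Created generic NA'
}

# Hypothetical helper: print the first fault address found in a FreeBSD panic message on stdin.
fault_address() {
    sed -n 's/^fault virtual address[[:space:]]*=[[:space:]]*\(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if the hex address lies below 0x1000, i.e. inside the first page.
# Compares digit counts instead of doing arithmetic, so 64-bit kernel addresses cannot overflow.
looks_like_null_deref() {
    hex=${1#0x}
    while [ "${hex#0}" != "$hex" ]; do hex=${hex#0}; done   # strip leading zeros
    [ "${#hex}" -le 3 ]
}
```

Usage, assuming the console output was saved to a file named `panic.txt`: `uses_generic_adapter < panic.txt && looks_like_null_deref "$(fault_address < panic.txt)"`.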

@fichtner
Member

Alright, but I really really need the backtrace to open a bug report.

@oparoz
Contributor Author

oparoz commented Mar 19, 2017

I have this:

# ls -altrs /var/crash
total 444948
     4 drwxr-xr-x  27 root  wheel        512 Mar 11 04:53 ..
     4 -rw-r--r--   1 root  wheel          5 Mar 11 04:54 minfree
     4 -rw-r--r--   1 root  wheel          2 Mar 19 15:35 bounds
     4 -rw-------   1 root  wheel        449 Mar 19 15:35 info.0
444928 -rw-------   1 root  wheel  838430720 Mar 19 15:35 vmcore.0
     0 lrwxr-xr-x   1 root  wheel          6 Mar 19 15:35 info.last -> info.0
     0 lrwxr-xr-x   1 root  wheel          8 Mar 19 15:35 vmcore.last -> vmcore.0
     4 drwxr-x---   2 root  wheel        512 Mar 19 15:35 .
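
With a `vmcore.0` present, the backtrace requested above can normally be extracted with `kgdb`, which on FreeBSD takes the kernel image and the dump file as arguments. The sketch below only builds that command line for the newest dump; the helper name `kgdb_command` is hypothetical, `/boot/kernel/kernel` is the stock FreeBSD location, and a readable backtrace assumes matching kernel debug symbols are installed, which may not be the case on a default OPNsense system.

```shell
# Hypothetical helper: print the kgdb invocation for the newest dump in a crash directory.
kgdb_command() {
    crashdir=${1:-/var/crash}
    kernel=${2:-/boot/kernel/kernel}
    core="$crashdir/vmcore.last"      # symlink maintained by savecore(8)
    if [ ! -e "$core" ]; then
        echo "no vmcore.last in $crashdir" >&2
        return 1
    fi
    echo "kgdb $kernel $core"
}

# Print the command, run it as root, then type `bt` at the (kgdb) prompt
# to get the backtrace of the thread that panicked.
kgdb_command
```

The small `info.0` file next to the dump is plain text and can be attached to a report as-is.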

@fichtner
Member

No progress here and no good reason to start debugging netmap code now when upstream won't take it anyway. ;(

@oparoz
Contributor Author

oparoz commented Jul 21, 2017

OK, thanks for the update @fichtner.

@fichtner removed the bug (Production bug) label May 4, 2019