LOR with panic [mutex nm_kn_lock not owned] on FreeBSD INVARIANTS/WITNESS enabled kernels #314

Closed
gh-ix opened this issue Jun 8, 2017 · 5 comments

Comments

@gh-ix

gh-ix commented Jun 8, 2017

Since I'm only a netmap user, not a developer, I can only report this issue here; unfortunately I have no patch.
Please see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219846 for the original report. It refers to the netmap code base from FreeBSD HEAD (which should be very close to the one here on GitHub), merged into stable/11.

With the WITNESS/INVARIANTS options enabled, there is this lock order reversal, followed by a panic:

lock order reversal: (sleepable after non-sleepable)
 1st 0xfffff8007519a960 vm object (vm object) @ /usr/local/share/deploy-tools/RELENG_11/src/sys/vm/vm_fault.c:572
 2nd 0xfffff8003299b000 (d)->nm_mtx ((d)->nm_mtx) @ /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_mem2.c:577
stack backtrace:
#0 0xffffffff805e7900 at witness_debugger+0x70
#1 0xffffffff805e77f3 at witness_checkorder+0xe23
#2 0xffffffff80591f4e at _sx_xlock+0x5e
#3 0xffffffff8042275a at netmap_mem2_ofstophys+0x2a
#4 0xffffffff8041f3ab at netmap_dev_pager_fault+0x3b
#5 0xffffffff8082d834 at dev_pager_getpages+0x74
#6 0xffffffff80856d2a at vm_pager_get_pages+0x4a
#7 0xffffffff8083aa92 at vm_fault_hold+0xa52
#8 0xffffffff80839ff5 at vm_fault+0x75
#9 0xffffffff8088048f at trap_pfault+0xff
#10 0xffffffff8087fc38 at trap+0x348
#11 0xffffffff808645d1 at calltrap+0x8

KDB: stack backtrace:
#0 0xffffffff805ca4b7 at kdb_backtrace+0x67
#1 0xffffffff8058a3f6 at vpanic+0x186
#2 0xffffffff8058a473 at panic+0x43
#3 0xffffffff8056b564 at __mtx_assert+0xb4
#4 0xffffffff80544780 at knlist_add+0x20
#5 0xffffffff8041ead0 at netmap_kqfilter+0x110
#6 0xffffffff804657f7 at devfs_kqfilter_f+0x77
#7 0xffffffff80542a0b at kqueue_register+0x78b
#8 0xffffffff80543432 at kqueue_kevent+0x92
#9 0xffffffff80543336 at kern_kevent_fp+0x96
#10 0xffffffff8054324f at kern_kevent+0x9f
#11 0xffffffff80543058 at sys_kevent+0x138
#12 0xffffffff80880dda at amd64_syscall+0x57a
#13 0xffffffff808648bb at Xfast_syscall+0xfb

#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80589e70 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff8058a430 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8058a473 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff8056b564 in __mtx_assert (c=0x0, what=0, file=0x0, line=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1000
#5  0xffffffff80544780 in knlist_add (knl=0xfffffe000a055450, kn=0xfffff8026d097e80, islocked=1)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2089
#6  0xffffffff8041ead0 in netmap_kqfilter (dev=<value optimized out>, kn=0xfffff8026d097e80)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1354
#7  0xffffffff804657f7 in devfs_kqfilter_f (fp=0xfffff8001faf8780, kn=0xfffff8026d097e80)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/devfs/devfs_vnops.c:837
#8  0xffffffff80542a0b in kqueue_register (kq=0xfffff8001a65c000, kev=0xfffffe045b4dc650, td=0xfffff8014ca71000, waitok=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1334
#9  0xffffffff80543432 in kqueue_kevent (kq=0xfffff8001a65c000, td=0xfffff8014ca71000, nchanges=4, nevents=<value optimized out>, k_ops=0xfffffe045b4dc8a0,
    timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1019
#10 0xffffffff80543336 in kern_kevent_fp (td=0xfffff8014ca71000, fp=<value optimized out>, nchanges=4, nevents=<value optimized out>,
    k_ops=<value optimized out>, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050
#11 0xffffffff8054324f in kern_kevent (td=0xfffff8014ca71000, fd=7, nchanges=4, nevents=0, k_ops=0xfffffe045b4dc8a0, timeout=0x0)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993
#12 0xffffffff80543058 in sys_kevent (td=0xfffff8014ca71000, uap=0xfffffe045b4dca30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925
#13 0xffffffff80880dda in amd64_syscall (td=0xfffff8014ca71000, traced=0) at subr_syscall.c:135
#14 0xffffffff808648bb in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
#15 0x000000080122813a in ?? ()

-harry

@vmaffione
Collaborator

Hi,
I made a fix. Can you retry with the latest code?

@gh-ix
Author

gh-ix commented Jun 12, 2017

Thanks a lot. I tried your diff by applying it manually (I don't know how to download it via GitHub).
Unfortunately, the panic still happens:

panic: mutex nm_kn_lock not owned at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2169
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff805ca567 at kdb_backtrace+0x67
#1 0xffffffff8058a4a6 at vpanic+0x186
#2 0xffffffff8058a523 at panic+0x43
#3 0xffffffff8056b5c4 at __mtx_assert+0xb4
#4 0xffffffff805447e0 at knlist_add+0x20
#5 0xffffffff8041ead0 at netmap_kqfilter+0x110
#6 0xffffffff80465807 at devfs_kqfilter_f+0x77
#7 0xffffffff80542a6b at kqueue_register+0x78b
#8 0xffffffff80543492 at kqueue_kevent+0x92
#9 0xffffffff80543396 at kern_kevent_fp+0x96
#10 0xffffffff805432af at kern_kevent+0x9f
#11 0xffffffff805430b8 at sys_kevent+0x138
#12 0xffffffff80880eda at amd64_syscall+0x57a
#13 0xffffffff808649bb at Xfast_syscall+0xfb
Uptime: 4m24s
#0  doadump (textdump=<value optimized out>) at pcpu.h:222
#1  0xffffffff80589f20 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff8058a4e0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3  0xffffffff8058a523 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4  0xffffffff8056b5c4 in __mtx_assert (c=0x0, what=0, file=0x0, line=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1000
#5  0xffffffff805447e0 in knlist_add (knl=0xfffffe0013c7f450, kn=0xfffff8026b8e5e80, islocked=1)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2089
#6  0xffffffff8041ead0 in netmap_kqfilter (dev=<value optimized out>, kn=0xfffff8026b8e5e80)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1354
#7  0xffffffff80465807 in devfs_kqfilter_f (fp=0xfffff8003f6f6190, kn=0xfffff8026b8e5e80)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/devfs/devfs_vnops.c:837
#8  0xffffffff80542a6b in kqueue_register (kq=0xfffff8003273e500, kev=0xfffffe0446ccd650, td=0xfffff8009e127000, waitok=<value optimized out>)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1334
#9  0xffffffff80543492 in kqueue_kevent (kq=0xfffff8003273e500, td=0xfffff8009e127000, nchanges=4, nevents=<value optimized out>, k_ops=0xfffffe0446ccd8a0, 
    timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1019
#10 0xffffffff80543396 in kern_kevent_fp (td=0xfffff8009e127000, fp=<value optimized out>, nchanges=4, nevents=<value optimized out>, 
    k_ops=<value optimized out>, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050
#11 0xffffffff805432af in kern_kevent (td=0xfffff8009e127000, fd=6, nchanges=4, nevents=0, k_ops=0xfffffe0446ccd8a0, timeout=0x0)
    at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993
#12 0xffffffff805430b8 in sys_kevent (td=0xfffff8009e127000, uap=0xfffffe0446ccda30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925
#13 0xffffffff80880eda in amd64_syscall (td=0xfffff8009e127000, traced=0) at subr_syscall.c:135
#14 0xffffffff808649bb in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
#15 0x000000080122813a in ?? ()

Please note that I'm running stable/11 (11.1-BETA1 at the moment), with the netmap code from HEAD!

To double-check, here's the actual diff I applied:

--- src/sys/dev/netmap/netmap_kern.h.orig    2017-06-12 09:26:44.449586000 +0200
+++ src/sys/dev/netmap/netmap_kern.h        2017-06-12 20:07:54.760885000 +0200
@@ -85,6 +85,7 @@
 #define NM_MTX_INIT(m)         sx_init(&(m), #m)
 #define NM_MTX_DESTROY(m)      sx_destroy(&(m))
 #define NM_MTX_LOCK(m)         sx_xlock(&(m))
+#define NM_MTX_SPINLOCK(m) while (!sx_try_xlock(&(m))) ;
 #define NM_MTX_UNLOCK(m)       sx_xunlock(&(m))
 #define NM_MTX_ASSERT(m)       sx_assert(&(m), SA_XLOCKED)
 
--- src/sys/dev/netmap/netmap_mem2.c.orig    2017-06-12 09:26:44.451576000 +0200
+++ src/sys/dev/netmap/netmap_mem2.c       2017-06-12 20:06:24.136327000 +0200
@@ -226,6 +226,7 @@
 #define NMA_LOCK_INIT(n)       NM_MTX_INIT((n)->nm_mtx)
 #define NMA_LOCK_DESTROY(n)    NM_MTX_DESTROY((n)->nm_mtx)
 #define NMA_LOCK(n)            NM_MTX_LOCK((n)->nm_mtx)
+#define NMA_SPINLOCK(n)                NM_MTX_SPINLOCK((n)->nm_mtx)
 #define NMA_UNLOCK(n)          NM_MTX_UNLOCK((n)->nm_mtx)
 
 #ifdef NM_DEBUG_MEM_PUTGET
@@ -574,7 +575,14 @@
        vm_paddr_t pa;
        struct netmap_obj_pool *p;
 
+#if defined(__FreeBSD__)
+       /* This function is called by netmap_dev_pager_fault(), which holds a
+        * non-sleepable lock since FreeBSD 12. Since we cannot sleep, we
+        * spin on the trylock. */
+       NMA_SPINLOCK(nmd);
+#else
        NMA_LOCK(nmd);
+#endif
        p = nmd->pools;
 
        for (i = 0; i < NETMAP_POOLS_NR; offset -= p[i].memtotal, i++) {

Thanks,

-harry

@gh-ix
Author

gh-ix commented Jun 13, 2017 via email

@vmaffione
Collaborator

The patch is ok and it fixes the first panic you reported.
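
For readers unfamiliar with the pattern, the spin-on-trylock idea in the diff above can be sketched in userspace. This is only an illustration under stated assumptions: it uses POSIX mutexes instead of the kernel sx(9) lock, and the names spin_on_trylock and demo_spin_on_trylock are hypothetical, not part of netmap.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical userspace analogue of NM_MTX_SPINLOCK from the diff above:
 * retry a non-blocking acquire until it succeeds, never sleeping on the lock. */
static void spin_on_trylock(pthread_mutex_t *m)
{
    while (pthread_mutex_trylock(m) != 0)
        sched_yield(); /* userspace courtesy; the kernel macro simply busy-waits */
}

/* Demo helper: returns 1 if a second trylock is refused while the lock is held. */
static int demo_spin_on_trylock(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int refused, rc;

    spin_on_trylock(&m);                        /* uncontended, so the loop exits at once */
    refused = (pthread_mutex_trylock(&m) != 0); /* mutex is already held, so this fails */

    rc = pthread_mutex_unlock(&m);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_destroy(&m);
    return refused;
}
```

Spinning like this is only reasonable when the lock holder is expected to release the lock quickly; otherwise the waiter burns CPU for as long as the lock stays held.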

To download the GitHub code you can just run

$ git clone https://github.com/luigirizzo/netmap.git

and a netmap directory will show up.

You get a panic again, but this is a different one. What is needed to reproduce it? Can you report it in a separate GitHub issue?
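
As background on that second panic: the backtrace shows knlist_add() being called with islocked=1 and then failing the ownership assertion on nm_kn_lock. A minimal userspace model of that contract might look like the sketch below. It is an assumption-laden illustration, not the FreeBSD kqueue code: struct owned_mutex, struct note_list, and list_add are hypothetical names, and the function returns -1 where an INVARIANTS kernel would panic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical mutex wrapper that remembers its owner (single-threaded demo;
 * reading these fields from a non-owner thread would be racy). */
struct owned_mutex {
    pthread_mutex_t mtx;
    pthread_t owner;
    bool held;
};

static void om_init(struct owned_mutex *m)
{
    int rc = pthread_mutex_init(&m->mtx, NULL);
    assert(rc == 0);
    (void)rc;
    m->held = false;
}

static void om_lock(struct owned_mutex *m)
{
    pthread_mutex_lock(&m->mtx);
    m->owner = pthread_self();
    m->held = true;
}

static void om_unlock(struct owned_mutex *m)
{
    m->held = false;
    pthread_mutex_unlock(&m->mtx);
}

static bool om_owned(const struct owned_mutex *m)
{
    return m->held && pthread_equal(m->owner, pthread_self());
}

/* Hypothetical list protected by the mutex; count stands in for the entries. */
struct note_list {
    struct owned_mutex lock;
    int count;
};

/* islocked tells the callee whether the caller already holds the lock.
 * Returns -1 when the caller claims to hold it but does not. */
static int list_add(struct note_list *l, bool islocked)
{
    if (islocked && !om_owned(&l->lock))
        return -1; /* models the "mutex ... not owned" assertion failure */
    if (!islocked)
        om_lock(&l->lock);
    l->count++;
    if (!islocked)
        om_unlock(&l->lock);
    return 0;
}
```

In this model, a caller either takes the lock before passing islocked=true or passes false and lets list_add lock for it; claiming ownership without holding the lock is the failing case.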

@gh

gh commented Jun 13, 2017 via email
