LOR with panic [mutex nm_kn_lock not owned] on FreeBSD INVARIANTS/WITNESS enabled kernels #318

Open
gh-ix opened this issue Jun 14, 2017 · 11 comments

gh-ix commented Jun 14, 2017

Unfortunately, my last report contained two overlapping issues (stack traces), which masked the more severe problem. That problem causes this panic on stable/11 with netmap code merged from HEAD, including the latest FreeBSD "spin on trylock in netmap_mem2_ofstophys()" fix:

panic: mutex nm_kn_lock not owned at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2169
cpuid = 7
KDB: stack backtrace:
#0 0xffffffff805ca567 at kdb_backtrace+0x67
#1 0xffffffff8058a4a6 at vpanic+0x186
#2 0xffffffff8058a523 at panic+0x43
#3 0xffffffff8056b5c4 at __mtx_assert+0xb4
#4 0xffffffff805447e0 at knlist_add+0x20
#5 0xffffffff8041ead0 at netmap_kqfilter+0x110
#6 0xffffffff80465807 at devfs_kqfilter_f+0x77
#7 0xffffffff80542a6b at kqueue_register+0x78b
#8 0xffffffff80543492 at kqueue_kevent+0x92
#9 0xffffffff80543396 at kern_kevent_fp+0x96
#10 0xffffffff805432af at kern_kevent+0x9f
#11 0xffffffff805430b8 at sys_kevent+0x138
#12 0xffffffff80880eda at amd64_syscall+0x57a
#13 0xffffffff808649bb at Xfast_syscall+0xfb
Uptime: 4m24s
#0 doadump (textdump=) at pcpu.h:222
#1 0xffffffff80589f20 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff8058a4e0 in vpanic (fmt=, ap=)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff8058a523 in panic (fmt=) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff8056b5c4 in __mtx_assert (c=0x0, what=0, file=0x0, line=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1000
#5 0xffffffff805447e0 in knlist_add (knl=0xfffffe0013c7f450, kn=0xfffff8026b8e5e80, islocked=1)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2089
#6 0xffffffff8041ead0 in netmap_kqfilter (dev=, kn=0xfffff8026b8e5e80)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1354
#7 0xffffffff80465807 in devfs_kqfilter_f (fp=0xfffff8003f6f6190, kn=0xfffff8026b8e5e80)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/devfs/devfs_vnops.c:837
#8 0xffffffff80542a6b in kqueue_register (kq=0xfffff8003273e500, kev=0xfffffe0446ccd650, td=0xfffff8009e127000, waitok=)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1334
#9 0xffffffff80543492 in kqueue_kevent (kq=0xfffff8003273e500, td=0xfffff8009e127000, nchanges=4, nevents=, k_ops=0xfffffe0446ccd8a0,
timeout=) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1019
#10 0xffffffff80543396 in kern_kevent_fp (td=0xfffff8009e127000, fp=, nchanges=4, nevents=,
k_ops=, timeout=) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050
#11 0xffffffff805432af in kern_kevent (td=0xfffff8009e127000, fd=6, nchanges=4, nevents=0, k_ops=0xfffffe0446ccd8a0, timeout=0x0)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993
#12 0xffffffff805430b8 in sys_kevent (td=0xfffff8009e127000, uap=0xfffffe0446ccda30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925
#13 0xffffffff80880eda in amd64_syscall (td=0xfffff8009e127000, traced=0) at subr_syscall.c:135
#14 0xffffffff808649bb in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
#15 0x000000080122813a in ?? ()

Thanks, let me know if more info is needed.

-harry

@vmaffione
Collaborator

It seems you are using kevent(). Which program are you using to trigger this?

@gh-ix
Author

gh-ix commented Jun 14, 2017 via email

@vmaffione
Collaborator

vmaffione commented Jun 15, 2017

What happens with this change?

diff --git a/sys/dev/netmap/netmap_freebsd.c b/sys/dev/netmap/netmap_freebsd.c
index 6d0453d3..472dc8e1 100644
--- a/sys/dev/netmap/netmap_freebsd.c
+++ b/sys/dev/netmap/netmap_freebsd.c
@@ -1374,7 +1374,7 @@ netmap_kqfilter(struct cdev *dev, struct knote *kn)
        kn->kn_fop = (ev == EVFILT_WRITE) ?
                &netmap_wfiltops : &netmap_rfiltops;
        kn->kn_hook = priv;
-       knlist_add(&si->si.si_note, kn, 1);
+       knlist_add(&si->si.si_note, kn, 0);
        // XXX unlock(priv)
        ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
                na, na->ifp->if_xname, curthread, priv, kn,
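
To make the role of the third argument concrete, here is a small userland model in C. It is not kernel code: every name (`model_lock`, `model_knlist`, `model_knlist_add`) is hypothetical. The model assumes that a non-zero `islocked` means the caller already holds the list lock, and that zero means the function takes the lock itself. Under that assumption, passing 1 without holding the lock is the situation that the `mutex nm_kn_lock not owned` assertion reports on an INVARIANTS kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; none of these are kernel types. */
struct model_lock {
    bool held;                  /* true while the (single) caller owns it */
};

struct model_knote {
    struct model_knote *next;   /* singly linked list of registered notes */
};

struct model_knlist {
    struct model_lock lock;
    struct model_knote *head;
};

void model_lock_acquire(struct model_lock *l)
{
    assert(!l->held);
    l->held = true;
}

void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Model of the locking contract assumed above.
 * Returns 0 on success and -1 where a debugging kernel would panic.
 */
int model_knlist_add(struct model_knlist *knl, struct model_knote *kn,
                     int islocked)
{
    if (islocked && !knl->lock.held)
        return -1;              /* caller claimed the lock but does not own it */
    if (!islocked && knl->lock.held)
        return -1;              /* assumed: lock must not already be held */

    if (!islocked)
        model_lock_acquire(&knl->lock);
    kn->next = knl->head;       /* insert at the head of the list */
    knl->head = kn;
    if (!islocked)
        model_lock_release(&knl->lock);
    return 0;
}
```

In this model, calling `model_knlist_add(&knl, &kn, 1)` on an unlocked list returns -1, which corresponds to the reported panic, while the same call with 0 succeeds and leaves the lock released.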

@gh-ix
Author

gh-ix commented Jun 15, 2017 via email

@vmaffione
Collaborator

vmaffione commented Jun 15, 2017

Hmm, at this point I think we need help from a FreeBSD developer who knows how the kevent subsystem is supposed to work; unfortunately I don't have enough knowledge of it.

The only thing you can try is:

diff --git a/sys/dev/netmap/netmap_freebsd.c b/sys/dev/netmap/netmap_freebsd.c
index 4a684e01372..03ae468a5b6 100644
--- a/sys/dev/netmap/netmap_freebsd.c
+++ b/sys/dev/netmap/netmap_freebsd.c
@@ -1256,7 +1256,7 @@ nm_os_selwakeup(struct nm_selinfo *si)
        /* use a non-zero hint to tell the notification from the
         * call done in kqueue_scan() which uses 0
         */
-       KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */);
+       KNOTE_LOCKED(&si->si.si_note, 0x100 /* notification */);
 }
 
 void
@@ -1374,7 +1374,7 @@ netmap_kqfilter(struct cdev *dev, struct knote *kn)
        kn->kn_fop = (ev == EVFILT_WRITE) ?
                &netmap_wfiltops : &netmap_rfiltops;
        kn->kn_hook = priv;
-       knlist_add(&si->si.si_note, kn, 1);
+       knlist_add(&si->si.si_note, kn, 0);
        // XXX unlock(priv)
        ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
                na, na->ifp->if_xname, curthread, priv, kn,

This tries to do the same thing that sys/net/if_tap.c does.
But it's really a blind attempt; unfortunately I won't have the chance to look for a real solution anytime soon.
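
The comment in the patched function says the non-zero hint (0x100) distinguishes a wakeup notification from the call made by kqueue_scan(), which passes 0. A minimal userland sketch of how a filter callback could use that distinction follows. The names `model_port`, `model_poll` and `model_filter` are hypothetical, and treating a non-zero hint as "ready without polling" is an assumption made for illustration, not a statement about the actual netmap filter.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HINT_NOTIFY 0x100L    /* hint value taken from the patch */

/* Hypothetical stand-in for per-port state. */
struct model_port {
    int pending;    /* packets waiting to be read */
    int polls;      /* how many times the slow path ran */
};

/* Hypothetical slow path: inspect the port to see whether data is ready. */
int model_poll(struct model_port *p)
{
    assert(p != NULL);
    p->polls++;
    return p->pending > 0;
}

/*
 * Filter callback model: a non-zero hint means "called from the wakeup
 * path", so report ready immediately (assumption); a zero hint means
 * "called while scanning", so check the port state.
 */
int model_filter(struct model_port *p, long hint)
{
    if (hint != 0)
        return 1;
    return model_poll(p);
}
```

With an idle port, `model_filter(&p, MODEL_HINT_NOTIFY)` returns 1 without touching the port, whereas `model_filter(&p, 0)` runs the poll and returns 0 until `pending` becomes positive.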

I also suggest being careful about mixing code between FreeBSD versions, as you can easily get into trouble. I would say either start from HEAD code (as it is), or start from 11/stable and replace the netmap code with the GitHub one. Then of course you can apply the small patches for vlan etc.

@gh-ix
Author

gh-ix commented Jun 15, 2017 via email

@vmaffione
Collaborator

Good! Actually, netmap is already used in production (especially on Linux), as most of its features are in good shape.

FreeBSD-specific kevent support unfortunately is an exception, and still needs some work. Some FreeBSD developers fixed it some time ago, so I'm surprised it does not work properly anymore. We should ask them if they can help to fix it.

I agree that starting from 11/stable is a good idea to avoid the instability introduced by the iflib overhaul.

@gh-ix
Author

gh-ix commented Jun 16, 2017 via email

@vmaffione
Collaborator

Is this issue still there?

vmaffione added the bug label Sep 17, 2017
@gh-ix
Author

gh-ix commented Sep 17, 2017

I don't run a netmap(4) environment with a debugging kernel (currently there is no debugging kernel anywhere), so I can't easily check whether the panic still occurs.
But I'm still building kernels with this patch:

> --- src/sys/dev/netmap/netmap_freebsd.c.orig    2017-06-16 19:48:55.760647000 +0200
> +++ src/sys/dev/netmap/netmap_freebsd.c 2017-06-17 11:22:53.685116000 +0200
> @@ -1374,7 +1374,7 @@
>         kn->kn_fop = (ev == EVFILT_WRITE) ?
>                 &netmap_wfiltops : &netmap_rfiltops;
>         kn->kn_hook = priv;
> -       knlist_add(&si->si.si_note, kn, 1);
> +       knlist_add(&si->si.si_note, kn, 0);
>         // XXX unlock(priv)
>         ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
>                 na, na->ifp->if_xname, curthread, priv, kn,

I guess that as soon as I include WITNESS and INVARIANTS in the kernel options, I'll see the panic again, since I haven't noticed any changes in that area.
I'll try to find some time to check.

Thanks,

-harry

P.S.: Sorry, I couldn't find out how to stop GitHub from mutilating the diff above...

@vmaffione
Collaborator

Thanks, I just wanted to know whether the issue was still open. I fixed the diff layout for you.
