LOR with panic [mutex nm_kn_lock not owned] on FreeBSD INVARIANTS/WITNESS enabled kernels #318
|
It seems you are using kevent(). Which program are you using to trigger this? |
|
Regarding Vincenzo Maffione's message of 14.06.2017 19:48 (localtime):
It seems you are using kevent(). Which program are you using to trigger
this?
bhyve(8).
After booting: vale-ctl -h vale232:vlan232
Then, bhyve -.... -s N,virtio-net,vale232:guest ...
With debugging kernel, this leads to the reported panic.
Without debugging (options WITNESS/INVARIANTS) but otherwise exactly the
same code base, the guest comes up and, thanks to your recent if_vlan
patch, everything works as expected.
Thanks,
…-harry
|
|
What happens with this change? |
|
Regarding Vincenzo Maffione's message of 15.06.2017 11:58 (localtime):
What happens with this change?
diff --git a/sys/dev/netmap/netmap_freebsd.c b/sys/dev/netmap/netmap_freebsd.c
index 6d0453d3..472dc8e1 100644
--- a/sys/dev/netmap/netmap_freebsd.c
+++ b/sys/dev/netmap/netmap_freebsd.c
@@ -1374,7 +1374,7 @@ netmap_kqfilter(struct cdev *dev, struct knote *kn)
 	kn->kn_fop = (ev == EVFILT_WRITE) ?
 		&netmap_wfiltops : &netmap_rfiltops;
 	kn->kn_hook = priv;
-	knlist_add(&si->si.si_note, kn, 1);
+	knlist_add(&si->si.si_note, kn, 0);
 	// XXX unlock(priv)
 	ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
 		na, na->ifp->if_xname, curthread, priv, kn,
I first had to merge r319881 (a recent commit to HEAD that updates the
netmap code) into stable/11, and I confirmed that the panic still occurs
with that code base before I tried the change.
The panic is now different (and it also happens at a different point
during guest boot):
````
panic: mutex nm_kn_lock owned at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2176
cpuid = 6
KDB: stack backtrace:
#0 0xffffffff805cb1a7 at kdb_backtrace+0x67
#1 0xffffffff8058b0e6 at vpanic+0x186
#2 0xffffffff8058b163 at panic+0x43
#3 0xffffffff8056c216 at __mtx_assert+0xc6
now:
#4 0xffffffff80545219 at knote+0x39
#5 0xffffffff8041ebc7 at nm_os_selwakeup+0x87
#6 0xffffffff8041c94d at netmap_notify+0x1d
#7 0xffffffff8041c701 at netmap_poll+0x821
#8 0xffffffff8041f63c at netmap_knrw+0x6c
#9 0xffffffff805443d7 at kqueue_kevent+0x397
#10 0xffffffff80543fd6 at kern_kevent_fp+0x96
#11 0xffffffff80543eef at kern_kevent+0x9f
#12 0xffffffff80543cf8 at sys_kevent+0x138
#13 0xffffffff80881b5a at amd64_syscall+0x57a
#14 0xffffffff8086563b at Xfast_syscall+0xfb
#0 doadump (textdump=<value optimized out>) at pcpu.h:222
#1 0xffffffff8058ab60 in kern_reboot (howto=260) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff8058b120 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff8058b163 in panic (fmt=<value optimized out>) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff8056c216 in __mtx_assert (c=<value optimized out>,
what=<value optimized out>, file=<value optimized out>, line=<value
optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1013
#5 0xffffffff80545219 in knote (list=<value optimized out>, hint=256,
lockflags=0) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2034
#6 0xffffffff8041ebc7 in nm_os_selwakeup (si=0xfffffe00088b94c0) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1259
#7 0xffffffff8041c94d in netmap_notify (kring=<value optimized out>,
flags=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2777
#8 0xffffffff8041c701 in netmap_poll (priv=<value optimized out>,
events=<value optimized out>, sr=0x0)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2735
#9 0xffffffff8041f63c in netmap_knrw (kn=<value optimized out>,
hint=<value optimized out>, events=1)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1313
#10 0xffffffff805443d7 in kqueue_kevent (kq=0xfffff8001aa08600,
td=0xfffff80141470560, nchanges=<value optimized out>, nevents=<value
optimized out>,
k_ops=0xfffffe045b5028a0, timeout=<value optimized out>) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1700
#11 0xffffffff80543fd6 in kern_kevent_fp (td=0xfffff80141470560,
fp=<value optimized out>, nchanges=0, nevents=<value optimized out>,
k_ops=<value optimized out>, timeout=<value optimized out>) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050
#12 0xffffffff80543eef in kern_kevent (td=0xfffff80141470560, fd=6,
nchanges=0, nevents=64, k_ops=0xfffffe045b5028a0, timeout=0x0)
at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993
#13 0xffffffff80543cf8 in sys_kevent (td=0xfffff80141470560,
uap=0xfffffe045b502a30) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925
#14 0xffffffff80881b5a in amd64_syscall (td=0xfffff80141470560,
traced=0) at subr_syscall.c:135
#15 0xffffffff8086563b in Xfast_syscall () at
/usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
#16 0x000000080122813a in ?? ()
````
Thanks a lot for your support!
…-harry
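The patch tried above only changes the third argument of knlist_add() from 1 to 0. The following is a minimal userspace sketch of what that argument is understood to mean, assuming it tells knlist_add() whether the caller already holds the list lock. The names toy_lock, toy_knlist and toy_knlist_add are hypothetical stand-ins, not FreeBSD kernel types, and the return codes stand in for the panics that an INVARIANTS kernel would raise.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the mutex protecting a knlist (hypothetical type,
 * not FreeBSD's struct mtx). Single-threaded model: 'owned' is true
 * while the caller holds the lock. */
struct toy_lock {
	bool owned;
};

struct toy_knlist {
	struct toy_lock lock;
	int nknotes;		/* number of knotes attached to the list */
};

/* Result codes standing in for a lock-assertion panic. */
enum toy_result { TOY_OK, TOY_PANIC_OWNED, TOY_PANIC_NOT_OWNED };

/* Models knlist_add(knl, kn, islocked) under the assumption above:
 *  - islocked != 0: the caller claims to hold the list lock already,
 *    so the function only checks that claim;
 *  - islocked == 0: the caller claims not to hold it, and the function
 *    takes and releases the lock around the insertion itself. */
enum toy_result
toy_knlist_add(struct toy_knlist *knl, int islocked)
{
	if (islocked && !knl->lock.owned)
		return TOY_PANIC_NOT_OWNED;	/* "mutex ... not owned" */
	if (!islocked && knl->lock.owned)
		return TOY_PANIC_OWNED;		/* "mutex ... owned" */

	if (!islocked)
		knl->lock.owned = true;		/* lock taken internally */
	knl->nknotes++;
	if (!islocked)
		knl->lock.owned = false;	/* and released again */
	return TOY_OK;
}
```

In this model, calling toy_knlist_add() with islocked set to 1 while the lock is not held yields the "not owned" result, which is the kind of failure named in the issue title. Whether the real panic came from this exact call is not shown in the thread.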
|
|
Hmm, at this point I think we need help from a FreeBSD developer who knows how the kevent subsystem is supposed to work; unfortunately I don't know enough about it. The only thing you can try is a patch that tries to do the same thing that sys/net/if_tap.c does. I also suggest being careful about mixing code between FreeBSD versions, as you can easily get into trouble. I would say either start from HEAD code (as it is) or start from 11/stable and replace the netmap code with the GitHub one. Then of course you can apply the small patches for vlan etc. |
|
Regarding Vincenzo Maffione's message of 15.06.2017 22:40 (localtime):
Hmm, at this point I think we need help from a FreeBSD developer who
knows how the kevent subsystem is supposed to work; unfortunately
I don't know enough about it.
The only thing you can try is
diff --git a/sys/dev/netmap/netmap_freebsd.c b/sys/dev/netmap/netmap_freebsd.c
index 4a684e01372..03ae468a5b6 100644
--- a/sys/dev/netmap/netmap_freebsd.c
+++ b/sys/dev/netmap/netmap_freebsd.c
@@ -1256,7 +1256,7 @@ nm_os_selwakeup(struct nm_selinfo *si)
 	/* use a non-zero hint to tell the notification from the
 	 * call done in kqueue_scan() which uses 0
 	 */
-	KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */);
+	KNOTE_LOCKED(&si->si.si_note, 0x100 /* notification */);
 }
 
 void
@@ -1374,7 +1374,7 @@ netmap_kqfilter(struct cdev *dev, struct knote *kn)
 	kn->kn_fop = (ev == EVFILT_WRITE) ?
 		&netmap_wfiltops : &netmap_rfiltops;
 	kn->kn_hook = priv;
-	knlist_add(&si->si.si_note, kn, 1);
+	knlist_add(&si->si.si_note, kn, 0);
 	// XXX unlock(priv)
 	ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
 		na, na->ifp->if_xname, curthread, priv, kn,
which tries to do the same thing that sys/net/if_tap.c does.
But it's really a blind attempt; unfortunately I won't have the chance
to look for a real solution anytime soon.
Thank you very much, I'll try that tomorrow. I highly appreciate your
help, and I think these little patches will help get netmap(4) and
its great possibilities into a wider production state. Actually,
everything I ever wanted was already there; it just wasn't ready for
users like me.
I also suggest being careful about mixing code between FreeBSD
versions, as you can easily get into trouble. I would say either
start from HEAD code (as it is) or start from 11/stable and replace
the netmap code with the GitHub one. Then of course you can apply the
small patches for vlan etc.
Since my first attempt at replacing the stable/11 netmap code base with
the GitHub clone was successful (with some small Makefile adjustments), I
carefully checked what changes were made in HEAD relative to the GitHub
merge. There was nothing magical, so I thought that route might be the
easier and more developer-friendly one. Since the iflib change
affects only non-netmap code in sys/dev, HEAD and 11 are still quite close.
I think up to 11.1 I should be safe. And, thanks to your help, I see
a chance to get 11.1 into production with my desired netmap setup.
…-harry
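The change from KNOTE_UNLOCKED to KNOTE_LOCKED proposed above can be read against the first backtrace, where knote() is reached with lockflags=0 from inside kqueue_kevent() and the assertion reports the mutex as owned. Below is a minimal userspace sketch of that situation, assuming the kqueue scan code holds the list lock while it calls the netmap filter. All toy_* names are hypothetical, and the return codes stand in for panics.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for nm_kn_lock (hypothetical, single-threaded model). */
struct toy_lock {
	bool owned;
};

enum toy_result { TOY_OK, TOY_PANIC_OWNED, TOY_PANIC_NOT_OWNED };

/* Models the lock check at the start of knote(): the caller's claim
 * (list_locked) has to match the real state of the lock. */
enum toy_result
toy_knote(const struct toy_lock *lk, bool list_locked)
{
	if (list_locked && !lk->owned)
		return TOY_PANIC_NOT_OWNED;
	if (!list_locked && lk->owned)
		return TOY_PANIC_OWNED;
	return TOY_OK;
}

/* Models the kevent() path from the first backtrace: the scan code is
 * assumed to hold the list lock while the filter runs, and the filter
 * ends up in the wakeup routine (netmap_knrw -> netmap_poll ->
 * netmap_notify -> nm_os_selwakeup -> knote). */
enum toy_result
toy_kevent_path(struct toy_lock *lk, bool wakeup_claims_locked)
{
	enum toy_result r;

	lk->owned = true;			/* taken by the scan code */
	r = toy_knote(lk, wakeup_claims_locked);
	lk->owned = false;			/* released by the scan code */
	return r;
}
```

On this path the KNOTE_UNLOCKED behaviour (claim "not locked") produces the "owned" result, while the KNOTE_LOCKED behaviour passes, which is presumably why the second patch looked worth trying.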
|
|
Good! Actually, netmap is already used in production (especially on Linux), as most of its features are in good shape. FreeBSD-specific kevent support is unfortunately an exception and still needs some work. Some FreeBSD developers fixed it some time ago, so I'm surprised it no longer works properly. We should ask them whether they can help fix it. I agree that starting from 11/stable is a good idea to avoid the instability introduced by the iflib overhaul. |
|
Regarding Vincenzo Maffione's message of 15.06.2017 22:40 (localtime):
Hmm, at this point I think we need help from a FreeBSD developer who
knows how the kevent subsystem is supposed to work; unfortunately
I don't know enough about it.
The only thing you can try is
diff --git a/sys/dev/netmap/netmap_freebsd.c b/sys/dev/netmap/netmap_freebsd.c
index 4a684e01372..03ae468a5b6 100644
--- a/sys/dev/netmap/netmap_freebsd.c
+++ b/sys/dev/netmap/netmap_freebsd.c
@@ -1256,7 +1256,7 @@ nm_os_selwakeup(struct nm_selinfo *si)
 	/* use a non-zero hint to tell the notification from the
 	 * call done in kqueue_scan() which uses 0
 	 */
-	KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */);
+	KNOTE_LOCKED(&si->si.si_note, 0x100 /* notification */);
 }
 
 void
@@ -1374,7 +1374,7 @@ netmap_kqfilter(struct cdev *dev, struct knote *kn)
 	kn->kn_fop = (ev == EVFILT_WRITE) ?
 		&netmap_wfiltops : &netmap_rfiltops;
 	kn->kn_hook = priv;
-	knlist_add(&si->si.si_note, kn, 1);
+	knlist_add(&si->si.si_note, kn, 0);
 	// XXX unlock(priv)
 	ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s",
 		na, na->ifp->if_xname, curthread, priv, kn,
which tries to do the same thing that sys/net/if_tap.c does.
But it's really a blind attempt; unfortunately I won't have the chance
to look for a real solution anytime soon.
Thank you for your help so far!
Just for your info, this change leads to an earlier locking panic:
not at bhyve invocation, but already at vale-ctl invocation, when I add
the vlan interface:
KDB: stack backtrace:
#0 0xffffffff805cb207 at kdb_backtrace+0x67
#1 0xffffffff8058b0f6 at vpanic+0x186
#2 0xffffffff8058b173 at panic+0x43
#3 0xffffffff8056c214 at __mtx_assert+0xb4
#4 0xffffffff80545235 at knote+0x45
#5 0xffffffff8041ebca at nm_os_selwakeup+0x8a
#6 0xffffffff8041c94d at netmap_notify+0x1d
#7 0xffffffff8041d3ba at netmap_reset+0x1ba
#8 0xffffffff8037a401 at em_init_locked+0x261
#9 0xffffffff803804f2 at em_netmap_reg+0x232
#10 0xffffffff8041cc1c at netmap_hw_reg+0x2c
#11 0xffffffff8042da78 at netmap_bwrap_reg+0x1d8
#12 0xffffffff8041ae6a at netmap_do_regif+0x3ba
#13 0xffffffff8042dff2 at netmap_bwrap_bdg_ctl+0x62
#14 0xffffffff8042caed at netmap_bdg_ctl+0xa2d
#15 0xffffffff8041b1b2 at netmap_ioctl+0x302
#16 0xffffffff8041ed4e at freebsd_netmap_ioctl+0x3e
#17 0xffffffff80466168 at devfs_ioctl_f+0x138
Uptime: 28s
Dumping 1618 out of 15733
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
#0 doadump (textdump=<value optimized out>) at pcpu.h:222
#1 0xffffffff8058ab70 in kern_reboot (howto=260) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff8058b130 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff8058b173 in panic (fmt=<value optimized out>) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff8056c214 in __mtx_assert (c=0x0, what=0, file=0x0, line=0)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1000
#5 0xffffffff80545235 in knote (list=<value optimized out>, hint=256,
lockflags=1) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2034
#6 0xffffffff8041ebca in nm_os_selwakeup (si=0xfffff800423bf040) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1259
#7 0xffffffff8041c94d in netmap_notify (kring=<value optimized out>,
flags=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2777
#8 0xffffffff8041d3ba in netmap_reset (na=<value optimized out>,
tx=NR_TX, n=0, new_cur=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:3233
#9 0xffffffff8037a401 in em_init_locked (adapter=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/e1000/if_em.c:3585
#10 0xffffffff803804f2 in em_netmap_reg (na=0xfffff8001aa6f800, onoff=1)
at if_em_netmap.h:105
#11 0xffffffff8041cc1c in netmap_hw_reg (na=0xfffff8001aa6f800, onoff=1)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2874
#12 0xffffffff8042da78 in netmap_bwrap_reg (na=0xfffff80042561000,
onoff=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_vale.c:2492
#13 0xffffffff8041ae6a in netmap_do_regif (priv=0xfffff8009a2af100,
na=0xfffff80042561000, ringid=<value optimized out>, flags=<value
optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2106
#14 0xffffffff8042dff2 in netmap_bwrap_bdg_ctl (na=0xfffff80042561000,
nmr=<value optimized out>, attach=<value optimized out>)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_vale.c:2700
#15 0xffffffff8042caed in netmap_bdg_ctl (nmr=0xfffffe0446d02880,
bdg_ops=0x0) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_vale.c:867
#16 0xffffffff8041b1b2 in netmap_ioctl (priv=0xfffff8009a655b80,
cmd=<value optimized out>, data=0xfffffe0446d02880 "vale17:nic1bmc",
td=0xfffff80037ac5560)
at netmap_kern.h:860
#17 0xffffffff8041ed4e in freebsd_netmap_ioctl (dev=<value optimized
out>, cmd=3225184658, data=0xfffffe0446d02880 "vale17:nic1bmc",
ffla=<value optimized out>, td=0xfffff80037ac5560) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1412
#18 0xffffffff80466168 in devfs_ioctl_f (fp=0xfffff8001fa751e0,
com=3225184658, data=0xfffffe0446d02880, cred=0xfffff8001f7ed400,
td=0xfffff80037ac5560)
at
/usr/local/share/deploy-tools/RELENG_11/src/sys/fs/devfs/devfs_vnops.c:791
#19 0xffffffff805ed7bd in kern_ioctl (td=<value optimized out>,
fd=<value optimized out>, com=<value optimized out>, data=<value
optimized out>) at file.h:323
#20 0xffffffff805ed46e in sys_ioctl (td=0xfffff80037ac5560,
uap=0xfffffe0446d02a30) at
/usr/local/share/deploy-tools/RELENG_11/src/sys/kern/sys_generic.c:745
#21 0xffffffff80881c1a in amd64_syscall (td=0xfffff80037ac5560,
traced=0) at subr_syscall.c:135
#22 0xffffffff808656fb in Xfast_syscall () at
/usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396
#23 0x0000000800984b3a in ?? ()
I'll open a bug report describing the initial panic.
Thanks,
…-harry
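Taken together, the two backtraces suggest that nm_os_selwakeup() is reached both with the list lock held (the kevent() path) and without it (the ioctl path through netmap_reset(), where knote() shows lockflags=1). The sketch below models why neither fixed choice of KNOTE_LOCKED or KNOTE_UNLOCKED can satisfy both paths, and shows one hypothetical alternative that inspects the lock state first (in the kernel, mtx_owned() could play that role). This is an illustration under those assumptions, not the fix adopted by the netmap project; all toy_* names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for nm_kn_lock (hypothetical, single-threaded model). */
struct toy_lock {
	bool owned;
};

enum toy_result { TOY_OK, TOY_PANIC_OWNED, TOY_PANIC_NOT_OWNED };

/* Models the lock check in knote(): claim and real state must agree. */
static enum toy_result
toy_knote(const struct toy_lock *lk, bool list_locked)
{
	if (list_locked && !lk->owned)
		return TOY_PANIC_NOT_OWNED;
	if (!list_locked && lk->owned)
		return TOY_PANIC_OWNED;
	return TOY_OK;
}

/* Wakeup with a fixed claim, as in the original code (false) or in
 * the patched code (true). */
enum toy_result
toy_wakeup_fixed(const struct toy_lock *lk, bool claims_locked)
{
	return toy_knote(lk, claims_locked);
}

/* Hypothetical alternative: look at the lock state first and take the
 * lock only when the caller does not hold it yet. */
enum toy_result
toy_wakeup_adaptive(struct toy_lock *lk)
{
	enum toy_result r;

	if (lk->owned)
		return toy_knote(lk, true);	/* kevent-like path */

	lk->owned = true;			/* ioctl-like path */
	r = toy_knote(lk, true);
	lk->owned = false;
	return r;
}
```

In this model a fixed claim always fails on one of the two paths, while the adaptive variant passes on both and leaves the lock in the state it found it. Whether such a check is safe in the real kernel depends on locking rules that this thread does not settle.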
|
|
Is this issue still there? |
|
I don't run a netmap(4) environment with a debugging kernel; currently there is no debugging kernel anywhere, so I can't easily verify it. I guess that as soon as I include WITNESS and INVARIANTS in the kernel options, I'll see the panic again, since I haven't noticed any changes in that area. Thanks, -harry P.S.: Sorry, I couldn't find out how to stop GitHub from mutilating the diff above... |
|
Thanks, I just wanted to know whether the issue was still open. I fixed the diff layout for you. |
Unfortunately, in my last report there were two issues (stack traces) overlapping, masking the more severe problem that causes this panic on stable/11 with netmap code merged from HEAD, including the latest FreeBSD "spin on trylock in netmap_mem2_ofstophys()" fix:
Thanks, let me know if more info is needed.
-harry