Build failure without INVARIANTS (includes suggested patch) #5

Closed
BSDer opened this issue May 10, 2022 · 1 comment
Labels
bug (Something isn't working), patch (Patch is available)


BSDer commented May 10, 2022

When you try to build a kernel without INVARIANTS, such as a NODEBUG kernel, compilation fails with:

--- all_subdir_dpaa2 ---
/usr/src/sys/dev/dpaa2/dpaa2_swp.c:1034:5: error: variable 'r' set but not used [-Werror,-Wunused-but-set-variable]
        } *r;
           ^
1 error generated.
*** [dpaa2_swp.o] Error code 1

make[4]: stopped in /usr/src/sys/modules/dpaa2
1 error

make[4]: stopped in /usr/src/sys/modules/dpaa2

A possible fix is to wrap both the definition and the assignment of variable r in #if ... #endif, so that r only exists in builds where the KASSERT that reads it is compiled in.

I took the #if condition from sys/sys/kassert.h, where KASSERT itself is defined:


diff --git a/sys/dev/dpaa2/dpaa2_swp.c b/sys/dev/dpaa2/dpaa2_swp.c
index 200beb8dce57..c2b231826d38 100644
--- a/sys/dev/dpaa2/dpaa2_swp.c
+++ b/sys/dev/dpaa2/dpaa2_swp.c
@@ -1028,10 +1028,12 @@ static int
 dpaa2_swp_exec_mgmt_command(struct dpaa2_swp *swp, struct dpaa2_swp_cmd *cmd,
     struct dpaa2_swp_rsp *rsp, uint8_t cmdid)
 {
+#if (defined(_KERNEL) && defined(INVARIANTS)) || defined(_STANDALONE)
        struct __packed with_verb {
                uint8_t verb;
                uint8_t _reserved[63];
        } *r;
+#endif
        uint16_t flags;
        int error;
 
@@ -1057,7 +1059,9 @@ dpaa2_swp_exec_mgmt_command(struct dpaa2_swp *swp, struct dpaa2_swp_cmd *cmd,
        }
        dpaa2_swp_unlock(swp);
 
+#if (defined(_KERNEL) && defined(INVARIANTS)) || defined(_STANDALONE)
        r = (struct with_verb *) rsp;
+#endif
        KASSERT((r->verb & CMD_VERB_MASK) == cmdid,
            ("wrong VERB byte in response: resp=0x%02x, expected=0x%02x",
            r->verb, cmdid));
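
For anyone who wants to see the warning outside the kernel tree, here is a minimal userland sketch of the same pattern. It is only an illustration: the KASSERT stand-in, VERB_MASK, struct response and check_verb are made-up names, and the guard is simplified to a plain INVARIANTS check rather than the full condition from kassert.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's KASSERT (an assumption for this userland
 * sketch): active only when INVARIANTS is defined, otherwise empty.
 */
#ifdef INVARIANTS
#define KASSERT(exp, msg) assert(exp)
#else
#define KASSERT(exp, msg) do { } while (0)
#endif

#define VERB_MASK 0x7f	/* hypothetical mask value */

/* Hypothetical 64-byte response whose first byte is the verb. */
struct response {
	uint8_t bytes[64];
};

/* Returns 0 after (optionally) checking the verb byte against cmdid. */
static int
check_verb(const struct response *rsp, uint8_t cmdid)
{
#ifdef INVARIANTS
	/* Declared only when the assertion that reads it exists. */
	const uint8_t *verb = &rsp->bytes[0];
#endif

	KASSERT((*verb & VERB_MASK) == cmdid, ("wrong VERB byte"));

	/* Keep the parameters "used" in builds where KASSERT is empty. */
	(void)rsp;
	(void)cmdid;
	return (0);
}
```

Building this with and without -DINVARIANTS should give the same result from check_verb in both cases. Without the #ifdef around the declaration, the non-INVARIANTS build would leave a variable that is assigned but never read, which is what the compiler complains about above.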

With the patch, the NODEBUG kernel builds. After a two-way transfer of ~1G in each direction (at ~960 Mbps), the interface counters look like this:

# sysctl dev.dpaa2_ni.0 && netstat -i -b -n -I dpni0
dev.dpaa2_ni.0.stats.in_all_frames: 1087188
dev.dpaa2_ni.0.stats.in_all_bytes: 1120373738
dev.dpaa2_ni.0.stats.in_multi_frames: 70
dev.dpaa2_ni.0.stats.eg_all_frames: 1086856
dev.dpaa2_ni.0.stats.eg_all_bytes: 1120347810
dev.dpaa2_ni.0.stats.eg_multi_frames: 0
dev.dpaa2_ni.0.stats.in_filtered_frames: 0
dev.dpaa2_ni.0.stats.in_discarded_frames: 0
dev.dpaa2_ni.0.stats.in_nobuf_discards: 0
dev.dpaa2_ni.0.stats.rx_ieoi_err_frames: 0
dev.dpaa2_ni.0.stats.rx_enq_rej_frames: 0
dev.dpaa2_ni.0.stats.rx_sg_buf_frames: 0
dev.dpaa2_ni.0.stats.rx_single_buf_frames: 1087192
dev.dpaa2_ni.0.stats.rx_anomaly_frames: 0
dev.dpaa2_ni.0.channels.15.tx_dropped: 0
dev.dpaa2_ni.0.channels.15.tx_frames: 0
dev.dpaa2_ni.0.channels.14.tx_dropped: 0
dev.dpaa2_ni.0.channels.14.tx_frames: 0
dev.dpaa2_ni.0.channels.13.tx_dropped: 0
dev.dpaa2_ni.0.channels.13.tx_frames: 0
dev.dpaa2_ni.0.channels.12.tx_dropped: 0
dev.dpaa2_ni.0.channels.12.tx_frames: 0
dev.dpaa2_ni.0.channels.11.tx_dropped: 0
dev.dpaa2_ni.0.channels.11.tx_frames: 0
dev.dpaa2_ni.0.channels.10.tx_dropped: 0
dev.dpaa2_ni.0.channels.10.tx_frames: 0
dev.dpaa2_ni.0.channels.9.tx_dropped: 0
dev.dpaa2_ni.0.channels.9.tx_frames: 0
dev.dpaa2_ni.0.channels.8.tx_dropped: 0
dev.dpaa2_ni.0.channels.8.tx_frames: 0
dev.dpaa2_ni.0.channels.7.tx_dropped: 0
dev.dpaa2_ni.0.channels.7.tx_frames: 0
dev.dpaa2_ni.0.channels.6.tx_dropped: 0
dev.dpaa2_ni.0.channels.6.tx_frames: 0
dev.dpaa2_ni.0.channels.5.tx_dropped: 0
dev.dpaa2_ni.0.channels.5.tx_frames: 0
dev.dpaa2_ni.0.channels.4.tx_dropped: 0
dev.dpaa2_ni.0.channels.4.tx_frames: 0
dev.dpaa2_ni.0.channels.3.tx_dropped: 0
dev.dpaa2_ni.0.channels.3.tx_frames: 0
dev.dpaa2_ni.0.channels.2.tx_dropped: 0
dev.dpaa2_ni.0.channels.2.tx_frames: 0
dev.dpaa2_ni.0.channels.1.tx_dropped: 0
dev.dpaa2_ni.0.channels.1.tx_frames: 0
dev.dpaa2_ni.0.channels.0.tx_dropped: 0
dev.dpaa2_ni.0.channels.0.tx_frames: 1086897
dev.dpaa2_ni.0.%parent: dpaa2_rc0
dev.dpaa2_ni.0.%pnpinfo: 
dev.dpaa2_ni.0.%location: 
dev.dpaa2_ni.0.%driver: dpaa2_ni
dev.dpaa2_ni.0.%desc: DPAA2 Network Interface
Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll
dpni0  1500 <Link#1>      82:e3:3f:86:00:11        0     0     0 1120375388        0     0          0     0
dpni0     - 192.168.1.0/2 192.168.1.52       1087085     -     - 1105137104  1086901     - 1105137884     -
dsalychev added the bug and patch labels May 10, 2022

dsalychev commented:

Fixed in b02e944.

dsalychev pushed a commit that referenced this issue Apr 3, 2023
Under certain loads, the following panic is hit:

    panic: page fault
    KDB: stack backtrace:
    #0 0xffffffff805db025 at kdb_backtrace+0x65
    #1 0xffffffff8058e86f at vpanic+0x17f
    #2 0xffffffff8058e6e3 at panic+0x43
    #3 0xffffffff808adc15 at trap_fatal+0x385
    #4 0xffffffff808adc6f at trap_pfault+0x4f
    #5 0xffffffff80886da8 at calltrap+0x8
    #6 0xffffffff80669186 at vgonel+0x186
    #7 0xffffffff80669841 at vgone+0x31
    #8 0xffffffff8065806d at vfs_hash_insert+0x26d
    #9 0xffffffff81a39069 at sfs_vgetx+0x149
    #10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #11 0xffffffff8065a28c at lookup+0x45c
    #12 0xffffffff806594b9 at namei+0x259
    #13 0xffffffff80676a33 at kern_statat+0xf3
    #14 0xffffffff8067712f at sys_fstatat+0x2f
    #15 0xffffffff808ae50c at amd64_syscall+0x10c
    #16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active
vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While
here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following
panic:

    panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
    cpuid = 17
    KDB: stack backtrace:
    #0 0xffffffff805e29c5 at kdb_backtrace+0x65
    #1 0xffffffff8059620f at vpanic+0x17f
    #2 0xffffffff81a27f4a at spl_panic+0x3a
    #3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
    #4 0xffffffff8066fdee at vinactivef+0xde
    #5 0xffffffff80670b8a at vgonel+0x1ea
    #6 0xffffffff806711e1 at vgone+0x31
    #7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
    #8 0xffffffff81a39069 at sfs_vgetx+0x149
    #9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
    #10 0xffffffff80661c2c at lookup+0x45c
    #11 0xffffffff80660e59 at namei+0x259
    #12 0xffffffff8067e3d3 at kern_statat+0xf3
    #13 0xffffffff8067eacf at sys_fstatat+0x2f
    #14 0xffffffff808b5ecc at amd64_syscall+0x10c
    #15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new
vnode and adding that vnode to the vfs hash. If the newly created vnode
loses the race when being inserted into the vfs hash, it will not be
recycled as its usecount is greater than zero, hitting the above
assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501
dsalychev pushed a commit that referenced this issue Aug 3, 2023
Avoid locking issues when if_allmulti() calls the driver's if_ioctl,
because that may acquire sleepable locks (while we hold a non-sleepable
rwlock).

Fortunately there's no pressing need to hold the mroute lock while we
do this, so we can postpone the call slightly, until after we've
released the lock.

This avoids the following WITNESS warning (with iflib drivers):

	lock order reversal: (sleepable after non-sleepable)
	 1st 0xffffffff82f64960 IPv4 multicast forwarding (IPv4 multicast forwarding, rw) @ /usr/src/sys/netinet/ip_mroute.c:1050
	 2nd 0xfffff8000480f180 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4525
	lock order IPv4 multicast forwarding -> iflib ctx lock attempted at:
	#0 0xffffffff80bbd6ce at witness_checkorder+0xbbe
	#1 0xffffffff80b56d10 at _sx_xlock+0x60
	#2 0xffffffff80c9ce5c at iflib_if_ioctl+0x2dc
	#3 0xffffffff80c7c395 at if_setflag+0xe5
	#4 0xffffffff82f60a0e at del_vif_locked+0x9e
	#5 0xffffffff82f5f0d5 at X_ip_mrouter_set+0x265
	#6 0xffffffff80bfd402 at sosetopt+0xc2
	#7 0xffffffff80c02105 at kern_setsockopt+0xa5
	#8 0xffffffff80c02054 at sys_setsockopt+0x24
	#9 0xffffffff81046be8 at amd64_syscall+0x138
	#10 0xffffffff8101930b at fast_syscall_common+0xf8

See also:	https://redmine.pfsense.org/issues/12079
Reviewed by:	mjg
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D41209
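
The deferral described in the commit message above (decide under the lock that the call is needed, then make the call only after the lock is released) can be sketched in userland as follows. This is a toy illustration under stated assumptions, not the ip_mroute.c change: the flag-based lock, may_sleep_callback and remove_entry are hypothetical names, and the "lock" is just a boolean so the ordering can be checked with assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-sleepable lock: a flag is enough to make the ordering checkable. */
static bool lock_held;
static int entries;
static int deferred_calls;

static void
toy_lock(void)
{
	assert(!lock_held);
	lock_held = true;
}

static void
toy_unlock(void)
{
	assert(lock_held);
	lock_held = false;
}

/* Hypothetical stand-in for a driver ioctl that may take a sleepable lock. */
static void
may_sleep_callback(void)
{
	assert(!lock_held);	/* must never run under the toy lock */
	deferred_calls++;
}

static void
remove_entry(void)
{
	bool need_callback = false;

	toy_lock();
	if (entries > 0) {
		entries--;
		need_callback = true;	/* note the work; do not do it here */
	}
	toy_unlock();

	/* The lock is released, so a call that may sleep is safe now. */
	if (need_callback)
		may_sleep_callback();
}
```

The point of the local flag is that the decision is still made while the protected state is consistent, while the call that may sleep happens outside the critical section.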