Support working under VFIO passthrough #9

mcbridematt · 2022-06-01T08:16:18Z

This fixes some issues that were preventing the driver from working under VFIO / QEMU passthrough.
This has been tested with muvirt on Ten64 and MC firmware 10.20 (see our forum on how to set this up: https://forum.traverse.com.au/t/restool-in-muvirt/63/17?u=mcbridematt).

More recent MC firmware (>10.24) and QEMU versions change some VFIO behaviour, so some adjustments may still be required.

Key changes:

Extend ICID to 32 bits. The top 16 bits (marked as 'reserved' in NXP documentation) are used to differentiate host and guest interrupts in the (v)GIC [interrupt controller]. Without the full ICID, the interrupts will not be routed in the kernel:

dpaa2_rc0: Isolation context ID: 23 # Bare metal
dpaa2_rc0: Isolation context ID: 65536 # VM

Treat an unreachable DPMAC (when we ask whether it is PHY or FIXED) as a FIXED link.
(This is because the DPMAC usually sits in the host / root DPRC and cannot be accessed by the child DPRC.)
Ideally, this would be avoided by having the DPMAC driver 'own' the PHY connection (signalling the MC on MII/MDIO events), with the DPNI receiving its link state indications from the MC via interrupts. However, it's not worth the effort to implement this unless FreeBSD gains the ability to act as a host for VFIO passthrough.
Support non-DPMAC partner types: DPNI, DPDMUX, DPSW.
These function exactly the same as a DPMAC in FIXED mode, but their MAC addresses are set in the DPNI object only, so move the MAC query out of the DPMAC-only section.
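
The link-type handling described in these changes can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the enum names, `choose_link_type`, and the query callback are all hypothetical, and the real driver talks to the MC rather than to stub functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the driver's definitions. */
enum peer_type { PEER_DPMAC, PEER_DPNI, PEER_DPDMUX, PEER_DPSW };
enum link_type { LINK_PHY, LINK_FIXED };

/* Stand-in for asking the MC about a DPMAC; returns 0 on success. */
typedef int (*dpmac_query_fn)(enum link_type *out);

static enum link_type
choose_link_type(enum peer_type peer, dpmac_query_fn query)
{
	enum link_type lt;

	/* DPNI, DPDMUX and DPSW partners act like a DPMAC in FIXED mode. */
	if (peer != PEER_DPMAC)
		return (LINK_FIXED);
	/* A DPMAC outside our DPRC cannot be queried: assume FIXED. */
	if (query == NULL || query(&lt) != 0)
		return (LINK_FIXED);
	return (lt);
}

/* Stub query: a reachable DPMAC that reports PHY mode. */
static int
query_phy(enum link_type *out)
{
	*out = LINK_PHY;
	return (0);
}

/* Stub query: a DPMAC that lives in the host / root DPRC. */
static int
query_unreachable(enum link_type *out)
{
	(void)out;
	return (-1);
}

/* Walks through the three cases from the description. */
static void
link_type_examples(void)
{
	assert(choose_link_type(PEER_DPMAC, query_phy) == LINK_PHY);
	assert(choose_link_type(PEER_DPMAC, query_unreachable) == LINK_FIXED);
	assert(choose_link_type(PEER_DPSW, query_phy) == LINK_FIXED);
}
```

Calling `link_type_examples()` from a test program exercises all three branches. Only a reachable DPMAC can yield a PHY link in this sketch; every other case falls back to FIXED.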

…IXED link

We try to query the connected DPMAC to determine its endpoint type (PHY or FIXED), but this approach fails when the link partner is *outside* our own DPRC. For now, just assume an unreachable DPMAC is a "fixed" link.


dsalychev · 2022-06-02T19:32:43Z

Patch applied starting from ef64c6e. Thanks!

Under certain loads, the following panic is hit:

panic: page fault
KDB: stack backtrace:
#0 0xffffffff805db025 at kdb_backtrace+0x65
#1 0xffffffff8058e86f at vpanic+0x17f
#2 0xffffffff8058e6e3 at panic+0x43
#3 0xffffffff808adc15 at trap_fatal+0x385
#4 0xffffffff808adc6f at trap_pfault+0x4f
#5 0xffffffff80886da8 at calltrap+0x8
#6 0xffffffff80669186 at vgonel+0x186
#7 0xffffffff80669841 at vgone+0x31
#8 0xffffffff8065806d at vfs_hash_insert+0x26d
#9 0xffffffff81a39069 at sfs_vgetx+0x149
#10 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#11 0xffffffff8065a28c at lookup+0x45c
#12 0xffffffff806594b9 at namei+0x259
#13 0xffffffff80676a33 at kern_statat+0xf3
#14 0xffffffff8067712f at sys_fstatat+0x2f
#15 0xffffffff808ae50c at amd64_syscall+0x10c
#16 0xffffffff808876bb at fast_syscall_common+0xf8

The page fault occurs because vgonel() will call VOP_CLOSE() for active vnodes. For this reason, define vop_close for zfsctl_ops_snapshot. While here, define vop_open for consistency.

After adding the necessary vop, the bug progresses to the following panic:

panic: VERIFY3(vrecycle(vp) == 1) failed (0 == 1)
cpuid = 17
KDB: stack backtrace:
#0 0xffffffff805e29c5 at kdb_backtrace+0x65
#1 0xffffffff8059620f at vpanic+0x17f
#2 0xffffffff81a27f4a at spl_panic+0x3a
#3 0xffffffff81a3a4d0 at zfsctl_snapshot_inactive+0x40
#4 0xffffffff8066fdee at vinactivef+0xde
#5 0xffffffff80670b8a at vgonel+0x1ea
#6 0xffffffff806711e1 at vgone+0x31
#7 0xffffffff8065fa0d at vfs_hash_insert+0x26d
#8 0xffffffff81a39069 at sfs_vgetx+0x149
#9 0xffffffff81a39c54 at zfsctl_snapdir_lookup+0x1e4
#10 0xffffffff80661c2c at lookup+0x45c
#11 0xffffffff80660e59 at namei+0x259
#12 0xffffffff8067e3d3 at kern_statat+0xf3
#13 0xffffffff8067eacf at sys_fstatat+0x2f
#14 0xffffffff808b5ecc at amd64_syscall+0x10c
#15 0xffffffff8088f07b at fast_syscall_common+0xf8

This is caused by a race condition that can occur when allocating a new vnode and adding that vnode to the vfs hash. If the newly created vnode loses the race when being inserted into the vfs hash, it will not be recycled as its usecount is greater than zero, hitting the above assertion.

Fix this by dropping the assertion.

FreeBSD-issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252700
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Reviewed-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Alek Pinchuk <apinchuk@axcient.com>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Submitted-by: Klara, Inc.
Sponsored-by: rsync.net
Closes #14501

Avoid locking issues when if_allmulti() calls the driver's if_ioctl, because that may acquire sleepable locks (while we hold a non-sleepable rwlock).

Fortunately there's no pressing need to hold the mroute lock while we do this, so we can postpone the call slightly, until after we've released the lock.

This avoids the following WITNESS warning (with iflib drivers):

lock order reversal: (sleepable after non-sleepable)
1st 0xffffffff82f64960 IPv4 multicast forwarding (IPv4 multicast forwarding, rw) @ /usr/src/sys/netinet/ip_mroute.c:1050
2nd 0xfffff8000480f180 iflib ctx lock (iflib ctx lock, sx) @ /usr/src/sys/net/iflib.c:4525
lock order IPv4 multicast forwarding -> iflib ctx lock attempted at:
#0 0xffffffff80bbd6ce at witness_checkorder+0xbbe
#1 0xffffffff80b56d10 at _sx_xlock+0x60
#2 0xffffffff80c9ce5c at iflib_if_ioctl+0x2dc
#3 0xffffffff80c7c395 at if_setflag+0xe5
#4 0xffffffff82f60a0e at del_vif_locked+0x9e
#5 0xffffffff82f5f0d5 at X_ip_mrouter_set+0x265
#6 0xffffffff80bfd402 at sosetopt+0xc2
#7 0xffffffff80c02105 at kern_setsockopt+0xa5
#8 0xffffffff80c02054 at sys_setsockopt+0x24
#9 0xffffffff81046be8 at amd64_syscall+0x138
#10 0xffffffff8101930b at fast_syscall_common+0xf8

See also: https://redmine.pfsense.org/issues/12079
Reviewed by: mjg
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D41209

netlink(4) calls back into the driver during detach and it attempts to start an internal synchronized op recursively, causing an interruptible hang. Fix it by failing the ioctl if the VI has been marked as DOOMED by cxgbe_detach.

Here's the stack for the hang for reference.

#6 begin_synchronized_op
#7 cxgbe_media_status
#8 ifmedia_ioctl
#9 cxgbe_ioctl
#10 if_ioctl
#11 get_operstate_ether
#12 get_operstate
#13 dump_iface
#14 rtnl_handle_ifevent
#15 rtnl_handle_ifnet_event
#16 rt_ifmsg
#17 if_unroute
#18 if_down
#19 if_detach_internal
#20 if_detach
#21 ether_ifdetach
#22 cxgbe_vi_detach
#23 cxgbe_detach
#24 DEVICE_DETACH

MFC after: 3 days
Sponsored by: Chelsio Communications

mcbridematt added 5 commits May 24, 2022 20:44

dpaa2: rc: extend ICID to 32-bits for VFIO support

dbfad2a

While the ICID is only documented as a 16-bit value, 32 bits are actually used (taking over the 'reserved' bits next to it). The top 16 bits are then used to differentiate containers running under a VM.
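
To make the 16/16 split concrete, here is a small sketch that takes apart the two values logged in the PR description (23 on bare metal, 65536 in a VM). The helper names are hypothetical and do not come from the driver; only the split into a documented low half and a 'reserved' high half is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names; only the 16/16 split comes from the PR. */

/* Lower 16 bits: the ICID as documented. */
static inline uint16_t
icid_low16(uint32_t icid)
{
	return ((uint16_t)(icid & 0xFFFFu));
}

/* Upper 16 bits: documented as reserved, non-zero for the VM value. */
static inline uint16_t
icid_high16(uint32_t icid)
{
	return ((uint16_t)(icid >> 16));
}

/* Checks the two values reported in the PR description. */
static void
icid_check_log_values(void)
{
	/* Bare metal: 23 fits entirely in the low half. */
	assert(icid_high16(23u) == 0 && icid_low16(23u) == 23);
	/* VM: 65536 == 0x10000, so a 16-bit field would keep only 0. */
	assert(icid_high16(65536u) == 1 && icid_low16(65536u) == 0);
}
```

The VM value shows why a 16-bit field is not enough: truncating 65536 to 16 bits leaves 0, which discards the upper half that the PR description says is needed to route interrupts.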

dpaa2: add DPDMUX and DPSW types

7848e48

These do not have a driver implementation (yet), but a DPNI (or a DPMAC) can be a link partner of these two objects, so we need to be able to recognize them.

dpaa2: ni: make MAC address retreival common for all partner types

a0e0bb5

Non-MAC partners like DPNI or DPDMUX have their MAC address stored in the DPNI attributes, which dpaa2_ni_set_mac_addr already handles.

dpaa2: ni: setup fixed link for DPDMUX and DPSW

350badb

dsalychev added the bug (Something isn't working), enhancement (New feature or request), and patch (Patch is available) labels Jun 2, 2022

dsalychev self-assigned this Jun 2, 2022

dsalychev closed this Jun 2, 2022
