Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vhci-hcd missing from kernel-uek-modules-extra #18

Closed
evert-mouw opened this issue Oct 25, 2023 · 4 comments
Closed

vhci-hcd missing from kernel-uek-modules-extra #18

evert-mouw opened this issue Oct 25, 2023 · 4 comments

Comments

@evert-mouw
Copy link

evert-mouw commented Oct 25, 2023

The vhci-hcd driver is missing in the x86_64 version of package kernel-uek-modules-extra
This means that usbip bind fails...
Note that the aarch64 version does include both usbip-host.ko.xz and vhci-hcd.ko.xz

It seems to effect both OL8 and OL8 (UEK)
https://pkgs.org/download/kernel-uek-modules-extra

You can quickly check this as well using
ls /lib/modules/$(uname -a | cut -d' ' -f3)/kernel/drivers/usb/usbip

@totalamateurhour
Copy link
Member

It's not surprising that this module is missing from the UEK modules, as this kernel is not necessarily built for desktop use cases. Though it is strange that it's there in the aarch64 version. What is your use case?

@evert-mouw
Copy link
Author

Yes, it should be included, like in the aarch64 version.

(My personal use case is simple; I want to use an USB webcam on my home server. A bit silly; I run whatsapp inside an android VM. Need to authenticate using QR and cam to access whatsapp using a browser on a desktop pc. Could also use a long usb cable but where's the fun in that?)

More serious applications include usage of various USB-based devices over the network, from scanners and printers to whatever. I guess it's not much work to fix this issue as aarch64 already includes the good stuff.

@totalamateurhour totalamateurhour transferred this issue from oracle/oracle-linux Nov 8, 2023
oraclelinuxkernel pushed a commit that referenced this issue Nov 10, 2023
[ Upstream commit a154f5f ]

The following call trace shows a deadlock issue due to recursive locking of
mutex "device_mutex". First lock acquire is in target_for_each_device() and
second in target_free_device().

 PID: 148266   TASK: ffff8be21ffb5d00  CPU: 10   COMMAND: "iscsi_ttx"
  #0 [ffffa2bfc9ec3b18] __schedule at ffffffffa8060e7f
  #1 [ffffa2bfc9ec3ba0] schedule at ffffffffa8061224
  #2 [ffffa2bfc9ec3bb8] schedule_preempt_disabled at ffffffffa80615ee
  #3 [ffffa2bfc9ec3bc8] __mutex_lock at ffffffffa8062fd7
  #4 [ffffa2bfc9ec3c40] __mutex_lock_slowpath at ffffffffa80631d3
  #5 [ffffa2bfc9ec3c50] mutex_lock at ffffffffa806320c
  #6 [ffffa2bfc9ec3c68] target_free_device at ffffffffc0935998 [target_core_mod]
  #7 [ffffa2bfc9ec3c90] target_core_dev_release at ffffffffc092f975 [target_core_mod]
  #8 [ffffa2bfc9ec3ca0] config_item_put at ffffffffa79d250f
  #9 [ffffa2bfc9ec3cd0] config_item_put at ffffffffa79d2583
 #10 [ffffa2bfc9ec3ce0] target_devices_idr_iter at ffffffffc0933f3a [target_core_mod]
 #11 [ffffa2bfc9ec3d00] idr_for_each at ffffffffa803f6fc
 #12 [ffffa2bfc9ec3d60] target_for_each_device at ffffffffc0935670 [target_core_mod]
 #13 [ffffa2bfc9ec3d98] transport_deregister_session at ffffffffc0946408 [target_core_mod]
 #14 [ffffa2bfc9ec3dc8] iscsit_close_session at ffffffffc09a44a6 [iscsi_target_mod]
 #15 [ffffa2bfc9ec3df0] iscsit_close_connection at ffffffffc09a4a88 [iscsi_target_mod]
 #16 [ffffa2bfc9ec3df8] finish_task_switch at ffffffffa76e5d07
 #17 [ffffa2bfc9ec3e78] iscsit_take_action_for_connection_exit at ffffffffc0991c23 [iscsi_target_mod]
 #18 [ffffa2bfc9ec3ea0] iscsi_target_tx_thread at ffffffffc09a403b [iscsi_target_mod]
 #19 [ffffa2bfc9ec3f08] kthread at ffffffffa76d8080
 #20 [ffffa2bfc9ec3f50] ret_from_fork at ffffffffa8200364

Fixes: 36d4cb4 ("scsi: target: Avoid that EXTENDED COPY commands trigger lock inversion")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230918225848.66463-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit eae6aabc48eea9b75cf50d98ddf9a19401d87adf)
oraclelinuxkernel pushed a commit that referenced this issue Nov 17, 2023
We have a customer usecase for this, so enable these both modules.

Note that these would still be part of extras rpm and these are already
part of aarch64 configs.

More details on the user requirement is here:
Link: #18

Orabug: 35994192

Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
@tvierling
Copy link
Member

The modules should now exist in latest UEK7 per the above config commit.

@evert-mouw
Copy link
Author

Thanks a lot guys! :-)

oraclelinuxkernel pushed a commit that referenced this issue Jan 19, 2024
[ Upstream commit e3e82fc ]

When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when
removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be
dereferenced as wrong struct in irdma_free_pending_cqp_request().

  PID: 3669   TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
   #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
   #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
   #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
   #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
   #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
      [exception RIP: native_queued_spin_lock_slowpath+1291]
      RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
      RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
      RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
      R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
      R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  -- <NMI exception stack> --
   #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
   #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
   #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
   #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
   #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
   #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
   #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
   #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
   #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
   #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
   #15 [ffff88aa841efb88] device_del at ffffffff82179d23
   #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
   #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
   #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
   #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
   #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
   #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f

Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions")
Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn
Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com>
Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 0511a9c56e5854958021de15967d879ceac5d821)
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue May 10, 2024
systems, using igb driver, crash while executing poweroff command
as per following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed up allocated irqs and
free_msi_irqs finds irq_has_action true for involved msi irqs here and this
condition triggers BUG_ON.

Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
fixes this problem.

This issue does not happen in v5.17 or later kernel versions because
'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")',
explicitly frees up MSI based irqs and hence indirectly fixes this issue
as well. But this change is dependent on framework for runtime extension
of MSI-X irqs and including these changes will break the kABI because it
changes some core data structures like pci_device, device and others.

So in kernels prior to v5.17 we need to have this change to fix this issue,
without breaking the kABI.

Orabug: 36547249

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue May 17, 2024
systems, using igb driver, crash while executing poweroff command
as per following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed up allocated irqs and
free_msi_irqs finds irq_has_action true for involved msi irqs here and this
condition triggers BUG_ON.

Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
fixes this problem.

This issue does not happen in v5.17 or later kernel versions because
'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")',
explicitly frees up MSI based irqs and hence indirectly fixes this issue
as well. But this change is dependent on framework for runtime extension
of MSI-X irqs and including these changes will break the kABI because it
changes some core data structures like pci_device, device and others.

So in kernels prior to v5.17 we need to have this change to fix this issue,
without breaking the kABI.

Orabug: 36547250

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue May 17, 2024
systems, using igb driver, crash while executing poweroff command
as per following call stack:

crash> bt -a
PID: 62583    TASK: ffff97ebbf28dc40  CPU: 0    COMMAND: "poweroff"
 #0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
 #1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
 #2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
 #3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
 #4 [ffffa7adcd64fa78] die at ffffffffa6033c32
 #5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
 #6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
 #7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
 #8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
    [exception RIP: free_msi_irqs+408]
    RIP: ffffffffa645d248  RSP: ffffa7adcd64fc88  RFLAGS: 00010286
    RAX: ffff97eb1396fe00  RBX: 0000000000000000  RCX: ffff97eb1396fe00
    RDX: ffff97eb1396fe00  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffffa7adcd64fcb0   R8: 0000000000000002   R9: 000000000000fbff
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff98c047af4720
    R13: ffff97eb87cd32a0  R14: ffff97eb87cd3000  R15: ffffa7adcd64fd57
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
 #10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
 #11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
 #12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
 #13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
 #14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
 #15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
 #16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
 #17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
 #18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
 #19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
 #20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1

This happens because igb_shutdown has not yet freed up allocated irqs and
free_msi_irqs finds irq_has_action true for involved msi irqs here and this
condition triggers BUG_ON.

Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
fixes this problem.

This issue does not happen in v5.17 or later kernel versions because
'commit 9fb9eb4 ("PCI/MSI: Let core code free MSI descriptors")',
explicitly frees up MSI based irqs and hence indirectly fixes this issue
as well. But this change is dependent on framework for runtime extension
of MSI-X irqs and including these changes will break the kABI because it
changes some core data structures like pci_device, device and others.

So in kernels prior to v5.17 we need to have this change to fix this issue,
without breaking the kABI.

Orabug: 36547251

Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jun 12, 2024
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jun 14, 2024
[ Upstream commit f8bbc07 ]

vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a50dbeca28acf7051dfa92786b85f704c75db6eb)
Signed-off-by: Vijayendra Suman <vijayendra.suman@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jun 14, 2024
[ Upstream commit f8bbc07 ]

vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4b0dcae5c4797bf31c63011ed62917210d3fdac3)
Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jun 21, 2024
[ Upstream commit f8bbc07 ]

vhost_worker will call tun call backs to receive packets. If too many
illegal packets arrives, tun_do_read will keep dumping packet contents.
When console is enabled, it will costs much more cpu time to dump
packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036    TASK: ffff949da6f20000  CPU: 23   COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594  RSP: ffffa655314979e8  RFLAGS: 00000002
    RAX: ffffffff89792500  RBX: ffffffff8af428a0  RCX: 0000000000000000
    RDX: 00000000000003fd  RSI: 0000000000000005  RDI: ffffffff8af428a0
    RBP: 0000000000002710   R8: 0000000000000004   R9: 000000000000000f
    R10: 0000000000000000  R11: ffffffff8acbf64f  R12: 0000000000000020
    R13: ffffffff8acbf698  R14: 0000000000000058  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 68459b8e3ee554ce71878af9eb69659b9462c588)
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
(cherry picked from commit eaa8c23a83b5a719ac9bc795481595bbfc02fc18)
Signed-off-by: Yifei Liu <yifei.l.liu@oracle.com>
mark-nicholson pushed a commit that referenced this issue Jul 2, 2024
The cited commit adds a compeletion to remove dependency on rtnl
lock. But it causes a deadlock for multiple encapsulations:

 crash> bt ffff8aece8a64000
 PID: 1514557  TASK: ffff8aece8a64000  CPU: 3    COMMAND: "tc"
  #0 [ffffa6d14183f368] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14183f3f8] schedule at ffffffffb8ba8418
  #2 [ffffa6d14183f418] schedule_preempt_disabled at ffffffffb8ba8898
  #3 [ffffa6d14183f428] __mutex_lock at ffffffffb8baa7f8
  #4 [ffffa6d14183f4d0] mutex_lock_nested at ffffffffb8baabeb
  #5 [ffffa6d14183f4e0] mlx5e_attach_encap at ffffffffc0f48c17 [mlx5_core]
  #6 [ffffa6d14183f628] mlx5e_tc_add_fdb_flow at ffffffffc0f39680 [mlx5_core]
  #7 [ffffa6d14183f688] __mlx5e_add_fdb_flow at ffffffffc0f3b636 [mlx5_core]
  #8 [ffffa6d14183f6f0] mlx5e_tc_add_flow at ffffffffc0f3bcdf [mlx5_core]
  #9 [ffffa6d14183f728] mlx5e_configure_flower at ffffffffc0f3c1d1 [mlx5_core]
 #10 [ffffa6d14183f790] mlx5e_rep_setup_tc_cls_flower at ffffffffc0f3d529 [mlx5_core]
 #11 [ffffa6d14183f7a0] mlx5e_rep_setup_tc_cb at ffffffffc0f3d714 [mlx5_core]
 #12 [ffffa6d14183f7b0] tc_setup_cb_add at ffffffffb8931bb8
 #13 [ffffa6d14183f810] fl_hw_replace_filter at ffffffffc0dae901 [cls_flower]
 #14 [ffffa6d14183f8d8] fl_change at ffffffffc0db5c57 [cls_flower]
 #15 [ffffa6d14183f970] tc_new_tfilter at ffffffffb8936047
 #16 [ffffa6d14183fac8] rtnetlink_rcv_msg at ffffffffb88c7c31
 #17 [ffffa6d14183fb50] netlink_rcv_skb at ffffffffb8942853
 #18 [ffffa6d14183fbc0] rtnetlink_rcv at ffffffffb88c1835
 #19 [ffffa6d14183fbd0] netlink_unicast at ffffffffb8941f27
 #20 [ffffa6d14183fc18] netlink_sendmsg at ffffffffb8942245
 #21 [ffffa6d14183fc98] sock_sendmsg at ffffffffb887d482
 #22 [ffffa6d14183fcb8] ____sys_sendmsg at ffffffffb887d81a
 #23 [ffffa6d14183fd38] ___sys_sendmsg at ffffffffb88806e2
 #24 [ffffa6d14183fe90] __sys_sendmsg at ffffffffb88807a2
 #25 [ffffa6d14183ff28] __x64_sys_sendmsg at ffffffffb888080f
 #26 [ffffa6d14183ff38] do_syscall_64 at ffffffffb8b9b6a8
 #27 [ffffa6d14183ff50] entry_SYSCALL_64_after_hwframe at ffffffffb8c0007c
 crash> bt 0xffff8aeb07544000
 PID: 1110766  TASK: ffff8aeb07544000  CPU: 0    COMMAND: "kworker/u20:9"
  #0 [ffffa6d14e6b7bd8] __schedule at ffffffffb8ba7f45
  #1 [ffffa6d14e6b7c68] schedule at ffffffffb8ba8418
  #2 [ffffa6d14e6b7c88] schedule_timeout at ffffffffb8baef88
  #3 [ffffa6d14e6b7d10] wait_for_completion at ffffffffb8ba968b
  #4 [ffffa6d14e6b7d60] mlx5e_take_all_encap_flows at ffffffffc0f47ec4 [mlx5_core]
  #5 [ffffa6d14e6b7da0] mlx5e_rep_update_flows at ffffffffc0f3e734 [mlx5_core]
  #6 [ffffa6d14e6b7df8] mlx5e_rep_neigh_update at ffffffffc0f400bb [mlx5_core]
  #7 [ffffa6d14e6b7e50] process_one_work at ffffffffb80acc9c
  #8 [ffffa6d14e6b7ed0] worker_thread at ffffffffb80ad012
  #9 [ffffa6d14e6b7f10] kthread at ffffffffb80b615d
 #10 [ffffa6d14e6b7f50] ret_from_fork at ffffffffb8001b2f

After the first encap is attached, flow will be added to encap
entry's flows list. If neigh update is running at this time, the
following encaps of the flow can't hold the encap_tbl_lock and
sleep. If neigh update thread is waiting for that flow's init_done,
deadlock happens.

Fix it by holding lock outside of the for loop. If neigh update is
running, prevent encap flows from offloading. Since the lock is held
outside of the for loop, concurrent creation of encap entries is not
allowed. So remove unnecessary wait_for_completion call for res_ready.

Fixes: 95435ad ("net/mlx5e: Only access fully initialized flows in neigh update")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

Orabug: 35383105

(cherry picked from commit 37c3b9f)
cherry-pick-repo=kernel/git/torvalds/linux.git
unmodified-from-upstream: 37c3b9f

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879156
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Brian Maly <brian.maly@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879157
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>
(cherry picked from commit e7fd2c25dfed19d69e8158ff50d36f90400a7335)
Signed-off-by: Sherry Yang <sherry.yang@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879158
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Jack Vogel <jack.vogel@oracle.com>

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
oraclelinuxkernel pushed a commit that referenced this issue Jul 26, 2024
Add a check to mlx5e_xmit() for shorter frames. A corrupted/malformed
packet, with shorter length can eventually cause system panic further
down in the code path. Avoid it by validating the length and dropping it
at the earliest.

Following is seen in our env with shorter skb->len

crash> bt
PID: 76981    TASK: ff19828cfe508000  CPU: 106  COMMAND: "vhost-76942"
 #0 [ff2d20159b39f2c8] machine_kexec at ffffffffad884801
 #1 [ff2d20159b39f328] __crash_kexec at ffffffffad976142
 #2 [ff2d20159b39f3f8] panic at ffffffffad8b3640
 #3 [ff2d20159b39f4a0] no_context at ffffffffad8954e1
 #4 [ff2d20159b39f518] __bad_area_nosemaphore at ffffffffad8958de
 #5 [ff2d20159b39f578] bad_area_nosemaphore at ffffffffad895a96
 #6 [ff2d20159b39f588] do_kern_addr_fault at ffffffffad89688e
 #7 [ff2d20159b39f5b0] __do_page_fault at ffffffffad896b30
 #8 [ff2d20159b39f618] do_page_fault at ffffffffad896db6
 #9 [ff2d20159b39f650] page_fault at ffffffffae402acd
    [exception RIP: memcpy_erms+6]
    RIP: ffffffffae261ab6  RSP: ff2d20159b39f700  RFLAGS: 00010293
    RAX: ff198291741ecf2e  RBX: ff19828e70d6a100  RCX: fffffffffea1af2b
    RDX: fffffffffffffffd  RSI: ff19828eba6d7e5e  RDI: ff198291757d2000
    RBP: ff2d20159b39f760   R8: ff198291741ecf00   R9: 000000000000037c
    R10: 000000000000003c  R11: ff19828ffe953940  R12: ff198291741ecf20
    R13: ff198267dcb1b600  R14: ff19828eeebb09c0  R15: ff198291741ecf00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #10 [ff2d20159b39f700] mlx5e_sq_xmit_wqe at ffffffffc05c162e [mlx5_core]
 #11 [ff2d20159b39f768] mlx5e_xmit at ffffffffc05c1ca3 [mlx5_core]
 #12 [ff2d20159b39f800] dev_hard_start_xmit at ffffffffae083766
 #13 [ff2d20159b39f860] sch_direct_xmit at ffffffffae0e2564
 #14 [ff2d20159b39f8b0] __qdisc_run at ffffffffae0e294e
 #15 [ff2d20159b39f928] __dev_queue_xmit at ffffffffae083eee
 #16 [ff2d20159b39f9a8] dev_queue_xmit at ffffffffae084370
 #17 [ff2d20159b39f9b8] vlan_dev_hard_start_xmit at ffffffffc2fb6fec [8021q]
 #18 [ff2d20159b39f9d8] dev_hard_start_xmit at ffffffffae083766
 #19 [ff2d20159b39fa38] __dev_queue_xmit at ffffffffae08416a
 #20 [ff2d20159b39fab8] dev_queue_xmit_accel at ffffffffae08438e
 #21 [ff2d20159b39fac8] macvlan_start_xmit at ffffffffc2fc18d9 [macvlan]
 #22 [ff2d20159b39faf0] dev_hard_start_xmit at ffffffffae083766
 #23 [ff2d20159b39fb50] sch_direct_xmit at ffffffffae0e2564
 #24 [ff2d20159b39fba0] __qdisc_run at ffffffffae0e294e
 #25 [ff2d20159b39fc18] __dev_queue_xmit at ffffffffae083c81
 #26 [ff2d20159b39fc90] dev_queue_xmit at ffffffffae084370
 #27 [ff2d20159b39fca0] tap_sendmsg at ffffffffc07206ed [tap]
 #28 [ff2d20159b39fd20] vhost_tx_batch at ffffffffc2fd6590 [vhost_net]
 #29 [ff2d20159b39fd68] handle_tx_copy at ffffffffc2fd70f3 [vhost_net]
 #30 [ff2d20159b39fe80] handle_tx at ffffffffc2fd7651 [vhost_net]
 #31 [ff2d20159b39feb0] handle_tx_kick at ffffffffc2fd76b5 [vhost_net]
 #32 [ff2d20159b39fec0] vhost_worker at ffffffffc12a5be8 [vhost]
 #33 [ff2d20159b39ff08] kthread at ffffffffad8dbfe5
 #34 [ff2d20159b39ff50] ret_from_fork at ffffffffae400364

This change was discussed with Nvidia and they are in agreement.

Orabug: 36879159
CVE: CVE-2024-41090
CVE: CVE-2024-41091

Fixes: e4cf27b ("net/mlx5e: Re-eanble client vlan TX acceleration")
Reported-and-tested-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com>
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>

In UEK4 stats is not a pointer, change the dropped code.

Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants