fix dpdk runtime bug based on spdk/dpdk #1

liu-chunmei · 2017-12-13T06:09:55Z

The Ceph async messenger hits run-time errors with this DPDK library: 1) mb->next must be initialized to NULL when a buffer is allocated, otherwise rte_mbuf_sanity_check reports an error. 2) The size check cannot take mbuf_data_room_size into account, because the async messenger's DPDK code allocates that part later rather than when the mempool is created.

Signed-off-by: chunmei <chunmei.liu@intel.com>

tchaikov · 2017-12-13T06:35:02Z

@liu-chunmei i think the DPDK project does not accept PRs over github. you might want to contribute patches by sending them to the mailing list, see http://dpdk.org/dev .

liu-chunmei · 2017-12-13T06:39:59Z

thanks, will do it.

The commit below introduced a pthread barrier for synchronization, but two IPC threads block on the barrier and never wake up.

    (gdb) bt
    #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
    #1 futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4) at ../sysdeps/nptl/futex-internal.h:135
    #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
    #3 rte_thread_init (arg=0x7fffffffcfe0) at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
    #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
    #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Analysis suggests that the root cause is the barrier being defined on the stack. This patch allocates the barrier from heap memory instead.

Fixes: d651ee4 ("eal: set affinity for control threads")

Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
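The idea of moving the barrier off the creator's stack can be sketched with plain pthreads. This is only an illustration, not the code from the patch: `thread_params`, `create_synced_thread` and `mark_done` are hypothetical names, and the rule that whichever thread receives PTHREAD_BARRIER_SERIAL_THREAD destroys the barrier and frees the block is an assumption about how such a fix can be structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical parameter block shared by the creator and the new thread.
 * It lives on the heap, so it stays valid even if the creator returns
 * before the new thread has left the barrier. */
struct thread_params {
    void *(*start_routine)(void *);
    void *arg;
    pthread_barrier_t configured;
};

/* Both sides call this once. The thread that gets the "serial" return
 * value tears the barrier down and frees the block. */
static void sync_and_release(struct thread_params *p)
{
    if (pthread_barrier_wait(&p->configured) == PTHREAD_BARRIER_SERIAL_THREAD) {
        pthread_barrier_destroy(&p->configured);
        free(p);
    }
}

static void *thread_trampoline(void *opaque)
{
    struct thread_params *p = opaque;
    /* Copy what is needed first: p must not be touched after the barrier. */
    void *(*fn)(void *) = p->start_routine;
    void *arg = p->arg;

    sync_and_release(p);
    return fn(arg);
}

int create_synced_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    struct thread_params *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->start_routine = fn;
    p->arg = arg;
    if (pthread_barrier_init(&p->configured, NULL, 2) != 0) {
        free(p);
        return -1;
    }
    if (pthread_create(tid, NULL, thread_trampoline, p) != 0) {
        pthread_barrier_destroy(&p->configured);
        free(p);
        return -1;
    }
    /* Per-thread setup (name, affinity, ...) would go here. */
    sync_and_release(p);
    return 0;
}

/* Tiny thread body used to show that the thread really ran. */
void *mark_done(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}
```

A caller would invoke `create_synced_thread(&tid, mark_done, &flag)` and later join `tid`. With a stack-allocated block, the creator could return and reuse that stack memory while the other thread is still inside `pthread_barrier_wait()`.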

This patch reverts the testpmd CLI prompt routine modifications done in order to support softnic. The reason is that testpmd exits abnormally on several setups because of the softnic modifications to this routine. For example:

When running testpmd with a tap interface

    testpmd -n 4 --vdev=net_tap0,iface=tap0,remote=eth1 -- --burst=64 --mbcache=512 -i --nb-cores=7 --rxq=2 --txq=2 --txd=512 --rxd=512 --port-topology=chained --forward-mode=rxonly

testpmd crashes seconds after presenting its prompt with the following error:

    testpmd> PANIC in prompt():
    CLI poll error (-1)
    Thread 1 "testpmd" received signal SIGABRT, Aborted.
    0x00007ffff668e0d0 in raise () from /lib64/libc.so.6
    (gdb) bt
    #0 0x00007ffff668e0d0 in raise () from /lib64/libc.so.6
    #1 0x00007ffff668f6b1 in abort () from /lib64/libc.so.6
    #2 0x0000000000468027 in __rte_panic ()
    #3 0x00000000004876ed in prompt ()
    #4 0x000000000046dffc in main ()

When running testpmd with a bare-metal device

    testpmd -n 4 --socket-mem=1024,1024 -w 04:00.0 -- --burst=64 --mbcache=512 -i --nb-cores=7 --rxq=64 --txq=4 --txd=16 --rxd=16

and pressing CTRL+D right after the testpmd prompt is presented, the program crashes with the same messages as above.

Needless to say, this behavior is not observed when using the previous CLI prompt routine.

Fixes: 0ad778b ("app/testpmd: rework softnic forward mode")

Signed-off-by: Moti Haimovsky <motih@mellanox.com>
Acked-by: Cristian Dumitrescu <cristian.dumitrescu@intel.com>
Acked-by: Aaron Conole <aconole@redhat.com>

The NIC's interrupt source may still have an active handler when the port is removed. Cancel the delayed handler before removing the device so that it cannot run afterwards.

Call Trace:
    #0 ixgbe_disable_intr (hw=0x0, hw=0x0) at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:852
    #1 ixgbe_dev_interrupt_delayed_handler (param=0xadb9c0 <rte_eth_devices@@DPDK_2.2+33024>) at /usr/src/debug/dpdk-18.11/drivers/net/ixgbe/ixgbe_ethdev.c:4386
    #2 0x00007f05782147af in eal_alarm_callback (arg=<optimized out>) at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/eal_alarm.c:90
    #3 0x00007f057821320a in eal_intr_process_interrupts (nfds=1, events=0x7f056cbf3e88) at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/eal_interrupts.c:838
    #4 eal_intr_handle_interrupts (totalfds=<optimized out>, pfd=18) at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/eal_interrupts.c:885
    #5 eal_intr_thread_main (arg=<optimized out>) at /usr/src/debug/dpdk-18.11/lib/librte_eal/linuxapp/eal/eal_interrupts.c:965
    #6 0x00007f05708a0e45 in start_thread () from /usr/lib64/libpthread.so.0
    #7 0x00007f056eb4ab5d in clone () from /usr/lib64/libc.so.6

Fixes: 2866c5f ("ixgbe: support port hotplug")
Cc: stable@dpdk.org

Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Acked-by: Qi Zhang <qi.z.zhang@intel.com>

The compat.h header file provided macros for two purposes:
1. it provided the macros for marking functions as rte_experimental
2. it provided the macros for doing function versioning

Although these were in the same file, #1 is something that is for use by public header files, while #2 is for internal use only. Therefore, we can split these into two headers, keeping #1 in rte_compat.h and #2 in a new file rte_function_versioning.h. For "make" builds, since internal objects pick up the headers from the "include/" folder, we need to add the new header to the installation list, but for "meson" builds it does not need to be installed as it's not for public use.

The rework also restricts the function versioning macros to the files that actually need them, so using experimental functions no longer requires including the versioning code.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Andrzej Ostruszka <amo@semihalf.com>

Previously, rx/tx_queues were passed into eth_from_pcaps_common() as pointers and were checked for being NULL. Commit da6ba28 ("net/pcap: use a struct to pass user options") changed that to pass in a pointer to a pmd_devargs_all, which contains the rx/tx_queues.

The parameter checking was not updated as part of that commit, and Coverity caught that there was still a check for rx/tx_queues being NULL, apparently after they had been dereferenced.

In fact, as they are members of the devargs_all struct, they will not be NULL, so remove those checks.

    1231 struct pmd_devargs *rx_queues = &devargs_all->rx_queues;
    1232 struct pmd_devargs *tx_queues = &devargs_all->tx_queues;
    1233 const unsigned int nb_rx_queues = rx_queues->num_of_queue;
    deref_ptr: Directly dereferencing pointer tx_queues.
    1234 const unsigned int nb_tx_queues = tx_queues->num_of_queue;
    1235 unsigned int i;
    1236
    1237 /* do some parameter checking */
    CID 345004: Dereference before null check (REVERSE_INULL) [select issue]
    1238 if (rx_queues == NULL && nb_rx_queues > 0)
    1239 return -1;
    CID 345029 (#1 of 1): Dereference before null check (REVERSE_INULL)
    check_after_deref: Null-checking tx_queues suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
    1240 if (tx_queues == NULL && nb_tx_queues > 0)
    1241 return -1;

Coverity issue: 345029
Coverity issue: 345044
Fixes: da6ba28 ("net/pcap: use a struct to pass user options")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: Cian Ferriter <cian.ferriter@intel.com>
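The reasoning above (a pointer to a struct member cannot be NULL when the containing pointer is valid) can be illustrated with a small sketch. The types `queue_args` and `all_args` and the function `total_queues` are hypothetical stand-ins, not the pcap PMD's definitions. The NULL check on the containing pointer is a choice made for this example only.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-direction queue arguments. */
struct queue_args {
    unsigned int num_of_queue;
};

/* Hypothetical stand-in for the struct that embeds both directions. */
struct all_args {
    struct queue_args rx_queues;
    struct queue_args tx_queues;
};

/* Sums the queue counts. The only pointers that can meaningfully be NULL
 * are the ones passed in; &all->rx_queues and &all->tx_queues are derived
 * from `all`, so checking them for NULL afterwards would be dead code. */
int total_queues(const struct all_args *all, unsigned int *total)
{
    if (all == NULL || total == NULL)
        return -1;

    const struct queue_args *rx = &all->rx_queues;
    const struct queue_args *tx = &all->tx_queues;

    *total = rx->num_of_queue + tx->num_of_queue;
    return 0;
}
```

Calling `total_queues(&args, &n)` with two RX and three TX queues stores 5 in `n`. Passing a NULL container is rejected before any member is touched.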

Coverity complains that ctrl_flags is set to NULL at the start of the function, and that it may still be unset when there is a jump to fc_success, where it is dereferenced. Check for NULL before the dereference.

    312 fc_success:
    CID 344983 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
    7. var_deref_op: Dereferencing null pointer ctrl_flags.
    313 *ctrl_flags = rte_cpu_to_be_64(*ctrl_flags);

Coverity issue: 344983
Fixes: 6cc5409 ("crypto/octeontx: add supported sessions")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>

Coverity is complaining about identical code regardless of which branch of the if/else is taken. Functionally, it means an error will always be returned if this if/else is hit. Remove the else branch.

    CID 337928 (#1 of 1): Identical code for different branches (IDENTICAL_BRANCHES)
    identical_branches: The same code is executed regardless of whether n->level != IPN3KE_TM_NODE_LEVEL_COS || n->n_children != 0U is true, because the 'then' and 'else' branches are identical. Should one of the branches be modified, or the entire 'if' statement replaced?
    1506 if (n->level != IPN3KE_TM_NODE_LEVEL_COS ||
    1507 n->n_children != 0) {
    1508 return -rte_tm_error_set(error,
    1509 EINVAL,
    1510 RTE_TM_ERROR_TYPE_UNSPECIFIED,
    1511 NULL,
    1512 rte_strerror(EINVAL));
    else_branch: The else branch, identical to the then branch.
    1513 } else {
    1514 return -rte_tm_error_set(error,
    1515 EINVAL,
    1516 RTE_TM_ERROR_TYPE_UNSPECIFIED,
    1517 NULL,
    1518 rte_strerror(EINVAL));
    1519 }

Coverity issue: 337928
Fixes: c820468 ("net/ipn3ke: support TM")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>

Coverity complains that this statement is not needed, as the goto label is on the next line anyway. Remove the if statement.

    653 ret = ipn3ke_cfg_parse_i40e_pf_ethdev(afu_name, pf_name);
    CID 337930 (#1 of 1): Identical code for different branches (IDENTICAL_BRANCHES)
    identical_branches: The same code is executed when the condition ret is true or false, because the code in the if-then branch and after the if statement is identical. Should the if statement be removed?
    654 if (ret)
    655 goto end;
    implicit_else: The code from the above if-then branch is identical to the code after the if statement.
    656 end:

Coverity issue: 337930
Fixes: c01c748 ("net/ipn3ke: add new driver")
Cc: stable@dpdk.org

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: Rosen Xu <rosen.xu@intel.com>

Coverity currently reports the following defects:

    CID 349937 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)
    sign_extension: Suspicious implicit sign extension: port_number with type uint16_t (16 bits, unsigned) is promoted in port_number << cur_pos to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If port_number << cur_pos is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.

    CID 349893 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)
    sign_extension: Suspicious implicit sign extension: vlan_tag with type uint8_t (8 bits, unsigned) is promoted in vlan_tag << cur_pos to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If vlan_tag << cur_pos is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.

This patch fixes them by changing the data type of port_number and vlan_tag to uint32_t in the static function hns3_fd_convert_meta_data of the hns3 PMD driver.

Coverity issue: 349937, 349893
Fixes: fcba820 ("net/hns3: support flow director")
Cc: stable@dpdk.org

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
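The promotion problem described by Coverity can be avoided by keeping the shifted operand unsigned and at least 32 bits wide. The helper below is a minimal sketch with a hypothetical name (`place_field`). It additionally widens to 64 bits before shifting, which goes one step beyond the uint32_t change described in the commit message.

```c
#include <stdint.h>

/* Hypothetical helper: put `value` at bit position `pos` of a 64-bit word.
 *
 * Taking the value as uint32_t means a uint16_t or uint8_t argument is
 * converted to an unsigned 32-bit type instead of being promoted to a
 * signed int, so no sign extension can happen. The cast to uint64_t makes
 * the shift itself a 64-bit operation. `pos` must be less than 64. */
uint64_t place_field(uint32_t value, unsigned int pos)
{
    return (uint64_t)value << pos;
}
```

For example, `place_field(0x8000, 16)` yields `0x80000000` with all upper 32 bits clear. This is the case where, according to the Coverity message, the promoted-to-int form would end up with the upper bits set.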

Coverity currently reports the following defect in the hns3 PMD driver:

    CID 289969 (#1 of 1): Unchecked return value (CHECKED_RETURN)
    1. check_return: Calling rte_mp_action_register without checking return value (as is done elsewhere 11 out of 13 times).

The problem is that the return value of the rte_mp_action_register API is not checked during initialization. If registering an action function for communication between the primary and secondary processes fails, the secondary process cannot work properly.

This patch fixes it by checking the return value of rte_mp_action_register in the '.dev_init' implementation function of the hns3 PMD driver.

Coverity issue: 289969
Fixes: 23d4b61 ("net/hns3: support multiple process")
Cc: stable@dpdk.org

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

[ upstream commit c064f69 ]

Currently, we encounter a segmentation fault when performing the following test case:
1. Run the testpmd application, configure the flow filter rules, then flush them repeatedly.
2. Inject FLR concurrently every 5 seconds.

The call trace info:

    This GDB was configured as "aarch64-linux-gnu".
    Reading symbols from ./testpmd...(no debugging symbols found)...done.
    [New LWP 322]
    [New LWP 325]
    [New LWP 324]
    [New LWP 326]
    [New LWP 323]
    [New LWP 327]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
    Core was generated by `/home/root/app/testpmd -w 0000:00:01.0 -w 0000:00:02.0 -w 0000:00:03.0 -l 0-3 -'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    libc.so.6
    [Current thread is 1 (Thread 0xffff8bb35110 (LWP 322))]
    (gdb) bt
    #0 0x0000ffff8b936a90 in strlen () from /lib/aarch64-linux-gnu/libc.so.6
    #1 0x0000ffff8b905ccc in vfprintf () from /lib/aarch64-linux-gnu/libc.so.6
    #2 0x0000ffff8b993d04 in __printf_chk () from /lib/aarch64-linux-gnu/libc.so.6
    #3 0x0000000000754828 in port_flow_flush ()
    #4 0x0000000000870f3c in cmdline_parse ()

The root cause is as follows: in hns3_flow_flush, the implementation of the '.flush' op defined in struct rte_flow_ops, the out parameter error is not set when the call to hns3_clear_rss_filter fails. The message member of the error struct is then invalid (the port_flow_flush function of the testpmd application fills it with 0x44444444), which leads to a segmentation fault when the message is formatted.

This patch fixes it by filling in the error parameter when the call to the static function hns3_clear_rss_filter fails in hns3_flow_flush.

Fixes: c37ca66 ("net/hns3: support RSS")

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
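The failure mode above comes down to an out parameter that is only valid if the callee fills it in on every error path. The sketch below shows that contract with hypothetical stand-ins (`flow_error`, `set_error`, `clear_rss_filter`, `flow_flush`). It does not use the real rte_flow types or hns3 functions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an error report filled in by the driver. */
struct flow_error {
    int code;
    const char *message;
};

/* Fill the caller's error struct and return the negative error code. */
static int set_error(struct flow_error *err, int code, const char *msg)
{
    if (err != NULL) {
        err->code = code;
        err->message = msg;
    }
    return -code;
}

/* Hypothetical stand-in for the step that may fail. */
static int clear_rss_filter(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Every failing path reports through `err`, so a caller that pre-fills
 * the struct with a poison pattern never prints a poisoned pointer. */
int flow_flush(int should_fail, struct flow_error *err)
{
    int ret = clear_rss_filter(should_fail);

    if (ret != 0)
        return set_error(err, -ret, "failed to clear RSS filter");
    return 0;
}
```

A caller that poisons the struct before the call, as the commit message says testpmd does with 0x44444444, can then safely print `err.message` whenever `flow_flush()` returns a negative value.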

[ upstream commit 5c471cb ]

Coverity currently reports the following defects:

    CID 349937 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)
    sign_extension: Suspicious implicit sign extension: port_number with type uint16_t (16 bits, unsigned) is promoted in port_number << cur_pos to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If port_number << cur_pos is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.

    CID 349893 (#1 of 1): Unintended sign extension (SIGN_EXTENSION)
    sign_extension: Suspicious implicit sign extension: vlan_tag with type uint8_t (8 bits, unsigned) is promoted in vlan_tag << cur_pos to type int (32 bits, signed), then sign-extended to type unsigned long (64 bits, unsigned). If vlan_tag << cur_pos is greater than 0x7FFFFFFF, the upper bits of the result will all be 1.

This patch fixes them by changing the data type of port_number and vlan_tag to uint32_t in the static function hns3_fd_convert_meta_data of the hns3 PMD driver.

Coverity issue: 349937, 349893
Fixes: fcba820 ("net/hns3: support flow director")

Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

[ upstream commit 9570b1f ]

Coverity currently reports the following defect in the hns3 PMD driver:

    CID 289969 (#1 of 1): Unchecked return value (CHECKED_RETURN)
    1. check_return: Calling rte_mp_action_register without checking return value (as is done elsewhere 11 out of 13 times).

The problem is that the return value of the rte_mp_action_register API is not checked during initialization. If registering an action function for communication between the primary and secondary processes fails, the secondary process cannot work properly.

This patch fixes it by checking the return value of rte_mp_action_register in the '.dev_init' implementation function of the hns3 PMD driver.

Coverity issue: 289969
Fixes: 23d4b61 ("net/hns3: support multiple process")

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>

Up to this commit, the regex application used one QP, which was assigned a number of jobs, each with a different segment of a file to parse. This commit adds support for multiple QP assignments. All QPs will be assigned the same number of jobs, with the same segments of the file to parse. This enables comparing functionality with different numbers of QPs. All queues are managed on one core with one thread. This commit focuses on changing the routines' API to support multiple QPs; mainly, QP scalar variables are replaced by per-QP struct instances. The enqueue/dequeue operations are interleaved as follows:

    enqueue(QP #1)
    enqueue(QP #2)
    ...
    enqueue(QP #n)
    dequeue(QP #1)
    dequeue(QP #2)
    ...
    dequeue(QP #n)

A new parameter 'nb_qps' was added to configure the number of QPs:

    --nb_qps <num of qps>

If not configured, nb_qps is set to 1 by default.

Signed-off-by: Ophir Munk <ophirmu@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
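The interleaving described above (one enqueue pass over all QPs, then one dequeue pass over all QPs, on a single thread) can be sketched as follows. This is a toy model under stated assumptions: `qp_state` and `run_all_qps` are hypothetical names, there is no real regex device, and every job enqueued in a round is assumed to complete in the same round.

```c
#include <stddef.h>

/* Hypothetical per-QP bookkeeping that replaces scalar, single-QP variables. */
struct qp_state {
    unsigned int enqueued;
    unsigned int dequeued;
    unsigned int total_jobs;
};

/* Runs all QPs to completion on one thread and returns the number of
 * enqueue/dequeue rounds that were needed. */
unsigned int run_all_qps(struct qp_state *qps, size_t nb_qps, unsigned int burst)
{
    unsigned int rounds = 0;
    size_t i, pending;

    if (qps == NULL || nb_qps == 0 || burst == 0)
        return 0;

    do {
        /* enqueue(QP #1) ... enqueue(QP #n) */
        for (i = 0; i < nb_qps; i++) {
            unsigned int left = qps[i].total_jobs - qps[i].enqueued;

            qps[i].enqueued += left < burst ? left : burst;
        }

        /* dequeue(QP #1) ... dequeue(QP #n) */
        pending = 0;
        for (i = 0; i < nb_qps; i++) {
            /* Model assumption: everything in flight has completed. */
            qps[i].dequeued = qps[i].enqueued;
            if (qps[i].dequeued < qps[i].total_jobs)
                pending++;
        }
        rounds++;
    } while (pending > 0);

    return rounds;
}
```

With three QPs of ten jobs each and a burst of four, the loop needs three rounds (4, 8, then 10 jobs enqueued per QP), and every QP ends with `dequeued == total_jobs`.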

Since the patch '1848b117' has initialized the variable 'key' in 'struct rte_flow_action_rss' with 'NULL', the PMD cannot get the RSS key now.

Details are as below:

    testpmd> flow create 0 ingress pattern eth / ipv4 / end actions rss types ipv4-other end key 1234567890123456789012345678901234567890FFFFFFFFFFFF123
             4567890123456789012345678901234567890FFFFFFFFFFFF queues end / end
    Flow rule #1 created
    testpmd> show port 0 rss-hash key
    RSS functions:
    all ipv4-other ip
    RSS key:
    4439796BB54C5023B675EA5B124F9F30B8A2C03DDFDC4D02A08C9B3
    34AF64A4C05C6FA343958D8557D99583AE138C92E81150366

This patch sets the offset and size of the 'key' variable as the first parameter of the token 'key'. Later, the address of the RSS key will be copied to the 'key' variable.

Fixes: 1848b11 ("app/testpmd: fix RSS key for flow API RSS rule")
Cc: stable@dpdk.org

Signed-off-by: Alvin Zhang <alvinx.zhang@intel.com>
Tested-by: Jun W Zhou <junx.w.zhou@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

CID 363716 (#1 of 1): Unchecked return value (CHECKED_RETURN)
check_return: Calling rte_pci_write_config without checking return value (as is done elsewhere 46 out of 49 times).

Coverity issue: 363716
Fixes: be14720 ("net/bnxt: support FW reset")
Cc: stable@dpdk.org

Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>

The current testpmd implementation supports only VXLAN for tunnel offload. Add GRE, NVGRE and GENEVE for tunnel offload flow matches.

For example:

    testpmd> flow tunnel create 0 type vxlan
    port 0: flow tunnel #1 type vxlan
    testpmd> flow tunnel create 0 type nvgre
    port 0: flow tunnel #2 type nvgre
    testpmd> flow tunnel create 0 type gre
    port 0: flow tunnel #3 type gre
    testpmd> flow tunnel create 0 type geneve
    port 0: flow tunnel #4 type geneve

Fixes: 1b9f274 ("app/testpmd: add commands for tunnel offload")
Cc: stable@dpdk.org

Signed-off-by: Eli Britstein <elibr@nvidia.com>
Reviewed-by: Gregory Etelson <getelson@nvidia.com>

Caught with ASan:

    ==9727==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f0daa2fc0d0 at pc 0x7f0daeefacb2 bp 0x7f0daa2fadd0 sp 0x7f0daa2fa578
    READ of size 1 at 0x7f0daa2fc0d0 thread T1
    #0 0x7f0daeefacb1 (/lib64/libasan.so.5+0xbacb1)
    #1 0x115eba1 in dev_uev_parse ../lib/eal/linux/eal_dev.c:167
    #2 0x115f281 in dev_uev_handler ../lib/eal/linux/eal_dev.c:248
    #3 0x1169b91 in eal_intr_process_interrupts ../lib/eal/linux/eal_interrupts.c:1026
    #4 0x116a3a2 in eal_intr_handle_interrupts ../lib/eal/linux/eal_interrupts.c:1100
    #5 0x116a7f0 in eal_intr_thread_main ../lib/eal/linux/eal_interrupts.c:1172
    #6 0x112640a in ctrl_thread_init ../lib/eal/common/eal_common_thread.c:202
    #7 0x7f0dade27159 in start_thread (/lib64/libpthread.so.0+0x8159)
    #8 0x7f0dadb58f72 in clone (/lib64/libc.so.6+0xfcf72)

    Address 0x7f0daa2fc0d0 is located in stack of thread T1 at offset 4192 in frame
    #0 0x115f0c9 in dev_uev_handler ../lib/eal/linux/eal_dev.c:226

    This frame has 2 object(s):
    [32, 48) 'uevent'
    [96, 4192) 'buf' <== Memory access at offset 4192 overflows this variable
    HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext (longjmp and C++ exceptions *are* supported)
    Thread T1 created by T0 here:
    #0 0x7f0daee92ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3)
    #1 0x1126542 in rte_ctrl_thread_create ../lib/eal/common/eal_common_thread.c:228
    #2 0x116a8b5 in rte_eal_intr_init ../lib/eal/linux/eal_interrupts.c:1200
    #3 0x1159dd1 in rte_eal_init ../lib/eal/linux/eal.c:1044
    #4 0x7a22f8 in main ../app/test-pmd/testpmd.c:4105
    #5 0x7f0dada7f802 in __libc_start_main (/lib64/libc.so.6+0x23802)

Bugzilla ID: 792
Fixes: 0d0f478 ("eal/linux: add uevent parse and process")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
Tested-by: Yan Xia <yanx.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

[ upstream commit df58076 ]

Cast 1 to type uint64_t to avoid overflow.

    CID 375812 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
    overflow_before_widen: Potentially overflowing expression 1 << 2 * i + 1 with type int (32 bits, signed) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type uint64_t (64 bits, unsigned).

Coverity issue: 375812
Fixes: 5fec01c ("net/i40e: support Linux VF to configure IRQ link list")

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
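The overflow-before-widen pattern reported above can be shown in isolation. In the expression `1 << 2 * i + 1`, the shift amount is `2 * i + 1` because `+` binds tighter than `<<`, and the literal `1` has type int, so the shift is done in 32 bits. The function name `pair_high_bit` below is hypothetical and only illustrates the cast.

```c
#include <stdint.h>

/* Hypothetical helper: returns the mask with bit (2 * i + 1) set.
 *
 * Casting the literal before shifting makes the whole shift a 64-bit
 * unsigned operation, so shift counts of 31 and above are well defined.
 * Valid for i in the range 0..31 (shift counts 1..63). */
uint64_t pair_high_bit(unsigned int i)
{
    return (uint64_t)1 << (2 * i + 1);
}
```

With this form, `pair_high_bit(16)` is bit 33 of a 64-bit word, a value that a 32-bit int shift cannot represent.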

[ upstream commit d57f289 ]

The "kni_dev" is the private data of the "net_device" in kni, and is allocated together with the "net_device" by calling "alloc_netdev()". The "net_device" is freed by calling "free_netdev()" when the kni is released. The freed memory includes the "kni_dev", so "kni_dev" should not be accessed after the "net_device" is released.

Fixes: e77fec6 ("kni: fix possible mbuf leaks and speed up port release")

KASAN trace:

[ 85.263717] ==========================================================
[ 85.264418] BUG: KASAN: use-after-free in kni_net_release_fifo_phy+0x30/0x84 [rte_kni]
[ 85.265139] Read of size 8 at addr ffff000260668d60 by task kni/341
[ 85.265703]
[ 85.265857] CPU: 0 PID: 341 Comm: kni Tainted: G U O 5.15.0-rc4+ #1
[ 85.266525] Hardware name: linux,dummy-virt (DT)
[ 85.266968] Call trace:
[ 85.267220] dump_backtrace+0x0/0x2d0
[ 85.267591] show_stack+0x24/0x30
[ 85.267924] dump_stack_lvl+0x8c/0xb8
[ 85.268294] print_address_description.constprop.0+0x74/0x2b8
[ 85.268855] kasan_report+0x1e4/0x200
[ 85.269224] __asan_load8+0x98/0xd4
[ 85.269577] kni_net_release_fifo_phy+0x30/0x84 [rte_kni]
[ 85.270116] kni_dev_remove.isra.0+0x50/0x64 [rte_kni]
[ 85.270630] kni_ioctl_release+0x254/0x320 [rte_kni]
[ 85.271136] kni_ioctl+0x64/0xb0 [rte_kni]
[ 85.271553] __arm64_sys_ioctl+0xdc/0x120
[ 85.271955] invoke_syscall+0x68/0x1a0
[ 85.272332] el0_svc_common.constprop.0+0x90/0x200
[ 85.272807] do_el0_svc+0x94/0xa4
[ 85.273144] el0_svc+0x78/0x240
[ 85.273463] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.273895] el0t_64_sync+0x1a0/0x1a4
[ 85.274264]
[ 85.274427] Allocated by task 341:
[ 85.274767] kasan_save_stack+0x2c/0x60
[ 85.275157] __kasan_kmalloc+0x90/0xb4
[ 85.275533] __kmalloc_node+0x230/0x594
[ 85.275917] kvmalloc_node+0x8c/0x190
[ 85.276286] alloc_netdev_mqs+0x70/0x6b0
[ 85.276678] kni_ioctl_create+0x224/0xf40 [rte_kni]
[ 85.277166] kni_ioctl+0x9c/0xb0 [rte_kni]
[ 85.277581] __arm64_sys_ioctl+0xdc/0x120
[ 85.277980] invoke_syscall+0x68/0x1a0
[ 85.278357] el0_svc_common.constprop.0+0x90/0x200
[ 85.278830] do_el0_svc+0x94/0xa4
[ 85.279172] el0_svc+0x78/0x240
[ 85.279491] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.279925] el0t_64_sync+0x1a0/0x1a4
[ 85.280292]
[ 85.280454] Freed by task 341:
[ 85.280763] kasan_save_stack+0x2c/0x60
[ 85.281147] kasan_set_track+0x2c/0x40
[ 85.281522] kasan_set_free_info+0x2c/0x50
[ 85.281930] __kasan_slab_free+0xdc/0x140
[ 85.282331] slab_free_freelist_hook+0x90/0x250
[ 85.282782] kfree+0x128/0x580
[ 85.283099] kvfree+0x48/0x60
[ 85.283402] netdev_freemem+0x34/0x44
[ 85.283770] netdev_release+0x50/0x64
[ 85.284138] device_release+0xa0/0x120
[ 85.284516] kobject_put+0xf8/0x160
[ 85.284867] put_device+0x20/0x30
[ 85.285204] free_netdev+0x22c/0x310
[ 85.285562] kni_dev_remove.isra.0+0x48/0x64 [rte_kni]
[ 85.286076] kni_ioctl_release+0x254/0x320 [rte_kni]
[ 85.286573] kni_ioctl+0x64/0xb0 [rte_kni]
[ 85.286992] __arm64_sys_ioctl+0xdc/0x120
[ 85.287392] invoke_syscall+0x68/0x1a0
[ 85.287769] el0_svc_common.constprop.0+0x90/0x200
[ 85.288243] do_el0_svc+0x94/0xa4
[ 85.288579] el0_svc+0x78/0x240
[ 85.288899] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.289332] el0t_64_sync+0x1a0/0x1a4
[ 85.289699]
[ 85.289862] The buggy address belongs to the object at ffff000260668000
[ 85.289862] which belongs to the cache kmalloc-cg-8k of size 8192
[ 85.291079] The buggy address is located 3424 bytes inside of
[ 85.291079] 8192-byte region [ffff000260668000, ffff00026066a000)
[ 85.292213] The buggy address belongs to the page:
[ 85.292684] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2a0668
[ 85.293585] head:(____ptrval____) order:3 compound_mapcount:0 compound_pincount:0
[ 85.294305] flags: 0xbfff80000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[ 85.295020] raw: 0bfff80000010200 0000000000000000 dead000000000122 ffff0000c000d680
[ 85.295767] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
[ 85.296512] page dumped because: kasan: bad access detected
[ 85.297054]
[ 85.297217] Memory state around the buggy address:
[ 85.297688] ffff000260668c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 85.298384] ffff000260668c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 85.299088] >ffff000260668d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 85.299781] ^
[ 85.300396] ffff000260668d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 85.301092] ffff000260668e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 85.301787] ===========================================================

Signed-off-by: Huisong Li <lihuisong@huawei.com>
Signed-off-by: Min Hu (Connor) <humin29@huawei.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
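The lifetime rule in the commit message above (private data shares the device's allocation, so it dies with the device) can be sketched in user-space C. All names here (`priv_data`, `net_dev`, `net_dev_alloc`, `net_dev_remove`, `fifo_releases`) are hypothetical stand-ins for the kernel objects. The ordering shown is an assumption about the shape of the fix, derived only from the trace, where the free precedes the FIFO release.

```c
#include <stdlib.h>

/* Hypothetical stand-in for "kni_dev": driver-private data. */
struct priv_data {
    void *fifo;
};

/* Hypothetical stand-in for "net_device": the private data is part of
 * the same allocation, so freeing the device frees it as well. */
struct net_dev {
    int ifindex;
    struct priv_data priv;
};

unsigned int fifo_releases; /* counts releases, for demonstration only */

struct net_dev *net_dev_alloc(void)
{
    struct net_dev *dev = calloc(1, sizeof(*dev));

    if (dev == NULL)
        return NULL;
    dev->priv.fifo = malloc(64);
    if (dev->priv.fifo == NULL) {
        free(dev);
        return NULL;
    }
    return dev;
}

static void release_fifo(struct priv_data *priv)
{
    free(priv->fifo);
    priv->fifo = NULL;
    fifo_releases++;
}

int net_dev_remove(struct net_dev *dev)
{
    if (dev == NULL)
        return -1;
    /* Everything that reads the private data runs first ... */
    release_fifo(&dev->priv);
    /* ... and only then is the allocation that contains it freed. */
    free(dev);
    return 0;
}
```

Swapping the two steps in `net_dev_remove()` would reproduce the class of bug in the report: `release_fifo()` would read `priv->fifo` from memory that has already been freed.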

[ upstream commit 0490d69 ]

This patch fixes the stack buffer overflow error reported by AddressSanitizer. The function send_packetsx4() tries to access out-of-bounds data from rte_mbuf and fill it into the TX buffer even when there are no pending packets (len = 0).

Performance impact: none

ASan error report:

    ==819==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xffffe2c0dcf0 at pc 0x0000005e791c bp 0xffffe2c0d7e0 sp 0xffffe2c0d800
    READ of size 8 at 0xffffe2c0dcf0 thread T0
    #0 0x5e7918 in send_packetsx4 ../examples/l3fwd/l3fwd_common.h:251
    #1 0x5e7918 in send_packets_multi ../examples/l3fwd/l3fwd_neon.h:226

Fixes: 96ff445 ("examples/l3fwd: reorganise and optimize LPM code path")

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Reviewed-by: Conor Walsh <conor.walsh@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

The KNI kthreads seem to be rescheduled at a granularity of roughly 1 millisecond right now, which seems to be insufficient for performing tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it seems that the existing code cannot reschedule at the desired granularity, due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux kernel for the control plane, and it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be more appropriate for being able to introduce smaller controlled delays, in the range of 5-10 microseconds. Upon reading the existing code, it would seem that this was the original intent. Adding sub-millisecond delays seems unfeasible with a call to schedule_timeout_interruptible().

    KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
    schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempt a brief comparison between the existing implementation, which uses schedule_timeout_interruptible(), and usleep_range().

We attempt to measure the CPU usage and the RTT between two KNI interfaces, which are created on top of vmxnet3 adapters, connected by a vSwitch.

    insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and CPU usage to the variant with schedule_timeout_interruptible(), while usleep_range(100, 200) seems to give a decent tradeoff between latency and CPU usage, while allowing users to tweak the limits for improved precision if they have such use cases.

Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly, seems to lead to a softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
<IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
[<ffffffff814f7891>] panic+0xcd/0x1e0
[<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
[<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
[<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
[<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
[<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

This patch also attempts to remove this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
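The roughly 1 ms floor reported above follows from tick arithmetic: a timeout expressed in scheduler ticks cannot be shorter than one tick. The sketch below models that with two hypothetical user-space helpers (`usecs_to_ticks`, `tick_sleep_usecs`). The round-up conversion and the tick rate of 1000 Hz used in the example are assumptions, not values taken from the commit message.

```c
#define USEC_PER_SEC 1000000UL

/* Hypothetical model of a microseconds-to-ticks conversion that rounds up,
 * so that any non-zero delay costs at least one whole tick.
 * Returns 0 for tick rates this simple model cannot represent. */
unsigned long usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
    unsigned long usecs_per_tick;

    if (hz == 0 || hz > USEC_PER_SEC)
        return 0;
    usecs_per_tick = USEC_PER_SEC / hz;
    return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/* Shortest sleep, in microseconds, that a tick-based timeout can express
 * for the requested delay. */
unsigned long tick_sleep_usecs(unsigned long usecs, unsigned long hz)
{
    if (hz == 0 || hz > USEC_PER_SEC)
        return 0;
    return usecs_to_ticks(usecs, hz) * (USEC_PER_SEC / hz);
}
```

Under these assumptions, a 5 microsecond request at 1000 ticks per second becomes one tick, that is 1000 microseconds, which is in line with the 1 ms round-trip times measured for the tick-based variant.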

Cast 1 to type uint64_t to avoid overflow.

    CID 375812 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
    overflow_before_widen: Potentially overflowing expression 1 << 2 * i + 1 with type int (32 bits, signed) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type uint64_t (64 bits, unsigned).

Coverity issue: 375812
Fixes: 5fec01c ("net/i40e: support Linux VF to configure IRQ link list")
Cc: stable@dpdk.org

Signed-off-by: Steve Yang <stevex.yang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

The "kni_dev" is the private data of the "net_device" in kni, and is allocated together with the "net_device" by calling "alloc_netdev()". The "net_device" is freed by calling "free_netdev()" when the kni is released. The freed memory includes the "kni_dev", so "kni_dev" should not be accessed after the "net_device" is released.

Fixes: e77fec6 ("kni: fix possible mbuf leaks and speed up port release")
Cc: stable@dpdk.org

KASAN trace:

[ 85.263717] ==========================================================
[ 85.264418] BUG: KASAN: use-after-free in kni_net_release_fifo_phy+0x30/0x84 [rte_kni]
[ 85.265139] Read of size 8 at addr ffff000260668d60 by task kni/341
[ 85.265703]
[ 85.265857] CPU: 0 PID: 341 Comm: kni Tainted: G U O 5.15.0-rc4+ #1
[ 85.266525] Hardware name: linux,dummy-virt (DT)
[ 85.266968] Call trace:
[ 85.267220] dump_backtrace+0x0/0x2d0
[ 85.267591] show_stack+0x24/0x30
[ 85.267924] dump_stack_lvl+0x8c/0xb8
[ 85.268294] print_address_description.constprop.0+0x74/0x2b8
[ 85.268855] kasan_report+0x1e4/0x200
[ 85.269224] __asan_load8+0x98/0xd4
[ 85.269577] kni_net_release_fifo_phy+0x30/0x84 [rte_kni]
[ 85.270116] kni_dev_remove.isra.0+0x50/0x64 [rte_kni]
[ 85.270630] kni_ioctl_release+0x254/0x320 [rte_kni]
[ 85.271136] kni_ioctl+0x64/0xb0 [rte_kni]
[ 85.271553] __arm64_sys_ioctl+0xdc/0x120
[ 85.271955] invoke_syscall+0x68/0x1a0
[ 85.272332] el0_svc_common.constprop.0+0x90/0x200
[ 85.272807] do_el0_svc+0x94/0xa4
[ 85.273144] el0_svc+0x78/0x240
[ 85.273463] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.273895] el0t_64_sync+0x1a0/0x1a4
[ 85.274264]
[ 85.274427] Allocated by task 341:
[ 85.274767] kasan_save_stack+0x2c/0x60
[ 85.275157] __kasan_kmalloc+0x90/0xb4
[ 85.275533] __kmalloc_node+0x230/0x594
[ 85.275917] kvmalloc_node+0x8c/0x190
[ 85.276286] alloc_netdev_mqs+0x70/0x6b0
[ 85.276678] kni_ioctl_create+0x224/0xf40 [rte_kni]
[ 85.277166] kni_ioctl+0x9c/0xb0 [rte_kni]
[ 85.277581] __arm64_sys_ioctl+0xdc/0x120
[ 85.277980] invoke_syscall+0x68/0x1a0
[ 85.278357] el0_svc_common.constprop.0+0x90/0x200
[ 85.278830] do_el0_svc+0x94/0xa4
[ 85.279172] el0_svc+0x78/0x240
[ 85.279491] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.279925] el0t_64_sync+0x1a0/0x1a4
[ 85.280292]
[ 85.280454] Freed by task 341:
[ 85.280763] kasan_save_stack+0x2c/0x60
[ 85.281147] kasan_set_track+0x2c/0x40
[ 85.281522] kasan_set_free_info+0x2c/0x50
[ 85.281930] __kasan_slab_free+0xdc/0x140
[ 85.282331] slab_free_freelist_hook+0x90/0x250
[ 85.282782] kfree+0x128/0x580
[ 85.283099] kvfree+0x48/0x60
[ 85.283402] netdev_freemem+0x34/0x44
[ 85.283770] netdev_release+0x50/0x64
[ 85.284138] device_release+0xa0/0x120
[ 85.284516] kobject_put+0xf8/0x160
[ 85.284867] put_device+0x20/0x30
[ 85.285204] free_netdev+0x22c/0x310
[ 85.285562] kni_dev_remove.isra.0+0x48/0x64 [rte_kni]
[ 85.286076] kni_ioctl_release+0x254/0x320 [rte_kni]
[ 85.286573] kni_ioctl+0x64/0xb0 [rte_kni]
[ 85.286992] __arm64_sys_ioctl+0xdc/0x120
[ 85.287392] invoke_syscall+0x68/0x1a0
[ 85.287769] el0_svc_common.constprop.0+0x90/0x200
[ 85.288243] do_el0_svc+0x94/0xa4
[ 85.288579] el0_svc+0x78/0x240
[ 85.288899] el0t_64_sync_handler+0x1a8/0x1b0
[ 85.289332] el0t_64_sync+0x1a0/0x1a4
[ 85.289699]
[ 85.289862] The buggy address belongs to the object at ffff000260668000
[ 85.289862] which belongs to the cache kmalloc-cg-8k of size 8192
[ 85.291079] The buggy address is located 3424 bytes inside of
[ 85.291079] 8192-byte region [ffff000260668000, ffff00026066a000)
[ 85.292213] The buggy address belongs to the page:
[ 85.292684] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2a0668
[ 85.293585] head:(____ptrval____) order:3 compound_mapcount:0 compound_pincount:0
[ 85.294305] flags: 0xbfff80000010200(slab|head|node=0|zone=2|lastcpupid=0x7fff)
[ 85.295020] raw: 0bfff80000010200 0000000000000000 dead000000000122 ffff0000c000d680
[ 85.295767] raw: 0000000000000000 0000000080020002 00000001ffffffff 0000000000000000
[ 85.296512] page dumped because: kasan: bad access detected
[ 85.297054] [ 85.297217] Memory state around the buggy address: [ 85.297688] ffff000260668c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 85.298384] ffff000260668c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 85.299088] >ffff000260668d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 85.299781] ^ [ 85.300396] ffff000260668d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 85.301092] ffff000260668e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 85.301787] =========================================================== Signed-off-by: Huisong Li <lihuisong@huawei.com> Signed-off-by: Min Hu (Connor) <humin29@huawei.com> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
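
The ordering rule behind this fix can be modelled in user space. The sketch below is not kernel code: `fake_net_device`, `fake_kni_dev`, `release_fifo` and `dev_remove` are hypothetical stand-ins that only mimic a private structure living inside the same allocation as its parent, so the parent must be freed last.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for net_device and its private kni_dev. */
struct fake_kni_dev {
	int fifo_released;
};

struct fake_net_device {
	/* Private data embedded in the same allocation as the device. */
	struct fake_kni_dev priv;
};

static int released_count;

/* Touches the private data, like the FIFO release in the trace. */
static void release_fifo(struct fake_kni_dev *kni)
{
	kni->fifo_released = 1;
	released_count++;
}

/* Correct order: use the private data first, free the device last.
 * Swapping the two calls reproduces the use-after-free pattern. */
static void dev_remove(struct fake_net_device *ndev)
{
	assert(ndev != NULL);
	release_fifo(&ndev->priv);
	free(ndev); /* also frees ndev->priv */
}
```

The design point is simply that freeing the device invalidates every pointer into its private area, so all cleanup that reads that area has to run before the free.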

This patch fixes the stack buffer overflow error reported by AddressSanitizer. Function send_packetsx4() tries to access out-of-bounds data from rte_mbuf and fill it into the TX buffer even when there are no pending packets (len = 0). Performance impact: none. ASAN error report: ==819==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xffffe2c0dcf0 at pc 0x0000005e791c bp 0xffffe2c0d7e0 sp 0xffffe2c0d800 READ of size 8 at 0xffffe2c0dcf0 thread T0 #0 0x5e7918 in send_packetsx4 ../examples/l3fwd/l3fwd_common.h:251 #1 0x5e7918 in send_packets_multi ../examples/l3fwd/l3fwd_neon.h:226 Fixes: 96ff445 ("examples/l3fwd: reorganise and optimize LPM code path") Cc: stable@dpdk.org Signed-off-by: Rahul Bhansali <rbhansali@marvell.com> Reviewed-by: Conor Walsh <conor.walsh@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

If DPDK is built with thread sanitizer it reports a race in setting of multiprocess file descriptor. The fix is to use atomic operations when updating mp_fd. Build: $ meson -Db_sanitize=address build $ ninja -C build Simple example: $ .build/app/dpdk-testpmd -l 1-3 --no-huge EAL: Detected CPU lcores: 16 EAL: Detected NUMA nodes: 1 EAL: Static memory layout is selected, amount of reserved memory can be adjusted with -m or --socket-mem EAL: Detected static linkage of DPDK EAL: Multi-process socket /run/user/1000/dpdk/rte/mp_socket EAL: Selected IOVA mode 'VA' testpmd: No probed ethernet devices testpmd: create a new mbuf pool <mb_pool_0>: n=163456, size=2176, socket=0 testpmd: preferred mempool ops selected: ring_mp_mc EAL: Error - exiting with code: 1 Cause: Creation of mbuf pool for socket 0 failed: Cannot allocate memory ================== WARNING: ThreadSanitizer: data race (pid=87245) Write of size 4 at 0x558e04d8ff70 by main thread: #0 rte_mp_channel_cleanup <null> (dpdk-testpmd+0x1e7d30c) #1 rte_eal_cleanup <null> (dpdk-testpmd+0x1e85929) #2 rte_exit <null> (dpdk-testpmd+0x1e5bc0a) #3 mbuf_pool_create.cold <null> (dpdk-testpmd+0x274011) #4 main <null> (dpdk-testpmd+0x5cc15d) Previous read of size 4 at 0x558e04d8ff70 by thread T2: #0 mp_handle <null> (dpdk-testpmd+0x1e7c439) #1 ctrl_thread_init <null> (dpdk-testpmd+0x1e6ee1e) As if synchronized via sleep: #0 nanosleep libsanitizer/tsan/tsan_interceptors_posix.cpp:366 #1 get_tsc_freq <null> (dpdk-testpmd+0x1e92ff9) #2 set_tsc_freq <null> (dpdk-testpmd+0x1e6f2fc) #3 rte_eal_timer_init <null> (dpdk-testpmd+0x1e931a4) #4 rte_eal_init.cold <null> (dpdk-testpmd+0x29e578) #5 main <null> (dpdk-testpmd+0x5cbc45) Location is global 'mp_fd' of size 4 at 0x558e04d8ff70 (dpdk-testpmd+0x000003122f70) Thread T2 'rte_mp_handle' (tid=87248, running) created by main thread at: #0 pthread_create libsanitizer/tsan/tsan_interceptors_posix.cpp:969 #1 rte_ctrl_thread_create <null> (dpdk-testpmd+0x1e6efd0) #2 rte_mp_channel_init.cold 
<null> (dpdk-testpmd+0x29cb7c) #3 rte_eal_init <null> (dpdk-testpmd+0x1e8662e) #4 main <null> (dpdk-testpmd+0x5cbc45) SUMMARY: ThreadSanitizer: data race (app/dpdk-testpmd+0x1e7d30c) in rte_mp_channel_cleanup ================== ThreadSanitizer: reported 1 warnings Fixes: bacaa27 ("eal: add channel for multi-process communication") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Anatoly Burakov <anatoly.burakov@intel.com> Reviewed-by: Chengwen Feng <fengchengwen@huawei.com>

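The mp_fd race above is a plain read in one thread against a plain write in another. A minimal sketch of the atomic approach, written with C11 `<stdatomic.h>` rather than whatever primitives the DPDK patch itself uses (that choice is an assumption), could look like this; `mp_fd_sketch` and the three helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the global descriptor named in the report; -1 = closed. */
static atomic_int mp_fd_sketch = -1;

/* Init path: publish a valid descriptor. */
static void channel_open(int fd)
{
	assert(fd >= 0);
	atomic_store(&mp_fd_sketch, fd);
}

/* Cleanup path (main thread): mark the channel closed and fetch the old
 * descriptor in one atomic step. Returns -1 if it was already closed;
 * otherwise the caller is expected to close() the returned value. */
static int channel_cleanup(void)
{
	return atomic_exchange(&mp_fd_sketch, -1);
}

/* Handler path (control thread): atomic read instead of a plain load. */
static int channel_is_open(void)
{
	return atomic_load(&mp_fd_sketch) >= 0;
}
```

Using an exchange in the cleanup path also means that two concurrent cleanups cannot both obtain the same descriptor.
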
Devices can end up without driver assigned after probing. The cleanup function from patch below does not free devices that do not have it assigned. The devices reported to leak are not the ones that are the object of the hotplug test. This issue appears only when using shared objects build, and on a specific set of CI machines. More debugging needs to happen to fully understand the issue here. Fixes patch below: (1cab1a4)bus: cleanup devices on shutdown Errors from ASAN: 00:16:25.099 ==48971==ERROR: LeakSanitizer: detected memory leaks 00:16:25.099 00:16:25.099 Indirect leak of 11544 byte(s) in 37 object(s) allocated from: 00:16:25.099 #0 0x7f4d00f1a6af in __interceptor_malloc (/usr/lib64/libasan.so.8+0xba6af) 00:16:25.099 #1 0x7f4d0017c4a7 in pci_scan_one ../drivers/bus/pci/linux/pci.c:218 00:16:25.099 #2 0x7f4d0017ceb5 in rte_pci_scan ../drivers/bus/pci/linux/pci.c:471 00:16:25.099 #3 0x7f4d0002394c in rte_bus_scan ../lib/eal/common/eal_common_bus.c:56 00:16:25.099 #4 0x7f4d00053847 in rte_eal_init ../lib/eal/linux/eal.c:1065 00:16:25.099 #5 0x7f4d00256c01 in spdk_env_init /var/jenkins/workspace/hw-nvme-hotplug/spdk/lib/env_dpdk/init.c:585 00:16:25.099 #6 0x40709c in main /var/jenkins/workspace/hw-nvme-hotplug/spdk/examples/nvme/hotplug/hotplug.c:571 00:16:25.099 #7 0x7f4cff50150f in __libc_start_call_main (/usr/lib64/libc.so.6+0x2750f) Change-Id: I78252587a0930a15097ce16227a4935d34871b75 Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/16443 Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com> Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com> Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>

The net/vhost pmd currently provides a -1 vid when disabling interrupt after a virtio port got disconnected. This can be caught when running with ASan. First, start dpdk-l3fwd-power in interrupt mode with a net/vhost port. $ ./build-clang/examples/dpdk-l3fwd-power -l0,1 --in-memory \ -a 0000:00:00.0 \ --vdev net_vhost0,iface=plop.sock,client=1\ -- \ -p 0x1 \ --interrupt-only \ --config '(0,0,1)' \ --parse-ptype 0 Then start testpmd with virtio-user. $ ./build-clang/app/dpdk-testpmd -l0,2 --single-file-segment --in-memory \ -a 0000:00:00.0 \ --vdev net_virtio_user0,path=plop.sock,server=1 \ -- \ -i Finally stop testpmd. ASan then splats in dpdk-l3fwd-power: ================================================================= ==3641005==ERROR: AddressSanitizer: global-buffer-overflow on address 0x000005ed0778 at pc 0x000001270f81 bp 0x7fddbd2eee20 sp 0x7fddbd2eee18 READ of size 8 at 0x000005ed0778 thread T2 #0 0x1270f80 in get_device .../lib/vhost/vhost.h:801:27 #1 0x1270f80 in rte_vhost_get_vhost_vring .../lib/vhost/vhost.c:951:8 #2 0x3ac95cb in eth_rxq_intr_disable .../drivers/net/vhost/rte_eth_vhost.c:647:8 #3 0x170e0bf in rte_eth_dev_rx_intr_disable .../lib/ethdev/rte_ethdev.c:5443:25 #4 0xf72ba7 in turn_on_off_intr .../examples/l3fwd-power/main.c:881:4 #5 0xf71045 in main_intr_loop .../examples/l3fwd-power/main.c:1061:6 #6 0x17f9292 in eal_thread_loop .../lib/eal/common/eal_common_thread.c:210:9 #7 0x18373f5 in eal_worker_thread_loop .../lib/eal/linux/eal.c:915:2 #8 0x7fddc16ae12c in start_thread (/lib64/libc.so.6+0x8b12c) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) #9 0x7fddc172fbbf in __GI___clone3 (/lib64/libc.so.6+0x10cbbf) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) 0x000005ed0778 is located 8 bytes to the left of global variable 'vhost_devices' defined in '.../lib/vhost/vhost.c:24' (0x5ed0780) of size 8192 0x000005ed0778 is located 20 bytes to the right of global variable 'vhost_config_log_level' defined in '.../lib/vhost/vhost.c:2174' 
(0x5ed0760) of size 4 SUMMARY: AddressSanitizer: global-buffer-overflow .../lib/vhost/vhost.h:801:27 in get_device Shadow bytes around the buggy address: 0x000080bd2090: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x000080bd20a0: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 0x000080bd20b0: f9 f9 f9 f9 00 f9 f9 f9 00 f9 f9 f9 00 f9 f9 f9 0x000080bd20c0: 00 00 00 00 00 00 00 f9 f9 f9 f9 f9 04 f9 f9 f9 0x000080bd20d0: 00 00 00 00 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 =>0x000080bd20e0: 00 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 04 f9 f9[f9] 0x000080bd20f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x000080bd2130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Thread T2 created by T0 here: #0 0xe98996 in __interceptor_pthread_create (.examples/dpdk-l3fwd-power+0xe98996) (BuildId: d0b984a3b0287b9e0f301b73426fa921aeecca3a) #1 0x1836767 in eal_worker_thread_create .../lib/eal/linux/eal.c:952:6 #2 0x1834b83 in rte_eal_init .../lib/eal/linux/eal.c:1257:9 #3 0xf68902 in main .../examples/l3fwd-power/main.c:2496:8 #4 0x7fddc164a50f in __libc_start_call_main (/lib64/libc.so.6+0x2750f) (BuildId: 81daba31ee66dbd63efdc4252a872949d874d136) ==3641005==ABORTING More generally, any application passing an incorrect vid would trigger such an OOB access. 
Fixes: 4796ad6 ("examples/vhost: import userspace vhost application") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
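
The out-of-bounds read above comes from indexing a global table with vid = -1. A bounds-checked lookup can be sketched as follows; the table size of 1024 and every `sketch_` name are assumptions for illustration, not the vhost library's definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEVICES 1024 /* assumed table size */

struct sketch_dev {
	int vid;
};

static struct sketch_dev *sketch_devices[SKETCH_MAX_DEVICES];

/* Registers a device under its own vid (hypothetical helper). */
static void register_device(struct sketch_dev *dev)
{
	assert(dev != NULL && dev->vid >= 0 && dev->vid < SKETCH_MAX_DEVICES);
	sketch_devices[dev->vid] = dev;
}

/* Rejects negative and too-large ids before touching the table, so a
 * caller passing -1 gets NULL instead of reading before the array. */
static struct sketch_dev *get_device_checked(int vid)
{
	if (vid < 0 || vid >= SKETCH_MAX_DEVICES)
		return NULL;
	return sketch_devices[vid];
}
```

Callers then treat a NULL result as "no such device", which also covers the disconnected-port case described in the commit message.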

A segmentation fault has been observed while running the ixgbe_recv_pkts_lro() function to receive packets on the Loongson 3C5000 processor, which has 64 cores and 4 NUMA nodes. From the ixgbe_recv_pkts_lro() function, we found that as long as the first packet has the EOP bit set and the length of this packet is less than or equal to rxq->crc_len, the segmentation fault will definitely happen, even on other platforms. For example, if we force the first packet with the EOP bit set to have zero length, the segmentation fault happens on x86. When the first packet is processed, first_seg->next is NULL; if at the same time this packet has the EOP bit set and its length is less than or equal to rxq->crc_len, the following loop is executed: for (lp = first_seg; lp->next != rxm; lp = lp->next) ; Since first_seg->next is NULL under this condition, evaluating lp->next->next causes the segmentation fault. Normally, the length of the first packet with the EOP bit set is greater than rxq->crc_len. However, out-of-order execution in the CPU may break the read ordering between the status and the rest of the descriptor fields in this function. The related code is as follows: rxdp = &rx_ring[rx_id]; #1 staterr = rte_le_to_cpu_32(rxdp->wb.upper.status_error); if (!(staterr & IXGBE_RXDADV_STAT_DD)) break; #2 rxd = *rxdp; Statement #2 may be executed before statement #1, which is likely to make a ready packet appear to have zero length. If that packet is the first packet and has the EOP bit set, the above segmentation fault happens. So we should add a proper memory barrier to ensure the correct read ordering. We made the same change in the ixgbe_recv_pkts() function to keep the rxd data valid, even though we did not observe a segmentation fault in that function.
Fixes: 8eecb32 ("ixgbe: add LRO support") Cc: stable@dpdk.org Signed-off-by: Min Zhou <zhoumin@loongson.cn> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

getline() may allocate a buffer even though it returns -1: """ If *lineptr is set to NULL before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed. """ This leak has been observed on a RHEL8 system with two CX5 PF devices (no VFs). ASan reports: ==8899==ERROR: LeakSanitizer: detected memory leaks Direct leak of 120 byte(s) in 1 object(s) allocated from: #0 0x7fe58576aba8 in __interceptor_malloc (/lib64/libasan.so.5+0xefba8) #1 0x7fe583e866b2 in __getdelim (/lib64/libc.so.6+0x886b2) #2 0x327bd23 in mlx5_sysfs_switch_info ../drivers/net/mlx5/linux/mlx5_ethdev_os.c:1084 #3 0x3271f86 in mlx5_os_pci_probe_pf ../drivers/net/mlx5/linux/mlx5_os.c:2282 #4 0x3273c83 in mlx5_os_pci_probe ../drivers/net/mlx5/linux/mlx5_os.c:2497 #5 0x327475f in mlx5_os_net_probe ../drivers/net/mlx5/linux/mlx5_os.c:2578 #6 0xc6eac7 in drivers_probe ../drivers/common/mlx5/mlx5_common.c:937 #7 0xc6f150 in mlx5_common_dev_probe ../drivers/common/mlx5/mlx5_common.c:1027 #8 0xc8ef80 in mlx5_common_pci_probe ../drivers/common/mlx5/mlx5_common_pci.c:168 #9 0xc21b67 in rte_pci_probe_one_driver ../drivers/bus/pci/pci_common.c:312 #10 0xc2224c in pci_probe_all_drivers ../drivers/bus/pci/pci_common.c:396 #11 0xc222f4 in pci_probe ../drivers/bus/pci/pci_common.c:423 #12 0xb71fff in rte_bus_probe ../lib/eal/common/eal_common_bus.c:78 #13 0xbe6888 in rte_eal_init ../lib/eal/linux/eal.c:1300 #14 0x5ec717 in main ../app/test-pmd/testpmd.c:4515 #15 0x7fe583e38d84 in __libc_start_main (/lib64/libc.so.6+0x3ad84) As for why getline() fails, strace gives a hint: 8516 openat(AT_FDCWD, "/sys/class/net/enp130s0f0/phys_port_name", O_RDONLY) = 34 8516 fstat(34, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0 8516 read(34, 0x621000098900, 4096) = -1 EOPNOTSUPP (Operation not supported) Fixes: f8a226e ("net/mlx5: fix sysfs port name translation") Cc: stable@dpdk.org Signed-off-by: David Marchand <david.marchand@redhat.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

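The getline() rule quoted above (free the buffer even on failure) can be sketched as a small helper. `read_first_line` is a hypothetical function for illustration, not the mlx5 code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Copies the first line of `path` (without the newline) into `out`.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
static int read_first_line(const char *path, char *out, size_t out_len)
{
	assert(path != NULL && out != NULL && out_len > 0);

	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	char *line = NULL; /* getline() allocates because this is NULL */
	size_t cap = 0;
	int ret = -1;

	if (getline(&line, &cap, f) >= 0) {
		line[strcspn(line, "\n")] = '\0';
		snprintf(out, out_len, "%s", line);
		ret = 0;
	}

	/* Free unconditionally: getline() may have allocated the buffer
	 * even when it returned -1. free(NULL) is harmless. */
	free(line);
	fclose(f);
	return ret;
}
```

Keeping a single exit path with one `free()` avoids having to remember the cleanup in each error branch.
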
In function mlx5_dev_configure, dev->data->tx_queues is assigned to priv->txqs. When a member is removed from a bond, the function eth_dev_tx_queue_config is called to release dev->data->tx_queues. However, function mlx5_dev_close will access priv->txqs again and cause the use after free problem. In function mlx5_dev_close, before free priv->txqs, we add a check that dev->data->tx_queues is not NULL. build/app/dpdk-testpmd -c7 -a 0000:08:00.2 -- -i --nb-cores=2 --total-num-mbufs=2048 testpmd> port stop 0 testpmd> create bonding device 4 0 testpmd> add bonding member 0 1 testpmd> remove bonding member 0 1 testpmd> quit ASan reports: ==2571911==ERROR: AddressSanitizer: heap-use-after-free on address 0x000174529880 at pc 0x0000113c8440 bp 0xffffefae0ea0 sp 0xffffefae0eb0 READ of size 8 at 0x000174529880 thread T0 #0 0x113c843c in mlx5_txq_release ../drivers/net/mlx5/mlx5_txq.c: 1203 #1 0xffdb53c in mlx5_dev_close ../drivers/net/mlx5/mlx5.c:2286 #2 0xe12dc0 in rte_eth_dev_close ../lib/ethdev/rte_ethdev.c:1877 #3 0x6bac1c in close_port ../app/test-pmd/testpmd.c:3540 #4 0x6bc320 in pmd_test_exit ../app/test-pmd/testpmd.c:3808 #5 0x6c1a94 in main ../app/test-pmd/testpmd.c:4759 #6 0xffff9328f038 (/usr/lib64/libc.so.6+0x2b038) #7 0xffff9328f110 in __libc_start_main (/usr/lib64/libc.so.6+ 0x2b110) Fixes: 6e78005 ("net/mlx5: add reference counter on DPDK Tx queues") Cc: stable@dpdk.org Reported-by: Yunjian Wang <wangyunjian@huawei.com> Signed-off-by: Pengfei Sun <sunpengfei16@huawei.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
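
The aliasing problem above, where one pointer is released while a second copy is still in use, can be modelled as follows. All `sketch_` types and both functions are hypothetical, and the model assumes the release path sets the owning pointer to NULL, which is what makes the NULL check in the close path meaningful.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_data {
	void **tx_queues; /* owned by the generic device layer */
	unsigned int nb_tx_queues;
};

struct sketch_priv {
	void **txqs; /* driver-private alias of tx_queues */
	unsigned int txqs_n;
};

static unsigned int released;

/* Generic layer: frees the queue array and clears the owning pointer. */
static void queue_config_release(struct sketch_data *data)
{
	free(data->tx_queues);
	data->tx_queues = NULL;
	data->nb_tx_queues = 0;
}

/* Driver close: walk the alias only while the owner is still valid. */
static void dev_close(struct sketch_data *data, struct sketch_priv *priv)
{
	assert(data != NULL && priv != NULL);
	if (data->tx_queues != NULL) {
		for (unsigned int i = 0; i < priv->txqs_n; i++)
			if (priv->txqs[i] != NULL)
				released++;
	}
	priv->txqs = NULL; /* drop the possibly stale alias */
	priv->txqs_n = 0;
}
```

In this model, calling `dev_close()` after `queue_config_release()` never dereferences the freed array.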

fix dpdk runtime bug based on spdk/dpdk

7541440

liu-chunmei mentioned this pull request Dec 13, 2017

fix dpdk runtime issue based on spdk/dpdk libarary ceph/ceph#19467

Closed

liu-chunmei closed this Jan 2, 2018

jesonliang mentioned this pull request Jul 8, 2021

Dpdk often stuck in write log to /var/log/message when start VM? #4

Open
