[PWCI] "net/mlx5: fix IP tunnel detection for IPIP and IPv6 ENCAP" #462
base: main
Conversation
The capabilities flags for the vector offload path include the QinQ offload capability, but in fact the offload path lacks any ability to create context descriptors. This means that it cannot insert multiple VLAN tags for QinQ support, so move the offload from the VECTOR_OFFLOAD list to the NO_VECTOR list. Similarly, remove the check for the QinQ mbuf flag in packets being transmitted, since requesting that offload is invalid when the feature is not enabled. Fixes: 808a17b ("net/ice: add Rx AVX512 offload path") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com> Acked-by: Ciara Loftus <ciara.loftus@intel.com>
The common Rx path selection logic iterates through an array of candidate paths and selects the best fit for the requested features. Currently, in the event that two potential candidates are identified, the one with fewer offloads (and thus the less complex path) is selected. However, this is not correct, because if the path with more offloads has a greater SIMD width, that path should be chosen. This commit reworks the logic so that the number of offloads is only taken into consideration when choosing between two paths with the same SIMD width. Since the path arrays are ordered from lowest SIMD width to highest, and vector paths tend to have fewer offloads enabled than scalar paths, "new" candidate paths with greater SIMD widths tended to have fewer or equal offloads than the "current" candidate paths and thus were correctly accepted as the best candidate. For this reason the incorrect logic did not cause any incorrect path selections in practice. Fixes: 9d99641 ("net/intel: introduce infrastructure for Rx path selection") Signed-off-by: Ciara Loftus <ciara.loftus@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
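A minimal sketch of the corrected selection rule (illustrative only; the struct and function names below are hypothetical, not the actual net/intel code): SIMD width is compared first, and the offload count is used only as a tie-breaker between paths of equal width.

/* Hypothetical candidate descriptor, standing in for the real path info. */
struct rx_path_candidate {
	unsigned int simd_width;   /* e.g. 128, 256, 512 */
	unsigned int nb_offloads;  /* number of offloads the path enables */
};

/* Return the preferred of two candidates that both satisfy the request. */
static const struct rx_path_candidate *
pick_better_path(const struct rx_path_candidate *cur,
		 const struct rx_path_candidate *cand)
{
	if (cand->simd_width != cur->simd_width)
		return cand->simd_width > cur->simd_width ? cand : cur;
	/* Same SIMD width: prefer the simpler path with fewer offloads. */
	return cand->nb_offloads < cur->nb_offloads ? cand : cur;
}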
Add methods to Linux Session class for setting the link of an interface up and deleting an interface. Signed-off-by: Dean Marx <dmarx@iol.unh.edu> Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add a command to the testpmd shell for setting the portlist (list of forwarding ports) within a testpmd session. This allows for changing the forwarding order between ports. Signed-off-by: Dean Marx <dmarx@iol.unh.edu> Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add test suite covering virtio-user/vhost-user server/client forwarding scenarios with testpmd packet validation. Signed-off-by: Dean Marx <dmarx@iol.unh.edu> Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add a section to the dts rst under How to Write a Test Suite which provides an example for how to write a test case docstring, including a steps and verify section. Signed-off-by: Dean Marx <dmarx@iol.unh.edu> Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Current documentation for protocol agnostic filtering for ICE driver is a bit terse and relies on a lot of assumed knowledge. Document the feature better and make all of the assumptions explicit. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
The statistics contain 40 bits. The lower 32 bits are read first, followed by the upper 8 bits. In some cases, after reading the lower 32 bits, a carry occurs from the lower bits, which causes the final statistics to be incorrect. This commit fixes this issue. Fixes: a37bde5 ("net/ice: support statistics") Cc: stable@dpdk.org Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
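A hedged sketch of one common way to handle this kind of split counter read (the read_lo()/read_hi() accessors are placeholders, not the ice register API): re-read the upper bits and retry if a rollover happened between the two reads.

#include <stdint.h>

/* read_lo() returns the lower 32 bits, read_hi() the upper 8 bits. */
static uint64_t
read_stat40(uint32_t (*read_lo)(void), uint32_t (*read_hi)(void))
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_hi();    /* upper 8 bits */
		lo = read_lo();    /* lower 32 bits */
		hi2 = read_hi();   /* detect a carry between the two reads */
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}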
As per the recent change in commit 066f3d9 ("ethdev: remove callback checks from fast path"), the framework unconditionally invokes dev->tx_pkt_prepare. Due to this, the ICE driver crashes because tx_pkt_prepare was set to NULL during initialization. Ensure dev->tx_pkt_prepare is not NULL when vector or simple Tx paths are selected, by assigning rte_eth_tx_pkt_prepare_dummy. This aligns with the expectations of the above-mentioned commit. Bugzilla ID: 1795 Fixes: 6eac0b7 ("net/ice: support advance Rx/Tx") Cc: stable@dpdk.org Signed-off-by: Sunil Kumar Kori <skori@marvell.com> Tested-by: Hailin Xu <hailinx.xu@intel.com>
As per the recent change in commit 066f3d9 ("ethdev: remove callback checks from fast path"), the framework unconditionally invokes dev->tx_pkt_prepare. Ensure dev->tx_pkt_prepare is not NULL when vector or simple Tx paths are selected, by assigning rte_eth_tx_pkt_prepare_dummy. This aligns with the expectations of the above-mentioned commit. Fixes: 7829b8d ("net/ixgbe: add Tx preparation") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
As per the recent change in commit 066f3d9 ("ethdev: remove callback checks from fast path"), the framework unconditionally invokes dev->tx_pkt_prepare. Ensure dev->tx_pkt_prepare is not NULL when vector or simple Tx paths are selected, by assigning rte_eth_tx_pkt_prepare_dummy. This aligns with the expectations of the above-mentioned commit. Fixes: 9b134aa ("net/fm10k: add Tx preparation") Cc: stable@dpdk.org Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
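A hedged sketch of the pattern shared by the three fixes above (the surrounding function, the header shown, and the path-selection flag are illustrative assumptions, not the actual driver code): when a Tx path with no prepare stage is chosen, the callback is pointed at the dummy helper named in the commits instead of being left NULL, since ethdev now calls it unconditionally.

#include <stdbool.h>
#include <ethdev_driver.h>

/* Illustrative only: the vector_or_simple_path flag is hypothetical. */
static void
example_set_tx_callbacks(struct rte_eth_dev *dev, bool vector_or_simple_path)
{
	if (vector_or_simple_path)
		/* No prepare work needed, but the pointer must not be NULL. */
		dev->tx_pkt_prepare = rte_eth_tx_pkt_prepare_dummy;
}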
Create a new default tree for the SDP interface if more than one Tx queue is requested. This helps to back pressure each queue independently when they are created with separate channels. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Drain all CQ buffers and close the CQ when an SQ with completion enabled is about to be stopped. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Downgrade the scatter check from an error to a warning for SDP interfaces, to support cases where the host application is already aware of the max buffer size. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
This patch fixes the inline device functions to work when roc_nix is NULL. Fixes: f81ee71 ("common/cnxk: support inline SA context invalidate") Cc: stable@dpdk.org Signed-off-by: Monendra Singh Kushwaha <kmonendra@marvell.com>
LSO enhanced to support flags modification. Added new mbox to enable this feature. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Extend LSO offload to support IPv4 fragmentation. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
While performing IPv4 fragmentation, consider the DF flag from the original packet header instead of setting it to zero. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
The SQ context is extended with a new feature: if enabled, the counter is updated when a packet is processed, whether it is transmitted or dropped. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Added a function to check whether the board supports 16B alignment. Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
On CN20K SoC, back pressure can be configured for eight different traffic classes per pool along with threshold and BPIDs. RoC API is added to configure the same. Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
By default, the SQB pool is created with max (512 buffers) + extra threshold buffers, and the aura limit is set to 512 + thr. But during cleanup, the aura limit is reset to only MAX (512 buffers) before destroying the pool. Hence while destroying the pool, only 512 buffers are cleaned from the aura and the extra threshold buffers are left as they are. At a later stage, if the same SQB pool is created, the hardware reports an error for the extra threshold buffers because they are already in the pool. Fixes: 780f90e ("common/cnxk: restore NIX SQB pool limit before destroy") Cc: stable@dpdk.org Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Add support for SQ resize by making SQB mem allocated in chunks of SQB size. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
CN10K platform supports Tx schedulers up to 2K. Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Fix Klocwork-reported NULL pointer dereference. Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Add a condition to check that the SQ is non-NULL before access. Also, pktio locks are simplified while doing threshold_profile config. Fixes: 90a903f ("common/cnxk: split NIX TM hierarchy enable API") Cc: stable@dpdk.org Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Aura field width has changed from 20 bits to 17 bits for cn20k. Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Fix issue reported by klocwork. Fixes: f410059 ("common/cnxk: support inline inbound queue") Cc: stable@dpdk.org Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Fix format specifier for bandwidth profile ID. Fixes: db5744d ("common/cnxk: support NIX debug for CN20K") Cc: stable@dpdk.org Signed-off-by: Aarnav JP <ajp@marvell.com>
Fix CPT res address config logic to avoid garbage values and trigger only when inline dev is present. Fixes: 3c31a74 ("common/cnxk: config CPT result address for CN20K") Cc: stable@dpdk.org Signed-off-by: Aarnav JP <ajp@marvell.com>
The cited commit removed the representor interrupt handler cleanup by mistake. Fixes: 5cf0707 ("net/mlx5: remove Rx queue data list from device") Signed-off-by: Roi Dayan <roid@nvidia.com> Acked-by: Suanming Mou <suanmingm@nvidia.com>
When creating a new mempool but assigning shared entries from a different mempool, the newly allocated but now unused entries need to be released. Fix it. Fixes: 8947eeb ("common/mlx5: fix shared memory region ranges allocation") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Suanming Mou <suanmingm@nvidia.com>
`buddy` was erroneously declared as static. When multiple threads call this routine, they set the same static variable, corrupting pool data and potentially causing a double free when releasing resources. Fixes: b4dd7bc ("net/mlx5/hws: add pool and buddy") Signed-off-by: Nupur Uttarwar <nuttarwar@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
When mlx5_dev_start() fails partway through initialization, the error cleanup code unconditionally calls cleanup functions for all steps, including those that were never successfully initialized. This causes state corruption leading to incorrect behavior on subsequent start attempts. The issue manifests as: 1. First start attempt fails with -ENOMEM (expected) 2. Second start attempt returns -EINVAL instead of -ENOMEM 3. With flow isolated mode, second attempt incorrectly succeeds, leading to segfault in rte_eth_rx_burst() Root cause: The single error label cleanup path calls functions like mlx5_traffic_disable() and mlx5_flow_stop_default() even when their corresponding initialization functions (mlx5_traffic_enable() and mlx5_flow_start_default()) were never called due to earlier failure. For example, when mlx5_rxq_start() fails: - mlx5_traffic_enable() at line 1403 never executes - mlx5_flow_start_default() at line 1420 never executes - But cleanup unconditionally calls: * mlx5_traffic_disable() - destroys control flows list * mlx5_flow_stop_default() - corrupts flow metadata state This corrupts the device state, causing subsequent start attempts to fail with different errors or, in isolated mode, to incorrectly succeed with an improperly initialized device. Fix by replacing the single error label with cascading error labels (Linux kernel style). Each label cleans up only its corresponding step, then falls through to clean up earlier steps. This ensures only successfully initialized steps are cleaned up, maintaining device state consistency across failed start attempts. Bugzilla ID: 1419 Fixes: 8db7e3b ("net/mlx5: change operations for non-cached flows") Cc: stable@dpdk.org Signed-off-by: Maayan Kashani <mkashani@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
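A minimal sketch of the cascading error-label pattern described above (the step/undo functions are hypothetical stand-ins, not the actual mlx5 symbols): each label undoes only its own step and falls through to the labels for the earlier steps, so nothing that was never initialized gets torn down.

/* Hypothetical init/cleanup pairs standing in for the real start steps. */
static int step_a(void) { return 0; }
static int step_b(void) { return 0; }
static int step_c(void) { return 0; }
static void undo_a(void) { }
static void undo_b(void) { }

static int
example_dev_start(void)
{
	int ret;

	ret = step_a();
	if (ret)
		goto error;            /* nothing initialized yet */
	ret = step_b();
	if (ret)
		goto error_undo_a;     /* only step_a() succeeded */
	ret = step_c();
	if (ret)
		goto error_undo_b;     /* step_a() and step_b() succeeded */
	return 0;

error_undo_b:
	undo_b();
	/* fall through */
error_undo_a:
	undo_a();
	/* fall through */
error:
	return ret;
}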
GCC 16 detects use of an uninitialized variable: if the retry loop exits, the code would do a memcmp against an uninitialized stack value. Resolve by initializing it to zero. Bugzilla ID: 1823 Fixes: 1256805 ("net/mlx5: move Linux-specific functions") Fixes: cfee947 ("net/mlx5: fix link status to use wait to complete") Cc: stable@dpdk.org Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Raslan Darawsheh <rasland@nvidia.com>
If flow template table was created with resizable attribute, then completing table resize could fail for 2 user-related reasons: - not all flow rules were yet updated to use the resized table, - resize was not started. Both of these were reported with the same error message i.e., "cannot complete table resize". Since PMD can distinguish these 2 cases, this patch improves the error reporting to report these 2 errors separately. Also, this patch removes redundant __rte_unused on device parameter. Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com> Acked-by: Gregory Etelson <getelson@nvidia.com> Acked-by: Bing Zhao <bingz@nvidia.com>
Offending patch introduced support for additional flow tag indexes with HW Steering flow engine. New tag indexes will be mapped to HW registers REG_C_8 to REG_C_11, depending on HW capabilities. That patch only handled tables created on group > 0 (non-root table), where mlx5 PMD directly configures the HW. Tables and flow rules on group 0 (root table) are handled through kernel driver, and new registers were not addressed for that case. Because of that, usage of unsupported tag index in group 0 triggered an assertion in flow_dv_match_meta_reg(). This patch adds necessary definitions for REG_C_8 to REG_C_11 to make these registers usable for flow tag indexes in root table. Validation of flow tag to HW register translation is also amended to report invalid cases to the user, instead of relying on assertions. Fixes: 7e3a144 ("net/mlx5/hws: support 4 additional C registers") Cc: stable@dpdk.org Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Some cases are not supported for rule hash calculation, for example when the matcher is defined as an FW matcher, or when the hash type is different from CRC32. Another case is when the distribution mode is not by hash; the previous condition checked a wrong capability, and this commit fixes it to use the correct check. Fixes: 7f5e6de ("net/mlx5/hws: query flow rule hash") Cc: stable@dpdk.org Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Acked-by: Bing Zhao <bingz@nvidia.com>
GCC analyzer identified a code path where acts->mhdr could be NULL when dereferenced. When modify header validation fails in mlx5_tbl_translate_modify_header(), __flow_hw_action_template_destroy() sets acts->mhdr to NULL. Add defensive NULL check in mlx5_tbl_ensure_shared_modify_header() to prevent the dereference. Bugzilla ID: 1521 Fixes: 12f2ed3 ("net/mlx5: set modify header as shared flow action") Cc: stable@dpdk.org Signed-off-by: Shani Peretz <shperetz@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
If flow isolation is enabled, then skip flow_hw_create_ctrl_rx_tables because these are not used with flow isolation. This is used to save the unneeded resource allocation and also speed up the device startup time. Fixes: 9fa7c1c ("net/mlx5: create control flow rules with HWS") Cc: stable@dpdk.org Signed-off-by: Nupur Uttarwar <nuttarwar@nvidia.com> Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
The mlx5_ipool_free() function was called with a NULL pool pointer during HW flow destruction, causing a segmentation fault. This occurred when flow creation failed and the cleanup path attempted to free resources from an uninitialized flow pool. The crash happened in the following scenario: 1. During device start, a default NTA copy action flow is created 2. If the flow creation fails, mlx5_flow_hw_list_destroy() is called 3. In hw_cmpl_flow_update_or_destroy(), table->flow pool could be NULL 4. mlx5_ipool_free(table->flow, flow->idx) was called without checking if table->flow is NULL 5. Inside mlx5_ipool_free(), accessing pool->cfg.per_core_cache caused a segmentation fault due to NULL pointer dereference The fix adds two layers of protection, 1. Add NULL check for table->flow before calling mlx5_ipool_free() in hw_cmpl_flow_update_or_destroy(), consistent with the existing check for table->resource on the previous line 2. Add NULL check for pool parameter in mlx5_ipool_free() as a defensive measure to prevent similar crashes in other code paths The fix also renames the ‘flow’ field in rte_flow_template_table to ‘flow_pool’ for better code readability. Stack trace of the fault: mlx5_ipool_free (pool=0x0) at mlx5_utils.c:753 hw_cmpl_flow_update_or_destroy at mlx5_flow_hw.c:4481 mlx5_flow_hw_destroy at mlx5_flow_hw.c:14219 mlx5_flow_hw_list_destroy at mlx5_flow_hw.c:14279 flow_hw_list_create at mlx5_flow_hw.c:14415 mlx5_flow_start_default at mlx5_flow.c:8263 mlx5_dev_start at mlx5_trigger.c:1420 Fixes: 27d171b ("net/mlx5: abstract flow action and enable reconfigure") Cc: stable@dpdk.org Signed-off-by: Maayan Kashani <mkashani@nvidia.com> Acked-by: Bing Zhao <bingz@nvidia.com>
Since the auxiliary structure is associated with each rule, it can be allocated in the same ipool allocation to save the extra overhead of the *alloc header and the unneeded CPU cycles. Fixes: 27d171b ("net/mlx5: abstract flow action and enable reconfigure") Cc: stable@dpdk.org Signed-off-by: Bing Zhao <bingz@nvidia.com> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Currently the Windows implementation hardcodes match criteria to MLX5_MATCH_OUTER_HEADERS when creating flow rules, which prevents matching on inner headers and other criteria types like NVGRE. The fix uses the matcher's match_criteria_enable attribute instead of hardcoding OUTER_HEADERS, and moves the assignment outside the action switch block to apply to all cases. NVGRE item type is also added to the supported items list. Fixes: 1d19449 ("net/mlx5: create flow rule on Windows") Cc: stable@dpdk.org Signed-off-by: Itai Sharoni <isharoni@nvidia.com> Acked-by: Bing Zhao <bingz@nvidia.com>
Update the build configuration for Microsoft Azure Cobalt 100 SoC to use a CPU-specific mcpu value supported by GCC 14+. Signed-off-by: Doug Foster <doug.foster@arm.com> Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
The array used to save the flow rule pointers was allocated with an incorrect length: space for one more rule pointer should be appended, not one more byte. Fixes: 070316d ("app/flow-perf: add multi-core rule insertion and deletion") Cc: stable@dpdk.org Signed-off-by: Bing Zhao <bingz@nvidia.com> Reviewed-by: Dariusz Sosnowski <dsosnowski@nvidia.com> Acked-by: Wisam Jaddo <wisamm@nvidia.com>
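A small illustrative sketch of the distinction (generic C, not the flow-perf code itself):

#include <stdlib.h>

struct rte_flow;  /* opaque handle, for illustration only */

static struct rte_flow **
alloc_rule_array(size_t nb_rules)
{
	/* Wrong:  malloc(nb_rules * sizeof(struct rte_flow *) + 1)
	 *         appends a single byte.
	 * Right:  append room for one more rule pointer. */
	return malloc((nb_rules + 1) * sizeof(struct rte_flow *));
}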
On a ConnectX-7 multi-host setup, the PF index is not continuous;
the user needs to query the PF index based on the return value of the
related sysfs entries:
cat /sys/class/net/*/phys_port_name
For example, the output is 0 and 2 for PF1 and PF2; then use 0 and 2 as the PF indexes:
-a <Primary_PCI_BDF>,representor=pf[0,2]vf[0-2]
Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
As specified in RFC 4291 section 2.5.1, link local addresses must be generated based on a modified EUI-64 interface identifier: > Modified EUI-64 format interface identifiers are formed by inverting > the "u" bit (universal/local bit in IEEE EUI-64 terminology) when > forming the interface identifier from IEEE EUI-64 identifiers. This translates to 'mac->addr_bytes[0] ^= 0x02'. Fixes: 3d6d85f ("net: add utilities for well known IPv6 address types") Cc: stable@dpdk.org Signed-off-by: Christophe Fontaine <cfontain@redhat.com> Signed-off-by: Robin Jarry <rjarry@redhat.com>
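A small sketch of the modified EUI-64 link-local construction the commit refers to (plain C, not the rte_net helper itself): the fe80::/64 prefix, the MAC split around ff:fe, and the inverted universal/local bit.

#include <stdint.h>
#include <string.h>

static void
mac_to_link_local(const uint8_t mac[6], uint8_t ip6[16])
{
	memset(ip6, 0, 16);
	ip6[0] = 0xfe;
	ip6[1] = 0x80;                /* fe80::/64 prefix */
	ip6[8]  = mac[0] ^ 0x02;      /* invert the universal/local bit */
	ip6[9]  = mac[1];
	ip6[10] = mac[2];
	ip6[11] = 0xff;               /* ff:fe inserted between OUI and NIC parts */
	ip6[12] = 0xfe;
	ip6[13] = mac[3];
	ip6[14] = mac[4];
	ip6[15] = mac[5];
}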
Add start tx_first method to testpmd shell, which sends a specified number of burst packets prior to starting packet forwarding. Signed-off-by: Dean Marx <dmarx@iol.unh.edu> Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com> Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Mock the Pydantic import so that even when Pydantic is available on the system, it is not loaded by Sphinx, ensuring we perform the doc build without Pydantic regardless of the environment. Signed-off-by: Patrick Robb <probb@iol.unh.edu>
Some maintainers known to be inactive for at least a year are removed to make clearer where help is needed. Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are warnings from Coverity and other tools that the handling of intermediate bursts may wrap around. This shouldn't be possible, but use unsigned int to avoid any issues; it is just as fast. Coverity issue: 499471 Fixes: 0dea03e ("pdump: remove use of VLA") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is a race where the request to disable pdump may get ahead of the handling of pdump requests in dumpcap. The fix is to do local removal of the callbacks before forwarding the same to the secondary process. To reproduce: 1. Start testpmd and start traffic 2. Start dumpcap to capture 3. Interrupt dumpcap with ^C Testpmd will show a missing response and dumpcap will show the error: EAL: Cannot find action: mp_pdump This is only reproducible if additional logging is not enabled. Fixes: c3ceb87 ("pdump: forward callback enable to secondary process") Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Create an ELF file to load using clang. Repackage the object into an array using xxd. Write a test that loads and runs the BPF program. If the libelf library is not available, then DPDK BPF will return -ENOTSUP to the test and the test will be skipped. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Marat Khalili <marat.khalili@huawei.com> Tested-by: Marat Khalili <marat.khalili@huawei.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
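A hedged sketch of the skip-on-ENOTSUP behaviour described in this and the next commit (the ELF section name, the rte_bpf_prm contents, and the return codes here are illustrative assumptions, not the test's actual values):

#include <errno.h>
#include <rte_bpf.h>
#include <rte_errno.h>

static int
try_load_bpf_elf(const char *path)
{
	struct rte_bpf_prm prm = {
		.prog_arg = { .type = RTE_BPF_ARG_RAW, .size = 64 },
	};
	struct rte_bpf *bpf = rte_bpf_elf_load(&prm, path, ".text");

	if (bpf == NULL) {
		if (rte_errno == ENOTSUP)
			return 77;   /* skip: DPDK built without libelf */
		return -1;           /* genuine load failure */
	}
	rte_bpf_destroy(bpf);
	return 0;
}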
New test using null device to test filtering with BPF. If libelf library is not available, then DPDK bpf will return -ENOTSUP to the test and the test will be skipped. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Marat Khalili <marat.khalili@huawei.com> Tested-by: Marat Khalili <marat.khalili@huawei.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
This adds the BlueField-4 device ID to the list of NVIDIA devices that run the mlx5 drivers. The device is still in development stage. Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
The HWS (Hardware Steering) IP tunnel detection implementation was not recognizing IPIP tunnel types based on the next protocol value in the outer IP header as specified by RFC 2003 (IP-in-IP) and RFC 2473 (IPv6 Encapsulation). The issue was that the code treated all combinations of [IPv4|6] / [IPv4|6] as IPIP tunnels, setting the MLX5_FLOW_LAYER_IPIP flag regardless of the actual protocol. While this happened to work for simple cases due to the fallback classification, it violated RFC compliance and could cause issues with protocol validation.
The fix adds RFC-compliant tunnel detection by:
1. Checking the protocol field (IPPROTO_IPIP=4 or IPPROTO_IPV6=41) in the outer IP header to determine the tunnel type
2. Adding protocol mismatch validation to ensure the outer protocol matches the inner header type
3. Correctly setting MLX5_FLOW_LAYER_IPV6_ENCAP for IPv6 encapsulation
The expected tunnel patterns are now:
- [ipv4 | ipv6] proto is 4 / ipv4 -> MLX5_FLOW_LAYER_IPIP
- [ipv4 | ipv6] proto is 41 / ipv6 -> MLX5_FLOW_LAYER_IPV6_ENCAP
For cases without protocol specification, fallback classification based on the inner header type is still supported.
Fixes: f66c7c3 ("net/mlx5/hws: recognize IP-in-IP tunnel in definer layer") Fixes: 4b70445 ("net/mlx5: support IP-in-IP tunnel for all combinations") Cc: stable@dpdk.org Signed-off-by: Junfeng Guo <junfengg@nvidia.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
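A hedged C sketch of the two expected patterns listed above, expressed as rte_flow items (the spec values and the surrounding setup are illustrative, not code from the patch):

#include <netinet/in.h>
#include <rte_flow.h>

/* Outer IPv4 with next protocol 4 (IP-in-IP) followed by an inner IPv4
 * header: expected to be classified as MLX5_FLOW_LAYER_IPIP. */
static const struct rte_flow_item_ipv4 outer_ipip = {
	.hdr.next_proto_id = IPPROTO_IPIP,      /* 4 */
};
static const struct rte_flow_item pattern_ipip[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &outer_ipip },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4 },    /* inner header */
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};

/* Outer IPv4 with next protocol 41 (IPv6) followed by an inner IPv6
 * header: expected to be classified as MLX5_FLOW_LAYER_IPV6_ENCAP. */
static const struct rte_flow_item_ipv4 outer_v6_encap = {
	.hdr.next_proto_id = IPPROTO_IPV6,      /* 41 */
};
static const struct rte_flow_item pattern_ipv6_encap[] = {
	{ .type = RTE_FLOW_ITEM_TYPE_ETH },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &outer_v6_encap },
	{ .type = RTE_FLOW_ITEM_TYPE_IPV6 },    /* inner header */
	{ .type = RTE_FLOW_ITEM_TYPE_END },
};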
Sorry @ovsrobot, your pull request is larger than the review limit of 150000 diff characters
Walkthrough
This pull request introduces DCB forwarding enhancements to testpmd, adds BPF ELF-based testing support, expands CNXK driver capabilities with dynamic SQB resizing and LSO optimizations, refactors MLX5 flow handling and representor management, and improves multiple network drivers with targeted fixes and new features.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (13)
drivers/net/axgbe/axgbe_ethdev.c (1)
1745-1759: Critical: Incorrect bit manipulation - OR operation cannot clear bits.
The disable logic is fundamentally broken. The code uses value |= mac_tscr (line 1757) to apply disable settings, but OR operations can only set bits to 1, never clear them to 0. Since AXGMAC_SET_BITS(..., 0) creates fields with 0 values to disable features, ORing these with the current register value will fail to clear the corresponding enable bits that are currently set to 1.
Additionally, line 1750 already directly modifies the MAC_TSCR register via AXGMAC_IOWRITE_BITS, making the subsequent read at line 1756 redundant and the approach inconsistent. The enable function axgbe_timesync_enable uses a direct write pattern without read-modify-write.
To fix this, either:
- Use AND with inverted mask to clear bits, or
- Follow the enable function's pattern with direct write:
 struct axgbe_port *pdata = dev->data->dev_private;
 unsigned int mac_tscr = 0;
-unsigned int value = 0;
 /*disable timestamp for all pkts*/
 AXGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENALL, 0);
-/*disable the addened register*/
-AXGMAC_IOWRITE_BITS(pdata, MAC_TSCR, TSADDREG, 0);
+/*disable the addend register*/
+AXGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSADDREG, 0);
 /* disable timestamp update */
 AXGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSCFUPDT, 0);
 /*disable time stamp*/
 AXGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENA, 0);
-
-value = AXGMAC_IOREAD(pdata, MAC_TSCR);
-value |= mac_tscr;
-AXGMAC_IOWRITE(pdata, MAC_TSCR, value);
-
+AXGMAC_IOWRITE(pdata, MAC_TSCR, mac_tscr);
+
 return 0;
1-207: PR title and description do not match the actual file changes.The PR is titled "net/mlx5: fix IP tunnel detection for IPIP and IPv6 ENCAP" and references the MLX5 driver, but this file is part of the Intel ICE driver (
drivers/net/intel/ice/) and the changes relate to QINQ offload handling, not IP tunnel detection. Please verify:
- Is this the correct file for this PR?
- Should the PR title and description be updated to reflect the actual changes?
- Are there additional MLX5-related files that should be in this PR?
drivers/net/cnxk/cnxk_ethdev_ops.c (1)
1-1: PR title doesn't match the changes in this file.The PR title references "net/mlx5: fix IP tunnel detection for IPIP and IPv6 ENCAP", but this file is part of the CNXK driver (
drivers/net/cnxk/). The AI summary also indicates this PR contains changes across multiple drivers (DCB, BPF, CNXK, MLX5) and components.Consider organizing related changes into separate PRs to improve reviewability and change tracking.
drivers/net/mlx5/mlx5_flow.c (1)
8810-8818: Missing closing brace forif (dev->data->dev_started)causes syntax errorAs written, the
if (dev->data->dev_started) {block is never closed beforepriv->isolated = !!enable;, which makes the function invalid C and changes the scope of subsequent statements. You likely intended an early-return guard with the rest of the function outside theif.Add the missing
}after thereturn -rte_errno;:if (dev->data->dev_started) { rte_flow_error_set(error, EBUSY, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL, "port must be stopped first"); return -rte_errno; + } priv->isolated = !!enable;drivers/net/mlx5/mlx5_trigger.c (1)
59-67: Fix potential null dereference inmlx5_txq_start()
txq_ctrlis dereferenced to initializetxq_databefore you checkif (!txq_ctrl), so a NULL returned frommlx5_txq_get()would cause undefined behaviour. Theif (!txq_ctrl)branch below suggests NULL is expected.Consider reordering:
- for (i = 0; i != priv->txqs_n; ++i) { - struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i); - struct mlx5_txq_data *txq_data = &txq_ctrl->txq; - - if (!txq_ctrl) - continue; + for (i = 0; i != priv->txqs_n; ++i) { + struct mlx5_txq_ctrl *txq_ctrl = mlx5_txq_get(dev, i); + struct mlx5_txq_data *txq_data; + + if (!txq_ctrl) + continue; + txq_data = &txq_ctrl->txq;This avoids dereferencing a possible NULL
txq_ctrl.drivers/net/cnxk/cn10k_ethdev_sec.c (1)
1333-1352: Avoid leaving rx_inject flag enabled when inline LF is missingThe new
inl_lfNULL guard is good, butroc_idev_nix_rx_inject_set(port_id, enable);has already been called. Ifroc_nix_inl_inb_inj_lf_get(nix)returns NULL, the function now exits with-ENOTSUPwhile the global inject flag stays enabled andinj_cfgis never initialized, which can lead to inconsistent behavior later.Consider reverting the flag (or moving the setter below the guard), e.g.:
inl_lf = roc_nix_inl_inb_inj_lf_get(nix); - if (!inl_lf) - return -ENOTSUP; + if (!inl_lf) { + roc_idev_nix_rx_inject_set(port_id, false); + return -ENOTSUP; + }drivers/common/cnxk/roc_nix_inl_dev.c (1)
245-252: Error cleanup for CPT inline queue missing slot/queue IDIn
nix_cfg_fail, the cleanup now sends acpt_rx_inl_queue_cfgmessage with onlyenable = 0set:cpt_req = mbox_alloc_msg_cpt_rx_inl_queue_cfg(mbox); if (!cpt_req) { rc |= -ENOSPC; } else { cpt_req->enable = 0; rc |= mbox_process(mbox); }Unlike the normal release path in
nix_inl_inb_queue_release(), this does not populateslotorrx_queue_id, so the FW may not correctly identify which CPT LF/queue to disable (or might operate on defaults).To mirror the release path and avoid leaking or mis-disabling queues, populate these fields as well:
nix_cfg_fail: cpt_req = mbox_alloc_msg_cpt_rx_inl_queue_cfg(mbox); if (!cpt_req) { rc |= -ENOSPC; } else { - cpt_req->enable = 0; + cpt_req->enable = 0; + cpt_req->slot = slot_id; + cpt_req->rx_queue_id = qid; rc |= mbox_process(mbox); }drivers/common/cnxk/roc_nix_queue.c (1)
2069-2149: SQ init: consider cleaning up SQB aura on failures after pool population
roc_nix_sq_init()now branches betweensqb_pool_dyn_populate()andsqb_pool_populate(), but on subsequent failures (TM leaf fetch, SQ context init,mbox_process) thenomempath only freessq->fc:if (rc) goto nomem; /* ... later ... */ nomem: plt_free(sq->fc); fail: return rc;At that point the SQB aura and backing memory (static or dynamic) have already been created and are leaked if init fails.
It would be safer to also destroy the pool in this path, e.g. along the lines of:
nomem: - plt_free(sq->fc); + if (sq->aura_handle) + roc_npa_pool_destroy(sq->aura_handle); + if (!roc_nix->sq_resize_ena && sq->sqe_mem) + plt_free(sq->sqe_mem); + plt_free(sq->fc);This is a pre‑existing pattern but becomes more important with dynamic SQB resizing, where failures are more likely during bring‑up.
app/test/test_bpf.c (1)
5-11: Add<errno.h>for system errno usage increate_temp_bpf_file()The code uses system
errnowithstrerror(errno)at lines 3299, 3307, and 3318 within thecreate_temp_bpf_file()helper, which calls standard C library functions (asprintf(),mkstemps(),write()) that set systemerrno. However, the file only includes<rte_errno.h>(DPDK-specific), not the required system header<errno.h>. This will cause portability issues and build failures on strict toolchains.Add
#include <errno.h>after the other standard C headers (before the DPDK headers):-#include <stdint.h> -#include <inttypes.h> -#include <unistd.h> +#include <stdint.h> +#include <inttypes.h> +#include <unistd.h> +#include <errno.h>app/test-pmd/cmd_flex_item.c (1)
141-165: Fix incorrect rte_flow_conv() source parameter and add error/bounds checksIn
flex_link_item_parse()(lines 145–165), the code passesitemtorte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, item, NULL). However,itemis the destination buffer with onlytypeset, whilepatternis the source item withspec/mask/lastfully populated byflow_parse(). The conversion should query frompattern, not the partially-initializeditem.This causes three problems:
- Wrong conversion source:
rte_flow_conv()returns the mask size for the item, which should be based on the parsedpattern, not the destinationitem.- Unhandled errors: Negative
retvalues (conversion failures) are never checked; many similar uses in the codebase checkret <= 0.- Buffer overflow: Pre-allocated
spec/maskbuffers areFLEX_MAX_FLOW_PATTERN_LENGTH(64 bytes). Ifret > 64, the subsequentmemcpy()calls overflow these fixed buffers.Additionally, the current logic sets
item->spec/masktoNULLwhenret <= 0, which causes valid JSON to be rejected by the downstream validation inflex_link_parse()(line 184:if (!link->item.spec || !link->item.mask) return -EINVAL;).Fix:
item->type = pattern->type; - ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, item, NULL); - if ((ret > 0) && pattern->spec) { + /* Query mask size from the parsed pattern item. */ + ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, pattern, NULL); + if (ret < 0) + return ret; + if (ret > FLEX_MAX_FLOW_PATTERN_LENGTH) + return -EINVAL; + if (pattern->spec) { ptr = (void *)(uintptr_t)item->spec; memcpy(ptr, pattern->spec, ret); } else { item->spec = NULL; } - if ((ret > 0) && pattern->mask) { + if (pattern->mask) { ptr = (void *)(uintptr_t)item->mask; memcpy(ptr, pattern->mask, ret); } else { item->mask = NULL; } - if ((ret > 0) && pattern->last) { + if (pattern->last) { ptr = (void *)(uintptr_t)item->last; memcpy(ptr, pattern->last, ret); } else { item->last = NULL; }drivers/net/mlx5/mlx5_flow_hw.c (3)
11305-11333: ctrl-rx create/cleanup: partial-failure leak and differing responsibilities
mlx5_flow_hw_create_ctrl_rx_tables()allocatespriv->hw_ctrl_rxand then creates pattern templates per(eth_pattern_type, rss_type). On failure it just returns-rte_errnowithout cleaning up already-created templates orhw_ctrl_rx. This will leak templates and the container struct on early failure; consider callingmlx5_flow_hw_cleanup_ctrl_rx_tables(dev)or an internal rollback helper from theerr:path.mlx5_flow_hw_cleanup_ctrl_rx_tables()fully destroys tables, pattern templates, RSS action templates, and freeshw_ctrl_rx.mlx5_flow_hw_cleanup_ctrl_rx_templates()only tears down the tables and RSS templates (and nulls their pointers) but leaves pattern templates andhw_ctrl_rxintact, presumably for reuse on reconfigure.The split responsibilities make sense, but it would help to document when each should be used (e.g., “templates” for port stop/reconfigure, “tables” for final teardown) and to ensure the create‑path error cleanup calls the stronger helper.
Also applies to: 11592-11629, 11632-11661
15278-15315: flow_hw_table_resize_multi_pattern_actions uses undefinedtableidentifierInside
flow_hw_table_resize_multi_pattern_actions()the first line uses:struct mlx5_multi_pattern_segment *segment = table->mpctx.segments;but the parameter is named
tbl. This won’t compile and should be:struct mlx5_multi_pattern_segment *segment = tbl->mpctx.segments;and any other
table->references in this helper should be adjusted totbl->.-static int -flow_hw_table_resize_multi_pattern_actions(struct rte_eth_dev *dev, - struct rte_flow_template_table *tbl, - uint32_t nb_flows, - struct rte_flow_error *error) -{ - struct mlx5_multi_pattern_segment *segment = table->mpctx.segments; +static int +flow_hw_table_resize_multi_pattern_actions(struct rte_eth_dev *dev, + struct rte_flow_template_table *tbl, + uint32_t nb_flows, + struct rte_flow_error *error) +{ + struct mlx5_multi_pattern_segment *segment = tbl->mpctx.segments;
13620-13627: Verify Windows platform buffer overflow and update commentThe review concern is partially verified and requires attention:
Confirmed issues:
Platform-specific buffer overflow risk: On Windows (mlx5_os.c:550),
priv->flows[i]is allocated with sizesizeof(rte_flow_hw) + sizeof(rte_flow_nt2hws)only—withoutrte_flow_hw_aux. However,flow_hw_prepare()at line 13622 writes to memory beyond the allocated chunk by computing:(uintptr_t)((*flow)->nt2hws) + sizeof(struct rte_flow_nt2hws). On Linux with HWS support (mlx5_os.c:1654-1655), the allocation correctly includes the aux structure, but Windows lacks this.Misleading comment: Lines 13609-13610 claim the pool size is only "sizeof(rte_flow_hw) + sizeof(rte_flow_nt2hws)," omitting any reference to
rte_flow_hw_aux. This is accurate for Windows but misleading for Linux HWS configurations where aux is allocated.Dead code confirmed: The null check at lines 13624-13627 cannot fire for a valid allocation returned by
mlx5_ipool_zmalloc(), since pointer arithmetic on a valid pointer cannot yield NULL.Required actions:
- Ensure Windows allocation (mlx5_os.c:550) includes space for
sizeof(struct rte_flow_hw_aux)to match the code's expectations.- Update the misleading comment at lines 13609-13610 to clarify the full layout.
- Remove the unreachable null check at lines 13624-13627.
🧹 Nitpick comments (43)
doc/guides/rel_notes/release_25_11.rst (1)
247-254: Add "previously announced" language for consistency with other removed items.This entry breaks the established pattern in the Removed Items section—all other removals reference prior deprecation announcements (e.g., lines 230, 237, 243). Add a parallel introductory phrase for consistency.
Additionally, clarify the grammatical scope of "Applications which disabled this option." The intent appears to be applications that disabled
repr_matching_en(set it to false), but the phrasing is ambiguous.Apply this diff to improve consistency and clarity:
-* net/mlx5: ``repr_matching_en`` device argument has been removed. - Applications which disabled this option were able to receive traffic - from any physical port/VF/SF on any representor port. - Specifically, in most cases, this was used to process all traffic on representor port - which is a transfer proxy port. - Similar behavior in mlx5 PMD can be achieved without this device argument, - by using ``RTE_FLOW_ACTION_TYPE_RSS`` in transfer flow rules. +* net/mlx5: As previously announced, the ``repr_matching_en`` device argument has been removed. + Applications that previously disabled this option (repr_matching_en=false) were able to receive traffic + from any physical port/VF/SF on any representor port. + This option was commonly used to process all traffic on a representor port, which acts as a transfer proxy. + Equivalent behavior can be achieved using ``RTE_FLOW_ACTION_TYPE_RSS`` in transfer flow rules.drivers/net/cnxk/cnxk_ethdev_mtr.c (1)
1264-1270: Good defensive bounds checking added.The index validation prevents out-of-bounds access on
color_mapand safely defaults to GREEN for invalid inputs. This improves robustness.Consider these optional enhancements:
- Add logging when defaulting occurs to aid debugging:
/* Index validation */ if (idx >= RTE_COLORS) + plt_warn("Invalid default_input_color %d, defaulting to GREEN", idx); cfg->icolor = ROC_NIX_BPF_COLOR_GREEN; else cfg->icolor = color_map[idx];
- Apply consistent validation to other
color_mapindexing sites for completeness:
- Line 765:
color_map[dscp_table[i]]- Line 837:
color_map[vlan_table[i]]- Lines 1395, 1411:
color_map[mtr->params.dscp_table[i]]andcolor_map[mtr->params.vlan_table[i]]drivers/common/mlx5/mlx5_devx_cmds.h (1)
339-344: Consider adding documentation for the new software ownership fields.These six new bitfields lack documentation explaining their purpose, when they're populated, and how they should be used. Consider adding a comment block describing:
- What "software ownership" means in this context
- The difference between v2 and non-v2 versions
- When each field is expected to be set
Example:
+ /* Software ownership flags for flow tables: + * - rx_sw_owner/rx_sw_owner_v2: RX flow table ownership + * - tx_sw_owner/tx_sw_owner_v2: TX flow table ownership + * - esw_sw_owner/esw_sw_owner_v2: E-Switch flow table ownership + * v2 variants are preferred when available. + */ uint8_t rx_sw_owner:1; uint8_t rx_sw_owner_v2:1;drivers/bus/fslmc/fslmc_bus.c (1)
617-628: Clarify return value when no matching driver is found.When the driver iteration completes without finding a match, the function returns 0 (success). This could be misleading if the caller expects an error when a device cannot be probed. Consider returning
-ENOENTor a similar error code when no matching driver is found.Apply this pattern:
+ break; + } + + /* If loop completed without break, no driver was found */ + if (drv == NULL) + return -ENOENT; + return ret;drivers/crypto/qat/qat_sym_session.c (1)
2777-2780: CCMhash_state_szchange looks correct; you can simplify the alignment.Basing
auth_param->hash_state_szon the (aligned) AAD size instead ofdigestsizeis consistent with the GCM path and looks like the right fix for CCM. Sinceauth_param->u2.aad_szis already aligned toICP_QAT_HW_CCM_AAD_ALIGNMENTwhenaad_length > 0, you could compute this as:auth_param->hash_state_sz = auth_param->u2.aad_sz >> 3;and, if needed, ensure
u2.aad_szis aligned in theaad_length == 0branch as well. That would remove the redundantRTE_ALIGN_CEILcall and make the intent a bit clearer without changing behavior.dts/api/packet.py (2)
91-93: Remove redundant cast after isinstance assertion.The
caston line 93 is unnecessary because theisinstanceassertion on line 91 already serves as a type guard for static type checkers.Apply this diff to simplify:
- assert isinstance( - get_ctx().func_tg, CapturingTrafficGenerator - ), "Cannot capture with a non-capturing traffic generator" - tg: CapturingTrafficGenerator = cast(CapturingTrafficGenerator, get_ctx().func_tg) + assert isinstance( + get_ctx().func_tg, CapturingTrafficGenerator + ), "Cannot capture with a non-capturing traffic generator" + tg = get_ctx().func_tg
347-350: Remove redundant cast after isinstance assertion.Similar to line 93, the
caston line 350 is unnecessary after theisinstanceassertion.Apply this diff:
- assert isinstance( - get_ctx().perf_tg, PerformanceTrafficGenerator - ), "Cannot send performance traffic with non-performance traffic generator" - tg: PerformanceTrafficGenerator = cast(PerformanceTrafficGenerator, get_ctx().perf_tg) + assert isinstance( + get_ctx().perf_tg, PerformanceTrafficGenerator + ), "Cannot send performance traffic with non-performance traffic generator" + tg = get_ctx().perf_tgdts/api/test.py (1)
152-153: Refine timestamp handling to avoid overrides and include timezone info.Right now, a
timestampkey inperformance_datawill override the generated timestamp, and the ISO string is naive (no timezone). If you want this helper to always stamp when it runs and log unambiguous times, consider:- perf_data = {"timestamp": datetime.now().isoformat(), **performance_data} + perf_data = { + **performance_data, + "timestamp": datetime.now().astimezone().isoformat(), + }This ensures the helper’s timestamp always wins and includes timezone information.
drivers/net/mlx5/mlx5_nta_sample.c (2)
528-562: Clarify which action list is actually being validated
validate_sample_terminal_actions()walks theactions[]array from the passed-in pointer and enforces constraints on the last non‑VOID action. At the call site you passsample(theRTE_FLOW_ACTION_TYPE_SAMPLEaction in the outer action list), so this function currently validates “last non‑VOID action from SAMPLE onward” (i.e. SAMPLE + suffix actions), not the innersample_conf->actionslist.If the goal is to validate the terminal action list executed by the SAMPLE action, this should instead receive
sample_conf->actions(orsample_actions) and be called after retrievingsample_conf. If the intent is to constrain the terminal action of the outer flow for sample flows, it would help to document that explicitly or rename the parameter to make this clear.
866-876: Make the SAMPLE precondition explicit at the call site
validate_sample_terminal_actions(dev, attr, sample)assumessamplepoints to a valid actions array terminated byRTE_FLOW_ACTION_TYPE_END. Immediately after,sample->confis dereferenced as well. Ifmlx5_nta_sample_flow_list_create()is ever invoked on an actions list without a SAMPLE action, both uses will access invalid memory.If the higher-level design guarantees that this helper is only called for flows that contain SAMPLE, consider adding an assertion or an early error when
sample == NULLto make the precondition explicit and fail predictably.drivers/dma/dpaa2/dpaa2_qdma.c (1)
1511-1512: Propagatedpaa2_dpdmai_dev_uninit()errors and confirm close semanticsWiring
dpaa2_dpdmai_dev_uninit()intodpaa2_qdma_close()nicely centralizes DPDMAI teardown, and theqdma_dev == NULLguard indpaa2_qdma_close()keeps double-close safe.Two follow-ups are worth considering:
dpaa2_qdma_close()always returns 0, even ifdpdmai_close()fails insidedpaa2_dpdmai_dev_uninit(). Propagating that error would make failures visible to callers:- dpaa2_dpdmai_dev_uninit(dev); - - return 0; + return dpaa2_dpdmai_dev_uninit(dev);
- On the primary process,
dpaa2_dpdmai_dev_uninit()freesqdma_dev, so afterdpaa2_qdma_close()the device cannot be reconfigured or started again without a fresh probe / init. Please double-check that this matches the intended DPDK dmadev lifecycle (i.e., thatdev_closeis terminal for this PMD’s usage model).drivers/net/mlx5/hws/mlx5dr_definer.c (1)
3381-3399: IPIP / IPv6 ENCAP tunnel detection fix looks right; consider extension‑header edge caseThe new logic that, for
[IPv4|IPv6] / IPv4and[IPv4|IPv6] / IPv6, setscd.tunnel = trueand tagsitem_flagswithMLX5_FLOW_LAYER_IPIP/MLX5_FLOW_LAYER_IPV6_ENCAPbefore calling the IPv4/IPv6 converters correctly ensures the inner IP header is treated as tunneled, fixing the earlier misclassification.Two things worth double‑checking:
- That
MLX5_FLOW_LAYER_IPIPandMLX5_FLOW_LAYER_IPV6_ENCAPboth carry theMLX5_FLOW_LAYER_TUNNELbit somlx5dr_definer_set_conv_tunnel()continues to work for deeper inner layers.- Whether sequences like
IPv6 / IPv6_ROUTING_EXT / IPv6are expected to be recognized as IPv6 ENCAP as well; those won’t satisfycd.last_item == RTE_FLOW_ITEM_TYPE_IPV4/6and will still be classified as non‑tunnel under this logic.If either of these should be handled, a small follow‑up tweak around this condition might be needed.
drivers/net/mlx5/mlx5.c (1)
1438-1475: HWS/SWS capability helpers look correct; consider build‑time gating for HWS.The new helpers cleanly encapsulate runtime checks for HWS (
devx && wqe_based_flow_table_sup) and SWS (Rx/Tx/esw SW owner flags), andmlx5_kvargs_is_used()correctly leverages theis_usedbitmap frommlx5_kvargs_ctrl.One thing to double‑check: on builds where
HAVE_MLX5_HWS_SUPPORTis not enabled but the FW still reportswqe_based_flow_table_sup,mlx5_hws_is_supported()will still return true and can drivedv_flow_ento 2. If any HWS code paths are compiled out under#ifdef HAVE_MLX5_HWS_SUPPORT, it may be safer to either:
- gate
mlx5_hws_is_supported()behind that macro, or- ensure all runtime uses of
dv_flow_en == 2are already fully guarded by the same macro.If this is already guaranteed elsewhere, current code is fine; otherwise, consider adding such a guard.
drivers/net/mlx5/mlx5_flow_dv.c (1)
10838-10872: flow_dv_translate_item_tag now correctly surfaces register mapping errorsSwitching
flow_dv_translate_item_tag()to returnintand explicitly handling:
reg < 0: unexpected mapping failures, now logged and returned as-EINVAL.reg == REG_NON: unsupported registers, now logged and returned as-ENOTSUP.prevents silent misuse of invalid
regvalues inflow_dv_match_meta_reg()and aligns with the rest of the error-handling strategy. The behavior for valid mappings is unchanged.If you want slightly richer diagnostics later, you could propagate a
struct rte_flow_error *into this helper and set it directly instead of relying solely on the caller, but this is not required for correctness.drivers/net/mlx5/mlx5_flow.h (3)
24-25: Proxy-port detection macro looks correctExpression matches the usual “E-Switch mode && master port” notion of an uplink/proxy;
!!normalization to boolean is fine. If you expect this to be used heavily, astatic inline boolhelper could give type checking, but the macro is acceptable as-is.
1810-1843: META/TAG register selection changes align with expected behavior
- Lines 1813-1814: Extending the META case to use
REG_C_1when eitherdv_xmeta_en == MLX5_XMETA_MODE_META32_HWSormlx5_esw_metadata_passing_enabled(sh)holds keeps the existing behavior and additionally covers the “metadata passing” capability path. Becausemlx5_esw_metadata_passing_enabled()already impliesdv_esw_en, the extrash->config.dv_esw_en &&is redundant but harmless.- Lines 1841-1842: Returning
REG_NONwhenid >= MLX5_FLOW_HW_TAGS_MAXavoids out-of-bounds access onreg->hw_avl_tags[id]and is consistent with how unsupported fields are indicated elsewhere in this function. Callers already treatREG_NONas “unavailable register”, so this is a safe hardening.These changes look correct and should address the tunnel/metadata corner cases without regressing NIC paths.
3715-3719: IPv6 TC capability helper is minimal and readable
mlx5_dv_modify_ipv6_traffic_class_supported()provides a clear, single-source check for IPv6 TC modify support, which should simplify checks in call sites. If there are configurations wherepriv->sh->phdevmay be NULL, consider adding anMLX5_ASSERT(priv->sh->phdev)here, but otherwise this looks fine.drivers/net/dpaa2/dpaa2_ethdev.c (1)
634-653: Helper correctly clears active DQS; consider softening the busy‑waitThe helper looks logically correct: it walks all per‑lcore queue storage entries, waits for any outstanding command to complete, then clears the SWP’s global
active_dqsand the per‑queue pointer. That should make teardown safer before freeing queue storage.The
while (!qbman_check_command_complete(...)) continue;loop is a hard spin with no pause or timeout, though. If HW ever misbehaves, this could spin indefinitely and burn a full core during device close. Consider adding a small backoff (rte_pause()or similar) and/or a bounded retry with a warning log to avoid pathological hangs during teardown.drivers/crypto/cnxk/cnxk_cryptodev_ops.h (1)
55-59: cpt_inflight_req layout + static asserts look consistent, but may break debug buildsThe new
rsvd[16], relocatedmeta[META_LEN](Line 58) and addedqpfield (Line 64) give a clean, cache‑line–sized layout, and theoffsetof(meta) == 32assertion (Line 77) matches the precedingres+rsvdfields. This aligns with usingcpt_inflight_reqalongside CN20K CQ entries and with the new use ofqpin CN20K adapter dequeue paths.One caveat:
PLT_STATIC_ASSERT(sizeof(struct cpt_inflight_req) == ROC_CACHE_LINE_SZ)(Line 76) will fail as soon asCPT_INST_DEBUG_ENABLEis defined, because the debug-only fields below increase the struct size. If you expect to build withCPT_INST_DEBUG_ENABLEenabled, consider wrapping this assert in#ifndef CPT_INST_DEBUG_ENABLE(or adjusting it to tolerate the debug fields) so debug builds continue to compile.Also applies to: 64-64, 76-78
drivers/crypto/cnxk/cn20k_cryptodev_ops.c (1)
263-266: CQ enable + atomic result init for CN20K path are reasonable; consider harmonizing other NOT_DONE initsSetting
inst[0].cq_ena = 1(Line 264) together with the atomic initialization ofinfl_req->res.u64[0]toCPT_COMP_NOT_DONE(Lines 265-266) matches how the various dequeue paths poll this word and is consistent with enabling the per‑LF CQ for CN20K elsewhere.As a minor follow‑up, you might consider switching other initializations of the same result word (e.g., plain
infl_req->res.cn20k.compcode = CPT_COMP_NOT_DONE;in the adapter path) to use the samerte_atomic_store_explicithelper for uniform atomic semantics, but the current change is already an improvement.drivers/crypto/cnxk/cnxk_cryptodev_ops.c (1)
220-238: Symmetric CQ disable on stop; minor nit on using consistent lf pointerThe stop path mirrors start: it skips NULL LFs (Lines 230-232) and only disables CQ when
lf->cpt_cq_enais set (Lines 235-237). Behaviourally this is correct and symmetric.As a minor consistency nit, you could pass
lfinstead ofroc_cpt->lf[qp_id]toroc_cpt_iq_disable()(Line 234) sincelfhas already been looked up and NULL‑checked.drivers/net/mlx5/mlx5_trigger.c (1)
1229-1232: Centralized errno preservation and cleanup inmlx5_dev_start()looks correctThe
SAVE_RTE_ERRNO_AND_STOP(ret, dev)macro plus the new label structure gives a consistent pattern: on each post‑dev_startedfailure, you (a) stashrte_errnoinret, (b) cleardev_started, then (c) unwind resources in the right order before restoringrte_errnoat theerror:label. All gotos into the shared cleanup chain now go through this macro, so there are no paths that would leavedev_started == 1on failure or lose the errno value. Just keep this invariant in mind for any future gotos into these labels.Also applies to: 1320-1497
drivers/common/cnxk/roc_npa.h (1)
817-820: Clarify first parameter semantics forroc_npa_pool_bp_configureThe header names the first argument
pool_id, while the implementation inroc_npa.cuses and treats it as an aura handle (then derivespool_idviaroc_npa_aura_handle_to_aura). ABI is fine (types/order match), but this discrepancy is easy to misread.Consider either:
- Renaming the parameter in the
.cimplementation topool_idand passing a raw pool index, or- Renaming it here to
aura_handleand documenting that a handle is expected and internally mapped to a pool.Also, for readability, you may want to mention in a comment that this API is pool‑oriented CN20K multi‑BP configuration to distinguish it from
roc_npa_aura_bp_configure.drivers/common/cnxk/roc_nix_inl.c (1)
831-858: Behavior change inroc_nix_inl_inb_rx_inject_enablefor inline‑dev‑only queriesThe new guards avoid NULL dereferences, which is good, but they also change semantics:
- Previously, for
inb_inl_dev == true, a non‑NULLidev->nix_inl_devwithattach_cptlf && rx_inj_enawas enough to returntrue, even whenroc_nixwas NULL (though the finalreturn roc_nix->rx_inj_ena;would have been UB if reached).- Now the inline‑dev branch additionally requires
roc_nix && roc_nix->rx_inj_ena, and ifroc_nixis NULL this function will always returnfalse, regardless ofinl_dev->rx_inj_ena.Other helpers like
roc_nix_inl_inb_sa_base_get()and profile‑ID getters explicitly supportroc_nix == NULLwheninb_inl_dev == true, so this stricter requirement may be unintended.If you want to keep the old “inline‑dev only” semantics while fixing the NULL deref, consider something like:
- if (inl_dev && inl_dev->attach_cptlf && inl_dev->rx_inj_ena && roc_nix && - roc_nix->rx_inj_ena) - return true; + if (inl_dev && inl_dev->attach_cptlf && inl_dev->rx_inj_ena) { + if (!roc_nix) + return true; /* inline-dev only case */ + if (roc_nix->rx_inj_ena) + return true; + } @@ - return roc_nix ? roc_nix->rx_inj_ena : 0; + return roc_nix ? roc_nix->rx_inj_ena : 0;Please double‑check call sites to confirm which behavior is desired for
inb_inl_dev == trueandroc_nix == NULL.drivers/common/cnxk/roc_nix_tm_utils.c (1)
583-593: Optionally sanity-check TL4 SDP relchan against tx_chan_cntFor SDP links,
relchanis now computed as(nix->tx_chan_base & 0xff) + node->rel_chan, which seems intentional to support per-node relative channels. To harden against misconfiguration (e.g.,node->rel_chan≥nix->tx_chan_cnt), consider adding a debug check/log before programmingNIX_AF_TL4X_SDP_LINK_CFG, e.g. warning ifrelchan >= nix->tx_chan_cntor exceeds the HW width.This would not change behavior in the valid case but would make TM misconfigurations easier to diagnose.
drivers/net/cnxk/cn20k_ethdev.c (1)
193-205: Guard tx completion cleanup in cn20k_nix_tx_queue_releaseThe added completion-drain logic is correct in intent (drain CQ, free
tx_compl.ptr, then delegate tocnxk_nix_tx_queue_release), but it currently assumestxqand the completion context are always valid.To make this more robust against repeated releases or partial setup failures, consider:
static void cn20k_nix_tx_queue_release(struct rte_eth_dev *eth_dev, uint16_t qid) { - struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qid]; + struct cn20k_eth_txq *txq = eth_dev->data->tx_queues[qid]; struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev); struct roc_nix *nix = &dev->nix; - if (nix->tx_compl_ena) { + if (txq && nix->tx_compl_ena && txq->tx_compl.ena && txq->tx_compl.ptr) { /* First process all CQ entries */ handle_tx_completion_pkts(txq, 0); plt_free(txq->tx_compl.ptr); } cnxk_nix_tx_queue_release(eth_dev, qid); }This keeps behavior unchanged in the normal case while avoiding potential NULL dereferences.
drivers/net/cnxk/cn10k_ethdev.c (1)
199-211: Guard tx completion cleanup in cn10k_nix_tx_queue_release

Draining TX completions before delegating to `cnxk_nix_tx_queue_release()` is the right ordering, but the code assumes `txq` is always valid and its completion context initialized:

 	struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qid];
 	...
 	if (nix->tx_compl_ena) {
 		/* First process all CQ entries */
 		handle_tx_completion_pkts(txq, 0);
 		plt_free(txq->tx_compl.ptr);
 	}

To be safer on repeated releases or partial setups, mirror the CNXK helper's NULL checks and the completion enable flag:

-	struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qid];
+	struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qid];
 	...
-	if (nix->tx_compl_ena) {
+	if (txq && nix->tx_compl_ena && txq->tx_compl.ena && txq->tx_compl.ptr) {
 		/* First process all CQ entries */
 		handle_tx_completion_pkts(txq, 0);
 		plt_free(txq->tx_compl.ptr);
 	}

 	cnxk_nix_tx_queue_release(eth_dev, qid);

This preserves the intended behavior while hardening against edge cases.
drivers/common/cnxk/roc_nix_queue.c (3)
1438-1482: SQE‑per‑SQB and SQB slack helpers look sound, but consider minimal validation
- `sqes_per_sqb_calc()` correctly derives SQEs per SQB for W16/W8 max SQE sizes and reserves one SQE for the next-SQB pointer.
- `sq_desc_to_sqb()` and `sqb_slack_adjust()` provide clear separation of concerns (translate descriptors to SQBs, then apply slack/max clamps).

Given these helpers are now central to static and dynamic SQB sizing, you might want a defensive assert to catch impossible configurations (e.g. `PLT_ASSERT(sqes_per_sqb);`) before the divisions, though it's not strictly required if `nix->sqb_size` and `max_sqe_sz` are always sane; a minimal sketch is shown below.
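A minimal sketch of the defensive check, using the standard `assert()` in place of `PLT_ASSERT()` and a hypothetical helper name so it stays self-contained:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the descriptor-to-SQB translation: guard the
 * division against an impossible zero SQEs-per-SQB configuration.
 */
static uint32_t
desc_to_sqb_count(uint32_t nb_desc, uint16_t sqes_per_sqb)
{
	assert(sqes_per_sqb != 0); /* would otherwise divide by zero below */
	return (nb_desc + sqes_per_sqb - 1) / sqes_per_sqb;
}
```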
1563-1651: Dynamic SQB pool population is coherent; reclaim path is robust
`sqb_pool_dyn_populate()` sets up a dynamic SQB pool sized to `max_sqb_count` (+ slack) and populates only the initial `nb_sqb_bufs` buffers:

- Per-SQ state (`sqes_per_sqb`, `nb_sqb_bufs[_adj]`, `aura_sqb_bufs`) is initialized appropriately.
- Per-SQB allocations are individually `plt_zmalloc`'d and freed into the aura, with full reclaim on failure in the `nomem`/`npa_fail` paths before destroying the pool.
- The aura limit is set to the initial count and the range is broadened to `[0, UINT64_MAX]`, which is consistent with dynamically allocated SQBs living anywhere in IOVA space.

Given the dynamic nature, this looks correct. Just be aware that `roc_npa_aura_op_range_set(..., 0, UINT64_MAX)` effectively disables range checking; that's intentional but worth keeping in mind if future code assumes tighter ranges.
2168-2225: SQ fini: dynamic SQB teardown flow is coherent
- Restoring the aura limit to `sq->aura_sqb_bufs` before draining improves the chances of a successful aura drain.
- For resize-enabled SQs, calling `nix_sqb_mem_dyn_free()` before `roc_npa_pool_destroy()` ensures dynamically allocated SQBs are freed back to the host.
- Only freeing `sq->sqe_mem` for non-resize SQs correctly reflects that dynamic SQBs are individually allocated and tracked via the aura.

Only minor nit: you may want to check and log the return value of `roc_npa_aura_limit_modify()` in case restoring the limit fails, but this is non-critical.

drivers/common/cnxk/roc_nix.h (1)
32-36: Public API and struct extensions align with implementation changes
- The new `ROC_NIX_LSO_FORMAT_IDX_*` macros mirror the kernel's `NIX_LSO_FORMAT_IDX_TSOV4/TSOV6` and reserve `IDX_IPV4` for the new IPv4 fragmentation format.
- `struct roc_nix_sq` gains `sqes_per_sqb`, `sq_cnt_ptr`, and `update_sq_cnt`, which are used by the new SQB sizing and SQ count update logic in `roc_nix_queue.c`.
- `struct roc_nix` gains `sq_resize_ena` and `use_multi_bpids`, matching dynamic SQ resizing and multi-BPID usage in queue/TM and NPA code.
- `ROC_NIX_TM_SDP` and `roc_nix_tm_sdp_prepare_tree()` are consistent with the SDP TM tree implementation.
- New LSO and SQ APIs (`roc_nix_lso_fmt_ipv4_frag_get`, `roc_nix_lso_alt_flags_profile_setup`, `roc_nix_sq_cnt_update`, `roc_nix_sq_resize`) have corresponding definitions and usages elsewhere.

These interface changes look coherent with the rest of the PR. Just be aware that adding fields to public structs (`roc_nix`, `roc_nix_sq`) is an ABI change and should align with the project's versioning/ABI policy.

Also applies to: 475-501, 559-606, 698-705, 875-876, 975-993, 1073-1093
drivers/common/cnxk/hw/nix.h (1)
669-675: HW structs and masks for ALT_FLAGS and CN20K SQ count look internally consistent
- `struct nix_lso_format`'s new fields (`alt_flags`, `alt_flags_index`, `shift`) and the associated masks (`NIX_LSO_FIELD_SHIFT_MASK`, `*_ALT_FLG_*`, updated `NIX_LSO_FIELD_MASK`) line up with their bit positions and with how they are programmed in `nix_lso_ipv4()`.
- The ALT_FLAGS format union `nix_lso_alt_flg_format_t` cleanly encodes the two 64-bit ALT_FLAGS config registers and is used as such by the ROC layer.
- CN20K SQ context structs now expose:
  - `update_sq_count` and `seb_count` in `nix_cn20k_sq_ctx_hw_s`.
  - A 46-bit `sq_count_iova` and `seb_count` in `nix_cn20k_sq_ctx_s`, matching the high/low split fields in the HW view.
- `NIX_LSOALG_ALT_FLAGS` and `NIX_LSO_FORMAT_IDX_TSOV4/TSOV6` complete the LSO enum and index set in a way that matches the ROC-level macros.

Only minor observation: `alt_flags_index` is 2 bits, so only 4 ALT_FLAGS profiles are representable in `nix_lso_format`. If HW supports more, callers (like `nix_lso_ipv4()`) should validate that the returned profile index fits in 0-3 before assigning it; otherwise behaviour will silently truncate. A hedged sketch of such a guard follows below.

Also applies to: 2509-2553, 2073-2202, 2700-2703
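If that validation is added, the guard could look roughly like this (the constant and function names below are illustrative, not the actual ROC symbols):

```c
#include <stdint.h>

#define LSO_ALT_FLG_IDX_MAX 3 /* alt_flags_index is a 2-bit field */

/* Reject ALT_FLAGS profile indices that cannot be represented, instead of
 * letting the bitfield assignment silently truncate them.
 */
static int
lso_alt_flags_index_set(uint8_t *alt_flags_index, int profile)
{
	if (profile < 0 || profile > LSO_ALT_FLG_IDX_MAX)
		return -1; /* caller should fail LSO format setup */

	*alt_flags_index = (uint8_t)profile;
	return 0;
}
```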
doc/guides/conf.py (1)
107-110: Pydantic mocking logic is fine; consider iterable-unpacking form

The explicit set+concatenation works and correctly guarantees `pydantic`/`pydantic_core` are mocked, but Ruff flags this pattern. You can keep the semantics and address RUF005 with a slightly cleaner expression:

    autodoc_mock_imports = list({*autodoc_mock_imports, "pydantic", "pydantic_core"})

This avoids list concatenation, keeps entries unique, and matches Ruff's preferred style.
app/test/bpf/filter.c (1)
21-37: TCP/IPv4 filter logic is sound; consider naming header offsets

The filter correctly assumes an Ethernet header of 14 bytes, validates IPv4 via the version nibble, and checks the protocol field for TCP (6), returning 1 only in that case. For readability, you might consider defining named constants for the IP header offset and protocol field offset instead of bare `14` and `14 + 9`, but this is purely cosmetic in a small test program. One possible shape for such constants is sketched below.

doc/guides/testpmd_app_ug/testpmd_funcs.rst (1)
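For illustration only (this is not the exact contents of `filter.c`, and the constant names are made up), the offsets could be named along these lines:

```c
#include <stdint.h>

#define ETH_HDR_LEN      14                  /* untagged Ethernet header */
#define IPV4_HDR_OFF     ETH_HDR_LEN         /* start of the IPv4 header */
#define IPV4_PROTO_OFF   (IPV4_HDR_OFF + 9)  /* protocol field within the IPv4 header */
#define IPV4_PROTO_TCP   6

/* Accept (return 1) only IPv4 packets carrying TCP, mirroring the logic
 * described above but with the bare offsets replaced by named constants.
 */
uint32_t
tcp_ipv4_filter(const uint8_t *pkt)
{
	if ((pkt[IPV4_HDR_OFF] >> 4) != 4)
		return 0;
	return pkt[IPV4_PROTO_OFF] == IPV4_PROTO_TCP ? 1 : 0;
}
```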
1490-1505: Polish wording for new DCB forwarding commands

The semantics are clear, but the English can be tightened a bit to match the surrounding documentation style. For example:

-Config DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's
-forwarding is enabled, and vice versa::
+Configure DCB forwarding on specified traffic classes (TCs). If bit n in
+tc_mask is 1, then TC n forwarding is enabled; otherwise it is disabled::
@@
-Config DCB forwarding cores per-TC, 1-means one core process all queues of a TC,
-2-means two cores process all queues of a TC, and so on::
+Configure the number of forwarding cores per TC.
+For example, 1 means one core processes all queues of a TC,
+2 means two cores process all queues of a TC, and so on::

This keeps the meaning but improves readability.
app/test/bpf/load.c (1)
11-51: Ensure struct layout stays in sync with test_bpf.c and clarify the dummy_func1 call

This ELF BPF program redefines `struct dummy_offset`/`struct dummy_vect8` to "match" the host-side structs in `app/test/test_bpf.c`, but:

- The host version of `struct dummy_offset` uses `RTE_ATOMIC(...)` fields, while this file uses plain integers plus `__attribute__((packed))`.
- Any change to the host structs (field order, padding, atomics) will silently desynchronize the layout assumed by this BPF object.
- `test_call1()` calls `dummy_func1(arg, &v32, &v64)` where `arg` is a `struct dummy_vect8 *`, while `dummy_func1` conceptually operates on a single `struct dummy_offset`. Today this works only because `dummy_vect8` starts with `in[0]` and you rely on that aliasing.

To make this more robust and self-documenting:

- Move the `dummy_offset`/`dummy_vect8` definitions into a small shared header used by both `test_bpf.c` and this BPF source, so the layouts cannot diverge.
- Consider passing `&arg->in[0]` to `dummy_func1` instead of `arg`, which makes the intent explicit and avoids relying on the first-member trick:

-	/* Call external function */
-	dummy_func1(arg, &v32, &v64);
+	/* Call external function on first input element */
+	dummy_func1(&arg->in[0], &v32, &v64);

[recommendation only — behavior is currently consistent as long as the layouts remain identical.]
app/test/bpf/meson.build (1)
4-5: Meson BPF setup looks fine; consider removing unused bpf_test_hdrs

The Meson logic for building BPF objects with clang and converting them to headers via `xxd` is consistent with typical DPDK patterns, and wiring them into `resources` makes sense for the tests.

One minor nit: `bpf_test_hdrs` is initialized but never used; all headers are tracked via `resources` instead. You can either drop `bpf_test_hdrs` or, if you intended to expose these headers separately, append to it in the loop:

-bpf_test_hdrs = [ ]
+bpf_test_hdrs = []
@@
-    bpf_test_h = custom_target(bpf_src + '_h',
+    bpf_test_h = custom_target(bpf_src + '_h',
@@
-    resources += bpf_test_h
+    resources += bpf_test_h
+    bpf_test_hdrs += bpf_test_h

Or simply remove the `bpf_test_hdrs` variable to avoid confusion.

Also applies to: 26-53
app/test-pmd/cmdline.c (2)
514-516: Help text is accurate but can be slightly clarified

The new `set dcb fwd_tc (tc_mask)` entry is wired correctly to the new command, but the description text could be tightened for readability and consistency with the CLI syntax, e.g. use `tc_mask` instead of `tc-mask` and "specific TCs" instead of "specify TCs". This is purely cosmetic and can be deferred.
6230-6328: New DCB forwarding TC and cores commands look correct; consider minor robustness and help improvements

Functionality-wise this block looks sound:

- Parsing and token wiring for `set dcb fwd_tc <tc_mask>` and `set dcb fwd_tc_cores <tc_cores>` are consistent with the rest of the file.
- `tc_mask` is validated as non-zero and printed cleanly by iterating `i < RTE_ETH_8_TCS`.
- `tc_cores` is validated as non-zero before updating `dcb_fwd_tc_cores`.

A few non-blocking points you may want to polish:

1. Help string mismatch / clarity (Line 6273)

   `.help_str` starts with "config DCB forwarding…", while the actual command is `set dcb fwd_tc <tc_mask>`. Aligning the help text with the exact syntax improves discoverability:

   - .help_str = "config DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled, and vice versa.",
   + .help_str = "set dcb fwd_tc <tc_mask>: configure DCB forwarding on specific TCs; if bit n in tc_mask is 1 then TC n forwarding is enabled.",

   Similarly, consider starting the cores help with `set dcb fwd_tc_cores <tc_cores>:`.

2. Missing help entry for `fwd_tc_cores` in the long help section

   There is a help line for `set dcb fwd_tc (tc_mask)` but none for `set dcb fwd_tc_cores`. Adding a line beside 514-516 would keep the config help complete:

    "set dcb fwd_tc (tc_mask)\n"
    "    Set DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled\n\n"
   +"set dcb fwd_tc_cores (tc_cores)\n"
   +"    Set number of forwarding cores per enabled TC; 1 means a single core processes all queues in a TC\n\n"

3. Optional: sanity-check `tc_cores` against available forwarding cores

   Right now `tc_cores` is only checked for non-zero. If `dcb_fwd_tc_cores` is later used to derive core assignments, you may want to clamp or warn when `tc_cores > nb_fwd_lcores` to avoid surprising runtime behavior. This can be added later if the usage path justifies it.

app/test-pmd/config.c (2)
5160-5194: `dcb_fwd_tc_update_dcb_info()` only preserves the TC queue mapping; other DCB fields are zeroed

This helper compresses `org_dcb_info` down to the masked TCs by rebuilding a fresh `dcb_info` with only `tc_queue` and `nb_tcs` populated, then assigning `*org_dcb_info = dcb_info;`. That's fine for the current usage in `dcb_fwd_config_setup()`, where only `tc_queue` and `nb_tcs` are read afterwards.

However, this makes the function fragile for future reuse: any later code that expects other fields in `struct rte_eth_dcb_info` (e.g. `prio_tc`, `tc_bws`) will silently see zeros.

If you want to future-proof it a bit, consider:

- Starting from a copy of `*org_dcb_info` and only rewriting `tc_queue` and `nb_tcs`, or
- Adding a brief comment that this function intentionally only preserves the TC queue layout and nothing else, and is not a general "mask dcb_info" helper.

Given it's `static` and only used in this DCB fwd path, this is more of a maintainability concern than a correctness bug.
5215-5339: DCB fwd config multi-core logic is sound; minor robustness nits

The new `dcb_fwd_config_setup()` logic generally looks consistent:

- Validates per-TC queue counts via `dcb_fwd_check_cores_per_tc()` before configuring.
- Computes `total_tc_num` based on the mask and enforces `nb_fwd_lcores >= total_tc_num * dcb_fwd_tc_cores`.
- Uses `sub_core_idx` and per-TC queue partitioning to distribute each TC's queues across `dcb_fwd_tc_cores` lcores, in a way that matches the comments and respects VMDQ pools.

A couple of small improvements to consider:

Type/overflow safety for `target_lcores`:

`target_lcores` is `lcoreid_t`, and you compute it via `target_lcores = (lcoreid_t)total_tc_num * (lcoreid_t)dcb_fwd_tc_cores;`. If `lcoreid_t` is narrower than `int`, this multiplication happens in the narrower type and can silently wrap before the `< nb_fwd_lcores` check. Safer to do the multiplication in a wider type first:
- lcoreid_t lc_id, target_lcores;
- lcoreid_t lc_id;
- uint32_t target_lcores;
...
- target_lcores = (lcoreid_t)total_tc_num * (lcoreid_t)dcb_fwd_tc_cores;
- target_lcores = (uint32_t)total_tc_num * (uint32_t)dcb_fwd_tc_cores;
if (nb_fwd_lcores < target_lcores) {
...
}
- cur_fwd_config.nb_fwd_lcores = target_lcores;
- cur_fwd_config.nb_fwd_lcores = (lcoreid_t)target_lcores;
`nb_fwd_streams` vs masked TCs:

When `dcb_fwd_tc_mask` doesn't cover all TCs, the code will only populate streams for the masked TCs, but `cur_fwd_config.nb_fwd_streams` is still set to `nb_rxq * nb_fwd_ports`. That's okay because forwarding uses `fwd_lcores[lc_id]->stream_idx`/`stream_nb` rather than `nb_fwd_streams`, but it does make the `streams` count in `pkt_fwd_config_display()` potentially misleading. If you want the display to reflect reality, you could accumulate the total number of assigned streams while building the mapping and assign that to `cur_fwd_config.nb_fwd_streams`.

Overall, the algorithm and indexing (`sm_id`, `k + j`) look safe, and the mask + `dcb_fwd_tc_cores` behaviour matches the new feature description.

drivers/net/mlx5/mlx5_flow_hw.c (2)
2339-2343: Modify-header length validation is good; avoid double action teardown

Adding `flow_hw_validate_compiled_modify_field()` before creating the DR modify-header action is a solid guard against exceeding `max_header_modify_pattern_length`, and the new `acts->mhdr == NULL` check in `mlx5_tbl_ensure_shared_modify_header()` prevents NULL derefs.

You currently call `__flow_hw_action_template_destroy(dev, acts)` inside `mlx5_tbl_translate_modify_header()` on error and again in the caller's `err:` path. The destroy helper is idempotent, so this is safe but redundant; consider letting the caller own cleanup to avoid confusion.

Also applies to: 2375-2383
8387-8513: Tunnel detection logic: behaviorally sound, but IPv6 FRAG helper unused

The new tunnel helpers:

- Read the outer protocol from the previous L3 item only when that field is actually masked in the template.
- Prefer RFC-compliant classification (IPIP vs IPv6-in-IP) based on `IPPROTO_IPIP`/`IPPROTO_IPV6`, with a fallback to the inner header type when the proto isn't constrained.
- Enforce a single tunnel layer by checking `*item_flags & MLX5_FLOW_LAYER_TUNNEL`.
- Treat the IPv6 routing extension (`OUTER_IPV6_ROUTING_EXT`) as an IPv6 tunnel anchor.

This matches the intent to fix IPIP/IPv6 ENCAP detection and should avoid misclassifying non-tunnel flows when the proto is set.

One nit: the `RTE_FLOW_ITEM_TYPE_IPV6_FRAG_EXT` case in `mlx5_hw_flow_get_next_protocol()` is currently not used, since `last_l3_item` is only ever set to IPv4/IPv6 items in `__flow_hw_pattern_validate()`. Either wire `last_l3_item` to the frag-ext item when relevant or drop that branch to avoid dead code.

Also applies to: 8791-8805, 8807-8812, 8982-8985
| static int | ||
| dcb_fwd_check_cores_per_tc(void) | ||
| { | ||
| struct rte_eth_dcb_info dcb_info = {0}; | ||
| uint32_t port, tc, vmdq_idx; | ||
|
|
||
| if (dcb_fwd_tc_cores == 1) | ||
| return 0; | ||
|
|
||
| for (port = 0; port < nb_fwd_ports; port++) { | ||
| (void)rte_eth_dev_get_dcb_info(fwd_ports_ids[port], &dcb_info); | ||
| for (tc = 0; tc < dcb_info.nb_tcs; tc++) { | ||
| for (vmdq_idx = 0; vmdq_idx < RTE_ETH_MAX_VMDQ_POOL; vmdq_idx++) { | ||
| if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue == 0) | ||
| break; | ||
| /* make sure nb_rx_queue can be divisible. */ | ||
| if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue % | ||
| dcb_fwd_tc_cores) | ||
| return -1; | ||
| /* make sure nb_tx_queue can be divisible. */ | ||
| if (dcb_info.tc_queue.tc_txq[vmdq_idx][tc].nb_queue % | ||
| dcb_fwd_tc_cores) | ||
| return -1; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return 0; | ||
| } |
Guard against dcb_fwd_tc_cores == 0 and align check with dcb_fwd_tc_mask.
Two things to tighten here:
- `dcb_fwd_tc_cores == 0` will currently hit the modulus operations on Lines 5132 and 5136, which is undefined behaviour. Even if CLI validation is supposed to prevent 0, it's safer to handle it here:
static int
dcb_fwd_check_cores_per_tc(void)
{
struct rte_eth_dcb_info dcb_info = {0};
uint32_t port, tc, vmdq_idx;
- if (dcb_fwd_tc_cores == 1)
- return 0;
+ if (dcb_fwd_tc_cores == 0)
+ return -EINVAL;
+ if (dcb_fwd_tc_cores == 1)
+ return 0;

- The divisibility check currently applies to all TCs, but the rest of the DCB forwarding logic (including
`get_fwd_port_total_tc_num()` and `dcb_fwd_tc_update_dcb_info()`) only cares about TCs enabled in `dcb_fwd_tc_mask`. As written, a port with a "non-divisible" TC that is not selected in the mask will still cause this check to fail unnecessarily. You likely want to ignore masked-out TCs here:
- for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
+ for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
+ if (!(dcb_fwd_tc_mask & (1u << tc)))
+ continue;
for (vmdq_idx = 0; vmdq_idx < RTE_ETH_MAX_VMDQ_POOL; vmdq_idx++) {
if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue == 0)
break;
/* make sure nb_rx_queue can be divisible. */
if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue %
dcb_fwd_tc_cores)
return -1;
/* make sure nb_tx_queue can be divisible. */
if (dcb_info.tc_queue.tc_txq[vmdq_idx][tc].nb_queue %
dcb_fwd_tc_cores)
return -1;
}
}

This keeps the pre-check consistent with the mask-driven behaviour of the rest of the DCB forwarding path.
🤖 Prompt for AI Agents
In app/test-pmd/config.c around lines 5115 to 5143, guard against
dcb_fwd_tc_cores being zero and only perform the divisibility checks for traffic
classes enabled in dcb_fwd_tc_mask: first if dcb_fwd_tc_cores == 0 return -1 (or
appropriate error) to avoid division/modulus by zero, then inside the loop skip
any TC where the corresponding bit in dcb_fwd_tc_mask is not set so you only
validate enabled TCs, keeping the pre-check consistent with the rest of the DCB
forwarding logic.
| static unsigned int | ||
| setup_mbufs(struct rte_mbuf *burst[], unsigned int n) | ||
| { | ||
| struct rte_ether_hdr eh = { | ||
| .ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4), | ||
| }; | ||
| const struct rte_ipv4_hdr iph = { | ||
| .version_ihl = RTE_IPV4_VHL_DEF, | ||
| .total_length = rte_cpu_to_be_16(BPF_TEST_PKT_LEN - sizeof(eh)), | ||
| .time_to_live = IPDEFTTL, | ||
| .src_addr = rte_cpu_to_be_32(ip_src_addr), | ||
| .dst_addr = rte_cpu_to_be_32(ip_dst_addr), | ||
| }; | ||
| unsigned int tcp_count = 0; | ||
|
|
||
| rte_eth_random_addr(eh.dst_addr.addr_bytes); | ||
|
|
||
| for (unsigned int i = 0; i < n; i++) { | ||
| struct rte_mbuf *mb = burst[i]; | ||
|
|
||
| /* Setup Ethernet header */ | ||
| *rte_pktmbuf_mtod(mb, struct rte_ether_hdr *) = eh; | ||
|
|
||
| /* Setup IP header */ | ||
| struct rte_ipv4_hdr *ip | ||
| = rte_pktmbuf_mtod_offset(mb, struct rte_ipv4_hdr *, sizeof(eh)); | ||
| *ip = iph; | ||
|
|
||
| if (rte_rand() & 1) { | ||
| struct rte_udp_hdr *udp | ||
| = rte_pktmbuf_mtod_offset(mb, struct rte_udp_hdr *, | ||
| sizeof(eh) + sizeof(iph)); | ||
|
|
||
| ip->next_proto_id = IPPROTO_UDP; | ||
| *udp = (struct rte_udp_hdr) { | ||
| .src_port = rte_cpu_to_be_16(9), /* discard */ | ||
| .dst_port = rte_cpu_to_be_16(9), /* discard */ | ||
| .dgram_len = BPF_TEST_PKT_LEN - sizeof(eh) - sizeof(iph), | ||
| }; | ||
|
|
||
| } else { | ||
| struct rte_tcp_hdr *tcp | ||
| = rte_pktmbuf_mtod_offset(mb, struct rte_tcp_hdr *, | ||
| sizeof(eh) + sizeof(iph)); | ||
|
|
||
| ip->next_proto_id = IPPROTO_TCP; | ||
| *tcp = (struct rte_tcp_hdr) { | ||
| .src_port = rte_cpu_to_be_16(9), /* discard */ | ||
| .dst_port = rte_cpu_to_be_16(9), /* discard */ | ||
| .tcp_flags = RTE_TCP_RST_FLAG, | ||
| }; | ||
| ++tcp_count; | ||
| } | ||
| } | ||
|
|
||
| return tcp_count; | ||
| } |
🧩 Analysis chain
Initialize mbuf lengths in setup_mbufs
setup_mbufs() writes Ethernet/IP/UDP/TCP headers into the mbuf data area but never updates mb->data_len/mb->pkt_len. After rte_pktmbuf_alloc_bulk(), these are zero, so you end up transmitting zero‑length packets, which is incorrect even if the null PMD ignores it.
A minimal fix is to set the packet length after populating headers:
static unsigned int
setup_mbufs(struct rte_mbuf *burst[], unsigned int n)
{
@@
- for (unsigned int i = 0; i < n; i++) {
- struct rte_mbuf *mb = burst[i];
+ for (unsigned int i = 0; i < n; i++) {
+ struct rte_mbuf *mb = burst[i];
@@
- if (rte_rand() & 1) {
+ if (rte_rand() & 1) {
struct rte_udp_hdr *udp
@@
++tcp_count;
}
+
+ /* Full packet length: L2 + L3 + L4 (no payload). */
+ mb->data_len = BPF_TEST_PKT_LEN;
+ mb->pkt_len = BPF_TEST_PKT_LEN;
}

(If you prefer, you can instead use `rte_pktmbuf_append(mb, BPF_TEST_PKT_LEN)` and write via that pointer.)
🏁 Script executed:
#!/bin/bash
# Verify the issue by examining setup_mbufs usage and related code
set -e
# 1. Search for calls to setup_mbufs
echo "=== Calls to setup_mbufs ==="
rg -n "setup_mbufs" --type=c
# 2. Find BPF_TEST_PKT_LEN definition
echo -e "\n=== BPF_TEST_PKT_LEN definition ==="
rg -n "BPF_TEST_PKT_LEN" --type=c
# 3. Check for rte_pktmbuf_alloc_bulk calls in test_bpf.c
echo -e "\n=== rte_pktmbuf_alloc_bulk calls ==="
rg -n "rte_pktmbuf_alloc_bulk" --type=c -A 3 -B 3
# 4. Check if data_len or pkt_len are set anywhere in setup_mbufs or nearby
echo -e "\n=== Search for data_len/pkt_len modifications ==="
rg -n "data_len|pkt_len" app/test/test_bpf.cLength of output: 33045
🌐 Web query:
DPDK mbuf data_len pkt_len initialization behavior after rte_pktmbuf_alloc_bulk
💡 Result:
Short answer: rte_pktmbuf_alloc_bulk returns mbufs with segment data_len = 0 and packet pkt_len = 0 (and data_off initialized to headroom), and other mbuf fields reset to their pktmbuf defaults (refcnt = 1, nb_segs = 1, next = NULL). In contrast rte_mbuf_raw_alloc[_bulk] returns uninitialized mbufs (only buf_addr/buf_len/refcnt/nb_segs/next/pool/priv_size are set) and the caller must initialize data_len/pkt_len. [1][2]
Sources:
- DPDK rte_pktmbuf_alloc_bulk / rte_pktmbuf_alloc behavior: “initialize … and reset the fields to default values” (pkt/segment length zero, headroom set). [1]
- DPDK rte_mbuf_raw_alloc/_alloc_bulk behavior: returns uninitialized mbufs — caller must initialize other fields (data_len/pkt_len etc.). [2]
Set mbuf lengths in setup_mbufs function
The issue is confirmed. After rte_pktmbuf_alloc_bulk(), mbufs are allocated with data_len and pkt_len initialized to 0. The setup_mbufs() function writes Ethernet/IP/UDP/TCP headers into the mbuf data area but never updates these fields, leaving them at zero. This causes packets to appear empty despite containing valid header data, breaking test correctness.
Add the following before the loop closes (after the increment of tcp_count):
+ mb->data_len = BPF_TEST_PKT_LEN;
+ mb->pkt_len = BPF_TEST_PKT_LEN;

This sets the packet length to 64 bytes, matching the full header size (Ether + IP + L4).
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In app/test/test_bpf.c around lines 3468 to 3524, after writing the
Ethernet/IP/L4 headers for each mbuf inside the loop, set the mbuf length fields
so the packet is not treated as empty: assign mb->data_len = mb->pkt_len =
BPF_TEST_PKT_LEN (or 64 if you prefer the literal) for each mbuf before
finishing the loop iteration.
| test_bpf_elf_rx_load(void) | ||
| { | ||
| static const char null_dev[] = "net_null_bpf0"; | ||
| struct rte_mempool *pool = NULL; | ||
| char *tmpfile = NULL; | ||
| uint16_t port; | ||
| int ret; | ||
|
|
||
| printf("%s start\n", __func__); | ||
|
|
||
| /* Make a pool for packets */ | ||
| pool = rte_pktmbuf_pool_create("bpf_rx_test_pool", 2 * BPF_TEST_BURST, | ||
| 0, 0, RTE_MBUF_DEFAULT_BUF_SIZE, | ||
| SOCKET_ID_ANY); | ||
| TEST_ASSERT(pool != NULL, "failed to create mempool"); | ||
|
|
||
| ret = null_vdev_setup(null_dev, &port, pool); | ||
| if (ret != 0) | ||
| goto fail; | ||
|
|
||
| /* Create temp file from embedded BPF object */ | ||
| tmpfile = create_temp_bpf_file(app_test_bpf_filter_o, app_test_bpf_filter_o_len, "rx"); | ||
| if (tmpfile == NULL) | ||
| goto fail; | ||
|
|
||
| /* Do test with VM */ | ||
| ret = bpf_rx_test(port, tmpfile, pool, "drop", 0, 0); | ||
| if (ret != 0) | ||
| goto fail; | ||
|
|
||
| /* Repeat with JIT */ | ||
| ret = bpf_rx_test(port, tmpfile, pool, "drop", RTE_BPF_ETH_F_JIT, 0); | ||
| if (ret != 0) | ||
| goto fail; | ||
|
|
||
| /* Repeat with allow all */ | ||
| ret = bpf_rx_test(port, tmpfile, pool, "allow", 0, BPF_TEST_BURST); | ||
| if (ret != 0) | ||
| goto fail; | ||
|
|
||
| /* Repeat with JIT */ | ||
| ret = bpf_rx_test(port, tmpfile, pool, "allow", RTE_BPF_ETH_F_JIT, BPF_TEST_BURST); | ||
| if (ret != 0) | ||
| goto fail; | ||
|
|
||
| printf("%s: RX ELF load test passed\n", __func__); | ||
|
|
||
| /* The filter should free the mbufs */ | ||
| unsigned int avail = rte_mempool_avail_count(pool); | ||
| TEST_ASSERT_EQUAL(avail, BPF_TEST_POOLSIZE, | ||
| "Mempool available %u != %u leaks?", avail, BPF_TEST_POOLSIZE); | ||
|
|
||
| fail: | ||
| if (tmpfile) { | ||
| unlink(tmpfile); | ||
| free(tmpfile); | ||
| } | ||
|
|
||
| if (port != UINT16_MAX) | ||
| rte_vdev_uninit(null_dev); | ||
|
|
||
| rte_mempool_free(pool); | ||
|
|
||
| return ret == 0 ? TEST_SUCCESS : TEST_FAILED; |
🧩 Analysis chain
Uninitialized port in test_bpf_elf_rx_load cleanup
test_bpf_elf_rx_load() declares uint16_t port; without initialization, but in the fail: path you check if (port != UINT16_MAX) before calling rte_vdev_uninit(null_dev);. If null_vdev_setup() fails before assigning port, this comparison reads an indeterminate value (undefined behavior).
Initialize port as in the TX test:
static int
test_bpf_elf_rx_load(void)
{
static const char null_dev[] = "net_null_bpf0";
- struct rte_mempool *pool = NULL;
- char *tmpfile = NULL;
- uint16_t port;
+ struct rte_mempool *pool = NULL;
+ char *tmpfile = NULL;
+ uint16_t port = UINT16_MAX;

🏁 Script executed:
#!/bin/bash
# Check current state of test_bpf_elf_rx_load function and find test_bpf_elf_tx_load
rg -n "test_bpf_elf_rx_load|test_bpf_elf_tx_load" app/test/test_bpf.c -A 15 | head -80Length of output: 1364
🏁 Script executed:
#!/bin/bash
# Search for null_vdev_setup function definition to understand port assignment
rg -n "null_vdev_setup" app/test/test_bpf.c -B 2 -A 15Length of output: 1858
🏁 Script executed:
#!/bin/bash
# Verify the exact initialization patterns in both test functions
rg -n "uint16_t port" app/test/test_bpf.c -B 2 -A 2Length of output: 617
Initialize port to UINT16_MAX in test_bpf_elf_rx_load
The RX test declares uint16_t port; without initialization at line 3667, but the cleanup path checks if (port != UINT16_MAX) before calling rte_vdev_uninit(). If null_vdev_setup() fails before assigning port, this reads an indeterminate value (undefined behavior). The TX test correctly initializes port = UINT16_MAX at line 3574; apply the same fix to the RX test.
static int
test_bpf_elf_rx_load(void)
{
static const char null_dev[] = "net_null_bpf0";
struct rte_mempool *pool = NULL;
char *tmpfile = NULL;
- uint16_t port;
+ uint16_t port = UINT16_MAX;
int ret;
🤖 Prompt for AI Agents
In app/test/test_bpf.c around lines 3662 to 3725, the local variable "uint16_t
port;" is used in the cleanup check "if (port != UINT16_MAX)" but is not
initialized, causing undefined behavior if null_vdev_setup() fails before
assignment; initialize port to UINT16_MAX at declaration (same as the TX test)
so the cleanup check is safe and rte_vdev_uninit() is only called when port was
successfully assigned.
| }; | ||
| struct rte_flow_error error; | ||
| struct rte_flow *flow = flow = rte_flow_create(port_id, &attr, pattern, actions, &error); |
Fix double assignment syntax error in C code example.
Line 844 contains a double assignment that would result in a syntax error if copied directly: struct rte_flow *flow = flow = rte_flow_create(...). The variable declaration and assignment should not include the intermediate flow =.
Apply this diff to fix the syntax error:
- struct rte_flow *flow = flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
+ struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &error);

🤖 Prompt for AI Agents
In doc/guides/nics/ice.rst around line 844, the C example contains a double
assignment "struct rte_flow *flow = flow = rte_flow_create(...)" which is a
syntax error; change it to declare and assign once by removing the extra "flow
=" so it reads as a single declaration with the call result assigned to flow.
| fslmc_bus_plug(struct rte_device *rte_dev) | ||
| { | ||
| /* No operation is performed while plugging the device */ | ||
| return 0; | ||
| int ret = 0; | ||
| struct rte_dpaa2_device *dev = container_of(rte_dev, | ||
| struct rte_dpaa2_device, device); | ||
| struct rte_dpaa2_driver *drv; | ||
|
|
||
| TAILQ_FOREACH(drv, &rte_fslmc_bus.driver_list, next) { | ||
| ret = rte_fslmc_match(drv, dev); | ||
| if (ret) | ||
| continue; | ||
|
|
||
| if (!drv->probe) | ||
| continue; | ||
|
|
||
| if (rte_dev_is_probed(&dev->device)) | ||
| continue; | ||
|
|
||
| if (dev->device.devargs && | ||
| dev->device.devargs->policy == RTE_DEV_BLOCKED) { | ||
| DPAA2_BUS_DEBUG("%s Blocked, skipping", | ||
| dev->device.name); | ||
| continue; | ||
| } | ||
|
|
||
| ret = drv->probe(drv, dev); | ||
| if (ret) { | ||
| DPAA2_BUS_ERR("Unable to probe"); | ||
| } else { | ||
| dev->driver = drv; | ||
| dev->device.driver = &drv->driver; | ||
| DPAA2_BUS_INFO("%s Plugged", dev->device.name); | ||
| } | ||
| break; | ||
| } | ||
|
|
||
| return ret; | ||
| } |
Verify hot-plug behavior aligns with scan mode policy.
The fslmc_bus_plug implementation closely mirrors rte_fslmc_probe (lines 462-494) but omits the allowlist check. In rte_fslmc_probe, devices are only probed if either probe_all is true OR the device has an explicit RTE_DEV_ALLOWED policy (lines 481-483). However, fslmc_bus_plug probes any device that is not blocked and not already probed, potentially violating the scan mode configuration.
Additionally, there is significant code duplication between these two functions. The matching, blocking, and probing logic is nearly identical.
Consider:
- Adding the allowlist check to maintain policy consistency:
+ if (dev->device.devargs &&
+ dev->device.devargs->policy == RTE_DEV_BLOCKED) {
+ DPAA2_BUS_DEBUG("%s Blocked, skipping",
+ dev->device.name);
+ continue;
+ }
+
+ /* Only probe if explicitly allowed during hot-plug */
+ if (!dev->device.devargs ||
+ dev->device.devargs->policy != RTE_DEV_ALLOWED) {
+ continue;
+ }

- Extracting the common probe logic into a helper function like `fslmc_probe_device(dev, drv)` to eliminate duplication.
🤖 Prompt for AI Agents
In drivers/bus/fslmc/fslmc_bus.c around lines 592 to 629, hot-plug probing
ignores the scan-mode allowlist check and duplicates logic from rte_fslmc_probe;
update fslmc_bus_plug to enforce the same policy as rte_fslmc_probe by skipping
probes unless probe_all is true or dev->device.devargs->policy ==
RTE_DEV_ALLOWED, and refactor the common match/block/probe sequence into a
shared helper (e.g., fslmc_probe_device(dev, drv)) that performs the
rte_fslmc_match call, devargs policy checks (blocked/allowed), rte_dev_is_probed
check, calls drv->probe and sets dev->driver and dev->device.driver on success,
then have both rte_fslmc_probe and fslmc_bus_plug call that helper to remove
duplication.
| for (i = 0; i < priv->nb_rx_queues; i++) { | ||
| dpaa2_q = priv->rx_vq[i]; | ||
| dpaa2_clear_queue_active_dps(dpaa2_q, | ||
| RTE_MAX_LCORE); | ||
| dpaa2_queue_storage_free(dpaa2_q, | ||
| RTE_MAX_LCORE); | ||
| priv->rx_vq[i] = NULL; | ||
| } | ||
| /* cleanup tx queue cscn */ | ||
| for (i = 0; i < priv->nb_tx_queues; i++) { | ||
| dpaa2_q = priv->tx_vq[i]; | ||
| rte_free(dpaa2_q->cscn); | ||
| priv->tx_vq[i] = NULL; | ||
| } | ||
| if (priv->flags & DPAA2_TX_CONF_ENABLE) { | ||
| /* cleanup tx conf queue storage */ | ||
| for (i = 0; i < priv->nb_tx_queues; i++) { | ||
| dpaa2_q = priv->tx_conf_vq[i]; | ||
| dpaa2_queue_storage_free(dpaa2_q, | ||
| RTE_MAX_LCORE); | ||
| priv->tx_conf_vq[i] = NULL; | ||
| } | ||
| } | ||
| if (priv->flags & DPAAX_RX_ERROR_QUEUE_FLAG) { | ||
| dpaa2_q = priv->rx_err_vq; | ||
| dpaa2_queue_storage_free(dpaa2_q, RTE_MAX_LCORE); | ||
| } | ||
|
|
Free path should also release rx_err_vq and (optionally) clear its active DQS
The added cleanup for RX/TX queues is good: you now clear active DQS before freeing RX queue storage and NULL out rx_vq[i], tx_vq[i], and tx_conf_vq[i] entries.
Two issues around the error RX queue remain:
- `rx_err_vq` is allocated with `rte_zmalloc("dpni_rx_err", sizeof(struct dpaa2_queue), ...)` in `dpaa2_alloc_rx_tx_queues`, but here you only call `dpaa2_queue_storage_free()` on it and never `rte_free()` the `struct dpaa2_queue` itself or NULL the pointer. That struct will leak for devices using the error queue.
- Because this block is nested under `if (priv->rx_vq[0])`, the error-queue struct is not freed in the allocation-failure path where `priv->rx_vq[0]` has been set to NULL but `rx_err_vq` was already allocated and its storage freed in the fail path.
You can address both by unconditionally cleaning up rx_err_vq outside the rx_vq[0] block, and fully releasing the struct as well. For example:
static void
dpaa2_free_rx_tx_queues(struct rte_eth_dev *dev)
{
@@
- /* Queue allocation base */
- if (priv->rx_vq[0]) {
+ /* Queue allocation base */
+ if (priv->rx_vq[0]) {
@@
dpaa2_queue_storage_free(dpaa2_q,
RTE_MAX_LCORE);
priv->rx_vq[i] = NULL;
@@
rte_free(dpaa2_q->cscn);
priv->tx_vq[i] = NULL;
@@
dpaa2_queue_storage_free(dpaa2_q,
RTE_MAX_LCORE);
priv->tx_conf_vq[i] = NULL;
}
}
- if (priv->flags & DPAAX_RX_ERROR_QUEUE_FLAG) {
- dpaa2_q = priv->rx_err_vq;
- dpaa2_queue_storage_free(dpaa2_q, RTE_MAX_LCORE);
- }
-
/*free memory for all queues (RX+TX) */
rte_free(priv->rx_vq[0]);
priv->rx_vq[0] = NULL;
}
+
+ if (priv->flags & DPAAX_RX_ERROR_QUEUE_FLAG && priv->rx_err_vq) {
+ dpaa2_clear_queue_active_dps(priv->rx_err_vq, RTE_MAX_LCORE);
+ dpaa2_queue_storage_free(priv->rx_err_vq, RTE_MAX_LCORE);
+ rte_free(priv->rx_err_vq);
+ priv->rx_err_vq = NULL;
+ }
}

This ensures the error queue's per-lcore state and struct are always reclaimed, including when queue allocation failed earlier.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In drivers/net/dpaa2/dpaa2_ethdev.c around lines 667 to 694, the rx_err_vq
allocated in dpaa2_alloc_rx_tx_queues is only having its per-lcore storage freed
but the dpaa2_queue struct itself is never rte_free()'d nor its pointer NULLed,
and because current cleanup is nested under the rx_vq[0] block it can leak when
rx_vq[0] is cleared on earlier failure; move unconditional cleanup of
priv->rx_err_vq outside the rx_vq[0] conditional, call
dpaa2_clear_queue_active_dps(priv->rx_err_vq, RTE_MAX_LCORE) (optional but
recommended) then dpaa2_queue_storage_free(priv->rx_err_vq, RTE_MAX_LCORE),
followed by rte_free(priv->rx_err_vq) and set priv->rx_err_vq = NULL so the
struct and pointer are always reclaimed even on partial allocation failure.
| struct rte_eth_dev *eth_dev; | ||
| int ret; | ||
| int ret = 0; | ||
|
|
||
| eth_dev = rte_eth_dev_allocated(dpaa2_dev->device.name); | ||
| if (eth_dev) { | ||
| ret = dpaa2_dev_close(eth_dev); | ||
| if (ret) | ||
| DPAA2_PMD_ERR("dpaa2_dev_close ret= %d", ret); | ||
|
|
||
| ret = rte_eth_dev_release_port(eth_dev); | ||
| } | ||
|
|
||
| eth_dev = dpaa2_dev->eth_dev; | ||
| dpaa2_dev_close(eth_dev); | ||
| dpaa2_valid_dev--; | ||
| if (!dpaa2_valid_dev) | ||
| if (!dpaa2_valid_dev) { | ||
| rte_mempool_free(dpaa2_tx_sg_pool); | ||
| ret = rte_eth_dev_release_port(eth_dev); | ||
| dpaa2_tx_sg_pool = NULL; | ||
| } | ||
|
|
||
| return ret; | ||
| } |
Prevent dpaa2_valid_dev underflow and only refcount when an eth_dev exists
In rte_dpaa2_remove, dpaa2_valid_dev is decremented unconditionally, even when rte_eth_dev_allocated() returns NULL. If a probe failed earlier and no rte_eth_dev was ever registered for this dpaa2_device, you’ll still decrement from 0 to −1. That desynchronizes the reference count from the actual number of live ports and can prevent dpaa2_tx_sg_pool from being freed when the last successfully‑probed device is removed.
You can tighten the logic by only adjusting the refcount (and freeing the pool) when an eth_dev was found:
static int
rte_dpaa2_remove(struct rte_dpaa2_device *dpaa2_dev)
{
- struct rte_eth_dev *eth_dev;
- int ret = 0;
-
- eth_dev = rte_eth_dev_allocated(dpaa2_dev->device.name);
- if (eth_dev) {
- ret = dpaa2_dev_close(eth_dev);
- if (ret)
- DPAA2_PMD_ERR("dpaa2_dev_close ret= %d", ret);
-
- ret = rte_eth_dev_release_port(eth_dev);
- }
-
- dpaa2_valid_dev--;
- if (!dpaa2_valid_dev) {
- rte_mempool_free(dpaa2_tx_sg_pool);
- dpaa2_tx_sg_pool = NULL;
- }
-
- return ret;
+ struct rte_eth_dev *eth_dev;
+ int ret = 0;
+
+ eth_dev = rte_eth_dev_allocated(dpaa2_dev->device.name);
+ if (eth_dev) {
+ ret = dpaa2_dev_close(eth_dev);
+ if (ret)
+ DPAA2_PMD_ERR("dpaa2_dev_close ret= %d", ret);
+
+ ret = rte_eth_dev_release_port(eth_dev);
+
+ dpaa2_valid_dev--;
+ if (dpaa2_valid_dev == 0 && dpaa2_tx_sg_pool) {
+ rte_mempool_free(dpaa2_tx_sg_pool);
+ dpaa2_tx_sg_pool = NULL;
+ }
+ }
+
+ return ret;
}

This keeps the global device count consistent with the actual set of allocated rte_eth_dev instances and avoids leaking or prematurely freeing the shared TX SG mempool.
🤖 Prompt for AI Agents
In drivers/net/dpaa2/dpaa2_ethdev.c around lines 3379 to 3398, dpaa2_valid_dev
is decremented unconditionally even when rte_eth_dev_allocated() returns NULL;
change the logic so the refcount and the pool-freeing happen only when eth_dev
is non-NULL: move the dpaa2_valid_dev-- and the check/free of dpaa2_tx_sg_pool
inside the if (eth_dev) block (so they run only when an eth_dev was found and
release_port was attempted), and leave ret as-is (0) when no eth_dev exists to
avoid underflow and keep the global count consistent with actual allocated
ports.
| if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_CAP_PTP && | ||
| vf->ptp_caps & VIRTCHNL_1588_PTP_CAP_RX_TSTAMP) | ||
| dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP; |
🧩 Analysis chain
Stricter RX timestamp capability check looks right; please confirm ptp_caps init timing
Requiring both VIRTCHNL_VF_CAP_PTP and vf->ptp_caps & VIRTCHNL_1588_PTP_CAP_RX_TSTAMP ensures we only advertise RTE_ETH_RX_OFFLOAD_TIMESTAMP when the PF explicitly reports RX timestamp support, which is a sensible tightening.
One subtle point to double‑check: rte_eth_dev_info_get() can be called before iavf_get_ptp_cap() runs. If vf->ptp_caps is still zero in that phase (or if older PFs don’t populate it), RX timestamp offload might no longer be advertised even though the VF can timestamp packets. If that’s possible, either initialising ptp_caps earlier or falling back to VF_CAP_PTP when ptp_caps == 0 may be needed.
🏁 Script executed:
#!/bin/bash
# Verify where and when vf->ptp_caps is set relative to dev_info_get().
# Show all references to ptp_caps in the iavf driver
rg -n "ptp_caps" drivers/net/intel/iavf -C3
# Also list calls to iavf_get_ptp_cap to see if it’s only invoked from dev_start
rg -n "iavf_get_ptp_cap" drivers/net/intel/iavf -C3Length of output: 5791
🏁 Script executed:
#!/bin/bash
# Find function names and full context around the two key locations
# Context around line 1023 (iavf_get_ptp_cap call)
echo "=== Context around iavf_get_ptp_cap call (line 1023) ==="
sed -n '1000,1040p' drivers/net/intel/iavf/iavf_ethdev.c | head -50
echo ""
echo "=== Context around dev_info check (line 1180) ==="
sed -n '1160,1200p' drivers/net/intel/iavf/iavf_ethdev.c
echo ""
echo "=== Find where ptp_caps might be initialized ==="
rg -n "ptp_caps\s*=" drivers/net/intel/iavf -C2Length of output: 2985
🏁 Script executed:
#!/bin/bash
# Find exact function names
echo "=== Function containing line 1023 ==="
awk 'NR<=1023 {if (/^[a-zA-Z_].*\(/) last=$0} NR==1023 {print last; exit}' drivers/net/intel/iavf/iavf_ethdev.c
echo ""
echo "=== Function containing line 1180 ==="
awk 'NR<=1180 {if (/^[a-zA-Z_].*\(/) last=$0} NR==1180 {print last; exit}' drivers/net/intel/iavf/iavf_ethdev.c
echo ""
echo "=== Search for function signatures ==="
rg -n "^[a-z_]+.*dev_start|^[a-z_]+.*dev_info_get|^[a-z_]+.*iavf_dev_start" drivers/net/intel/iavf/iavf_ethdev.c | head -10
echo ""
echo "=== Check if vf structure is memset/zeroed ==="
rg -n "memset.*vf\)|calloc.*iavf_info\)|malloc.*iavf_info" drivers/net/intel/iavf -C2 | head -30
echo ""
echo "=== Check where vf struct is allocated ==="
rg -n "IAVF_DEV_PRIVATE_TO_VF|iavf_info.*alloc" drivers/net/intel/iavf -C2 | head -20Length of output: 1959
🏁 Script executed:
#!/bin/bash
# Find device private allocation
echo "=== Search for dev_private allocation ==="
rg -n "dev_private.*calloc|dev_private.*malloc|rte_zmalloc.*dev_private|rte_malloc.*dev_private" drivers/net/intel/iavf -C3
echo ""
echo "=== Check probe/init function ==="
rg -n "eth_dev_alloc|eth_dev_pci_allocate|rte_eth_dev_allocate" drivers/net/intel/iavf -C3 | head -30
echo ""
echo "=== Search for device initialization sequence ==="
rg -n "static struct eth_dev_ops|.dev_info_get\s*=|.dev_start\s*=" drivers/net/intel/iavf/iavf_ethdev.c -C1 | head -20Length of output: 283
🏁 Script executed:
#!/bin/bash
# Find probe/attach function and device allocation
echo "=== Search for eth_dev allocation in probe ==="
rg -n "eth_dev_pci_probe|eth_dev_pci_generic_probe|rte_eth_dev_pci_allocate" drivers/net/intel/iavf -C5 | head -40
echo ""
echo "=== Check for secondary/primary function ==="
rg -n "secondary_init|iavf_adapter_init|iavf_init_vf" drivers/net/intel/iavf/iavf_ethdev.c -C3 | head -30
echo ""
echo "=== Look for struct initialization in probe context ==="
rg -n "pci_probe|pci_attach|dev_probe" drivers/net/intel/iavf/iavf_ethdev.c -A20 | head -50
echo ""
echo "=== Check if ptp_caps has default/fallback behavior ==="
rg -n "vf->ptp_caps\s*=\s*0|ptp_caps.*init|ptp_caps.*reset" drivers/net/intel/iavf -C2Length of output: 2757
🏁 Script executed:
#!/bin/bash
# Verify the device private allocation uses calloc (zero-init)
echo "=== Check DPDK rte_eth_dev_pci_generic_probe behavior ==="
echo "Documentation: rte_eth_dev_pci_generic_probe typically uses rte_zmalloc or calloc"
echo "This means dev_private is zero-initialized"
echo ""
echo "=== Verify iavf_init_vf context (called during probe) ==="
sed -n '2535,2700p' drivers/net/intel/iavf/iavf_ethdev.c | head -100
echo ""
echo "=== Check if dev_info_get is ever called before dev_start in DPDK ==="
echo "Standard DPDK flow: probe -> configure -> (can call dev_info_get) -> start"
echo "dev_info_get() is a query that can be called anytime for device capabilities"
echo ""
echo "=== Look for what changed in the condition (git diff or comments) ==="
rg -n "VIRTCHNL_VF_CAP_PTP|VF_CAP_PTP" drivers/net/intel/iavf/iavf_ethdev.c -B2 -A2Length of output: 3699
RX timestamp capability check has a real ordering issue—ptp_caps may be zero when dev_info_get() is called
The verification confirms the original concern is valid. Here's the problem:
- `vf->ptp_caps` is zero-initialized when the device is probed (via `rte_eth_dev_pci_generic_probe`)
- `iavf_get_ptp_cap()` is called only in `iavf_dev_start()` (line 1023), which sets `vf->ptp_caps`
- In DPDK, `dev_info_get()` can be called before `dev_start()` (during the configure phase)
- The new condition at lines 1180-1182 requires both flags to be set; if `ptp_caps` is still 0, the condition fails and `RTE_ETH_RX_OFFLOAD_TIMESTAMP` is not advertised, even though the VF supports it
Fix: Either query ptp_caps during probe/init (not just dev_start), or add fallback logic to dev_info_get() to handle the case where ptp_caps == 0 but VIRTCHNL_VF_CAP_PTP is set.
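A hedged sketch of the fallback option, written as a standalone predicate rather than the actual driver code; the flag constants below are placeholders, not the real virtchnl definitions:

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_PTP_BIT       (1u << 0) /* stand-in for VIRTCHNL_VF_CAP_PTP */
#define CAP_RX_TSTAMP_BIT (1u << 1) /* stand-in for VIRTCHNL_1588_PTP_CAP_RX_TSTAMP */

/* Decide whether dev_info_get() should advertise RTE_ETH_RX_OFFLOAD_TIMESTAMP.
 * If ptp_caps has not been populated yet (the capability query runs in
 * dev_start()), fall back to the coarse VF capability instead of hiding the
 * offload; once ptp_caps is known, require the explicit RX timestamp bit.
 */
static bool
vf_advertises_rx_timestamp(uint32_t vf_cap_flags, uint32_t ptp_caps)
{
	if (!(vf_cap_flags & CAP_PTP_BIT))
		return false;
	if (ptp_caps == 0)
		return true; /* capability query not done yet */
	return (ptp_caps & CAP_RX_TSTAMP_BIT) != 0;
}
```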
🤖 Prompt for AI Agents
In drivers/net/intel/iavf/iavf_ethdev.c around lines 1180–1182, dev_info_get()
currently checks both vf->vf_res->vf_cap_flags and vf->ptp_caps to decide
whether to advertise RTE_ETH_RX_OFFLOAD_TIMESTAMP, but vf->ptp_caps may still be
zero during probe/configure (dev_info_get can be called before dev_start),
causing missed advertisement; fix by either populating vf->ptp_caps earlier
(call the existing iavf_get_ptp_cap() during probe/init so ptp_caps is set
before dev_info_get is called) or add fallback logic in dev_info_get(): if
vf_cap_flags has VIRTCHNL_VF_CAP_PTP and vf->ptp_caps is zero, attempt to
query/set ptp_caps (via the same helper) or conservatively treat the VF as
supporting RX timestamps (set RTE_ETH_RX_OFFLOAD_TIMESTAMP) only when confirmed,
ensuring the code avoids false positives and keeps behavior consistent with how
capabilities are determined elsewhere.
| tg = get_ctx().func_tg | ||
| if tg: | ||
| tg.send_packets(packets, get_ctx().topology.tg_port_egress) |
Silent failure hides configuration errors.
If func_tg is None, this function silently does nothing, which could hide bugs or misconfiguration. Consider either:
- Raising an error if `func_tg` is `None` (consistent with `send_packets_and_capture`), or
- Documenting when `func_tg` can be `None` and why silent failure is appropriate.
If option 1 is preferred:
packets = adjust_addresses(packets)
tg = get_ctx().func_tg
- if tg:
- tg.send_packets(packets, get_ctx().topology.tg_port_egress)
+ assert tg is not None, "No functional traffic generator available"
+ tg.send_packets(packets, get_ctx().topology.tg_port_egress)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| tg = get_ctx().func_tg | |
| if tg: | |
| tg.send_packets(packets, get_ctx().topology.tg_port_egress) | |
| tg = get_ctx().func_tg | |
| assert tg is not None, "No functional traffic generator available" | |
| tg.send_packets(packets, get_ctx().topology.tg_port_egress) |
🤖 Prompt for AI Agents
In dts/api/packet.py around lines 114-116, the current code silently skips
sending when get_ctx().func_tg is None; update it to mirror
send_packets_and_capture by explicitly checking for func_tg and raising a clear
exception (e.g., RuntimeError or ValueError) when it is None, including
contextual info (like topology name or tg_port_egress) in the message so
misconfiguration is visible; replace the silent if-guard with a check that
raises when missing, otherwise call tg.send_packets(...) as before.
| def assess_performance_by_packet( | ||
| packet: Packet, duration: float, send_mpps: int | None = None | ||
| ) -> PerformanceTrafficStats: | ||
| """Send a given packet for a given duration and assess basic performance statistics. | ||
| Send `packet` and assess NIC performance for a given duration, corresponding to the test | ||
| suite's given topology. | ||
| Args: | ||
| packet: The packet to send. | ||
| duration: Performance test duration (in seconds). | ||
| send_mpps: The millions packets per second send rate. | ||
| Returns: | ||
| Performance statistics of the generated test. | ||
| """ | ||
| from framework.testbed_model.traffic_generator.performance_traffic_generator import ( | ||
| PerformanceTrafficGenerator, | ||
| ) | ||
|
|
||
| assert isinstance( | ||
| get_ctx().perf_tg, PerformanceTrafficGenerator | ||
| ), "Cannot send performance traffic with non-performance traffic generator" | ||
| tg: PerformanceTrafficGenerator = cast(PerformanceTrafficGenerator, get_ctx().perf_tg) | ||
| # TODO: implement @requires for types of traffic generator | ||
| return tg.calculate_traffic_and_stats(packet, duration, send_mpps) |
🧩 Analysis chain
Missing address adjustment for packet.
Unlike send_packets_and_capture (line 95) and send_packets (line 113), this function doesn't call adjust_addresses(packet) before sending. This inconsistency could result in incorrect L2/L3 addresses in performance tests.
Verify whether performance traffic requires address adjustment by checking if the underlying calculate_traffic_and_stats handles addressing or if callers are expected to provide properly addressed packets:
🏁 Script executed:
```shell
#!/bin/bash
# Search for usages of assess_performance_by_packet to see if callers pre-adjust packets
rg -n -C5 'assess_performance_by_packet'
# Search for how calculate_traffic_and_stats is used in other contexts
rg -n -C5 'calculate_traffic_and_stats'
```
Length of output: 6054
🏁 Script executed:
```shell
#!/bin/bash
# Get broader context around the assess_performance_by_packet call in the test
head -80 dts/tests/TestSuite_single_core_forward_perf.py | tail -50
# Search for _generate_traffic implementation to see if it handles address adjustment
rg -n -A10 'def _generate_traffic' dts/framework/testbed_model/traffic_generator/
```
Length of output: 193
---
🏁 Script executed:
```shell
#!/bin/bash
# Find the actual paths using fd
fd -t f 'packet.py' | head -5
fd -t f 'TestSuite_single_core_forward_perf.py' | head -5
fd -t f 'performance_traffic_generator.py' | head -5
# Search for _generate_traffic implementation across codebase
rg -n 'def _generate_traffic' | head -20
```
Length of output: 229
---
🏁 Script executed:
```shell
#!/bin/bash
# Get context around the assess_performance_by_packet call
sed -n '40,80p' dts/tests/TestSuite_single_core_forward_perf.py
# Get _generate_traffic implementation
rg -n -A20 'def _generate_traffic' dts/framework/testbed_model/traffic_generator/trex.py
# Compare with how send_packets and send_packets_and_capture use adjust_addresses
sed -n '90,120p' dts/api/packet.py
```
Length of output: 3070
---
🏁 Script executed:
```shell
#!/bin/bash
# Find and show adjust_addresses implementation
rg -n -A15 'def adjust_addresses' dts/api/packet.py
# Check if there are any comments or patterns explaining why perf_tg differs from func_tg
rg -n -B5 -A5 'perf_tg|func_tg' dts/api/packet.py | head -50
```
Length of output: 2215
---
🏁 Script executed:
```shell
#!/bin/bash
# Look for other usages of assess_performance_by_packet or perf_tg.calculate_traffic_and_stats
rg -n 'perf_tg|assess_performance_by_packet' --type py -C3 | head -100
# Check _create_packet_stream implementation to see if it handles addressing
rg -n -A20 'def _create_packet_stream' dts/framework/testbed_model/traffic_generator/trex.py
```
Length of output: 6301
Missing address adjustment for packet.
Unlike send_packets_and_capture (line 95) and send_packets (line 113), this function doesn't call adjust_addresses(packet) before sending. The adjust_addresses function performs L2 and L3 address adjustments for the two-link topology, which the performance test also requires. Without this adjustment, performance tests will send packets with placeholder addresses instead of topology-aware addresses, potentially invalidating performance measurements.
🤖 Prompt for AI Agents
In dts/api/packet.py around lines 327 to 352, the function
assess_performance_by_packet omits the topology address adjustment done in other
send helpers; call adjust_addresses(packet) immediately before sending so the
packet gets L2/L3 addresses for the two-link topology (same as
send_packets_and_capture/send_packets). Add the call (adjust_addresses(packet))
just before invoking tg.calculate_traffic_and_stats; ensure adjust_addresses is
available in scope (import or use the existing helper in this module) and that
the function uses the adjusted packet passed to the traffic generator.
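A sketch of the change the prompt describes, assuming `adjust_addresses` accepts a single `Packet` (if it only takes a list, the call would need to wrap and unwrap the packet); this is illustrative, not the code committed in this series:
```python
def assess_performance_by_packet(
    packet: Packet, duration: float, send_mpps: int | None = None
) -> PerformanceTrafficStats:
    """Sketch of the suggested fix; not the code committed in this series."""
    from framework.testbed_model.traffic_generator.performance_traffic_generator import (
        PerformanceTrafficGenerator,
    )

    assert isinstance(
        get_ctx().perf_tg, PerformanceTrafficGenerator
    ), "Cannot send performance traffic with non-performance traffic generator"
    tg = cast(PerformanceTrafficGenerator, get_ctx().perf_tg)
    # Suggested addition: apply the same topology-aware L2/L3 address rewrite
    # that send_packets()/send_packets_and_capture() perform before sending.
    packet = adjust_addresses(packet)  # assumption: a single-packet form exists
    return tg.calculate_traffic_and_stats(packet, duration, send_mpps)
```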
NOTE: This is an auto submission for "net/mlx5: fix IP tunnel detection for IPIP and IPv6 ENCAP".
See "http://patchwork.dpdk.org/project/dpdk/list/?series=36752" for details.