Conversation

@ovsrobot ovsrobot commented Nov 19, 2025

NOTE: This is an auto submission for "net/txgbe: fix the missing old mailbox interface calls".

See "http://patchwork.dpdk.org/project/dpdk/list/?series=36750" for details.

Summary by CodeRabbit

  • New Features

    • Added configurable DCB forwarding with per-traffic-class core assignment.
    • Added BPF ELF program loading tests for packet filtering.
  • Bug Fixes

    • Fixed memory allocation and timestamp validation issues.
    • Improved error handling in various driver paths.
  • Documentation

    • Updated configuration guides for DTS and traffic generator setup.
    • Enhanced network interface documentation with examples.
  • Tests

    • Added AES-CCM cryptographic test cases.

bruce-richardson and others added 30 commits November 12, 2025 17:21
The capabilities flags for the vector offload path include the QinQ
offload capability, but in fact the offload path lacks any ability to
create context descriptors. This means that it cannot insert multiple
vlan tags for QinQ support, so move the offload from the VECTOR_OFFLOAD
list to the NO_VECTOR list. Similarly, remove any check for the QinQ
mbuf flag in any packets being transmitted, since that offload is
invalid to request if the feature is not enabled.

Fixes: 808a17b ("net/ice: add Rx AVX512 offload path")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
The common Rx path selection logic iterates through an array of
candidate paths and selects the best fit for the requested features.
Currently, in the event that two potential candidates are identified,
the one with the fewer offloads (and thus less complex path) is
selected. However, this is not correct, because if the path with more
offloads has a greater SIMD width, that should be chosen. This commit
reworks the logic so that the number of offloads is only taken into
consideration when choosing between two paths with the same SIMD width.

Since the path arrays are ordered from lowest SIMD width to highest,
and vector paths tend to have fewer offloads enabled than scalar paths,
"new" candidate paths with greater SIMD widths tended to have fewer or
the same number of offloads as the "current" candidate paths and thus
were correctly accepted as the best candidate. For this reason the
incorrect logic did not cause any incorrect path selections in practice.

Fixes: 9d99641 ("net/intel: introduce infrastructure for Rx path selection")

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
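The reworked selection rule above can be sketched as a small comparison helper. This is a hypothetical simplification (struct and function names are illustrative, not the actual driver code): SIMD width dominates, and offload count only breaks ties between equal widths.

```c
#include <stdbool.h>

/* Illustrative candidate-path descriptor, not the real driver struct. */
struct rx_path {
	int simd_width;   /* e.g. 128, 256, 512 */
	int num_offloads; /* number of offloads enabled on this path */
};

static bool
candidate_is_better(const struct rx_path *cand, const struct rx_path *best)
{
	/* A greater SIMD width always wins, regardless of offload count. */
	if (cand->simd_width != best->simd_width)
		return cand->simd_width > best->simd_width;
	/* Same SIMD width: prefer the simpler path with fewer offloads. */
	return cand->num_offloads < best->num_offloads;
}
```

With this ordering, a wider path with more offloads is still chosen over a narrower, simpler one, which is the case the old logic got wrong.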
Add methods to Linux Session class for setting the link
of an interface up and deleting an interface.

Signed-off-by: Dean Marx <dmarx@iol.unh.edu>
Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add a command to the testpmd shell for setting the portlist
(list of forwarding ports) within a testpmd session. This
allows for changing the forwarding order between ports.

Signed-off-by: Dean Marx <dmarx@iol.unh.edu>
Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add test suite covering virtio-user/vhost-user
server/client forwarding scenarios with
testpmd packet validation.

Signed-off-by: Dean Marx <dmarx@iol.unh.edu>
Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Add a section to the dts rst under How to Write a Test Suite
which provides an example for how to write a test case
docstring, including a steps and verify section.

Signed-off-by: Dean Marx <dmarx@iol.unh.edu>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Current documentation for protocol agnostic filtering for ICE driver is a
bit terse and relies on a lot of assumed knowledge. Document the feature
better and make all of the assumptions explicit.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
The statistics contain 40 bits. The lower 32 bits are read first, followed
by the upper 8 bits.

In some cases, after reading the lower 32 bits, a carry occurs from
the lower bits before the upper 8 bits are read, which causes the
final statistics to be incorrect.

This commit fixes this issue.

Fixes: a37bde5 ("net/ice: support statistics")
Cc: stable@dpdk.org

Signed-off-by: Zhichao Zeng <zhichaox.zeng@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
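A common way to make such a split read carry-safe is to re-read the high part until it is stable. The sketch below models the 40-bit counter with a simulated register pair; the names and the re-read loop are illustrative, not the actual ice driver code.

```c
#include <stdint.h>

/* Simulated 40-bit hardware counter, split into a 32-bit low register
 * and an 8-bit high register (illustrative stand-ins, not ice code). */
static uint64_t sim_counter;

static uint32_t hw_read_lo(void) { return (uint32_t)(sim_counter & 0xffffffffu); }
static uint8_t  hw_read_hi(void) { return (uint8_t)((sim_counter >> 32) & 0xff); }

static uint64_t
read_40bit_counter(void)
{
	uint8_t hi, hi2;
	uint32_t lo;

	/* Re-read the high byte until it is stable so a carry out of the
	 * low 32 bits between the two reads cannot produce a torn value. */
	do {
		hi = hw_read_hi();
		lo = hw_read_lo();
		hi2 = hw_read_hi();
	} while (hi != hi2);

	return ((uint64_t)hi2 << 32) | lo;
}
```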
As per recent change by the following commit:

commit 066f3d9 ("ethdev: remove callback checks from fast path")

the framework unconditionally invokes dev->tx_pkt_prepare.
Due to this, the ICE driver crashes, as tx_pkt_prepare
was set to NULL during initialization.

Ensure dev->tx_pkt_prepare is not NULL when vector or simple
Tx paths are selected, by assigning rte_eth_tx_pkt_prepare_dummy.

This aligns with the expectations of the above-mentioned commit.

Bugzilla ID: 1795
Fixes: 6eac0b7 ("net/ice: support advance Rx/Tx")
Cc: stable@dpdk.org

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Tested-by: Hailin Xu <hailinx.xu@intel.com>
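The pattern behind this fix (and the ixgbe and fm10k fixes below) can be sketched with simplified stand-in types: since ethdev no longer NULL-checks the callback on the fast path, paths that need no preparation must install a dummy callback that accepts all packets, as rte_eth_tx_pkt_prepare_dummy does.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the rte_eth types, for illustration only. */
struct mbuf;
typedef uint16_t (*tx_prepare_t)(void *txq, struct mbuf **pkts, uint16_t nb);

static uint16_t
tx_pkt_prepare_dummy(void *txq, struct mbuf **pkts, uint16_t nb_pkts)
{
	(void)txq;
	(void)pkts;
	return nb_pkts; /* nothing to fix up: accept all packets */
}

struct eth_dev { tx_prepare_t tx_pkt_prepare; };

static void
select_simple_tx_path(struct eth_dev *dev)
{
	/* Never leave the callback NULL: the fast path no longer checks. */
	dev->tx_pkt_prepare = tx_pkt_prepare_dummy;
}
```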
As per recent change by the following commit:

commit 066f3d9 ("ethdev: remove callback checks from fast path")

framework unconditionally invokes dev->tx_pkt_prepare. Ensure
dev->tx_pkt_prepare is not NULL when vector or simple TX paths are
selected, by assigning rte_eth_tx_pkt_prepare_dummy.

This aligns with the expectations of the above-mentioned commit.

Fixes: 7829b8d ("net/ixgbe: add Tx preparation")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
As per recent change by the following commit:

commit 066f3d9 ("ethdev: remove callback checks from fast path")

framework unconditionally invokes dev->tx_pkt_prepare. Ensure
dev->tx_pkt_prepare is not NULL when vector or simple TX paths are
selected, by assigning rte_eth_tx_pkt_prepare_dummy.

This aligns with the expectations of the above-mentioned commit.

Fixes: 9b134aa ("net/fm10k: add Tx preparation")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
Create a new default tree for the SDP interface if more than one Tx
queue is requested. This helps to back pressure each queue independently
when they are created with separate channels.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Drain all CQ buffers and close the CQ when an SQ with completions
enabled is about to stop.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Demote the scatter check to a warning for SDP interfaces instead of
an error, to support cases where the host application is already aware
of the max buffer size.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
This patch fixes the inline device functions to work
when roc_nix is NULL.

Fixes: f81ee71 ("common/cnxk: support inline SA context invalidate")
Cc: stable@dpdk.org

Signed-off-by: Monendra Singh Kushwaha <kmonendra@marvell.com>
LSO is enhanced to support flags modification. A new mbox is added to
enable this feature.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Extend LSO offload to support IPv4 fragmentation.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
While performing IPv4 fragmentation, consider the DF flag from the
original packet header instead of setting it to zero.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
The SQ context is extended with a new feature: if enabled, the counter
is updated when a packet is processed, whether it is transmitted or
dropped.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Add a function to check whether the board supports 16B alignment.

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
On CN20K SoC, back pressure can be configured for eight
different traffic classes per pool along with threshold and
BPIDs.

RoC API is added to configure the same.

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
By default, the SQB pool is created with max (512 buffers) +
extra threshold buffers, and the aura limit is set to 512 + thr.

But during cleanup, the aura limit is reset to max (512 buffers)
only, before destroying the pool.

Hence, while destroying the pool, only 512 buffers are cleaned
from the aura and the extra threshold buffers are left as is.

At a later stage, if the same SQB pool is created, the HW
throws an error for the extra threshold buffers, as they are
already in the pool.

Fixes: 780f90e ("common/cnxk: restore NIX SQB pool limit before destroy")
Cc: stable@dpdk.org

Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
Add support for SQ resize by allocating SQB memory in chunks of
SQB size.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
CN10K platform supports Tx schedulers up to 2K.

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Fix Klocwork issue for NULL pointer dereference.

Signed-off-by: Rakesh Kudurumalla <rkudurumalla@marvell.com>
Add a condition to check that the SQ is non-NULL before access. Also,
pktio locks are simplified while doing the threshold_profile config.

Fixes: 90a903f ("common/cnxk: split NIX TM hierarchy enable API")
Cc: stable@dpdk.org

Signed-off-by: Satha Rao <skoteshwar@marvell.com>
Aura field width has changed from 20 bits to 17 bits for
cn20k.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Fix issue reported by klocwork.

Fixes: f410059 ("common/cnxk: support inline inbound queue")
Cc: stable@dpdk.org

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Fix format specifier for bandwidth profile ID.

Fixes: db5744d ("common/cnxk: support NIX debug for CN20K")
Cc: stable@dpdk.org

Signed-off-by: Aarnav JP <ajp@marvell.com>
Fix CPT res address config logic to avoid garbage values
and trigger only when inline dev is present.

Fixes: 3c31a74 ("common/cnxk: config CPT result address for CN20K")
Cc: stable@dpdk.org

Signed-off-by: Aarnav JP <ajp@marvell.com>
roidayan and others added 27 commits November 18, 2025 14:19
The cited commit removed the representor interrupt
handler cleanup by mistake.

Fixes: 5cf0707 ("net/mlx5: remove Rx queue data list from device")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
When creating a new mempool but assigning shared entries from a
different mempool, the newly allocated but unused entries need to be
released. Fix it.

Fixes: 8947eeb ("common/mlx5: fix shared memory region ranges allocation")

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Suanming Mou <suanmingm@nvidia.com>
`buddy` was erroneously declared as static. When multiple
threads call this routine, they set the same static variable,
corrupting pool data, and can cause a potential double free when
releasing resources.

Fixes: b4dd7bc ("net/mlx5/hws: add pool and buddy")

Signed-off-by: Nupur Uttarwar <nuttarwar@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
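The bug class is easy to reproduce in miniature. The sketch below uses made-up types (not the hws code): a `static` local is one shared variable for the whole process, so concurrent callers clobber each other's pointer; automatic storage gives each call its own.

```c
/* Illustrative pool type, not the actual mlx5/hws structure. */
struct buddy { int order; };

static struct buddy *
lookup_buddy(struct buddy *pools[], int idx)
{
	/* Before the fix this was `static struct buddy *buddy;` --
	 * every thread wrote the same variable, corrupting pool
	 * bookkeeping and enabling a double free on release. */
	struct buddy *buddy = pools[idx];  /* fixed: automatic storage */
	return buddy;
}
```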
When mlx5_dev_start() fails partway through initialization, the error
cleanup code unconditionally calls cleanup functions for all steps,
including those that were never successfully initialized. This causes
state corruption leading to incorrect behavior on subsequent start
attempts.

The issue manifests as:
1. First start attempt fails with -ENOMEM (expected)
2. Second start attempt returns -EINVAL instead of -ENOMEM
3. With flow isolated mode, second attempt incorrectly succeeds,
   leading to segfault in rte_eth_rx_burst()

Root cause: The single error label cleanup path calls functions like
mlx5_traffic_disable() and mlx5_flow_stop_default() even when their
corresponding initialization functions (mlx5_traffic_enable() and
mlx5_flow_start_default()) were never called due to earlier failure.

For example, when mlx5_rxq_start() fails:
- mlx5_traffic_enable() at line 1403 never executes
- mlx5_flow_start_default() at line 1420 never executes
- But cleanup unconditionally calls:
  * mlx5_traffic_disable() - destroys control flows list
  * mlx5_flow_stop_default() - corrupts flow metadata state

This corrupts the device state, causing subsequent start attempts to
fail with different errors or, in isolated mode, to incorrectly succeed
with an improperly initialized device.

Fix by replacing the single error label with cascading error labels
(Linux kernel style). Each label cleans up only its corresponding step,
then falls through to clean up earlier steps.
This ensures only successfully initialized steps are cleaned up,
maintaining device state consistency across failed start attempts.

Bugzilla ID: 1419
Fixes: 8db7e3b ("net/mlx5: change operations for non-cached flows")
Cc: stable@dpdk.org

Signed-off-by: Maayan Kashani <mkashani@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
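The cascading-error-label style adopted by the fix can be sketched as follows. This is a generic illustration with made-up step names, not the mlx5_dev_start() code: each label undoes only its own step, then falls through to undo the earlier ones, so steps that never ran are never "cleaned up".

```c
/* Track which steps ran, so cleanup correctness is observable. */
static int step_a_ok, step_b_ok;

static int  step_a(void) { step_a_ok = 1; return 0; }
static void undo_a(void) { step_a_ok = 0; }
static int  step_b(void) { step_b_ok = 1; return 0; }
static void undo_b(void) { step_b_ok = 0; }
static int  step_c(void) { return -1; /* simulate a failure */ }

static int
dev_start(void)
{
	if (step_a() != 0)
		goto error;
	if (step_b() != 0)
		goto error_undo_a;
	if (step_c() != 0)
		goto error_undo_b;
	return 0;

error_undo_b:
	undo_b();	/* only reached if step_b() succeeded */
error_undo_a:
	undo_a();	/* only reached if step_a() succeeded */
error:
	return -1;
}
```

A single shared `error:` label, by contrast, would call `undo_b()` even when `step_b()` had never run, which is exactly the state corruption described above.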
GCC 16 detects the use of an uninitialized variable: if the retry
loop exits, the code would memcmp against an uninitialized stack
value. Resolve by initializing it to zero.

Bugzilla ID: 1823
Fixes: 1256805 ("net/mlx5: move Linux-specific functions")
Fixes: cfee947 ("net/mlx5: fix link status to use wait to complete")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Raslan Darawsheh <rasland@nvidia.com>
If flow template table was created with resizable attribute,
then completing table resize could fail for 2 user-related reasons:

- not all flow rules were yet updated to use the resized table,
- resize was not started.

Both of these were reported with the same error message
i.e., "cannot complete table resize".

Since PMD can distinguish these 2 cases, this patch improves the error
reporting to report these 2 errors separately.

Also, this patch removes redundant __rte_unused on device parameter.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Gregory Etelson <getelson@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
Offending patch introduced support for additional flow tag indexes
with HW Steering flow engine. New tag indexes will be mapped
to HW registers REG_C_8 to REG_C_11, depending on HW capabilities.

That patch only handled tables created on group > 0 (non-root table),
where mlx5 PMD directly configures the HW.
Tables and flow rules on group 0 (root table) are handled through
kernel driver, and new registers were not addressed for that case.

Because of that, usage of unsupported tag index in group 0
triggered an assertion in flow_dv_match_meta_reg().

This patch adds necessary definitions for REG_C_8 to REG_C_11 to make
these registers usable for flow tag indexes in root table.
Validation of flow tag to HW register translation is also amended
to report invalid cases to the user, instead of relying on assertions.

Fixes: 7e3a144 ("net/mlx5/hws: support 4 additional C registers")
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Some cases are not supported for rule hash calculation, for example
when the matcher is defined as an FW matcher, or when the hash type is
different than CRC32. Another case is when the distribution mode is not
by hash; there, the previous condition checked a wrong capability, and
this commit fixes it to use the correct check.

Fixes: 7f5e6de ("net/mlx5/hws: query flow rule hash")
Cc: stable@dpdk.org

Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
GCC analyzer identified a code path where acts->mhdr could be
NULL when dereferenced.
When modify header validation fails in mlx5_tbl_translate_modify_header(),
 __flow_hw_action_template_destroy() sets acts->mhdr to NULL.
Add defensive NULL check in mlx5_tbl_ensure_shared_modify_header()
to prevent the dereference.

Bugzilla ID: 1521
Fixes: 12f2ed3 ("net/mlx5: set modify header as shared flow action")
Cc: stable@dpdk.org

Signed-off-by: Shani Peretz <shperetz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
If flow isolation is enabled, skip flow_hw_create_ctrl_rx_tables,
because these tables are not used with flow isolation. This saves
unneeded resource allocation and also speeds up the device startup
time.

Fixes: 9fa7c1c ("net/mlx5: create control flow rules with HWS")
Cc: stable@dpdk.org

Signed-off-by: Nupur Uttarwar <nuttarwar@nvidia.com>
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
The mlx5_ipool_free() function was called with a NULL pool pointer
during HW flow destruction, causing a segmentation fault. This occurred
when flow creation failed and the cleanup path attempted to free
resources from an uninitialized flow pool.

The crash happened in the following scenario:
1. During device start, a default NTA copy action flow is created
2. If the flow creation fails, mlx5_flow_hw_list_destroy() is called
3. In hw_cmpl_flow_update_or_destroy(), table->flow pool could be NULL
4. mlx5_ipool_free(table->flow, flow->idx) was called without checking
   if table->flow is NULL
5. Inside mlx5_ipool_free(), accessing pool->cfg.per_core_cache caused
   a segmentation fault due to NULL pointer dereference

The fix adds two layers of protection:
1. Add NULL check for table->flow before calling mlx5_ipool_free() in
   hw_cmpl_flow_update_or_destroy(), consistent with the existing check
   for table->resource on the previous line
2. Add NULL check for pool parameter in mlx5_ipool_free() as a defensive
   measure to prevent similar crashes in other code paths

The fix also renames the 'flow' field in rte_flow_template_table
to 'flow_pool' for better code readability.

Stack trace of the fault:
  mlx5_ipool_free (pool=0x0) at mlx5_utils.c:753
  hw_cmpl_flow_update_or_destroy at mlx5_flow_hw.c:4481
  mlx5_flow_hw_destroy at mlx5_flow_hw.c:14219
  mlx5_flow_hw_list_destroy at mlx5_flow_hw.c:14279
  flow_hw_list_create at mlx5_flow_hw.c:14415
  mlx5_flow_start_default at mlx5_flow.c:8263
  mlx5_dev_start at mlx5_trigger.c:1420

Fixes: 27d171b ("net/mlx5: abstract flow action and enable reconfigure")
Cc: stable@dpdk.org

Signed-off-by: Maayan Kashani <mkashani@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
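The defensive layer of the fix is the classic "free tolerates NULL" pattern, sketched here with made-up types (not the mlx5 ipool code): cleanup paths after a failed creation can then call the free helper unconditionally.

```c
#include <stddef.h>

/* Illustrative pool type; the real mlx5_indexed_pool is far richer. */
struct ipool { int per_core_cache; };

static int freed; /* observable side effect for the sketch */

static void
ipool_free(struct ipool *pool, unsigned int idx)
{
	/* Guard before touching pool->cfg-like members: a NULL pool
	 * means the flow pool was never initialized, so there is
	 * nothing to release. idx 0 is treated as invalid here. */
	if (pool == NULL || idx == 0)
		return;
	/* ... real entry-release logic would go here ... */
	freed++;
}
```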
Since the auxiliary structure is associated with each rule, it can be
allocated in the same ipool allocation to save the extra overhead
of the *alloc header and the unneeded CPU cycles.

Fixes: 27d171b ("net/mlx5: abstract flow action and enable reconfigure")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Currently the Windows implementation hardcodes match criteria to
MLX5_MATCH_OUTER_HEADERS when creating flow rules, which prevents
matching on inner headers and other criteria types like NVGRE.

The fix uses the matcher's match_criteria_enable attribute instead
of hardcoding OUTER_HEADERS, and moves the assignment outside the
action switch block to apply to all cases.

NVGRE item type is also added to the supported items list.

Fixes: 1d19449 ("net/mlx5: create flow rule on Windows")
Cc: stable@dpdk.org

Signed-off-by: Itai Sharoni <isharoni@nvidia.com>
Acked-by: Bing Zhao <bingz@nvidia.com>
Update the build configuration for Microsoft Azure Cobalt 100 SoC to
use a CPU-specific mcpu value supported by GCC 14+.

Signed-off-by: Doug Foster <doug.foster@arm.com>
Reviewed-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
The array used to save the flow rule pointers was allocated with an
incorrect length: space for one more rule should be appended, not one
more byte.

Fixes: 070316d ("app/flow-perf: add multi-core rule insertion and deletion")
Cc: stable@dpdk.org

Signed-off-by: Bing Zhao <bingz@nvidia.com>
Reviewed-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Wisam Jaddo <wisamm@nvidia.com>
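The bug class is worth spelling out, since `+ 1` binds to bytes, not elements, when placed outside the multiplication. A minimal sketch (names are illustrative, not the flow-perf code):

```c
#include <stdlib.h>

struct rte_flow; /* opaque for the sketch; only pointers are stored */

static struct rte_flow **
alloc_flows_list_buggy(size_t rules_count)
{
	/* BUG: appends one byte, not room for one more pointer. */
	return malloc(rules_count * sizeof(struct rte_flow *) + 1);
}

static struct rte_flow **
alloc_flows_list_fixed(size_t rules_count)
{
	/* Correct: one extra element, e.g. for a terminator slot. */
	return malloc((rules_count + 1) * sizeof(struct rte_flow *));
}
```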
On a ConnectX-7 multi-host setup, the PF index is not contiguous;
users need to query the PF index from the related sysfs entries:
    cat /sys/class/net/*/phys_port_name

Example output is 0 and 2 for PF1 and PF2; then use PF indexes 0 and 2:
   -a <Primary_PCI_BDF>,representor=pf[0,2]vf[0-2]

Signed-off-by: Jiawei Wang <jiaweiw@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
As specified in RFC 4291 section 2.5.1, link local addresses must be
generated based on a modified EUI-64 interface identifier:

> Modified EUI-64 format interface identifiers are formed by inverting
> the "u" bit (universal/local bit in IEEE EUI-64 terminology) when
> forming the interface identifier from IEEE EUI-64 identifiers.

This translates to 'mac->addr_bytes[0] ^= 0x02'.

Fixes: 3d6d85f ("net: add utilities for well known IPv6 address types")
Cc: stable@dpdk.org

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
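The full RFC 4291 §2.5.1 construction the fix restores can be sketched as below: insert the `ff:fe` EUI-64 filler between the two MAC halves and invert the universal/local "u" bit of the first octet (the `^= 0x02` from the commit message). This is a generic illustration, not the DPDK net library code.

```c
#include <stdint.h>
#include <string.h>

/* Build an fe80::/64 link-local address from a 48-bit MAC using the
 * modified EUI-64 interface identifier (RFC 4291 section 2.5.1). */
static void
ipv6_llocal_from_mac(const uint8_t mac[6], uint8_t ip[16])
{
	memset(ip, 0, 16);
	ip[0] = 0xfe;            /* fe80::/64 link-local prefix */
	ip[1] = 0x80;
	ip[8]  = mac[0] ^ 0x02;  /* invert the universal/local "u" bit */
	ip[9]  = mac[1];
	ip[10] = mac[2];
	ip[11] = 0xff;           /* EUI-64 filler */
	ip[12] = 0xfe;
	ip[13] = mac[3];
	ip[14] = mac[4];
	ip[15] = mac[5];
}
```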
Add start tx_first method to testpmd shell, which sends
a specified number of burst packets prior to starting
packet forwarding.

Signed-off-by: Dean Marx <dmarx@iol.unh.edu>
Reviewed-by: Luca Vizzarro <luca.vizzarro@arm.com>
Reviewed-by: Patrick Robb <probb@iol.unh.edu>
Mock the Pydantic import so that even when Pydantic
is available on the system, it is not loaded by
Sphinx, ensuring we perform the doc build without
Pydantic regardless of the environment.

Signed-off-by: Patrick Robb <probb@iol.unh.edu>
Some maintainers known to be inactive for at least a year
are removed to make clearer where help is needed.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are warnings from Coverity and other tools that the handling
of intermediate bursts may wrap around. This shouldn't be possible,
but use unsigned int to avoid any issues; it is just as fast.

Coverity issue: 499471
Fixes: 0dea03e ("pdump: remove use of VLA")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
There is a race where the request to disable pdump may get ahead
of the handling of pdump requests in dumpcap. The fix is to do
local removal of callbacks before forwarding the same to the
secondary process.

To reproduce:
   1. Start testpmd and start traffic
   2. Start dumpcap to capture
   3. Interrupt dumpcap with ^C

Testpmd will show missing response and dumpcap will show error:
EAL: Cannot find action: mp_pdump

Only reproducible if additional logging is not enabled.

Fixes: c3ceb87 ("pdump: forward callback enable to secondary process")

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Create an ELF file to load, using clang.
Repackage the object into a C array using xxd.
Write a test to load and run the BPF program.

If libelf library is not available, then DPDK bpf
will return -ENOTSUP to the test and the test will be skipped.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
Tested-by: Marat Khalili <marat.khalili@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
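The build flow the commit describes might look roughly like this. Paths and flags are assumptions for illustration, not the exact meson rules in the patch:

```shell
# Compile the test program with clang's BPF target, producing an ELF
# object, then repackage it as a C byte array with xxd so the test
# binary can embed and load it.
clang -O2 -target bpf -c app/test/bpf/filter.c -o filter.o
xxd -i filter.o > filter_elf.h   # emits: unsigned char filter_o[] = { ... };
```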
New test using null device to test filtering with BPF.

If libelf library is not available, then DPDK bpf
will return -ENOTSUP to the test and the test will be skipped.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Marat Khalili <marat.khalili@huawei.com>
Tested-by: Marat Khalili <marat.khalili@huawei.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
This adds the BlueField-4 device ID to the list of
NVIDIA devices that run the mlx5 drivers.
The device is still in the development stage.

Signed-off-by: Raslan Darawsheh <rasland@nvidia.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
There are some SW-FW interactions that still call the old mailbox
interface function, which is used for SP devices. This causes the
interaction command to time out.

Adjust the interaction flow to use a unified function pointer.

Fixes: 6a139ad ("net/txgbe: add new SW-FW mailbox interface")
Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
@sourcery-ai sourcery-ai bot left a comment

Sorry @ovsrobot, your pull request is larger than the review limit of 150000 diff characters


coderabbitai bot commented Nov 19, 2025

Walkthrough

Release candidate version bump to 25.11.0-rc3 with maintainer updates, DCB forwarding configuration enhancements in testpmd, BPF ELF loading test infrastructure, cryptodev test coverage expansion, significant MLx5 driver refactoring (representor matching removal, flow API changes), CNXK driver enhancements (LSO formats, queue management, SQ resizing), and various driver fixes across Intel, Mellanox, and DPAA2 platforms, plus documentation updates for DTS and NIC features.

Changes

Cohort / File(s) Change Summary
Version and Metadata
VERSION, .mailmap, MAINTAINERS
Version bump to rc3; mailmap entries added for three contributors; four maintainers removed from MAINTAINERS (Christian Ehrhardt, Michael Santana, Tyler Retzlaff ×2).
DCB Forwarding Configuration
app/test-pmd/cmdline.c, app/test-pmd/config.c, app/test-pmd/testpmd.h, app/test-pmd/testpmd.c
Added DCB forwarding CLI commands for per-TC configuration (set dcb fwd_tc, set dcb fwd_tc_cores); new global variables dcb_fwd_tc_mask and dcb_fwd_tc_cores; validation and remapping logic for per-TC core distribution; added documentation in testpmd_funcs.rst.
Memory Allocation & Flow Fixes
app/test-flow-perf/main.c, app/test-pmd/cmd_flex_item.c
Fixed flows_list allocation in insert_flows; added pre-convocation length check for flex item spec/mask/last fields.
BPF ELF Loading
app/test/bpf/filter.c, app/test/bpf/load.c, app/test/bpf/meson.build, app/test/meson.build, app/test/test_bpf.c
New BPF test programs (filter, load) with ELF compilation via clang and xxd; added test_bpf_elf autotest with temporary file lifecycle management.
Cryptodev Tests
app/test/test_cryptodev.c, app/test/test_cryptodev_aead_test_vectors.h
Extended CCM-128 authentication test suite with new test case 128_4 for both encryption and decryption.
IPv6 Test Update
app/test/test_net_ip6.c
Updated IPv6 address hextet in test_ipv6_llocal_from_ethernet from 0x047b to 0x067b.
ARM Configuration
config/arm/meson.build
Updated ARM MCU cobalt-100 configuration: mcpu name change, SoC extra march features added, obsolete mcpu_cobalt100 entry removed.
DTS Documentation
doc/api/dts/tests.TestSuite_single_core_forward_perf.rst, doc/api/dts/tests.TestSuite_virtio_fwd.rst, doc/guides/tools/dts.rst
New Sphinx documentation files for test suites; updated DTS guide with TRex dependencies, configuration paths refactoring, test case docstring examples.
Autodoc Configuration
doc/guides/conf.py
Extended autodoc_mock_imports to include pydantic and pydantic_core.
Network Driver Documentation
doc/guides/nics/ice.rst, doc/guides/nics/mlx5.rst, doc/guides/prog_guide/ethdev/ethdev.rst
Expanded ICE raw pattern documentation with workflow and action support details; updated MLx5 dv_flow_en defaults and representor syntax; clarified ethdev representor syntax with updated examples.
Release Notes
doc/guides/rel_notes/deprecation.rst, doc/guides/rel_notes/release_25_11.rst
Removed repr_matching_en deprecation notice; added removal notice for MLx5 repr_matching_en device argument.
FSLMC Bus Driver
drivers/bus/fslmc/bus_fslmc_driver.h, drivers/bus/fslmc/fslmc_bus.c
Removed anonymous union from rte_dpaa2_device exposing per-type device pointers; replaced no-op plug/unplug functions with full probing/removal workflows.
CNXK NIX Hardware
drivers/common/cnxk/hw/nix.h
Restructured CN20K SQ context fields (dse_rsvd1, update_sq_count, seb_count, sq_count_iova regions); updated LSO format with ALT_FLAGS-based scheme and new typedef nix_lso_alt_flg_format_t.
CNXK CPT Configuration
drivers/common/cnxk/roc_cpt.c, drivers/common/cnxk/roc_cpt.h
Replaced internal CQ_ENTRY_SIZE_UNIT with public ROC_CPT_CQ_ENTRY_SIZE_UNIT macro.
CNXK Feature Flags
drivers/common/cnxk/roc_features.h
Added feature-query helpers roc_feature_nix_has_sq_cnt_update and roc_feature_nix_has_16b_align.
CNXK NIX Queue Management
drivers/common/cnxk/roc_nix.h, drivers/common/cnxk/roc_nix_queue.c
Added public APIs roc_nix_sq_resize and roc_nix_sq_cnt_update; expanded struct roc_nix_sq and roc_nix with new fields; dynamic SQ pool resizing with multi-core per-TC support; robust parameter validation.
CNXK LSO & TM Operations
drivers/common/cnxk/roc_nix_ops.c, drivers/common/cnxk/roc_nix_tm.c, drivers/common/cnxk/roc_nix_tm_ops.c, drivers/common/cnxk/roc_nix_tm_utils.c
Added roc_nix_lso_alt_flags_profile_setup, roc_nix_lso_fmt_ipv4_frag_get, roc_nix_tm_sdp_prepare_tree; extended TM hierarchy to support SDP tree; refined flush validation for sq_cnt_ptr handling.
CNXK Flow Control & Debugging
drivers/common/cnxk/roc_nix_fc.c, drivers/common/cnxk/roc_nix_inl.c, drivers/common/cnxk/roc_nix_inl_dev.c, drivers/common/cnxk/roc_nix_debug.c
Added multi-BPID backpressure support; improved inbound SA setup with guarded computation; fixed CPT queue disable on failure; changed band_prof_id printing to hex format.
CNXK NPA Backpressure
drivers/common/cnxk/roc_npa.c, drivers/common/cnxk/roc_npa.h, drivers/common/cnxk/roc_npa_priv.h
Added public API roc_npa_pool_bp_configure; replaced bp field in npa_aura_attr with union for bp/bp_thresh[8].
CNXK Type & Platform
drivers/common/cnxk/roc_npa_type.c, drivers/common/cnxk/roc_platform.h, drivers/common/cnxk/roc_nix_priv.h
Dynamic shift amount for aura limit operations (47 for CN20K, 44 otherwise); added atomic operation macro aliases; increased NIX_TM_MAX_HW_TXSCHQ from 1024 to 2048; added lso_ipv4_idx and SDP tree support.
CNXK Symbol Exports
drivers/common/cnxk/roc_platform_base_symbols.c
Exported six new symbols: roc_nix_tm_sdp_prepare_tree, roc_nix_lso_alt_flags_profile_setup, roc_nix_lso_fmt_ipv4_frag_get, roc_nix_sq_resize, roc_nix_sq_cnt_update, roc_npa_pool_bp_configure.
MLx5 Common Driver
drivers/common/mlx5/mlx5_common.h, drivers/common/mlx5/mlx5_common_mr.c, drivers/common/mlx5/mlx5_devx_cmds.c, drivers/common/mlx5/mlx5_devx_cmds.h, drivers/common/mlx5/mlx5_prm.h
Added PCI_DEVICE_ID_MELLANOX_BLUEFIELD4; improved shared MR handling with deadlock avoidance; added ownership flags rx/tx/esw_sw_owner and v2 variants; expanded FTE match structure with metadata_reg_c_8-11; migrated steering format from macros to enum; updated ESW cap layout.
MLx5 Flow & Representor
drivers/net/mlx5/mlx5.c, drivers/net/mlx5/mlx5.h, drivers/net/mlx5/mlx5_flow.c, drivers/net/mlx5/mlx5_flow.h
Removed repr_matching configuration and related checks; added capability-query helpers (mlx5_hws_is_supported, mlx5_sws_is_any_supported); refactored DV flow mode defaults based on capabilities; updated flow template structures (flow→flow_pool); added mlx5_flow_hw_create/cleanup_ctrl_rx_tables APIs; removed TX metadata copy templates.
MLx5 Flow DV & HWS
drivers/net/mlx5/mlx5_flow_dv.c, drivers/net/mlx5/mlx5_flow_hw.c
Updated flow_dv_translate_item_tag to return int with error handling; expanded metadata register support (REG_C_8-11 via misc5); renamed and exported flow control helpers (mlx5_flow_hw_*); migrated pool references from flow to flow_pool; removed default TX metadata copy paths.
MLx5 RX & Trigger
drivers/net/mlx5/mlx5_rx.c, drivers/net/mlx5/mlx5_trigger.c, drivers/net/mlx5/mlx5_txq.c
Added mlx5_monitor_cqe_own_callback for CQE ownership detection; refactored device start with centralized error handling macro; unconditional TX repr matching flow creation/destruction (removed repr_matching gating); hardened HWS startup with RX control table management.
MLx5 Utilities & OS
drivers/net/mlx5/mlx5_utils.c, drivers/net/mlx5/linux/mlx5_ethdev_os.c, drivers/net/mlx5/linux/mlx5_os.c, drivers/net/mlx5/windows/mlx5_flow_os.c, drivers/net/mlx5/windows/mlx5_flow_os.h
Added null-check in mlx5_ipool_free; initialized dev_link fields; added send_to_kernel_action cleanup; introduced mlx5_ignore_pf_representor helper; updated match_criteria_enable usage; added NVGRE support.
MLx5 NTA & Metering
drivers/net/mlx5/mlx5_nta_sample.c, drivers/net/mlx5/mlx5_ethdev_mtr.c
Added validate_sample_terminal_actions helper for sample chain validation; bounds-checked color_map access in metering config.
Cryptodev CNXK
drivers/crypto/cnxk/cn20k_cryptodev_ops.c, drivers/crypto/cnxk/cnxk_cryptodev_ops.c, drivers/crypto/cnxk/cnxk_cryptodev_ops.h
Enabled CQ in cn20k_cpt_fill_inst; refactored queue pair setup with CN20K CQ support and conditional pending queue allocation; restructured cpt_inflight_req with rsvd, meta, qp fields.
Cryptodev DPAA2 & QAT
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c, drivers/crypto/qat/qat_sym_session.c
Replaced stored cryptodev pointer with named-device lookup in remove; changed CCM hash_state_sz calculation based on aligned AAD length.
Cryptodev MLx5
drivers/crypto/mlx5/mlx5_crypto.c
Added PCI ID for BLUEFIELD4 to mlx5_crypto_pci_id_map.
DMA & Network Drivers
drivers/dma/dpaa2/dpaa2_qdma.c, drivers/net/axgbe/axgbe_ethdev.c, drivers/net/cnxk/cn10k_ethdev.c, drivers/net/cnxk/cn10k_ethdev_sec.c, drivers/net/cnxk/cn20k_ethdev.c
Added dpaa2_dpdmai_dev_uninit invocation in close; refactored TX queue release with completion handling; added null-check for inl_lf; added TX completion processing before release; added CQ cleanup for TX completions.
CNXK Ethdev Configuration
drivers/net/cnxk/cnxk_ethdev.c, drivers/net/cnxk/cnxk_ethdev_mtr.c, drivers/net/cnxk/cnxk_ethdev_ops.c
Added CQ cleanup in tx_queue_release; selected SDP-based TM hierarchy when applicable; added bounds-check for color_map in metering; allowed MTU changes on SDP with warning instead of error.
DPAA2 Network Driver
drivers/net/dpaa2/dpaa2_ethdev.c, drivers/net/dpaa2/dpaa2_ethdev.h, drivers/net/dpaa2/dpaa2_recycle.c
Added dpaa2_clear_queue_active_dps helper for in-flight DPQ cleanup; improved device removal flow; removed dpaa2_dev_recycle_qp_setup declaration and implementation.
Intel Network Drivers
drivers/net/intel/common/rx.h, drivers/net/intel/e1000/igb_ethdev.c, drivers/net/intel/fm10k/fm10k_ethdev.c, drivers/net/intel/iavf/iavf_ethdev.c, drivers/net/intel/iavf/iavf_rxtx.c, drivers/net/intel/iavf/iavf_rxtx.h, drivers/net/intel/ice/ice_ethdev.c, drivers/net/intel/ice/ice_rxtx.c, drivers/net/intel/ice/ice_rxtx_vec_common.h, drivers/net/intel/idpf/idpf_rxtx.c, drivers/net/intel/ixgbe/...
Renamed current_features to chosen_path_features in RX path; added flex filter bounds check; set tx_pkt_prepare to rte_eth_tx_pkt_prepare_dummy (fm10k, ice, ixgbe); tightened RX timestamp capability checks (iavf); added timestamp validity guard (iavf); added IAVF_RX_FLX_DESC_TS_VALID macro; improved 40-bit counter wrap handling (ice); added TCP_TSO offload mapping (idpf); reduced mailbox payload size (ixgbe VF); updated QINQ offload flags (ice).
MLx5 Datapath & HWS
drivers/net/mlx5/hws/mlx5dr_buddy.c, drivers/net/mlx5/hws/mlx5dr_pool.c, drivers/net/mlx5/hws/mlx5dr_rule.c
Added self-deallocation in mlx5dr_buddy_cleanup; changed buddy from static to local in mlx5dr_pool_buddy_get_next_buddy; removed simple_free(buddy) in pool uninit; updated eligibility check from access_index_mode to distribute_mode.
QEDE Driver
drivers/net/qede/base/ecore_mng_tlv.c
Removed entire MFW TLV handling module including ecore_mfw_process_tlv_req and ecore_mfw_update_tlvs functions.
TXGBE Driver
drivers/net/txgbe/base/txgbe_aml.c, drivers/net/txgbe/base/txgbe_aml40.c, drivers/net/txgbe/base/txgbe_eeprom.c
Added mailbox initialization in AML ops; replaced dual semaphore mask with single TXGBE_MNGSEM_SWFLASH across EEPROM operations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

Areas requiring extra attention:

  • MLx5 driver flow refactoring (mlx5_flow*.c, mlx5_flow_hw.c): Extensive changes to flow template structure (flow→flow_pool), removal of repr_matching gating, and changes to tag translation error handling require careful verification of flow control paths and error propagation.
  • CNXK SQ resizing and queue management (roc_nix_queue.c): Complex new APIs for dynamic SQ resizing with multi-core per-TC support, aura management, and pending queue handling; multiple helper functions and state transitions need validation.
  • CNXK LSO and traffic management (roc_nix_ops.c, roc_nix_tm*.c): New alt-flags profile setup, IPv4 fragmentation LSO, and SDP tree handling introduce new initialization paths that should be traced end-to-end.
  • MLx5 trigger and device lifecycle (mlx5_trigger.c): Refactored error handling with new macro-based cleanup sequences; representor start validation changes; RX control table creation/cleanup in HWS path.
  • MLx5 PRM and HCA attribute changes (mlx5_prm.h, mlx5_devx_cmds.c): New metadata registers (REG_C_8-11), ownership flags, and ESW capability layout modifications affect low-level matching and device initialization.

A rabbit-eared nudge from the burrow: 🐰
With DCB paths and queue wisdom flowing free,
And MLx5 representors reorganized with glee,
New SQ resizing dances and LSO's brighter light,
This patch brings the network a springtime delight! 🌱✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 9.65%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the PR title specifically references 'net/txgbe: fix the missing old mailbox interface calls', which aligns with the changes in the txgbe mailbox initialization files (txgbe_aml.c and txgbe_aml40.c) where mailbox command handlers are wired up.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch series_36750

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 25

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
drivers/net/intel/e1000/igb_ethdev.c (1)

4273-4282: Critical: Incomplete bounds check creates potential buffer over-read vulnerability.

The added bounds check i < E1000_FLEX_FILTERS_MASK_SIZE prevents out-of-bounds write to flex_filter->filter_info.mask[i] (line 4281), but the loop also reads from filter->mask[i] (line 4278) without validating that filter->mask has sufficient size.

If RTE_ALIGN(filter->len, CHAR_BIT) / CHAR_BIT exceeds the actual allocated size of filter->mask, this creates a buffer over-read vulnerability.

Consider adding validation before the loop to ensure both arrays are adequately sized:

 	flex_filter->filter_info.len = filter->len;
 	flex_filter->filter_info.priority = filter->priority;
 	memcpy(flex_filter->filter_info.dwords, filter->bytes, filter->len);
+	
+	/* Validate mask array size before processing */
+	uint32_t mask_len = RTE_ALIGN(filter->len, CHAR_BIT) / CHAR_BIT;
+	if (mask_len > E1000_FLEX_FILTERS_MASK_SIZE) {
+		PMD_DRV_LOG(ERR, "filter mask length %u exceeds maximum %u",
+			    mask_len, E1000_FLEX_FILTERS_MASK_SIZE);
+		rte_free(flex_filter);
+		return -EINVAL;
+	}
+	
-	for (i = 0; i < RTE_ALIGN(filter->len, CHAR_BIT) / CHAR_BIT &&
-			i < E1000_FLEX_FILTERS_MASK_SIZE; i++) {
+	for (i = 0; i < mask_len; i++) {
 		mask = 0;
 		/* reverse bits in flex filter's mask*/
 		for (shift = 0; shift < CHAR_BIT; shift++) {
 			if (filter->mask[i] & (0x01 << shift))
 				mask |= (0x80 >> shift);
 		}
 		flex_filter->filter_info.mask[i] = mask;
 	}

Additionally, verify that the caller validates the size of filter->mask to match RTE_ALIGN(filter->len, CHAR_BIT) / CHAR_BIT.

drivers/net/axgbe/axgbe_ethdev.c (1)

1745-1758: Critical: Timestamp disable logic is broken - implements a no-op.

The added read-modify-write sequence does not disable timestamps:

  1. mac_tscr is initialized to 0 and kept at 0 by the AXGMAC_SET_BITS calls (setting bits to 0)
  2. Line 1756 reads the current register value
  3. Line 1757 ORs it with mac_tscr (which is 0): value | 0 = value
  4. Line 1758 writes back the unchanged value

This means the register is never modified and timestamps remain enabled.

To fix, either remove lines 1755-1758 entirely (relying on the AXGMAC_IOWRITE_BITS call at line 1750), or implement proper bit clearing:

Option 1 (preferred): Remove the no-op code

 	/*disable time stamp*/
 	AXGMAC_SET_BITS(mac_tscr, MAC_TSCR, TSENA, 0);
-
-	value = AXGMAC_IOREAD(pdata, MAC_TSCR);
-	value |= mac_tscr;
-	AXGMAC_IOWRITE(pdata, MAC_TSCR, value);
-
+	AXGMAC_IOWRITE(pdata, MAC_TSCR, mac_tscr);
 	return 0;
 }

Option 2: Proper bit clearing with AND operation

-	value = AXGMAC_IOREAD(pdata, MAC_TSCR);
-	value |= mac_tscr;
-	AXGMAC_IOWRITE(pdata, MAC_TSCR, value);
+	value = AXGMAC_IOREAD(pdata, MAC_TSCR);
+	value &= mac_tscr;  /* Clear the bits */
+	AXGMAC_IOWRITE(pdata, MAC_TSCR, value);
drivers/net/mlx5/mlx5_nta_sample.c (1)

1-898: PR title inconsistency with file changes.

The PR title mentions "net/txgbe: fix the missing old mailbox interface calls" but this file is drivers/net/mlx5/mlx5_nta_sample.c, which is for the MLX5 driver, not the txgbe driver. This mismatch makes it difficult to understand the scope and purpose of changes.

Please ensure the PR title accurately reflects all files being modified, or consider splitting unrelated changes into separate PRs for better traceability.

drivers/net/mlx5/windows/mlx5_flow_os.h (1)

1-489: Critical: PR title/description does not match the actual code changes.

The PR title claims: "net/txgbe: fix the missing old mailbox interface calls" and references a txgbe driver patchwork series. However, the file under review is drivers/net/mlx5/windows/mlx5_flow_os.h, which is part of the MLX5 Windows driver and has nothing to do with txgbe mailbox interfaces.

This suggests either:

  • Wrong files were included in this PR
  • Wrong PR title/description
  • Incorrect auto-submission from patchwork

Please verify that the correct patches are included in this PR and update the PR metadata accordingly.

drivers/net/mlx5/linux/mlx5_os.c (1)

1-3355: Critical: PR title/description mismatch with actual file changes.

The PR title indicates "net/txgbe: fix the missing old mailbox interface calls" but this file contains changes to the mlx5 driver (drivers/net/mlx5/linux/mlx5_os.c). There is no txgbe-related code in this file. The AI summary also describes mlx5 refactoring and feature additions, not txgbe mailbox fixes.

This suggests either:

  1. Wrong file was included in this PR
  2. PR title/description is incorrect
  3. Auto-submission process error

Please verify this PR contains the intended changes and update the title/description accordingly.

drivers/crypto/qat/qat_sym_session.c (1)

1-3: Major inconsistency: PR title/description doesn't match the file being changed.

The PR is titled "net/txgbe: fix the missing old mailbox interface calls" and mentions the txgbe network driver, but this file is drivers/crypto/qat/qat_sym_session.c - a completely different subsystem (QAT crypto driver vs. txgbe network driver).

Additionally, the AI-generated summary mentions "release candidate version bump", "DCB forwarding configuration", "MLx5 driver refactoring", etc., none of which relate to this specific CCM hash state calculation fix.

This suggests either:

  1. Wrong file was included in this PR
  2. The PR description is incorrect
  3. This is an aggregated PR from patchwork series 36750 that combines multiple unrelated patches

Please clarify the relationship between the PR title/description and these changes, or ensure the correct patches are associated with the correct PR metadata.

drivers/net/intel/ice/ice_ethdev.c (1)

6052-6087: Inverted error check in ice_vsi_dis_outer_insertion

In ice_vsi_dis_outer_insertion, the ice_update_vsi status check is reversed:

status = ice_update_vsi(hw, vsi->idx, &ctxt, NULL);
if (!status) {
    PMD_DRV_LOG(ERR,
        "update VSI for disabling outer VLAN insertion failed, err %d",
        status);
    err = -EINVAL;
} else {
    vsi->info.outer_vlan_flags = ctxt.info.outer_vlan_flags;
    vsi->info.port_based_inner_vlan = ctxt.info.port_based_inner_vlan;
}

Elsewhere in this file, ice_update_vsi success is treated as status == ICE_SUCCESS (0) and non‑zero as error. Here, success is logged as failure and returns -EINVAL, while non‑zero error codes fall into the “success” branch and update vsi->info. This likely breaks outer VLAN insertion disable in DVM.

Consider flipping the condition to match the rest of the code:

- status = ice_update_vsi(hw, vsi->idx, &ctxt, NULL);
- if (!status) {
-     PMD_DRV_LOG(ERR,
-             "update VSI for disabling outer VLAN insertion failed, err %d",
-             status);
-     err = -EINVAL;
- } else {
-     vsi->info.outer_vlan_flags = ctxt.info.outer_vlan_flags;
-     vsi->info.port_based_inner_vlan = ctxt.info.port_based_inner_vlan;
- }
+ status = ice_update_vsi(hw, vsi->idx, &ctxt, NULL);
+ if (status) {
+     PMD_DRV_LOG(ERR,
+             "update VSI for disabling outer VLAN insertion failed, err %d",
+             status);
+     err = -EIO;
+ } else {
+     vsi->info.outer_vlan_flags = ctxt.info.outer_vlan_flags;
+     vsi->info.port_based_inner_vlan = ctxt.info.port_based_inner_vlan;
+ }
drivers/net/mlx5/mlx5_flow_hw.c (3)

3961-3985: Per‑table flow_pool handling and resize: align capacity and avoid inconsistent limits

The move to a per‑table flow_pool (mlx5_ipool_* usage in flow_hw_async_flow_create_generic, flow_hw_table_create/destroy, and flow_hw_table_resize) is a good direction, but there are a couple of edge cases to address:

  1. Resize uses unaligned nb_flows with ipool, then aligns only matcher/table:

    In flow_hw_table_resize():

    if (nb_flows <= table->cfg.attr.nb_flows)
        return rte_flow_error_set(..., "shrinking table is not supported");
    ret = mlx5_ipool_resize(table->flow_pool, nb_flows, error);
    ...
    nb_flows = rte_align32pow2(nb_flows);
    matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
    ...
    table->cfg.attr.nb_flows = nb_flows;
    • mlx5_ipool_resize() enforces num_entries % pool->cfg.trunk_size == 0; passing a non‑aligned nb_flows can fail even when the aligned value would be fine.
    • After this call, flow_pool->cfg.max_idx is set to the unaligned nb_flows, but table->cfg.attr.nb_flows and the matcher capacity are based on the aligned, larger value. That can leave the table logically accepting more flows than the pool can store.

    Consider aligning first and using the aligned value consistently:

    -    if (nb_flows <= table->cfg.attr.nb_flows)
    +    nb_flows = rte_align32pow2(nb_flows);
    +    if (nb_flows <= table->cfg.attr.nb_flows)
             return rte_flow_error_set(..., "shrinking table is not supported");
         ret = mlx5_ipool_resize(table->flow_pool, nb_flows, error);
         ...
    -    nb_flows = rte_align32pow2(nb_flows);
         matcher_attr.rule.num_log = rte_log2_u32(nb_flows);
         table->cfg.attr.nb_flows = nb_flows;

    This keeps matcher capacity, `attr.nb_flows`, and `flow_pool->cfg.max_idx` in sync and respects the ipool’s trunk-size requirement.
    
    
  2. Resize‑complete sanity checks look good, but rely on the above:

    flow_hw_table_resize_complete() now:

    • Verifies the table is resizable.
    • Confirms the “retired” matcher exists.
    • Enforces refcnt == 0 before destroying it.

    That logic is sound; just be aware it assumes flows cannot be allocated beyond flow_pool->cfg.max_idx, which depends on point (1) being fixed.

  3. Per‑table flow_pool lifecycle appears consistent otherwise:

    • flow_hw_table_create() creates tbl->flow_pool and tbl->flow_aux.
    • flow_hw_async_flow_create_generic() allocates and frees indices from tbl->flow_pool.
    • flow_hw_table_destroy() flushes caches, checks for residual entries (via mlx5_ipool_get_next), and destroys the pool.

    This wiring looks correct once the resize alignment is fixed.

Also applies to: 4048-4055, 5019-5037, 5268-5270, 5497-5503, 5532-5533, 15198-15242, 15305-15338


11187-11215: Ctrl‑RX tables/templates: exported helpers and error paths

The new mlx5_flow_hw_cleanup_ctrl_rx_tables() / mlx5_flow_hw_create_ctrl_rx_tables() exports make sense, but a couple of details are worth tightening:

  1. Partial‑allocation leak on create failure

    In mlx5_flow_hw_create_ctrl_rx_tables():

    priv->hw_ctrl_rx = mlx5_malloc(...);
    ...
    for (i...) {
        for (j...) {
            pt = flow_hw_create_ctrl_rx_pattern_template(...);
            if (!pt)
                goto err;
            priv->hw_ctrl_rx->tables[i][j].pt = pt;
        }
    }
    ...
    err:
        ret = rte_errno;
        return -ret;

    On the goto err path, any previously created pattern templates (and the hw_ctrl_rx struct itself) are leaked. Consider either:

    • Calling mlx5_flow_hw_cleanup_ctrl_rx_tables(dev) (or a local partial cleanup) in the error path, or
    • Refactoring the creation loop to clean up incrementally before returning.
  2. Two cleanup flavors: tables vs tables+templates

    • mlx5_flow_hw_cleanup_ctrl_rx_templates() only destroys tables and RSS action templates but leaves pattern templates and hw_ctrl_rx allocated.
    • mlx5_flow_hw_cleanup_ctrl_rx_tables() additionally destroys pattern templates and frees hw_ctrl_rx.

    This split is fine, but it’s easy for callers to accidentally call both and double‑destroy tables/templates. It would help to:

    • Document that mlx5_flow_hw_cleanup_ctrl_rx_tables() subsumes ..._templates() and that they must not be mixed, or
    • Have the “templates” variant call into the full cleanup but skip freeing hw_ctrl_rx when it’s expected to survive.

No functional correctness problems as long as the call sites are consistent, but the error path and API contract between these two helpers should be clarified.

Also applies to: 11473-11511, 11513-11542


13503-13511: Update the Windows mlx5_os.c ipool sizing and fix the outdated comment in flow_hw_prepare

The ipool size configuration for priv->flows[type] is incomplete on Windows and the comment in flow_hw_prepare is outdated.

Findings:

  • Linux (drivers/net/mlx5/linux/mlx5_os.c lines 1654-1655) correctly includes all three structs: icfg[i].size += sizeof(struct rte_flow_hw_aux);
  • Windows (drivers/net/mlx5/windows/mlx5_os.c line 550) is missing this addition, only allocating space for rte_flow_hw + rte_flow_nt2hws
  • The comment at drivers/net/mlx5/mlx5_flow_hw.c lines 13491-13492 still refers to "size = (sizeof(struct rte_flow_hw) + sizeof(struct rte_flow_nt2hws))" but the code now writes to flow_aux, which will corrupt memory on Windows

Required fixes:

  1. Add icfg[i].size += sizeof(struct rte_flow_hw_aux); to Windows mlx5_os.c (line 550)
  2. Update the comment in mlx5_flow_hw.c (lines 13491-13492) to reflect that pool size now includes all three structs
drivers/net/cnxk/cn10k_ethdev.c (1)

199-277: Harden cn10k_nix_tx_queue_release against NULL and partial TX completion setup

cn10k_nix_tx_queue_release() currently assumes:

  • eth_dev->data->tx_queues[qid] is always non-NULL, and
  • its tx_compl fields are fully initialized whenever nix->tx_compl_ena is true.

However:

  • In some error/teardown paths, tx_queues[qid] can already be cleared before a later tx_queue_release call, causing a NULL deref here.
  • cn10k_nix_tx_compl_setup() is called after cnxk_nix_tx_queue_setup() has stored txq in eth_dev->data->tx_queues[qid]. If cn10k_nix_tx_compl_setup() fails, txq remains installed but tx_compl.ena/ptr may be uninitialized. A subsequent tx_queue_release with nix->tx_compl_ena set would then run handle_tx_completion_pkts() and plt_free(txq->tx_compl.ptr) on uninitialized memory.

To make this path robust and keep it consistent with the CN20K fix, consider:

 static void
 cn10k_nix_tx_queue_release(struct rte_eth_dev *eth_dev, uint16_t qid)
 {
-	struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qid];
+	struct cn10k_eth_txq *txq = eth_dev->data->tx_queues[qid];
 	struct cnxk_eth_dev *dev = cnxk_eth_pmd_priv(eth_dev);
 	struct roc_nix *nix = &dev->nix;
 
-	if (nix->tx_compl_ena) {
-		/* First process all CQ entries */
-		handle_tx_completion_pkts(txq, 0);
-		plt_free(txq->tx_compl.ptr);
-	}
-	cnxk_nix_tx_queue_release(eth_dev, qid);
+	if (!txq) {
+		cnxk_nix_tx_queue_release(eth_dev, qid);
+		return;
+	}
+
+	if (nix->tx_compl_ena && txq->tx_compl.ena && txq->tx_compl.ptr) {
+		/* First process all CQ entries */
+		handle_tx_completion_pkts(txq, 0);
+		plt_free(txq->tx_compl.ptr);
+		txq->tx_compl.ptr = NULL;
+	}
+
+	cnxk_nix_tx_queue_release(eth_dev, qid);
 }

This keeps the intended “drain completions before ROC SQ/CQ fini” behavior while avoiding crashes in error paths and double-frees.

drivers/common/cnxk/roc_nix_inl_dev.c (1)

245-253: Failure path does not identify which CPT inline queue to disable

In nix_inl_inb_queue_setup(), if NIX RX inline queue config fails (nix_cfg_fail), the cleanup path allocates a new cpt_rx_inline_qcfg_req and only sets enable = 0 before sending it. Unlike the normal release path, it does not populate rx_queue_id or slot, so the firmware cannot reliably know which CPT inline queue to disable. This can leave the just‑enabled CPT inline queue active without a corresponding NIX mapping.

Consider applying something like:

nix_cfg_fail:
-   cpt_req = mbox_alloc_msg_cpt_rx_inl_queue_cfg(mbox);
-   if (!cpt_req) {
-       rc |= -ENOSPC;
-   } else {
-       cpt_req->enable = 0;
-       rc |= mbox_process(mbox);
-   }
+   cpt_req = mbox_alloc_msg_cpt_rx_inl_queue_cfg(mbox);
+   if (!cpt_req) {
+       rc |= -ENOSPC;
+   } else {
+       cpt_req->enable = 0;
+       cpt_req->rx_queue_id = qid;
+       cpt_req->slot = slot_id;
+       rc |= mbox_process(mbox);
+   }

This matches the release path semantics and ensures the correct CPT LF queue is disabled on setup failure.

🧹 Nitpick comments (27)
drivers/net/mlx5/mlx5_utils.c (1)

751-752: Good defensive NULL check, consider consistency across similar functions.

The addition of the NULL pool check prevents a potential null-pointer dereference that would occur when accessing pool->cfg and other pool members in subsequent lines. This is a safe improvement.

However, other similar functions in this file (e.g., mlx5_ipool_get at line 815, mlx5_ipool_malloc at line 668) don't perform NULL pool checks, creating an inconsistency in the API's defensive posture.

Consider adding similar NULL checks to related functions for consistency:

// In mlx5_ipool_get (around line 822)
if (!pool || !idx)
    return NULL;

// In mlx5_ipool_malloc (around line 675)  
if (!pool || !idx)
    return NULL;
doc/guides/nics/ice.rst (1)

627-871: Consider clarifying relationship between general raw pattern capability and specific engine limitations.

The section introduces raw patterns as a general protocol-agnostic feature (lines 630-635), but the "Limitations" subsection (line 871) restricts it to FDIR and Hash engines only. This creates potential confusion for users about whether raw patterns work with Switch engine or are universally available.

Recommendation: Either emphasize earlier that raw patterns are engine-specific (FDIR/Hash), or restructure so this is clarified upfront rather than in the limitations section. The claim that the ice PMD can "leverage the raw pattern, enabling protocol-agnostic flow offloading" should be framed with appropriate scope boundaries.

doc/guides/conf.py (1)

109-111: Use iterable unpacking for cleaner syntax.

As per the Ruff hint, consider using iterable unpacking instead of list concatenation for improved readability:

-    autodoc_mock_imports = list(set(autodoc_mock_imports + ['pydantic', 'pydantic_core']))
+    autodoc_mock_imports = list({*autodoc_mock_imports, 'pydantic', 'pydantic_core'})

This is more idiomatic Python and slightly more efficient.

drivers/net/intel/iavf/iavf_rxtx.c (1)

1585-1597: Timestamp validity check is correct; consider a tiny readability tweak

The extra guard on timestamp handling (requiring iavf_timestamp_dynflag > 0 and the descriptor’s time_stamp_low TS_VALID bit) looks correct and is consistently applied across all FLEX RX paths. To make the intent more obvious at a glance, you might parenthesize the bit test:

if (iavf_timestamp_dynflag > 0 &&
    (rxd.wb.time_stamp_low & IAVF_RX_FLX_DESC_TS_VALID)) {
    ...
}

Same pattern would apply to the scattered and scan‑path variants.

Also applies to: 1755-1767, 2041-2054

drivers/common/cnxk/roc_npa_type.c (1)

63-76: Dynamic shift selection for CN20K vs others looks correct; consider de‑magic‑ifying bit positions.

The runtime choice of shift = roc_model_is_cn20k() ? 47 : 44; and using it in wdata = aura_id << shift; is consistent with the CN20K vs non‑CN20K handling exposed by roc_model_is_cn20k() (see drivers/common/cnxk/roc_model.h:150-160). Computing shift once before the loop is also efficient.

To improve maintainability, consider:

  • Introducing named constants or a small helper for the OP_LIMIT aura shift per model (e.g., ROC_NPA_AURA_OP_LIMIT_SHIFT_CN20K / _LEGACY or a roc_npa_limit_shift() inline), and using that instead of raw 47/44.
  • Optionally doing the same for the existing BIT_ULL(42) test if similar definitions exist elsewhere.

This would make the encoding semantics clearer and reduce risk if the bit layout changes again or needs to be reused in other call sites.

Also applies to: 91-92
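A minimal sketch of the suggested de-magic-ification, assuming hypothetical constant names (`ROC_NPA_AURA_OP_LIMIT_SHIFT_*`) and stubbing the model query that the real driver gets from roc_model.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the real driver would call roc_model_is_cn20k()
 * from roc_model.h instead of this stub. */
#define ROC_NPA_AURA_OP_LIMIT_SHIFT_CN20K  47
#define ROC_NPA_AURA_OP_LIMIT_SHIFT_LEGACY 44

static bool is_cn20k; /* stand-in for the real model query */

static inline bool
roc_model_is_cn20k_stub(void)
{
	return is_cn20k;
}

/* One place that encodes the per-model aura-id shift for OP_LIMIT */
static inline unsigned int
roc_npa_limit_shift(void)
{
	return roc_model_is_cn20k_stub() ? ROC_NPA_AURA_OP_LIMIT_SHIFT_CN20K
					 : ROC_NPA_AURA_OP_LIMIT_SHIFT_LEGACY;
}

/* wdata encoding as described above, via the named helper */
static inline uint64_t
npa_limit_wdata(uint64_t aura_id)
{
	return aura_id << roc_npa_limit_shift();
}
```

Any future change to the bit layout then touches only the helper, not every call site.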

drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c (1)

4527-4535: Name-based lookup is correct; consider de-duplicating the name format

The new snprintf(cryptodev_name, "dpsec-%d", dpaa2_dev->object_id) plus rte_cryptodev_pmd_get_named_dev(cryptodev_name) matches the name used in cryptodev_dpaa2_sec_probe(), so the removal path should reliably find the right device, assuming successful probe.

To avoid future drift, you might want to centralize the "dpsec-%d" pattern (and the %d/%u choice) in a small helper or macro used by both probe, init, and remove, so any naming change only needs to be done in one place.
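One possible shape for such a helper, as a sketch (the macro and function names here are hypothetical, not existing driver API):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single definition of the device-name pattern, shared by
 * probe, init, and remove so the format cannot drift between them. */
#define DPAA2_SEC_DEV_NAME_FMT "dpsec-%d"

/* Format the cryptodev name for a given DPSECI object id. */
static inline int
dpaa2_sec_dev_name(char *buf, size_t len, int object_id)
{
	return snprintf(buf, len, DPAA2_SEC_DEV_NAME_FMT, object_id);
}
```

Both the probe path and the remove path would then build the lookup key through this one helper.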

drivers/common/mlx5/mlx5_common_mr.c (1)

1719-1737: Deadlock fix via deferred mrs free looks correct; only a minor clarity nit

The new gc_mrs scratch pointer solves the original deadlock hazard by ensuring mlx5_free() is called after share_cache->rwlock is released, while still avoiding a leak of the initial new_mpr->mrs buffer. The ownership swap (new_mpr->mrs = mpr->mrs) happens under the write lock before new_mpr is inserted into mempool_reg_list, so there is no race, and later destruction will correctly act on the shared mrs array, not the freed one. Overall, the concurrency and lifetime semantics look sound.

If you want to make this a bit clearer for future readers, you could slightly rephrase the comment to say we are deferring freeing the unused new_mpr->mrs buffer rather than “releasing MRs”, since no actual hardware MRs have been created in it yet.

app/test/test_bpf.c (2)

3290-3325: Verify error handling completeness in temporary file creation.

The create_temp_bpf_file function has several resource management paths:

  1. Line 3304: mkstemps returns a file descriptor that is always closed at line 3314, which is correct.
  2. Lines 3297-3301: On asprintf failure, the function returns NULL without attempting to free tmpfile (which is already NULL), which is correct.
  3. Lines 3316-3322: On write failure, the function calls unlink(tmpfile) and free(tmpfile) before returning NULL, which is correct.

However, consider checking for partial writes more explicitly:

if (written < 0 || written != (ssize_t)size) {
    printf("%s@%d: write failed (wrote %zd of %zu bytes): %s\n",
           __func__, __LINE__, written, size, 
           written < 0 ? strerror(errno) : "incomplete write");
    unlink(tmpfile);
    free(tmpfile);
    return NULL;
}

This distinguishes between write errors and partial writes for better debugging.


3468-3524: Consider documenting the packet generation logic more clearly.

The setup_mbufs function (lines 3468-3524) generates test packets with randomly assigned protocols (TCP or UDP) using rte_rand() & 1 at line 3496. The function returns the count of TCP packets generated, which is used by the caller to validate the BPF filter behavior.

While the implementation is correct, the relationship between the random protocol assignment and the expected TCP count might not be immediately obvious to future maintainers. Consider adding a comment at line 3496 explaining:

/* Randomly assign TCP or UDP - filter will allow TCP, drop UDP */
if (rte_rand() & 1) {

This makes the test's intent clearer.

app/test/bpf/meson.build (1)

10-12: Consider adding more diagnostic information for BPF support check.

The check at lines 10-12 uses --print-supported-cpus to verify BPF support, which works but doesn't provide detailed diagnostics if it fails. Consider adding the clang version to the warning message to help with debugging:

if not clang_supports_bpf
    clang_version = run_command(clang, '--version', check: false).stdout()
    message('app/test_bpf: no BPF load tests - clang lacks BPF support')
    message('clang version: ' + clang_version.strip().split('\n')[0])
    subdir_done()
endif

This helps users understand whether they need to upgrade clang or if there's another issue.

app/test/bpf/filter.c (1)

21-37: Consider bounds checking or document assumptions about packet length.

The test_filter function accesses packet data at fixed offsets:

  • Line 28: data[14] (first byte of IP header)
  • Line 35: data[14 + 9] (protocol field, offset 23 from start)

For a TX filter in the BPF/DPDK context, there are typically safety mechanisms:

  1. The BPF verifier may validate bounds
  2. The test infrastructure ensures packets are large enough (BPF_TEST_PKT_LEN = 64 bytes at line 3436 in test_bpf.c)

However, it's good practice to either:

  1. Add bounds checking in the BPF program
  2. Document the minimum packet size requirement in a comment

Consider adding a comment like:

/*
 * Simple TX filter that accepts TCP packets
 * 
 * Assumes packet is at least 24 bytes:
 * - 14 bytes Ethernet header
 * - At least 10 bytes of IP header to read protocol field
 */
drivers/common/mlx5/mlx5_devx_cmds.h (1)

338-344: New sw_owner fields look correct; consider aligning bitfield base types

The new rx_sw_owner* / tx_sw_owner* / esw_sw_owner* flags are wired up consistently with the query logic and appended at the end of struct mlx5_hca_attr, so they are safe from an ABI perspective.

However, all existing bitfields in this struct use uint32_t as the base type, while these use uint8_t. That relies on implementation-defined bitfield layout and is slightly less portable/consistent than:

uint32_t rx_sw_owner:1;
uint32_t rx_sw_owner_v2:1;
uint32_t tx_sw_owner:1;
uint32_t tx_sw_owner_v2:1;
uint32_t esw_sw_owner:1;
uint32_t esw_sw_owner_v2:1;

Not a blocker, but worth aligning with the surrounding code unless there’s a strong reason to keep uint8_t here.

drivers/common/cnxk/roc_nix_tm_utils.c (1)

583-593: SDP relchan computation is sensible; optional range clamp

Using (nix->tx_chan_base & 0xff) + node->rel_chan correctly derives a per‑queue relative channel for SDP. If there’s any risk that (nix->tx_chan_base & 0xff) + node->rel_chan can exceed the HW channel field width, consider masking (e.g. & 0xff) or adding a debug assert, but functionally this change aligns with the SDP TM tree setup.
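The optional clamp could be a one-line helper along these lines; the function name and the 8-bit field width are assumptions for illustration, not the actual HW definition:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the SDP relative channel and clamp it to
 * an assumed 8-bit HW channel field. */
static inline uint16_t
sdp_tm_relchan(uint16_t tx_chan_base, uint16_t rel_chan)
{
	uint16_t relchan = (tx_chan_base & 0xff) + rel_chan;

	/* debug guard against silently overflowing the HW field */
	assert(relchan <= 0xff);
	return relchan & 0xff;
}
```

This keeps the existing arithmetic while making the field-width assumption explicit and checkable in debug builds.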

drivers/common/cnxk/roc_npa.h (1)

819-820: Consider clarifying first parameter name to match implementation semantics

The prototype uses uint64_t pool_id but the implementation in roc_npa.c expects an aura handle and calls roc_npa_aura_handle_to_aura() on it. Types align so this works, but renaming the parameter to aura_handle (or documenting accepted forms) would better reflect how the API is intended to be used.

drivers/common/cnxk/roc_nix_ops.c (1)

227-293: IPv4 fragmentation LSO format setup is correct; consider propagating true error codes

nix_lso_ipv4() correctly:

  • programs an alt‑flags profile to manage DF/MF bits,
  • constructs a format that adjusts payload length and fragment offset using NIX_LSOALG_ADD_OFFSET with NIX_LSO_FRMT_IPV4_OFFSET_SHFT, and
  • packs the shift and format index into nix->lso_ipv4_idx.

One minor improvement: if roc_nix_lso_alt_flags_profile_setup() fails, the function currently returns its own rc (initialized to -ENOSPC) instead of the actual error:

int flag_idx = 0, rc = -ENOSPC;
flag_idx = roc_nix_lso_alt_flags_profile_setup(...);
if (flag_idx < 0)
    return rc;

You could preserve more accurate diagnostics by setting rc = flag_idx before returning:

-    flag_idx = roc_nix_lso_alt_flags_profile_setup(roc_nix, &alt_flags);
-    if (flag_idx < 0)
-        return rc;
+    flag_idx = roc_nix_lso_alt_flags_profile_setup(roc_nix, &alt_flags);
+    if (flag_idx < 0)
+        return flag_idx;
drivers/net/cnxk/cnxk_ethdev_mtr.c (1)

1263-1270: Safe guard for invalid default_input_color

Switching to a local idx and clamping out-of-range values to ROC_NIX_BPF_COLOR_GREEN avoids out-of-bounds access on color_map while giving a sensible default.

Given this fix, consider adding similar bounds checks where mtr->params.dscp_table[i] and mtr->params.vlan_table[i] are used as indices into the same 3‑entry color_map to keep all pre-color paths equally robust.

doc/guides/nics/mlx5.rst (2)

751-759: Good documentation, minor suggestion for clarity.

The addition of ConnectX-7 multi-host configuration guidance is helpful. The example clearly shows how to query and use non-continuous PF indices.

Consider adding a note about what happens if users mistakenly use continuous indices (e.g., [0,1] instead of [0,2]) - would it fail with an error or silently ignore PF1?


837-841: Consider clarifying "according to capability".

The documentation states the default value is set "according to capability" but doesn't specify which capability. For user clarity, consider either:

  • Specifying which capability/feature flag determines this
  • Stating the actual default value(s) for different device types
  • Referencing where users can check this capability

Also, the phrasing "not supported...and will be forced to 0" could be simplified to "is forced to 0 (not supported)" for clarity.

app/test-pmd/testpmd.h (1)

487-489: Consider improving declaration organization and adding documentation.

The new DCB forwarding configuration variables are added but could benefit from:

  1. Grouping with related declarations: There's an existing extern uint8_t dcb_config; at line 615. Consider moving these new DCB-related declarations near that location for better organization.

  2. Add documentation comments: The purpose of dcb_fwd_tc_mask and especially dcb_fwd_tc_cores is not immediately clear. Consider adding brief comments:

    #define DEFAULT_DCB_FWD_TC_MASK 0xFF  /**< Default: all 8 TCs enabled */
    extern uint8_t dcb_fwd_tc_mask;       /**< Bitmask of enabled TCs for DCB forwarding */
    extern uint8_t dcb_fwd_tc_cores;      /**< Number of cores per TC for DCB forwarding */
  3. Naming clarity: If dcb_fwd_tc_cores represents "cores per TC", consider renaming to dcb_fwd_cores_per_tc for clarity.

doc/guides/testpmd_app_ug/testpmd_funcs.rst (1)

1490-1505: Clarify tc_mask / tc_cores constraints in docs

The semantics read correctly (bitmask per‑TC and cores per‑TC), but users may struggle without basic constraints spelled out. Consider briefly adding:

  • The valid range/format for tc_mask (e.g., 8‑bit mask matching the number of configured TCs, typically bits 0–7).
  • That tc_cores must be ≥1 and is further limited by nb_fwd_lcores / available lcores and DCB configuration, and what happens when the value is too large or the mask selects more TCs than can be scheduled.

These clarifications would make the new commands easier to use without reading the testpmd source.

app/test-pmd/cmdline.c (1)

6230-6328: DCB fwd_tc and fwd_tc_cores commands look correct; consider optional validation/wording tweaks.

Functionally this block is fine: parsing is wired correctly, RTE_UINT8 matches the intended bitmask/cores, RTE_ETH_8_TCS bounds the loop, and zero is rejected for both mask and cores. Two small, optional improvements:

  • User feedback / validation:
    • You might also validate that the mask does not select any TC index ≥ current num_tcs for the DCB config, and/or that nb_fwd_lcores is sufficient for enabled_TCs * dcb_fwd_tc_cores, so the user gets immediate feedback rather than only at reconfig time.
  • Help strings wording:
    • The current help_str texts are a bit hard to read:
-    .help_str = "config DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled, and vice versa.",
+    .help_str = "set dcb fwd_tc (tc_mask): Enable DCB forwarding only on selected TCs; if bit n in tc_mask is 1, TC n is enabled.",
@@
-    .help_str = "config DCB forwarding cores per-TC, 1-means one core process all queues of a TC.",
+    .help_str = "set dcb fwd_tc_cores (cores_per_tc): Configure number of cores assigned per TC (1 means one core handles all queues of a TC).",

These are non-blocking but would improve usability.

app/test-pmd/config.c (1)

5160-5194: dcb_fwd_tc_update_dcb_info() compresses queues correctly but loses other DCB fields

The compression of tc_queue.tc_rxq/tc_txq and nb_tcs based on dcb_fwd_tc_mask is logically sound and matches how dcb_fwd_config_setup() uses this struct (only nb_tcs and per-TC queue/base mappings).

However, because you build a fresh dcb_info initialized to zero and then assign *org_dcb_info = dcb_info;, all other fields of struct rte_eth_dcb_info (e.g. prio_tc, tc_bws) are cleared. That’s fine for the current caller, but it makes this helper brittle if it’s ever reused elsewhere expecting those fields to remain valid.

If you want to future-proof this with minimal change, consider starting from a copy instead of a zeroed struct, and only overwriting the TC queue/nb_tcs parts:

-static void
-dcb_fwd_tc_update_dcb_info(struct rte_eth_dcb_info *org_dcb_info)
-{
-    struct rte_eth_dcb_info dcb_info = {0};
+static void
+dcb_fwd_tc_update_dcb_info(struct rte_eth_dcb_info *org_dcb_info)
+{
+    struct rte_eth_dcb_info dcb_info = *org_dcb_info;
@@
-    dcb_info.nb_tcs = tc;
-    *org_dcb_info = dcb_info;
+    dcb_info.nb_tcs = tc;
+    *org_dcb_info = dcb_info;

This keeps all the non-TC-queue metadata intact while still compressing the TC mapping.

drivers/net/mlx5/mlx5.c (2)

1517-1575: Steering mode and duplicate-pattern defaults: behavior change but coherent

The new logic to:

  • prefer HWS (dv_flow_en = 2) only when SWS is not supported but HWS is, and
  • force allow_duplicate_pattern = 0 for HWS while keeping it 1 by default for non‑HWS,

is internally consistent and respects explicit user devargs via mlx5_kvargs_is_used. This will change defaults on HWS‑only devices and auto‑upgrade dv_flow_en=1 to 2 in that case, but with logging, which seems intentional.

If you want stricter safety, you could also explicitly guard against dv_flow_en == 2 when mlx5_hws_is_supported(sh) is false and downgrade or error out instead of relying solely on dev_cap.dv_flow_en.


2444-2450: Representor interrupt handler cleanup in close path

Freeing dev->intr_handle only when priv->representor is set matches the “each representor has a dedicated interrupts handler” comment and avoids touching PF’s shared handler. Assuming other code never reuses dev->intr_handle after mlx5_dev_close, this looks correct.

Consider a defensive null check before rte_intr_instance_free(dev->intr_handle) to guard against partially initialized representors.

drivers/net/mlx5/mlx5_flow.h (3)

24-26: Proxy-port helper macro is clear and localized

MLX5_HW_PORT_IS_PROXY(priv) is a simple, readable predicate over esw_mode and master and should simplify call sites.

If there’s any chance of calling this with a null priv, consider guarding the macro with (priv) && to avoid surprising crashes from misuse.


1810-1843: Improved META handling and TAG bounds checking in register selection

  • Extending the META path to also return REG_C_1 when dv_xmeta_en == MLX5_XMETA_MODE_META32_HWS or when mlx5_esw_metadata_passing_enabled(sh) holds is consistent with how metadata is passed in ESW/HWS modes.
  • The new TAG branch returning REG_NON when id >= MLX5_FLOW_HW_TAGS_MAX avoids relying on assertions and provides a defined failure path for invalid tag indices.

Both changes look correct and make the code more robust in edge cases.

If you want clearer diagnostics, consider logging when an out-of-range tag id is encountered before returning REG_NON.


3715-3719: IPv6 traffic-class modify support helper is simple and focused

mlx5_dv_modify_ipv6_traffic_class_supported() reads the shared physical-device ipv6_tc_fallback flag and encapsulates the check cleanly. Provided sh->phdev and its config are initialized during bring-up (as they are for other phdev usages), this looks safe.

If this helper might be used early in probe paths, consider an MLX5_ASSERT(priv->sh->phdev) or null-check to catch misordered initialization.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be01128 and 6b2df31.

📒 Files selected for processing (107)
  • .mailmap (4 hunks)
  • MAINTAINERS (0 hunks)
  • VERSION (1 hunks)
  • app/test-flow-perf/main.c (1 hunks)
  • app/test-pmd/cmd_flex_item.c (1 hunks)
  • app/test-pmd/cmdline.c (3 hunks)
  • app/test-pmd/config.c (6 hunks)
  • app/test-pmd/testpmd.c (1 hunks)
  • app/test-pmd/testpmd.h (1 hunks)
  • app/test/bpf/filter.c (1 hunks)
  • app/test/bpf/load.c (1 hunks)
  • app/test/bpf/meson.build (1 hunks)
  • app/test/meson.build (1 hunks)
  • app/test/test_bpf.c (3 hunks)
  • app/test/test_cryptodev.c (4 hunks)
  • app/test/test_cryptodev_aead_test_vectors.h (2 hunks)
  • app/test/test_net_ip6.c (1 hunks)
  • config/arm/meson.build (2 hunks)
  • doc/api/dts/tests.TestSuite_single_core_forward_perf.rst (1 hunks)
  • doc/api/dts/tests.TestSuite_virtio_fwd.rst (1 hunks)
  • doc/guides/conf.py (1 hunks)
  • doc/guides/nics/ice.rst (1 hunks)
  • doc/guides/nics/mlx5.rst (5 hunks)
  • doc/guides/prog_guide/ethdev/ethdev.rst (1 hunks)
  • doc/guides/rel_notes/deprecation.rst (0 hunks)
  • doc/guides/rel_notes/release_25_11.rst (1 hunks)
  • doc/guides/testpmd_app_ug/testpmd_funcs.rst (1 hunks)
  • doc/guides/tools/dts.rst (7 hunks)
  • drivers/bus/fslmc/bus_fslmc_driver.h (0 hunks)
  • drivers/bus/fslmc/fslmc_bus.c (1 hunks)
  • drivers/common/cnxk/hw/nix.h (7 hunks)
  • drivers/common/cnxk/roc_cpt.c (1 hunks)
  • drivers/common/cnxk/roc_cpt.h (1 hunks)
  • drivers/common/cnxk/roc_features.h (1 hunks)
  • drivers/common/cnxk/roc_nix.h (9 hunks)
  • drivers/common/cnxk/roc_nix_debug.c (1 hunks)
  • drivers/common/cnxk/roc_nix_fc.c (2 hunks)
  • drivers/common/cnxk/roc_nix_inl.c (4 hunks)
  • drivers/common/cnxk/roc_nix_inl_dev.c (1 hunks)
  • drivers/common/cnxk/roc_nix_ops.c (4 hunks)
  • drivers/common/cnxk/roc_nix_priv.h (3 hunks)
  • drivers/common/cnxk/roc_nix_queue.c (10 hunks)
  • drivers/common/cnxk/roc_nix_tm.c (2 hunks)
  • drivers/common/cnxk/roc_nix_tm_ops.c (4 hunks)
  • drivers/common/cnxk/roc_nix_tm_utils.c (1 hunks)
  • drivers/common/cnxk/roc_npa.c (3 hunks)
  • drivers/common/cnxk/roc_npa.h (1 hunks)
  • drivers/common/cnxk/roc_npa_priv.h (1 hunks)
  • drivers/common/cnxk/roc_npa_type.c (3 hunks)
  • drivers/common/cnxk/roc_platform.h (1 hunks)
  • drivers/common/cnxk/roc_platform_base_symbols.c (5 hunks)
  • drivers/common/mlx5/mlx5_common.h (1 hunks)
  • drivers/common/mlx5/mlx5_common_mr.c (1 hunks)
  • drivers/common/mlx5/mlx5_devx_cmds.c (2 hunks)
  • drivers/common/mlx5/mlx5_devx_cmds.h (1 hunks)
  • drivers/common/mlx5/mlx5_prm.h (3 hunks)
  • drivers/crypto/cnxk/cn20k_cryptodev_ops.c (1 hunks)
  • drivers/crypto/cnxk/cnxk_cryptodev_ops.c (9 hunks)
  • drivers/crypto/cnxk/cnxk_cryptodev_ops.h (1 hunks)
  • drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c (1 hunks)
  • drivers/crypto/mlx5/mlx5_crypto.c (1 hunks)
  • drivers/crypto/qat/qat_sym_session.c (1 hunks)
  • drivers/dma/dpaa2/dpaa2_qdma.c (2 hunks)
  • drivers/net/axgbe/axgbe_ethdev.c (2 hunks)
  • drivers/net/cnxk/cn10k_ethdev.c (1 hunks)
  • drivers/net/cnxk/cn10k_ethdev_sec.c (1 hunks)
  • drivers/net/cnxk/cn20k_ethdev.c (1 hunks)
  • drivers/net/cnxk/cnxk_ethdev.c (3 hunks)
  • drivers/net/cnxk/cnxk_ethdev_mtr.c (1 hunks)
  • drivers/net/cnxk/cnxk_ethdev_ops.c (1 hunks)
  • drivers/net/dpaa2/dpaa2_ethdev.c (3 hunks)
  • drivers/net/dpaa2/dpaa2_ethdev.h (0 hunks)
  • drivers/net/dpaa2/dpaa2_recycle.c (0 hunks)
  • drivers/net/intel/common/rx.h (2 hunks)
  • drivers/net/intel/e1000/igb_ethdev.c (1 hunks)
  • drivers/net/intel/fm10k/fm10k_ethdev.c (2 hunks)
  • drivers/net/intel/iavf/iavf_ethdev.c (1 hunks)
  • drivers/net/intel/iavf/iavf_rxtx.c (3 hunks)
  • drivers/net/intel/iavf/iavf_rxtx.h (1 hunks)
  • drivers/net/intel/ice/ice_ethdev.c (1 hunks)
  • drivers/net/intel/ice/ice_rxtx.c (2 hunks)
  • drivers/net/intel/ice/ice_rxtx_vec_common.h (2 hunks)
  • drivers/net/intel/idpf/idpf_rxtx.c (1 hunks)
  • drivers/net/intel/ixgbe/base/ixgbe_vf.c (1 hunks)
  • drivers/net/intel/ixgbe/ixgbe_rxtx.c (1 hunks)
  • drivers/net/mlx5/hws/mlx5dr_buddy.c (1 hunks)
  • drivers/net/mlx5/hws/mlx5dr_pool.c (1 hunks)
  • drivers/net/mlx5/hws/mlx5dr_rule.c (1 hunks)
  • drivers/net/mlx5/linux/mlx5_ethdev_os.c (1 hunks)
  • drivers/net/mlx5/linux/mlx5_os.c (5 hunks)
  • drivers/net/mlx5/mlx5.c (6 hunks)
  • drivers/net/mlx5/mlx5.h (0 hunks)
  • drivers/net/mlx5/mlx5_flow.c (1 hunks)
  • drivers/net/mlx5/mlx5_flow.h (6 hunks)
  • drivers/net/mlx5/mlx5_flow_dv.c (6 hunks)
  • drivers/net/mlx5/mlx5_flow_hw.c (21 hunks)
  • drivers/net/mlx5/mlx5_nta_sample.c (2 hunks)
  • drivers/net/mlx5/mlx5_rx.c (2 hunks)
  • drivers/net/mlx5/mlx5_trigger.c (12 hunks)
  • drivers/net/mlx5/mlx5_txq.c (2 hunks)
  • drivers/net/mlx5/mlx5_utils.c (1 hunks)
  • drivers/net/mlx5/windows/mlx5_flow_os.c (1 hunks)
  • drivers/net/mlx5/windows/mlx5_flow_os.h (1 hunks)
  • drivers/net/qede/base/ecore_mng_tlv.c (0 hunks)
  • drivers/net/txgbe/base/txgbe_aml.c (2 hunks)
  • drivers/net/txgbe/base/txgbe_aml40.c (2 hunks)
  • drivers/net/txgbe/base/txgbe_eeprom.c (8 hunks)
⛔ Files not processed due to max files limit (32)
  • drivers/net/txgbe/base/txgbe_hw.c
  • drivers/net/txgbe/base/txgbe_mng.c
  • drivers/net/txgbe/base/txgbe_mng.h
  • drivers/net/txgbe/base/txgbe_type.h
  • dts/api/packet.py
  • dts/api/test.py
  • dts/api/testpmd/init.py
  • dts/configurations/test_run.example.yaml
  • dts/configurations/tests_config.example.yaml
  • dts/framework/config/test_run.py
  • dts/framework/context.py
  • dts/framework/remote_session/blocking_app.py
  • dts/framework/remote_session/dpdk.py
  • dts/framework/remote_session/interactive_shell.py
  • dts/framework/settings.py
  • dts/framework/test_run.py
  • dts/framework/testbed_model/linux_session.py
  • dts/framework/testbed_model/os_session.py
  • dts/framework/testbed_model/traffic_generator/init.py
  • dts/framework/testbed_model/traffic_generator/capturing_traffic_generator.py
  • dts/framework/testbed_model/traffic_generator/performance_traffic_generator.py
  • dts/framework/testbed_model/traffic_generator/scapy.py
  • dts/framework/testbed_model/traffic_generator/traffic_generator.py
  • dts/framework/testbed_model/traffic_generator/trex.py
  • dts/tests/TestSuite_qinq.py
  • dts/tests/TestSuite_single_core_forward_perf.py
  • dts/tests/TestSuite_virtio_fwd.py
  • dts/tests_config.example.yaml
  • lib/ethdev/ethdev_driver.h
  • lib/ethdev/ethdev_private.c
  • lib/net/rte_ip6.h
  • lib/pdump/rte_pdump.c
💤 Files with no reviewable changes (7)
  • drivers/net/mlx5/mlx5.h
  • doc/guides/rel_notes/deprecation.rst
  • drivers/bus/fslmc/bus_fslmc_driver.h
  • drivers/net/dpaa2/dpaa2_ethdev.h
  • MAINTAINERS
  • drivers/net/qede/base/ecore_mng_tlv.c
  • drivers/net/dpaa2/dpaa2_recycle.c
🧰 Additional context used
🧬 Code graph analysis (30)
drivers/net/cnxk/cn20k_ethdev.c (3)
drivers/net/cnxk/cnxk_ethdev.h (1)
  • cnxk_eth_pmd_priv (463-467)
drivers/net/cnxk/cn20k_rxtx.h (1)
  • handle_tx_completion_pkts (147-190)
drivers/net/cnxk/cnxk_ethdev.c (1)
  • cnxk_nix_tx_queue_release (654-689)
drivers/net/cnxk/cnxk_ethdev_ops.c (1)
drivers/common/cnxk/roc_nix.c (1)
  • roc_nix_is_sdp (48-54)
drivers/common/cnxk/roc_npa_type.c (1)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
drivers/net/txgbe/base/txgbe_aml40.c (1)
drivers/net/txgbe/base/txgbe_mng.c (1)
  • txgbe_host_interface_command_aml (163-272)
drivers/net/txgbe/base/txgbe_aml.c (1)
drivers/net/txgbe/base/txgbe_mng.c (1)
  • txgbe_host_interface_command_aml (163-272)
drivers/common/cnxk/roc_features.h (1)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
app/test/bpf/load.c (1)
app/test/test_bpf.c (1)
  • dummy_func1 (1827-1836)
drivers/crypto/cnxk/cnxk_cryptodev_ops.c (2)
drivers/common/cnxk/roc_cpt.c (4)
  • roc_cpt_iq_enable (1202-1235)
  • roc_cpt_cq_enable (1128-1138)
  • roc_cpt_iq_disable (1148-1200)
  • roc_cpt_cq_disable (1140-1146)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
drivers/common/cnxk/roc_npa.h (1)
drivers/common/cnxk/roc_npa.c (1)
  • roc_npa_pool_bp_configure (1027-1137)
drivers/net/cnxk/cnxk_ethdev.c (3)
drivers/common/cnxk/roc_nix_queue.c (1)
  • roc_nix_cq_fini (1355-1436)
drivers/common/cnxk/roc_nix.c (1)
  • roc_nix_is_sdp (48-54)
drivers/common/cnxk/roc_nix_tm_ops.c (1)
  • roc_nix_tm_hierarchy_enable (646-714)
drivers/net/dpaa2/dpaa2_ethdev.c (2)
drivers/bus/fslmc/portal/dpaa2_hw_pvt.h (1)
  • clear_swp_active_dqs (483-487)
lib/ethdev/ethdev_driver.c (1)
  • rte_eth_dev_allocated (149-164)
drivers/common/cnxk/roc_npa.c (1)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
drivers/common/cnxk/roc_nix_ops.c (2)
drivers/common/cnxk/roc_nix_priv.h (1)
  • roc_nix_to_nix_priv (290-294)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
drivers/common/cnxk/roc_nix.h (3)
drivers/common/cnxk/roc_nix_tm.c (1)
  • roc_nix_tm_sdp_prepare_tree (1893-2049)
drivers/common/cnxk/roc_nix_ops.c (2)
  • roc_nix_lso_fmt_ipv4_frag_get (482-488)
  • roc_nix_lso_alt_flags_profile_setup (164-190)
drivers/common/cnxk/roc_nix_queue.c (2)
  • roc_nix_sq_cnt_update (2487-2544)
  • roc_nix_sq_resize (2348-2410)
drivers/common/cnxk/roc_nix_fc.c (3)
drivers/common/cnxk/roc_nix_priv.h (1)
  • roc_nix_to_nix_priv (290-294)
drivers/common/cnxk/roc_npa.c (1)
  • roc_npa_pool_bp_configure (1027-1137)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn20k (151-161)
app/test-pmd/cmd_flex_item.c (1)
lib/ethdev/rte_flow.c (1)
  • rte_flow_conv (1118-1186)
drivers/common/cnxk/roc_nix_tm_ops.c (4)
drivers/common/cnxk/roc_nix_queue.c (1)
  • roc_nix_sq_cnt_update (2487-2544)
drivers/common/cnxk/roc_nix_debug.c (2)
  • roc_nix_tm_dump (1413-1448)
  • roc_nix_dump (1450-1522)
drivers/common/cnxk/roc_nix.c (1)
  • roc_nix_is_sdp (48-54)
drivers/common/cnxk/roc_nix_tm.c (2)
  • roc_nix_tm_sdp_prepare_tree (1893-2049)
  • nix_tm_prepare_default_tree (1587-1658)
app/test/test_bpf.c (2)
app/test/bpf/filter.c (3)
  • section (21-37)
  • section (39-45)
  • section (47-53)
app/test/bpf/load.c (1)
  • section (31-51)
drivers/common/cnxk/roc_nix_queue.c (6)
drivers/common/cnxk/roc_nix_priv.h (2)
  • nix_priv_to_roc_nix (296-301)
  • roc_nix_to_nix_priv (290-294)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn9k (127-137)
drivers/common/cnxk/roc_npa.c (4)
  • roc_npa_pool_create (674-730)
  • roc_npa_aura_limit_modify (836-873)
  • roc_npa_pool_op_range_set (26-47)
  • roc_npa_pool_destroy (897-911)
drivers/common/cnxk/roc_npa_type.c (1)
  • roc_npa_buf_type_update (8-31)
drivers/common/cnxk/roc_npa.h (4)
  • roc_npa_aura_op_cnt_set (88-102)
  • roc_npa_aura_op_available_wait (159-179)
  • roc_npa_aura_op_alloc (55-67)
  • roc_npa_aura_op_cnt_get (69-86)
drivers/common/cnxk/roc_nix.c (1)
  • roc_nix_max_pkt_len (158-176)
drivers/common/cnxk/roc_nix_inl.c (1)
drivers/common/cnxk/roc_model.h (1)
  • roc_model_is_cn10k (139-149)
drivers/net/cnxk/cn10k_ethdev.c (3)
drivers/net/cnxk/cnxk_ethdev.h (1)
  • cnxk_eth_pmd_priv (463-467)
drivers/net/cnxk/cn10k_rxtx.h (1)
  • handle_tx_completion_pkts (150-196)
drivers/net/cnxk/cnxk_ethdev.c (1)
  • cnxk_nix_tx_queue_release (654-689)
drivers/net/mlx5/linux/mlx5_os.c (2)
drivers/net/mlx5/mlx5_flow_dv.c (1)
  • flow_dv_tbl_resource_release (12002-12012)
drivers/net/mlx5/linux/mlx5_ethdev_os.c (1)
  • mlx5_set_link_up (982-986)
drivers/net/mlx5/mlx5_txq.c (1)
drivers/net/mlx5/mlx5_flow_hw.c (2)
  • mlx5_flow_hw_create_tx_repr_matching_flow (16070-16124)
  • mlx5_flow_hw_destroy_tx_repr_matching_flow (16138-16173)
drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c (1)
lib/cryptodev/rte_cryptodev.c (1)
  • rte_cryptodev_pmd_get_named_dev (924-942)
drivers/net/mlx5/mlx5_trigger.c (7)
drivers/net/mlx5/mlx5.h (1)
  • mlx5_devx_obj_ops_en (2227-2236)
drivers/net/mlx5/mlx5_rxq.c (2)
  • mlx5_rx_intr_vec_enable (1084-1160)
  • mlx5_rx_intr_vec_disable (1168-1197)
drivers/net/mlx5/mlx5_flow_hw.c (6)
  • flow_hw_table_update (5293-5307)
  • mlx5_flow_hw_create_tx_repr_matching_flow (16070-16124)
  • mlx5_flow_hw_create_nic_tx_default_mreg_copy_flow (16020-16068)
  • mlx5_flow_hw_create_ctrl_rx_tables (11473-11511)
  • mlx5_flow_hw_cleanup_ctrl_rx_tables (11187-11215)
  • mlx5_flow_hw_flush_ctrl_flows (15691-15723)
drivers/net/mlx5/mlx5.c (1)
  • mlx5_dev_ctx_shared_mempool_subscribe (1268-1282)
drivers/net/mlx5/mlx5_flow.c (3)
  • mlx5_flow_stop_default (8190-8211)
  • mlx5_action_handle_detach (11004-11032)
  • mlx5_vport_tx_metadata_passing_enabled (12561-12568)
drivers/net/mlx5/mlx5_txpp.c (1)
  • mlx5_txpp_stop (938-958)
drivers/net/mlx5/mlx5_txq.c (1)
  • mlx5_txq_release (1330-1363)
drivers/common/cnxk/roc_platform_base_symbols.c (4)
drivers/common/cnxk/roc_nix_tm.c (1)
  • roc_nix_tm_sdp_prepare_tree (1893-2049)
drivers/common/cnxk/roc_nix_ops.c (5)
  • roc_nix_lso_alt_flags_profile_setup (164-190)
  • roc_nix_lso_custom_fmt_setup (192-225)
  • roc_nix_lso_fmt_setup (295-480)
  • roc_nix_lso_fmt_get (490-500)
  • roc_nix_lso_fmt_ipv4_frag_get (482-488)
drivers/common/cnxk/roc_nix_queue.c (2)
  • roc_nix_sq_resize (2348-2410)
  • roc_nix_sq_cnt_update (2487-2544)
drivers/common/cnxk/roc_npa.c (1)
  • roc_npa_pool_bp_configure (1027-1137)
drivers/net/mlx5/mlx5_flow.h (2)
drivers/net/mlx5/mlx5_flow.c (1)
  • mlx5_esw_metadata_passing_enabled (12570-12580)
drivers/net/mlx5/mlx5_flow_hw.c (2)
  • mlx5_flow_hw_create_ctrl_rx_tables (11473-11511)
  • mlx5_flow_hw_cleanup_ctrl_rx_tables (11187-11215)
app/test-pmd/config.c (1)
app/test-pmd/testpmd.c (1)
  • init_fwd_streams (1810-1898)
drivers/common/cnxk/roc_nix_tm.c (3)
drivers/common/cnxk/roc_nix_priv.h (2)
  • roc_nix_to_nix_priv (290-294)
  • nix_tm_have_tl1_access (343-347)
drivers/common/cnxk/roc_nix_tm_utils.c (2)
  • nix_tm_node_alloc (1093-1104)
  • nix_tm_node_free (1106-1113)
drivers/common/cnxk/roc_nix.c (1)
  • roc_nix_is_pf (56-62)
drivers/net/mlx5/mlx5_flow_hw.c (2)
drivers/net/mlx5/mlx5_flow.h (1)
  • mlx5_dv_modify_ipv6_traffic_class_supported (3715-3719)
drivers/net/mlx5/mlx5_utils.c (7)
  • mlx5_ipool_malloc (667-732)
  • mlx5_ipool_free (744-812)
  • mlx5_ipool_create (106-153)
  • mlx5_ipool_destroy (845-894)
  • mlx5_ipool_flush_cache (896-942)
  • mlx5_ipool_get_next (967-984)
  • mlx5_ipool_resize (986-1012)
🪛 Ruff (0.14.5)
doc/guides/conf.py

110-110: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

Comment on lines +146 to 162
ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, item, NULL);
if ((ret > 0) && pattern->spec) {
ptr = (void *)(uintptr_t)item->spec;
memcpy(ptr, pattern->spec, FLEX_MAX_FLOW_PATTERN_LENGTH);
memcpy(ptr, pattern->spec, ret);
} else {
item->spec = NULL;
}
if (pattern->mask) {
if ((ret > 0) && pattern->mask) {
ptr = (void *)(uintptr_t)item->mask;
memcpy(ptr, pattern->mask, FLEX_MAX_FLOW_PATTERN_LENGTH);
memcpy(ptr, pattern->mask, ret);
} else {
item->mask = NULL;
}
if (pattern->last) {
if ((ret > 0) && pattern->last) {
ptr = (void *)(uintptr_t)item->last;
memcpy(ptr, pattern->last, FLEX_MAX_FLOW_PATTERN_LENGTH);
memcpy(ptr, pattern->last, ret);
} else {

⚠️ Potential issue | 🟠 Major

Use pattern as the conversion source and validate ret before memcpy

The switch to rte_flow_conv() and using ret as the copy size fixes the previous fixed-length copy, but the current call:

ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, item, NULL);

uses item as the source, even though only item->type is initialized and item->spec/mask/last just point to generic byte buffers. For item types with variable layouts (e.g. RAW, FLEX), the conversion helpers may inspect fields in item->mask, which are not valid typed structures here.

To avoid undefined behavior and make the intent explicit, it’s safer to use the parsed pattern as the source (it has proper spec/mask/last), and to treat conversion failure or an unexpectedly large size as an error:

-    item->type = pattern->type;
-    ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, item, NULL);
-    if ((ret > 0) && pattern->spec) {
+    item->type = pattern->type;
+    ret = rte_flow_conv(RTE_FLOW_CONV_OP_ITEM_MASK, NULL, 0, pattern, NULL);
+    if (ret <= 0 || ret > FLEX_MAX_FLOW_PATTERN_LENGTH)
+        return -EINVAL;
+    if (pattern->spec) {
         ptr = (void *)(uintptr_t)item->spec;
         memcpy(ptr, pattern->spec, ret);
     } else {
         item->spec = NULL;
     }
-    if ((ret > 0) && pattern->mask) {
+    if (pattern->mask) {
         ptr = (void *)(uintptr_t)item->mask;
         memcpy(ptr, pattern->mask, ret);
     } else {
         item->mask = NULL;
     }
-    if ((ret > 0) && pattern->last) {
+    if (pattern->last) {
         ptr = (void *)(uintptr_t)item->last;
         memcpy(ptr, pattern->last, ret);
     } else {
         item->last = NULL;
     }

This keeps the new behavior (length derived from the real item type) while ensuring the conversion logic only ever sees a fully initialized struct rte_flow_item and preventing oversize copies into the fixed FLEX_MAX_FLOW_PATTERN_LENGTH buffers.

Comment on lines +514 to +515
"set dcb fwd_tc (tc_mask)\n"
" Set DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled\n\n"

⚠️ Potential issue | 🟡 Minor

Polish DCB fwd help text and include both new commands in help config.

The new help config entry documents only set dcb fwd_tc and has slightly awkward wording (“on specify TCs”, “tc-mask”). Consider:

  • Fixing the text to something clearer, e.g.:
-            "set dcb fwd_tc (tc_mask)\n"
-            "    Set DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled\n\n"
+            "set dcb fwd_tc (tc_mask)\n"
+            "    Enable DCB forwarding only on selected TCs; if bit n in tc_mask is 1, TC n is enabled for forwarding.\n"
+            "set dcb fwd_tc_cores (cores_per_tc)\n"
+            "    Set number of cores used per enabled TC in DCB forwarding.\n\n"

This keeps CLI and docs aligned and makes both knobs discoverable from in-app help.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"set dcb fwd_tc (tc_mask)\n"
" Set DCB forwarding on specify TCs, if bit-n in tc-mask is 1, then TC-n's forwarding is enabled\n\n"
"set dcb fwd_tc (tc_mask)\n"
" Enable DCB forwarding only on selected TCs; if bit n in tc_mask is 1, TC n is enabled for forwarding.\n"
"set dcb fwd_tc_cores (cores_per_tc)\n"
" Set number of cores used per enabled TC in DCB forwarding.\n\n"
🤖 Prompt for AI Agents
In app/test-pmd/cmdline.c around lines 514-515, the help text for the new DCB
forwarding command is awkward and only documents one of the new commands; update
the text to a clearer, user-facing sentence such as "Set DCB forwarding for
specified traffic classes; each bit set in the TC mask enables forwarding for
that traffic class." and ensure both newly added commands are included in the
`help config` output so both knobs are discoverable from the in-app help.

Comment on lines +5115 to +5143
static int
dcb_fwd_check_cores_per_tc(void)
{
struct rte_eth_dcb_info dcb_info = {0};
uint32_t port, tc, vmdq_idx;

if (dcb_fwd_tc_cores == 1)
return 0;

for (port = 0; port < nb_fwd_ports; port++) {
(void)rte_eth_dev_get_dcb_info(fwd_ports_ids[port], &dcb_info);
for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
for (vmdq_idx = 0; vmdq_idx < RTE_ETH_MAX_VMDQ_POOL; vmdq_idx++) {
if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue == 0)
break;
/* make sure nb_rx_queue can be divisible. */
if (dcb_info.tc_queue.tc_rxq[vmdq_idx][tc].nb_queue %
dcb_fwd_tc_cores)
return -1;
/* make sure nb_tx_queue can be divisible. */
if (dcb_info.tc_queue.tc_txq[vmdq_idx][tc].nb_queue %
dcb_fwd_tc_cores)
return -1;
}
}
}

return 0;
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Make dcb_fwd_check_cores_per_tc() respect the TC mask and guard cores-per-TC

Two concerns here:

  1. Masked-out TCs still enforced
    The function validates every TC on the port, ignoring dcb_fwd_tc_mask. If a TC is not selected for DCB forwarding (mask bit cleared) but its RX/TX queue count is not divisible by dcb_fwd_tc_cores, this function will still return -1 and abort DCB setup even though that TC will never be used for forwarding. That makes valid “subset of TCs” configurations fail.

    Consider restricting the check to selected TCs only, e.g.:

    static int
    dcb_fwd_check_cores_per_tc(void)
    {
        struct rte_eth_dcb_info dcb_info = {0};
        uint32_t port, tc, vmdq_idx;
    
        if (dcb_fwd_tc_cores == 1)
            return 0;
    
        for (port = 0; port < nb_fwd_ports; port++) {
            (void)rte_eth_dev_get_dcb_info(fwd_ports_ids[port], &dcb_info);
    -       for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
    +       for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
    +           if (!(dcb_fwd_tc_mask & (1u << tc)))
    +               continue;
                for (vmdq_idx = 0; vmdq_idx < RTE_ETH_MAX_VMDQ_POOL; vmdq_idx++) {

  2. Potential divide-by-zero if dcb_fwd_tc_cores is misconfigured
    This helper assumes dcb_fwd_tc_cores > 0. If some CLI path or default ever leaves it at 0, the modulus operations here and the divisions in dcb_fwd_config_setup() will hit undefined behaviour. Even if another file is intended to clamp it to ≥1, it would be safer to harden this path.

    Suggest adding an explicit guard in the DCB setup path (see comment on dcb_fwd_config_setup) or here, e.g.:

        if (dcb_fwd_tc_cores == 1)
            return 0;
    +   if (dcb_fwd_tc_cores == 0)
    +       return -1;

    and treat this as an invalid configuration before doing any `% dcb_fwd_tc_cores` work.
    
    

🏁 Script executed:

#!/bin/bash
# Search for dcb_fwd_tc_cores initialization and validation
rg -nP 'dcb_fwd_tc_cores\s*=' app/test-pmd/ -C 3

Length of output: 1581


🏁 Script executed:

#!/bin/bash
# Search for dcb_fwd_tc_mask usage and how it gates TC operations
rg -nP 'dcb_fwd_tc_mask' app/test-pmd/ -C 2 -g '*.c'

Length of output: 1938


🏁 Script executed:

#!/bin/bash
# Get full context of dcb_fwd_config_setup() to see divisor usage
rg -nP 'dcb_fwd_config_setup' app/test-pmd/config.c -A 100 | head -150

Length of output: 5085


🏁 Script executed:

#!/bin/bash
# Check if there's any zero-check for dcb_fwd_tc_cores before modulo operations
rg -nP '(dcb_fwd_tc_cores\s*==\s*0|if.*dcb_fwd_tc_cores)' app/test-pmd/ -C 2 -g '*.c'

Length of output: 464


🏁 Script executed:

#!/bin/bash
# Look for how dcb_fwd_tc_mask filters TC iteration in other functions
rg -nP '\(1[ul]?\s*<<\s*tc\)' app/test-pmd/config.c -C 3

Length of output: 0


Make dcb_fwd_check_cores_per_tc() respect the TC mask and harden cores-per-TC validation

Two confirmed issues:

  1. Masked-out TCs still enforced
    The function validates every TC on the port, ignoring dcb_fwd_tc_mask. If a TC is not selected for DCB forwarding (mask bit cleared) but its RX/TX queue count is not divisible by dcb_fwd_tc_cores, this function will still return -1 and abort DCB setup even though that TC will never be used. Compare with dcb_fwd_tc_update_dcb_info() (line 5178), which correctly filters with if (!(dcb_fwd_tc_mask & (1u << i))) continue;

    Restrict the check to selected TCs only:

               for (tc = 0; tc < dcb_info.nb_tcs; tc++) {
    +              if (!(dcb_fwd_tc_mask & (1u << tc)))
    +                  continue;
                    for (vmdq_idx = 0; vmdq_idx < RTE_ETH_MAX_VMDQ_POOL; vmdq_idx++) {
  2. Missing zero-guard before division
    Although dcb_fwd_tc_cores is initialized to 1 and CLI validation (cmdline.c:6297) prevents zero, dcb_fwd_config_setup() divides by it at lines 5296–5299 without explicit defensive guard. Harden the path by adding:

         if (dcb_fwd_tc_cores == 1)
             return 0;
    +   if (dcb_fwd_tc_cores == 0)
    +       return -1;
🤖 Prompt for AI Agents
In app/test-pmd/config.c around lines 5115 to 5143, update
dcb_fwd_check_cores_per_tc() so it only validates TCs that are selected in
dcb_fwd_tc_mask and add a defensive zero-guard for dcb_fwd_tc_cores: first, at
the top of the function return an error (e.g., -1) if dcb_fwd_tc_cores == 0;
then inside the tc loop skip any TC where !(dcb_fwd_tc_mask & (1u << tc)) before
checking rx/tx queue divisibility; keep the existing modulo checks but they will
now only run for masked TCs and with the zero-guard protecting against
division/modulo by zero.

Comment on lines +12 to +17
struct dummy_offset {
	uint64_t u64;
	uint32_t u32;
	uint16_t u16;
	uint8_t u8;
} __attribute__((packed));

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify structure packing consistency with test_bpf.c.

The struct dummy_offset at lines 12-17 includes __attribute__((packed)), which forces the compiler to eliminate padding between fields. However, examining the structure definition in test_bpf.c (lines 43-48), there's no indication of packing:

struct dummy_offset {
    RTE_ATOMIC(uint64_t) u64;
    RTE_ATOMIC(uint32_t) u32;
    uint16_t u16;
    uint8_t  u8;
};

The RTE_ATOMIC macro might affect alignment. Without packing, the natural alignment would be:

  • u64 at offset 0 (8 bytes)
  • u32 at offset 8 (4 bytes)
  • u16 at offset 12 (2 bytes)
  • u8 at offset 14 (1 byte)
    Total: 15 bytes, but likely padded to 16 bytes

With __attribute__((packed)), there's no padding, resulting in 15 bytes.

This mismatch could cause data corruption when the BPF program accesses the structure.

Verify the structure layout by checking how struct dummy_offset is used in test_bpf.c:


🏁 Script executed:

#!/bin/bash
# Search for all uses of struct dummy_offset in test_bpf.c to understand expected layout
rg -n -C5 'struct dummy_offset' app/test/test_bpf.c | head -100

Length of output: 2393



---

🏁 Script executed:

#!/bin/bash
# Find RTE_ATOMIC macro definition
rg -n 'define\s+RTE_ATOMIC|^#define.*RTE_ATOMIC' --type=c --type=h | head -20

Length of output: 1144


🏁 Script executed:

#!/bin/bash
# Check load.c struct dummy_offset definition and how it's used
rg -n 'struct dummy_offset' app/test/bpf/load.c -C5 | head -100

Length of output: 522


🏁 Script executed:

#!/bin/bash
# Search for offsetof usage in load.c to see if it uses dynamic or hardcoded offsets
rg -n 'offsetof|\.off\s*=' app/test/bpf/load.c | head -50

Length of output: 0


Remove __attribute__((packed)) from struct dummy_offset in load.c (lines 12-17).

The struct is declared with __attribute__((packed)) in load.c, but the corresponding struct in test_bpf.c (lines 43-48) uses RTE_ATOMIC wrappers without packing. RTE_ATOMIC(type) expands to _Atomic(type), which may impose alignment requirements different from those of the base type.

Without packing in test_bpf.c, natural alignment produces a different layout than load.c's forced contiguous packing. The BPF bytecode compiled from load.c uses offsets based on the packed layout, but the test harness passes a struct laid out as in test_bpf.c, causing out-of-bounds or misaligned accesses.

Remove the __attribute__((packed)) from line 17 to match test_bpf.c's expected layout.

🤖 Prompt for AI Agents
In app/test/bpf/load.c around lines 12 to 17, the struct dummy_offset is
declared with __attribute__((packed)), causing a packed field layout that
disagrees with the unpacked, RTE_ATOMIC-wrapped definition used in test_bpf.c;
remove the __attribute__((packed)) from the struct declaration so the compiler
uses natural alignment and the field offsets match the kernel/test_bpf.c layout.

};
struct rte_flow_error error;
struct rte_flow *flow = flow = rte_flow_create(port_id, &attr, pattern, actions, &error);

⚠️ Potential issue | 🟡 Minor

Fix typo in C code example.

Line 844 has a duplicate variable assignment: struct rte_flow *flow = flow = rte_flow_create(...). Remove the redundant assignment.

-   struct rte_flow *flow = flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
+   struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
struct rte_flow *flow = flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &error);
🤖 Prompt for AI Agents
In doc/guides/nics/ice.rst around line 844, the C example contains a duplicated
assignment ("struct rte_flow *flow = flow = rte_flow_create(...)"); remove the
redundant "flow =" so the variable is assigned only once (i.e., declare "struct
rte_flow *flow = rte_flow_create(...)" and keep the rest of the call and error
handling unchanged).

Comment on lines +646 to +647
			while (!qbman_check_command_complete(qs->active_dqs))
				continue; /* wait */

⚠️ Potential issue | 🔴 Critical

Add timeout to prevent infinite busy-wait.

The busy-wait loop checking command completion has no timeout mechanism, which could cause an infinite hang if the hardware fails to complete the operation. This is a critical reliability issue that could deadlock the driver during queue cleanup.

Consider adding a timeout with a counter and error handling:

+			uint32_t timeout = 10000; /* Timeout iterations */
 			while (!qbman_check_command_complete(qs->active_dqs))
-				continue; /* wait */
+			{
+				if (--timeout == 0) {
+					DPAA2_PMD_ERR("Timeout waiting for DQS completion on lcore %d", i);
+					break;
+				}
+				rte_pause();
+			}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In drivers/net/dpaa2/dpaa2_ethdev.c around lines 646-647, the busy-wait loop
that calls qbman_check_command_complete(qs->active_dqs) must be guarded with a
timeout to avoid an infinite hang; introduce a timeout mechanism (either
iteration counter or elapsed-time check using the driver/platform timing API),
sleep/yield briefly inside the loop to avoid spinning (e.g., short usleep or
rte_delay_us), and if the timeout is reached log an error with context, break
out of the loop and propagate/return an appropriate error code or perform safe
cleanup so the driver does not deadlock. Ensure the timeout value is reasonable
and documented in a comment, and update callers to handle the error path if
necessary.

Comment on lines +1180 to 1182
	if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_CAP_PTP &&
	    vf->ptp_caps & VIRTCHNL_1588_PTP_CAP_RX_TSTAMP)
		dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify vf->ptp_caps initialization order.

The added check for vf->ptp_caps & VIRTCHNL_1588_PTP_CAP_RX_TSTAMP is more precise, but vf->ptp_caps appears to be populated only during iavf_dev_start() (Line 1023). If an application calls rte_eth_dev_info_get() before starting the device, vf->ptp_caps may be uninitialized (zero), causing the RX timestamp capability to not be advertised even when supported by the hardware.

Verify that vf->ptp_caps is properly initialized before iavf_dev_info_get() can be invoked, or handle the uninitialized case appropriately.


🏁 Script executed:

#!/bin/bash
# Search for ptp_caps initialization and usage patterns

echo "=== Searching for ptp_caps initialization ==="
rg -n "ptp_caps\s*=" --type c

echo -e "\n=== Searching for ptp_caps usage in condition checks ==="
rg -n "ptp_caps\s*&" --type c -A2 -B2

echo -e "\n=== Checking iavf_get_ptp_cap implementation ==="
ast-grep --pattern 'iavf_get_ptp_cap($$$) {
  $$$
}'

echo -e "\n=== Verify if ptp_caps is initialized in iavf_init_vf ==="
rg -n "iavf_init_vf" --type c -A50 | rg "ptp_caps"

Length of output: 1130



---

🏁 Script executed:

#!/bin/bash
# Search for iavf_get_ptp_cap function definition and calls

echo "=== Search for iavf_get_ptp_cap function definition ==="
rg -n "iavf_get_ptp_cap\s*\(" --type c -B2 -A10

echo -e "\n=== Search for calls to iavf_get_ptp_cap ==="
rg -n "iavf_get_ptp_cap" --type c

echo -e "\n=== Check iavf_dev_start function around line 1023 ==="
rg -n "iavf_dev_start" --type c -A100 | head -150

echo -e "\n=== Search for where virtchnl_ptp_caps response handler is ==="
rg -n "virtchnl_ptp_caps" --type c -B5 -A5

Length of output: 20317


🏁 Script executed:

#!/bin/bash
# This script does not require repository access
# Searching for API docs on DPDK dev_info_get lifecycle
echo "Documentation check would be done via web search"

Length of output: 48



---

🏁 Script executed:

#!/bin/bash
# Find where vf structure is allocated and ptp_caps initialized

echo "=== Search for vf structure allocation ==="
rg -n "vf_res\s*=" --type c -B5 -A5 | head -80

echo -e "\n=== Search for ptp_caps field declaration in iavf_info struct ==="
rg -n "ptp_caps" --type c -B2 -A2 | grep -E "(struct|uint|int)" | head -20

echo -e "\n=== Search for iavf_info structure definition ==="
ast-grep --pattern 'struct iavf_info {
  $$$
}'

echo -e "\n=== Check if ptp_caps has memset or initialization ==="
rg -n "memset.*vf\|sizeof.*iavf_info" --type c -B2 -A2 | head -40

Length of output: 6165


🏁 Script executed:

#!/bin/bash
# Search for where adapter/vf is initialized on device probe

echo "=== Search for iavf_dev_init or probe function ==="
rg -n "iavf_dev_init\|iavf_pci_probe" --type c -A30 | head -100

echo -e "\n=== Check iavf_init_vf for ptp_caps initialization ==="
rg -n "iavf_init_vf" --type c -A100 | head -150

Length of output: 10489



---

🏁 Script executed:

#!/bin/bash
# Find iavf_info struct definition

echo "=== Search for iavf_info struct definition ==="
rg -n "struct iavf_info\s*\{" --type c -A100 | head -200

echo -e "\n=== Search specifically for ptp_caps field in iavf_info ==="
rg -n "} iavf_info\|} *iavf_info" --type c -B80 | grep -A80 "struct iavf_info"

echo -e "\n=== Check if there's any memset or reset of ptp_caps ==="
rg -n "ptp_caps\s*=" --type c

Length of output: 6423


🏁 Script executed:

#!/bin/bash
# Verify standard DPDK practice: when dev_info_get can be called

echo "=== Check how many functions can call iavf_dev_info_get ==="
rg -n "dev_infos_get\|rte_eth_dev_info_get" --type c | head -20

echo -e "\n=== Look for any early exit conditions in iavf_dev_info_get ==="
rg -n "iavf_dev_info_get" --type c -A50 | head -80

Length of output: 7531



---

🌐 Web query:

DPDK rte_eth_dev_info_get when can it be called before dev_start

💡 Result:

Yes — rte_eth_dev_info_get may be called before rte_eth_dev_start, as long as the port/device has been created/attached and initialized by the ethdev layer (i.e., after probe/attach or after rte_eth_dev_configure). It is commonly used in probe/NEW-event callbacks and to query a PMD’s capabilities before starting the device. See DPDK ethdev API docs and the dev mailing discussion confirming callers may invoke rte_eth_dev_info_get in probing/NEW callbacks. [1][2]

Sources:
[1] DPDK dev mailing-list patch discussion (calls of rte_eth_dev_info_get in NEW/probe callbacks).
[2] DPDK ethdev API reference (rte_ethdev header / ethdev usage notes).


Capability will not be advertised if rte_eth_dev_info_get() is called before device start.

The review concern is valid. Verification confirms that vf->ptp_caps remains zero before iavf_dev_start() executes, since it is only populated by iavf_get_ptp_cap() during device start. However, DPDK applications commonly call rte_eth_dev_info_get() before starting the device (per DPDK API semantics). When this occurs, the bitwise AND at line 1181 evaluates to zero, preventing RTE_ETH_RX_OFFLOAD_TIMESTAMP from being advertised even when the hardware supports it.

Required: Either move PTP capability discovery earlier (during device probe/initialization) or add a check to only report the capability after successful device start.

🤖 Prompt for AI Agents
In drivers/net/intel/iavf/iavf_ethdev.c around lines 1180-1182, currently
RTE_ETH_RX_OFFLOAD_TIMESTAMP is advertised based on vf->ptp_caps which is only
populated during iavf_dev_start(), so rte_eth_dev_info_get() called before
device start will not report PTP capability; fix by either invoking the PTP
capability discovery earlier (call iavf_get_ptp_cap() during device
probe/initialization so vf->ptp_caps is populated before rte_eth_dev_info_get()
can be called) or by guarding the capability advertisement here to only set
rx_offload_capa when the device is started (check a device-started flag/state
before ORing RTE_ETH_RX_OFFLOAD_TIMESTAMP) and ensure the chosen approach
updates initialization order and relevant state checks consistently.

Comment on lines 1625 to 1633
		value = *(const uint8_t *)item.spec << 24;
		value = rte_cpu_to_be_32(value);
		item.spec = &value;
	} else if (conf->dst.field == RTE_FLOW_FIELD_IPV6_DSCP &&
		   !(mask[0] & MLX5_IPV6_HDR_ECN_MASK) &&
		   mlx5_dv_modify_ipv6_traffic_class_supported(dev->data->dev_private)) {
		value = *(const unaligned_uint32_t *)item.spec << MLX5_IPV6_HDR_DSCP_SHIFT;
		item.spec = &value;
	}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

IPv6 DSCP immediate handling: confirm endianness and shift semantics

The new IPv6 DSCP path reuses item.spec as a 32‑bit value and left‑shifts by MLX5_IPV6_HDR_DSCP_SHIFT, but unlike the META/TAG/VXLAN branches it does not apply rte_cpu_to_be_32 to value before passing it into flow_dv_convert_modify_action. Given that the runtime path (flow_hw_modify_field_construct) later does an explicit DSCP shift and then rte_cpu_to_be_32, please double‑check that this compile‑time path matches the HW expectation (host vs network order and bit position) and does not end up double‑shifting or using the wrong byte order for DSCP. If necessary, mirror the META/TAG pattern (shift then convert to BE) for consistency.


🏁 Script executed:

#!/bin/bash
# 1. Extract context around lines 1625-1633 (IPv6 DSCP block)
echo "=== IPv6 DSCP block (lines 1625-1633) ==="
sed -n '1620,1640p' drivers/net/mlx5/mlx5_flow_hw.c

echo ""
echo "=== Find META/TAG/VXLAN value handling for comparison ==="
# Look for patterns with rte_cpu_to_be_32 in modify field handling
rg -n -B2 -A2 'RTE_FLOW_FIELD_META|RTE_FLOW_FIELD_TAG|RTE_FLOW_FIELD_VXLAN' drivers/net/mlx5/mlx5_flow_hw.c | head -60

echo ""
echo "=== Find flow_dv_convert_modify_action calls context ==="
rg -n -C3 'flow_dv_convert_modify_action' drivers/net/mlx5/mlx5_flow_hw.c | head -80

echo ""
echo "=== Find flow_hw_modify_field_construct for runtime DSCP ==="
rg -n -C5 'flow_hw_modify_field_construct' drivers/net/mlx5/mlx5_flow_hw.c | head -100

Length of output: 511


🏁 Script executed:

#!/bin/bash
# 1. Extract context around lines 1625-1633 (IPv6 DSCP block)
echo "=== IPv6 DSCP block (lines 1620-1640) ==="
sed -n '1620,1640p' drivers/net/mlx5/mlx5_flow_hw.c

echo ""
echo "=== Find RTE_FLOW_FIELD_IPV6_DSCP context ==="
rg -n -B5 -A5 'RTE_FLOW_FIELD_IPV6_DSCP' drivers/net/mlx5/mlx5_flow_hw.c

echo ""
echo "=== Find META/TAG/VXLAN branches with rte_cpu_to_be_32 ==="
rg -n -B3 -A3 'RTE_FLOW_FIELD_META|RTE_FLOW_FIELD_TAG' drivers/net/mlx5/mlx5_flow_hw.c | head -80

echo ""
echo "=== MLX5_IPV6_HDR_DSCP_SHIFT definition ==="
rg -n 'MLX5_IPV6_HDR_DSCP_SHIFT' drivers/net/mlx5/ 

echo ""
echo "=== flow_dv_convert_modify_action function signature ==="
rg -n -A20 'static.*flow_dv_convert_modify_action' drivers/net/mlx5/mlx5_flow_hw.c

Length of output: 6554


🏁 Script executed:

#!/bin/bash
echo "=== flow_dv_convert_modify_action function definition ==="
rg -n -A30 'flow_dv_convert_modify_action\(' drivers/net/mlx5/mlx5_flow_hw.c | head -100

echo ""
echo "=== All flow_dv_convert_modify_action call sites with context ==="
rg -n -B2 -A5 'flow_dv_convert_modify_action' drivers/net/mlx5/mlx5_flow_hw.c | grep -v '^--$'

echo ""
echo "=== Check mlx5_flow_dv.c for flow_dv_convert_modify_action implementation ==="
rg -n -A30 'flow_dv_convert_modify_action\(' drivers/net/mlx5/mlx5_flow_dv.c | head -100

echo ""
echo "=== Compare IPv6 DSCP vs VXLAN handling in broader context ==="
sed -n '1600,1635p' drivers/net/mlx5/mlx5_flow_hw.c

Length of output: 7009


Missing big-endian conversion in IPv6 DSCP compile-time path

The review comment is accurate. The IPv6 DSCP code at lines 1631–1632 lacks the rte_cpu_to_be_32() conversion that is required by flow_dv_convert_modify_action.

The function explicitly documents: "The item and mask are provided in big-endian format. The fields should be presented as in big-endian format either." All other immediate-value field types (TAG, VXLAN_LAST_RSVD, GTP_PSC_QFI, META) apply this conversion before passing to the function. The IPv6 DSCP path must do the same:

value = *(const unaligned_uint32_t *)item.spec << MLX5_IPV6_HDR_DSCP_SHIFT;
value = rte_cpu_to_be_32(value);  // <- Add this line
item.spec = &value;
🤖 Prompt for AI Agents
In drivers/net/mlx5/mlx5_flow_hw.c around lines 1625 to 1633, the IPv6 DSCP
compile-time path shifts the immediate value but omits converting it to
big-endian before passing to flow_dv_convert_modify_action; add a call to
rte_cpu_to_be_32() on the shifted value (same as other immediate-value paths)
and then assign item.spec = &value so the item is presented in big-endian format
as required.

Comment on lines +2339 to 2343
	int ret = flow_hw_validate_compiled_modify_field(dev, cfg, mhdr, error);
	if (ret) {
		__flow_hw_action_template_destroy(dev, acts);
-		return -rte_errno;
+		return ret;
	}

⚠️ Potential issue | 🔴 Critical

Avoid double‑destroying acts on modify‑header validation failure

mlx5_tbl_translate_modify_header() now calls __flow_hw_action_template_destroy(dev, acts) on error, but its caller __flow_hw_translate_actions_template() still unconditionally calls __flow_hw_action_template_destroy(dev, acts) in the err: path.

Because __flow_hw_action_template_destroy() decrements mark refcounts, releases shared counters/meters, and frees acts->mhdr/encap_decap/push_remove, calling it twice on the same acts will underflow refcounts and/or free resources twice.

Consider changing this to let the caller own cleanup, e.g.:

-    int ret = flow_hw_validate_compiled_modify_field(dev, cfg, mhdr, error);
-    if (ret) {
-        __flow_hw_action_template_destroy(dev, acts);
-        return ret;
-    }
+    int ret = flow_hw_validate_compiled_modify_field(dev, cfg, mhdr, error);
+    if (ret)
+        return ret;

and keep the single destruction in __flow_hw_translate_actions_template()’s err: block.

Also applies to: 2978-2983

🤖 Prompt for AI Agents
In drivers/net/mlx5/mlx5_flow_hw.c around lines 2339-2343 (and similarly
2978-2983), the helper mlx5_tbl_translate_modify_header() now destroys acts on
error, but the caller __flow_hw_translate_actions_template() also
unconditionally destroys acts in its err: path, causing double-destruction.
Remove the inner call to __flow_hw_action_template_destroy(dev, acts) from
mlx5_tbl_translate_modify_header() (and any other helper that currently destroys
acts on failure), instead return an error and let the caller perform a single
__flow_hw_action_template_destroy(dev, acts) in its err: cleanup path so acts is
destroyed exactly once and refcounts/resources are not underflowed or
double-freed.

Comment on lines +528 to +562
static bool
validate_sample_terminal_actions(const struct rte_eth_dev *dev,
				 const struct rte_flow_attr *flow_attr,
				 const struct rte_flow_action *actions)
{
	uint32_t i;
	const struct mlx5_priv *priv = dev->data->dev_private;
	const struct rte_flow_action_ethdev *port = NULL;
	bool is_proxy = MLX5_HW_PORT_IS_PROXY(priv);
	const struct rte_flow_action *a = NULL;

	for (i = 0; actions[i].type != RTE_FLOW_ACTION_TYPE_END; i++) {
		if (actions[i].type != RTE_FLOW_ACTION_TYPE_VOID)
			a = &actions[i];
	}
	if (a == NULL)
		return false;
	switch (a->type) {
	case RTE_FLOW_ACTION_TYPE_JUMP:
	case RTE_FLOW_ACTION_TYPE_QUEUE:
	case RTE_FLOW_ACTION_TYPE_DROP:
	case RTE_FLOW_ACTION_TYPE_REPRESENTED_PORT:
		return true;
	case RTE_FLOW_ACTION_TYPE_PORT_REPRESENTOR:
		if (!is_proxy || !flow_attr->transfer)
			return false;
		port = a->conf;
		if (!port || port->port_id != MLX5_REPRESENTED_PORT_ESW_MGR)
			return false;
		return true;
	default:
		break;
	}
	return false;
}

⚠️ Potential issue | 🔴 Critical

Critical: Wrong parameter passed to validation function.

The validate_sample_terminal_actions function expects an array of actions and iterates through them (line 539), but at line 872 it's called with sample, which points to a single RTE_FLOW_ACTION_TYPE_SAMPLE action, not the nested actions array.

According to lines 877-878, the actual nested actions are in sample_conf->actions. The validation should be performed on those nested actions instead.

Apply this diff to fix the issue:

-	if (!validate_sample_terminal_actions(dev, attr, sample)) {
+	sample_conf = (const struct rte_flow_action_sample *)sample->conf;
+	sample_actions = (struct rte_flow_action *)(uintptr_t)sample_conf->actions;
+	if (!validate_sample_terminal_actions(dev, attr, sample_actions)) {
 		rte_flow_error_set(error, EINVAL, RTE_FLOW_ERROR_TYPE_ACTION,
 				   NULL, "Invalid sample actions");
 		return NULL;
 	}
-	sample_conf = (const struct rte_flow_action_sample *)sample->conf;
-	sample_actions = (struct rte_flow_action *)(uintptr_t)sample_conf->actions;

Also applies to: 872-876

🤖 Prompt for AI Agents
In drivers/net/mlx5/mlx5_nta_sample.c around lines 528-562 and also at the call
site around lines 872-876, the validator validate_sample_terminal_actions is
being passed the single SAMPLE action object (sample) instead of the nested
actions array; update the call to pass sample_conf->actions (the nested actions
array) so the function iterates the proper action list for validation.
