[PWCI] "net/idpf: enable AVX2 for split queue Rx/Tx" #254
Conversation
In case some CPUs don't support AVX512, enable AVX2 for them to get better per-core performance. In the single queue model, the same descriptor queue is used by SW to post descriptors to the device and by the device to report completed descriptors to SW, whereas the split queue model separates them into different queues for parallel processing and improved performance. Signed-off-by: Shaiq Wani <shaiq.wani@intel.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Added a note in the IDPF Poll Mode Driver documentation to clarify that sharing a completion queue among multiple TX queues serviced by different CPU cores is not supported in split queue mode. Signed-off-by: Shaiq Wani <shaiq.wani@intel.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
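As background for the commit messages above, here is a minimal sketch of the two descriptor-queue models; the struct and field names are hypothetical illustrations, not the driver's actual types:

#include <stdint.h>

struct desc;    /* opaque descriptor type, for illustration only */

/* Single queue model: one ring serves both directions. SW posts
 * buffers into it and the device writes completions back into the
 * same descriptors. */
struct single_queue_model {
    volatile struct desc *ring;   /* posted by SW, written back by HW */
    uint16_t tail;                /* SW post position */
};

/* Split queue model: posting and completion are decoupled, so SW and
 * the device can work on separate rings in parallel. */
struct split_queue_model {
    volatile struct desc *buf_ring;    /* SW posts buffers here */
    volatile struct desc *compl_ring;  /* device reports completions here */
    uint16_t buf_tail;                 /* SW post position */
    uint16_t compl_head;               /* SW consume position */
};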
Reviewer's Guide

This PR adds AVX2-accelerated split-queue Rx and Tx paths in the Intel idpf driver by implementing vectorized packet processing routines, extracting common rearm logic, and updating device registration and header declarations accordingly.

Class diagram for new and updated split queue AVX2 Rx/Tx functions

classDiagram
class idpf_rx_queue {
+adapter
+bufq2
+rx_tail
+expected_gen_id
+nb_rx_desc
+mbuf_initializer
+rx_ring
+rxrearm_nb
+rxrearm_start
+mp
+sw_ring
+fake_mbuf
+rx_stats
+qrx_tail
}
class ci_tx_queue {
+tx_tail
+nb_tx_desc
+expected_gen_id
+compl_ring
+txqs
+tx_start_qid
+rs_compl_count
+tx_free_thresh
+tx_rs_thresh
+desc_ring
+sw_ring_vec
+nb_tx_free
+qtx_tail
+complq
}
class idpf_flex_tx_sched_desc {
}
class idpf_splitq_tx_compl_desc {
}
class rte_mbuf {
+data_len
+buf_iova
+data_off
+rearm_data
}
idpf_rx_queue --> ci_tx_queue : uses in Tx
ci_tx_queue --> idpf_flex_tx_sched_desc : uses for Tx desc
ci_tx_queue --> idpf_splitq_tx_compl_desc : uses for completion
idpf_rx_queue --> rte_mbuf : uses for Rx
ci_tx_queue --> rte_mbuf : uses for Tx
class idpf_common_rxtx_avx2 {
+idpf_dp_splitq_recv_pkts_avx2()
+idpf_splitq_rearm_common()
+idpf_splitq_scan_cq_ring()
+idpf_splitq_vtx1_avx2()
+idpf_splitq_vtx_avx2()
+idpf_splitq_xmit_fixed_burst_vec_avx2()
+idpf_dp_splitq_xmit_pkts_avx2()
}
idpf_common_rxtx_avx2 --> idpf_rx_queue
idpf_common_rxtx_avx2 --> ci_tx_queue
idpf_common_rxtx_avx2 --> idpf_flex_tx_sched_desc
idpf_common_rxtx_avx2 --> idpf_splitq_tx_compl_desc
idpf_common_rxtx_avx2 --> rte_mbuf
Flow diagram for AVX2 split queue Rx packet processing

flowchart TD
Start(["Start Rx Burst"])
RearmCheck["Check if rearm needed"]
RearmCall["Call idpf_splitq_rearm_common if needed"]
HeadGenCheck["Check head generation"]
Loop["For each batch of 4 descriptors"]
DDCheck["Check DD bits"]
CopyMbuf["Copy mbuf pointers"]
BuildMbuf["Build mbuf rearm data"]
UpdateTail["Update rx_tail and expected_gen_id"]
ReturnPkts["Return received packets"]
Start --> RearmCheck
RearmCheck -->|Yes| RearmCall
RearmCheck -->|No| HeadGenCheck
RearmCall --> HeadGenCheck
HeadGenCheck -->|OK| Loop
HeadGenCheck -->|Fail| ReturnPkts
Loop --> DDCheck
DDCheck -->|All set| CopyMbuf
DDCheck -->|Not all| UpdateTail
CopyMbuf --> BuildMbuf
BuildMbuf --> Loop
Loop --> UpdateTail
UpdateTail --> ReturnPkts
Flow diagram for AVX2 split queue Tx packet processing

flowchart TD
Start(["Start Tx Burst"])
ScanCQ["Scan completion ring"]
FreeBufs["Free buffers if rs_compl_count > tx_free_thresh"]
Loop["While nb_pkts > 0"]
FixedBurst["Call idpf_splitq_xmit_fixed_burst_vec_avx2"]
UpdateTail["Update tx_tail"]
WriteTail["Write to NIC tail register"]
ReturnTx["Return nb_tx"]
Start --> ScanCQ
ScanCQ --> FreeBufs
FreeBufs --> Loop
Loop --> FixedBurst
FixedBurst --> UpdateTail
UpdateTail --> WriteTail
WriteTail --> Loop
Loop -->|Done| ReturnTx
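To make the Tx flow above easier to follow, here is a compilable sketch of the same control flow; all types and helpers are illustrative stand-ins, not the driver's internal API (the real entry point is idpf_dp_splitq_xmit_pkts_avx2):

#include <stdint.h>

struct mbuf;                       /* stand-in for rte_mbuf */
struct txq {
    uint16_t rs_compl_count;
    uint16_t tx_free_thresh;
};

/* Stubs standing in for the driver's internals. */
static void scan_cq_ring(struct txq *q) { (void)q; }
static void free_completed_bufs(struct txq *q) { q->rs_compl_count = 0; }
static uint16_t xmit_fixed_burst(struct txq *q, struct mbuf **p, uint16_t n)
{ (void)q; (void)p; return n; }

/* Mirrors the diagram: scan completions, free buffers when over the
 * threshold, then transmit in bounded bursts until done or the ring
 * backs up. */
static uint16_t
splitq_xmit_sketch(struct txq *q, struct mbuf **pkts, uint16_t nb_pkts)
{
    uint16_t nb_tx = 0;

    scan_cq_ring(q);
    if (q->rs_compl_count > q->tx_free_thresh)
        free_completed_bufs(q);

    while (nb_pkts > 0) {
        uint16_t n = xmit_fixed_burst(q, &pkts[nb_tx], nb_pkts);
        nb_tx += n;
        nb_pkts -= n;
        if (n == 0)        /* ring full: stop early */
            break;
    }
    return nb_tx;
}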
Walkthrough

The changes add AVX2 support for split-queue RX/TX data paths in the Intel IDPF driver. A common rearm function is extracted as a public symbol for shared use. The RX/TX path selection logic is updated to recognize AVX2-optimized paths. Documentation notes a split-queue completion queue sharing limitation.
Sequence Diagram(s)

sequenceDiagram
participant App as Application
participant Driver as IDPF Driver
participant PathSel as Path Selection<br/>(idpf_set_tx_function)
participant AVX512Path as AVX512 Path
participant AVX2Path as AVX2 Path
participant ScalarPath as Scalar Path
App->>Driver: Initialize TX
Driver->>PathSel: Select TX path
alt SIMD width = 512-bit
PathSel->>AVX512Path: Use AVX512
AVX512Path->>App: TX burst function
else SIMD width = 256-bit
PathSel->>AVX2Path: Use AVX2 (new)
AVX2Path->>App: TX burst function
else
PathSel->>ScalarPath: Use Scalar
ScalarPath->>App: TX burst function
end
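The alt branches above map to a SIMD-width check at path-selection time. Below is a minimal sketch using DPDK's real rte_vect_get_max_simd_bitwidth() helper; the burst-function names are placeholders, not the driver's actual symbols:

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_vect.h>

/* Placeholder burst functions standing in for the driver's real ones
 * (e.g. idpf_dp_splitq_xmit_pkts_avx2 for the 256-bit path). */
uint16_t tx_burst_avx512(void *txq, struct rte_mbuf **pkts, uint16_t n);
uint16_t tx_burst_avx2(void *txq, struct rte_mbuf **pkts, uint16_t n);
uint16_t tx_burst_scalar(void *txq, struct rte_mbuf **pkts, uint16_t n);

/* Pick the widest permitted vector path, falling back to scalar. */
static eth_tx_burst_t
select_tx_burst(void)
{
    uint16_t simd = rte_vect_get_max_simd_bitwidth();

    if (simd >= RTE_VECT_SIMD_512)
        return tx_burst_avx512;
    if (simd >= RTE_VECT_SIMD_256)
        return tx_burst_avx2;   /* new AVX2 path from this PR */
    return tx_burst_scalar;
}

The real selection logic also gates on CPU flags and queue configuration; this sketch shows only the width check the diagram describes.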
sequenceDiagram
participant RXQ as RX Queue
participant Rearm as Rearm<br/>(idpf_splitq_rearm_common)
participant Burst as RX Burst<br/>(idpf_dp_splitq_recv_pkts_avx2)
participant Desc as RX Descriptors
participant MBUF as MBUF Pool
Burst->>RXQ: Scan completion queue
Burst->>Desc: Read 4 descriptors/iteration
Burst->>Desc: Check head-gen state
loop Process descriptors
Burst->>Desc: Extract packet info
Burst->>RXQ: Update tail & GEN
end
RXQ->>Rearm: Trigger rearm when needed
Rearm->>MBUF: Allocate MBUFs in bulk
Rearm->>Desc: Initialize RX descriptors
Rearm->>RXQ: Update tail pointer
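The rearm leg of this diagram maps to a bulk-allocate-then-initialize loop. A minimal sketch, assuming DPDK's real rte_mempool_get_bulk() but simplified, illustrative queue fields:

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define REARM_BURST 64   /* stand-in for IDPF_RXQ_REARM_THRESH */

/* Simplified buffer-queue view; field names are illustrative. */
struct bufq_sketch {
    struct rte_mempool *mp;
    struct rte_mbuf **sw_ring;
    uint16_t rxrearm_start;
    uint16_t rxrearm_nb;
    uint16_t nb_desc;       /* assumed power of two */
};

/* Bulk-allocate mbufs, park them in the SW ring (descriptor writes
 * omitted), and advance the rearm window, as in the diagram above. */
static void
splitq_rearm_sketch(struct bufq_sketch *q)
{
    struct rte_mbuf *mbufs[REARM_BURST];
    uint16_t i, idx = q->rxrearm_start;

    if (rte_mempool_get_bulk(q->mp, (void **)mbufs, REARM_BURST) < 0)
        return;  /* allocation failed; retry on a later burst */

    for (i = 0; i < REARM_BURST; i++)
        q->sw_ring[(idx + i) & (q->nb_desc - 1)] = mbufs[i];

    q->rxrearm_start = (idx + REARM_BURST) & (q->nb_desc - 1);
    q->rxrearm_nb -= REARM_BURST;
    /* the real code also writes buffer addresses into the RX
     * descriptors and updates the HW tail register */
}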
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Hey there - I've reviewed your changes - here's some feedback:
- The header declaration of idpf_splitq_rearm_common is missing the __rte_internal attribute – please add it to keep internal API visibility consistent.
- There's significant duplication between the AVX2 TX vector code for single-queue and split-queue paths; consider extracting shared loops or helpers to reduce maintenance overhead.
- The wrap-around and generation-bit logic in idpf_dp_splitq_recv_pkts_avx2 assumes nb_rx_desc is a power-of-two; please add an assert or explanatory comment to make this requirement explicit (a sketch of such a guard follows below).
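For that third point, a minimal sketch of the kind of guard being requested, using DPDK's real rte_is_power_of_2() helper; the function name and its placement at queue setup are assumptions for illustration:

#include <errno.h>
#include <stdint.h>
#include <rte_common.h>

/* Hypothetical setup-time check: the burst path's
 * `rx_tail &= (nb_rx_desc - 1)` wrap and generation-bit flip are only
 * correct when the ring size is a power of two. */
static int
validate_ring_size(uint16_t nb_rx_desc)
{
    if (!rte_is_power_of_2(nb_rx_desc))
        return -EINVAL;   /* reject at queue setup */
    return 0;
}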
Actionable comments posted: 2
🧹 Nitpick comments (3)
doc/guides/nics/idpf.rst (1)
82-86: Clarify the CQ-sharing limitation with guidance. Suggest adding why it's unsupported (completion ownership/coherency) and what to do instead (dedicated CQ per TX queue, or pin queues sharing a CQ to the same lcore) to reduce operator confusion.
drivers/net/intel/idpf/idpf_common_rxtx.h (2)
59-61: Macro prefix typo: IDPD → IDPF. Rename IDPD_TXQ_SCAN_CQ_THRESH to IDPF_TXQ_SCAN_CQ_THRESH for consistency with the codebase.
32-38: Duplicate constant definition. IDPF_TX_MAX_MTU_SEG is defined twice; remove one to avoid drift.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- doc/guides/nics/idpf.rst (1 hunks)
- drivers/net/intel/idpf/idpf_common_device.h (1 hunks)
- drivers/net/intel/idpf/idpf_common_rxtx.c (2 hunks)
- drivers/net/intel/idpf/idpf_common_rxtx.h (2 hunks)
- drivers/net/intel/idpf/idpf_common_rxtx_avx2.c (4 hunks)
- drivers/net/intel/idpf/idpf_common_rxtx_avx512.c (0 hunks)
- drivers/net/intel/idpf/idpf_rxtx.c (1 hunks)
💤 Files with no reviewable changes (1)
- drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
🧰 Additional context used
🧬 Code graph analysis (4)
drivers/net/intel/idpf/idpf_common_rxtx.h (2)
- drivers/net/intel/idpf/idpf_common_rxtx_avx2.c (5): void (12-82), void (765-802), void (804-816), idpf_dp_splitq_recv_pkts_avx2 (485-602), idpf_dp_splitq_xmit_pkts_avx2 (910-935)
- drivers/net/intel/idpf/idpf_common_rxtx_avx512.c (14): void (14-124), void (126-240), void (543-648), void (946-958), void (962-1012), void (1114-1145), void (1149-1161), void (1163-1213), uint16_t (243-529), uint16_t (650-934), uint16_t (1014-1082), uint16_t (1084-1104), uint16_t (1215-1277), uint16_t (1279-1306)
drivers/net/intel/idpf/idpf_common_rxtx.c (2)
- drivers/net/intel/idpf/idpf_common_rxtx_avx2.c (4): void (12-82), void (765-802), void (804-816), idpf_dp_splitq_recv_pkts_avx2 (485-602)
- drivers/net/intel/idpf/idpf_common_rxtx_avx512.c (8): void (14-124), void (126-240), void (543-648), void (946-958), void (962-1012), void (1114-1145), void (1149-1161), void (1163-1213)
drivers/net/intel/idpf/idpf_rxtx.c (1)
- drivers/net/intel/idpf/idpf_common_rxtx_avx2.c (1): idpf_dp_splitq_xmit_pkts_avx2 (910-935)
drivers/net/intel/idpf/idpf_common_rxtx_avx2.c (2)
- drivers/net/intel/idpf/idpf_common_rxtx_avx512.c (8): void (14-124), void (126-240), void (543-648), void (946-958), void (962-1012), void (1114-1145), void (1149-1161), void (1163-1213)
- drivers/net/intel/idpf/idpf_rxtx_vec_common.h (1): idpf_tx_desc_done (34-44)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (5)
drivers/net/intel/idpf/idpf_common_device.h (1)
68-77: Enum addition looks correct; confirm table alignment. Adding IDPF_RX_AVX2 here matches the new entry in idpf_rx_path_infos. Please verify array indexes remain aligned with enum order and that unknown/default paths still resolve to IDPF_RX_DEFAULT.
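As an illustration of the alignment concern (a sketch with hypothetical names; the real enum and idpf_rx_path_infos table live in the driver headers), designated initializers are one way to keep each table row bound to its enum value:

/* Hypothetical mirror of the pattern under review. */
enum rx_func_type {
    RX_DEFAULT,
    RX_SINGLEQ,
    RX_AVX2,      /* newly added entry */
    RX_AVX512,
    RX_MAX
};

struct rx_path_info {
    const char *name;
    unsigned int simd_width;
};

/* Designated initializers keep each row tied to its enum value even
 * if entries are reordered, which addresses the alignment concern. */
static const struct rx_path_info rx_path_infos[RX_MAX] = {
    [RX_DEFAULT] = { "Scalar",              0 },
    [RX_SINGLEQ] = { "Single Scalar",       0 },
    [RX_AVX2]    = { "Split AVX2 Vector",   256 },
    [RX_AVX512]  = { "Split AVX512 Vector", 512 },
};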
drivers/net/intel/idpf/idpf_common_rxtx.h (1)
206-207: New internal APIs are fine. Signatures for common rearm and AVX2 split Rx/Tx are consistent with call sites.
Also applies to: 257-259, 267-269
drivers/net/intel/idpf/idpf_rxtx.c (1)
853-860: Split AVX2 TX selection LGTM. Correct path selection and early return; prep function set.
Please confirm idpf_dp_splitq_xmit_pkts_avx2 is compiled in the same build configs guarded by RTE_ARCH_X86.
drivers/net/intel/idpf/idpf_common_rxtx.c (2)
253-304: Common split-queue rearm path looks sound. Bulk allocation, descriptor init, tail update follow existing patterns; fallback matches prior logic.
1739-1745: New RX path table entry consistent with enum and features. “Split AVX2 Vector” entry uses SIMD_256 and correct burst function.
uint16_t
idpf_dp_splitq_recv_pkts_avx2(void *rxq, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
    struct idpf_rx_queue *queue = (struct idpf_rx_queue *)rxq;
    const uint32_t *ptype_tbl = queue->adapter->ptype_tbl;
    struct rte_mbuf **sw_ring = &queue->bufq2->sw_ring[queue->rx_tail];
    volatile union virtchnl2_rx_desc *rxdp =
        (volatile union virtchnl2_rx_desc *)queue->rx_ring + queue->rx_tail;
    const __m256i mbuf_init = _mm256_set_epi64x(0, 0, 0, queue->mbuf_initializer);

    rte_prefetch0(rxdp);
    nb_pkts = RTE_ALIGN_FLOOR(nb_pkts, 4); /* 4 desc per AVX2 iteration */

    if (queue->bufq2->rxrearm_nb > IDPF_RXQ_REARM_THRESH)
        idpf_splitq_rearm_common(queue->bufq2);

    /* head gen check */
    uint64_t head_gen = rxdp->flex_adv_nic_3_wb.pktlen_gen_bufq_id;
    if (((head_gen >> VIRTCHNL2_RX_FLEX_DESC_ADV_GEN_S) &
         VIRTCHNL2_RX_FLEX_DESC_ADV_GEN_M) != queue->expected_gen_id)
        return 0;

    uint16_t received = 0;

    /* Shuffle mask: picks fields from each 16-byte descriptor pair into the
     * layout that will be merged into mbuf->rearm_data candidates.
     */
    const __m256i shuf = _mm256_set_epi8(
        /* high 128 bits (desc 3 then desc 2 lanes) */
        0xFF, 0xFF, 0xFF, 0xFF, 11, 10, 5, 4,
        0xFF, 0xFF, 5, 4, 0xFF, 0xFF, 0xFF, 0xFF,
        /* low 128 bits (desc 1 then desc 0 lanes) */
        0xFF, 0xFF, 0xFF, 0xFF, 11, 10, 5, 4,
        0xFF, 0xFF, 5, 4, 0xFF, 0xFF, 0xFF, 0xFF
    );

    /* mask that clears bits 14 and 15 of the packet length word */
    const __m256i len_mask = _mm256_set_epi32(
        0xffffffff, 0xffffffff, 0xffff3fff, 0xffffffff,
        0xffffffff, 0xffffffff, 0xffff3fff, 0xffffffff
    );

    const __m256i ptype_mask = _mm256_set1_epi16(VIRTCHNL2_RX_FLEX_DESC_PTYPE_M);

    for (int i = nb_pkts; i >= IDPF_VPMD_DESCS_PER_LOOP; i -= IDPF_VPMD_DESCS_PER_LOOP) {
        rxdp -= IDPF_VPMD_DESCS_PER_LOOP;

        /* Check DD bits */
        bool dd0 = (rxdp[0].flex_adv_nic_3_wb.status_err0_qw1 &
                    (1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
        bool dd1 = (rxdp[1].flex_adv_nic_3_wb.status_err0_qw1 &
                    (1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
        bool dd2 = (rxdp[2].flex_adv_nic_3_wb.status_err0_qw1 &
                    (1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
        bool dd3 = (rxdp[3].flex_adv_nic_3_wb.status_err0_qw1 &
                    (1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;

        if (!(dd0 && dd1 && dd2 && dd3))
            break;

        /* copy mbuf pointers */
        memcpy(&rx_pkts[i - IDPF_VPMD_DESCS_PER_LOOP],
               &sw_ring[i - IDPF_VPMD_DESCS_PER_LOOP],
               sizeof(rx_pkts[0]) * IDPF_VPMD_DESCS_PER_LOOP);

        __m128i d3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[3]));
        __m128i d2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[2]));
        __m128i d1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[1]));
        __m128i d0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[0]));

        __m256i d23 = _mm256_set_m128i(d3, d2);
        __m256i d01 = _mm256_set_m128i(d1, d0);

        /* mask length and shuffle to build mbuf rearm data */
        __m256i desc01 = _mm256_and_si256(d01, len_mask);
        __m256i desc23 = _mm256_and_si256(d23, len_mask);
        __m256i mb01 = _mm256_shuffle_epi8(desc01, shuf);
        __m256i mb23 = _mm256_shuffle_epi8(desc23, shuf);

        /* ptype extraction */
        __m256i pt01 = _mm256_and_si256(d01, ptype_mask);
        __m256i pt23 = _mm256_and_si256(d23, ptype_mask);

        uint16_t ptype0 = (uint16_t)_mm256_extract_epi16(pt01, 1);
        uint16_t ptype1 = (uint16_t)_mm256_extract_epi16(pt01, 9);
        uint16_t ptype2 = (uint16_t)_mm256_extract_epi16(pt23, 1);
        uint16_t ptype3 = (uint16_t)_mm256_extract_epi16(pt23, 9);

        mb01 = _mm256_insert_epi32(mb01, (int)ptype_tbl[ptype1], 2);
        mb01 = _mm256_insert_epi32(mb01, (int)ptype_tbl[ptype0], 0);
        mb23 = _mm256_insert_epi32(mb23, (int)ptype_tbl[ptype3], 2);
        mb23 = _mm256_insert_epi32(mb23, (int)ptype_tbl[ptype2], 0);

        /* build rearm data for each mbuf */
        __m256i rearm0 = _mm256_permute2f128_si256(mbuf_init, mb01, 0x20);
        __m256i rearm1 = _mm256_blend_epi32(mbuf_init, mb01, 0xF0);
        __m256i rearm2 = _mm256_permute2f128_si256(mbuf_init, mb23, 0x20);
        __m256i rearm3 = _mm256_blend_epi32(mbuf_init, mb23, 0xF0);

        _mm256_storeu_si256((__m256i *)&rx_pkts[i - 4]->rearm_data, rearm0);
        _mm256_storeu_si256((__m256i *)&rx_pkts[i - 3]->rearm_data, rearm1);
        _mm256_storeu_si256((__m256i *)&rx_pkts[i - 2]->rearm_data, rearm2);
        _mm256_storeu_si256((__m256i *)&rx_pkts[i - 1]->rearm_data, rearm3);

        received += IDPF_VPMD_DESCS_PER_LOOP;
    }

    queue->rx_tail += received;
    queue->expected_gen_id ^= ((queue->rx_tail & queue->nb_rx_desc) != 0);
    queue->rx_tail &= (queue->nb_rx_desc - 1);
    if ((queue->rx_tail & 1) == 1 && received > 1) {
        queue->rx_tail--;
        received--;
    }
    queue->bufq2->rxrearm_nb += received;
    return received;
}
Fix split RX AVX2: wrong ring traversal and out-of-order rx_pkts fill.
Current loop walks rxdp backwards and fills rx_pkts from the end. On early DD break, returned count N does not populate rx_pkts[0..N-1]. Also rxdp -= 4 risks reading before rx_tail. Iterate forward from rx_tail and fill rx_pkts contiguously.
Apply this diff:
@@
- struct rte_mbuf **sw_ring = &queue->bufq2->sw_ring[queue->rx_tail];
- volatile union virtchnl2_rx_desc *rxdp =
- (volatile union virtchnl2_rx_desc *)queue->rx_ring + queue->rx_tail;
+ uint16_t idx = queue->rx_tail;
+ struct rte_mbuf **sw_ring = &queue->bufq2->sw_ring[idx];
+ volatile union virtchnl2_rx_desc *rxdp =
+ (volatile union virtchnl2_rx_desc *)queue->rx_ring + idx;
@@
- for (int i = nb_pkts; i >= IDPF_VPMD_DESCS_PER_LOOP; i -= IDPF_VPMD_DESCS_PER_LOOP) {
- rxdp -= IDPF_VPMD_DESCS_PER_LOOP;
+ for (uint16_t i = 0; i + IDPF_VPMD_DESCS_PER_LOOP <= nb_pkts; i += IDPF_VPMD_DESCS_PER_LOOP) {
@@
- /* Check DD bits */
- bool dd0 = (rxdp[0].flex_adv_nic_3_wb.status_err0_qw1 &
+ /* Check DD bits */
+ bool dd0 = (rxdp[0].flex_adv_nic_3_wb.status_err0_qw1 &
(1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
- bool dd1 = (rxdp[1].flex_adv_nic_3_wb.status_err0_qw1 &
+ bool dd1 = (rxdp[1].flex_adv_nic_3_wb.status_err0_qw1 &
(1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
- bool dd2 = (rxdp[2].flex_adv_nic_3_wb.status_err0_qw1 &
+ bool dd2 = (rxdp[2].flex_adv_nic_3_wb.status_err0_qw1 &
(1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
- bool dd3 = (rxdp[3].flex_adv_nic_3_wb.status_err0_qw1 &
+ bool dd3 = (rxdp[3].flex_adv_nic_3_wb.status_err0_qw1 &
(1U << VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_DD_S)) != 0;
if (!(dd0 && dd1 && dd2 && dd3))
break;
@@
- /* copy mbuf pointers */
- memcpy(&rx_pkts[i - IDPF_VPMD_DESCS_PER_LOOP],
- &sw_ring[i - IDPF_VPMD_DESCS_PER_LOOP],
- sizeof(rx_pkts[0]) * IDPF_VPMD_DESCS_PER_LOOP);
+ /* copy mbuf pointers contiguously */
+ memcpy(&rx_pkts[received],
+ &sw_ring[received],
+ sizeof(rx_pkts[0]) * IDPF_VPMD_DESCS_PER_LOOP);
@@
- __m128i d3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[3]));
- __m128i d2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[2]));
- __m128i d1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[1]));
- __m128i d0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[0]));
+ __m128i d0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[0]));
+ __m128i d1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[1]));
+ __m128i d2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[2]));
+ __m128i d3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &rxdp[3]));
@@
- _mm256_storeu_si256((__m256i *)&rx_pkts[i - 4]->rearm_data, rearm0);
- _mm256_storeu_si256((__m256i *)&rx_pkts[i - 3]->rearm_data, rearm1);
- _mm256_storeu_si256((__m256i *)&rx_pkts[i - 2]->rearm_data, rearm2);
- _mm256_storeu_si256((__m256i *)&rx_pkts[i - 1]->rearm_data, rearm3);
+ _mm256_storeu_si256((__m256i *)&rx_pkts[received + 0]->rearm_data, rearm0);
+ _mm256_storeu_si256((__m256i *)&rx_pkts[received + 1]->rearm_data, rearm1);
+ _mm256_storeu_si256((__m256i *)&rx_pkts[received + 2]->rearm_data, rearm2);
+ _mm256_storeu_si256((__m256i *)&rx_pkts[received + 3]->rearm_data, rearm3);
@@
- received += IDPF_VPMD_DESCS_PER_LOOP;
+ received += IDPF_VPMD_DESCS_PER_LOOP;
+ rxdp += IDPF_VPMD_DESCS_PER_LOOP;
}
@@
- queue->rx_tail += received;
+ queue->rx_tail += received;

🤖 Prompt for AI Agents
In drivers/net/intel/idpf/idpf_common_rxtx_avx2.c around lines 485-602: the AVX2
receive loop walks rxdp backwards and copies mbuf pointers into rx_pkts from the
end, which causes out-of-order/holes on early DD break and can underflow the
ring; change the loop to iterate forward from queue->rx_tail (increment rxdp and
index), fill rx_pkts sequentially from 0 upward, perform DD checks on the
upcoming 4 descriptors without decrementing rxdp, use modulo masking when
advancing rxdp/index to avoid reading before rx_tail, increment received per 4
processed descriptors, and then update queue->rx_tail, expected_gen_id and
bufq2->rxrearm_nb using the new forward-advanced tail; ensure memcpy, loads, and
stores use the forward indices so rx_pkts[0..received-1] are populated in order.
static inline uint16_t
idpf_splitq_xmit_fixed_burst_vec_avx2(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
{
    struct ci_tx_queue *txq = (struct ci_tx_queue *)tx_queue;
    struct idpf_flex_tx_sched_desc *txdp;
    struct ci_tx_entry_vec *txep;
    uint16_t n, nb_commit;
    uint64_t cmd_dtype = IDPF_TXD_FLEX_FLOW_CMD_EOP;
    uint16_t tx_id = txq->tx_tail;

    nb_commit = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
    nb_pkts = nb_commit;
    if (unlikely(nb_pkts == 0))
        return 0;

    txdp = (struct idpf_flex_tx_sched_desc *)&txq->desc_ring[tx_id];
    txep = &txq->sw_ring_vec[tx_id];

    txq->nb_tx_free = (uint16_t)(txq->nb_tx_free - nb_pkts);

    n = (uint16_t)(txq->nb_tx_desc - tx_id);
    if (nb_commit >= n) {
        ci_tx_backlog_entry_vec(txep, tx_pkts, n);

        idpf_splitq_vtx_avx2(txdp, tx_pkts, n - 1, cmd_dtype);
        tx_pkts += (n - 1);
        txdp += (n - 1);

        idpf_splitq_vtx1_avx2(txdp, *tx_pkts++, cmd_dtype);

        nb_commit = (uint16_t)(nb_commit - n);
        tx_id = 0;

        txdp = &txq->desc_ring[tx_id];
        txep = (void *)txq->sw_ring;
    }

    ci_tx_backlog_entry_vec(txep, tx_pkts, nb_commit);

    idpf_splitq_vtx_avx2(txdp, tx_pkts, nb_commit, cmd_dtype);

    tx_id = (uint16_t)(tx_id + nb_commit);
    txq->tx_tail = tx_id;

    IDPF_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);

    return nb_pkts;
}
|
Fix wrong TX backlog pointer after wrap.
After wrap, txep must point into sw_ring_vec, not sw_ring. Current cast breaks vector backlog bookkeeping.
Apply this diff:
@@
- txdp = &txq->desc_ring[tx_id];
- txep = (void *)txq->sw_ring;
+ txdp = &txq->desc_ring[tx_id];
+ txep = &txq->sw_ring_vec[tx_id];

🤖 Prompt for AI Agents
In drivers/net/intel/idpf/idpf_common_rxtx_avx2.c around lines 861-909 the code
incorrectly resets the software backlog pointer to sw_ring after a wrap; replace
that cast with a pointer into sw_ring_vec at the wrapped index (tx_id, which is
0 here) so txep points to the sw_ring_vec entry (e.g. set txep =
&txq->sw_ring_vec[tx_id]) to preserve correct vector backlog bookkeeping.
Auto-submission for "http://patchwork.dpdk.org/project/dpdk/list/?series=36480"
Summary by Sourcery
Enable AVX2-accelerated split queue Rx/Tx in the Intel idpf PMD by adding new vectorized receive and transmit routines, consolidating rearm logic, and registering the new paths in the driver
New Features:
- AVX2-accelerated split queue Rx and Tx burst routines for the idpf driver.
Enhancements:
- Common split-queue rearm logic consolidated for reuse, and the new paths registered in the driver's Rx/Tx path selection.
Summary by CodeRabbit
Documentation
- Noted that sharing a completion queue among multiple TX queues serviced by different CPU cores is unsupported in split queue mode.
New Features
- AVX2-optimized data paths for split queue Rx/Tx.