net/mlx5: fix risk in NEON Rx descriptor read

[ upstream commit 7ac7450b20d1e11a20c138707663867918b39403 ] In NEON vector PMD, vector load loads two contiguous 8B of descriptor data into vector register. Given vector load ensures no 16B atomicity, read of the word that includes op_own field could be reordered after read of other words. In this case, some words could contain invalid data. Reloaded qword0 after read barrier to update vector register. This ensures that the fetched data is correct. Testpmd single core test on N1SDP/ThunderX2 showed no performance drop. Fixes: 1742c2d ("net/mlx5: fix synchronization on polling Rx completions") Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com> Tested-by: Ali Alnubani <alialnu@nvidia.com> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
kevintraynor · Jul 12, 2023 · 660055f · 660055f
1 parent 999e9ce
commit 660055f
Showing 1 changed file with 8 additions and 0 deletions.
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		c0 = vld1q_u64((uint64_t *)(p0 + 48));
 		/* Synchronize for loading the rest of blocks. */
 		rte_io_rmb();
+		/* B.0 (CQE 3) reload lower half of the block. */
+		c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
+		/* B.0 (CQE 2) reload lower half of the block. */
+		c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
+		/* B.0 (CQE 1) reload lower half of the block. */
+		c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
+		/* B.0 (CQE 0) reload lower half of the block. */
+		c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
 		/* Prefetch next 4 CQEs. */
 		if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
 			unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;