Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong data retrieved in the last slot of the rx ring when the ring is filled for the first time. #905

Closed
ghyls opened this issue Jan 17, 2023 · 9 comments

Comments

@ghyls
Copy link

ghyls commented Jan 17, 2023

Hello,

I am observing the following issue with the i40e driver:

  • The setup:
    The sending side: sends sorted packets with a tag that goes 0, 1, 2, 3, ... one after another.
    The receiving side: runs pkt-gen. Some information is printed for each received packet (see below).
    All the packets go to a single RX queue. Each RX ring has 512 slots, although the issue does not depend on the number of slots.

  • The observation:
    When the RX ring gets filled for the first time (and only for the first time), the last packet of the ring (which has head=511 and slot->buf_idx=513) does not correspond to the packet I would expect, which would have tag = 511. See the output below:

[...]
pktid: 506, rx:2, head:506, packetsInRing:3, ring->head: 504 ring->cur: 504, ring->tail: 507, slot->buf_idx:508
pktid: 507, rx:0, head:507, packetsInRing:3, ring->head: 507 ring->cur: 507, ring->tail: 510, slot->buf_idx:509
pktid: 508, rx:1, head:508, packetsInRing:3, ring->head: 507 ring->cur: 507, ring->tail: 510, slot->buf_idx:510
pktid: 509, rx:2, head:509, packetsInRing:3, ring->head: 507 ring->cur: 507, ring->tail: 510, slot->buf_idx:511
pktid: 510, rx:0, head:510, packetsInRing:3, ring->head: 510 ring->cur: 510, ring->tail: 1, slot->buf_idx:512
pktid: 1023, rx:1, head:511, packetsInRing:3, ring->head: 510 ring->cur: 510, ring->tail: 1, slot->buf_idx:513
pktid: 512, rx:2, head:0, packetsInRing:3, ring->head: 510 ring->cur: 510, ring->tail: 1, slot->buf_idx:2
pktid: 513, rx:0, head:1, packetsInRing:1, ring->head: 1 ring->cur: 1, ring->tail: 2, slot->buf_idx:3
pktid: 514, rx:0, head:2, packetsInRing:2, ring->head: 2 ring->cur: 2, ring->tail: 4, slot->buf_idx:4
pktid: 515, rx:1, head:3, packetsInRing:2, ring->head: 2 ring->cur: 2, ring->tail: 4, slot->buf_idx:5
pktid: 516, rx:0, head:4, packetsInRing:3, ring->head: 4 ring->cur: 4, ring->tail: 7, slot->buf_idx:6
[...]

The packet with the tag pktid = 511 is never received. Instead, it seems that the information is being read out from the wrong memory address, and its contents seem to be whatever was left there from the previous time I ran the program. I don't know if this makes sense. As I said before, this happens only the first time the ring gets filled. The 2nd, 3rd... time head reaches 511, the expected packet is received. Also, this does not depend on the number of rx slots. I have observed this behaviour with i40e versions 2.22.8, 2.20.12 and 2.21.12. So far I have not tested other versions.

Any help or hints on this would be very appreciated.

Thank you,
Mario

@jhk098
Copy link
Contributor

jhk098 commented Jan 18, 2023

Are you sure the issue is not on the TX side? This would happen if you write to a buffer not in the range [head, tail[.

Also, you may try to use emulated netmap on the RX side (dev.netmap.admode=2), in such a way that you rule out (or confirm) that the issue depends on the i40 patched driver.

@ghyls
Copy link
Author

ghyls commented Jan 18, 2023

Hello,

Many thanks for the help.

I do not observe the issue when using the emulated adapter (dev.netmap.admode=2).

The TX side is just a for loop that uses pcap.h to send one packet per iteration. It is not using Netmap. Also, if on the RX side I set the ring size to e.g. 1024, the problematic packet would be the 1024th and not the 512th.

@jhk098
Copy link
Contributor

jhk098 commented Jan 18, 2023

OK so that definitely points to some bug with i40e.

Does the following patch make any difference?

diff --git a/LINUX/i40e_netmap_linux.h b/LINUX/i40e_netmap_linux.h
index 55df4cd2..74a6e757 100644
--- a/LINUX/i40e_netmap_linux.h
+++ b/LINUX/i40e_netmap_linux.h
@@ -497,6 +497,7 @@ i40e_netmap_rxsync(struct netmap_kring *kring, int flags)
 	u_int nm_i;	/* index into the netmap ring */
 	u_int nic_i;	/* index into the NIC ring */
 	u_int ntail;	/* new tail for the user */
+	u_int stop_i;
 	u_int n;
 	u_int const lim = kring->nkr_num_slots - 1;
 	u_int const head = kring->rhead;
@@ -544,11 +545,13 @@ i40e_netmap_rxsync(struct netmap_kring *kring, int flags)
 
 		nic_i = rxr->next_to_clean; // or also k2n(kring->nr_hwtail)
 		nm_i = netmap_idx_n2k(kring, nic_i);
+		stop_i = nm_prev(kring->nr_hwcur, lim);
+		BUG_ON(nm_i != kring->nr_hwtail);
 		/* we advance tail only when we see a complete packet */
 		ntail = lim + 1;
 		complete = 0;
 
-		for (n = 0; ; n++) {
+		for (n = 0; nm_i != stop_i; n++) {
 			union i40e_rx_desc *curr = I40E_RX_DESC(rxr, nic_i);
 			uint64_t qword = le64toh(curr->wb.qword1.status_error_len);
 			uint32_t staterr = (qword & I40E_RXD_QW1_STATUS_MASK)

@ghyls
Copy link
Author

ghyls commented Jan 19, 2023

Hi,

Thanks for the help. Unfortunately, it doesn't seem to make any difference.
Just for clarification, I have applied it, and then run again:

make distclean
./configure --drivers=i40e --select-version=i40e:2.22.8
make
sudo make install

Please let me know if you need any further information.
I will let you know if I make any progress.

@giuseppelettieri
Copy link
Collaborator

Is this any better?

diff --git a/LINUX/i40e_netmap_linux.h b/LINUX/i40e_netmap_linux.h
index 55df4cd25..9acdc106b 100644
--- a/LINUX/i40e_netmap_linux.h
+++ b/LINUX/i40e_netmap_linux.h
@@ -176,7 +176,7 @@ i40e_netmap_configure_rx_ring(struct i40e_ring *ring)
 	kring = na->rx_rings[ring_nr];
 	lim = na->num_rx_desc - 1 - nm_kr_rxspace(kring);
 
-	for (i = 0; i < lim; i++) {
+	for (i = 0; i <= lim; i++) {
 		int si = netmap_idx_n2k(kring, i);
 		uint64_t paddr;
 		union i40e_rx_desc *rx = I40E_RX_DESC(ring, i);

@giuseppelettieri
Copy link
Collaborator

By the way, I assume that, after sudo make install, you have also run

rmmod i40e
modprobe i40e

@ghyls
Copy link
Author

ghyls commented Jan 23, 2023

Hi,

Yes, that fixes the problem.
(I ran rmmod i40e, rmmod netmap, insmod netmap.ko and insmod i40e/i40e.ko. Sorry for not writing it before.)

Thank you very much!

Mario

@giuseppelettieri
Copy link
Collaborator

pushed to mater, thanks!

@reinmark
Copy link

reinmark commented Mar 8, 2023

Thanks for describing and fixing this issue.
We observed the same behaviour on an igb based system which works now as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants