
ice: add XDP mbuf support #4424

Closed
wants to merge 14 commits

Conversation

kernel-patches-bot

Pull request for series with
subject: ice: add XDP mbuf support
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=717481

Kernel Patches Daemon and others added 14 commits January 29, 2023 19:27
The Rx path is going to be modified so that a fragmented frame is gathered
within the xdp_buff in the first place. This approach implies that the
underlying buffer has to provide tailroom for skb_shared_info. That is
currently the case when the ring uses build_skb, but not when the legacy-rx
knob is turned on. That configuration uses 2k Rx buffers and has no way to
provide either headroom or tailroom - FWIW it currently has
XDP_PACKET_HEADROOM, which is broken and is removed here. 2k Rx buffers were
used so that the driver in this setting could support a 9k MTU, as it can
chain up to 5 Rx buffers. With an offset configured, the HW writing 2k of
data would cross the half-page boundary, which broke the assumption behind
our internal page recycling tricks.

Now, if the above were fixed but the legacy-rx path left as is, the packet's
contents would be corrupted again when skb_shared_info is referenced via
xdp_get_shared_info_from_buff(). Hence the Rx buffer size, and with it the
supported MTU, needs to be lowered. This lets us keep a unified data path,
and legacy-rx users (if there are any) are still good to go with an 8k MTU.
The tendency, however, is to drop support for this code path at some point.

Add ICE_RXBUF_1664 as vsi::rx_buf_len and ICE_MAX_FRAME_LEGACY_RX (8320)
as vsi::max_frame for legacy-rx. For bigger page sizes, configure 3k Rx
buffers instead of 2k.

Since headroom support is removed, disable data_meta support on legacy-rx.
When preparing the XDP buff, rely on the ice_rx_ring::rx_offset setting to
decide whether data_meta is supported.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
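
As a rough illustration of the sizing described above (the two constants
come from the commit message; the helper name and exact field layout are
assumptions for the sketch):

#define ICE_RXBUF_1664		1664
#define ICE_MAX_FRAME_LEGACY_RX	8320

/* hypothetical helper sketching the legacy-rx sizing above */
static void ice_vsi_cfg_rx_buf_len(struct ice_vsi *vsi)
{
	if (test_bit(ICE_FLAG_LEGACY_RX, vsi->back->flags)) {
		/* 1664-byte buffers leave tailroom for skb_shared_info;
		 * chaining up to 5 of them gives the 8320-byte max frame
		 */
		vsi->rx_buf_len = ICE_RXBUF_1664;
		vsi->max_frame = ICE_MAX_FRAME_LEGACY_RX;
	} else if (PAGE_SIZE > SZ_4K) {
		/* bigger page sizes get 3k Rx buffers, not 2k */
		vsi->rx_buf_len = ICE_RXBUF_3072;
	}
}
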
In preparation for XDP multi-buffer support, let's store the xdp_buff on
the Rx ring struct. This will allow us to combine fragmented frames across
separate NAPI cycles, the same way skb fragments are currently handled.
It also means the skb pointer on the Rx ring becomes redundant and will be
removed. For now it is kept, and the layout of the Rx ring struct has not
been inspected; some member movement will be needed later on, and that will
be the time to take care of it.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
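
A trimmed-down sketch of the idea (the real ice_rx_ring carries many more
members; only the ones relevant here are shown):

/* sketch: the in-progress xdp_buff lives on the ring itself, so a frame
 * split across NAPI cycles can be resumed where it was left off
 */
struct ice_rx_ring_sketch {
	u16 next_to_use;
	u16 next_to_clean;
	struct sk_buff *skb;	/* kept for now; removed later in the series */
	struct xdp_buff xdp;	/* partially assembled multi-buffer frame */
};
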
This will allow us to avoid carrying an additional auxiliary array of page
counts when dealing with XDP multi-buffer support. Previously, combining a
fragmented frame into an skb was not affected the way XDP will be, since the
whole frame has to be in place before the XDP prog is executed. Therefore,
when going through HW Rx descriptors one by one, calls to ice_put_rx_buf()
need to happen *after* running the XDP prog on a potentially multi-buffered
frame, so some additional storage for the page count is needed.

By adding the page count to the rx buf, it becomes easier to walk through
the processed entries at the end of the Rx cleaning routine and decide
whether or not the buffers should be recycled.

While at it, bump ice_rx_buf::pagecnt_bias from u16 to u32. It has been
proven many times that calculations on variables smaller than the standard
register size are harmful, and that was also the case during experiments
with embedding the page count in ice_rx_buf - when it was added as a u16,
it had a performance impact.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
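
Illustratively, ice_rx_buf ends up along these lines (member set trimmed;
pgcnt and the widened pagecnt_bias are per the message above, the rest is
assumed):

struct ice_rx_buf_sketch {
	struct page *page;
	unsigned int page_offset;
	unsigned int pgcnt;		/* page_count() snapshot, cached per buffer */
	unsigned int pagecnt_bias;	/* was u16; u32 keeps math register-sized */
};
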
The plan is to move ice_put_rx_buf() to the end of ice_clean_rx_irq(), so
in order to keep the ability to walk through the HW Rx descriptors, pull
the next_to_clean handling out of ice_put_rx_buf().

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
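
The resulting caller-side pattern (consistent with the "if (++ntc == cnt)"
bump visible in the conflict diff further down this page) is roughly:

/* sketch: the Rx loop owns the next_to_clean bump now, so
 * ice_put_rx_buf() can run later, outside the descriptor walk
 */
static u16 ice_bump_ntc(u16 ntc, u16 cnt)
{
	return (++ntc == cnt) ? 0 : ntc;	/* wrap at ring size */
}
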
This might be used by the ZC driver in the future and could potentially
yield a minor performance boost. While at it, constify the arguments that
ice_is_non_eop() takes; since they are pointers, this helps the compiler
when generating asm.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
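
A sketch of the constified check (the EOP test mirrors what the driver
already does; any stats bookkeeping the original function carries is
omitted here):

static inline bool
ice_is_non_eop(const struct ice_rx_ring *rx_ring,
	       const union ice_32b_rx_flex_desc *rx_desc)
{
	/* if this is the last buffer of the frame, nothing else to do */
	if (likely(ice_test_staterr(rx_desc->wb.status_error0,
				    BIT(ICE_RX_FLEX_DESC_STATUS0_EOF_S))))
		return false;

	return true;
}
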
Currently, calls to ice_put_rx_buf() are sprinkled throughout
ice_clean_rx_irq() - the first is in the explicit flow director descriptor
handling, the second comes after running the XDP prog, and the last one
after taking care of the skb.

The first callsite was actually there only to bump ntc, as the Rx buffer
to be recycled is not even passed to the function.

It is possible to walk through the Rx buffers processed in a particular
NAPI cycle by caching ntc at the beginning of ice_clean_rx_irq().

To do so, let us store the XDP verdict inside ice_rx_buf, so the action to
take on each buffer is known. When no XDP prog is present, just store
ICE_XDP_PASS as the verdict.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
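
In rough form, the end-of-poll walk this enables (simplified sketch; the
per-verdict recycling details are condensed):

static void ice_put_processed_bufs(struct ice_rx_ring *rx_ring,
				   u16 cached_ntc, u16 ntc, u16 cnt)
{
	while (cached_ntc != ntc) {
		struct ice_rx_buf *buf = &rx_ring->rx_buf[cached_ntc];

		/* buf->act was filled by ice_run_xdp(), or set to
		 * ICE_XDP_PASS when no prog is attached
		 */
		if (buf->act == ICE_XDP_CONSUMED)
			buf->pagecnt_bias++;	/* page goes back untouched */

		ice_put_rx_buf(rx_ring, buf);
		if (++cached_ntc >= cnt)
			cached_ntc = 0;
	}
}
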
This should have been used in there from day one; let us address that
before introducing XDP multi-buffer support on the Rx side.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Currently, ice_finalize_xdp_rx() is called only when an xdp_prog is present
on the VSI, which is a good thing. However, this optimization can be
tightened to check whether any XDP_TX/XDP_REDIRECT actions actually took
place in the current Rx processing: a non-zero @xdp_xmit already implies
that an xdp_prog is present on the VSI. This way, XDP_DROP-based workloads
will not suffer unnecessary calls to an external function.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
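
The gate itself is tiny; wrapped here in a hypothetical helper purely for
illustration (ice_finalize_xdp_rx() is the driver's existing function):

static void ice_maybe_finalize_xdp_rx(struct ice_tx_ring *xdp_ring,
				      unsigned int xdp_xmit)
{
	/* non-zero xdp_xmit already implies an xdp_prog is attached, so
	 * XDP_DROP/XDP_PASS-only workloads skip the call entirely
	 */
	if (xdp_xmit)
		ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
}
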
The skb path calculates truesize in three different functions, which can be
avoided since xdp_buff already carries the calculated truesize in
xdp_buff::frame_sz. If ice_add_rx_frag() is adjusted to take the xdp_buff
as an input, just like the functions responsible for creating the sk_buff
initially, the codebase can be simplified by removing these redundant
recalculations and relying on xdp_buff::frame_sz instead.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
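
A sketch of the reworked helper under the signature change described above
(skb_add_rx_frag() is the standard kernel API; the parameter list is an
assumption):

static void
ice_add_rx_frag(struct xdp_buff *xdp, struct ice_rx_buf *rx_buf,
		struct sk_buff *skb, const unsigned int size)
{
	if (!size)
		return;

	/* truesize comes straight from the xdp_buff - no more
	 * per-callsite recalculation
	 */
	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
			rx_buf->page_offset, size, xdp->frame_sz);
}
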
The ice driver's Rx data path needs a bit of rework in order to support
multi-buffer XDP. The skb path currently works by having the Rx ring carry
a pointer to the skb, so that if the driver doesn't manage to combine a
fragmented frame within the current NAPI instance, it can restore the state
on the next instance and keep looking for the last fragment (i.e. the
descriptor with the EOP bit set). What needs to be achieved instead is that
the xdp_buff is combined that way (linear part plus frags) in the first
place. The skb is then ready to go in case of XDP_PASS or when no BPF
program is present on the interface; if a BPF program is there, it operates
on the multi-buffer XDP buff. At that point the xdp_buff resides directly
on the Rx ring, so given that the skb will be built straight from the
xdp_buff, there is no further need to carry the skb on the Rx ring.

Besides removing the skb pointer from the Rx ring, lots of members have
been moved around within ice_rx_ring. The first and foremost reason was to
place rx_buf and xdp_buff on the same cacheline, so that once we touch
rx_buf (a step that precedes touching xdp_buff), the xdp_buff is already
hot in cache. Second, xdp_rxq is used rather rarely and occupies a separate
cacheline, so it is better placed at the end of ice_rx_ring.

Another change affecting ice_rx_ring is the introduction of
ice_rx_ring::first_desc. Its purpose is twofold - the first is to propagate
rx_buf->act to all parts of the current xdp_buff after running the XDP
program, so that ice_put_rx_buf(), which got moved out of the main Rx
processing loop, can take an appropriate action on each buffer. The second
is for ice_construct_skb().

ice_construct_skb() has a copybreak mechanism with an explicit impact on
the xdp_buff->skb conversion in the new approach when the legacy Rx flag is
toggled. It works such that the linear part is 256 bytes long; if the frame
is bigger than that, the remaining bytes go as a frag to skb_shared_info.

This means that while memcpying frags from the xdp_buff to the newly
allocated skb, care needs to be taken when picking the destination frag
array entry. By the time ice_construct_skb() is called on a fragmented
frame, the current rx_buf points to the *last* fragment, but copybreak
needs to be done against the first one. That is where
ice_rx_ring::first_desc helps.

When frame building spans NAPI polls (the DD bit is not set on the current
descriptor and xdp->data is not NULL), the current Rx buffer handling state
can cause problems. Since calls to ice_put_rx_buf() were pulled out of the
main Rx processing loop and scoped from cached_ntc to the current ntc,
remember that the function now relies on rx_buf->act, which is set within
ice_run_xdp() - and ice_run_xdp() is only called once the EOP bit is found.
So, as it stood, we could put an Rx buffer whose rx_buf->act is
*uninitialized*. To address this, change the scoping to rely on first_desc
on both boundaries instead.

This also implies that cleaned_count, which is used as an input to
ice_alloc_rx_buffers() and tells how many new buffers should be refilled,
has to be adjusted. If it stayed as is, ntc could end up going past ntu.

Therefore, remove cleaned_count altogether and, for the allocation routine,
use the newly introduced ICE_RX_DESC_UNUSED() macro - an equivalent of
ICE_DESC_UNUSED() dedicated to the Rx side, based on
ice_rx_ring::first_desc instead of next_to_clean.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
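
Presumably the macro mirrors its descriptor-counting counterpart with
first_desc as the consumer-side bound; a sketch based on the description
above:

#define ICE_RX_DESC_UNUSED(R)	\
	((((R)->first_desc > (R)->next_to_use) ? 0 : (R)->count) + \
	 (R)->first_desc - (R)->next_to_use - 1)

Bounding the refill on first_desc instead of next_to_clean means
descriptors belonging to a frame that is still being assembled are never
handed back to HW, which is what keeps ntc from running past ntu.
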
As with the Rx side in the previous patch, the XDP Tx logic in the ice
driver needs to be adjusted for multi-buffer support - specifically, the
way HW Tx descriptors are produced and cleaned.

Currently, XDP_TX works on strict ring boundaries, meaning it sets the RS
bit (on the producer side) / looks up the DD bit (on the consumer/cleaning
side) every quarter of the ring. This means that if, for example, a
multi-buffer frame spanned a quarter boundary (say the frame consists of 4
fragments and starts at descriptor 62 on a ring sized to 256 entries), the
RS bit would be produced in the middle of the multi-buffer frame - broken
behavior, since it needs to be set on the last descriptor of the frame.

To make this work, set the RS bit on the last descriptor of the batch of
frames that the XDP_TX action was used on, and make the first entry
remember the index of the last descriptor with the RS bit set. This way,
the cleaning side can take the index of the descriptor carrying the RS bit,
look up the DD bit's presence, and clean from the first entry to the last.

To clean up the code base, introduce a common ice_set_rs_bit() that returns
the index of the descriptor the RS bit was produced on, so that the
standard driver can store it in the proper ice_tx_buf and the ZC driver can
simply ignore the return value.

Co-developed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
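
A sketch of the shared helper (descriptor field names follow the driver's
existing Tx descriptor layout; treat the details as illustrative):

static u16 ice_set_rs_bit(const struct ice_tx_ring *xdp_ring)
{
	u16 rs_idx = xdp_ring->next_to_use ? xdp_ring->next_to_use - 1 :
					     xdp_ring->count - 1;
	struct ice_tx_desc *tx_desc;

	/* stamp RS on the last descriptor produced and report its index;
	 * the standard driver stores it in ice_tx_buf, ZC ignores it
	 */
	tx_desc = ICE_TX_DESC(xdp_ring, rs_idx);
	tx_desc->cmd_type_offset_bsz |=
		cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);

	return rs_idx;
}
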
Now that both the ZC and standard XDP data paths have stopped using Tx
logic based on the next_dd and next_rs fields, we can safely remove these
fields and shrink the Tx ring structure.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Let us store the pointer to the xdp_buff that came from the xsk_buff_pool
on tx_buf, so that it can be recycled via xsk_buff_free() on the Tx
cleaning side. This way it is not necessary to do an expensive copy to
another xdp_buff backed by a newly allocated page.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
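
Conceptually (a sketch; the tx_buf member name is an assumption, while
xsk_buff_free() is the real XSK pool API):

/* producer side: stash the pool-owned buff instead of copying it */
static void ice_xdp_tx_stash(struct ice_tx_buf *tx_buf, struct xdp_buff *xdp)
{
	tx_buf->xdp = xdp;
}

/* cleaning side: hand the buff straight back to the xsk_buff_pool */
static void ice_xdp_tx_recycle(struct ice_tx_buf *tx_buf)
{
	xsk_buff_free(tx_buf->xdp);
	tx_buf->xdp = NULL;
}
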
@kernel-patches-bot
Author

Upstream branch: c1a3daf
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=717481
version: 1

@kernel-patches-bot
Author

Upstream branch: 10d1b0e
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=717481
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/netdevbpf/list/?series=717481
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Applying: ice: prepare legacy-rx for upcoming XDP multi-buffer support
Using index info to reconstruct a base tree...
M	drivers/net/ethernet/intel/ice/ice_base.c
M	drivers/net/ethernet/intel/ice/ice_lib.c
M	drivers/net/ethernet/intel/ice/ice_main.c
M	drivers/net/ethernet/intel/ice/ice_txrx.c
M	drivers/net/ethernet/intel/ice/ice_txrx.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/ethernet/intel/ice/ice_txrx.h
Auto-merging drivers/net/ethernet/intel/ice/ice_txrx.c
CONFLICT (content): Merge conflict in drivers/net/ethernet/intel/ice/ice_txrx.c
Auto-merging drivers/net/ethernet/intel/ice/ice_main.c
CONFLICT (content): Merge conflict in drivers/net/ethernet/intel/ice/ice_main.c
Auto-merging drivers/net/ethernet/intel/ice/ice_lib.c
Auto-merging drivers/net/ethernet/intel/ice/ice_base.c
Patch failed at 0001 ice: prepare legacy-rx for upcoming XDP multi-buffer support
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch'

conflict:

diff --cc drivers/net/ethernet/intel/ice/ice_main.c
index 26a8910a41ff,88b4a017990d..000000000000
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@@ -7344,6 -7322,18 +7344,21 @@@ clear_recovery
  }
  
  /**
++<<<<<<< HEAD
++=======
+  * ice_max_xdp_frame_size - returns the maximum allowed frame size for XDP
+  * @vsi: Pointer to VSI structure
+  */
+ static int ice_max_xdp_frame_size(struct ice_vsi *vsi)
+ {
+ 	if (test_bit(ICE_FLAG_LEGACY_RX, vsi->back->flags))
+ 		return ICE_RXBUF_1664;
+ 	else
+ 		return ICE_RXBUF_3072;
+ }
+ 
+ /**
++>>>>>>> ice: prepare legacy-rx for upcoming XDP multi-buffer support
   * ice_change_mtu - NDO callback to change the MTU
   * @netdev: network interface device structure
   * @new_mtu: new value for maximum frame size
diff --cc drivers/net/ethernet/intel/ice/ice_txrx.c
index 466113c86e6f,d0a6534122e0..000000000000
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@@ -1028,11 -990,6 +1028,14 @@@ ice_construct_skb(struct ice_rx_ring *r
  
  	/* prefetch first cache line of first page */
  	net_prefetch(xdp->data);
++<<<<<<< HEAD
 +
 +	if (unlikely(xdp_buff_has_frags(xdp))) {
 +		sinfo = xdp_get_shared_info_from_buff(xdp);
 +		nr_frags = sinfo->nr_frags;
 +	}
++=======
++>>>>>>> ice: prepare legacy-rx for upcoming XDP multi-buffer support
  
  	/* allocate a skb to store the frags */
  	skb = __napi_alloc_skb(&rx_ring->q_vector->napi, ICE_RX_HDR_SIZE,
@@@ -1200,56 -1168,65 +1203,62 @@@ int ice_clean_rx_irq(struct ice_rx_rin
  			ICE_RX_FLX_DESC_PKT_LEN_M;
  
  		/* retrieve a buffer from the ring */
 -		rx_buf = ice_get_rx_buf(rx_ring, size, &rx_buf_pgcnt);
 +		rx_buf = ice_get_rx_buf(rx_ring, size, ntc);
  
 -		if (!size) {
 -			xdp.data = NULL;
 -			xdp.data_end = NULL;
 -			xdp.data_hard_start = NULL;
 -			xdp.data_meta = NULL;
 -			goto construct_skb;
 -		}
 +		if (!xdp->data) {
 +			void *hard_start;
  
++<<<<<<< HEAD
 +			hard_start = page_address(rx_buf->page) + rx_buf->page_offset -
 +				     offset;
 +			xdp_prepare_buff(xdp, hard_start, offset, size, !!offset);
++=======
+ 		hard_start = page_address(rx_buf->page) + rx_buf->page_offset -
+ 			     offset;
+ 		xdp_prepare_buff(&xdp, hard_start, offset, size, !!offset);
++>>>>>>> ice: prepare legacy-rx for upcoming XDP multi-buffer support
  #if (PAGE_SIZE > 4096)
 -		/* At larger PAGE_SIZE, frame_sz depend on len size */
 -		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
 +			/* At larger PAGE_SIZE, frame_sz depend on len size */
 +			xdp->frame_sz = ice_rx_frame_truesize(rx_ring, size);
  #endif
 +			xdp_buff_clear_frags_flag(xdp);
 +		} else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) {
 +			break;
 +		}
 +		if (++ntc == cnt)
 +			ntc = 0;
  
 -		if (!xdp_prog)
 -			goto construct_skb;
 +		/* skip if it is NOP desc */
 +		if (ice_is_non_eop(rx_ring, rx_desc))
 +			continue;
  
 -		xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring);
 -		if (!xdp_res)
 +		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf);
 +		if (rx_buf->act == ICE_XDP_PASS)
  			goto construct_skb;
 -		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
 -			xdp_xmit |= xdp_res;
 -			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 -		} else {
 -			rx_buf->pagecnt_bias++;
 -		}
 -		total_rx_bytes += size;
 +		total_rx_bytes += xdp_get_buff_len(xdp);
  		total_rx_pkts++;
  
 -		cleaned_count++;
 -		ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt);
 +		xdp->data = NULL;
 +		rx_ring->first_desc = ntc;
  		continue;
  construct_skb:
 -		if (skb) {
 -			ice_add_rx_frag(rx_ring, rx_buf, skb, size);
 -		} else if (likely(xdp.data)) {
 -			if (ice_ring_uses_build_skb(rx_ring))
 -				skb = ice_build_skb(rx_ring, rx_buf, &xdp);
 -			else
 -				skb = ice_construct_skb(rx_ring, rx_buf, &xdp);
 -		}
 +		if (likely(ice_ring_uses_build_skb(rx_ring)))
 +			skb = ice_build_skb(rx_ring, xdp);
 +		else
 +			skb = ice_construct_skb(rx_ring, xdp);
  		/* exit if we failed to retrieve a buffer */
  		if (!skb) {
 -			rx_ring->ring_stats->rx_stats.alloc_buf_failed++;
 -			if (rx_buf)
 -				rx_buf->pagecnt_bias++;
 +			rx_ring->ring_stats->rx_stats.alloc_page_failed++;
 +			rx_buf->act = ICE_XDP_CONSUMED;
 +			if (unlikely(xdp_buff_has_frags(xdp)))
 +				ice_set_rx_bufs_act(xdp, rx_ring,
 +						    ICE_XDP_CONSUMED);
 +			xdp->data = NULL;
 +			rx_ring->first_desc = ntc;
  			break;
  		}
 -
 -		ice_put_rx_buf(rx_ring, rx_buf, rx_buf_pgcnt);
 -		cleaned_count++;
 -
 -		/* skip if it is NOP desc */
 -		if (ice_is_non_eop(rx_ring, rx_desc))
 -			continue;
 +		xdp->data = NULL;
 +		rx_ring->first_desc = ntc;
  
  		stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S);
  		if (unlikely(ice_test_staterr(rx_desc->wb.status_error0,

@kernel-patches-bot
Author

At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=717481 irrelevant now. Closing PR.
