Skip to content

Conversation

@pull
Copy link
Contributor

@pull pull bot commented Oct 8, 2025

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

dgibson and others added 10 commits October 7, 2025 15:33
clang-tidy 20.1.8 doesn't like (VHOST_USER_MAX_VQS / 2), because it expands
to (2 / 2).  But in the context of the #define, this makes logical sense,
so suppress the warning.

I'm not sure why it isn't firing on the debug() line just below.  Possibly
it only complains once per expression per function, so we only have to
suppress it once?

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Apparently cppcheck 2.18.3 no longer trips the funcArgNamesDifferent
warning on our (re-)definition of close_range().  So instead we get an
unmatchedSuppression warning.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Another cppcheck package in Fedora, another bogus false positive.  This
one seems to not realise that a variable has been initialised by
getsockopt() under a complicated set of circumstances.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
At least some cppcheck versions (2.18.3 for me) complain that the _storage
variables in dhcpv6() could be reduced in scope.  That's not actually the
case, because although we don't reference the variables, we may touch
their memory via pointers after the blocks in question.  There's no
reasonable way for cppcheck to determine that, though, so suppress its
warnings.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
exetool is listed in $(LOCAL_ASSETS), which seems like it makes sense,
until you realise that $(LOCAL_ASSETS) are deleted on make clean, and we
can't rebuild exetool after deleting it (since it's just part of the
exeter tree we download).  Move it to $(DOWNLOAD_ASSETS) instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
We currently have one test moved to the new exeter based framwork written
in Python.  We plan to add many more, so add linting (flake8) and type
checking (mypy) of those scripts.  This can be invoked manually with
"make flake8" or "make mypy" in test/, and is also added to the static
checkers test set.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
If we reach end-of-file on a socket (or get EPOLLRDHUP / EPOLLHUP) and
send a FIN segment to the guest / container acknowledging a sequence
number that's behind what we received so far, we won't have any
further trigger to send an updated ACK segment, as we are now
switching the epoll socket monitoring to edge-triggered mode.

To avoid this situation, in tcp_update_seqack_wnd(), we set the next
acknowledgement sequence to the current observed sequence, regardless
of what was acknowledged socket-side.

However, we don't necessarily call tcp_update_seqack_wnd() before
sending the FIN segment, which might potentially lead to a situation,
not observed in practice, where we unnecessarily cause a
retransmission at some point after our FIN segment.

Avoid that by setting the ACK sequence to whatever we received from
the container / guest, before sending a FIN segment and switching to
EPOLLET.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
According to RFC 9293 we should ignore data (note: not data segments)
in CLOSE-WAIT state (indicated by TAP_FIN_RCVD), see 3.10.7.4
"Other states":

  [...]

  Seventh, process the segment text:

  [...]

  CLOSE-WAIT STATE

  This should not occur since a FIN has been received from the remote
  side. Ignore the segment text.

and we almost do that, except that we would look at the data length
to decide whether it's a request for fast re-transmission, so fix
that, and while at it, log a message, so that cases such as the
following one are more apparent in debug logs:

28692   0.009758 88.198.0.164 → 93.235.151.95 54 TCP 55414 → 47080 [FIN, ACK] Seq=121441 Ack=141 Win=65536 Len=0

we should ignore this FIN flag, because we didn't accept data up
to this sequence (see next segment), but we don't do it, so, here:

28693   0.000036 93.235.151.95 → 88.198.0.164 54 TCP 47080 → 55414 [ACK] Seq=141 Ack=90722 Win=32128 Len=0
28694   0.034597 93.235.151.95 → 88.198.0.164 54 TCP 47080 → 55414 [FIN, ACK] Seq=141 Ack=90722 Win=121216 Len=0
28695   0.000019 88.198.0.164 → 93.235.151.95 54 TCP 55414 → 47080 [ACK] Seq=121442 Ack=142 Win=65536 Len=0
28696   0.162968 88.198.0.164 → 93.235.151.95 30773 TCP [TCP Retransmission] 55414 → 47080 [FIN, PSH, ACK] Seq=90722 Ack=142 Win=65536 Len=30719 [TCP segment of a reassembled PDU]

we are erroneously in CLOSE-WAIT (TAP_FIN_RCVD) state, and this
segment would look pretty strange there.

This specific case is fixed by the next patch, so it should never
happen again.

Link: https://archives.passt.top/passt-dev/20250910115726.432bbb8d@elisabeth/
Link: https://bugs.passt.top/show_bug.cgi?id=126
Suggested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
If a guest or container sends us a FIN segment but its sequence number
doesn't match the highest sequence of data we *accepted* (not
necessarily the highest sequence we received), that is,
conn->seq_from_tap, plus any data we're accepting in the current
batch, we should discard the flag (not necessarily the segment),
because there's still data we need to receive (again) before the end
of the stream.

If we consider those FIN flags as such, we'll end up in the
situation described below.

Here, 192.168.10.102 is a HTTP server in a Podman container, and
192.168.10.44 is a client fetching approximately 121 KB of data from
it:

   82   2.026811 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [FIN, ACK] Seq=121441 Ack=143 Win=65536 Len=0

the server is done sending

   83   2.026898 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [ACK] Seq=143 Ack=114394 Win=216192 Len=0

pasta (client) acknowledges a previous sequence, because of
a short sendmsg()

   84   2.027324 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [FIN, ACK] Seq=143 Ack=114394 Win=216192 Len=0

pasta (client) sends FIN, ACK as the client has no more data to
send (a single GET request), while still acknowledging a previous
sequence, because the retransmission didn't happen yet

   85   2.027349 192.168.10.102 → 192.168.10.44 54 TCP 55414 → 44992 [ACK] Seq=121442 Ack=144 Win=65536 Len=0

the server acknowledges the FIN, ACK

   86   2.224125 192.168.10.102 → 192.168.10.44 4150 TCP [TCP Retransmission] 55414 → 44992 [ACK] Seq=114394 Ack=144 Win=65536 Len=4096 [TCP segment of a reassembled PDU]

and finally a retransmission comes, but as we wrongly switched to
the CLOSE-WAIT state,

   87   2.224202 192.168.10.44 → 192.168.10.102 54 TCP 44992 → 55414 [RST] Seq=144 Win=0 Len=0

we consider frame #86 as an acknowledgement for the FIN segment we
sent, and close the connection, while we still had to re-receive
(and finally send) the missing data segment, instead.

Link: containers/podman#27179
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
… right away

...possibly with an updated window value. As we're discarding the
remaining data, we'll need to receive it again. If we don't request
a retransmission immediately, we'll see an apparent gap in the
sequence, and request a retransmission on the next data batch or
segment, but we're just wasting time like that. In packets:

28686   0.000007 88.198.0.164 → 93.235.151.95 16118 TCP 55414 → 47080 [ACK] Seq=80501 Ack=141 Win=65536 Len=16064 [TCP segment of a reassembled PDU]
28687   0.000012 88.198.0.164 → 93.235.151.95 16118 TCP [TCP Window Full] 55414 → 47080 [ACK] Seq=96565 Ack=141 Win=65536 Len=16064 [TCP segment of a reassembled PDU]

on this second data segment, we have a short sendmsg(), and

28688   0.000026 93.235.151.95 → 88.198.0.164 54 TCP 47080 → 55414 [ACK] Seq=141 Ack=90721 Win=32128 Len=0

consequently acknowledge it, without requesting a retransmission,

28689   0.000006 88.198.0.164 → 93.235.151.95 8866 HTTP HTTP/1.1 200 ok  (text/css)

so the server proceeds sending a bit more, but

28690   0.000016 93.235.151.95 → 88.198.0.164 54 TCP [TCP Dup ACK 28688#1] 47080 → 55414 [ACK] Seq=141 Ack=90721 Win=32128 Len=0
28691   0.000000 93.235.151.95 → 88.198.0.164 54 TCP [TCP Dup ACK 28688#2] 47080 → 55414 [ACK] Seq=141 Ack=90721 Win=32128 Len=0

we'll throw that data (from frame #28689) away, and finally request
a retransmission as we spotted the gap now.

Request a retransmission as soon as we know we'll need it, instead.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
@pull pull bot locked and limited conversation to collaborators Oct 8, 2025
@pull pull bot added the ⤵️ pull label Oct 8, 2025
@pull pull bot merged commit 4af27ff into master Oct 8, 2025
@pull pull bot deleted the upstream branch October 8, 2025 02:59
@kroese kroese restored the upstream branch October 8, 2025 03:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants