-
Notifications
You must be signed in to change notification settings - Fork 250
-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nic_router: broken SSH pipes #4729
Comments
I'm going to integrate the fixes in #4728 plus your debug commit into my Sculpt system and will report any insights. |
Update: I had one aborted connection yesterday but the log did not reveal any new information as I didn't enable verbose packet log yet. But, after I integrate your additional debug commit networking broke completely with the following error.
Thus, I investigated and finally 7d1779e fixed the uncaught exception.
I'm curious how this may happen and how you ensure the exception is caught at all other places. |
Small update: I use Sculpt 23.04 and had an SSH interruption this morning with the following extended error message.
The second line bothers me. It may stem from the nature of the connection that uses the SSH proxy mechanism but could also hint the issue we are looking for. |
I set up a Linux-based bridge to monitor all network traffic of my Sculpt machine and sighted two SSH connection interruptions. In both cases the TCP source port of the established connection on the Sculpt side suddenly changes, which the server side denied by RST. Further packets from the SSH server to the original TCP port are denied by NIC router with ICMP Destination unreachable (Network unreachable). @m-stein I can provide you with the PCAP files. This morning I provoked stress on the NIC router with
|
Let me update my interpretation here: There is nothing mixed up, it's just the original TCP packet embedded in the ICMP message. Nevertheless I think Destination unreachable (Network unreachable) is not the correct error reply here. According to RFC1812
I propose to change the nic_router as follows. +++ b/repos/os/src/server/nic_router/interface.cc
@@ -1396,7 +1396,7 @@ void Interface::_handle_ip(Ethernet_frame ð,
if(not ip.dst().is_multicast()) {
_send_icmp_dst_unreachable(local_intf, eth, ip,
- Icmp_packet::Code::DST_NET_UNREACHABLE);
+ Icmp_packet::Code::DST_HOST_UNREACHABLE);
}
if (_config().verbose()) {
log("[", local_domain, "] unroutable packet"); } After looking at the captured packet traffic during four connection drops, I'm certain that the NIC router decides to drop the link for no reason related to the traffic itself. |
@chelmuth Thanks a lot for gathering and providing all this detailed information! As discussed offline, I'll continue with this issue as soon as the File Vault has settled on a presentable state again. The ICMP-code modification you suggest for the router sounds sensible to me! |
According to RFC 1812 ICMP Destination unreachable (Network unreachable) does not quite our case of clients directly behind the router. If a packet is to be forwarded to a host on a network that is directly connected to the router (i.e., the router is the last-hop router) and the router has ascertained that there is no path to the destination host then the router MUST generate a Destination Unreachable, Code 1 (Host Unreachable) ICMP message. Issue #4729
According to RFC 1812 ICMP Destination unreachable (Network unreachable) does not quite our case of clients directly behind the router. If a packet is to be forwarded to a host on a network that is directly connected to the router (i.e., the router is the last-hop router) and the router has ascertained that there is no path to the destination host then the router MUST generate a Destination Unreachable, Code 1 (Host Unreachable) ICMP message. Issue #4729
Thanks to the wonderful trace recorder, I was able to create a pcap trace in sculpt and debug the issue in wireshark. In a setup where I have an open ssh connection and then run nmap -sS -O , the connection breaks quickly. It does so because after a bit of a pause in the ssh exchange, my vbox guest sends a valid ssh packet again (maybe keep alive) that is then nat'ed by the nic router to a source port other than the one used before in the connection. The server doesn't like this and sends a tcp reply with reset flag set on the new port. The reply is nat'ed back to the correct guest ports and so, my ssh has no chance of understanding why the connection was cancelled. I found a disappointingly simple explanation for the events: The nmap causes the nic router to run into resource exhaustion with the session at some point. So, the internal link state of the ssh connection is thrown away in an attempt to free resources for the nmap stress. When ssh eventually becomes active again, the nic router creates a new link state with a different port. I can only guess that the reason for this not being a frequent problem is that not so security-aware applications may just work around a changing source port (ignorance, new connection). Anyway, I'm not sure yet what to do about the ssh issue. One solution would be to give a nic router client the opportunity to resolve resource exhaustion by updating the session quota before throwing stuff away. Kind of a band aid would be to make garbage collection smarter, in case it actually makes a difference which link state to throw away and which not. |
Regarding our offline discussion about network timeouts it seems worthwhile to look in to Linux. For Linux as client or server host I played around with
|
I took some reading into online resources regarding the topic. Here are some things I found: Timeouts
Resource exhaustion
Prevention and recovery
Furthermore I found an article series about nf_conntrack (https://thermalcircle.de/doku.php?id=blog:linux:connection_tracking_3_state_and_examples) that gives some insight about "early dropping":
We should also keep in mind, that the above referenced examples are talking about very different limits than we do. While they usually accept at least several 10K connections, I observed limits like 170-270 connections with a session to the NIC router. |
Thanks for the exhaustive review. @m-stein given those findings, do you already have an actionable plan? If not, for addressing the concrete issue at hand, I'd suggest two steps:
|
Just some links for reference depicting Linux kernel default values. https://elixir.bootlin.com/linux/latest/source/net/netfilter/nf_conntrack_proto_icmp.c#L25 A unidirectional UDP timeout of 30s looks quite reasonable to me and may be implemented first following @nfeske's plan. |
@nfeske Thanks for your feedback! On 23.05.24 13:49, Norman Feske wrote:
So far:
I'll implement the first but would advice against the latter. As far as I learned, other appliances simply drop new packets in this case in order to prevent the issue that @chelmuth ran into. What do you think of the probing-approach instead? Martin |
@chelmuth Thanks for these helpful references! My suggestion would be to use all nf_conntrack timeouts as default in the nic_router and actively probe established TCP, say every 5 minutes, in order to cut down the 5 days. |
Probing sounds interesting. How does it work? |
From https://netdevconf.info/2.1/papers/conntrack.pdf:
So, in essence, the router would do the same as any Linux with TCP keepalive but not after an eternity but much more frequent. From https://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/:
As far as I understand it:
There's also a paragraph in the latter regarding the more general topic of this issue:
|
#4534 seems related this discussion (ARP waiter removal). |
@chelmuth Thanks for cross-referencing. |
Regarding probing:
|
@chelmuth Regarding our offline discussion: Currently, a NIC session request at the router comes in with 3,5M quota, of which only 205K remain after The sizes of dynamically allocated session objects:
So, theoretical max number of connections (without counting DHCP/ARP objects or meta data) is 485. Furthermore, I have to correct myself regarding the current idle timeouts:
Suggestions:
|
I agree to all four points but request your special attention regarding the impact of increased default resource requirements in point 3. Automatic tests may need to be adapted and Sculpt integration tested. |
@chelmuth Thanks for your feedback. I'll keep an eye on the tests. |
Similar to how XML nodes can be accessed via a lambda, this commit implements accessing attributes of XML nodes via a lambda. There are two new methods in Xml_node: 'with_optional_attribute' calls a functor only if the attribute was found and with_attribute' additionally calls a functor if the attribute could not be found. Both methods are const and call the "found" functor with one argument 'Xml_attribute const &'. Ref genodelabs#4729
This commit raises the default RAM quota for Nic/Uplink connections from 'rx/tx buffers + 256K' (x86_64) to 'rx/tx buffers + 512K'. This follows the observation that a Nic/Uplink connection at the NIC router had only 205K left after 'create_session' for things like TCP/UDP connection states. This limited the max number of simultaneous TCP/UDP connections to something like 480. Note that this increases the total amount of quota per session by only 7% but the max number of TCP/UDP connections per session by 120%. Furthermore, this commit makes the quota arch-independent as there are no indications for why it should depend on the architecture. Ref genodelabs#4729
Adds default constructor to Net::Port that initializes the value to 0. This allows for using Net::Port with the Genode::Attempt utility. Ref genodelabs#4729
This removes the garbage collection that the router used to do when a session ran out of quota. This is motivated by the fact that the garbage collection was rather simple and removed connection states regardless of their current state. This caused problems, e.g., when the router dropped an established TCP connection with NAPT and on the next matching packet re-created it with different NAPT port, thereby breaking the connection. With this commit, existing connections are prioritized over new ones during resource exhaustion and the packets that attempt to create a new connection in such a state are dropped with a warning in the log (verbose_packet_drop="yes"). Note that the state resolves itself with time as existing connections time out or are closed by peers. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
This commit is part of a series that aims for removing the use of C++ exception as much as possible from the router as C++ exception handling can be resource intensive and can make code hard to understand. Ref genodelabs#4729
The deinitialization method of Domain used to rely on Domain::with_dhcp_server in order to dissolve and destroy a present DHCP server. However, this method skipped calling its functor argument also when there was a DHCP server but an invalid one. This commt replaces the with_dhcp_server with a pointer null-check in order to fix the leak. Ref genodelabs#4729
The Reference and Const_reference utility were introduced in order to express that something is a reference (no null value) but can be changed dynamically (not possible with built-in C++ references). However, the idea of preventing every possibility for null pointer faults, whith wich the router was built initially, has not prevailed and using pointers instead of the utility saves logic and makes the code more readable to other C++ developers. Ref genodelabs#4729
The TCP connection state "ESTABLISHED" (in the router "OPEN") is a privileged one for peers because it lasts very long without any peer interaction (in the NIC router it's only 10 minutes, but RFCs recommend not less than 2 hours and 4 minutes). Furthermore, TCP connections in this state are normally not available for early-drop on resource exhaustion. This means that this state binds resources to a connection potentially for a long time without the option of regaining them under stress. Therefore, this state should be entered with care. Up to now, the router marked a TCP connection with this state as soon as it had seen one matching packet in both directions, which is rather quick. However, implementing a very precise tracking of the exact TCP states of both peers and only marking the connection "ESTABLISHED" when both peers are "ESTABLISHED" is a difficult task with lots of corner cases. That said, this commit implements a compromise. The router now has two flags for each peer of a TCP connection - FIN sent and FIN acked - and sets them according to the observed TCP flags. The "ESTABLISHED" state is entered only when FIN acked is set for both peers (without having observed an RST or FIN flag meanwhile). Ref genodelabs#4729
The previous value of 60 seconds was never observed in real-time scenarios and UDP, for instance always used a timeout of 30 seconds without causing issues. Note that this applies only to TCP connections in a state other than ESTABLISHED, i.e., while it is still safe to early-drop the connection. Ref genodelabs#4729
RFCs recommend to keep TCP connections for a certain time even after they finished a close handshake, AFAIK, in order to be able to recognize astray packets when they arrive later. This seems overambitious especially when in the context of the router where session quota is pretty limited. Therefore, this commit drops this final timeout and drops closed connections immediately. Ref genodelabs#4729
The only object that is dynamically allocated by a network interface and that was not equipped with a self-destruct timeout was the ARP waiter. This commit closes this gap by adding a timeout to each ARP waiter that is set to 10 seconds by default but can be configured via the new <config> attribute 'arp_request_timeout_sec'. Ref genodelabs#4729
@chelmuth reported that SSH connections from his Sculpt VM towards a server on a remote machine sporadically end up broken (without router re-configuration involved).
The text was updated successfully, but these errors were encountered: