Non-Datagram Packet Ping fails with multiple targets #225
Comments
@bekriebel Thanks for reporting the issue. We recently made some changes to the ping probe's code (https://github.com/google/cloudprober/commits/master/probes/ping) to improve memory performance. I'll take a look at what might be going on here.
Thanks, @manugarg. I just did a bisect and found that the error first appears in commit 8932ffb. I'm going to dig into it a little further, but may not have time to find the root cause right now. In case it makes a difference, I'm building on Windows Subsystem for Linux. I've attempted to run on both WSL and an ARM board, and both platforms have the same issue.
Thanks @bekriebel. I confirmed that commit 8932ffb#diff-c9bd39f324facba8690a0ef9b7be50d6 introduced the bug. In this change, we started parsing ICMP messages ourselves instead of relying on icmp.ParseMessage, because icmp.ParseMessage is not memory efficient. There must be a problem somewhere in that parsing logic. I'll continue to look. I very much appreciate the bug report. It's clear that we need to enhance our testing. I'll do that after finding the root cause of this bug.
I found the bug. I'll send a fix and add more details shortly.
Taking a guess that it has to do with the
@bekriebel I am in the process of exporting the change, but in the meantime: the problem was that we were resetting pktbuf inside the loop to the number of bytes read: cloudprober/probes/ping/ping.go Line 343 in 667a6db
This is not a problem with datagram sockets, because in that case we don't receive packets that don't belong to us (the kernel delivers only the relevant packets), so we never receive a smaller packet and pktbuf never shrinks. With raw sockets, we end up reading all ICMP packets. As soon as we read a smaller packet (possibly sent by something else on the network), pktbuf becomes smaller and stays that size for subsequent reads. Regarding your diff, I actually wanted to keep memory allocation outside the for loop. For a small number of targets it doesn't matter much, but for a large number of targets (say 1000+), like we have in some of our setups, it matters quite a bit.
…er reading every packet. Instead, keep the read buffer same and use truncated bytes slice for data. See the following github issue for details: #225 PiperOrigin-RevId: 239456052
On the current master branch (667a6db), when setting `use_datagram_socket: false` with multiple targets, the ping begins failing with the error `packet too small: size (4) < minPacketSize (16)`. If two targets are specified, about half of the packets seem to start failing. If three or more targets are listed, no packets from the third and later targets succeed.
This can be seen using this config:
The errors will be:
And the metrics will show: