This repository has been archived by the owner on Nov 5, 2021. It is now read-only.

Non-Datagram Packet Ping fails with multiple targets #225

Closed
bekriebel opened this issue Mar 20, 2019 · 6 comments
@bekriebel
Contributor

On the current master branch (667a6db), with the setting use_datagram_socket: false and multiple targets, the ping begins failing with the error packet too small: size (4) < minPacketSize (16). With two targets, roughly half of the packets fail. With three or more targets, no packets from the third and subsequent targets succeed.

This can be seen using this config:

probe {
  name: "ping"
  type: PING
  targets {
    host_names: "google.com, github.com"
  }
  ping_probe {
    use_datagram_socket: false
  }
}

The errors will be:

W0320 11:25:40.905329    5273 ping.go:339] packet too small: size (4) < minPacketSize (16)
W0320 11:25:41.875763    5273 ping.go:331] read ip4 0.0.0.0: i/o timeout
W0320 11:25:42.906471    5273 ping.go:339] packet too small: size (4) < minPacketSize (16)
W0320 11:25:43.874060    5273 ping.go:331] read ip4 0.0.0.0: i/o timeout
W0320 11:25:44.904014    5273 ping.go:339] packet too small: size (4) < minPacketSize (16)
W0320 11:25:45.874674    5273 ping.go:331] read ip4 0.0.0.0: i/o timeout
W0320 11:25:46.903673    5273 ping.go:339] packet too small: size (4) < minPacketSize (16)
W0320 11:25:47.874429    5273 ping.go:331] read ip4 0.0.0.0: i/o timeout
W0320 11:25:48.903609    5273 ping.go:339] packet too small: size (4) < minPacketSize (16)
W0320 11:25:49.873686    5273 ping.go:331] read ip4 0.0.0.0: i/o timeout

And the metrics will show:

#TYPE total counter
total{ptype="ping",probe="ping",dst="google.com"} 160
total{ptype="ping",probe="ping",dst="github.com"} 160
#TYPE success counter
success{ptype="ping",probe="ping",dst="google.com"} 160
success{ptype="ping",probe="ping",dst="github.com"} 80
@manugarg
Contributor

@bekriebel Thanks for reporting the issue. We did make some changes to the ping probe's code recently (https://github.com/google/cloudprober/commits/master/probes/ping) to improve the memory performance. I'll take a look at what might be going on here.

@manugarg manugarg self-assigned this Mar 20, 2019
@manugarg manugarg added the bug label Mar 20, 2019
@bekriebel
Contributor Author

Thanks, @manugarg. I just did a bisect and did indeed find that the first point where the error presents itself is in commit 8932ffb. I'm going to dig into it a little further, but may not have time to find the root cause right now.

In case it makes a difference, I'm building on Windows Subsystem for Linux. I've run it on both WSL and an ARM board; both platforms show the same issue.

@manugarg
Contributor

Thanks @bekriebel. I confirmed that it's the commit 8932ffb#diff-c9bd39f324facba8690a0ef9b7be50d6 that introduced the bug. In that change, we started parsing ICMP messages ourselves instead of relying on icmp.ParseMessage, as icmp.ParseMessage is not memory efficient. There must be a problem somewhere in that parsing logic. I'll continue to look.

Very much appreciate the bug report. It's clear that we need to enhance our testing. I'll do that after finding the root cause of this bug.
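For context on the kind of hand-rolled parsing described above, here is a minimal, hypothetical sketch of allocation-free ICMPv4 echo-reply parsing. This is an illustration only, not cloudprober's actual parser; the function name and the 8-byte header constant are assumptions (the standard ICMPv4 header is type, code, checksum, id, seq), and cloudprober's own minPacketSize is 16:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Standard ICMPv4 header: type(1) + code(1) + checksum(2) + id(2) + seq(2).
const icmpHeaderSize = 8

// parseEchoReply (hypothetical) pulls the id and sequence number out of a
// raw ICMPv4 echo-reply message without allocating new structs per packet,
// which is the kind of saving motivating the move away from icmp.ParseMessage.
func parseEchoReply(pkt []byte) (id, seq uint16, payload []byte, err error) {
	if len(pkt) < icmpHeaderSize {
		return 0, 0, nil, fmt.Errorf("packet too small: size (%d) < minPacketSize (%d)", len(pkt), icmpHeaderSize)
	}
	if pkt[0] != 0 { // ICMPv4 type 0 = echo reply
		return 0, 0, nil, fmt.Errorf("not an echo reply (type %d)", pkt[0])
	}
	id = binary.BigEndian.Uint16(pkt[4:6])
	seq = binary.BigEndian.Uint16(pkt[6:8])
	return id, seq, pkt[icmpHeaderSize:], nil
}

func main() {
	// A synthetic 16-byte echo reply: type 0, code 0, zero checksum,
	// id 0x1234, seq 1, and 8 bytes of payload.
	pkt := []byte{0, 0, 0, 0, 0x12, 0x34, 0x00, 0x01, 'p', 'a', 'y', 'l', 'o', 'a', 'd', '!'}
	id, seq, payload, err := parseEchoReply(pkt)
	fmt.Println(id, seq, len(payload), err) // 4660 1 8 <nil>
}
```

The parser returns sub-slices of the input buffer rather than copies, so correctness depends on the caller not reusing the buffer while the payload slice is still live, which is exactly the kind of aliasing that makes this style easy to get subtly wrong.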

@manugarg
Contributor

I found the bug. I'll send a fix and add more details shortly.

@bekriebel
Contributor Author

bekriebel commented Mar 20, 2019

Taking a guess that it has to do with the pktbuf buffer. It looks like it isn't being reused cleanly. I noticed that moving its creation into the for loop that reads from the tracker channel fixes the issue, but I'm not sure that's the cleanest fix.

diff --git a/probes/ping/ping.go b/probes/ping/ping.go
index 6b6b2be..c2ade86 100644
--- a/probes/ping/ping.go
+++ b/probes/ping/ping.go
@@ -306,7 +306,6 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
        received := make(map[packetKey]bool, int(p.c.GetPacketsPerProbe())*len(p.targets))
        outstandingPkts := 0
        p.conn.setReadDeadline(time.Now().Add(p.opts.Timeout))
-       pktbuf := make([]byte, 1500)
        for {
                // To make sure that we have picked up all the packets sent by the sender, we
                // use a tracker channel. Whenever sender successfully sends a packet, it notifies
@@ -325,6 +324,7 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
                }

                // Read packet from the socket
+               pktbuf := make([]byte, icmpHeaderSize+p.c.GetPayloadSize())
                n, peer, err := p.conn.read(pktbuf)

                if err != nil {

@manugarg
Contributor

manugarg commented Mar 20, 2019

@bekriebel I am in the process of exporting the change, but in the meantime: the problem was that we were resetting pktbuf inside the loop to the size of the bytes read:

pktbuf = pktbuf[:n]

This is not a problem when using datagram sockets, because in that case we don't receive packets that don't belong to us (the kernel delivers only the relevant packets), so we never receive a smaller packet and pktbuf never shrinks. With raw sockets, we end up reading all ICMP packets. As soon as we read a smaller packet (possibly sent by something else on the network), pktbuf becomes smaller, and stays that way for every subsequent read.

Regarding your diff, I actually wanted to keep the memory allocation outside the for loop. For a small number of targets it doesn't matter much, but for a large number of targets (say 1000+), like we have in some of our setups, it matters quite a bit.

manugarg added a commit that referenced this issue Mar 20, 2019
…er reading every packet.

Instead, keep the read buffer same and use truncated bytes slice for data. See the following github issue for details:
#225

PiperOrigin-RevId: 239456052