tcptop: incorrect bytes sent (tcp_sendmsg) #2440
Comments
Is this a real-world issue or not? We could get the size from the kretprobe, but we need the sock args as well, so that's a kprobe + kretprobe. As frequent functions, this is going to cost CPU. If this is a contrived case, I'd rather the docs point it out than for us all to pay a tax for something that doesn't matter in the real world. By docs I mean man page. So you triggered it by doing a 100 Mbyte sendmsg()? Is that what makes it to tcp_sendmsg()? Here's a Netflix production example showing sizes and return code:
zero errors, and a maximum size in the 64k - 128k bucket. |
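For readers unfamiliar with the "64k - 128k bucket" wording: a minimal sketch of how a size maps to a power-of-two histogram bucket. It assumes the histogram uses power-of-two buckets, as BCC and bpftrace histograms commonly do. The helper name log2_bucket is hypothetical and is not part of any tool discussed here.

```python
def log2_bucket(value):
    """Return the inclusive (low, high) range of the power-of-two bucket holding value."""
    if value <= 0:
        return (0, 0)
    # The highest set bit gives the lower bound of the bucket.
    low = 1 << (value.bit_length() - 1)
    return (low, 2 * low - 1)


for size in (1500, 65536, 100000):
    low, high = log2_bucket(size)
    print(f"{size:>7} bytes -> bucket {low} .. {high}")
```

Under that assumption, any send between 65536 and 131071 bytes lands in the bucket usually printed as "64k - 128k".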
I can trigger this, but from what I gather the expected output doesn't make much sense. It's showing the total write, not the total transferred, which I'd hope would exceed the requested size if the socket were left open long enough for recovery. It's quite possible for the receiver to be scheduled out long enough for the timeout to pop or the socket buffer to fill, which is expected, no? If anything it's an argument for packet pacing. |
What are you expecting? TCP throughput to match the on-wire transmit per second? How are you going to deal with TSO and TX queues? The best we can do is close enough at the net driver level, but that (say, post-GSO) means tracing every packet. I'd much rather tcptop trace at the TCP level, as the event rate and overhead are much lower. If you need this from the NIC level, that's a different tool, and one that costs more overhead. What problem is it solving? tcptop(8) is to solve the same problem as (CPU) top(1): oh, I didn't know that PID was sending so much. Note that @sfluor works at DataDog, so I assume he is cooking up a 24x7 monitoring agent, so understanding overhead and picking the most efficient method is critical. We don't want 24x7 agents burning CPU if it's not really solving a real problem. |
Hi, sorry for the late answer:
We've seen that with consul traffic on our hosts, I got those dumps from
You can see that in one case the result for one exchange with
I also tried to run
and
it seems that no error is returned however the bytes for |
@brendangregg Not sure if that was directed at me or sfluor? I don't disagree with you, the test was a bit ambiguous.
That however could be a bit misleading. It would be more like sockettop (if such a thing existed). Though top does not provide insight into metrics like scheduling delays nor any indication as to what the use is. sched_yield() for example. I can understand why someone would be confused by the results. I could duplicate specifically the loss using default socket buffers.
Fights between departments who all want their own monitoring on hosts, that hits home. |
I can't trace tcp_transmit_skb(), only __tcp_transmit_skb(). But it should be the same thing. Just looking at the stacks, I see at least one problem:
__tcp_transmit_skb() can happen in IRQ context: so the on-CPU PID and comm are unreliable. tcp_sendmsg() is close to the syscall, and should still be the right context. If you have a heavy CPU workload, you might find the on-CPU PID for __tcp_transmit_skb() seems correct, as the right PID is on CPU by chance. There's a number of ways to fix this, but ugh. What we actually want is TCP transmit tracepoints, so we have a stable and reliable way to do this. Plus raw tracepoints are a lot faster than kretprobes, so monitoring is less of a concern (which is why Facebook did the raw tracepoint stuff). I have some new networking stuff in #1506, but it's not out yet (it will be soon!); it might provide other ways to solve this. At the very least, the man page should say that tcptop is showing the requested transmit bytes, which may overcount in cases of failures. Maybe we'll have to add more probes later (until tracepoints), but I just don't want to go straight to adding extra overhead for every transmit packet without thinking about the options first. |
Hi, I noticed that for the tcptop.py tool, binding to the tcp_sendmsg probe to count the bytes can sometimes result in miscounting when tcp_sendmsg fails (for instance when it returns the EAGAIN error code; it seems that this can happen here).
In this case the tcp_sendmsg probe is triggered and the size parameter passed is used to increment the count of bytes sent without checking for an error.
Some solutions would be to:
- use a kretprobe on tcp_sendmsg to check the returned value (# of bytes sent OR an error), but my understanding is that this can lead to missing tcp_sendmsg calls, since there is a limit on the number of kretprobe instances running at the same time, set by the max_active parameter.
- use tcp_transmit_skb, which has the # of bytes to be sent inside the sk_buff parameter. This would also help capture bytes sent via TCP retransmits. However, it seems that tcp_transmit_skb does not have reliable access to the PID (so this might require removing the pid from the tuple?)
Do you think there is a better way to fix this issue?
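To make the difference between the two accounting strategies concrete, here is a small pure-Python model (not BPF code, and not how tcptop.py is implemented). It assumes an entry handler that records the requested size per thread and a return handler that looks the entry up again, which is the usual shape of a kprobe plus kretprobe pair. The class and method names are hypothetical, and the sample sizes are made up for illustration.

```python
import errno
from collections import defaultdict


class SendAccounting:
    """Model of entry-only counting versus entry+return counting."""

    def __init__(self):
        self.in_flight = {}                 # thread id -> pid, recorded at entry
        self.requested = defaultdict(int)   # pid -> bytes counted at entry only
        self.returned = defaultdict(int)    # pid -> bytes counted from return values

    def on_entry(self, tid, pid, size):
        # Entry-only accounting trusts the size argument.
        self.in_flight[tid] = pid
        self.requested[pid] += size

    def on_return(self, tid, retval):
        pid = self.in_flight.pop(tid, None)
        if pid is None:
            return  # return seen without a matching entry
        if retval > 0:
            # A positive return value is the number of bytes accepted.
            self.returned[pid] += retval


acct = SendAccounting()
for size, retval in [(65536, 65536),           # full send
                     (65536, 20000),           # partial send
                     (65536, -errno.EAGAIN)]:  # failed send
    acct.on_entry(tid=1, pid=100, size=size)
    acct.on_return(tid=1, retval=retval)

print("requested:", acct.requested[100])
print("returned: ", acct.returned[100])
```

In this model the entry-only counter reports 196608 bytes while the return-based counter reports 85536, which is the kind of overcount described above. The model says nothing about probe overhead or missed return probes, which are the trade-offs raised in the thread.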
Reproducing the issue
The following was done on a Vagrant VM (hashicorp-vagrant/ubuntu-16.04):
I was able to reproduce it by setting the so_sndtimeo parameter to a really low value and trying to send a "big" payload.
Running the following command (which will open a local TCP connection and try to send 100Mb of 1s to it) should trigger this issue:
nc -l 3000 & python3 script.py 100000
Where script.py is:
Output of nc -l 3000 & python3 script.py 100000:
Output of sudo python tcptop.py -C:
Expected:
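The script and outputs from the report are not reproduced in this copy of the thread. As a stand-in, the following is a hedged sketch of the same idea and is not the original script.py: it sets a very short SO_SNDTIMEO on a local TCP socket whose peer never reads, then compares the size passed to send() with what send() reports. The function name, the 1 ms timeout, and the "ll" timeval packing (which assumes 64-bit Linux) are all assumptions.

```python
import socket
import struct


def attempt_large_send(payload_size=100 * 1000 * 1000, timeout_usec=1000):
    """Return (requested, sent) for one send() to a peer that never reads."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    peer, _ = listener.accept()      # accepted, but never read from
    try:
        # SO_SNDTIMEO takes a struct timeval (seconds, microseconds).
        timeval = struct.pack("ll", 0, timeout_usec)
        client.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)
        try:
            sent = client.send(b"1" * payload_size)
        except OSError:
            sent = 0  # e.g. EAGAIN if nothing was queued before the timeout
        return payload_size, sent
    finally:
        client.close()
        peer.close()
        listener.close()


requested, sent = attempt_large_send()
print(f"requested {requested} bytes, send() reported {sent} bytes")
```

A probe that only reads the size argument at entry would count the requested figure here, while the value reported by send() is typically much smaller because the socket buffers fill and the timeout expires.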