tcptop #692
Closed
brendangregg opened this issue Sep 16, 2016 · 6 comments

@brendangregg (Member)

At least:

# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
23:53:38 loadavg: 0.17 0.15 0.11 1/364 17269

PID    COMM         SADDR            DADDR             RX_KB  TX_KB
9370   sshd         100.66.3.172     11.16.213.254         1   4013
3254   redis-server 127.0.0.1        127.0.0.1             0      0
11386  sshd         100.66.3.172     11.16.213.254         0      0
17264  17264        127.0.0.1        127.0.0.1             0      0

PID    COMM         SADDR6                     DADDR6                      RX_KB  TX_KB
20972  sshd         fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        1      1
20952  ssh          fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        1      1
20963  sshd         fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        0      0

Maybe more columns as well.

I submitted one version, and it was discussed in #690, but I removed it from that PR while considering an alternate approach. I'm now investigating two approaches:

  1. kprobes of tcp_sendmsg()/tcp_recvmsg() (or tcp_cleanup_rbuf()); a rough sketch of this follows the list.
  2. polling of "ss -nti" for long-running connections (since tcp_info has been extended recently to have bytes & segs!!), and also kprobes of tcp_close() (or tcp_set_state()) to catch short-lived sessions.
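For reference, here is a minimal bcc sketch of approach 1 (not a finished tcptop): kprobes on tcp_sendmsg() and tcp_cleanup_rbuf() sum bytes into per-PID BPF hashes, which a Python loop reads and clears each second. It assumes the 4.1+ tcp_sendmsg() argument list, keys only by PID, and has none of the per-connection address columns shown above.

```python
#!/usr/bin/env python
# Minimal sketch of approach 1: sum TCP send/receive bytes per PID.
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(send_bytes, u32, u64);
BPF_HASH(recv_bytes, u32, u64);

// tcp_sendmsg() argument list as of Linux 4.1+
int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
                        struct msghdr *msg, size_t size)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = send_bytes.lookup_or_init(&pid, &zero);
    if (val)
        (*val) += size;
    return 0;
}

// tcp_cleanup_rbuf() sees the byte count actually consumed by the receiver
int kprobe__tcp_cleanup_rbuf(struct pt_regs *ctx, struct sock *sk, int copied)
{
    if (copied <= 0)
        return 0;
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = recv_bytes.lookup_or_init(&pid, &zero);
    if (val)
        (*val) += copied;
    return 0;
}
"""

b = BPF(text=bpf_text)   # kprobe__ prefixes auto-attach to the kernel functions
print("Tracing TCP send/receive... Hit Ctrl-C to end")
while True:
    try:
        sleep(1)
    except KeyboardInterrupt:
        break
    print("%-6s %8s %8s" % ("PID", "RX_KB", "TX_KB"))
    rx = {k.value: v.value for k, v in b["recv_bytes"].items()}
    for pid, tx in sorted(b["send_bytes"].items(), key=lambda kv: -kv[1].value):
        print("%-6d %8d %8d" % (pid.value, rx.get(pid.value, 0) / 1024,
                                tx.value / 1024))
    b["send_bytes"].clear()
    b["recv_bytes"].clear()
```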
@brendangregg (Member Author) commented Oct 4, 2016

Polling of ss -nti

This is to use the recent RFC-4898 additions to tcp_info, namely tcpEStatsAppHCThruOctetsAcked and tcpEStatsAppHCThruOctetsReceived.

This approach will also need to trace tcp_close() with BPF, to catch short-lived sessions or any that closed during the interval. One would think they'd stay around in TIME-WAIT, but they aren't visible in ss -nti; I believe the full socket is no longer kept at that point (as a measure to survive DoS attacks).
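For illustration, here is a rough sketch of the polling half: run ss -nti each interval and scrape the per-connection byte counters from the info lines. Treat the field labels as an assumption: bytes_acked:/bytes_received: is what newer iproute2 prints when the kernel exports the RFC 4898 counters, and ss -i usually wraps that info onto a following line. Connections that close between polls never show up here, which is why tcp_close() still needs a kprobe.

```python
#!/usr/bin/env python
# Rough sketch: scrape per-connection byte counters from "ss -nti".
import re
import subprocess

ACKED = re.compile(r"bytes_acked:(\d+)")        # assumed iproute2 field label
RECEIVED = re.compile(r"bytes_received:(\d+)")  # assumed iproute2 field label

def poll_ss():
    """Return a list of (address_fields, bytes_acked, bytes_received)."""
    out = subprocess.check_output(["ss", "-nti"]).decode()
    conns, addr_line = [], None
    for line in out.splitlines()[1:]:           # skip the header line
        acked, received = ACKED.search(line), RECEIVED.search(line)
        if acked and received and addr_line:
            conns.append((addr_line.split(), int(acked.group(1)),
                          int(received.group(1))))
        elif not acked:
            addr_line = line                    # remember the address line
    return conns

if __name__ == "__main__":
    for addr, tx, rx in poll_ss():
        print(addr, tx, rx)
```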

Another problem is the overhead; here's a busy production server:

# time ss -nti | wc
  15279  126252 1439591

real    0m0.085s
user    0m0.074s
sys     0m0.045s

# time sh -c 'for i in 1 2 3 4 5 6 7 8 9 0; do ss -nti > /dev/null; done'

real    0m0.808s
user    0m0.522s
sys     0m0.276s

So it costs about 80 ms of CPU time each time we poll. Some servers will be much better (fewer connections), some a bit worse. A tcptop with a 1-second refresh would therefore burn at least 8% of one CPU on this server.

BPF TCP send/receive

Back to tracing tcp_sendmsg and tcp_recvmsg (or tcp_cleanup_rbuf, which has the socket and correct size in one probe).

I'd done a stress test of 600k TCP events/second, and measured 10% CPU overhead over 8 CPUs (or 80% of one CPU). So the overhead can become significant. But is that likely?

I've used funccount (from my ftrace perf-tools) to measure the total rate of tcp_sendmsg & tcp_cleanup_rbuf on three common production services, and can extrapolate the expected overhead (the arithmetic is sketched after the list). They are:

  • proxy service, TCP rate 3900 events/sec, expected overhead = 0.13% overall (0.52% CPU cost across 4 CPUs)
  • app service, TCP rate 9600 events/sec, expected overhead = 0.04% overall (1.27% CPU cost across 32 CPUs)
  • database service, TCP rate 14400 events/sec, expected overhead = 0.12% overall (1.91% CPU cost across 16 CPUs)
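Spelling out the extrapolation (a back-of-envelope sketch, not a new measurement): assume a linear per-event cost from the stress test, 80% of one CPU at 600k events/sec, scale it by each service's measured event rate, then average across its CPU count.

```python
# Back-of-envelope extrapolation of BPF tracing overhead from the stress test.
PER_EVENT = 0.80 / 600000.0     # fraction of one CPU per (event/second)

services = [
    # (name, TCP events/sec, CPU count)
    ("proxy",     3900,  4),
    ("app",       9600, 32),
    ("database", 14400, 16),
]

for name, rate, ncpus in services:
    one_cpu = rate * PER_EVENT          # cost as a fraction of one CPU
    overall = one_cpu / ncpus           # averaged across all CPUs
    print("%-8s %5.2f%% of one CPU, %4.2f%% overall" %
          (name, one_cpu * 100, overall * 100))
```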

Now for the ss -nti cost on these services:

  • proxy service: 85 ms
  • app service: 84 ms
  • database service: 36 ms

So while the BPF cost can be significant during stress tests, it's expected to be low overhead for real workloads, and even cheaper than polling ss -nti at a 1-second interval.

brendangregg mentioned this issue Oct 4, 2016
@4ast (Member) commented Oct 4, 2016

The 'ss' time is excessively high.
What are the versions of 'ss' and of the kernel? I'll try to reproduce...

@brendangregg (Member Author) commented Oct 4, 2016

one of the systems:

# time ss -nti | wc
  15119  121596 1396724

real    0m0.075s
user    0m0.082s
sys 0m0.013s

# ss -v
ss utility, iproute2-ss131122

# uname -a
Linux xxxxx-prod-i-xxxxx 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is why I was using ftrace to count the kernel function rates.

@brendangregg (Member Author)

I'll close the issue but we can still discuss...

@4ast (Member) commented Oct 5, 2016

time ss -tni |wc -l
261591
real 0m0.819s
user 0m0.649s
sys  0m0.171s

but the problem is not in the kernel or netlink interface:

    17.04%  ss       libc-2.12.so       [.] vfprintf
     7.35%  ss       libc-2.12.so       [.] __printf_fp
     6.23%  ss       libc-2.12.so       [.] __mpn_divrem
     4.07%  ss       libc-2.12.so       [.] _IO_file_xsputn@@GLIBC_2.2.5
     2.83%  ss       libc-2.12.so       [.] hack_digit.15676
     2.79%  ss       [kernel.kallsyms]  [k] inet_sk_diag_fill
     2.54%  ss       libc-2.12.so       [.] __printf_chk
     2.45%  ss       libc-2.12.so       [.] __strchrnul
     2.09%  ss       libc-2.12.so       [.] _IO_default_xsputn
     1.78%  ss       [kernel.kallsyms]  [k] inet_diag_dump_icsk
     1.71%  ss       [kernel.kallsyms]  [k] tcp_get_info

so I still think tcp_info is a proper way forward.

@brendangregg (Member Author)

Ok, but still, 171 ms in sys? We'll have a lot less user time, but we still need to store all the byte counts in a session hash and compute deltas each interval to see if anything changed. And we still have to trace tcp_close() to catch short-lived sessions. It's all going to add up.
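For illustration, that delta bookkeeping could look something like the sketch below; the 4-tuple connection key and counter names are hypothetical, not from an actual implementation.

```python
# Sketch of per-interval delta tracking for the ss/tcp_info polling approach.
# The (saddr, sport, daddr, dport) key and counter names are illustrative.
prev = {}   # (saddr, sport, daddr, dport) -> (bytes_acked, bytes_received)

def interval_deltas(current):
    """Return only the connections whose counters moved since the last poll."""
    changed = {}
    for conn, (tx, rx) in current.items():
        last_tx, last_rx = prev.get(conn, (0, 0))
        if tx != last_tx or rx != last_rx:
            changed[conn] = (tx - last_tx, rx - last_rx)
        prev[conn] = (tx, rx)
    return changed
```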
