tcptop #692
Closed
brendangregg opened this issue Sep 16, 2016 · 6 comments

@brendangregg (Member)

At least:

# tcptop
Tracing... Output every 1 secs. Hit Ctrl-C to end
<screen clears>
23:53:38 loadavg: 0.17 0.15 0.11 1/364 17269

PID    COMM         SADDR            DADDR             RX_KB  TX_KB
9370   sshd         100.66.3.172     11.16.213.254         1   4013
3254   redis-server 127.0.0.1        127.0.0.1             0      0
11386  sshd         100.66.3.172     11.16.213.254         0      0
17264  17264        127.0.0.1        127.0.0.1             0      0

PID    COMM         SADDR6                     DADDR6                      RX_KB  TX_KB
20972  sshd         fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        1      1
20952  ssh          fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        1      1
20963  sshd         fe80::8a3:9dff:fed5:6b19   fe80::8a3:9dff:fed5:6b19        0      0

Maybe more columns as well.

I submitted one version, and it was discussed in #690, but I removed it from that PR while considering an alternate approach. I'm now investigating two approaches:

  1. kprobes of tcp_sendmsg()/tcp_recvmsg() (or tcp_cleanup_rbuf()); a rough sketch of this follows the list.
  2. polling of "ss -nti" for long-running connections (since tcp_info has been extended recently to have bytes & segs!!), and also kprobes of tcp_close() (or tcp_set_state()) to catch short-lived sessions.
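For reference, here is a minimal bcc sketch of approach 1 (not a finished tcptop): kprobes on tcp_sendmsg() and tcp_cleanup_rbuf() sum bytes into per-PID BPF hashes, which a Python loop reads and clears each second. It assumes the 4.1+ tcp_sendmsg() argument list, keys only by PID, and has none of the per-connection address columns shown above.

```python
#!/usr/bin/env python
# Minimal sketch of approach 1: sum TCP send/receive bytes per PID.
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <net/sock.h>

BPF_HASH(send_bytes, u32, u64);
BPF_HASH(recv_bytes, u32, u64);

// tcp_sendmsg() argument list as of Linux 4.1+
int kprobe__tcp_sendmsg(struct pt_regs *ctx, struct sock *sk,
                        struct msghdr *msg, size_t size)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = send_bytes.lookup_or_init(&pid, &zero);
    if (val)
        (*val) += size;
    return 0;
}

// tcp_cleanup_rbuf() sees the byte count actually consumed by the receiver
int kprobe__tcp_cleanup_rbuf(struct pt_regs *ctx, struct sock *sk, int copied)
{
    if (copied <= 0)
        return 0;
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    u64 zero = 0, *val = recv_bytes.lookup_or_init(&pid, &zero);
    if (val)
        (*val) += copied;
    return 0;
}
"""

b = BPF(text=bpf_text)   # kprobe__ prefixes auto-attach to the kernel functions
print("Tracing TCP send/receive... Hit Ctrl-C to end")
while True:
    try:
        sleep(1)
    except KeyboardInterrupt:
        break
    print("%-6s %8s %8s" % ("PID", "RX_KB", "TX_KB"))
    rx = {k.value: v.value for k, v in b["recv_bytes"].items()}
    for pid, tx in sorted(b["send_bytes"].items(), key=lambda kv: -kv[1].value):
        print("%-6d %8d %8d" % (pid.value, rx.get(pid.value, 0) / 1024,
                                tx.value / 1024))
    b["send_bytes"].clear()
    b["recv_bytes"].clear()
```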
@brendangregg (Member Author) commented Oct 4, 2016

Polling of ss -nti

This is to use the recent RFC-4898 additions to tcp_info, namely tcpEStatsAppHCThruOctetsAcked and tcpEStatsAppHCThruOctetsReceived.

This approach will also need to trace tcp_close() with BPF, to catch short-lived sessions or any that closed during the interval. One would think they'd stay around in TIME-WAIT, but they aren't visible in ss -nti; I believe the full socket is no longer kept at that point (as a measure to survive DoS attacks).
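For illustration, here is a rough sketch of the polling half: run ss -nti each interval and scrape the per-connection byte counters from the info lines. Treat the field labels as an assumption: bytes_acked:/bytes_received: is what newer iproute2 prints when the kernel exports the RFC 4898 counters, and ss -i usually wraps that info onto a following line. Connections that close between polls never show up here, which is why tcp_close() still needs a kprobe.

```python
#!/usr/bin/env python
# Rough sketch: scrape per-connection byte counters from "ss -nti".
import re
import subprocess

ACKED = re.compile(r"bytes_acked:(\d+)")        # assumed iproute2 field label
RECEIVED = re.compile(r"bytes_received:(\d+)")  # assumed iproute2 field label

def poll_ss():
    """Return a list of (address_fields, bytes_acked, bytes_received)."""
    out = subprocess.check_output(["ss", "-nti"]).decode()
    conns, addr_line = [], None
    for line in out.splitlines()[1:]:           # skip the header line
        acked, received = ACKED.search(line), RECEIVED.search(line)
        if acked and received and addr_line:
            conns.append((addr_line.split(), int(acked.group(1)),
                          int(received.group(1))))
        elif not acked:
            addr_line = line                    # remember the address line
    return conns

if __name__ == "__main__":
    for addr, tx, rx in poll_ss():
        print(addr, tx, rx)
```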

Another problem is the overhead; here's a busy production server:

# time ss -nti | wc
  15279  126252 1439591

real    0m0.085s
user    0m0.074s
sys     0m0.045s

# time sh -c 'for i in 1 2 3 4 5 6 7 8 9 0; do ss -nti > /dev/null; done'

real    0m0.808s
user    0m0.522s
sys     0m0.276s

So it costs about 80 ms of CPU time each time we poll. Some servers will be much better (fewer connections), some a bit worse. A tcptop with a 1-second refresh would therefore burn at least 8% of one CPU on this server.

BPF TCP send/receive

Back to tracing tcp_sendmsg and tcp_recvmsg (or tcp_cleanup_rbuf, which has the socket and correct size in one probe).

I'd done a stress test of 600k TCP events/second, and measured 10% CPU overhead over 8 CPUs (or 80% of one CPU). So the overhead can become significant. But is that likely?

I've used funccount (from my ftrace perf-tools) to measure the total rate of tcp_sendmsg & tcp_cleanup_rbuf on three common production services, and can extrapolate the expected overhead (the arithmetic is sketched after the list). They are:

  • proxy service, TCP rate 3900 events/sec, expected overhead = 0.13% overall (0.52% CPU cost across 4 CPUs)
  • app service, TCP rate 9600 events/sec, expected overhead = 0.04% overall (1.27% CPU cost across 32 CPUs)
  • database service, TCP rate 14400 events/sec, expected overhead = 0.12% overall (1.91% CPU cost across 16 CPUs)
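Spelling out the extrapolation (a back-of-envelope sketch, not a new measurement): assume a linear per-event cost from the stress test, 80% of one CPU at 600k events/sec, scale it by each service's measured event rate, then average across its CPU count.

```python
# Back-of-envelope extrapolation of BPF tracing overhead from the stress test.
PER_EVENT = 0.80 / 600000.0     # fraction of one CPU per (event/second)

services = [
    # (name, TCP events/sec, CPU count)
    ("proxy",     3900,  4),
    ("app",       9600, 32),
    ("database", 14400, 16),
]

for name, rate, ncpus in services:
    one_cpu = rate * PER_EVENT          # cost as a fraction of one CPU
    overall = one_cpu / ncpus           # averaged across all CPUs
    print("%-8s %5.2f%% of one CPU, %4.2f%% overall" %
          (name, one_cpu * 100, overall * 100))
```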

Now for the ss -nti cost on these services:

  • proxy service: 85 ms
  • app service: 84 ms
  • database service: 36 ms

So while the BPF cost can be significant during stress tests, it's expected to be low overhead for real workloads, and even cheaper than polling ss -nti at a 1-second interval.

brendangregg mentioned this issue Oct 4, 2016
@4ast (Member) commented Oct 4, 2016

The 'ss' time is excessively high.
What are the versions of 'ss' and of the kernel? I'll try to reproduce...

@brendangregg (Member Author) commented Oct 4, 2016

one of the systems:

# time ss -nti | wc
  15119  121596 1396724

real    0m0.075s
user    0m0.082s
sys 0m0.013s

# ss -v
ss utility, iproute2-ss131122

# uname -a
Linux xxxxx-prod-i-xxxxx 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

This is why I was using ftrace to count the kernel function rates.

@brendangregg (Member Author)

I'll close the issue but we can still discuss...

@4ast (Member) commented Oct 5, 2016

time ss -tni |wc -l
261591
real 0m0.819s
user 0m0.649s
sys  0m0.171s

but the problem is not in the kernel or netlink interface:

    17.04%  ss       libc-2.12.so       [.] vfprintf
     7.35%  ss       libc-2.12.so       [.] __printf_fp
     6.23%  ss       libc-2.12.so       [.] __mpn_divrem
     4.07%  ss       libc-2.12.so       [.] _IO_file_xsputn@@GLIBC_2.2.5
     2.83%  ss       libc-2.12.so       [.] hack_digit.15676
     2.79%  ss       [kernel.kallsyms]  [k] inet_sk_diag_fill
     2.54%  ss       libc-2.12.so       [.] __printf_chk
     2.45%  ss       libc-2.12.so       [.] __strchrnul
     2.09%  ss       libc-2.12.so       [.] _IO_default_xsputn
     1.78%  ss       [kernel.kallsyms]  [k] inet_diag_dump_icsk
     1.71%  ss       [kernel.kallsyms]  [k] tcp_get_info

so I still think tcp_info is a proper way forward.

@brendangregg (Member Author)

Ok, but still, 171 ms in sys? We'll have a lot less user time, but we still need to store all the byte counts in a session hash and compute deltas each interval to see if anything changed. And we still have to trace tcp_close() to catch short-lived sessions. It's all going to add up.
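For illustration, that delta bookkeeping could look something like the sketch below; the 4-tuple connection key and counter names are hypothetical, not from an actual implementation.

```python
# Sketch of per-interval delta tracking for the ss/tcp_info polling approach.
# The (saddr, sport, daddr, dport) key and counter names are illustrative.
prev = {}   # (saddr, sport, daddr, dport) -> (bytes_acked, bytes_received)

def interval_deltas(current):
    """Return only the connections whose counters moved since the last poll."""
    changed = {}
    for conn, (tx, rx) in current.items():
        last_tx, last_rx = prev.get(conn, (0, 0))
        if tx != last_tx or rx != last_rx:
            changed[conn] = (tx - last_tx, rx - last_rx)
        prev[conn] = (tx, rx)
    return changed
```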
