Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tcp congestion status duration statistic tool #3899

Merged
merged 12 commits into from Mar 24, 2022
Merged

add tcp congestion status duration statistic tool #3899

merged 12 commits into from Mar 24, 2022

Conversation

jackygam2001
Copy link
Contributor

add tcp congestion contol status duration statistic tool, and it can be used to evalute the networking and congestion algorithm performace.

@iamkafai
Copy link
Contributor

iamkafai commented Mar 9, 2022

The use case makes sense to me. The implementation is not straightforward though because of the kernel missing a tracepoint in tcp_set_ca_state(), so it has to resort to kprobe and kretprobe on all functions calling tcp_set_ca_state().

".set_state" of "struct tcp_congestion_ops" is implemented for some popular cc like cubic, bbr, and dctcp. That will be a more stable kprobe points.

If you are interested, a tracepoint in tcp_set_ca_state() may be a good add to the kernel considering it is not the hot path. This future kernel work does not need to derail the current bcc pull request.

@jackygam2001
Copy link
Contributor Author

The use case makes sense to me. The implementation is not straightforward though because of the kernel missing a tracepoint in tcp_set_ca_state(), so it has to resort to kprobe and kretprobe on all functions calling tcp_set_ca_state().

".set_state" of "struct tcp_congestion_ops" is implemented for some popular cc like cubic, bbr, and dctcp. That will be a more stable kprobe points.

If you are interested, a tracepoint in tcp_set_ca_state() may be a good add to the kernel considering it is not the hot path. This future kernel work does not need to derail the current bcc pull request.

yes, currently the upstream kernel only has inline function tcp_set_ca_state(), which is optimized by compiler when building and can not be probed. So in my implementation i have to kprobe and kretprobe on all functions calling it.

@brendangregg
Copy link
Member

Thanks; Several comments:

  • Name is too long; should be tcpcong or something.

  • It has 20 k{ret}probes plus a lot of struct digging, which would make it by far the most brittle tool in all of bcc. Most kprobe tools use between 1 and 6 kprobes. The prior biggest is zfsdist/zfsslower, which has 16, but only attaches to 6 at a time depending on the software version, and it also does no struct digging. (While BTF can help solve struct members moving, it can't solve logic changes.) Also, looking at these tcp functions:

grep attach_k.*probe tcpcongestdura.py 
b.attach_kprobe(event="tcp_try_keep_open",
b.attach_kretprobe(event="tcp_try_keep_open",
b.attach_kprobe(event="tcp_enter_cwr", fn_name="trace_entry_tcp_enter_cwr")
b.attach_kretprobe(event="tcp_enter_cwr",
b.attach_kprobe(event="tcp_process_tlp_ack",
b.attach_kretprobe(event="tcp_process_tlp_ack",
b.attach_kprobe(event="tcp_enter_recovery",
b.attach_kretprobe(event="tcp_enter_recovery",
b.attach_kprobe(event="tcp_enter_loss", fn_name="trace_entry_tcp_enter_loss")
b.attach_kretprobe(event="tcp_enter_loss",
b.attach_kprobe(event="tcp_simple_retransmit",
b.attach_kretprobe(event="tcp_simple_retransmit",
b.attach_kprobe(event="tcp_try_undo_recovery",
b.attach_kretprobe(event="tcp_try_undo_recovery",
b.attach_kprobe(event="tcp_try_undo_loss",
b.attach_kretprobe(event="tcp_try_undo_loss",
b.attach_kprobe(event="tcp_fastretrans_alert",
b.attach_kretprobe(event="tcp_fastretrans_alert",
b.attach_kprobe(event="tcp_disconnect", fn_name="trace_entry_tcp_enter_open")
b.attach_kretprobe(event="tcp_disconnect",

I guess this is only really tracing 10 functions as it's doing both kprobe and kretprobe.

Things like tcp_disconnect() I'm not too worried about as it's in tcp_prot, and isn't likely going to change, and tcp_enter_loss() is in tcp.h. But others like tcp_try_undo_loss() feel like gritty internals. I'm ok with one or two such functions in a tool, but if it ends up being several it's a pain to maintain.

So my two questions are: How many of these are unstable internal functions, and does it need to trace them all or can you pop up a stack frame and trace some more-stable parent function to do the same job? I'm not saying this is a blocker, I just want to be thoughtful about the future maintenance.

  • How does it compare to tcp_probe and other TCP congestion analysis tools? (Other people will no doubt ask.)

  • Output columns look misaligned by one.

  • Default output is > 80 chars, but fortunately only by 5. This does make it hard (and sometimes impossible) to include the output in things like published books and printed articles, but I don't see an easy fix.

  • Columns are in "us", but the numbers are so big it's confusing. Would it make sense to make them "ms" by default? Have a switch to output "us".

  • I think the opening example in the examples file is missing a sentence or more to explain why the output matters. Which of these columns is evidence of a problem, such that clients are blocked waiting? Help a newcomer interpret what they are seeing.

  • These numbers are great, but how would I double check that they are right? Have you already got some fields in nstat/netstat or something to compare them to, at least as a sanity check? (They won't have latency totals, but they may have counts.) Just looking for a way to sanity check the output.

@jackygam2001
Copy link
Contributor Author

jackygam2001 commented Mar 16, 2022

@brendangregg ,
thanks for your comments!
For comment 2, you are right. I have checked kernel code, most of cc status change functions are called by 5 functions(tcp_fastretrans_alert, tcp_enter_cwr, tcp_enter_loss, tcp_enter_recovery, tcp_process_tlp_ack), so i will update the code in next pull request. I think @iamkafai 's advice is very good, i will try to add tracepoint for tcp_set_ca_state to upstream kernel.
For comment 3, i think the tcp_probe is for trace the connection's congestion window update, but this tool is for tracing the congestion control status change and duration. you know the cwnd's update may not cause status's change, so they are two different tools.
For comment 8, i don't use the nstat/netstat to validate the tool, as those tools has only summary for all tcp sockets. Actually i use packetdrill script which was provided by google on github to check the output of my tool.
For other comments, i will update the code in next pull request. thanks very much!

Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of inlining COULD happen for the probe functions used in this tool. So I strongly suggest you to implement the tracepoint as Martin suggested. The tool can still proceed. But with tracepoint in the kernel, we can have an alternative better implementation for future kernels.

tests/python/test_tools_smoke.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Show resolved Hide resolved
@jackygam2001
Copy link
Contributor Author

@yonghong-song ,
thanks for your comments, yes, i will prepare the tracepoint for tcp_set_ca_state patch to upstream kernel. And will update my code for other comments in next commit. thanks very much!

Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of okay. A few minor comments.

man/man8/tcpcong.8 Outdated Show resolved Hide resolved
tools/tcpcong.py Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong.py Outdated Show resolved Hide resolved
tools/tcpcong_example.txt Outdated Show resolved Hide resolved
tools/tcpcong_example.txt Outdated Show resolved Hide resolved
tools/tcpcong_example.txt Show resolved Hide resolved
@yonghong-song yonghong-song merged commit a58c795 into iovisor:master Mar 24, 2022
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Mar 30, 2022
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense for
adding tracepoint for congestion status update function to evaluate
the performance of network and congestion algorithm.

Link: iovisor/bcc#3899

Signed-off-by: jackygam2001 <jacky_gam_2001@163.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 6, 2022
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.

Link: iovisor/bcc#3899

Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
ammarfaizi2 pushed a commit to ammarfaizi2/linux-block that referenced this pull request Apr 8, 2022
The congestion status of a tcp flow may be updated since there
is congestion between tcp sender and receiver. It makes sense to
add tracepoint for congestion status set function to summate cc
status duration and evaluate the performance of network
and congestion algorithm. the backgound of this patch is below.

Link: iovisor/bcc#3899

Signed-off-by: Ping Gan <jacky_gam_2001@163.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406010956.19656-1-jacky_gam_2001@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants