Add socket size bpftrace tool #1287
Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
Can one of the admins verify this patch?
@pixie-io-buildbot test this please
The change looks good to me once the existing comments are addressed.
oazizi000 left a comment
Hey @MrAta,
Thanks so much for this contribution. Great to see another bpftrace script in our repo. The code looks great, but I did have one topic to discuss:
Unlike the original script, which dumps out summary stats when you hit ctrl-c, this one prints out every event. Kudos to you for doing it that way, since that's what we recommend with Pixie. :)
In this case, however, I do wonder what the volume of events will be like. Do you have any sense of the impact of the script? We may want to put some sort of warning in the text if it is going to be noisy. Also, I assume you have the px.head(100000) for the same reason, which is another sign that we should put a note about this restriction somewhere near the top of the script.
Another thought, although more advanced, is to do something hybrid: collect the events into a bpftrace map, and then use a periodic timer to sample the map and write its contents out with a printf. We'd be in more experimental territory if we tried that, but it would potentially be less noisy and give more complete data.
All the above is just food for thought, so let us know what you think. And thanks again for the contribution!
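The trade-off raised in this comment, one output row per event versus one row per key per timer interval, can be illustrated with a small sketch. It is plain Python, not bpftrace or PxL, and the timestamps, PIDs, sizes, and the 1-second interval are made-up values chosen only for illustration.

```python
from collections import Counter

INTERVAL_NS = 1_000_000_000  # hypothetical 1-second flush interval

# Hypothetical socket events: (timestamp_ns, pid, bytes)
events = [
    (100_000_000, 101, 4096),
    (400_000_000, 101, 8192),
    (900_000_000, 202, 512),
    (1_200_000_000, 101, 4096),
    (1_700_000_000, 101, 4096),
]


def aggregate_by_interval(events, interval_ns=INTERVAL_NS):
    """Count events per (interval index, pid), as a periodic map dump would."""
    return Counter((ts // interval_ns, pid) for ts, pid, _size in events)


# Per-event output emits one row for every event.
per_event_rows = len(events)
# Interval output emits one row per (interval, pid) pair.
per_interval_rows = len(aggregate_by_interval(events))

print(per_event_rows, per_interval_rows)
```

With these sample values the interval form emits fewer rows than the per-event form; the actual saving depends on how many events share a key within one interval.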
Hi @oazizi000, as for the aggregation idea: to get the same results (e.g. counts and average size), internally we use bpftrace's stats() to aggregate them:
@rstats[comm, pid, tid, @rsocket[tid], @rsock[tid], @rdad[tid], @rdp[tid], @rsad[tid], @rlp[tid]] = stats(retval);
I'll do some experimentation and see if I can make it work with Pixie to collect those aggregated stats at intervals. Note that, at least for us, the most valuable information from this tool is the count and average size, which we use for process-level workload characterization of deterministic workloads such as distributed model training; the rx/tx total size can be collected from other tables/tools. Therefore, even if the aggregation idea doesn't work with Pixie, the current form of this tool is extremely useful (at least for us).
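For readers unfamiliar with it, bpftrace's stats() keeps a count, an average, and a total for each map key. A minimal Python sketch of that bookkeeping follows; it is not the PR's code, the `Stats` class and the shortened (comm, pid, tid) keys are hypothetical, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Running count and total for one key; the average is derived on demand."""
    count: int = 0
    total: int = 0

    def add(self, value):
        self.count += 1
        self.total += value

    @property
    def average(self):
        return self.total / self.count if self.count else 0.0


# Hypothetical samples: a shortened (comm, pid, tid) key and a read size in bytes.
samples = [
    (("trainer", 101, 111), 4096),
    (("trainer", 101, 111), 8192),
    (("trainer", 101, 112), 1024),
]

rstats = {}
for key, retval in samples:
    rstats.setdefault(key, Stats()).add(retval)

for key, s in rstats.items():
    print(key, s.count, s.average, s.total)
```

Only two integers are stored per key, so memory grows with the number of distinct keys rather than with the number of events.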
Summary: Add socket size bpftrace tool for socket-level network workload characterization
Detail: For distributed workloads that don't use HTTP (e.g. ML model training), it is useful to characterize network workloads at the socket level. This tool (adapted from the book "BPF Performance Tools" by Brendan Gregg) enables socket-level profiling, which provides socket request sizes, counts, and throughput.
Type of change: /kind feature
Test Plan: Tested on an airgapped Pixie deployment.