Add socket size bpftrace tool #1287

Merged
vihangm merged 6 commits into pixie-io:main from MrAta:main on May 9, 2023
Conversation

@MrAta
Contributor

@MrAta MrAta commented May 4, 2023

Summary: Add socket size bpftrace tool for socket-level network workload characterization

Detail: For distributed workloads that don't use HTTP (e.g. ML model training), it is desirable to characterize the network workload at the socket level. This tool (adapted from the book "BPF Performance Tools" by Brendan Gregg) enables socket-level profiling, which provides socket request sizes, counts, and throughput.

Type of change: /kind feature

Test Plan: Tested on an air-gapped Pixie deployment.

Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
@pixie-io-buildbot
Member

Can one of the admins verify this patch?

Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
@JamesMBartlett
Member

@pixie-io-buildbot test this please

Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
@vihangm
Member

vihangm commented May 5, 2023

@pixie-io-buildbot test this please

@vihangm vihangm requested a review from a team May 5, 2023 19:16
Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
Comment thread src/pxl_scripts/bpftrace/socket_size/data.pxl
Comment thread src/pxl_scripts/bpftrace/socket_size/data.pxl Outdated
@vihangm vihangm requested review from a team, oazizi000 and zasgar May 5, 2023 20:05
Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
@MrAta MrAta requested a review from vihangm May 5, 2023 20:31
@vihangm
Member

vihangm commented May 5, 2023

@pixie-io-buildbot test this please

@ddelnano
Member

ddelnano commented May 5, 2023

The change looks good to me once the existing comments are addressed.

Contributor

@oazizi000 oazizi000 left a comment


Hey @MrAta,

Thanks so much for this contribution. Great to see another bpftrace script in our repo. The code looks great, but I did have one topic to discuss:

Unlike the original script which dumps out summary stats when you hit ctrl-c, this one prints out every event. Kudos to you for doing it that way, since that's what we recommend with Pixie. :)

In this case, however, I do wonder what the volume of events will be like. Do you have any sense of the impact of the script? We may want to put some sort of warning in the text if it is going to be noisy. Also, I assume you have the px.head(100000) for the same reason...which is another sign that we should put some note of this restriction near the top of the script somewhere.

Another thought, although more advanced, is to do something hybrid: collect the events into a bpftrace map, and then use a periodic timer to sample the map and write it out with a printf. That would be more experimental territory, but it would potentially be less noisy and yield more complete data.

All the above is just food for thought, so let us know what you think. And thanks again for the contribution!
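
The hybrid idea above (accumulate into a map, then emit on a timer) can be sketched in plain Python. This is only an illustration of the pattern, not bpftrace or PxL code, and not what the PR implements: the class name `IntervalAggregator`, the `(comm, pid)` key, and the 10-second interval are all hypothetical, and the dictionary merely stands in for a bpftrace map.

```python
from collections import defaultdict


class IntervalAggregator:
    """Accumulate per-key size stats and emit one summary row per key per interval."""

    def __init__(self, interval_s):
        self.interval_s = interval_s
        self.window_start = None
        # key -> [count, total_bytes]; stands in for a bpftrace map.
        self.stats = defaultdict(lambda: [0, 0])

    def observe(self, ts, key, size):
        """Record one event; return the flushed rows if the interval has elapsed."""
        rows = []
        if self.window_start is None:
            self.window_start = ts
        if ts - self.window_start >= self.interval_s:
            rows = self.flush()
            self.window_start = ts
        entry = self.stats[key]
        entry[0] += 1
        entry[1] += size
        return rows

    def flush(self):
        """Return sorted (key, count, total_bytes) rows and clear the map."""
        rows = sorted((key, c, t) for key, (c, t) in self.stats.items())
        self.stats.clear()
        return rows


# Hypothetical events: (timestamp_s, (comm, pid), bytes).
events = [
    (0, ("trainer", 101), 4096),
    (3, ("trainer", 101), 8192),
    (4, ("sshd", 7), 64),
    (12, ("trainer", 101), 1024),
]
agg = IntervalAggregator(interval_s=10)
rows = []
for ts, key, size in events:
    rows.extend(agg.observe(ts, key, size))
rows.extend(agg.flush())  # emit whatever is left in the last window
print(rows)
```

Four events collapse into three summary rows here. The saving grows with the number of events per key per interval, at the cost of losing per-event detail.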

Comment thread src/pxl_scripts/bpftrace/socket_size/data.pxl Outdated
Comment thread src/pxl_scripts/bpftrace/socket_size/data.pxl Outdated
Signed-off-by: Ata FatahiBaarzi <afatahibaarzi@linkedin.com>
@MrAta
Contributor Author

MrAta commented May 5, 2023

Hi @oazizi000,
Thanks for your thoughts.
Yes, you are totally right that it can generate a high volume of profiling data (mainly because printing every event is what Pixie recommends). For now, I reduced the duration further to 1m and added a warning about running the tool.
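
As a rough way to reason about the volume concern, the back-of-the-envelope arithmetic below relates the 100000-row limit mentioned in the review to the 1m duration. It is a sketch under stated assumptions: events arrive at a steady rate, the row limit simply keeps the first N rows, and the 5000 events/s figure is hypothetical rather than measured.

```python
def max_sustained_rate(row_cap, duration_s):
    """Highest average event rate (events/s) that still fits under the row cap."""
    return row_cap / duration_s


def seconds_until_cap(row_cap, events_per_s):
    """How long a steady event stream takes to fill the row cap."""
    return row_cap / events_per_s


ROW_CAP = 100_000   # from the px.head(100000) mentioned in the review
DURATION_S = 60     # the 1m duration mentioned above

print(max_sustained_rate(ROW_CAP, DURATION_S))
# Hypothetical busy node: 5000 socket events per second.
print(seconds_until_cap(ROW_CAP, 5000))
```

Under these assumptions, any workload averaging more than roughly 1667 events per second would fill the limit before the minute is up, so later events would not appear in the output.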

As for the aggregation idea: to get the same results (e.g. counts and average size), we internally use bpftrace's stats() to aggregate them:

@rstats[comm, pid, tid, @rsocket[tid], @rsock[tid], @rdad[tid], @rdp[tid], @rsad[tid], @rlp[tid]] = stats(retval);

I'll do some experimentation and will see if I can make it work with Pixie to collect those aggregated stats in intervals.
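
For readers unfamiliar with it, bpftrace's stats() keeps a count, an average, and a total for each map key. The plain-Python sketch below mirrors that per-key summary. The function name `summarize`, the shortened key, and the sample sizes are hypothetical, and the integer average is an assumption made for simplicity.

```python
from collections import defaultdict


def summarize(events):
    """events: iterable of (key, size). Return {key: (count, average, total)}."""
    acc = defaultdict(lambda: [0, 0])  # key -> [count, total]
    for key, size in events:
        acc[key][0] += 1
        acc[key][1] += size
    # Integer average (assumption: sizes are whole bytes, fractions dropped).
    return {key: (count, total // count, total) for key, (count, total) in acc.items()}


# Hypothetical key: (comm, pid, remote address, remote port).
key = ("trainer", 101, "10.0.0.2", 29500)
summary = summarize([(key, 4096), (key, 8192), (key, 1024)])
print(summary[key])
```

The key in the snippet quoted above carries more fields (process, thread, and socket endpoints), but the aggregation per key works the same way.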

Note that, at least for us, the most valuable output of this tool is the per-process count and average size, which we use to characterize deterministic workloads such as distributed model training; the total rx/tx size can be collected from other tables/tools. Therefore, even if the aggregation idea doesn't work with Pixie, the tool in its current form is extremely useful (at least for us).

@MrAta MrAta requested a review from oazizi000 May 5, 2023 23:38
@vihangm
Member

vihangm commented May 8, 2023

@pixie-io-buildbot test this please

Contributor

@oazizi000 oazizi000 left a comment


Thanks for the tweaks @MrAta. Looks good to me.

@vihangm vihangm merged commit a52b009 into pixie-io:main May 9, 2023
ddelnano pushed a commit to k8sstormcenter/pixie that referenced this pull request Feb 25, 2026