fix(bpf): Fix issue(s) introduced with bpf refactor #1407

dave-tucker · 2024-05-07T08:36:16Z

This fixes a few issues identified with the bpf code format and refactor.

Zero initialize all variables
Use the bpf_perf_event_read_value helper exlusively
Cherry-pick of PR fix(bpf): Incorrect map size for processes and pid_time #1410 - reported by @vimalk78 - which correctly sizes the processes and pid_time maps. Without this patch entries won't be created in the processes map for pids that are > number of cpu cores which fundamentally breaks power reporting.

In addition, add some logging around eBPF array resizing.

Fixes: #1402 #1411

bpfassets/libbpf/src/kepler.bpf.c

dave-tucker · 2024-05-08T12:50:45Z

@sthaha @vimalk78 thanks for your review and comments.

I took a pass over the code and ensured that everything that needed to be zero'd, was zero'd.
In doing so I was able to fix #1411 so we now use bpf_perf_event_read_value everywhere.
In addition, I was able to verify that this solves the verifier error encountered in #1402.

Force pushed and updated commit message - I'd appreciate you both taking another look.

bpfassets/libbpf/src/kepler.bpf.c

sthaha · 2024-05-09T00:49:48Z

@dave-tucker , thanks a lot for cleaning up the code quite a bit 👼 !

bpfassets/libbpf/src/kepler.bpf.c

vimalk78 · 2024-05-09T05:31:34Z

bpfassets/libbpf/src/kepler.bpf.c

@@ -203,7 +204,8 @@ int kepler_sched_switch_trace(struct sched_switch_info *ctx)
 	u64 pid_tgid, cgroup_id, cur_ts;
 	pid_t cur_pid;

-	struct process_metrics_t *cur_pid_metrics, *prev_pid_metrics, buf;
+	struct process_metrics_t *cur_pid_metrics, *prev_pid_metrics;
+	struct process_metrics_t buf;

 	if (SAMPLE_RATE > 0) {


SAMPLE_RATE seems changed from 1 to 5. means once in every 6 task switch we are updating maps, earlier it would be every other task switch. Update in CHANGELOG.md

may be i missed this in earlier PR, but we were using SEC("tracepoint/sched/sched_switch") some time ago, and switched to SEC("kprobe/finish_task_switch") in this commit , and now we are back to using tracepoint, because API is stable. We can confirm the reason for using kprobes with @rootfs , and capture the reason in CHANGELOG.md

re: SAMPLE_RATE it's 5 in the code - but can be changed back to 0 if you prefer.
Regardless, at program load time it gets patched here to whatever the user has configured where the default of 0 remains unchanged.

and re: tracepoints, I got confirmation from one of our kernel devs that the tracepoint and kprobe/finish_task_switch are close enough together that it makes no difference to power measurements.
there was nothing I can see in the patch you linked that indicates that kprobes would be required for any reason, but we can let @rootfs confirm.

sthaha · 2024-05-12T02:48:26Z

@dave-tucker , could you please rebase this PR? There are some open comments as well which you may want to address.

This fixes a few issues identified with the bpf code format and refactor. 1. Zero initialize all variables 2. Use the bpf_perf_event_read_value helper exlusively In addition, add some logging around eBPF array resizing. Fixes: sustainable-computing-io#1402 sustainable-computing-io#1411 Signed-off-by: Dave Tucker <dave@dtucker.co.uk>

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>

vimalk78

tested, works fine. `/lgtm

1583: hash  name processes  flags 0x0
	key 4B  value 112B  max_entries 32768  memlock 6034304B
	btf_id 812
	pids kepler(750518)
1584: hash  name pid_time  flags 0x0
	key 4B  value 8B  max_entries 32768  memlock 2624352B
	btf_id 812
	pids kepler(750518)

rootfs · 2024-06-02T00:05:56Z

@dave-tucker for 5.4 kernel, bpf_perf_event_read_value has an issue (discussed here)

dave-tucker force-pushed the fix-1402 branch from 9f3943a to 5caa110 Compare May 7, 2024 08:52

sthaha reviewed May 8, 2024

View reviewed changes

bpfassets/libbpf/src/kepler.bpf.c Outdated Show resolved Hide resolved

sthaha reviewed May 8, 2024

View reviewed changes

bpfassets/libbpf/src/kepler.bpf.c Show resolved Hide resolved

dave-tucker force-pushed the fix-1402 branch 2 times, most recently from ecb4e9a to d7b6775 Compare May 8, 2024 12:47

dave-tucker changed the title ~~fix(bpf): get_on_cpu_cycles not returning delta~~ fix(bpf): Fix issue introduced with bpf refactor May 8, 2024

dave-tucker mentioned this pull request May 8, 2024

No value reported by Kepler latest on OpenShift #1402

Closed

dave-tucker mentioned this pull request May 8, 2024

fix(bpf): Incorrect map size for processes and pid_time #1410

Closed

dave-tucker changed the title ~~fix(bpf): Fix issue introduced with bpf refactor~~ fix(bpf): Fix issue(s) introduced with bpf refactor May 8, 2024

sthaha reviewed May 9, 2024

View reviewed changes

bpfassets/libbpf/src/kepler.bpf.c Outdated Show resolved Hide resolved

sthaha reviewed May 9, 2024

View reviewed changes

bpfassets/libbpf/src/kepler.bpf.c Outdated Show resolved Hide resolved

vimalk78 reviewed May 9, 2024

View reviewed changes

dave-tucker added 2 commits May 13, 2024 11:05

fix(bpf): Incorrect map size for processes and pid_time

9492870

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>

dave-tucker force-pushed the fix-1402 branch from d82e5af to 9492870 Compare May 13, 2024 10:05

vimalk78 approved these changes May 13, 2024

View reviewed changes

rootfs approved these changes May 13, 2024

View reviewed changes

rootfs merged commit 258f0ca into sustainable-computing-io:main May 13, 2024
20 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(bpf): Fix issue(s) introduced with bpf refactor #1407

fix(bpf): Fix issue(s) introduced with bpf refactor #1407

dave-tucker commented May 7, 2024 •

edited

Loading

dave-tucker commented May 8, 2024

sthaha commented May 9, 2024

vimalk78 May 9, 2024

vimalk78 May 9, 2024

dave-tucker May 9, 2024 •

edited

Loading

dave-tucker May 9, 2024

sthaha commented May 12, 2024

vimalk78 left a comment

rootfs commented Jun 2, 2024

fix(bpf): Fix issue(s) introduced with bpf refactor #1407

fix(bpf): Fix issue(s) introduced with bpf refactor #1407

Conversation

dave-tucker commented May 7, 2024 • edited Loading

dave-tucker commented May 8, 2024

sthaha commented May 9, 2024

vimalk78 May 9, 2024

Choose a reason for hiding this comment

vimalk78 May 9, 2024

Choose a reason for hiding this comment

dave-tucker May 9, 2024 • edited Loading

Choose a reason for hiding this comment

dave-tucker May 9, 2024

Choose a reason for hiding this comment

sthaha commented May 12, 2024

vimalk78 left a comment

Choose a reason for hiding this comment

rootfs commented Jun 2, 2024

dave-tucker commented May 7, 2024 •

edited

Loading

dave-tucker May 9, 2024 •

edited

Loading