Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bpf): Fix issue(s) introduced with bpf refactor #1407

Merged
merged 2 commits into from
May 13, 2024

Conversation

dave-tucker
Copy link
Collaborator

@dave-tucker dave-tucker commented May 7, 2024

This fixes a few issues identified with the bpf code format and refactor.

  1. Zero initialize all variables
  2. Use the bpf_perf_event_read_value helper exlusively
  3. Cherry-pick of PR fix(bpf): Incorrect map size for processes and pid_time #1410 - reported by @vimalk78 - which correctly sizes the processes and pid_time maps. Without this patch entries won't be created in the processes map for pids that are > number of cpu cores which fundamentally breaks power reporting.

In addition, add some logging around eBPF array resizing.

Fixes: #1402 #1411

@dave-tucker dave-tucker force-pushed the fix-1402 branch 2 times, most recently from ecb4e9a to d7b6775 Compare May 8, 2024 12:47
@dave-tucker dave-tucker changed the title fix(bpf): get_on_cpu_cycles not returning delta fix(bpf): Fix issue introduced with bpf refactor May 8, 2024
@dave-tucker
Copy link
Collaborator Author

@sthaha @vimalk78 thanks for your review and comments.

I took a pass over the code and ensured that everything that needed to be zero'd, was zero'd.
In doing so I was able to fix #1411 so we now use bpf_perf_event_read_value everywhere.
In addition, I was able to verify that this solves the verifier error encountered in #1402.

Force pushed and updated commit message - I'd appreciate you both taking another look.

@dave-tucker dave-tucker changed the title fix(bpf): Fix issue introduced with bpf refactor fix(bpf): Fix issue(s) introduced with bpf refactor May 8, 2024
@sthaha
Copy link
Collaborator

sthaha commented May 9, 2024

@dave-tucker , thanks a lot for cleaning up the code quite a bit 👼 !

bpfassets/libbpf/src/kepler.bpf.c Show resolved Hide resolved
@@ -203,7 +204,8 @@ int kepler_sched_switch_trace(struct sched_switch_info *ctx)
u64 pid_tgid, cgroup_id, cur_ts;
pid_t cur_pid;

struct process_metrics_t *cur_pid_metrics, *prev_pid_metrics, buf;
struct process_metrics_t *cur_pid_metrics, *prev_pid_metrics;
struct process_metrics_t buf;

if (SAMPLE_RATE > 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SAMPLE_RATE seems changed from 1 to 5. means once in every 6 task switch we are updating maps, earlier it would be every other task switch. Update in CHANGELOG.md

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be i missed this in earlier PR, but we were using SEC("tracepoint/sched/sched_switch") some time ago, and switched to SEC("kprobe/finish_task_switch") in this commit , and now we are back to using tracepoint, because API is stable. We can confirm the reason for using kprobes with @rootfs , and capture the reason in CHANGELOG.md

Copy link
Collaborator Author

@dave-tucker dave-tucker May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: SAMPLE_RATE it's 5 in the code - but can be changed back to 0 if you prefer.
Regardless, at program load time it gets patched here to whatever the user has configured where the default of 0 remains unchanged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and re: tracepoints, I got confirmation from one of our kernel devs that the tracepoint and kprobe/finish_task_switch are close enough together that it makes no difference to power measurements.
there was nothing I can see in the patch you linked that indicates that kprobes would be required for any reason, but we can let @rootfs confirm.

@sthaha
Copy link
Collaborator

sthaha commented May 12, 2024

@dave-tucker , could you please rebase this PR? There are some open comments as well which you may want to address.

This fixes a few issues identified with the bpf code
format and refactor.

1. Zero initialize all variables
2. Use the bpf_perf_event_read_value helper exlusively

In addition, add some logging around eBPF array resizing.

Fixes: sustainable-computing-io#1402 sustainable-computing-io#1411

Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Copy link
Collaborator

@vimalk78 vimalk78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested, works fine. `/lgtm

1583: hash  name processes  flags 0x0
	key 4B  value 112B  max_entries 32768  memlock 6034304B
	btf_id 812
	pids kepler(750518)
1584: hash  name pid_time  flags 0x0
	key 4B  value 8B  max_entries 32768  memlock 2624352B
	btf_id 812
	pids kepler(750518)

@rootfs rootfs merged commit 258f0ca into sustainable-computing-io:main May 13, 2024
20 checks passed
@rootfs
Copy link
Contributor

rootfs commented Jun 2, 2024

@dave-tucker for 5.4 kernel, bpf_perf_event_read_value has an issue (discussed here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No value reported by Kepler latest on OpenShift
4 participants