
ebpf: reducing ebpf call overhead by using sampling instead of tracing every calls #823

Closed
wants to merge 3 commits into from

Conversation

rootfs
Contributor

@rootfs rootfs commented Jul 24, 2023

fix #668

This reduces the per-call BPF overhead from 1352 ns to 99 ns.

without this fix

# sysctl -w kernel.bpf_stats_enabled=1
# bpftool prog show |grep kepler_trace |awk '{print $(NF-2)/$NF}'
1352.07

with this fix

# sysctl -w kernel.bpf_stats_enabled=1
# bpftool prog show |grep kepler_trace |awk '{print $(NF-2)/$NF}'
99.0167
  • make bcc work
  • make libbpf able to set the sampling rate by calling InitilizeGlobalVar
  • evaluate the trade-off between overhead and prediction accuracy

@rootfs rootfs added this to the kepler-release-0.6 milestone Jul 24, 2023
@rootfs
Contributor Author

rootfs commented Jul 24, 2023

depends on #824

Collaborator

@marceloamaral marceloamaral left a comment

Looks great, I'm only concerned about the sample rate.

@@ -44,6 +44,9 @@ BPF_ARRAY(cache_miss, u64, NUM_CPUS);
// cpu freq counters
BPF_ARRAY(cpu_freq_array, u32, NUM_CPUS);

int sample_rate = 1000;
int counter = 1000;
Collaborator

Shouldn't it be:

int sample_rate = SAMPLE_RATE;
int counter = SAMPLE_RATE;

Contributor Author

libbpf is pre-compiled, so we cannot use compilation flags. In this case, we have to use a global variable to set it. I opened #824 so I can use the InitGlobalVar function.

Collaborator

Got it!
If we need to hard-code it for now, let's use a smaller value.

@@ -71,6 +71,11 @@ BPF_ARRAY(cache_miss, u64, NUM_CPUS);
// cpu freq counters
BPF_ARRAY(cpu_freq_array, u32, NUM_CPUS);

#ifndef SAMPLE_RATE
#define SAMPLE_RATE 1000
Collaborator

Isn't skipping 1000 samples too extreme?
Did you try 10 and 100?

Contributor Author

Yes, 10 or 100 also brings some reduction, from ~1000 ns down to ~300 ns.

@@ -80,6 +80,7 @@ var (
BindAddressKey = "BIND_ADDRESS"
CPUArchOverride = getConfig("CPU_ARCH_OVERRIDE", "")
MaxLookupRetry = getIntConfig("MAX_LOOKUP_RETRY", defaultMaxLookupRetry)
BPFSampleRate = getIntConfig("BPF_SAMPLE_RATE", 1000)
Collaborator

Shouldn't we use smaller values?

Contributor Author

Sure, let's gather some stats first so users know what to choose, and start from a small value as the default.

@rootfs
Contributor Author

rootfs commented Jul 26, 2023

test environment

RHEL 8.6
Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

kepler command

BPF_SAMPLE_RATE=1000 _output/bin/linux_amd64/kepler

ebpf calculation

bpftool prog show |grep kepler_trace |awk '{print $(NF-2)/$NF}'

result

sample frequency    per-call time (ns)
0                   1239
1                   797
10                  300
50                  152
100                 129
1000                93

@marceloamaral @sunya-ch what's your recommended default sample rate?

@rootfs rootfs changed the title [WIP] ebpf: reducing ebpf call overhead by using sampling instead of tracing every calls ebpf: reducing ebpf call overhead by using sampling instead of tracing every calls Jul 26, 2023
@eklee15

eklee15 commented Jul 26, 2023

Thanks for sharing it. This is great!

@marceloamaral
Collaborator

@rootfs, for now, let's set the default value to 10 since we're uncertain about the consequences of skipping samples. Moreover, using 10 already seems to bring significant improvements. Once we conduct further analysis, we can increase this value.

@rootfs
Contributor Author

rootfs commented Jul 27, 2023

@marceloamaral sure, let's default to 10. One more question: when we sample the eBPF calls, should we also extrapolate the metrics (CPU time, CPU instructions, etc.) as well?

@eklee15

eklee15 commented Jul 27, 2023

Just created a discussion
#836

@rootfs
Contributor Author

rootfs commented Jul 28, 2023

@eklee15 @marceloamaral @sunya-ch Let's disable sampling for now until we have a resolution on #836

Comment on lines +204 to +209
if (sample_counter_value > 0) {
if (*sample_counter_value > 0) {
(*sample_counter_value)--;
return 0;
}
}
Collaborator

New to this, so please excuse me if this suggestion looks stupid. Would this work?

Suggested change
if (sample_counter_value > 0) {
if (*sample_counter_value > 0) {
(*sample_counter_value)--;
return 0;
}
}
if (sample_counter_value && *sample_counter_value > 0) {
(*sample_counter_value)--;
return 0;
}

Comment on lines +277 to +276
if c == nil {
return
}
Collaborator

This shouldn't be done; we must expect the receiver to be initialised at all times. If it is not, then it is a programming error and we must panic. By returning, we are only masking a logical error.

@@ -74,6 +74,10 @@ func getProcessResUsage(process *collector_metric.ProcessMetrics, usageMetric st
// UpdateProcessComponentEnergyByRatioPowerModel calculates the process energy consumption based on the energy consumption of the container that contains all the processes
func UpdateProcessComponentEnergyByRatioPowerModel(processMetrics map[uint64]*collector_metric.ProcessMetrics, containerMetrics *collector_metric.ContainerMetrics, component, usageMetric string, wg *sync.WaitGroup) {
defer wg.Done()
if containerMetrics == nil || processMetrics == nil {
klog.V(5).Infoln("containerMetrics or processMetrics is nil")
Collaborator

How about we return an error if the arguments aren't initialised properly?

u32 next_pid = ctx->next_pid; // the new pid that is to be scheduled
=======
>>>>>>> 2c38bc2dc7b9aca85374c16c96db555f16784169
Collaborator

merge leftover

Contributor Author

The libbpf module conflict is hard to merge, and my git log is now quite messy. I will open a different PR.

author Huamin Chen <hchen@redhat.com> 1690211838 -0400
committer Huamin Chen <hchen@redhat.com> 1692968086 -0400

ebpf: reducing ebpf call overhead by using sampling instead of tracing every calls

Signed-off-by: Huamin Chen <hchen@redhat.com>
Collaborator

@marceloamaral marceloamaral left a comment

@rootfs, could you also modify the approach for dropping samples? Instead of calculating the percentage, could we implement a method that aggregates a counter and drops samples accordingly?

For instance, after collecting 99 samples, we could skip the next 1 sample, effectively skipping 1% of the total. Then, if we decide to skip 10 samples, it would translate to a 10% reduction. To achieve this, we would need a mechanism to skip 'y' samples after gathering 'x' samples. This would provide us with the flexibility to adjust the dropout rate as needed.

@rootfs
Contributor Author

rootfs commented Sep 13, 2023

the rebase was not successful, will reopen another PR

@rootfs rootfs closed this Sep 13, 2023
Successfully merging this pull request may close these issues.

eBPF scalability improvement