kernel-collector doesn't start on systems with more than 128 logical CPUs #258

Open
level-a opened this issue Mar 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

level-a commented Mar 21, 2024

What happened?

Description

kernel-collector fails to start on a system with more than 128 logical CPUs

Steps to Reproduce

Run kernel-collector on a bare-metal server with 256 CPUs.

Expected Result

kernel-collector starts and keeps running.

Actual Result

Startup fails with the error "Only up to 128 cpus are currently supported" (see the log output below).

eBPF Collector version

v0.10.2

Environment information

Environment

OS: Debian GNU/Linux 11 (bullseye)
Kernel: 6.0.0-0.deb11.2-amd64 (with installed linux-headers-amd64)

eBPF Collector configuration

default

Log output

2024-03-21 12:23:35.575638+00:00 info [p:2611192 t:2611192] eBPF program successfully compiled
2024-03-21 12:23:36.010724+00:00 error [p:2611192 t:2611192] Exception during BPFHandler initialization, closing connection: Only up to 128 cpus are currently supported

Failed to compile eBPF code for the Linux distro 'unknown' running kernel version 6.0.0-0.deb11.2-amd64.

troubleshoot item bpf_compilation_failed (os=Linux,flavor=unknown,headers_src=unknown,kernel=6.0.0-0.deb11.2-amd64): Only up to 128 cpus are currently supported

This usually means that kernel headers weren't installed correctly.

Please reach out to support and include this log in its entirety so we can diagnose and fix
the problem.
2024-03-21 12:23:36.010872+00:00 error [p:2611192 t:2611192] troubleshoot item bpf_compilation_failed (os=Linux,flavor=unknown,headers_src=unknown,kernel=6.0.0-0.deb11.2-amd64): Only up to 128 cpus are currently supported

Additional context

Most likely the BPF_MAX_CPUS constant needs to be increased here:

#define BPF_MAX_CPUS 128 // Maximum number of CPUs to support
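
To check quickly whether a given host would hit this cap, something like the following could be used. It is only a sketch: `kBpfMaxCpus`, `exceeds_cpu_limit` and `configured_cpus` are hypothetical names that do not exist in the collector, and it assumes the collector opens one perf reader per configured CPU, which this issue does not confirm.

```cpp
#include <cstddef>
#include <unistd.h>

// Mirrors the value quoted above from render_bpf.h (assumed unchanged).
constexpr std::size_t kBpfMaxCpus = 128;

// Hypothetical helper: true if a host with `cpu_count` CPUs would exceed
// the compile-time cap. Exactly `max_cpus` CPUs is still accepted.
constexpr bool exceeds_cpu_limit(std::size_t cpu_count,
                                 std::size_t max_cpus = kBpfMaxCpus) {
  return cpu_count > max_cpus;
}

// Number of CPUs configured on this host, or 0 if it cannot be determined.
// _SC_NPROCESSORS_CONF is a glibc/Linux extension to sysconf().
inline std::size_t configured_cpus() {
  const long n = sysconf(_SC_NPROCESSORS_CONF);
  return n > 0 ? static_cast<std::size_t>(n) : 0;
}
```

On the 256-CPU server from this report, `exceeds_cpu_limit(configured_cpus())` would be expected to return true, which matches the startup failure.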

@level-a level-a added the bug Something isn't working label Mar 21, 2024
@level-a level-a changed the title kernel-collector doesn't work on systems with more than 128 logical CPUs kernel-collector doesn't start on systems with more than 128 logical CPUs Mar 21, 2024
yonch (Contributor) commented Apr 2, 2024

Yes, there should be an artificial upper limit on the number of CPUs; it is used to allocate some static memory in the kernel collector. There is no inherent limit on the number of CPUs that could be supported.

I think the current limitation would only manifest in:

  • the perf ring allocation when loading eBPF
  • when dequeueing events from the perf rings, iirc there is a fixed-size heap to sort incoming events
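
For readers unfamiliar with the second point, here is a rough sketch of how a heap bounded by the CPU count can merge per-CPU event streams into timestamp order. It is not the collector's code: `Event`, `kMaxCpus` and `merge_by_timestamp` are made-up names, and `std::priority_queue` (backed by a vector) stands in for whatever fixed-size heap the collector actually uses. The only storage sized by the constant here is the per-CPU cursor array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr std::size_t kMaxCpus = 128;  // stand-in for BPF_MAX_CPUS

// Hypothetical event record; the real perf ring payload is not shown here.
struct Event {
  std::uint64_t timestamp;
  int cpu;
};

// Merges per-CPU event lists (each already ordered by timestamp) into one
// timestamp-ordered list. The heap holds at most one entry per CPU.
std::vector<Event> merge_by_timestamp(
    const std::vector<std::vector<Event>>& per_cpu) {
  if (per_cpu.size() > kMaxCpus)
    throw std::runtime_error("more CPUs than the fixed-size heap supports");

  // Heap entries are (timestamp of next event, CPU index), smallest first.
  using HeapEntry = std::pair<std::uint64_t, std::size_t>;
  std::priority_queue<HeapEntry, std::vector<HeapEntry>,
                      std::greater<HeapEntry>> heap;
  std::array<std::size_t, kMaxCpus> cursor{};  // next unread event per CPU

  for (std::size_t cpu = 0; cpu < per_cpu.size(); ++cpu)
    if (!per_cpu[cpu].empty())
      heap.emplace(per_cpu[cpu].front().timestamp, cpu);

  std::vector<Event> out;
  while (!heap.empty()) {
    const std::size_t cpu = heap.top().second;
    heap.pop();
    out.push_back(per_cpu[cpu][cursor[cpu]]);
    // Refill the heap with this CPU's next event, if it has one.
    if (++cursor[cpu] < per_cpu[cpu].size())
      heap.emplace(per_cpu[cpu][cursor[cpu]].timestamp, cpu);
  }
  return out;
}
```

In a layout like this, supporting more CPUs only means a larger cursor array and a heap with more entries, so the cost grows linearly with the constant.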

Happy to review a patch if you have the bandwidth!

cc @open-telemetry/network-maintainers in case you remember anywhere else the CPU count limit would manifest

yonch (Contributor) commented Apr 9, 2024

After some further investigation:

Looking at mentions of BPF_MAX_CPUS it seems like it is used in just a few places:

collector/kernel/perf_reader.h
99:  PerfEntry entries_[BPF_MAX_CPUS];
103:  std::bitset<BPF_MAX_CPUS> readers_in_entries_;

collector/kernel/bpf_src/render_bpf.h
14:#define BPF_MAX_CPUS 128              // Maximum number of CPUs to support

collector/kernel/bpf_src/tcp-processor/bpf_types.h
108:BPF_ARRAY(bpf_log_globals_per_cpu, struct BPF_LOG_GLOBALS, BPF_MAX_CPUS);
119:  if (cpu < 0 || cpu >= BPF_MAX_CPUS) {

collector/kernel/perf_reader.cc
24:  if (readers_.size() >= BPF_MAX_CPUS)
25:    throw std::runtime_error("Only up to " _STRINGIZE(BPF_MAX_CPUS) " cpus are currently supported");
33:  if (data_readers_.size() >= BPF_MAX_CPUS)
34:    throw std::runtime_error("Only up to " _STRINGIZE(BPF_MAX_CPUS) " cpus are currently supported");

Bottom line, I think just increasing BPF_MAX_CPUS should get you better coverage. I don't expect a large increase in memory use or a reduction in performance.
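
To illustrate the pattern in the perf_reader matches above, here is a simplified model, not the actual collector code. `ReaderTable` and its methods are hypothetical, the `PerfEntry` layout is a guess (the real struct is not shown in this issue), and the limit is a template parameter only so that two values can be compared side by side.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for PerfEntry; the real layout is not shown here.
struct PerfEntry {
  std::uint64_t timestamp;
  std::size_t reader_index;
};

template <std::size_t MaxCpus>
class ReaderTable {
public:
  // Registers one per-CPU reader. Throws once MaxCpus readers already
  // exist, mirroring the size check quoted from perf_reader.cc.
  void add_reader() {
    if (count_ >= MaxCpus)
      throw std::runtime_error("Only up to " + std::to_string(MaxCpus) +
                               " cpus are currently supported");
    ++count_;
  }

  std::size_t size() const { return count_; }

  // Records a pending entry for an already registered reader.
  void set_entry(std::size_t reader, std::uint64_t timestamp) {
    if (reader >= count_) throw std::out_of_range("unknown reader index");
    entries_[reader] = PerfEntry{timestamp, reader};
    readers_in_entries_.set(reader);
  }

  bool has_entry(std::size_t reader) const {
    return reader < MaxCpus && readers_in_entries_.test(reader);
  }

private:
  PerfEntry entries_[MaxCpus] = {};          // statically sized, like entries_
  std::bitset<MaxCpus> readers_in_entries_;  // one bit per possible reader
  std::size_t count_ = 0;
};
```

With this model, `ReaderTable<128>` rejects the 129th reader with the same message as in the log, while `ReaderTable<256>` accepts all 256. Comparing `sizeof(ReaderTable<256>)` with `sizeof(ReaderTable<128>)` shows the static footprint growing only linearly with the constant. The per-CPU BPF array in bpf_types.h is sized by the same constant and would grow in the same way.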
