Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from BCC to libbpf with CO-RE #6027

Closed
wants to merge 1 commit into from

Conversation

eyakubovich
Copy link
Contributor

No description provided.

Copy link
Contributor

@russjones russjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyakubovich Did a first past, will continue tomorrow.

Makefile Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
bpf/command.bpf.c Show resolved Hide resolved

#define ARGSIZE 128

char LICENSE[] SEC("license") = "Dual BSD/GPL";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be Apache License Version 2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it has the same license requirements as a kernel module: https://elixir.bootlin.com/linux/v5.8/source/include/linux/module.h#L182

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klizhentas Is Dual BSD/GPL okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me more about how we distribute it? Are we using dual licensed components linking from here and deriving license of this file from those? How are we going to distribute the program?

go.mod Outdated
@@ -12,6 +12,8 @@ require (
github.com/alecthomas/colour v0.1.0 // indirect
github.com/alecthomas/repr v0.0.0-20200325044227-4184120f674c // indirect
github.com/alecthomas/units v0.0.0-20210208195552-ff826a37aa15 // indirect
github.com/aquasecurity/tracee v0.5.1-0.20210316063448-ec34648b7aff // indirect
github.com/aquasecurity/tracee/libbpfgo v0.0.0-20210316063448-ec34648b7aff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you didn't go with https://github.com/cilium/ebpf which appears to be pure Go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned that this project wouldn't properly/fully reimplement libbpf. There's a lot of low-level fiddling going on there and it seemed more trustworthy to use a library that's maintained by the kernel devs. For example, I need CO-RE but it's unclear how much of it is implemented in there as this issue is still open: cilium/ebpf#114

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I've noticed you linked to an issue on our repo. The goal of cilium/ebpf is to eventually be at feature parity with libbpf, as libbpf is considered to be the reference implementation for lack of an actual spec. As you've noticed, CO-RE support is not there yet, but there are a few incoming contributions that seek to improve the situation. As to whether we're trustworthy or not, well.. that's entirely up to you. 😄 Note that libbpfgo (the wrapper) is not maintained by kernel devs. If you do happen to give our library a try (outside of CO-RE), please let us know what you find!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still evaluating which library to use both now and long-term. cilium/ebpf would be a good target once eBPF development slows down and things solidify more. libbpfgo wrapper is not maintained by the kernel devs but the heavy lifting is done by libbpf. We will definitely be keeping an eye on cilium/ebpf development.

@eyakubovich eyakubovich force-pushed the ey/bpf-co-re branch 3 times, most recently from ee4a087 to 30edc41 Compare March 22, 2021 21:18
if err != nil {
return nil, trace.Wrap(err)
}
s.open, err = startOpen(closeContext, *config.DiskBufferSize)
Copy link
Contributor

@russjones russjones Mar 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the ring buffer differ from the old perf buffer. If I recall correctly were were passing in buffer size because so many disk events are generated we would drop most of them with default sized perf buffer.

https://www.kernel.org/doc/html/latest/bpf/ringbuf.html

Seems to indicate you can size it, am I reading it wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can size it. The easiest is to do that in the .c file, which is what I currently do now: https://github.com/gravitational/teleport/pull/6027/files#diff-ef9be70e923b4d943486324770b32a29edfc00f6c35d8cdceda249482644dbb7R45
I size them with the defaults that teleport used.

What happens is that this size info is actually read by libbpf in userspace and it creates the ring buffer with the size info (it basically looks for "maps" section, parses them and issues bpf syscalls to create them. However I just found that there's bpf_map__set_max_entries that should be able to adjust this during the initialization. I'll give it a try.

Of course moving cgroup filtering into the kernel should help a lot with reducing the number of events.

lsPath, err := os_exec.LookPath("ls")
c.Assert(err, check.IsNil)
// Find "ls" binary.
lsPath, err := os_exec.LookPath("ls")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not about this particular line, but we use cgroups to correlate BPF events and Teleport identities. However, when these tests run on CI, they will be running within a container so they won't be able to create cgroups.

Any ideas on how we can test BPF functionality within a container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you run the container as privileged (in Docker semantics). I think those have enough permissions to create cgroups. From the kernel POV, cgroups nest so it should be possible with some set of permissions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to. @webvictim is this an option for us with Drone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can run privileged containers in Drone.

func (e *exec) close() {
for _, perfMap := range e.perfMaps {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the function comment back and explain that ring buffer does not need to be closed because it's closed as part of module close.

// attachProbe will attach a kprobe to the given function name.
func attachProbe(module *bcc.Module, eventName string, functionName string) error {
kprobe, err := module.LoadKprobe(functionName)
func LoadBPF(code []byte, name string) (*libbpfgo.Module, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function comment missing on public functions here and elsewhere.

return mod, nil
}

// attaches both kprobe and kretprobe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go comments (here and elsewhere) start with the function name, so "AttachKprobe attaches kprobe and kretprobe."

// attachRetProbe will attach a kretprobe to the given function name.
func attachRetProbe(module *bcc.Module, eventName string, functionName string) error {
kretprobe, err := module.LoadKprobe(functionName)
func AttachTracepoint(mod *libbpfgo.Module, category, name string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and elsewhere) needs a function comment that explains what this function does. Something along the lines of:

// AttachTracepoint attaches tracepoint to BPF program. Passed in parameters
// "category" and "name" are used to construct the tracepoint path:
// /sys/kernel/debug/tracing/events/{category}/{name}.
//
// For more details, see https://www.kernel.org/doc/html/v5.8/trace/events.html.

}

func AttachSyscallTracepoint(mod *libbpfgo.Module, syscall string) error {
if err := AttachTracepoint(mod, "syscalls", "sys_enter_" + syscall); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere, can you make things like "syscalls" constants?

return eventCh, lostCh, nil
func (rb *RingBuffer) Stop() {
rb.buf.Stop()
// don't Close rb.buf as it'll be closed as part of Module.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this comment to function description, so:

// Stop will stop the ring buffer. Note that the underlying ring buffer is not
// closed because it's closed as part of Module.Close().

u64 cgroup;
};

BPF_RING_BUF(execve_events, 4096*8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will become configurable after your other PR lands right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes


static int __submit_arg(void *ptr, struct data_t *data)
{
bpf_probe_read(data->argv, sizeof(data->argv), ptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and elsewhere), looks like bpf_probe_read is being split into bpf_probe_read_kernel and bpf_probe_read_user, I'd change this to bpf_probe_read_user like execsnoop.

bpftrace/bpftrace#614
https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c#L76

struct data_t data = {};
struct task_struct *task;

data.pid = bpf_get_current_pid_tgid() >> 32;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is about how we report PID and TGID.

I'm looking at execsnoop and others and they now report both the PID and TGID instead of just reporting TGID like we were doing before.

@eyakubovich @klizhentas I'm thinking it's probably better for visibility to have both and PID and TGID be reported and switch the PID value to true PID in our events. What do you two think?

https://github.com/iovisor/bcc/blob/master/libbpf-tools/execsnoop.bpf.c#L46-L57

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just report TGID (userspace PID). That's what an average user would expect. Unless you're debugging something, which thread performed an action is usually not interesting.

Comment on lines 84 to 88
// Some kernels, like Ubuntu 4.13.0-generic, return 0
// as the real_parent->tgid.
// We use the getPpid function as a fallback in those cases.
// See https://github.com/iovisor/bcc/issues/1883.
data.ppid = get_ppid(task);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyakubovich Why use get_ppid here but BPF_CORE_READ directly below? I think we can remove the wrapper and also drop the comment, we we won't be supporting kernels 5.8 and below right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how it happened. You're right about both.


// skip first arg, as we submitted filename
#pragma unroll
for (int i = 1; i < 20; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make 20 a constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I can also remove #pragma unroll -- it's no longer needed.

bpf_get_current_comm(&data.comm, sizeof(data.comm));
data.type = EVENT_ARG;

__submit_arg((void *)filename, &data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you do a little refactoring around __submit_arg and submit_arg and some comments around what the difference between is and why they're split. Will help future readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere I wasn't trying to do much refactoring or improving. Mostly just enough to get it ported to libbpf reliably. Technically with a ring buffer, this split is no longer needed. We can use bpf_ring_reserve to allocate the memory in the ring buffer, write out all the args there and then use bpf_ring_submit to send it. The only downside is that it'll require a double scan -- first to compute the strlen of all the args and then to copy them. But that's no worse than it is now -- it first copies it into the data_t and then into the ring buffer.

SEC("tp/syscalls/sys_execve")
int tracepoint__syscalls__sys_enter_execve(struct trace_event_raw_sys_enter *tp)
{
const char *filename = (const char *)tp->args[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question, but here (and elsewhere) we're guaranteed to always be able to index into args like this right? No need to do a length check here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so (unless there's a kernel bug). The length of the args will match the number of args to the syscall.

bpf/disk.bpf.c Outdated
int flags;
};

BPF_HASH(infotmp, u64, struct val_t, 8192);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and in network.bpf.c) can you make the hash size a constant and explain why you picked the size?

Copy link
Contributor Author

@eyakubovich eyakubovich Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's what bcc picked as the default size :)


#define ARGSIZE 128

char LICENSE[] SEC("license") = "Dual BSD/GPL";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@klizhentas Is Dual BSD/GPL okay?

@eyakubovich eyakubovich force-pushed the ey/bpf-co-re branch 5 times, most recently from 7ecaaca to 398d9b2 Compare April 6, 2021 02:33
@eyakubovich eyakubovich force-pushed the ey/bpf-co-re branch 4 times, most recently from 353177b to ee574b2 Compare April 10, 2021 20:27
BPF_TAG := bpf
BPF_MESSAGE := "with BPF support"
CLANG ?= $(shell which clang || which clang-10)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CO-RE requires Clang 10+ so that's what buildbox installs. Here we look for regular clang or clang-10. This is not ideal for folks building from source outside of buildbox. However it would require a more elaborate search to look for a working clang.
Same logic is also used below for clang-format and llvm-strip

@russjones
Copy link
Contributor

Merged in #7098.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants