
fix(bpf): added function supported by arm64 #5

Merged · 21 commits · Oct 31, 2022

Conversation


@lukidzi commented Oct 20, 2022

It looks like fexit and fentry are not supported on arm64:

Running:  /kuma/ebpf/mb_netns_cleanup --bpffs /sys/fs/bpf
libbpf: prog 'net_ns_net_exit': failed to attach: ERROR: strerror_r(-524)=22
libbpf: prog 'net_ns_net_exit': failed to auto-attach: -524
attaching mb_netns_cleanup program failed with error: -524

I've also tried to run an example application with fentry/fexit, and it didn't work. I've created a macro that depends on the architecture: on arm64 it uses a kretprobe, and on other architectures fexit (sketched below).

bpftrace/bpftrace#1833
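For illustration, a minimal sketch of that arch-dependent switch (the CLEANUP_SEC macro name is hypothetical, not from this PR; __TARGET_ARCH_arm64 follows libbpf's bpf_tracing.h convention and must be defined when compiling for arm64):

```c
// Sketch only: choose the attach point at compile time.
#if defined(__TARGET_ARCH_arm64)
// arm64: fall back to a kretprobe; fentry/fexit attachment fails
// there with -524, the kernel's internal -ENOTSUPP
#define CLEANUP_SEC SEC("kretprobe/net_ns_net_exit")
#else
// everywhere else: keep the fexit trampoline
#define CLEANUP_SEC SEC("fexit/net_ns_net_exit")
#endif
```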

Signed-off-by: Łukasz Dziedziak <lukidzi@gmail.com>
bpf/mb_netns_cleanup.c
@@ -36,7 +36,12 @@ struct {
__uint(pinning, LIBBPF_PIN_BY_NAME);
} local_pod_ips SEC(".maps");

// arm64 doesn't support fexit/fentry


Can you maybe add a link to explain why kretprobe is the right substitute?

SEC("fexit/net_ns_net_exit")
#endif
int BPF_PROG(net_ns_net_exit, struct net *net, long ret)
lukidzi (Author)

Probably for kprobe we need to change the type and parameters to BPF_KPROBE(net_ns_net_exit, struct net *net)
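For reference, a rough sketch of how the handler signature differs between the attach types (the helper macros come from libbpf's <bpf/bpf_tracing.h>; the suffixed program names are mine, just to keep the sketch self-consistent):

```c
// Sketch only, not the final PR code.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

// fexit: typed access to both the arguments and the return value
SEC("fexit/net_ns_net_exit")
int BPF_PROG(net_ns_net_exit, struct net *net, long ret)
{
    return 0;
}

// kprobe (function entry): typed access to the arguments only
SEC("kprobe/net_ns_net_exit")
int BPF_KPROBE(net_ns_net_exit_entry, struct net *net)
{
    return 0;
}

// kretprobe (function return): the entry arguments are no longer
// reliably available; BPF_KRETPROBE only exposes the return value
SEC("kretprobe/net_ns_net_exit")
int BPF_KRETPROBE(net_ns_net_exit_ret)
{
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```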

instead of cleaning up ourselves we can use an LRU_HASH map:
after it reaches max size, the oldest element is removed

Signed-off-by: Łukasz Dziedziak <lukidzi@gmail.com>
-    __uint(type, BPF_MAP_TYPE_HASH);
-    __uint(max_entries, 1024);
+    __uint(type, BPF_MAP_TYPE_LRU_HASH);
+    __uint(max_entries, 65535);
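Put together with the fields visible in the earlier hunk, the updated definition would look roughly like this (the key/value types are my assumption; this hunk doesn't show them):

```c
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65535);
    __type(key, __u32);                   // assumption: pod IP as the key
    __type(value, struct pod_config);     // assumption: per-pod config value
    __uint(pinning, LIBBPF_PIN_BY_NAME);  // carried over from the earlier hunk
} local_pod_ips SEC(".maps");
```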


Might be worth saying where this number is coming from

Signed-off-by: Łukasz Dziedziak <lukidzi@gmail.com>
// The map can hold configuration for 65535 unique
// pods; if a new one appears, the oldest not-accessed
// entry is removed from the map by the kernel. This
// ensures that configuration of pods that no longer
// exist is eventually removed.


Random question: could there be a problem with a pod receiving 0 traffic?

lukidzi (Author)

It shouldn't be a problem. If a pod doesn't receive traffic, the only situation in which its entry might be removed is when we deploy more than 65535 pods in the cluster.


Yes, but we're not talking about simultaneous pods, right? For example, in a cluster that runs batch jobs we create new, short-lived pods all the time. Could this potentially be a problem?

This doesn't have to be fixed now, as it's an edge case, but it might be worth at least understanding whether it's possible.

lukidzi (Author)

Yes, that could be an edge case. 65535 is the number of pods per node, not per cluster. But true: if we have a lot of short-lived pods, their configuration piles up in the map, and a pod that receives no traffic at all (not even health checks) might then be evicted.
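One detail that bounds this edge case: the kernel marks an LRU_HASH entry as recently used on every successful lookup from BPF program context, so any pod whose entry the datapath actually consults is moved to the fresh end of the LRU; only entries that are never looked up can age out. A minimal sketch, assuming the datapath looks up local_pod_ips by pod IP:

```c
// Sketch only: pod_config and the __u32 key carry over the
// assumptions from the map sketch above.
static __always_inline struct pod_config *get_pod(__u32 pod_ip)
{
    // A successful bpf_map_lookup_elem() from BPF context sets the
    // entry's LRU "referenced" bit, so pods that see traffic are
    // not eviction candidates.
    return bpf_map_lookup_elem(&local_pod_ips, &pod_ip);
}
```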
