Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for TC programs #2376

Merged
merged 8 commits into from
Feb 3, 2024

Conversation

mauriciovasquezbernal
Copy link
Member

@mauriciovasquezbernal mauriciovasquezbernal commented Jan 17, 2024

This PR implements support for TC programs in inspector gadget.

The main component of this PR is a new tchandler package that handles how TC programs are attached to containers and network interfaces. (very similar to the existing pkg/gadgets/internal/networktracer one). Programs are attached by using a clsact qdisc and filter in direct-action mode (check https://qmonnet.github.io/whirl-offload/2020/04/11/tc-bpf-direct-action/ to learn more).

For containers, the programs are attached to the peer interface on the host, this guarantees that processes running inside the container can't detach them from the interface. We might considering adding support for attaching these in the container side later on.

Gadgets with multiple TC programs, multiple gadgets and multiple instances of Inspektor Gadget running at the same time are supported. This is done by using a different handle each time a program is attached to the qdisc.

How to test

Define a TC program, for instance, this one is a very simple firewall: (It's not supported to fill the ips map yet, to this only works to print messages to trace_pipe)

#include <stdint.h>
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/swab.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_endian.h>

char __license[] SEC("license") = "GPL";

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, __u32); // ipv4
	__type(value, __u8); // not used
} ips SEC(".maps");

SEC("classifier/egress/drop")
int egress_drop(struct __sk_buff *skb) {
	//bpf_printk("Packet on egress");

	void *data_end = (void *)(unsigned long long)skb->data_end;
	void *data = (void *)(unsigned long long)skb->data;

	// Check if the packet is not malformed
	struct ethhdr *eth = data;
	if (data + sizeof(struct ethhdr) > data_end)
		return TC_ACT_SHOT;

	// Check that this is an IP packet
	if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
		return TC_ACT_UNSPEC;

	// Check if the packet is not malformed
	struct iphdr *ip = data + sizeof(struct ethhdr);
	if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) > data_end)
		return TC_ACT_SHOT;

	bpf_printk("egress: src:%pi4 -> dst: %pi4", &ip->saddr, &ip->daddr);

	// Drop packet if in map
	__u8 *dummy = bpf_map_lookup_elem(&ips, &ip->daddr);
	if (dummy) {
		bpf_printk("Dropping in egress");
		return TC_ACT_SHOT;
	}

	return TC_ACT_UNSPEC;
}

SEC("classifier/ingress/drop")
int ingress_drop(struct __sk_buff *skb) {
	//bpf_printk("Packet on ingress");

	void *data_end = (void *)(unsigned long long)skb->data_end;
	void *data = (void *)(unsigned long long)skb->data;

	// Check if the packet is not malformed
	struct ethhdr *eth = data;
	if (data + sizeof(struct ethhdr) > data_end)
		return TC_ACT_SHOT;

	// Check that this is an IP packet
	if (bpf_ntohs(eth->h_proto) != ETH_P_IP)
		return TC_ACT_UNSPEC;

	// Check if the packet is not malformed
	struct iphdr *ip = data + sizeof(struct ethhdr);
	if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) > data_end)
		return TC_ACT_SHOT;

	bpf_printk("ingress: src:%pi4 -> dst: %pi4", &ip->saddr, &ip->daddr);

	// Drop packet if in map
	__u8 *dummy = bpf_map_lookup_elem(&ips, &ip->saddr);
	if (dummy)
		return TC_ACT_SHOT;

	return TC_ACT_UNSPEC;
}

Build the gadget:

$ sudo -E ig image build . -t mytcgadget

And run it:

$ sudo -E ig run mytcgadget --iface enp5s0 
INFO[0000] Experimental features enabled                
INFO[0000] Running. Press Ctrl + C to finish

check trace_pipe to see the debug messages:

$ sudo cat /sys/kernel/debug/tracing/trace_pipe
          <idle>-0       [010] d.s.. 32963.221506: bpf_trace_printk: comm: wget, pid: 105845
          <idle>-0       [010] d.s.. 32963.227058: bpf_trace_printk: egress: src:192.178.050.068 -> dst: 172.017.000.004
          <idle>-0       [010] d.s.. 32963.227062: bpf_trace_printk: comm: wget, pid: 105845
          <idle>-0       [010] d.s.. 32963.227071: bpf_trace_printk: egress: src:172.017.000.004 -> dst: 192.178.050.068
          <idle>-0       [010] d.s.. 32963.227072: bpf_trace_printk: comm: wget, pid: 105845
          <idle>-0       [010] d.s.. 32963.227944: bpf_trace_printk: egress: src:192.178.050.068 -> dst: 172.017.000.004
          <idle>-0       [010] d.s.. 32963.227947: bpf_trace_printk: comm: wget, pid: 105845
          <idle>-0       [010] d.s.. 32963.227952: bpf_trace_printk: egress: src:172.017.000.004 -> dst: 192.178.050.068
          <idle>-0       [010] d.s.. 32963.227953: bpf_trace_printk: comm: wget, pid: 105845

Fixes #2369

TODO:

  • Tests
  • Documentation
  • How to handle multiple TC gadgets at the same time
  • How to handle multiple instances of ig at the same time
  • How to handle collisions with other applications using TC programs (clsact qdisc and filters)
  • Understand how the --iface parameter should behave when there are containers present

@alban
Copy link
Member

alban commented Jan 18, 2024

Should there be a way to attach the same tc program to the network interfaces of several containers, like we do for the socket filters?

Should we have a way to attach the tc program to a non-root qdisc? Maybe the handle and the parent should be a parameter, defaulting to the root handle as you've done?

@alban
Copy link
Member

alban commented Jan 18, 2024

How to handle multiple TC gadgets at the same time

Do you mean a gadget containing several conflicting tc programs?

How to handle multiple instances of ig at the same time
How to handle collisions with other applications using TC programs (clsact qdisc and filters)

I would do it the same way the iptables and tc tools do: just do what's requested by the user and allow newer instances to overwrite the setup done by a previous instance.

Comment on lines 367 to 393
case ebpf.SchedCLS:
parts := strings.Split(p.SectionName, "/")
if len(parts) != 3 {
return nil, fmt.Errorf("invalid section name %q", p.SectionName)
}
if parts[0] != "classifier" {
return nil, fmt.Errorf("invalid section name %q", p.SectionName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the documentation, we will need to say we don't support tc ebpf programs that uses tc's ELF convention. For examples, those programs won't work:
https://github.com/iproute2/iproute2/tree/main/examples/bpf

And I guess the documentation should say which ELF convention we support instead...

@mauriciovasquezbernal
Copy link
Member Author

Should there be a way to attach the same tc program to the network interfaces of several containers, like we do for the socket filters?

I implement that in the last update. The support is almost identical to the socket programs. I implement it as as separated package, but I'll check if this is possible to consolidate on the same module.

Should we have a way to attach the tc program to a non-root qdisc? Maybe the handle and the parent should be a parameter, defaulting to the root handle as you've done?

I'm learning more about traffic control in linux. We're using the clsact for attaching these programs, and AFAIU it's only possible to have a one of them per interface. Do you know if that's right?

Do you mean a gadget containing several conflicting tc programs?

I was thinking more about multiple instances of the same gadget or even different gadgets. I'm not sure if a gadget with multiple programs is a valid use case, perhaps there it's better to have a single program that is attached and then use tail calls to invoke other programs.

I would do it the same way the iptables and tc tools do: just do what's requested by the user and allow newer instances to overwrite the setup done by a previous instance.

I agree, however I still need to better understand how it works to write down the documentation.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/experiments/support-tc branch 3 times, most recently from 63f2ed9 to 7e4b883 Compare January 23, 2024 20:57
@mauriciovasquezbernal
Copy link
Member Author

After some more investigation I found out that it's actually possible to attach multiple eBPF programs to the same TC hook (qdisc). The thing is that the programs have to return TC_ACT_UNSPEC to let other programs process the packet, if they return TC_ACT_OK other programs are skipped.

I implemented a simple logic that loops using a different handle until one available is found. I limited that to the first 128 handles, but we can discuss this approach once this PR is more mature.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/experiments/support-tc branch 6 times, most recently from ebd6893 to e32ddb0 Compare January 26, 2024 14:10
@mauriciovasquezbernal mauriciovasquezbernal marked this pull request as ready for review January 26, 2024 14:47
Copy link
Member

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will take another big picture look and will test it, but for now here are some comments:

pkg/runtime/local/local.go Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/run/types/metadata.go Show resolved Hide resolved
pkg/gadgets/run/types/metadata.go Outdated Show resolved Hide resolved
Key: "iface",
Description: "Network interface to attach to",
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

pkg/gadgets/run/tracer/tracer.go Outdated Show resolved Hide resolved
@@ -0,0 +1,75 @@
// Copyright 2024 The Inspektor Gadget authors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but this is still an integration test.
We should aim for more unit tests for run and we can still use real gadgets as integration ones.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a specific idea of how this test could be implement as a unit one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this particular one but I would rather use a real gadget as integration one here.
Otherwise, if we start to have "real gadgets" and "integration test ones" this will begin to be a real mess to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With image-based gadgets Inspektor Gadget is becoming more a framework than a collection of tools, hence we need to make sure the framework is working fine. I think we should be testing the framework part (the ability to pull, load, attach, confiture, etc.. ebpf programs.) without using real gadgets. The reasons are:

  • It's possible that don't have gadgets that use all features of the framework. The TC support on this PR is one example of that.
  • It's easier to understand if an issue is on the framework rather than in the gadget.
  • Tests are easier to implement as we don't depend on the gadget's logic.
  • By using mock gadgets we could be able to test more corner cases than by using real ones.
  • It can happen that we'll move the "official gadgets" to an independent repository later on, so using them to test will add an "external dependency"

I agree we need more unit tests, but these ones are needed as well.

#2301 is following this discussion.

pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/container-utils/containerutils.go Outdated Show resolved Hide resolved
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/experiments/support-tc branch 2 times, most recently from a58f43f to 8298c78 Compare January 30, 2024 13:50
Copy link
Member

@blanquicet blanquicet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. I'll try it tomorrow.

pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
return nil, fmt.Errorf("invalid filter type")
}

for handle := uint32(0x1); handle < 128; handle++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to make it a const.

pkg/gadgets/internal/tchandler/tracer.go Show resolved Hide resolved
pkg/container-utils/containerutils.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tracer.go Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
Copy link
Member

@eiffel-fl eiffel-fl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it and it works fine:

...
            wget-25601   [003] ..s2.  5108.850198: bpf_trace_printk: egress: src:010.010.000.003 -> dst: 188.114.096.002
            wget-25601   [003] ..s2.  5108.851160: bpf_trace_printk: egress: src:010.010.000.003 -> dst: 188.114.096.002
...

I have some comments though I will approve it but I would like another approval before it being merged.

pkg/gadgets/internal/tchandler/tc.go Outdated Show resolved Hide resolved
// In this case we use a priority of 1 and ETH_P_ALL as protocol.
// See https://github.com/iproute2/iproute2/blob/f443565f8df65e7d3b3e7cb5f4e94aec1e12d067/tc/tc_filter.c#L147
// https://github.com/iproute2/iproute2/blob/f443565f8df65e7d3b3e7cb5f4e94aec1e12d067/tc/tc_filter.c#L68
filterInfo = uint32(0x00010300) // priority (1) << 16 | proto (htons(ETH_P_ALL))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mimic how it is defined in the code you linked? Something like this:

Suggested change
filterInfo = uint32(0x00010300) // priority (1) << 16 | proto (htons(ETH_P_ALL))
proto = /* ... */
filterInfo = uint32(1 << 16) | proto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to make it a const because the compiler complains when I use htons. I changed it a bit to be more clear without using htons.

}
}()

// Keep in sync with tail_call map in bpf/dispatcher.bpf.c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be a const?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just fine to have it there. It's not used anywhere else.

@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/experiments/support-tc branch 2 times, most recently from 5baf374 to f46317a Compare February 1, 2024 21:54
There are gadgets that don't provide any information, like networking
ones. Then the parser could be nil.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
This will be useful for the support of networking gadgets to be
implemented in the next commits.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Add method to get network interface of a container on the host. It'll be
used by other commits to attach TC programs to it.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Copy link
Member

@blanquicet blanquicet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the PR is ready to go after managing these comments.

pkg/gadgets/internal/tchandler/tracer.go Outdated Show resolved Hide resolved
pkg/gadgets/run/tracer/tracer.go Show resolved Hide resolved
pkg/gadgets/run/tracer/tracer.go Show resolved Hide resolved
pkg/gadgets/run/tracer/tracer.go Show resolved Hide resolved
The tchandler packet handles how TC programs (only SchedCLS for now)
are attached to containers and network interfaces.

The logic is similar to the existing pkg/gadgets/internal/networktracer
for socket programs, however there are some differences:

1. It's possible to attach TC programs in ingress or egress, hence
the "sense" of attachment has to be passed around.
2. This package is designed only for image-based gadgets, hence it
implement support for polling events from the ebpf program.

It could be possible to unify these two components, but it's not so easy
to do right now, hence this is future work once the built-in gadgets
are removed.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Some gadgets could require parameters that are not related to eBPF, like
the networking interface that a TC program should be attached to.
This commit adds a new GadgetParameters section in the metadata file.

TODO: add tests

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
This commit adds support for TC (more precisely SchedCLS) programs.
These programs can be used to modify the content of the packets or just
to extract information from them.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
This is one example of how we can implement tests for Inspektor Gadget
that don't use a real gadget.

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>
@mauriciovasquezbernal mauriciovasquezbernal merged commit 5bba230 into main Feb 3, 2024
56 checks passed
@mauriciovasquezbernal mauriciovasquezbernal deleted the mauricio/experiments/support-tc branch February 3, 2024 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

image-based gadgets: Implement support for TC programs
4 participants