Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/uprobetracer: use kfilefields to identify inodes #2669

Merged
merged 1 commit into from
Apr 3, 2024

Conversation

i-Pear
Copy link
Contributor

@i-Pear i-Pear commented Mar 31, 2024

pkg/uprobetracer: use kfilefields to identify inodes

This patch introduced private_inode field to pkg kfilefields,
and it's then used in uprobetracer to get an unique inode-ID for each fd.

Testing

Run many containers sharing the same image, and see if the records are duplicate.
Run many containers not sharing the same image, and see if any record is missing.
Run programs by docker exec, try multi-process applications, and see if any record is missing.

@i-Pear i-Pear changed the title pkg/uprobetracer: use ReadFInodeFromFd from kfilefields to identify inodes [WIP] pkg/uprobetracer: use ReadFInodeFromFd from kfilefields to identify inodes Mar 31, 2024
@i-Pear i-Pear force-pushed the kfilefield branch 2 times, most recently from c86d00c to 5ade22d Compare April 1, 2024 05:03
pkg/kfilefields/bpf/filefields.bpf.c Outdated Show resolved Hide resolved
Comment on lines 61 to 63
func getInodeUUID(file *os.File) (uint64, error) {
token, err := kfilefields.ReadFInodeFromFd(int(file.Fd()))
priv, err := kfilefields.ReadPrivateDataFromFd(int(file.Fd()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since getInodeUUID() can be called several times for a single gadget, it would make sense to optimize kfilefields.ReadFInodeFromFd to create the kfilefield ebpf program only once. Maybe with:

func ReadFInodesFromsFd(fd []int) ([]uint64, error)

(this could be done in a future PR)

@i-Pear i-Pear changed the title [WIP] pkg/uprobetracer: use ReadFInodeFromFd from kfilefields to identify inodes [WIP] pkg/uprobetracer: use kfilefields to identify inodes Apr 3, 2024
@i-Pear i-Pear marked this pull request as ready for review April 3, 2024 03:42
@i-Pear i-Pear requested a review from alban April 3, 2024 03:43
@i-Pear i-Pear changed the title [WIP] pkg/uprobetracer: use kfilefields to identify inodes pkg/uprobetracer: use kfilefields to identify inodes Apr 3, 2024
@alban
Copy link
Member

alban commented Apr 3, 2024

I use the following to test it:

package main

import (
	"fmt"
	"os"

	"github.com/inspektor-gadget/inspektor-gadget/pkg/kfilefields"
)

func main() {
	for _, filename := range os.Args[1:] {
		file, err := os.Open(filename)
		if err != nil {
			panic(err)
		}
		pd, err := kfilefields.ReadPrivateDataFromFd(int(file.Fd()))
		if err != nil {
			panic(err)
		}
		pi, err := kfilefields.ReadPrivateInodeFromFd(int(file.Fd()))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%v\n", filename)
		fmt.Printf("PrivateData: %v\nPrivateInode: %v\n", pd, pi)
		runtime.KeepAlive(file)
	}
}

Build it:

go build -o kfilefield ./main.go

Prepare 2 busybox containers:

docker run -ti --rm -v /:/host --name busybox1 busybox
docker run -ti --rm -v /:/host --name busybox2 busybox

Get the pids of the 2 containers:

export BUSYBOX1=$(docker inspect busybox1|jq '.[0].State.Pid')
export BUSYBOX2=$(docker inspect busybox2|jq '.[0].State.Pid')

A couple of tests on /bin/sh:

$ sudo ./kfilefield /proc/$BUSYBOX1/root/bin/sh /proc/$BUSYBOX2/root/bin/sh
/proc/2517303/root/bin/sh
PrivateData: 18446612684266798848
PrivateInode: 18446612699345928680
/proc/2517475/root/bin/sh
PrivateData: 18446612684261101312
PrivateInode: 18446612699345928680
$ sudo ./kfilefield /proc/$BUSYBOX1/root/bin/sh /proc/$BUSYBOX2/root/bin/sh
/proc/2517303/root/bin/sh
PrivateData: 18446612682768175872
PrivateInode: 12884901889
/proc/2517475/root/bin/sh
PrivateData: 18446612683637488896
PrivateInode: 18446612699345928680

The results are not consistent. So I don't think it works fine.

Comment on lines 69 to 72
ff->private_data = (u64)BPF_CORE_READ(ret, private_data);
ff->f_op = (u64)BPF_CORE_READ(ret, f_op);
private_file = (struct file *)ff->private_data;
ff->private_inode = (u64)BPF_CORE_READ(private_file, f_inode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(struct file *)->private_data is casted to a struct file * but this is only valid if the filesystem is overlay.

We can check if it is the case with (struct file *)->i_sb->s_type->name == "overlay".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better check without string comparisons:

--- a/pkg/kfilefields/bpf/filefields.bpf.c
+++ b/pkg/kfilefields/bpf/filefields.bpf.c
@@ -4,6 +4,11 @@
 #include <bpf/bpf_core_read.h>
 #include <bpf/bpf_tracing.h>
 
+// include/uapi/linux/magic.h
+#ifndef OVERLAYFS_SUPER_MAGIC
+#define OVERLAYFS_SUPER_MAGIC   0x794c7630
+#endif
+
 // configured by userspace
 const volatile u64 socket_ino = 0;
 
@@ -52,6 +57,7 @@ int BPF_KRETPROBE(ig_fget_x, struct file *ret)
        u64 current_pid_tgid;
        struct file_fields *ff;
        struct file *private_file;
+       struct file *real_file;
        int zero = 0;
 
        if (tracer_pid_tgid == 0)
@@ -68,8 +74,12 @@ int BPF_KRETPROBE(ig_fget_x, struct file *ret)
 
        ff->private_data = (u64)BPF_CORE_READ(ret, private_data);
        ff->f_op = (u64)BPF_CORE_READ(ret, f_op);
-       private_file = (struct file *)ff->private_data;
-       ff->private_inode = (u64)BPF_CORE_READ(private_file, f_inode);
+
+       real_file = ret;
+       if (BPF_CORE_READ(ret, f_inode, i_sb, s_magic) == OVERLAYFS_SUPER_MAGIC) {
+               real_file = (struct file *) ff->private_data;
+       }
+       ff->private_inode = (u64)BPF_CORE_READ(real_file, f_inode);
 
        // Initialize private_data for only one execution of __scm_send
        tracer_pid_tgid = 0;

@i-Pear
Copy link
Contributor Author

i-Pear commented Apr 3, 2024

The results are not consistent. So I don't think it works fine.

Yes, I can reproduce. At most of the time it works, but sometimes the PrivateInode is zero, or to be a short value :(

@alban
Copy link
Member

alban commented Apr 3, 2024

The results are not consistent. So I don't think it works fine.

Yes, I can reproduce. At most of the time it works, but sometimes the PrivateInode is zero, or to be a short value :(

The randomness was due to a bug in my test program main.go. I added runtime.KeepAlive(file) (I updated my previous comment) to ensure the fd is not closed and recycled to something else. That fixed the randomness.

@i-Pear
Copy link
Contributor Author

i-Pear commented Apr 3, 2024

OK, I've learned once again that if a Golang program encounters sporadic randomness issues, the first thing to check is the garbage collector.

@i-Pear i-Pear force-pushed the kfilefield branch 2 times, most recently from ca69d87 to 103eeab Compare April 3, 2024 15:46
@i-Pear i-Pear requested a review from alban April 3, 2024 15:46
@alban
Copy link
Member

alban commented Apr 3, 2024

It needs make clang-format and make ebpf-objects.

@i-Pear
Copy link
Contributor Author

i-Pear commented Apr 3, 2024

It needs make clang-format and make ebpf-objects.

Done, thanks.

pkg/kfilefields/kfilefields.go Outdated Show resolved Hide resolved
pkg/uprobetracer/tracer.go Outdated Show resolved Hide resolved
Comment on lines 95 to 102
// keeps the fd and refCount for each inodeUUID
inodeRefCount map[inodeUUID]*inodeKeeper
inodeRefCount map[uint64]*inodeKeeper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to rename inodeUUID to realInodePtr everywhere. What do you think?

And the comment before inodeRefCount should briefly explain why the real inode ptr is used instead of just the inode number, and why we need to care about getting the underlying file under overlayfs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean using realInodePtr as a new type, and replace all uint64?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, uint64 is fine for me. I meant in the comment above and elsewhere in the file.

pkg/uprobetracer/tracer.go Outdated Show resolved Hide resolved
// ReadRealInodeFromFd uses ebpf to read the f_inode pointer from the
// kernel "struct file" associated with the given fd.
// Specifically, if fd belongs to overlayFS, it will return the underlying, real inode.
func ReadRealInodeFromFd(fd int) (uint64, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could ReadRealInodeFromFd be tested in an unit test? We would not test whether the pointer has the correct value, but rather that the comparison between two realInodePtr is correct. The test would need to setup two temporary overlay filesystems with the same lower layer.

This could be in a separate PR, so this PR is not delayed too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a unittest for kfilefields, if the testing framework could support this, I'd like to do so.

@i-Pear i-Pear force-pushed the kfilefield branch 3 times, most recently from 0fd19d9 to 9554911 Compare April 3, 2024 17:46
@i-Pear i-Pear requested a review from alban April 3, 2024 17:49
pkg/uprobetracer/tracer.go Outdated Show resolved Hide resolved
Signed-off-by: Tianyi Liu <i.pear@outlook.com>
Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Copy link
Member

@alban alban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM from code review. I haven't tested it yet. Let's see the results of the CI first.

@alban
Copy link
Member

alban commented Apr 3, 2024

I prepared the following workload:

$ cat main.c 
#include <stdlib.h>
#include <unistd.h>

int main() {
  for (;;) {
    sleep(2);
    void *ptr = malloc(1);
    sleep(1);
    free(ptr);
  }
  return 0;
}
$ cat Dockerfile 
FROM ubuntu
COPY main /bin/main
$ cat Makefile 
all:
	gcc -Wall -o main ./main.c

docker:
	docker build -t malloc .

I started the workload on the host filesystem:

$ docker run -ti --rm -v /:/host --name c1 ubuntu
# /host/home/alban/test/malloc/main
$ docker run -ti --rm -v /:/host --name c2 ubuntu
# /host/home/alban/test/malloc/main

And traced with and without filtering options:

sudo -E ./ig run trace_malloc
sudo -E ./ig run trace_malloc -c c1
sudo -E ./ig run trace_malloc -c c2

The container filtering and enrichment worked fine and there are no duplicate events.

Then, I tried the workload again but on the overlay filesystem:

$ docker run -ti --rm -v /:/host --name c1 malloc /bin/main
$ docker run -ti --rm -v /:/host --name c2 malloc /bin/main
$ sudo -E ./ig run trace_malloc
$ sudo -E ./ig run trace_malloc -c c1
$ sudo -E ./ig run trace_malloc -c c2

It worked fined as well.

@alban alban merged commit f5d4a91 into inspektor-gadget:main Apr 3, 2024
60 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants