Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upPID filtering issues when running bpf script inside container #1329
Comments
This comment has been minimized.
This comment has been minimized.
Wow, this is indeed a pretty serious problem. As a stop-gap, do we really need a kernel helper for the PID namespace? I think @brendangregg recently added a feature to runqlat to filter by PID namespace by traversing So perhaps we could come up with a general
It has the disadvantage of changing every tool that uses pid filtering to fetch the PID namespace id and pass it into the BPF program text. It also doesn't take into account the possibility that we have a |
This comment has been minimized.
This comment has been minimized.
@goldshtn Thanks for the information. The proposed scheme below
is not working. The reason is that: However, from current pid namespace, we should be able to enumerate all processes (it may similar to what "ps" is doing). I have not studied it thoroughly yet. If a particular <childnsid, childpid> is found, that means the condition is hit. Since the loop is involved, this cannot be implemented in bpf program. Also, it is inefficient. So a kernel helper may be more appropriate. |
This comment has been minimized.
This comment has been minimized.
Oops. Yeah, I had a bit of a mind block there, focused on the namespace and forgot the original problem was with the pid itself. :-( |
This comment has been minimized.
This comment has been minimized.
Hi, Here are the diffs against Kernel 4.13 diff -uN linux/linux-4.13/kernel/bpf/core.c ebpf-backports/new-bcc-helpers/linux-4.13/kernel/bpf/core.c
*/ |
This comment has been minimized.
This comment has been minimized.
On Mon, Sep 11, 2017 at 11:30 AM, Carlos Neira ***@***.***> wrote:
Hi,
I have implemented helper bpf_get_current_ns_info(void* buf, int size) as
was proposed on the dev list, that could help on this issue.
Here are the diffs against Kernel 4.13
diff -uN linux/linux-4.13/kernel/bpf/core.c
ebpf-backports/new-bcc-helpers/linux-4.13/kernel/bpf/core.c
--- linux/linux-4.13/kernel/bpf/core.c 2017-09-03 13:56:17.000000000 -0700
+++ ebpf-backports/new-bcc-helpers/linux-4.13/kernel/bpf/core.c 2017-09-11
04:25:04.200417393 -0700
@@ -1379,6 +1379,9 @@
const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto bpf_get_current_ns_info __weak;
+
+
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
return NULL;
diff -uN linux/linux-4.13/kernel/bpf/helpers.c
ebpf-backports/new-bcc-helpers/linux-4.13/kernel/bpf/helpers.c
--- linux/linux-4.13/kernel/bpf/helpers.c 2017-09-03 13:56:17.000000000
-0700
+++ ebpf-backports/new-bcc-helpers/linux-4.13/kernel/bpf/helpers.c
2017-09-11 06:23:55.329880482 -0700
@@ -18,6 +18,7 @@
#include <linux/sched.h>
#include <linux/uidgid.h>
#include <linux/filter.h>
+#include <linux/pid_namespace.h>
/* If kernel subsystem is allowing eBPF programs to call this function,
* inside its own verifier_ops->get_func_proto() callback it should return
@@ -177,5 +178,51 @@
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg2_type = ARG_CONST_SIZE,
+};
+
+BPF_CALL_2(bpf_get_current_ns_info, void *, buf, u32, size)
+
There are several different kinds of ns_info, we may need to change the name
to bpf_get_current_pidns_info.
{
+ struct task_struct ts = current;
+ struct task_struct ns_task = NULL;
+ const struct cred cred = NULL;
+ pid_t pid;
+
+ if (unlikely(!ts))
+ goto err_clear;
You should check the "size" is equal to the size of structure
bpf_current_ns_info so that you do not overrun the memory.
+
+ ((struct bpf_current_ns_info)buf)->ns_id =
+ ts->nsproxy->pid_ns_for_children->ns.inum;
What is this ns.inum corresponding to if you access the process inside
the container?
+
+ pid = task_pid_nr_ns(ts,
+ ts->nsproxy->pid_ns_for_children);
+
+ ns_task = find_task_by_pid_ns(pid,
+ ts->nsproxy->pid_ns_for_children);
+
+ if (unlikely(!ns_task))
+ goto err_clear;
+
+ ((struct bpf_current_ns_info)buf)->tgid = ns_task->tgid;
+
+ cred = get_task_cred(ns_task);
+
+ if (unlikely(!cred))
+ goto err_clear;
+
+ ((struct bpf_current_ns_info)buf)->gid = cred->gid.val;
Sorry, my typo. It should be "pid".
Have you checked the above pid and tgid are what you got from inside
the container?
In some past linux mailing list discussion, it is suggested that
device major/minor number should
be also presented to user (in the context networking namespace) since
inode MAY not be unique
across different devices... This may need to be resolved in netdev discussion.
For the complete patch, you need to write some examples as well, under
linux:samples/bpf and/or
linux:tools/testing/selftests/bpf.
Also, it would be good if you have linux coding style as it is
enforced in upstream submission.
… +
+ return 0;
+
+err_clear:
+ memset(buf, 0, size);
+ return -EINVAL;
+}
+
+const struct bpf_func_proto bpf_get_current_ns_info_proto = {
+ .func = bpf_get_current_ns_info,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,
.arg2_type = ARG_CONST_SIZE,
};
--- linux/linux-4.13/include/linux/bpf.h 2017-09-03 13:56:17.000000000 -0700
+++ ebpf-backports/new-bcc-helpers/linux-4.13/include/linux/bpf.h 2017-09-11
04:36:30.460969799 -0700
@@ -226,6 +226,12 @@
struct file map_file;
struct rcu_head rcu;
};
+/ struct used by helper bpf_get_current_ns_info /
+struct bpf_current_ns_info {
+ u64 ns_id; /namespace id/
+ u32 tgid; /tgid inside namespace/
+ u32 gid; /gid inside namespace/
+};
u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
@@ -375,6 +381,9 @@
extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;
+
+extern const struct bpf_func_proto bpf_get_current_ns_info_proto;
+
/ Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);
u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
--- linux/linux-4.13/include/uapi/linux/bpf.h 2017-09-03 13:56:17.000000000
-0700
+++ ebpf-backports/new-bcc-helpers/linux-4.13/include/uapi/linux/bpf.h
2017-09-11 04:32:08.127055536 -0700
@@ -539,6 +539,15 @@
* @mode: operation mode (enum bpf_adj_room_mode)
* @flags: reserved for future use
* Return: 0 on success or negative error code
int bpf_get_current_ns_info(void *buf, int size_of_buf)
* stores the following namespace data into
* bpf_current_ns_info struct:
* namespace id
* tgid inside namespace
* gid inside namespace
* Return: 0 on success or negative error
*/
#define __BPF_FUNC_MAPPER(FN)
FN(unspec),
@@ -591,7 +600,9 @@
FN(get_socket_uid),
FN(set_hash),
FN(setsockopt),
- FN(skb_adjust_room),
+ FN(skb_adjust_room),
+ FN(get_current_ns_info),
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
This comment has been minimized.
This comment has been minimized.
Just tested the code last night, and it did not work. Now I have fixed it and tested in a container, it's finally working except for the the device major and minor info. Here is the updated code for helper get_current_pidns_info BPF_CALL_2(bpf_get_current_pidns_info, void *, buf, u32, size)
{
struct task_struct *ts = current;
struct pid_namespace *pidns = NULL;
pid_t pid = 0;
pid_t tgid = 0;
int res = 0;
const char *ppath = "/proc/self/ns/pid";
mm_segment_t oldsegfs;
struct kstat stat;
if (unlikely(!ts))
return -EINVAL;
pidns = task_active_pid_ns(ts);
if (unlikely(!pidns))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->ns_id = (u64) pidns->ns.inum;
pid = task_pid_nr_ns(ts, pidns);
if (unlikely(!pid))
return -EINVAL;
tgid = task_tgid_nr_ns(ts, pidns);
if (unlikely(!tgid))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->tgid = (s32)tgid;
((struct bpf_current_pidns_info*)buf)->pid = (s32) pid;
oldsegfs = get_fs();
set_fs(KERNEL_DS);
res = vfs_stat((const char __user*)ppath, &stat);
set_fs(oldsegfs);
if(unlikely(res))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->major = (u32) MAJOR(stat.dev);
((struct bpf_current_pidns_info*)buf)->minor = (u32) MINOR(stat.dev);
return 0;
}
const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
.func = bpf_get_current_pidns_info,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_RAW_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
}; And a BCC test program #!/usr/bin/python
from bcc import BPF
from os import getpid
import sys, os
devinfo= os.lstat('/proc/self/ns/pid');
# Get major and minor device number
# define BPF program
prog = """
struct nsinfo_t {
u64 ns;
s32 pid;
s32 tgid;
s32 major;
s32 minor;
};
BPF_HASH(nsdata,struct nsinfo_t);
int collect_ns(void *ctx) {
struct nsinfo_t ns = {};
u64 zero=0, *val;
bpf_get_current_pidns_info(&ns,sizeof (ns));
val = nsdata.lookup_or_init(&ns,&zero);
(*val)++;
return 0;
}
"""
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_read", fn_name="collect_ns")
fo = open("./test.py", "r")
minor = os.minor(devinfo.st_dev)
major = os.major(devinfo.st_dev)
pid = getpid();
pidns = os.readlink('/proc/%d/ns/pid' % pid).split('[')[-1][:-1]
print ('STARTING: user space pid %s ns %s major %s minor %s' % (pid, pidns, major , minor))
#exit()
while 1:
try:
fo.read(20)
nsdata = b.get_table('nsdata');
for k, v in nsdata.items():
if k.pid == pid :
print ( 'OK : reading matched pid %s ns %s major %s minor %s' % (k.pid, k.ns, k.major, k.minor ))
exit()
else:
print ( 'MAP: pid %s and ns %s' % (k.pid, k.ns))
except ValueError:
continue For testing I created a container using lxd, and copy bcc and kernel modules needed to it, the output of this test inside the container is : root@test:~# ./test.py
STARTING: user space pid 403 ns 4026532211 major 0 minor 49
MAP: pid 1569 and ns 4026531836
MAP: pid 935 and ns 4026531836
MAP: pid 1596 and ns 4026531836
OK : reading matched pid 403 ns 4026532211 major 0 minor 3 it also works in root namespace neirac@debian-dev:~$ sudo ./test2.py
STARTING: user space pid 1708 ns 4026531836 major 0 minor 4
MAP: pid 1553 and ns 4026531836
OK : reading matched pid 1708 ns 4026531836 major 0 minor 3 But I still I have the following doubts
Here is the full patch for kernel version 4.9.115 From 781744241180594ae17f4edaf14f071beacfc76b Mon Sep 17 00:00:00 2001
From: neirac <neirac@krondor>
Date: Fri, 27 Jul 2018 14:54:04 -0400
Subject: [PATCH] pid ns info changes
---
include/linux/bpf.h | 9 ++++
include/uapi/linux/bpf.h | 7 +++
kernel/bpf/core.c | 2 +
kernel/bpf/helpers.c | 59 ++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 3 ++
samples/bpf/Makefile | 4 ++
samples/bpf/bpf_helpers.h | 2 +
samples/bpf/trace_ns_info_user.c | 25 +++++++++++
samples/bpf/trace_ns_info_user_kern.c | 85 +++++++++++++++++++++++++++++++++++
9 files changed, 196 insertions(+)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7995940..5bef82b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -222,6 +222,14 @@ struct bpf_event_entry {
struct rcu_head rcu;
};
+struct bpf_current_pidns_info {
+ u64 ns_id;
+ s32 tgid;
+ s32 pid;
+ u32 major;
+ u32 minor;
+};
+
u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
@@ -339,6 +347,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
/* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f09c70b..9063cc3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -232,6 +232,13 @@ enum bpf_func_id {
*/
BPF_FUNC_get_current_comm,
+ /**
+ * bpf_get_current_pidns_info(void *buf, int size_of_buf)
+ * stores ns,pid,tgid,major,minor from current into buf
+ * Return: 0 on success
+ */
+ BPF_FUNC_get_current_pidns_info,
+
/**
* bpf_get_cgroup_classid(skb) - retrieve a proc's classid
* @skb: pointer to skb
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 879ca84..04c06d5 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1067,6 +1067,8 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+
+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
return NULL;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3991840..66d1f6c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -17,6 +17,9 @@
#include <linux/sched.h>
#include <linux/uidgid.h>
#include <linux/filter.h>
+#include <linux/pid_namespace.h>
+#include <linux/major.h>
+#include <linux/stat.h>
/* If kernel subsystem is allowing eBPF programs to call this function,
* inside its own verifier_ops->get_func_proto() callback it should return
@@ -167,3 +170,59 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
.arg1_type = ARG_PTR_TO_RAW_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
};
+BPF_CALL_2(bpf_get_current_pidns_info, void *, buf, u32, size)
+{
+ struct task_struct *ts = current;
+ struct pid_namespace *pidns = NULL;
+ pid_t pid = 0;
+ pid_t tgid = 0;
+ int res = 0;
+ const char *ppath = "/proc/self/ns/pid";
+ mm_segment_t oldsegfs;
+ struct kstat stat;
+
+ if (unlikely(!ts))
+ return -EINVAL;
+
+ pidns = task_active_pid_ns(ts);
+
+ if (unlikely(!pidns))
+ return -EINVAL;
+
+ ((struct bpf_current_pidns_info*)buf)->ns_id = (u64) pidns->ns.inum;
+
+ pid = task_pid_nr_ns(ts, pidns);
+
+ if (unlikely(!pid))
+ return -EINVAL;
+
+ tgid = task_tgid_nr_ns(ts, pidns);
+
+ if (unlikely(!tgid))
+ return -EINVAL;
+
+ ((struct bpf_current_pidns_info*)buf)->tgid = (s32)tgid;
+ ((struct bpf_current_pidns_info*)buf)->pid = (s32) pid;
+
+ oldsegfs = get_fs();
+ set_fs(KERNEL_DS);
+ res = vfs_stat((const char __user*)ppath, &stat);
+ set_fs(oldsegfs);
+
+ if(unlikely(res))
+ return -EINVAL;
+
+ ((struct bpf_current_pidns_info*)buf)->major = (u32) MAJOR(stat.dev);
+ ((struct bpf_current_pidns_info*)buf)->minor = (u32) MINOR(stat.dev);
+
+ return 0;
+}
+
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+ .func = bpf_get_current_pidns_info,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_RAW_STACK,
+ .arg2_type = ARG_CONST_STACK_SIZE,
+};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 41805fb..333b993 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -456,6 +456,9 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
return &bpf_current_task_under_cgroup_proto;
case BPF_FUNC_get_prandom_u32:
return &bpf_get_prandom_u32_proto;
+
+ case BPF_FUNC_get_current_pidns_info:
+ return &bpf_get_current_pidns_info_proto;
default:
return NULL;
}
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 72c5867..5383bd2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -28,6 +28,7 @@ hostprogs-y += test_current_task_under_cgroup
hostprogs-y += trace_event
hostprogs-y += sampleip
hostprogs-y += tc_l2_redirect
+hostprogs-y += trace_ns_info
test_verifier-objs := test_verifier.o libbpf.o
test_maps-objs := test_maps.o libbpf.o
@@ -58,6 +59,7 @@ test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o
+trace_ns_info-objs := bpf_load.o libbpf.o trace_ns_info_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -88,6 +90,7 @@ always += xdp2_kern.o
always += test_current_task_under_cgroup_kern.o
always += trace_event_kern.o
always += sampleip_kern.o
+always += trace_ns_info_user_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
@@ -115,6 +118,7 @@ HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
HOSTLOADLIBES_trace_event += -lelf
HOSTLOADLIBES_sampleip += -lelf
HOSTLOADLIBES_tc_l2_redirect += -l elf
+HOSTLOADLIBES_trace_ns_info += -lelf
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index dadd516..c425730 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -57,6 +57,8 @@ static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) =
(void *) BPF_FUNC_skb_set_tunnel_opt;
static unsigned long long (*bpf_get_prandom_u32)(void) =
(void *) BPF_FUNC_get_prandom_u32;
+static int (*bpf_get_current_pidns_info)(void *buf, int buf_size) =
+ (void *) BPF_FUNC_get_current_pidns_info;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 0000000..78660d9
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user.c
@@ -0,0 +1,25 @@
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int ac, char **argv)
+{
+ FILE *f;
+ char filename[256];
+
+ snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
+ printf("loading %s\n",filename);
+
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ f = popen("taskset 1 ping localhost", "r");
+ (void) f;
+ read_trace_pipe();
+ return 0;
+}
diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 0000000..5d2438e
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user_kern.c
@@ -0,0 +1,85 @@
+/* Copyright (c) 2017 cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct pidns_info {
+ u64 ns_id;
+ s32 ns_tgid;
+ s32 ns_pid;
+ u32 major;
+ u32 minor;
+};
+struct ns {
+ u32 major;
+ u32 minor;
+};
+
+
+struct bpf_map_def SEC("maps") nsmap = {
+ .type = BPF_MAP_TYPE_HASH,
+ .key_size = sizeof(struct ns),
+ .value_size = sizeof(struct ns),
+ .max_entries = 10,
+};
+
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+ struct pidns_info nsinfo;
+ struct ns ns;
+ long init_val = 1;
+ long *value;
+ int ok=0;
+
+ ok = bpf_get_current_pidns_info(&nsinfo,sizeof (nsinfo));
+
+ if(ok == 0)
+ {
+ ns.major = (u32) nsinfo.major;
+ ns.minor = (u32) nsinfo.minor;
+
+ char fmt[] = "ns_id %lld tgid %ld pid %ld \n";
+
+ bpf_trace_printk(fmt, sizeof(fmt), (long)nsinfo.ns_id,
+ (long) nsinfo.ns_tgid, (long)nsinfo.ns_pid);
+
+ char fmtm[] = "major %ld minor %ld \n";
+
+ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,
+ (long)nsinfo.minor);
+
+ value = bpf_map_lookup_elem(&nsmap, &ns);
+
+ if (value)
+ {
+ char fmtm[] = "already in map major %ld minor %ld\n";
+ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,
+ (long)nsinfo.minor);
+ } else
+ {
+ char fmtm[] = "adding to map major %ld minor %ld\n";
+ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,
+ (long)nsinfo.minor);
+ bpf_map_update_elem(&nsmap, &ns, &init_val, BPF_ANY);
+ }
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
--
2.11.0 Bests |
This comment has been minimized.
This comment has been minimized.
Thanks, Carlos,
The overall looks good. Some detailed comments below.
On Fri, Jul 27, 2018 at 12:46 PM, Carlos Neira ***@***.***> wrote:
Hi All,
Just tested the code last night, and it did not work. Now I have fixed it
and tested in a container, it's finally working except for the the device
major and minor info.
Here is the updated code for helper get_current_pidns_info
BPF_CALL_2(bpf_get_current_pidns_info, void *, buf, u32, size)
{
struct task_struct *ts = current;
struct pid_namespace *pidns = NULL;
pid_t pid = 0;
pid_t tgid = 0;
int res = 0;
const char *ppath = "/proc/self/ns/pid";
mm_segment_t oldsegfs;
struct kstat stat;
if (unlikely(!ts))
return -EINVAL;
pidns = task_active_pid_ns(ts);
if (unlikely(!pidns))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->ns_id = (u64) pidns->ns.inum;
pid = task_pid_nr_ns(ts, pidns);
if (unlikely(!pid))
return -EINVAL;
tgid = task_tgid_nr_ns(ts, pidns);
if (unlikely(!tgid))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->tgid = (s32)tgid;
((struct bpf_current_pidns_info*)buf)->pid = (s32) pid;
oldsegfs = get_fs();
set_fs(KERNEL_DS);
res = vfs_stat((const char __user*)ppath, &stat);
set_fs(oldsegfs);
if(unlikely(res))
return -EINVAL;
((struct bpf_current_pidns_info*)buf)->major = (u32) MAJOR(stat.dev);
((struct bpf_current_pidns_info*)buf)->minor = (u32) MINOR(stat.dev);
return 0;
}
const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
.func = bpf_get_current_pidns_info,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_RAW_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
};
And a BCC test program
#!/usr/bin/python
from bcc import BPFfrom os import getpidimport sys, os
devinfo= os.lstat('/proc/self/ns/pid');# Get major and minor device number# define BPF program
prog = """struct nsinfo_t { u64 ns; s32 pid; s32 tgid; s32 major; s32 minor;};BPF_HASH(nsdata,struct nsinfo_t);int collect_ns(void *ctx) { struct nsinfo_t ns = {}; u64 zero=0, *val; bpf_get_current_pidns_info(&ns,sizeof (ns)); val = nsdata.lookup_or_init(&ns,&zero); (*val)++; return 0;}"""
# load BPF program
b = BPF(text=prog)
b.attach_kprobe(event="sys_read", fn_name="collect_ns")
fo = open("./test.py", "r")
minor = os.minor(devinfo.st_dev)
major = os.major(devinfo.st_dev)
pid = getpid();
pidns = os.readlink('/proc/%d/ns/pid' % pid).split('[')[-1][:-1]
print ('STARTING: user space pid %s ns %s major %s minor %s' % (pid, pidns, major , minor))
#exit()while 1:
try:
fo.read(20)
nsdata = b.get_table('nsdata');
for k, v in nsdata.items():
if k.pid == pid :
print ( 'OK : reading matched pid %s ns %s major %s minor %s' % (k.pid, k.ns, k.major, k.minor ))
exit()
else:
print ( 'MAP: pid %s and ns %s' % (k.pid, k.ns))
except ValueError:
continue
For testing I created a container using lxd, and copy bcc and kernel
modules needed to it, the output of this test inside the container is :
***@***.***:~# ./test.py
STARTING: user space pid 403 ns 4026532211 major 0 minor 49
MAP: pid 1569 and ns 4026531836
MAP: pid 935 and ns 4026531836
MAP: pid 1596 and ns 4026531836
OK : reading matched pid 403 ns 4026532211 major 0 minor 3
it also works in root namespace
***@***.***:~$ sudo ./test2.py
STARTING: user space pid 1708 ns 4026531836 major 0 minor 4
MAP: pid 1553 and ns 4026531836
OK : reading matched pid 1708 ns 4026531836 major 0 minor 3
But I still I have the following doubts
- Should I be reading /proc/self/ns/pid using vfst_stat?.
I am not sure either. But we can have this in the first version. The
community should be
able to provide a better solution if there is a better interface.
- I'm using the Major/Minor macros but it seems it's not working Am I
pointing to the wrong dev_t ?.
Could you take a look at the stat syscall implementation in kernel?
fs/stat.c:
SYSCALL_DEFINE2(stat, const char __user *, filename,
struct __old_kernel_stat __user *, statbuf)
{
struct kstat stat;
int error;
error = vfs_stat(filename, &stat);
if (error)
return error;
return cp_old_stat(&stat, statbuf);
}
Especially, take a look at function cp_old_stat. There will be some
information about what major/minor
numbers are copied.
Please do provide enough context why this helper is helpful and what kind
of problems it
tries to solve in the commit message. This is very important.
The patch is against a very old kernel and this makes trying the patch a
little bit difficult.
Could you re-apply your patch on top of the latest bpf-next or net-next?
This way, people
can easily apply the patch and give a try.
Here is the full patch
From 781744241180594ae17f4edaf14f071beacfc76b Mon Sep 17 00:00:00 2001
From: neirac ***@***.***>
Date: Fri, 27 Jul 2018 14:54:04 -0400
Subject: [PATCH] pid ns info changes
---
include/linux/bpf.h | 9 ++++
include/uapi/linux/bpf.h | 7 +++
kernel/bpf/core.c | 2 +
kernel/bpf/helpers.c | 59 ++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 3 ++
samples/bpf/Makefile | 4 ++
samples/bpf/bpf_helpers.h | 2 +
samples/bpf/trace_ns_info_user.c | 25 +++++++++++
samples/bpf/trace_ns_info_user_kern.c | 85 +++++++++++++++++++++++++++++++++++
9 files changed, 196 insertions(+)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7995940..5bef82b 100644--- a/include/linux/bpf.h+++ b/include/linux/bpf.h@@ -222,6 +222,14 @@ struct bpf_event_entry {
struct rcu_head rcu;
};
+struct bpf_current_pidns_info {+ u64 ns_id;+ s32 tgid;+ s32 pid;+ u32 major;+ u32 minor;+};
Let us reoder this as
u32 dev; /* major << 16 | minor */
u32 nsid;
u32 tgid;
u32 pid;
The idea is the data type/size trying to match kernels.
We use unsigned tgid/pid since for a particular process, this should be
nonnegative.
kernel may have dev as "unsigned short", here we use u32 to avoid holes in
the data structure.
Just use the name "bpf_pidns_info". No need to have "current" in the
structure name.
Also, this data structure should be defined in include/uapi/linux/bpf.h
since the bpf program may access it.
+
u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
u64 bpf_get_stackid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
@@ -339,6 +347,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
/* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f09c70b..9063cc3 100644--- a/include/uapi/linux/bpf.h+++ b/include/uapi/linux/bpf.h@@ -232,6 +232,13 @@ enum bpf_func_id {
*/
BPF_FUNC_get_current_comm,
+ /**+ * bpf_get_current_pidns_info(void *buf, int size_of_buf)+ * stores ns,pid,tgid,major,minor from current into buf+ * Return: 0 on success+ */+ BPF_FUNC_get_current_pidns_info,
Please add the new helper to the end of list to preserve
the backward compatibility.
+
/**
* bpf_get_cgroup_classid(skb) - retrieve a proc's classid
* @skb: pointer to skbdiff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 879ca84..04c06d5 100644--- a/kernel/bpf/core.c+++ b/kernel/bpf/core.c@@ -1067,6 +1067,8 @@ const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
const struct bpf_func_proto bpf_get_current_comm_proto __weak;
++const struct bpf_func_proto bpf_get_current_pidns_info __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
return NULL;diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3991840..66d1f6c 100644--- a/kernel/bpf/helpers.c+++ b/kernel/bpf/helpers.c@@ -17,6 +17,9 @@
#include <linux/sched.h>
#include <linux/uidgid.h>
#include <linux/filter.h>+#include <linux/pid_namespace.h>+#include <linux/major.h>+#include <linux/stat.h>
/* If kernel subsystem is allowing eBPF programs to call this function,
* inside its own verifier_ops->get_func_proto() callback it should return@@ -167,3 +170,59 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
.arg1_type = ARG_PTR_TO_RAW_STACK,
.arg2_type = ARG_CONST_STACK_SIZE,
};+BPF_CALL_2(bpf_get_current_pidns_info, void *, buf, u32, size)
The buffer can be defined as "struct bpf_pidns_info"?
+{+ struct task_struct *ts = current;
You can just use "current", no need to have a separate variable.
+ struct pid_namespace *pidns = NULL;+ pid_t pid = 0;+ pid_t tgid = 0;+ int res = 0;+ const char *ppath = "/proc/self/ns/pid";+ mm_segment_t oldsegfs;+ struct kstat stat;++ if (unlikely(!ts))+ return -EINVAL;
Not sure whether this is needed or not.
++ pidns = task_active_pid_ns(ts);++ if (unlikely(!pidns))+ return -EINVAL;++ ((struct bpf_current_pidns_info*)buf)->ns_id = (u64) pidns->ns.inum;++ pid = task_pid_nr_ns(ts, pidns);++ if (unlikely(!pid))+ return -EINVAL;++ tgid = task_tgid_nr_ns(ts, pidns);++ if (unlikely(!tgid))+ return -EINVAL;++ ((struct bpf_current_pidns_info*)buf)->tgid = (s32)tgid;+ ((struct bpf_current_pidns_info*)buf)->pid = (s32) pid;++ oldsegfs = get_fs();+ set_fs(KERNEL_DS);+ res = vfs_stat((const char __user*)ppath, &stat);+ set_fs(oldsegfs);
Not sure about whether this is the most efficient way to get the device
info.
But this is a good start.
++ if(unlikely(res))+ return -EINVAL;++ ((struct bpf_current_pidns_info*)buf)->major = (u32) MAJOR(stat.dev);+ ((struct bpf_current_pidns_info*)buf)->minor = (u32) MINOR(stat.dev);
I will test it once you have an implementation rebased on the top of
bpf-next or net-next.
++ return 0;+}+++const struct bpf_func_proto bpf_get_current_pidns_info_proto = {+ .func = bpf_get_current_pidns_info,+ .gpl_only = false,+ .ret_type = RET_INTEGER,+ .arg1_type = ARG_PTR_TO_RAW_STACK,+ .arg2_type = ARG_CONST_STACK_SIZE,
You can use ARG_PTR_TO_MEM or ARG_PTR_TO_UNIT_MEM here
followed with ARG_CONST_SIZE. I suggest you use
ARG_PTR_TO_UNIT_MEM since the helper tries to fill the buffer
and the initialization for the buffer is not required.
… +};diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 41805fb..333b993 100644--- a/kernel/trace/bpf_trace.c+++ b/kernel/trace/bpf_trace.c@@ -456,6 +456,9 @@ static const struct bpf_func_proto *tracing_func_proto(enum bpf_func_id func_id)
return &bpf_current_task_under_cgroup_proto;
case BPF_FUNC_get_prandom_u32:
return &bpf_get_prandom_u32_proto;++ case BPF_FUNC_get_current_pidns_info:+ return &bpf_get_current_pidns_info_proto;
default:
return NULL;
}diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 72c5867..5383bd2 100644--- a/samples/bpf/Makefile+++ b/samples/bpf/Makefile@@ -28,6 +28,7 @@ hostprogs-y += test_current_task_under_cgroup
hostprogs-y += trace_event
hostprogs-y += sampleip
hostprogs-y += tc_l2_redirect+hostprogs-y += trace_ns_info
test_verifier-objs := test_verifier.o libbpf.o
test_maps-objs := test_maps.o libbpf.o@@ -58,6 +59,7 @@ test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o+trace_ns_info-objs := bpf_load.o libbpf.o trace_ns_info_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)@@ -88,6 +90,7 @@ always += xdp2_kern.o
always += test_current_task_under_cgroup_kern.o
always += trace_event_kern.o
always += sampleip_kern.o+always += trace_ns_info_user_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
@@ -115,6 +118,7 @@ HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
HOSTLOADLIBES_trace_event += -lelf
HOSTLOADLIBES_sampleip += -lelf
HOSTLOADLIBES_tc_l2_redirect += -l elf+HOSTLOADLIBES_trace_ns_info += -lelf
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clangdiff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index dadd516..c425730 100644--- a/samples/bpf/bpf_helpers.h+++ b/samples/bpf/bpf_helpers.h@@ -57,6 +57,8 @@ static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) =
(void *) BPF_FUNC_skb_set_tunnel_opt;
static unsigned long long (*bpf_get_prandom_u32)(void) =
(void *) BPF_FUNC_get_prandom_u32;+static int (*bpf_get_current_pidns_info)(void *buf, int buf_size) =+ (void *) BPF_FUNC_get_current_pidns_info;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructionsdiff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 0000000..78660d9--- /dev/null+++ b/samples/bpf/trace_ns_info_user.c@@ -0,0 +1,25 @@+#include <stdio.h>+#include <linux/bpf.h>+#include <unistd.h>+#include "libbpf.h"+#include "bpf_load.h"++int main(int ac, char **argv)+{+ FILE *f;+ char filename[256];++ snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);+ printf("loading %s\n",filename);+++ if (load_bpf_file(filename)) {+ printf("%s", bpf_log_buf);+ return 1;+ }++ f = popen("taskset 1 ping localhost", "r");+ (void) f;+ read_trace_pipe();+ return 0;+}diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 0000000..5d2438e--- /dev/null+++ b/samples/bpf/trace_ns_info_user_kern.c@@ -0,0 +1,85 @@+/* Copyright (c) 2017 ***@***.***+ *+ * This program is free software; you can redistribute it and/or+ * modify it under the terms of version 2 of the GNU General Public+ * License as published by the Free Software Foundation.+ */+#include <linux/skbuff.h>+#include <linux/netdevice.h>+#include <linux/version.h>+#include <uapi/linux/bpf.h>+#include "bpf_helpers.h"++struct pidns_info {+ u64 ns_id;+ s32 ns_tgid;+ s32 ns_pid;+ u32 major;+ u32 minor;+};+struct ns {+ u32 major;+ u32 minor;+};+++struct bpf_map_def SEC("maps") nsmap = {+ .type = BPF_MAP_TYPE_HASH,+ .key_size = sizeof(struct ns),+ .value_size = sizeof(struct ns),+ .max_entries = 10,+};+++/* kprobe is NOT a stable ABI+ * kernel functions can be removed, renamed or completely change semantics.+ * Number of arguments and their positions can change, etc.+ * In such case this bpf+kprobe example will no longer be meaningful+ */+SEC("kprobe/__netif_receive_skb_core")+int bpf_prog1(struct pt_regs *ctx)+{+ struct pidns_info nsinfo;+ struct ns ns;+ long init_val = 1;+ long *value;+ int ok=0;++ ok = bpf_get_current_pidns_info(&nsinfo,sizeof (nsinfo));++ if(ok == 0)+ {+ ns.major = (u32) nsinfo.major;+ ns.minor = (u32) nsinfo.minor;++ char fmt[] = "ns_id %lld tgid %ld pid %ld \n";++ bpf_trace_printk(fmt, sizeof(fmt), (long)nsinfo.ns_id,+ (long) nsinfo.ns_tgid, (long)nsinfo.ns_pid);++ char fmtm[] = "major %ld minor %ld \n";++ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,+ (long)nsinfo.minor);++ value = bpf_map_lookup_elem(&nsmap, &ns);++ if (value)+ {+ char fmtm[] = "already in map major %ld minor %ld\n";+ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,+ (long)nsinfo.minor);+ } else+ {+ char fmtm[] = "adding to map major %ld minor %ld\n";+ bpf_trace_printk(fmtm, sizeof(fmtm), (long)nsinfo.major,+ (long)nsinfo.minor);+ bpf_map_update_elem(&nsmap, &ns, &init_val, BPF_ANY);+ }+ }++ return 0;+}++char _license[] SEC("license") = "GPL";+u32 _version SEC("version") = LINUX_VERSION_CODE;--
2.11.0
Bests
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1329 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ALq6otmN7p2SH3CbqrCn41HNz9X-0uLCks5uK24pgaJpZM4PHzwf>
.
|
This comment has been minimized.
This comment has been minimized.
@yonghong-song, thanks a lot for the feedback. |
This comment has been minimized.
This comment has been minimized.
@yonghong-song, From 932f55d885bf34443bb73ec4b24f9dd4b95b64c7 Mon Sep 17 00:00:00 2001
From: cneira <cneirabustos@...>
Date: Fri, 3 Aug 2018 13:36:55 -0400
Subject: [PATCH] BPF: helpers: New helper to obtain namespace data from
current.
This helper obtains the active namespace from current and returns
pid, tgid, device major/minor and namespace as seen from that ns,
allowing to instrument process inside a container. Major and minor
are obtained from /proc/self/pid, the reason behind is that in the
future, it is possible that different pid_ns files may belong
to different devices according to Eric Biederman as ys114321@...
found out in last linux plumbers conference.
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 20 +++++++++-
kernel/bpf/core.c | 1 +
kernel/bpf/helpers.c | 65 +++++++++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 2 +
samples/bpf/Makefile | 3 ++
samples/bpf/trace_ns_info_user.c | 25 ++++++++++++
samples/bpf/trace_ns_info_user_kern.c | 40 +++++++++++++++++++
tools/include/uapi/linux/bpf.h | 3 +-
tools/testing/selftests/bpf/bpf_helpers.h | 3 ++
10 files changed, 161 insertions(+), 2 deletions(-)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d2c6ed..3f4b999f7c99 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -787,6 +787,7 @@ extern const struct bpf_func_proto bpf_get_stack_proto;
extern const struct bpf_func_proto bpf_sock_map_update_proto;
extern const struct bpf_func_proto bpf_sock_hash_update_proto;
extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
extern const struct bpf_func_proto bpf_get_local_storage_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dd5758dc35d3..b53e5c21805a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -173,6 +173,16 @@ enum bpf_attach_type {
__MAX_BPF_ATTACH_TYPE
};
+/* helper bpf_get_current_pidns_info will store the following
+ * data, dev will contain major/minor from /proc/self/pid.
+ */
+struct bpf_pidns_info {
+ __u32 dev;
+ __u32 nsid;
+ __u32 tgid;
+ __u32 pid;
+};
+
#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
/* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
@@ -2113,7 +2123,14 @@ union bpf_attr {
* the shared data.
* Return
* Pointer to the local storage area.
+ * int bpf_get_current_pidns(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ * Description
+ * Obtains from current task values for pid, namespace, tgid and
+ * device major/minor from /proc/self/ns/pid
+ * Return
+ * 0 on success -EINVAL on error.
*/
+
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
FN(map_lookup_elem), \
@@ -2196,7 +2213,8 @@ union bpf_attr {
FN(rc_keydown), \
FN(skb_cgroup_id), \
FN(get_current_cgroup_id), \
- FN(get_local_storage),
+ FN(get_local_storage), \
+ FN(get_current_pidns_info),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4d09e610777f..98ce53ce2ea6 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1796,6 +1796,7 @@ const struct bpf_func_proto bpf_sock_map_update_proto __weak;
const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
const struct bpf_func_proto bpf_get_local_storage_proto __weak;
+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1991466b8327..44367a5208b1 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -18,6 +18,9 @@
#include <linux/sched.h>
#include <linux/uidgid.h>
#include <linux/filter.h>
+#include <linux/pid_namespace.h>
+#include <linux/major.h>
+#include <linux/stat.h>
/* If kernel subsystem is allowing eBPF programs to call this function,
* inside its own verifier_ops->get_func_proto() callback it should return
@@ -214,3 +217,65 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
.arg2_type = ARG_ANYTHING,
};
#endif
+
+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
+ size)
+{
+ struct pid_namespace *pidns = NULL;
+ pid_t pid = 0;
+ pid_t tgid = 0;
+ int res = 0;
+ const char *ppath = "/proc/self/ns/pid";
+ mm_segment_t oldsegfs;
+ struct kstat stat;
+
+ if (unlikely(!pidns_info))
+ goto error;
+
+ pidns = task_active_pid_ns(current);
+
+ if (unlikely(!pidns))
+ goto error;
+
+ pidns_info->nsid = (u32) pidns->ns.inum;
+
+ pid = task_pid_nr_ns(current, pidns);
+
+ if (unlikely(!pid))
+ goto error;
+
+ tgid = task_tgid_nr_ns(current, pidns);
+
+ if (unlikely(!tgid))
+ goto error;
+
+ pidns_info->tgid = (u32) tgid;
+ pidns_info->pid = (u32) pid;
+
+ oldsegfs = get_fs();
+ set_fs(KERNEL_DS);
+ res = vfs_stat((const char __user *)ppath, &stat);
+ set_fs(oldsegfs);
+
+ if (unlikely(res))
+ goto error;
+
+ pidns_info->dev = (u32) stat.dev;
+
+ return 0;
+
+error:
+ if (pidns_info)
+ memset((void *)pidns, 0, (size_t) size);
+
+ return -EINVAL;
+}
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+ .func = bpf_get_current_pidns_info,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg2_type = ARG_CONST_SIZE,
+};
+
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0ae6829804bc..f70be29e49ab 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -568,6 +568,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
#endif
+ case BPF_FUNC_get_current_pidns_info:
+ return &bpf_get_current_pidns_info_proto;
default:
return NULL;
}
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f88d5683d6ee..fdcde00554ce 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -53,6 +53,7 @@ hostprogs-y += xdpsock
hostprogs-y += xdp_fwd
hostprogs-y += task_fd_query
hostprogs-y += xdp_sample_pkts
+hostprogs-y += trace_ns_info
# Libbpf dependencies
LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -109,6 +110,7 @@ xdpsock-objs := xdpsock_user.o
xdp_fwd-objs := xdp_fwd_user.o
task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
+trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -166,6 +168,7 @@ always += xdpsock_kern.o
always += xdp_fwd_kern.o
always += task_fd_query_kern.o
always += xdp_sample_pkts_kern.o
+always += trace_ns_info_user_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/
diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 000000000000..e5754558a06f
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user.c
@@ -0,0 +1,25 @@
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "bpf/libbpf.h"
+#include "bpf_load.h"
+
+int main(int ac, char **argv)
+{
+ FILE *f;
+ char filename[256];
+
+ snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
+ printf("loading %s\n", filename);
+
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ f = popen("taskset 1 ping localhost", "r");
+ (void) f;
+ read_trace_pipe();
+ return 0;
+}
diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 000000000000..8fa40351eebd
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user_kern.c
@@ -0,0 +1,40 @@
+/* Copyright (c) 2017 cneirabustos@...
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+typedef __u64 u64;
+typedef __u32 u32;
+
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+ struct bpf_pidns_info nsinfo;
+ int ok = 0;
+
+ ok = bpf_get_current_pidns_info(&nsinfo, sizeof nsinfo);
+ if ( ok == 0 ) {
+ char fmt[] = "nsid:%u, dev: %u, pid:%u\n";
+
+ bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid, (u32) nsinfo.dev,
+ (u32)nsinfo.pid);
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dd5758dc35d3..643174e5fc9e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2196,7 +2196,8 @@ union bpf_attr {
FN(rc_keydown), \
FN(skb_cgroup_id), \
FN(get_current_cgroup_id), \
- FN(get_local_storage),
+ FN(get_local_storage), \
+ FN(get_current_pidns_info),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index cb9fcfbc9307..fbc807620839 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -137,6 +137,9 @@ static unsigned long long (*bpf_get_current_cgroup_id)(void) =
(void *) BPF_FUNC_get_current_cgroup_id;
static void *(*bpf_get_local_storage)(void *map, unsigned long long flags) =
(void *) BPF_FUNC_get_local_storage;
+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info* buf,
+ unsigned int buf_size) =
+ (void *) BPF_FUNC_get_current_pidns_info;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
--
2.11.0 |
This comment has been minimized.
This comment has been minimized.
On Mon, Aug 6, 2018 at 7:35 AM, Carlos Neira ***@***.***> wrote:
Here is the updated code for bpf-next branch
https://lists.iovisor.org/g/iovisor-dev/message/1423
From 932f55d885bf34443bb73ec4b24f9dd4b95b64c7 Mon Sep 17 00:00:00 2001
From: cneira ***@***.***>
Date: Fri, 3 Aug 2018 13:36:55 -0400
Subject: [PATCH] BPF: helpers: New helper to obtain namespace data from
current.
This helper obtains the active namespace from current and returns
pid, tgid, device major/minor and namespace as seen from that ns,
allowing to instrument process inside a container. Major and minor
are obtained from /proc/self/pid, the reason behind is that in the
future, it is possible that different pid_ns files may belong
to different devices according to Eric Biederman as ***@***.***
found out in last linux plumbers conference.
according to the discussion between Eric Biederman and Yonghong in 2017
linux
plumbers conference.
Please remember to cc Eric Biederman when you formally submit your patch
to bpf-next.
Please do add some reasoning about what kind of problem this helper tries
to solve
and may suggest typical workflow to use this helper in practice.
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 20 +++++++++-
kernel/bpf/core.c | 1 +
kernel/bpf/helpers.c | 65 +++++++++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 2 +
samples/bpf/Makefile | 3 ++
samples/bpf/trace_ns_info_user.c | 25 ++++++++++++
samples/bpf/trace_ns_info_user_kern.c | 40 +++++++++++++++++++
tools/include/uapi/linux/bpf.h | 3 +-
tools/testing/selftests/bpf/bpf_helpers.h | 3 ++
10 files changed, 161 insertions(+), 2 deletions(-)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cd8790d2c6ed..3f4b999f7c99 100644--- a/include/linux/bpf.h+++ b/include/linux/bpf.h@@ -787,6 +787,7 @@ extern const struct bpf_func_proto bpf_get_stack_proto;
extern const struct bpf_func_proto bpf_sock_map_update_proto;
extern const struct bpf_func_proto bpf_sock_hash_update_proto;
extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
extern const struct bpf_func_proto bpf_get_local_storage_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dd5758dc35d3..b53e5c21805a 100644--- a/include/uapi/linux/bpf.h+++ b/include/uapi/linux/bpf.h@@ -173,6 +173,16 @@ enum bpf_attach_type {
__MAX_BPF_ATTACH_TYPE
};
+/* helper bpf_get_current_pidns_info will store the following+ * data, dev will contain major/minor from /proc/self/pid.+ */+struct bpf_pidns_info {+ __u32 dev;
Please add a comment to "dev" on how major/minor is encoded.
+ __u32 nsid;+ __u32 tgid;+ __u32 pid;+};+
Please put this new struct definition at the end of the file.
#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
/* cgroup-bpf attach flags used in BPF_PROG_ATTACH command@@ -2113,7 +2123,14 @@ union bpf_attr {
* the shared data.
* Return
* Pointer to the local storage area.+ * int bpf_get_current_pidns(struct bpf_pidns_info *pidns, u32 size_of_pidns)+ * Description+ * Obtains from current task values for pid, namespace, tgid and+ * device major/minor from /proc/self/ns/pid+ * Return+ * 0 on success -EINVAL on error.
*/+
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
FN(map_lookup_elem), \@@ -2196,7 +2213,8 @@ union bpf_attr {
FN(rc_keydown), \
FN(skb_cgroup_id), \
FN(get_current_cgroup_id), \- FN(get_local_storage),+ FN(get_local_storage), \+ FN(get_current_pidns_info),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to calldiff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 4d09e610777f..98ce53ce2ea6 100644--- a/kernel/bpf/core.c+++ b/kernel/bpf/core.c@@ -1796,6 +1796,7 @@ const struct bpf_func_proto bpf_sock_map_update_proto __weak;
const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
const struct bpf_func_proto bpf_get_local_storage_proto __weak;+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 1991466b8327..44367a5208b1 100644--- a/kernel/bpf/helpers.c+++ b/kernel/bpf/helpers.c@@ -18,6 +18,9 @@
#include <linux/sched.h>
#include <linux/uidgid.h>
#include <linux/filter.h>+#include <linux/pid_namespace.h>+#include <linux/major.h>+#include <linux/stat.h>
/* If kernel subsystem is allowing eBPF programs to call this function,
* inside its own verifier_ops->get_func_proto() callback it should return@@ -214,3 +217,65 @@ const struct bpf_func_proto bpf_get_local_storage_proto = {
.arg2_type = ARG_ANYTHING,
};
#endif++BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,+ size)+{+ struct pid_namespace *pidns = NULL;+ pid_t pid = 0;+ pid_t tgid = 0;+ int res = 0;+ const char *ppath = "/proc/self/ns/pid";+ mm_segment_t oldsegfs;+ struct kstat stat;
For the difinitions, please follow linux convention with reverse Christmas
tree sequence
like below:
const char *ppath = "/proc/self/ns/pid";
struct pid_namespace *pidns = NULL;
mm_segment_t oldsegfs;
...
++ if (unlikely(!pidns_info))+ goto error;
This check is not right. Instead, we should check
if (unlikely(size != sizeof(struct bpf_pidns_info))
goto error;
Verifier will guarantee that pidns_info points to at lease "size"
bytes of valid memory.
Also, could you rename the label name "error" to "clear"?
In kernel, typically, the label name reflects that action to be taken.
++ pidns = task_active_pid_ns(current);++ if (unlikely(!pidns))+ goto error;++ pidns_info->nsid = (u32) pidns->ns.inum;
pidns->ns.inum is "unsigned int"? So we do not need u32 conversion
here, right?
++ pid = task_pid_nr_ns(current, pidns);++ if (unlikely(!pid))+ goto error;++ tgid = task_tgid_nr_ns(current, pidns);++ if (unlikely(!tgid))+ goto error;++ pidns_info->tgid = (u32) tgid;+ pidns_info->pid = (u32) pid;++ oldsegfs = get_fs();+ set_fs(KERNEL_DS);+ res = vfs_stat((const char __user *)ppath, &stat);+ set_fs(oldsegfs);++ if (unlikely(res))+ goto error;++ pidns_info->dev = (u32) stat.dev;
Maybe we do not need conversion here since stat.dev is already u32?
+ return 0;++error:+ if (pidns_info)+ memset((void *)pidns, 0, (size_t) size);++ return -EINVAL;+}++const struct bpf_func_proto bpf_get_current_pidns_info_proto = {+ .func = bpf_get_current_pidns_info,+ .gpl_only = false,+ .ret_type = RET_INTEGER,+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,+ .arg2_type = ARG_CONST_SIZE,+};+diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0ae6829804bc..f70be29e49ab 100644--- a/kernel/trace/bpf_trace.c+++ b/kernel/trace/bpf_trace.c@@ -568,6 +568,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_current_cgroup_id:
return &bpf_get_current_cgroup_id_proto;
#endif+ case BPF_FUNC_get_current_pidns_info:+ return &bpf_get_current_pidns_info_proto;
default:
return NULL;
}diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index f88d5683d6ee..fdcde00554ce 100644--- a/samples/bpf/Makefile+++ b/samples/bpf/Makefile@@ -53,6 +53,7 @@ hostprogs-y += xdpsock
hostprogs-y += xdp_fwd
hostprogs-y += task_fd_query
hostprogs-y += xdp_sample_pkts+hostprogs-y += trace_ns_info
# Libbpf dependencies
LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a@@ -109,6 +110,7 @@ xdpsock-objs := xdpsock_user.o
xdp_fwd-objs := xdp_fwd_user.o
task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)+trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)@@ -166,6 +168,7 @@ always += xdpsock_kern.o
always += xdp_fwd_kern.o
always += task_fd_query_kern.o
always += xdp_sample_pkts_kern.o+always += trace_ns_info_user_kern.o
HOSTCFLAGS += -I$(objtree)/usr/include
HOSTCFLAGS += -I$(srctree)/tools/lib/diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 000000000000..e5754558a06f--- /dev/null+++ b/samples/bpf/trace_ns_info_user.c@@ -0,0 +1,25 @@+#include <stdio.h>+#include <linux/bpf.h>+#include <unistd.h>+#include "bpf/libbpf.h"+#include "bpf_load.h"++int main(int ac, char **argv)+{+ FILE *f;+ char filename[256];++ snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);+ printf("loading %s\n", filename);+++ if (load_bpf_file(filename)) {+ printf("%s", bpf_log_buf);+ return 1;+ }++ f = popen("taskset 1 ping localhost", "r");+ (void) f;+ read_trace_pipe();+ return 0;+}diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 000000000000..8fa40351eebd--- /dev/null+++ b/samples/bpf/trace_ns_info_user_kern.c@@ -0,0 +1,40 @@+/* Copyright (c) 2017 ***@***.***
It is already year 2018. Maybe you want to have full name plus other
stuff. You can see other examples in
the kernel code.
+ *+ * This program is free software; you can redistribute it and/or+ * modify it under the terms of version 2 of the GNU General Public+ * License as published by the Free Software Foundation.+ */+#include <linux/skbuff.h>+#include <linux/netdevice.h>+#include <linux/version.h>+#include <uapi/linux/bpf.h>+#include "bpf_helpers.h"++typedef __u64 u64;+typedef __u32 u32;+++/* kprobe is NOT a stable ABI+ * kernel functions can be removed, renamed or completely change semantics.+ * Number of arguments and their positions can change, etc.+ * In such case this bpf+kprobe example will no longer be meaningful+ */+SEC("kprobe/__netif_receive_skb_core")+int bpf_prog1(struct pt_regs *ctx)+{+ struct bpf_pidns_info nsinfo;+ int ok = 0;++ ok = bpf_get_current_pidns_info(&nsinfo, sizeof nsinfo);+ if ( ok == 0 ) {+ char fmt[] = "nsid:%u, dev: %u, pid:%u\n";++ bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid, (u32) nsinfo.dev,+ (u32)nsinfo.pid);+ }++ return 0;+}++char _license[] SEC("license") = "GPL";+u32 _version SEC("version") = LINUX_VERSION_CODE;diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dd5758dc35d3..643174e5fc9e 100644--- a/tools/include/uapi/linux/bpf.h+++ b/tools/include/uapi/linux/bpf.h@@ -2196,7 +2196,8 @@ union bpf_attr {
FN(rc_keydown), \
FN(skb_cgroup_id), \
FN(get_current_cgroup_id), \- FN(get_local_storage),+ FN(get_local_storage), \+ FN(get_current_pidns_info),
Please just copy include/uapi/linux/bpf.h to tools directory.
…
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to calldiff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index cb9fcfbc9307..fbc807620839 100644--- a/tools/testing/selftests/bpf/bpf_helpers.h+++ b/tools/testing/selftests/bpf/bpf_helpers.h@@ -137,6 +137,9 @@ static unsigned long long (*bpf_get_current_cgroup_id)(void) =
(void *) BPF_FUNC_get_current_cgroup_id;
static void *(*bpf_get_local_storage)(void *map, unsigned long long flags) =
(void *) BPF_FUNC_get_local_storage;+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info* buf,+ unsigned int buf_size) =+ (void *) BPF_FUNC_get_current_pidns_info;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions--
2.11.0
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1329 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ALq6olzUu3uDWGkdd5bqDdGiKnqGYNXaks5uOFQrgaJpZM4PHzwf>
.
|
This comment has been minimized.
This comment has been minimized.
@yonghong-song thank you very much again for your help. |
This comment has been minimized.
This comment has been minimized.
Hi @yonghong-song https://lore.kernel.org/netdev/20190320164920.ha5ne7nlnu7fbx5i@dev00/T/#t Bests |
This comment has been minimized.
This comment has been minimized.
Thanks! |
This comment has been minimized.
This comment has been minimized.
I will monitor the conversation as well. |
This comment has been minimized.
This comment has been minimized.
I have a couple of doubts after @4ast comments on the patch :
Is this is not acceptable there another way to interact with the vfs layer within a bpf helper?. |
This comment has been minimized.
This comment has been minimized.
@cneira Sorry, I did not get time to monitor this patch plus a lot of other bpf patches as well. Do you have the link to the related conversation so I can take a look? Thanks! |
This comment has been minimized.
This comment has been minimized.
@yonghong-song , sure here is the link to that conversation. |
This comment has been minimized.
This comment has been minimized.
The reason is bpf program cannot sleep and it is possible that inner functions of vfs_getattr may sleep and it won't work. The "sleep" here means the current work is suspended due to waiting for some info.
This won't work. If some inner calls inside vfs_getattr() sleeps, spin lock cannot solve it.
So the goal is to make sure bpf program won't sleep. Let us check what are possible solutions:
I did not check details, but is it possible that security_inode_getattr may sleep. Please double check. But even calling vfs_getattr_nosec() can be simplified.
Could you find what inode->i_op->getattr is? You may directly call that function if it is extern.
Also I am 100% sure whether kern_path() can sleep or not. The call tree is too deep. When you submit the next revision, could you explicit ask Eric Biederman ebiederm@xmission.com to take a look as well? |
This comment has been minimized.
This comment has been minimized.
@yonghong-song |
This comment has been minimized.
This comment has been minimized.
@yonghong-song
inode->i_op describes the implemented operations that the VFS can invoke on an inode.
kern_path() may sleep as it calls kmalloc with GFP_KERNEL modifier, as far as I know a modified that does not sleeps is GFP_ATOMIC or GFP_NOWAIT. |
This comment has been minimized.
This comment has been minimized.
On Tue, Apr 16, 2019 at 6:40 PM Carlos Neira ***@***.***> wrote:
@yonghong-song <https://github.com/yonghong-song>
I have submitted a new patch to the dev list for review.
Could you find what inode->i_op->getattr is? You may directly call that
function if it is extern.
Your code could be as simple as below if inode->i_op->getattr d
inode->i_op describes the implemented operations that the VFS can invoke
on an inode.
Also I am 100% sure whether kern_path() can sleep or not. The call tree is
too deep.
kern_path() may sleep as it calls kmalloc with GFP_KERNEL modifier, as far
as I know a modified that does not sleeps is GFP_ATOMIC or GFP_NOWAIT.
Sorry for reply later. Yes, this kmalloc with GFP_KERNEL will be an issue.
We cannot use kern_path as is. We either need to create another
kern_path_nonblock() (or something like this) or we can do the following:
name = getname_kernel_light(name);
filename_lookup(...);
where getname_kernel_light() does not block, e.g., does not call kmalloc
with GFP_KERNEL, it can call kmalloc with GFP_ATOMIC.
I am not sure why spin_lock is needed. different cpus can do kern_path() at
the same time, right?
… —
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1329 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ALq6oquTIacKp-VqEDmRIouAixDOO0qkks5vhnuOgaJpZM4PHzwf>
.
|
This comment has been minimized.
This comment has been minimized.
@yonghong-song Patch submitted to bpf-next we are just waiting for review at this point. |
This comment has been minimized.
This comment has been minimized.
Thanks. @cneira We will need to try again next kernel development cycle. |
This comment has been minimized.
This comment has been minimized.
@yonghong-song Sure!. Bests |
When running
tools/funccount.py
inside a container, we discovered that no func counting actually happened.Further investigation shows the reason is due to the pid filtering like below:
The
<pid_arg_passed_in>
is typically the pid of a process to be traced inside the container.On the other hand,
bpf_get_current_pid_tgid()
in the kernel will get the pid outside the container. Such a mismatch will causepid != <pid_arg_passed_in>
evaluated to be true, and bpf program returns 0 prematurely.Most of our bcc tools have pid based filtering. That means most of them will not run properly inside the container.
How to resolve the issue? I can think of two options:
(1). Have one more option in each of the tools which have pid as its arguments. This option will specify host pid, the program will take this pid in the bpf pid filtering if it is available.
Users can get the host pid by using
ps
command on the host.(2). Invent new kernel helpers to:
. helper 1: get namespace id of the current process
. helper 2: pass in the process id and namespace id and get back true if the
the current process is the one having the same <ns_id, pid_inside_ns>.
Any comments or suggestions? cc @brendangregg @goldshtn @4ast