# Sensor Probes
The eBPF sensor library lives at internal/runner/sensor/. Probes are
written in CO-RE BPF C (bpf/sensor.bpf.c), compiled with clang +
bpf2go, and embedded into the runner binary via go:embed.
- `Sensor.New(Options)`: at runner startup. Loads BPF objects, attaches every probe ONCE, opens the ringbuf reader.
- `AddCgroup(opts)`: per-job. Inserts the cgroup_id + run_id into `cgmap` and populates `path_filter` with the watched paths.
- `RemoveCgroup(cgroup_id)`: per-job teardown. Removes the `cgmap` entry + the `path_filter` entries for this cgroup.
- `Events(ctx)`: long-lived event channel. A reader goroutine decodes ringbuf records into typed Go events.
- `Close()`: detaches all probes, releases BPF objects.
Probes stay attached across jobs; `cgmap` and `path_filter` mutations are
the only per-job state. The runner also pre-creates
/sys/fs/cgroup/.../fangs/<run_id>/ and registers that directory's inode
(its cgroup_id) before `docker start`. Together these close the race
window between container start and sensor attach: events fire from
syscall #1 inside the container.

| Probe | Section | What it captures |
|---|---|---|
| `handle_openat` | `tracepoint/syscalls/sys_enter_openat` | File access; LPM-trie filtered to watched prefixes; `@cred` tag |
| `handle_execve` | `tracepoint/syscalls/sys_enter_execve` | Process exec; binary path, argv (8×64 B), 5-level ancestry |
| `handle_connect` | `tracepoint/syscalls/sys_enter_connect` | TCP+UDP connect; v4 and v6; port=0 dropped |
| `handle_tcp_v4_connect` | `kprobe/tcp_v4_connect` | TCP v4 connects via io_uring (bypasses sys_enter) |
| `handle_tcp_v6_connect` | `kprobe/tcp_v6_connect` | Same, v6 |
| `handle_sendto` | `tracepoint/syscalls/sys_enter_sendto` | DNS via sendto+addr; TLS ClientHello detection |
| `handle_sendmsg` | `tracepoint/syscalls/sys_enter_sendmsg` | DNS via single-message sendmsg |
| `handle_sendmmsg` | `tracepoint/syscalls/sys_enter_sendmmsg` | DNS via curl/glibc-2.30+ batched A+AAAA |
| `handle_write` | `tracepoint/syscalls/sys_enter_write` | TLS ClientHello on TCP sockets (e.g. Node's statically linked OpenSSL, which bypasses libssl.so) |
| `handle_ssl_ctrl` | `uprobe:libssl/SSL_ctrl` | TLS SNI via OpenSSL `SSL_set_tlsext_host_name` |

| Map | Type | Purpose |
|---|---|---|
| `cgmap` | `HASH` | Watched cgroup_id → run_id |
| `path_filter` | `LPM_TRIE` | Watched-path allowlist + `@cred` tag per prefix |
| `events` | `RINGBUF` | 64 MB capture buffer |
| `drops_counter` | `PERCPU_ARRAY` | Bumped when ringbuf reserve fails (overflow indicator) |
| `cgmap_misses` | `HASH` | Diagnostic: cgroup_ids that hit `lookup_cgroup` but missed `cgmap` |

The `lookup_cgroup` helper:

- `bpf_get_current_cgroup_id()` → check `cgmap` directly. Match → return.
- Walk `bpf_get_current_ancestor_cgroup_id(level)` for levels 1..8. Deepest match wins (LPM-style). Match → return.
- No match → record in `cgmap_misses` (diagnostic), return NULL.

The ancestor walk is what makes the pre-attach pattern work. Docker
nests the container's leaf cgroup under our pre-created
/fangs/<run_id> parent, so the leaf cgroup_id never matches `cgmap`
directly, but the ancestor does.
Up to 8 ancestor levels are covered; Docker hierarchies are typically ≤4 levels deep.
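The three-step lookup can be modeled in userspace to reason about which run a nested cgroup resolves to. This is a hypothetical Go model, not the BPF helper itself: `ancestors[i]` stands in for `bpf_get_current_ancestor_cgroup_id(i+1)`, and the `misses` map plays the role of `cgmap_misses`.

```go
package main

import "fmt"

// runID mirrors the 16-byte run_id stored as the cgmap value.
type runID [16]byte

// lookupCgroup models the BPF helper's logic (illustrative name and signature).
// self is bpf_get_current_cgroup_id(); ancestors[i] is the ancestor id at
// level i+1, which the kernel reports as 0 past the task's own depth.
func lookupCgroup(cgmap map[uint64]runID, misses map[uint64]uint64,
	self uint64, ancestors []uint64) (uint64, runID, bool) {
	// Step 1: direct hit on the task's own cgroup.
	if id, ok := cgmap[self]; ok {
		return self, id, true
	}
	// Step 2: walk levels 1..8; a deeper match overwrites a shallower one.
	var (
		bestCG uint64
		bestID runID
		found  bool
	)
	for level := 1; level <= 8 && level <= len(ancestors); level++ {
		cg := ancestors[level-1]
		if id, ok := cgmap[cg]; ok {
			bestCG, bestID, found = cg, id, true
		}
	}
	// Step 3: miss, so count it for diagnostics.
	if !found {
		misses[self]++
	}
	return bestCG, bestID, found
}

func main() {
	cgmap := map[uint64]runID{100: {0x01}} // pre-created fangs/<run_id> parent
	misses := map[uint64]uint64{}
	// Container leaf 300 nested as root -> 10 -> 100 -> 300.
	cg, id, ok := lookupCgroup(cgmap, misses, 300, []uint64{10, 100, 300})
	fmt.Println(cg, id[0], ok)
}
```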
Every event begins with a 72-byte fangs_event_header:

```c
struct fangs_event_header {
    __u64 ts_ns;        // bpf_ktime_get_ns
    __u64 cgroup_id;    // matched cgroup_id from lookup_cgroup
    __u8  run_id[16];   // copied from cgmap value
    __u32 pid;          // current task's tgid
    __u32 tid;          // current task's pid
    __u32 ppid;         // task->real_parent->tgid via BPF_CORE_READ
    __u32 uid;
    __u32 gid;
    char  comm[16];     // bpf_get_current_comm
    __u8  type;         // discriminator (1=file, 2=exec, 3=net, 4=dns, 5=tls)
    __u8  tags;         // EVENT_TAG_* bits
    __u8  _pad[2];
};
```

The `tags` field can carry:
- `EVENT_TAG_INTERESTING` (bit 0): generic high-signal marker
- `EVENT_TAG_CRED_ACCESS` (bit 1): set on file_access events whose `path_filter` entry was `@cred`-tagged

```c
struct openat_event {
    struct fangs_event_header h;
    __s32 dfd;
    __s32 flags;
    __u16 path_len;
    __u8  truncated;
    __u8  _pad;
    char  path[256];
};
```

Filter pipeline:
- `lookup_cgroup`: drop if not in any watched cgroup.
- `bpf_probe_read_user_str(filename)` into `e->path`.
- `path_filter` LPM-trie lookup using `e->path` as the key. No match → drop.
- If the matching action is `PATH_ACTION_KEEP_CRED_TAGGED`, set `EVENT_TAG_INTERESTING | EVENT_TAG_CRED_ACCESS` on the tags byte.

The lookup key's prefix length is set to PATH_LEN * 8 bits (the full key
width), so the trie returns the longest watched prefix that matches. The
operator's watched-path list populates the trie via AddCgroup.
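The longest-prefix semantics can be illustrated with a plain Go map standing in for the trie. This is a model for reasoning about which entry a path hits, not how the kernel trie is implemented; `pathAction`, `actionKeep`, and `longestPrefix` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// pathAction stands in for the per-prefix value in path_filter.
type pathAction int

const (
	actionKeep           pathAction = iota + 1
	actionKeepCredTagged            // PATH_ACTION_KEEP_CRED_TAGGED
)

// longestPrefix models an LPM-trie lookup whose key prefixlen is the full key
// width: among stored prefixes that match path byte-for-byte, the longest
// wins. A stored prefix p corresponds to an entry with prefixlen = len(p)*8.
func longestPrefix(filter map[string]pathAction, path string) (string, pathAction, bool) {
	best, bestAction, found := "", pathAction(0), false
	for prefix, action := range filter {
		if strings.HasPrefix(path, prefix) && (!found || len(prefix) > len(best)) {
			best, bestAction, found = prefix, action, true
		}
	}
	return best, bestAction, found
}

func main() {
	filter := map[string]pathAction{
		"/home/":              actionKeep,
		"/home/runner/.npmrc": actionKeepCredTagged,
		"/etc/":               actionKeep,
	}
	for _, p := range []string{"/home/runner/.npmrc", "/home/runner/app.js", "/usr/bin/node"} {
		prefix, action, ok := longestPrefix(filter, p)
		fmt.Printf("%-22s match=%v prefix=%q credTagged=%v\n", p, ok, prefix, action == actionKeepCredTagged)
	}
}
```

Matching is byte-wise, not per path component: a `/home/runner/.npmrc` entry also matches `/home/runner/.npmrc.bak`, so a prefix meant to stop at a directory boundary needs a trailing `/`.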

```c
struct exec_event {
    struct fangs_event_header h;
    __u8     argc;
    __u8     _pad[3];
    __u8     argv_lens[8];
    char     argv[8 * 64];
    char     binary_path[256];
    Ancestor ancestors[5];   // pid + ppid + comm[16] each
};
```

argv capture:
- Reads the first 8 argv pointers via `bpf_probe_read_user`.
- For each non-NULL pointer, reads up to 64 bytes of the string into the corresponding `argv[i * 64]` slot.
- `argv_lens[i]` records the captured length per slot.
- Tail args are truncated (operator can grep for `truncated=1`).
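On the userspace side, the fixed slots can be turned back into an argument list. A sketch under two assumptions not stated above: `argc` counts the captured slots (at most 8), and `argv_lens[i]` may or may not include the trailing NUL that `bpf_probe_read_user_str` copies, so the helper strips one if present. `decodeArgv` is a hypothetical name.

```go
package main

import "fmt"

const (
	maxArgs = 8  // argv slots in exec_event
	argSlot = 64 // bytes per slot
)

// decodeArgv rebuilds argv from exec_event's fixed-size slots: argument i
// lives in argv[i*64 : i*64+argv_lens[i]].
func decodeArgv(argc uint8, argvLens [maxArgs]uint8, argv [maxArgs * argSlot]byte) []string {
	n := int(argc)
	if n > maxArgs {
		n = maxArgs
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		l := int(argvLens[i])
		if l > argSlot {
			l = argSlot // never read past the slot
		}
		slot := argv[i*argSlot : i*argSlot+l]
		if l > 0 && slot[l-1] == 0 {
			slot = slot[:l-1] // drop a trailing NUL if the length counted it
		}
		out = append(out, string(slot))
	}
	return out
}

func main() {
	var lens [maxArgs]uint8
	var buf [maxArgs * argSlot]byte
	for i, a := range []string{"npm", "install", "left-pad"} {
		n := copy(buf[i*argSlot:(i+1)*argSlot], a)
		lens[i] = uint8(n)
	}
	fmt.Printf("%q\n", decodeArgv(3, lens, buf))
}
```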
Ancestry: walks `task->real_parent` 5 levels deep via `BPF_CORE_READ`.
Each ancestor's pid, ppid, and comm are recorded. The loop is
`#pragma unroll`-ed for verifier-friendliness on older kernels.

```c
struct net_connect_event {
    struct fangs_event_header h;
    __u8  family;          // AF_INET=2, AF_INET6=10
    __u8  source;          // NET_SOURCE_SYSCALL=1, NET_SOURCE_KPROBE=2
    __u16 dest_port;       // host byte order
    __u32 sockfd;
    __u8  dest_addr[16];   // IPv4 uses lower 4 bytes
};
```

The kprobe arms attach to `tcp_v4_connect` and `tcp_v6_connect`, the
kernel functions that both syscall-path and io_uring-path TCP connects
pass through. The `source` byte distinguishes the two arms:
- `NET_SOURCE_SYSCALL` (1) → `sys_enter_connect` tracepoint
- `NET_SOURCE_KPROBE` (2) → kprobe on `tcp_v{4,6}_connect`

Userspace dedups: if a kprobe event arrives within 100 ms of a
matching (pid, family, ip, port) syscall event, the kprobe event is
dropped. io_uring connects (no preceding syscall) fire only the
kprobe, so they survive the dedup.
port=0 connects are filtered in the kernel, which drops glibc
getaddrinfo's source-address-selection probes.

```c
struct dns_query_event {
    struct fangs_event_header h;
    __u8  family;
    __u8  _pad[1];
    __u16 dest_port;       // always 53 by filter
    __u16 query_len;
    __u8  _pad2[2];
    __u8  dest_addr[16];
    __u8  query[200];      // raw bytes; userspace parses question section
};
```

DNS capture spans three syscalls because clients use different paths:

| Caller | Syscall |
|---|---|
| Classic `sendto(fd, buf, len, 0, addr, addrlen)` | `sys_enter_sendto` |
| `connect(fd, addr); send(fd, buf, len)` (glibc default since ~2.30) | `sys_enter_sendto` with NULL addr → walks `task->files->fdt->fd[sockfd]->private_data->sk` to read `skc_dport` and `skc_daddr` |
| `sendmmsg(fd, mmsghdr_vec, vlen)` (curl, glibc batched A+AAAA) | `sys_enter_sendmmsg`; emits one event per vec entry, up to 2 |
| `sendmsg(fd, msghdr)` (rare custom resolvers) | `sys_enter_sendmsg` |

All of these share the `dns_dest` resolution path. The userspace
DNS-question parser (internal/runner/sensor/parsing.go) walks the
length-prefixed labels of the question name in the raw payload.
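A sketch of the question-section walk, assuming the payload starts at the 12-byte DNS header as sent over UDP. `parseQuestion` is a hypothetical name, not necessarily what parsing.go exports. Each label is a length byte followed by that many bytes, ending at a zero byte; a length with either top bit set (a compression pointer or reserved type) is rejected, because a query's first question name has nothing earlier to point at.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseQuestion returns the QNAME and QTYPE of the first question in a raw
// DNS query. Names cut off by the 200-byte capture come back as errors.
func parseQuestion(msg []byte) (string, uint16, error) {
	const headerLen = 12
	if len(msg) < headerLen {
		return "", 0, errors.New("short DNS header")
	}
	if msg[4] == 0 && msg[5] == 0 {
		return "", 0, errors.New("QDCOUNT is zero")
	}
	off := headerLen
	var labels []string
	for {
		if off >= len(msg) {
			return "", 0, errors.New("truncated name")
		}
		l := int(msg[off])
		off++
		if l == 0 {
			break // root label ends the name
		}
		if l&0xC0 != 0 {
			return "", 0, errors.New("unexpected pointer or reserved label type")
		}
		if off+l > len(msg) {
			return "", 0, errors.New("truncated label")
		}
		labels = append(labels, string(msg[off:off+l]))
		off += l
	}
	if off+2 > len(msg) {
		return "", 0, errors.New("missing QTYPE")
	}
	qtype := uint16(msg[off])<<8 | uint16(msg[off+1])
	return strings.Join(labels, "."), qtype, nil
}

func main() {
	msg := []byte{0x12, 0x34, 0x01, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0}
	for _, l := range []string{"registry", "npmjs", "org"} {
		msg = append(msg, byte(len(l)))
		msg = append(msg, l...)
	}
	msg = append(msg, 0, 0x00, 0x01, 0x00, 0x01) // end of name, QTYPE=A, QCLASS=IN
	fmt.Println(parseQuestion(msg))
}
```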

```c
struct tls_sni_event {
    struct fangs_event_header h;
    __u8  source;              // 1=libssl, 2=node-internal (future), 3=tcp_clienthello
    __u8  _pad[1];
    __u16 sni_len;
    __u16 raw_payload_len;
    __u8  _pad2[2];
    char  sni[256];            // populated when source=libssl
    __u8  raw_payload[512];    // populated when source=tcp_clienthello
};
```

Three capture mechanisms:

| Source | How | Coverage |
|---|---|---|
| libssl (1) | uprobe on `SSL_ctrl` in libssl.so when called with cmd=55 (`SSL_CTRL_SET_TLSEXT_HOSTNAME`) | Catches every OpenSSL-based client (curl, Python requests, most Node TLS libs) |
| node-internal (2) | Future uprobe on Node-internal TLS; currently DEFERRED-REDUNDANT because mechanism 3 covers it | n/a |
| tcp_clienthello (3) | `sys_enter_write` + `sys_enter_sendto` detect the TLS record-header signature `\x16\x03\x01..\x01` and capture raw bytes; userspace parses the SNI extension | Catches statically-linked Node (Alpine), Go binaries, anything bypassing libssl |

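For mechanism 3, userspace has to find the SNI inside the captured raw bytes. A minimal Go sketch of that parse, following the TLS ClientHello layout (record header, handshake header, version, random, session ID, cipher suites, compression methods, then extensions, where `server_name` is extension type 0). Function names are hypothetical. Because the probe captures at most 512 bytes, the extensions block is clamped to what was captured, so an SNI extension lying beyond the capture is reported as missing rather than mis-parsed.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated ClientHello")

// isClientHello matches 0x16 (handshake), 0x03 0x01 (record version),
// two length bytes, then 0x01 (ClientHello).
func isClientHello(p []byte) bool {
	return len(p) >= 6 && p[0] == 0x16 && p[1] == 0x03 && p[2] == 0x01 && p[5] == 0x01
}

// cursor is a bounds-checked reader; the first failure sticks in err.
type cursor struct {
	b   []byte
	err error
}

func (c *cursor) take(n int) []byte {
	if c.err != nil {
		return nil
	}
	if n > len(c.b) {
		c.err = errTruncated
		return nil
	}
	out := c.b[:n]
	c.b = c.b[n:]
	return out
}

// takeUpTo clamps n to the bytes actually captured.
func (c *cursor) takeUpTo(n int) []byte {
	if n > len(c.b) {
		n = len(c.b)
	}
	return c.take(n)
}

func (c *cursor) u8() int {
	if b := c.take(1); len(b) == 1 {
		return int(b[0])
	}
	return 0
}

func (c *cursor) u16() int {
	if b := c.take(2); len(b) == 2 {
		return int(binary.BigEndian.Uint16(b))
	}
	return 0
}

// extractSNI returns the host_name entry of the server_name extension.
func extractSNI(p []byte) (string, error) {
	if !isClientHello(p) {
		return "", errors.New("not a TLS ClientHello")
	}
	c := &cursor{b: p}
	c.take(5)       // record header
	c.take(4)       // handshake type + 24-bit length
	c.take(2 + 32)  // client_version + random
	c.take(c.u8())  // session_id
	c.take(c.u16()) // cipher_suites
	c.take(c.u8())  // compression_methods
	exts := &cursor{b: c.takeUpTo(c.u16())}
	if c.err != nil {
		return "", c.err
	}
	for len(exts.b) > 0 {
		typ := exts.u16()
		data := &cursor{b: exts.take(exts.u16())}
		if exts.err != nil {
			return "", exts.err
		}
		if typ != 0 {
			continue // not server_name
		}
		list := &cursor{b: data.take(data.u16())}
		for len(list.b) > 0 {
			nameType := list.u8()
			name := list.take(list.u16())
			if list.err != nil {
				break
			}
			if nameType == 0 { // host_name
				return string(name), nil
			}
		}
		return "", errors.New("malformed server_name extension")
	}
	return "", errors.New("no server_name extension in captured bytes")
}

func main() {
	_, err := extractSNI([]byte{0x17, 0x03, 0x03, 0x00, 0x10, 0x00}) // application data record
	fmt.Println(err)
}
```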
Userspace dedup: 5-second window keyed on (pid, sni). When two
mechanisms report the same SNI, the second one carries
DuplicateOf="<first-source>" for visibility but is still emitted.
`drops_counter` is a per-CPU u64 array (single key=0) bumped whenever
`bpf_ringbuf_reserve()` returns NULL, meaning the ringbuf was full and
the event was dropped at probe time.
Userspace reads and sums the per-CPU values on shutdown and reports the total as part of ScanResult.
The orchestrator stores events_dropped per run, and the Prometheus
counter fangs_events_dropped_total accumulates lifetime drops.
A non-zero drop rate indicates that the ringbuf is undersized for the workload. Tuning options:
- Reduce sandbox concurrency on the runner.
- Run sandbox commands that do less I/O.
- Bump the ringbuf size in `bpf/sensor.bpf.c` and rebuild (currently 64 MB; the size must stay a power of two, and doubling it roughly halves the overflow rate for most workloads).
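Two small helpers illustrate the arithmetic here, with hypothetical names. Summing the per-CPU slots of key 0 gives the total drop count, and any new ringbuf size has to satisfy the kernel's constraint for `BPF_MAP_TYPE_RINGBUF`: `max_entries` must be a power of two and a multiple of the page size.

```go
package main

import (
	"fmt"
	"os"
)

// totalDrops sums the per-CPU slots of drops_counter's single key (0).
func totalDrops(perCPU []uint64) uint64 {
	var sum uint64
	for _, v := range perCPU {
		sum += v
	}
	return sum
}

// validRingbufSize reports whether size is acceptable as a BPF ringbuf's
// max_entries: a page-aligned power of two.
func validRingbufSize(size, pageSize uint64) bool {
	return pageSize > 0 && size >= pageSize && size%pageSize == 0 && size&(size-1) == 0
}

func main() {
	page := uint64(os.Getpagesize())
	fmt.Println("drops:", totalDrops([]uint64{0, 3, 0, 1}))
	for _, mb := range []uint64{64, 96, 128} {
		fmt.Printf("%d MB valid=%v\n", mb, validRingbufSize(mb<<20, page))
	}
}
```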

| Gap | What's not observed |
|---|---|
| `openat2` syscall | Modern glibc may use it; we only hook `openat`. Most npm-install workloads still hit the legacy `openat` path because Node + npm don't use the new flags |
| io_uring file I/O | Reads/writes submitted through the io_uring submission queue bypass every read/write tracepoint. Rare in current npm packages |
| `mmap` with `PROT_WRITE` + dirty pages | Memory-mapped writes don't surface as syscalls; we'd miss file modification through that path |
| DNS-over-HTTPS to already-baselined hosts | A DoH query to Cloudflare's 1.1.1.1 via cloudflare-dns.com is invisible: the query travels encrypted inside a TLS session whose SNI is already baselined |
| Raw-socket netlink → uevent IPC | Edge case; not observed |

Each gap is a candidate for a future probe. Because detection is delta-vs-baseline, a new probe starts catching new behaviors as soon as fingerprints from the new event type start landing in the runs table.
Built against a vmlinux.h generated from the build host's BTF and
compiled with `-target bpf -O2 -g`. The object is CO-RE, so it adapts to
the runtime kernel's BTF without recompilation, provided the structures
it reads (task_struct, files_struct, struct sock, etc.) still contain
the fields our BPF_CORE_READ relocations reference.
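Minimum runtime kernel: 5.8, the release that introduced BPF_MAP_TYPE_RINGBUF (tracepoints, kernel BTF, and the other helpers used here predate it). This also covers the 5.5+ recommendation for the connected-UDP sock-fd walk path used in DNS capture.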
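A pre-flight check along these lines can refuse to start the sensor on a kernel that is too old. The sketch parses a `uname -r` style release string (read from `/proc/sys/kernel/osrelease`) and compares major.minor against a threshold. `kernelAtLeast` is hypothetical, and the 5.8 used in `main` reflects when `BPF_MAP_TYPE_RINGBUF` first appeared in Linux.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// kernelAtLeast reports whether a release such as "5.15.0-91-generic" or
// "6.8-rc1" is at least major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unparseable kernel release %q", release)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	// The minor component may carry a suffix, as in "8-rc1".
	minStr := parts[1]
	if i := strings.IndexFunc(minStr, func(r rune) bool { return r < '0' || r > '9' }); i >= 0 {
		minStr = minStr[:i]
	}
	mn, err := strconv.Atoi(minStr)
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mn >= minor), nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	release := strings.TrimSpace(string(raw))
	ok, err := kernelAtLeast(release, 5, 8)
	fmt.Println(release, ok, err)
}
```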