New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Improve publisher reliability #7302
bpf: Improve publisher reliability #7302
Conversation
e0f0c24
to
4a4efc1
Compare
Tried this packages from this pr and saw the following: I0908 16:45:09.437614 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fork (20)
I0908 16:45:09.442298 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall vfork (28)
I0908 16:45:09.447696 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall clone (36)
I0908 16:45:09.451020 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall close (44)
I0908 16:45:09.454308 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup (50)
I0908 16:45:09.457621 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup2 (56)
I0908 16:45:09.460983 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup3 (62)
I0908 16:45:09.471495 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall creat (71)
I0908 16:45:09.475664 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall mknod (77)
I0908 16:45:09.479838 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall mknodat (83)
I0908 16:45:09.490257 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall open (91)
I0908 16:45:09.494812 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall openat (97)
I0908 16:45:09.494910 25867 bpfeventpublisher.cpp:243] Failed to load the BPF probe for syscall openat2: Failed to open the tracepoint descriptor file: /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat2/id. This syscall may not be available on this system, continuing despite the error
I0908 16:45:09.504556 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall socket (105)
I0908 16:45:09.507974 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fcntl (111)
I0908 16:45:09.512300 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall connect (117)
I0908 16:45:09.516983 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall accept (123)
I0908 16:45:09.521683 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall accept4 (129)
I0908 16:45:09.525904 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall bind (135)
I0908 16:45:09.529222 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall listen (141)
I0908 16:45:09.539371 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall chdir (149)
I0908 16:45:09.542677 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fchdir (155)
I0908 16:45:09.547883 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall name_to_handle_at (161)
I0908 16:45:09.551900 25867 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall open_by_handle_at (167)
I0908 16:45:09.674875 25867 bpfeventpublisher.cpp:243] Failed to load the BPF probe for syscall execve: The 'enter' program could not be loaded: The program could not be loaded: 0: (bf) r8 = r1 There was also a fair amount of what looks like I0908 16:45:09.674875 25867 bpfeventpublisher.cpp:243] Failed to load the BPF probe for syscall execve: The 'enter' program could not be loaded: The program could not be loaded: 0: (bf) r8 = r1
1: (85) call bpf_get_current_pid_tgid#14
2: (77) r0 >>= 32
3: (55) if r0 != 0x650b goto pc+2
R0_w=inv25867 R8_w=ctx(id=0,off=0,imm=0) R10=fp0
4: (b7) r0 = 0
5: (95) exit
from 3 to 6: R0_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R8_w=ctx(id=0,off=0,imm=0) R10=fp0
6: (b7) r7 = 0
7: (63) *(u32 *)(r10 -172) = r7
last_idx 7 first_idx 0
regs=80 stack=0 before 6: (b7) r7 = 0
8: (18) r1 = 0xffff901d8f2eae00
10: (bf) r2 = r10
11: (07) r2 += -172
12: (85) call bpf_map_lookup_elem#1
13: (bf) r6 = r0
14: (55) if r6 != 0x0 goto pc+2
R0=inv0 R6_w=inv0 R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
15: (b7) r0 = 0
16: (95) exit
from 14 to 17: R0=map_value(id=0,off=0,ks=4,vs=92,imm=0) R6_w=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
17: (b7) r1 = 175
18: (7b) *(u64 *)(r6 +4) = r1
R0=map_value(id=0,off=0,ks=4,vs=92,imm=0) R1_w=inv175 R6_w=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
19: (b7) r1 = 92
20: (63) *(u32 *)(r6 +0) = r1
R0=map_value(id=0,off=0,ks=4,vs=92,imm=0) R1_w=inv92 R6_w=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
21: (85) call bpf_ktime_get_ns#5
22: (7b) *(u64 *)(r6 +12) = r0
R0=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
23: (85) call bpf_get_current_pid_tgid#14
24: (7b) *(u64 *)(r6 +20) = r0
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
25: (85) call bpf_get_current_uid_gid#15
26: (7b) *(u64 *)(r6 +28) = r0
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
27: (85) call bpf_get_current_cgroup_id#80
28: (7b) *(u64 *)(r6 +36) = r0
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
29: (7b) *(u64 *)(r6 +60) = r7
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
30: (7b) *(u64 *)(r6 +52) = r7
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
31: (7b) *(u64 *)(r6 +44) = r7
R0_w=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=92,imm=0) R7=invP0 R8=ctx(id=0,off=0,imm=0) R10=fp0 fp-176=mmmm????
32: (63) *(u32 *)(r10 -172) = r7
33: (18) r1 = 0xffff901d8f2eac00
35: (bf) r2 = r10
36: (07) r2 += -172
37: (85) call bpf_map_lookup_elem#1 Further, the events tables isn't populating: osquery> select * from bpf_socket_events;
osquery> System info: [STD-DEV]19:53:43 zmackie@si-i-0fe66e9061a89d90d ~ $ uname -srm
Linux 5.4.0-1054-aws x86_64
[STD-DEV]19:53:45 zmackie@si-i-0fe66e9061a89d90d ~ $ cat /etc/issue
Ubuntu 18.04.5 LTS \n \l |
4a4efc1
to
04de7d7
Compare
Thank you so much for testing the packages! I have updated the PR again, and this should hopefully fix the problem |
The recent fix has addressed some of the issues, but I'm still seeing malformed event warnings come through: [STD-DEV]18:01:52 root@si-i-02caa65087583c219 /home/zmackie # osqueryi --flagfile=/etc/osquery/osquery.flags --config_path /etc/osquery/osquery.conf --disable_events=false --enable_bpf_events=true --verbose --bpf_perf_event_array_exp 14
I0915 18:01:53.539261 11049 init.cpp:357] osquery initialized [version=5.0.1-5-gd63adb159]
I0915 18:01:53.539309 11049 extensions.cpp:453] Could not autoload extensions: Cannot open file for reading: /etc/osquery/extensions.load
I0915 18:01:53.539412 11049 dispatcher.cpp:78] Adding new service: ExtensionWatcher (0x56283552aad8) to thread: 139973499758336 (0x562835529820) in process 11049
I0915 18:01:53.539463 11049 dispatcher.cpp:78] Adding new service: ExtensionRunnerCore (0x56283552ae58) to thread: 139973491365632 (0x56283552b060) in process 11049
I0915 18:01:53.539492 11049 auto_constructed_tables.cpp:97] Removing stale ATC entries
I0915 18:01:53.539503 11115 interface.cpp:299] Extension manager service starting: /root/.osquery/shell.em
I0915 18:01:53.539896 11049 packs.cpp:177] No queries defined for pack testing
W0915 18:01:53.539928 11049 config.cpp:755] Error reading the query pack named: osquery-monitoring
I0915 18:01:53.542188 11049 virtual_table.cpp:1081] Table curl is disabled, not attaching
I0915 18:01:53.554993 11049 smbios_tables.cpp:105] Reading SMBIOS from sysfs DMI node
I0915 18:01:53.690915 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fork (20)
I0915 18:01:53.695461 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall vfork (28)
I0915 18:01:53.700693 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall clone (36)
I0915 18:01:53.703929 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall close (44)
I0915 18:01:53.707250 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup (50)
I0915 18:01:53.710623 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup2 (56)
I0915 18:01:53.714445 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall dup3 (62)
I0915 18:01:53.724747 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall creat (71)
I0915 18:01:53.728834 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall mknod (77)
I0915 18:01:53.733505 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall mknodat (83)
I0915 18:01:53.745663 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall open (91)
I0915 18:01:53.749629 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall openat (97)
I0915 18:01:53.749713 11049 bpfeventpublisher.cpp:243] Failed to load the BPF probe for syscall openat2: Failed to open the tracepoint descriptor file: /sys/kernel/debug/tracing/events/syscalls/sys_enter_openat2/id. This syscall may not be available on this system, continuing despite the error
I0915 18:01:53.759366 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall socket (105)
I0915 18:01:53.762704 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fcntl (111)
I0915 18:01:53.767246 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall connect (117)
I0915 18:01:53.773125 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall accept (123)
I0915 18:01:53.777843 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall accept4 (129)
I0915 18:01:53.782261 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall bind (135)
I0915 18:01:53.785670 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall listen (141)
I0915 18:01:53.797273 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall chdir (149)
I0915 18:01:53.800541 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall fchdir (155)
I0915 18:01:53.805913 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall name_to_handle_at (161)
I0915 18:01:53.809834 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall open_by_handle_at (167)
I0915 18:01:53.847544 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall execve (175)
I0915 18:01:53.879746 11049 bpfeventpublisher.cpp:256] Initialized BPF probe for syscall execveat (181)
I0915 18:01:53.916539 11049 eventfactory.cpp:156] Event publisher not enabled: auditeventpublisher: Publisher disabled via configuration
I0915 18:01:53.916572 11049 eventfactory.cpp:156] Event publisher not enabled: syslog: Publisher disabled via configuration
I0915 18:01:53.916643 11049 events.cpp:70] Skipping subscriber: apparmor_events: Subscriber disabled via configuration
I0915 18:01:53.916692 11049 events.cpp:70] Skipping subscriber: process_file_events: Subscriber disabled via configuration
I0915 18:01:53.916705 11049 events.cpp:70] Skipping subscriber: seccomp_events: Seccomp subscriber disabled via configuration
I0915 18:01:53.916718 11049 events.cpp:70] Skipping subscriber: selinux_events: Subscriber disabled via configuration
I0915 18:01:53.916729 11049 events.cpp:70] Skipping subscriber: socket_events: Subscriber disabled via configuration
I0915 18:01:53.917062 11773 eventfactory.cpp:390] Starting event publisher run loop: BPFEventPublisher
I0915 18:01:53.917089 11774 eventfactory.cpp:390] Starting event publisher run loop: inotify
Using a virtual database. Need help, type '.help'
I0915 18:01:53.917121 11775 eventfactory.cpp:390] Starting event publisher run loop: udev
osquery>
osquery> select * from bpf_process_events limit 10;
+-------+-------+--------+-----+-----+-----+-----------+-------------+---------+--------------+-------+-----------------------+----------+-----------------+
| tid | pid | parent | uid | gid | cid | exit_code | probe_error | syscall | path | cwd | cmdline | duration | ntime |
+-------+-------+--------+-----+-----+-----+-----------+-------------+---------+--------------+-------+-----------------------+----------+-----------------+
| 11643 | 11643 | 11641 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c6 | 131027 | 439105822391599 |
| 11650 | 11650 | 11645 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c8 | 157591 | 439105823516615 |
| 11656 | 11656 | 11653 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c9 | 179209 | 439105824704640 |
| 11659 | 11659 | 11657 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c10 | 111678 | 439105826105671 |
| 11662 | 11662 | 11660 | 0 | 0 | 706 | 0 | 0 | exec | /bin/ls | /root | ls -ld /home/jiriogbe | 122515 | 439105827081868 |
| 11663 | 11663 | 11660 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -f1 '-d ' | 112134 | 439105827149882 |
| 11666 | 11666 | 11664 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c6 | 114199 | 439105830273089 |
| 11669 | 11669 | 11667 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c8 | 107158 | 439105831316658 |
| 11672 | 11672 | 11670 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c9 | 181226 | 439105832437880 |
| 11675 | 11675 | 11673 | 0 | 0 | 706 | 0 | 0 | exec | /usr/bin/cut | /root | cut -c10 | 132795 | 439105833717800 |
+-------+-------+--------+-----+-----+-----+-----------+-------------+---------+--------------+-------+-----------------------+----------+-----------------+
osquery> I0915 18:02:16.035015 11773 bpfeventpublisher.cpp:374] BPFEventPublisher failed to process event from tracer #175
I0915 18:02:16.044416 11773 bpfeventpublisher.cpp:374] BPFEventPublisher failed to process event from tracer #175
E0915 18:02:16.070116 11773 bpfeventpublisher.cpp:381] BPFEventPublisher has encountered 2 malformed events |
I should mention that I was seeing |
04de7d7
to
a62cd1a
Compare
d3539b4
to
22b0586
Compare
I just tested the latest build https://github.com/osquery/osquery/suites/3940064858/artifacts/9875875 with bpf events enabled I am no longer seeing malformed events or the memory leak, osqueryd has stayed at about 100mb of memory usage on my test instance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything obviously wrong, but the bpf code is somewhat beyond me. @zwass or @theopolis either of you want to peek a this? I can approve otherwise.
22b0586
to
4c738e4
Compare
Code seems okay to me. @zwass do you want to review this at all? |
4c738e4
to
9facb8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look through and did not see anything concerning. It's a large PR and I can't claim to have understood it thoroughly. I like the idea of refreshing the state and logging an error report on interval.
What's the easiest way to test this? Just build? |
The CI builds unsigned packages at every commit of this PR (and master). |
When capturing syscall events that emit strings (i.e.: chdir()) or buffers (i.e.: connect()), there's a chance the memory we need to access has not been mapped yet (page fault), causing events to lack some of the syscall parameters we need for tracking system state.
This PR improves the BPF publisher reliability by capturing strings on syscall exit (as opposed to using the on enter program), increasing chances that the buffers we need are accessible when our probes run.