
Add remote device support via bpfd #2298

Open

wants to merge 7 commits into master
Conversation

10ne1
Contributor

@10ne1 10ne1 commented Apr 2, 2019

Hello, this is a continuation of the work in PR #1675. I'm very happy with the current state of bpfd and would like to eventually see it merged.

First of all, big thanks to @joelagnel and @jcanseco for starting the project and doing the initial development. Really awesome. I've also made some fixes and additions since the last PR, see below for a general overview.

Currently bpfd runs as intended on x86/64 and aarch64. Work is in progress in #2292 to support 32-bit ARM (bpfd itself cross-compiles and runs on 32-bit ARM without problems; additional work is needed only for BCC to correctly cross-compile eBPF programs to be sent to bpfd).

Changes since PR #1675:

  • Replaced the custom struct bpf_create_map_args with the kernel-provided struct bpf_create_map_attr.
  • Starting with commit 8300c7b, map creation moved from the frontend (which now generates "fake" file descriptors) to bpf_module.cc, so the remote map-creation callback call was also moved to the corresponding place in bpf_module.cc.
  • Dropped the first commit 240a982 of the previous series because it's not necessary anymore after the above change.
  • Added two new args to libremote & bpfd attach_kprobe functions: int maxactive (for kretprobes) and uint64_t fn_offset (for kprobes). They were added to BCC/kernel since the last PR.
  • Fixed some hangs in the bpfd C code caused by perf_reader_poll() / perf_remote_reader adding stdin (fd 0) to the list of fds to poll. Also fixed some uninitialized memory bugs in parse_user_input which led to bad frees, and fixed some other resource leaks and out-of-bounds reads/writes in the perf event logic.
  • KeyboardInterrupts are properly handled in the remote/pexpect backends, so there are no more random stack traces from libremote/shell/adb/ssh when interrupting a BCC python script.
  • Lots of python 3 compatibility fixes (libremote was mostly tested with python 2 until now).
  • Added new commits to significantly shrink the size of bpfd and allow it to be built/packaged separately. Also added some toolchain files to ease cross-compilation or be used as reference when cross-compiling.
  • Added a new ssh backend.
  • All tests added in test_tools_on_remote.py pass with both python 2 and 3. Python 2 performance is noticeably worse than python 3 due to bpfd/libremote's heavy reliance on string/regex processing.

@yonghong-song
Collaborator

[buildbot, test this please]

@10ne1
Contributor Author

10ne1 commented Apr 5, 2019

[buildbot, test this please]

@yonghong-song Looks like I can't trigger jenkins build jobs? :) The packaging issues should be fixed now.

@yonghong-song
Collaborator

[buildbot, test this please]

@yonghong-song
Collaborator

Thanks, just requested buildbot.

@yonghong-song
Collaborator

@joelagnel You are the original author of bpfd. Could you also help review this patch? Thanks.

@10ne1
Contributor Author

10ne1 commented Apr 5, 2019

I've fixed the static C test NULL callback segfault and am currently looking into the stackcount test flakiness (it happens because bpfd sometimes hangs when the sys_bpf() syscall cmd is BPF_MAP_GET_NEXT_KEY).

@yonghong-song
Collaborator

[buildbot, test this please]

@10ne1
Contributor Author

10ne1 commented Apr 5, 2019

@yonghong-song I also fixed the stackcount flakiness: it was caused by sys_bpf() BPF_MAP_GET_NEXT_KEY being implemented in this commit with different semantics than what was implemented here and expected by BPFd.

Basically, if the get_next_key key argument is not valid, your implementation still returns the first key (value 0 in our case) instead of the -ENOENT which bpfd expected from the alternative implementation.

I fixed it by making BPFd detect when the same key is returned multiple times; when that happens, there are no more elements to delete.

 do {
     (...)
     ret = bpf_delete_elem(map_fd, kbin);
     (...)
     ret = bpf_get_next_key(map_fd, kbin, next_kbin);
+    if (!memcmp(kbin, next_kbin, klen))
+      same_keys = 1;
     (...)
 } while (ret >= 0 && !same_keys);
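The two BPF_MAP_GET_NEXT_KEY behaviors at the heart of this bug can be modeled in a few lines of Python. This is a toy sketch, not bpfd or kernel code, and the function names are made up for illustration:

```python
# Toy model of the two BPF_MAP_GET_NEXT_KEY semantics discussed above,
# for a map whose keys iterate in a fixed order. Not real bpfd/kernel code.

def get_next_key_enoent(keys, key):
    # Semantics BPFd expected: an invalid (not-present) key is an error,
    # -ENOENT, modeled here by returning None.
    if key not in keys:
        return None
    i = keys.index(key)
    return keys[i + 1] if i + 1 < len(keys) else None

def get_next_key_first(keys, key):
    # Semantics of the commit referenced above: an invalid key restarts
    # iteration from the first key instead of failing.
    if not keys:
        return None
    if key not in keys:
        return keys[0]
    i = keys.index(key)
    return keys[i + 1] if i + 1 < len(keys) else None

keys = [0, 1, 2]
assert get_next_key_enoent(keys, 99) is None   # delete loop would stop here
assert get_next_key_first(keys, 99) == 0       # delete loop restarts at key 0
```

With the second behavior, a delete loop that waits for -ENOENT never sees an error and can keep receiving the same key over and over, which is why comparing the returned key against the previous one is a sound termination check.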

@10ne1 force-pushed the master-bpfd-upstream branch 2 times, most recently from 3dfac73 to 4de9918 on April 8, 2019 10:21
@russoue

russoue commented Apr 17, 2019

@10ne1, I am interested in this PR. Do you have any doc with some examples on how to use the daemon? Is there any doc on the API? I see that the daemon reads from stdin, where can I learn how to construct a command like cmd = "BPF_ATTACH_UPROBE {} {} {} {} {} {}".format(fd, t, evname, binpath, offset, pid)?

@10ne1
Contributor Author

10ne1 commented Apr 17, 2019

@russoue I'm preparing a blog post containing the info you're asking for, but it's not ready yet. I've already published two parts here and here, but they're about eBPF in general, not BPFd specifically.

To briefly answer your questions: the daemon communicates via stdin/stdout, but you shouldn't interact with it directly (except maybe for debugging the bpfd binary itself). All communication is handled by BCC; you only need to export some environment variables and then run the bcc-tools as you normally would. See this commit message.
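As an illustration of that environment-variable workflow, here is a minimal sketch (not code from this PR): the variable names are the ones introduced by the ssh backend added in this PR, and the user/address values are made up.

```python
# Select bpfd's ssh remote backend before bcc is loaded, so that BCC
# proxies its operations to the remote daemon instead of the local kernel.
import os

os.environ["BCC_REMOTE"] = "ssh"
os.environ["BCC_REMOTE_SSH_USER"] = "root"          # example value
os.environ["BCC_REMOTE_SSH_ADDR"] = "192.168.1.10"  # example value
# Optional: BCC_REMOTE_SSH_PORT, BCC_REMOTE_SSH_CMD

# from bcc import BPF  # from here on, tools would talk to the remote bpfd
```

The same variables can equally be exported in the shell before running an unmodified bcc tool.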

@joelagnel
Contributor

Sorry I missed this. I will review it today. Thanks

@yonghong-song
Collaborator

@joelagnel Thanks!

jcanseco and others added 7 commits April 23, 2019 15:07
This patch adds BPFd, a standalone executable designed to provide BCC the
ability to work across system and architecture boundaries.

This is done by loading the BPFd executable onto a remote target device to
have it act as a proxy for whenever a BCC tool wishes to perform an operation on
the system (e.g. load BPF programs, read /proc/kallsyms, attach kprobes, etc.).

This arrangement allows developers to have kernel sources and the LLVM stack
on a separate host machine (e.g. the development machine) instead of needing
to set all these up on the target device, thereby allowing for the drastic
reduction of space required on a target for BCC tools to run. The reduction of
the space requirement, in particular, becomes a much more critical factor
for devices that have more limited disk space (e.g. embedded devices).

In addition, the above set-up also allows developers to run clang on a
different architecture than the target's architecture, thus facilitating
cross-compilation development.

However, the natural disadvantage for cross-developers is that there is a
need to have a copy of the target's kernel sources on the host for the
above set-up to work.

For more information, please check out the README in the original
BPFd repository (https://github.com/joelagnel/bpfd) and this LWN article
explaining the purpose of and how BPFd works in more detail
(https://lwn.net/Articles/744522/)

Signed-off-by: Jazel Canseco <jcanseco@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
The bpf shared library (i.e. libbpf.so) is included since the bpfd
executable dynamically links to it.

Signed-off-by: Jazel Canseco <jcanseco@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
This modifies BCC to query BPFd whenever BCC wishes to perform an
operation on a remote target device (e.g. load BPF programs, read
/proc/kallsyms, attach kprobes, etc.)

If no remote target device has been configured, BCC defaults to
performing the operations on the local system just like before.

Signed-off-by: Jazel Canseco <jcanseco@google.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Jazel Canseco <jcanseco@google.com>
libbcc is big because it links to llvm/clang, so by linking directly to it
bpfd has a much bigger footprint than necessary. Since we don't want
to compile restricted-C to eBPF on embedded devices running just bpfd,
we can achieve a much smaller disk footprint by linking only the
minimum objects required by bpfd to run and communicate with the host bcc
to load/unload the pre-compiled eBPF programs.

Before this commit:
$ ldd src/cc/bpfd/bpfd
   linux-vdso.so.1 (0x00007ffecc3df000)
   libz.so.1 => /usr/lib/libz.so.1 (0x00007f02b5606000)
   libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f02b5601000)
   libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007f02b5592000)
   libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f02b5571000)
   libelf.so.1 => /usr/lib/libelf.so.1 (0x00007f02b5557000)
   libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f02b53c8000)
   libm.so.6 => /usr/lib/libm.so.6 (0x00007f02b5241000)
   libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f02b5227000)
   libc.so.6 => /usr/lib/libc.so.6 (0x00007f02b5063000)
   /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f02b7a67000)
$ ls -lsha src/cc/bpfd/bpfd
40M -rwxr-xr-x 1 adi adi 40M Mar 25 10:59 src/cc/bpfd/bpfd

After:
$ ldd ./src/cc/bpfd/bpfd
   linux-vdso.so.1 (0x00007ffffc5f1000)
   libelf.so.1 => /usr/lib/libelf.so.1 (0x00007f93b60d6000)
   libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f93b5f47000)
   libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f93b5f2d000)
   libc.so.6 => /usr/lib/libc.so.6 (0x00007f93b5d69000)
   libz.so.1 => /usr/lib/libz.so.1 (0x00007f93b5b52000)
   /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f93b614f000)
   libm.so.6 => /usr/lib/libm.so.6 (0x00007f93b59cd000)
$ ls -lsha ./src/cc/bpfd/bpfd
204K -rwxr-xr-x 1 adi adi 203K Mar 23 19:18 ./src/cc/bpfd/bpfd

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Building bpfd should not require also building full BCC, because bpfd
is much smaller (~180kb release binary), has fewer dependencies, and is
easier to cross-compile and run on resource-constrained embedded
devices without LLVM/Python.

This enables standalone bpfd builds, for example, to cross-compile
bpfd for 32bit arm systems:

$ mkdir build; cd build
$ cmake -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchain-arm.cmake ..
$ make

When running the above cross-build instructions, the proper
cross-compilation dependencies (libelf, libz, libstdc++) and
toolchains are needed (can be added to CMAKE_FIND_ROOT_PATH).

Bpfd is also built when compiling full BCC.

Distribution packagers can package bpfd stand-alone like the BCC
python bindings or together with the rest of bcc.

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
This backend allows BCC to connect to and run eBPF on remote targets
with bpfd's stdin/stdout attached to ssh sockets. It deliberately
avoids the complications of ssh authentication and authorization,
so the ssh communication channel should be set up before BCC runs.

The environment variables necessary to use this backend:
BCC_REMOTE=ssh
BCC_REMOTE_SSH_USER=<user>
BCC_REMOTE_SSH_ADDR=<ip-addr-or-hostname>

Optional variables:
BCC_REMOTE_SSH_CMD=<cmd> (default runs 'bpfd' in $PATH)
BCC_REMOTE_SSH_PORT=<port> (default 22)

Special privileges (usually root/sudo access) to run BCC need to be
present only on the remote machine which runs bpfd; the local host
running python/bcc/llvm can run as a normal user.

This backend can also be used on machines with different architectures
than x86 by enabling BCC cross-compilation via the ARCH environment
variable (example ARCH=arm64).

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
@yonghong-song
Collaborator

bpfd here intends to solve a problem where the remote host has limited memory/disk space to run full-blown bcc. I have been thinking about whether the alternative, compile-once-run-everywhere (CO-RE) (https://lwn.net/Articles/773198/), is easier. The goal of CO-RE is to compile bcc programs into a .o which can be loaded into the remote host for execution with bcc. Yes, bcc will need an interface for loading object files, which has been our long-term goal as well.

With CO-RE we can solve the same problem: the bcc program with CO-RE should be slimmer in size, as we can avoid linking the vast llvm/clang libraries. The bcc program should take less memory at run time as well. From a timing perspective this should be better too, as the long latency between different bpf syscalls is avoided. CO-RE support should also be much less intrusive than bpfd.

@anakryiko and I have been working on CO-RE for some time. @anakryiko will present some preliminary findings in LSFMM 2019 BPF track (https://lwn.net/Articles/779206/).

@10ne1
Contributor Author

10ne1 commented Apr 24, 2019

@yonghong-song using CO-RE is indeed interesting, because the produced .o file contains BTF type info which can alleviate some of the pains I'm experiencing cross-compiling the current BCC implementation for 32-bit ARM. In terms of portability, it definitely sounds like the right solution.

A few questions just to make sure I understand correctly its relation to BPFd:

  • Is CO-RE planning to eliminate the current on-the-fly compilation done by libbcc linking & calling llvm/clang at run-time?
  • Will the restricted-C part of BCC programs be compiled just once at BCC build time? Am I correct in assuming the python/lua parts will remain mostly intact and thus BCC programs will still depend on a python runtime? (BPFd removes the python runtime dependency at the cost of increased complexity.)
  • To load a CO-RE .o file remotely, something like BPFd is still needed on the remote device to communicate with the host BCC, unless the plan is to discard BPFd entirely and install the slimmer / more lightweight version of BCC on the remote device. Is this the direction you're suggesting?

If I understand it correctly then I agree with using CO-RE: I would gladly accept installing python & BCC on the remote device and skipping all the BPFd complexity if clang/LLVM goes away.

Do you have some more details, docs or changes I could look at?

@joelagnel
Contributor

joelagnel commented Apr 24, 2019

Hi @10ne1 @yonghong-song

I wanted to discuss all the projects we are working on at Android how it relates to all of this:

(1)
This patch is to remove filesystem dependency on kernel headers:
https://lore.kernel.org/patchwork/patch/1061170/
I have a userspace patch that @4ast has been reviewing which is ready for merge IIUC:
#2312

(2)
About bpfd: I felt that running BCC programs remotely is harder and has a lot of disadvantages, which is why we started using a chroot hack to run BCC on-device inside a debian shell. The main issues are the limited feature set and the maintenance nightmare: every API in libbpf has to be proxied through bpfd, which is not maintainable. The other disadvantage of bpfd is that it works only for the python front-end. All these reasons are why I feel BPFd should not be merged, and why I didn't push for it to be merged as-is.
I am not fully sure how CO-RE can get rid of the runtime compilation that BCC needs, since program sources are dynamically generated. @yonghong-song could you explain more?

(3) bpftrace on Android: Michał Gregorczyk (from fb) has been working on this. I believe the first step to get that working easily is the kernel headers work mentioned in step (1). The next step is then to get a slimmed-down version of LLVM/Clang/Python that has a small memory/disk footprint.

CC @anakryiko

@yonghong-song
Collaborator

@10ne1 to answer your questions below:

Is CO-RE planning to eliminate the current on-the-fly compilation done by libbcc linking & calling llvm/clang at run-time?
Yes.

Will the restricted-C part of BCC programs be compiled just once at BCC build time? Am I correct in assuming the python/lua parts will remain mostly intact and thus BCC programs will still depend on a python runtime? (BPFd removes the python runtime dependency at the cost of increased complexity).
It has not been decided when BCC programs will be compiled. My hack is to take the bcc rewriter output, compile it into an object file, and load that object file into the bcc-based application.
We could define a few macros and conventions for C programs so that we do not need to take the rewriter output. Yes, without the rewriter output, the user has to explicitly use bpf_probe_read.
I also have not looked at USDT yet, but I think it is doable.

I have no plan to change lua.

For python, it can take an object file as well. The bcc program does not depend on the python runtime; the python runtime gets information about the bcc program through python/C++ interfaces.

To load a CO-RE .o file remotely, something like BPFd is still needed on the remote device to communicate with the host BCC, unless the plan is to discard BPFd entirely and install the slimmer / more lightweight version of BCC on the remote device. Is this the direction you're suggesting?

The lightweight version of BCC can be installed and the object files will be supplied. This is the direction I would like to pursue. The main issue with bpfd is the long latency for every bpf syscall, which will severely skew the results.

Do you have some more details, docs or changes I could look at?
I will share the LSFMM CO-RE presentation as soon as it is done by @anakryiko.
I do not have any docs yet. I will start to hash out the details and share them on the bcc mailing list.

@yonghong-song
Collaborator

yonghong-song commented Apr 24, 2019

@joelagnel
Thanks for summarizing your work!

(1)
This patch is to remove filesystem dependency on kernel headers:
https://lore.kernel.org/patchwork/patch/1061170/
I have a userspace patch that @4ast has been reviewing which is ready for merge IIUC:
#2312

This sounds great. Thanks!

(2)
About bpfd: I felt that running BCC programs remotely is harder and has a lot of disadvantages, which is why we started using a chroot hack to run BCC on-device inside a debian shell. The main issues are the limited feature set and the maintenance nightmare: every API in libbpf has to be proxied through bpfd, which is not maintainable. The other disadvantage of bpfd is that it works only for the python front-end. All these reasons are why I feel BPFd should not be merged, and why I didn't push for it to be merged as-is.
I am not fully sure how CO-RE can get rid of the runtime compilation that BCC needs, since program sources are dynamically generated. @yonghong-song could you explain more?

We may have to compromise and generate slightly bloated programs. But this may or may not work for all programs; we will have to experiment.
I will share more once I have more :-).

Another way, based on the specific options: the object file is generated on the development host, then copied to the target host, and the target host runs bcc with the supplied obj file. This is similar to bpfd, but should be much simpler as the daemon just takes the remote objfile and command line and executes it.

@10ne1
Contributor Author

10ne1 commented Apr 24, 2019

Thank you @yonghong-song for the additional info. The path you're suggesting sounds good and I'll eagerly await more news. Once the BTF-based implementation is further along, we can revisit whether a simpler daemon is needed or whether the object files should be supplied directly to BCC on the target device.

Thank you both @joelagnel and @yonghong-song for all your hard work on this. Shall I close this PR?

@joelagnel
Contributor

joelagnel commented Apr 24, 2019 via email

@joelagnel
Contributor

@joelagnel

(2)
About bpfd: I felt that running BCC programs remotely is harder and has a lot of disadvantages, which is why we started using a chroot hack to run BCC on-device inside a debian shell. The main issues are the limited feature set and the maintenance nightmare: every API in libbpf has to be proxied through bpfd, which is not maintainable. The other disadvantage of bpfd is that it works only for the python front-end. All these reasons are why I feel BPFd should not be merged, and why I didn't push for it to be merged as-is.
I am not fully sure how CO-RE can get rid of the runtime compilation that BCC needs, since program sources are dynamically generated. @yonghong-song could you explain more?

We may have to compromise and generate slightly bloated programs. But this may or may not work for all programs; we will have to experiment.
I will share more once I have more :-).

Another way, based on the specific options: the object file is generated on the development host, then copied to the target host, and the target host runs bcc with the supplied obj file. This is similar to bpfd, but should be much simpler as the daemon just takes the remote objfile and command line and executes it.

@yonghong-song something like the trace BCC tool will probably be too difficult to implement without a dynamic runtime compilation process.

It would be nice if we could include just a small subset of the llvm/clang infrastructure, enough to compile programs on small memory/storage systems. I had started to look into that in #2218 but I did not get time. @10ne1 you are welcome to look more into #2218 if you have some time.

@yonghong-song
Collaborator

something like the trace BCC tool will probably be too difficult to implement without a dynamic runtime compilation process.

That is right. We may have to compile trace.py programs on the fly, or we will have to compile the code on the dev host and send it to the target host. So the project to reduce the llvm/clang size is still worth pursuing! Thanks!

@michalgr
Contributor

Hi

It would be nice if we could include just a small subset of the llvm/clang infrastructure, enough to compile programs on small memory/storage systems. I had started to look into that in #2218 but I did not get time. @10ne1 you are welcome to look more into #2218 if you have some time.

I have some numbers that might be interesting in this context. I am cross-compiling bcc, bpftrace and python for Android (arm64) plus dependencies using the makefiles available here, and at the end I get a sysroot of about 400MB, which is rather a lot. Some of the biggest contributors:

  • bin/bpftrace 40MB
  • bin/python3 12MB
  • lib/libbcc.so 105MB
  • lib/libclang.so 70MB
  • bpftools-0.0.1/lib/python3.6/* 137MB

It seems that python needs to be included in the list of things to reduce the size of as well (I asked about some ideas here: https://bugs.python.org/issue36735)

If I understand it correctly then I agree with using CO-RE: I would gladly accept installing python & BCC on the remote device and skipping all the BPFd complexity if clang/LLVM goes away.

From what I am seeing, python on its own is quite a lot. Is there a slimmed-down version that you're using?

@10ne1
Contributor Author

10ne1 commented Apr 26, 2019

It seems that python needs to be included in the list of things to reduce the size of as well (I asked about some ideas here: https://bugs.python.org/issue36735)

Agreed.

From what I am seeing, python on its own is quite a lot. Is there a slimmed-down version that you're using?

Yes, I'm using Yocto/OpenEmbedded based distributions where I have full control of packaging, so I can leave out stuff I don't need. There are also other tricks: for example, you can compress all the python libraries, because they can be executed from inside .zip files, see here and here. For a complete concrete example, look at this recipe; it's for python 2, but you can get the idea and adapt/port it to python 3.
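To illustrate the .zip trick, here is a generic Python sketch (not from the recipe mentioned above; the archive path and module name are made up):

```python
# Python can import pure-python modules directly from a zip archive via
# zipimport, which is what makes shipping compressed libraries work.
import os
import sys
import tempfile
import zipfile

tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "libs.zip")

# Pack an example module into a compressed archive.
with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.py", "def greet():\n    return 'hi from a zip'\n")

sys.path.insert(0, archive)  # zipimport transparently handles imports
import hello
print(hello.greet())  # prints: hi from a zip
```

The same mechanism is how a distro can ship the python standard library as a single compressed archive instead of thousands of loose files.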

@joelagnel
Contributor

@10ne1 @russoue could you send me an email at joel at joelfernandes.org? I want to collect email addresses of people working on bpf/bcc for Android or mobile systems and coordinate any efforts. I spoke to @michalgr as well and he is quite interested in working on it. I don't think we need a separate mailing list; we can just use the bpf@vger.kernel.org list. However, I still want to collect people's email addresses so we can include them on CC on any work related to this. I think so far I have around 5-6 people who are working in this area. Thanks.

@Kullu14

Kullu14 commented May 7, 2019

@joel I am also interested in working on BCC/bpf on embedded systems. I have tried bpfd on a remote target setup using telnet. Can you please include me in the mailing list too?

@joelagnel
Contributor

joelagnel commented May 7, 2019 via email

7 participants