# BPF
* [eBPF - wikipedia](https://en.wikipedia.org/wiki/EBPF)

TODO: 2025-02-14
* bpftrace Programming - BPF Performance Tools Chapter 5
* Linux Observability with BPF
  * Chapter 5. BPF Utilities
    * bpftool: `/home/zhoujiagen/WSL2-Linux-Kernel/tools/bpf/bpftool`
    * bpftrace: https://github.com/bpftrace/bpftrace
    * kubectl-trace: https://github.com/iovisor/kubectl-trace/
    * ebpf_exporter: https://github.com/cloudflare/ebpf_exporter
  * Chapter 6. Linux Networking and BPF - Networking
  * Chapter 7. Express Data Path - Networking
  * Chapter 8. Linux Kernel Security, Capabilities, and Seccomp - Security

# Terminology

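* benchmark: running a workload experiment against the system, which changes the system's state.
* instrumentation:
  * dynamic instrumentation: instrumented functions may be renamed, so probe names are not stable.
    * kprobes, uprobes
  * static instrumentation: has stable event names.
    * tracepoints, USDT (user-level statically defined tracing)
* observability: understanding a system through observation. Covers tracing tools, sampling tools, and tools based on fixed counters, but not benchmark tools.
* profiling: performance analysis; building a profile of the target.
* sampling: taking a subset of measurements to paint a coarse picture of the target; also known as creating a profile (profiling).
* snooping: peeking at events as they happen.
* tracing: event-based recording; this is the type of instrumentation that BPF tools use.
  * `strace(1)`
  * `tcpdump(8)`
  * `execsnoop` ...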


Technology terms:
* Stack Trace Walking
* Flame Graphs
* Event Sources
* kprobes: Kprobes enables you to dynamically break into any kernel routine and collect debugging and performance information non-disruptively. You can trap at almost any kernel code address, specifying a handler routine to be invoked when the breakpoint is hit.<br/>Kernel Probes (Kprobes): https://www.kernel.org/doc/html/latest/trace/kprobes.html
* uprobes
* Tracepoints: A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be “on” (a probe is connected to it) or “off” (no probe is attached).<br/>Using the Linux Kernel Tracepoints: https://www.kernel.org/doc/html/latest/trace/tracepoints.html
* USDT(User-level Statically Defined Tracing)
* Dynamic USDT
* PMC(Performance Monitoring Counters)
* perf_events
* Event tracing: Tracepoints can be used without creating custom kernel modules to register probe functions using the event tracing infrastructure.<br/>Event Tracing: https://www.kernel.org/doc/html/latest/trace/events.html


## Stack Trace Walking

- frame pointer-based stack walk

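The head of the linked list of stack frames can always be found in a register (RBP on x86_64), and the return address is stored at a fixed offset from it.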

- debuginfo

Debuginfo files: contain ELF debug information (in DWARF format).

- LBR: last branch record

An Intel processor feature: branches, including function call branches, are recorded in a hardware buffer.

- ORC-based stack walk: Oops Rewind Capability

A newer debug information format.
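
The frame pointer-based stack walk listed first in this section can be modeled in a few lines. This is a toy sketch over a fake memory dictionary, not how the kernel or any real unwinder is implemented: the addresses, the `walk_stack` name, and the function names in the comments are all made up for illustration. It assumes the x86_64 convention that each frame stores the caller's saved RBP at `[rbp]` and the return address at `[rbp + 8]`.

```python
# Toy model of frame-pointer stack walking (illustrative only).
# Assumption: x86_64 layout, where [rbp] holds the caller's saved RBP
# and [rbp + 8] holds the return address.

WORD = 8  # bytes per stack slot on x86_64


def walk_stack(memory, rbp, max_depth=64):
    """Follow the saved-RBP chain and collect return addresses."""
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        return_addresses.append(memory[rbp + WORD])  # return address
        rbp = memory[rbp]                            # caller's frame pointer
    return return_addresses


# Hypothetical stack: three frames, innermost first; a saved RBP of 0 ends the chain.
memory = {
    0x7000: 0x7020, 0x7008: 0x401136,  # frame of a hypothetical leaf()
    0x7020: 0x7040, 0x7028: 0x401180,  # frame of a hypothetical middle()
    0x7040: 0x0,    0x7048: 0x4011C2,  # frame of a hypothetical main()
}

frames = walk_stack(memory, rbp=0x7000)
print([hex(addr) for addr in frames])
```

The `max_depth` guard reflects a practical concern: a corrupted or missing frame pointer chain must not make the walker loop forever.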

# Architecture
* [BPF Architecture](https://docs.cilium.io/en/latest/reference-guides/bpf/architecture/): in cilium

Figure 2-1 BPF tracing technologies - BPF Performance Tools

* instruction set
* maps: key/value stores
* helper functions: to interact with and leverage kernel functionality
* tail calls: for calling into other BPF programs
* security hardening primitives
* a pseudo file system: pin objects(map, program)
* infrastructure for allowing BPF to be offloaded (ex. to a network card)

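clang compiles C into a BPF object file, which is then loaded into the kernel.

Kernel subsystems that use BPF:
* tc: runs at a later stage of the network stack, with access to more metadata and core kernel functionality.
* XDP: attached at the earliest network driver stage; the BPF program runs for every packet received.
* tracing: kprobes, uprobes, tracepoints.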


## Instructions

- Classic BPF: [filter.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/filter.h) and [bpf_common.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf_common.h)
- Extended BPF: [bpf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h) and [bpf_common.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf_common.h)
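
To make the two instruction formats concrete, the sketch below packs one classic and two extended instructions by hand. The opcode constants are copied from the headers linked above; the helper names `cbpf_insn` and `ebpf_insn` are invented here, and the byte layout assumes a little-endian host (on big-endian machines the register nibbles and byte order differ).

```python
import struct

# Opcode pieces from bpf_common.h and bpf.h (uapi headers).
BPF_RET, BPF_K = 0x06, 0x00       # classic: return a constant
BPF_ALU64, BPF_MOV = 0x07, 0xB0   # extended: 64-bit register move
BPF_JMP, BPF_EXIT = 0x05, 0x90    # extended: program exit


def cbpf_insn(code, jt, jf, k):
    """Pack a classic BPF instruction: struct sock_filter {u16 code; u8 jt; u8 jf; u32 k}."""
    return struct.pack("<HBBI", code, jt, jf, k)


def ebpf_insn(code, dst, src, off, imm):
    """Pack an extended BPF instruction: struct bpf_insn {u8 code; u8 dst:4, src:4; s16 off; s32 imm}."""
    regs = (src << 4) | dst  # assumption: little-endian bit-field layout
    return struct.pack("<BBhi", code, regs, off, imm)


# Classic program: "ret #262144" (accept the packet, up to 262144 bytes).
cbpf_prog = cbpf_insn(BPF_RET | BPF_K, 0, 0, 262144)

# Extended program: "r0 = 0; exit".
ebpf_prog = (
    ebpf_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst=0, src=0, off=0, imm=0)
    + ebpf_insn(BPF_JMP | BPF_EXIT, dst=0, src=0, off=0, imm=0)
)

print(cbpf_prog.hex())
print(ebpf_prog.hex())
```

Both formats use 8 bytes per instruction; the extended format trades the `jt`/`jf` jump fields for register numbers and a signed offset.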

## bpf system call

In [2]:
!man 2 bpf

BPF(2)                     Linux Programmer's Manual                    BPF(2)

NAME
       bpf - perform a command on an extended BPF map or program

SYNOPSIS
       #include <linux/bpf.h>

       int bpf(int cmd, union bpf_attr *attr, unsigned int size);

DESCRIPTION
       The  bpf()  system  call  performs a range of operations related to ex‐
       tended Berkeley Packet Filters.  Extended BPF (or eBPF) is  similar  to
       the  original  ("classic")  BPF  (cBPF) used to filter network packets.
       For both cBPF and eBPF programs, the  kernel  statically  analyzes  the
       programs  before loading them, in order to ensure that they cannot harm
       the running system.

       eBPF extends cBPF in multiple ways, including the  ability  to  call  a
       fixed set of in-kernel helper functions (via the BPF_CALL opcode exten‐
       sion provided by eBPF) and access shared data structures such  as  eBPF
       maps.

   Extended BPF Design/Architecture
       eBPF  maps  a

## /sys/fs/bpf

In [None]:
!tree /sys/fs/bpf

/sys/fs/bpf

0 directories, 0 files


# Tools

## Related tools

## core

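* BPF in Linux kernel: the in-kernel BPF runtime, including the instruction set, storage objects (maps), and helper functions, as well as the interpreter, JIT compiler, and verifier.
* BCC: provides kernel instrumentation written in C, with Python and Lua front ends.
* bpftrace: a high-level tracing language for Linux. It uses LLVM as a backend to compile scripts to eBPF bytecode, and uses libbpf and bcc to interact with the Linux BPF subsystem and the existing tracing capabilities.
  * Existing Linux tracing capabilities: kprobes, uprobes, tracepoints, etc.
* [ply](https://github.com/iovisor/ply): a lightweight dynamic tracer for Linux with few external dependencies.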
* [libbcc](https://github.com/iovisor/bcc/blob/master/src/cc/libbcc.pc.in): BCC Program library
* [libbpf](https://github.com/libbpf/libbpf): libbpf is a C-based library containing a BPF loader that takes compiled BPF object files and prepares and loads them into the Linux kernel. libbpf takes the heavy lifting of loading, verifying, and attaching BPF programs to various kernel hooks, allowing BPF application developers to focus only on BPF program correctness and performance.

## related

* LLVM
* kprobes
* uprobes
* tracepoints
* perf(1): the Linux performance tool suite.
* Ftrace: Ftrace is an internal tracer designed to help out developers and designers of systems to find what is going on inside the kernel. It can be used for debugging or analyzing latencies and performance issues that take place outside of user-space.<br/>ftrace - Function Tracer: https://www.kernel.org/doc/html/latest/trace/ftrace.html
* Dynamic instrumentation: DTrace, SystemTap, BCC, bpftrace, ...
* LTT: first Linux tracer in 1999
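* Dprobes: 2000, led to kprobes
* DTrace: an observability framework that includes a programming language and tools. Through instrumentation points called probes, it can observe all user-level and kernel-level code. http://dtrace.org/blogs/about/ https://github.com/dtrace4linux
* SystemTap: provides static and dynamic tracing of user-level and kernel-level code: static probes use tracepoints, dynamic probes use kprobes, and user-level probes use uprobes. https://sourceware.org/systemtap/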
* ktap: for VM-based tracers

## misc

* strace: system call tracing on Linux.
  - [strace: linux syscall tracer](https://strace.io/)
  - [strace(1) — Linux manual page](https://man7.org/linux/man-pages/man1/strace.1.html)
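* oprofile: Linux system profiling.
* `/proc`: a file system interface to kernel statistics.
* `/sys`: the sysfs file system, which provides a directory-based structure for kernel statistics.
* blktrace: block I/O tracing.
* tcpdump: network packet tracing, built on the libpcap library. http://man7.org/linux/man-pages/man1/tcpdump.1.html
* pmap: memory segments of a process, with usage statistics.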
* [KUtrace](https://github.com/dicksites/KUtrace): KUtrace is an extremely low-overhead Linux kernel tracing facility for observing all the execution time on all cores of a multi-core processor, nothing missing, while running completely unmodified user programs written in any computer language. It has been used in live datacenters (x86 processors) and in real-time autonomous driving (ARM processors) to understand long-standing performance mysteries. The design goal of KUtrace is to reveal the root cause(s) of unexpected delayed responses in real-time transactions or database processing while having such low overhead that it does not distort the system under test.


* ping: send ICMP ECHO_REQUEST to network hosts. http://man7.org/linux/man-pages/man8/ping.8.html
* nicstat: print network traffic statistics.
* dstat: versatile tool for generating system resource statistics. Dstat is a versatile replacement for **vmstat**, **iostat** and **ifstat**.
* ifstat: Report InterFace STATistics. Ifstat is a little tool to report interface activity, just like iostat/vmstat do for other system statistics.
* netstat: statistics for network interfaces and the TCP/IP stack. Print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships. [http://man7.org/linux/man-pages/man8/netstat.8.html](http://man7.org/linux/man-pages/man8/netstat.8.html)
* pidstat: Report statistics for Linux tasks. http://man7.org/linux/man-pages/man1/pidstat.1.html
* btrace, blktrace:
  - btrace: perform live tracing for block devices. http://man7.org/linux/man-pages/man8/btrace.8.html
  - blktrace: generate traces of the I/O traffic on block devices. http://man7.org/linux/man-pages/man8/blktrace.8.html
* iotop: simple top-like I/O monitor. http://man7.org/linux/man-pages/man8/iotop.8.html
* slabtop: display kernel slab cache information in real time. http://man7.org/linux/man-pages/man1/slabtop.1.html

## BPF tools
![](https://www.brendangregg.com/BPF/bpf_performance_tools.png)

# BCC
* https://github.com/iovisor/bcc
* [bcc Reference Guide](https://github.com/iovisor/bcc/blob/master/docs/reference_guide.md): 参考指南.
* [bcc Python Developer Tutorial](https://github.com/iovisor/bcc/blob/master/docs/tutorial_bcc_python_developer.md): Python开发教程.

BCC was the first higher-level tracing framework developed for BPF.

It provides a C programming environment for writing kernel BPF code and other languages for the user-level interface: Python, Lua, and C++.


Actions: [BCC.ipynb](./BCC.ipynb)

## Features

Kernel Space:

- dynamic instrumentation, kernel-level: kprobes
- dynamic instrumentation, user-level: uprobes
- static tracing, kernel-level: tracepoints
- timed sampling events: BPF with `perf_event_open()`
- PMC events: BPF with `perf_event_open()`
- filtering: via BPF programs
- debug output: `bpf_trace_printk()`
- per-event output: `bpf_perf_event_output()`
- basic variables: global and per-thread variables, via BPF maps
- associative arrays: via BPF maps
- frequency counting: via BPF maps
- histograms: power-of-two, linear, custom, via BPF maps
- timestamps and time deltas: `bpf_ktime_get_ns()` and BPF programs
- stack trace, kernel: BPF stackmap
- stack trace, user: BPF stackmap
- overwrite ring buffers: `perf_event_attr.write_backward`
- low-overhead instrumentation: BPF JIT, BPF map summaries
- production safe: BPF verifier

User Space:

- static tracing, user-level: SystemTap-style USDT probes, via uprobes
- debug output: Python with `BPF.trace_pipe()` and `BPF.trace_fields()`
- per-event output: `BPF_PERF_OUTPUT` macro and `BPF.open_perf_buffer()`
- interval output: `BPF.get_table()` and `table.clear()`
- histogram printing: `table.print_log2_hist()`
- C struct navigation, kernel-level: BCC rewriter maps to `bpf_probe_read()`
- symbol resolution, kernel-level: `ksym()`, `ksymaddr()`
- symbol resolution, user-level: `usymaddr()`
- debuginfo symbol resolution support
- BPF tracepoint support: via `TRACEPOINT_PROBE`
- BPF stack trace support: `BPF_STACK_TRACE`
- various other helper macros and functions
- examples: under `/examples`
- tools: under `/tools`
- tutorials: under `/docs/tutorial*.md`
- reference guide: under `/docs/reference_guide.md`
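
The "frequency counting" and "histograms: power-of-two" entries above both boil down to keyed counters that are summarized in the kernel and read out later. The sketch below models that idea in plain Python, with a `Counter` standing in for a BPF map. It does not use BCC, and the bucket scheme (`log2_bucket`, `bucket_range`) is a simplified assumption made for this example, not BCC's exact slot numbering.

```python
from collections import Counter


def log2_bucket(value):
    """Return the power-of-two bucket index: floor(log2(value)), or 0 for value <= 1."""
    return max(value, 1).bit_length() - 1


def bucket_range(index):
    """Inclusive value range covered by a bucket."""
    low = 0 if index == 0 else 1 << index
    high = (1 << (index + 1)) - 1
    return low, high


# Hypothetical latencies in microseconds, as a tracing program might record them.
latencies_us = [1, 3, 5, 6, 7, 12, 130, 140, 900]

# The Counter stands in for a BPF map keyed by bucket index.
hist = Counter(log2_bucket(v) for v in latencies_us)

for index in sorted(hist):
    low, high = bucket_range(index)
    print(f"{low:>6} -> {high:<6} : {hist[index]:<3} {'*' * hist[index]}")
```

Only the small bucket table has to cross from kernel to user space, which is where the low overhead of map-based summaries comes from.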

## Installation

WSL: https://github.com/iovisor/bcc/blob/master/INSTALL.md#wslwindows-subsystem-for-linux---binary

In [43]:
%env ROOT_PWD=xxx

env: ROOT_PWD=xxx


In [3]:
!echo $ROOT_PWD | sudo -S apt-get install flex bison libssl-dev libelf-dev dwarves bc

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
bc is already the newest version (1.07.1-3build1).
bc set to manually installed.
Suggested packages:
  bison-doc flex-doc libssl-doc
The following NEW packages will be installed:
  bison dwarves flex libelf-dev libfl-dev libssl-dev
0 upgraded, 6 newly installed, 0 to remove and 43 not upgraded.
Need to get 3856 kB of archives.
After this operation, 19.7 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 dwarves amd64 1.21-0ubuntu1~20.04.1 [359 kB]
Get:2 http://th.archive.ubuntu.com/ubuntu jammy/main amd64 flex amd64 2.6.4-8build2 [307 kB]
Get:3 http://th.archive.ubuntu.com/ubuntu jammy/main amd64 bison amd64 2:3.8.2+dfsg-1build1 [748 kB]
Get:4 http://th.archive.ubuntu.com/ubuntu jammy/main amd64 libelf-dev amd64 0.186-1build1 [64.4 kB]
Get:5 http://th.archive.ubuntu.com/ubuntu jammy/main amd64 libfl-dev amd64 2.6.4-8build2 [6236 

In [None]:
%cd

/home/zhoujiagen


In [None]:
!pwd

/home/zhoujiagen


In [None]:
# KERNEL_VERSION=$(uname -r | cut -d '-' -f 1)
KERNEL_VERSION=!uname -r | cut -d '-' -f 1
KERNEL_VERSION=KERNEL_VERSION[0]
print(KERNEL_VERSION)
# !git clone --depth 1 https://github.com/microsoft/WSL2-Linux-Kernel.git -b linux-msft-wsl-$KERNEL_VERSION
!git clone --depth 1 https://github.com/microsoft/WSL2-Linux-Kernel.git -b linux-msft-wsl-{KERNEL_VERSION}
%cd WSL2-Linux-Kernel

!cp Microsoft/config-wsl .config
# CONFIG_IKHEADERS=m
!make oldconfig && make prepare
!make scripts
!make modules
!echo $ROOT_PWD | sudo -S make modules_install

# !mv /lib/modules/$KERNEL_VERSION-microsoft-standard-WSL2+/ /lib/modules/$KERNEL_VERSION-microsoft-standard-WSL2
!echo $ROOT_PWD | sudo -S mv /lib/modules/{KERNEL_VERSION}-microsoft-standard-WSL2+ /lib/modules/{KERNEL_VERSION}-microsoft-standard-WSL2

In [26]:
!pwd

/home/zhoujiagen/WSL2-Linux-Kernel


In [31]:
KERNEL_VERSION=!uname -r | cut -d '-' -f 1
KERNEL_VERSION=KERNEL_VERSION[0]
print(KERNEL_VERSION)
!echo $ROOT_PWD | sudo -S mv /lib/modules/5.15.153.1-microsoft-standard-WSL2+ /lib/modules/5.15.153.1-microsoft-standard-WSL2

5.15.153.1
[sudo] password for zhoujiagen: mv: cannot stat '/lib/modules/5.15.153.1-microsoft-standard-WSL2+': No such file or directory
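
The `mv` above fails because the `+`-suffixed directory does not exist, presumably because an earlier cell already renamed it. A guarded variant could look like the sketch below. The helper names are made up here, the directory naming simply follows the paths used in this notebook, and the actual rename under `/lib/modules` still needs root.

```python
import os
import platform


def base_version(release):
    """Equivalent of `uname -r | cut -d '-' -f 1`."""
    return release.split("-", 1)[0]


def module_dirs(release, root="/lib/modules"):
    """Return (built_dir, wanted_dir) for a locally built WSL2 kernel."""
    wanted = os.path.join(root, f"{base_version(release)}-microsoft-standard-WSL2")
    return wanted + "+", wanted


def rename_if_needed(release, root="/lib/modules"):
    """Rename the '+'-suffixed directory only when it actually exists."""
    built, wanted = module_dirs(release, root)
    if os.path.isdir(built) and not os.path.exists(wanted):
        os.rename(built, wanted)  # needs root for /lib/modules
        return True
    return False


# Show which directories would be involved on this machine.
print(module_dirs(platform.release()))
```

Because `rename_if_needed` returns `False` instead of raising when there is nothing to do, the cell can be re-run safely.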


In [36]:
# !echo $ROOT_PWD | sudo -S apt-get install bpfcc-tools linux-headers-$(uname -r)
!echo $ROOT_PWD | sudo -S apt-get install bpfcc-tools

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
bpfcc-tools is already the newest version (0.12.0-2).
0 upgraded, 0 newly installed, 0 to remove and 43 not upgraded.


In [37]:
!ls /sbin | grep bpfcc

argdist-bpfcc
bashreadline-bpfcc
biolatency-bpfcc
biosnoop-bpfcc
biotop-bpfcc
bitesize-bpfcc
bpflist-bpfcc
btrfsdist-bpfcc
btrfsslower-bpfcc
cachestat-bpfcc
cachetop-bpfcc
capable-bpfcc
cobjnew-bpfcc
cpudist-bpfcc
cpuunclaimed-bpfcc
criticalstat-bpfcc
dbslower-bpfcc
dbstat-bpfcc
dcsnoop-bpfcc
dcstat-bpfcc
deadlock-bpfcc
deadlock.c-bpfcc
drsnoop-bpfcc
execsnoop-bpfcc
exitsnoop-bpfcc
ext4dist-bpfcc
ext4slower-bpfcc
filelife-bpfcc
fileslower-bpfcc
filetop-bpfcc
funccount-bpfcc
funclatency-bpfcc
funcslower-bpfcc
gethostlatency-bpfcc
hardirqs-bpfcc
inject-bpfcc
javacalls-bpfcc
javaflow-bpfcc
javagc-bpfcc
javaobjnew-bpfcc
javastat-bpfcc
javathreads-bpfcc
killsnoop-bpfcc
klockstat-bpfcc
llcstat-bpfcc
mdflush-bpfcc
memleak-bpfcc
mountsnoop-bpfcc
mysqld_qslower-bpfcc
nfsdist-bpfcc
nfsslower-bpfcc
nodegc-bpfcc
nodestat-bpfcc
offcputime-bpfcc
offwaketime-bpfcc
oomkill-bpfcc
opensnoop-bpfcc
perlcalls-bpfcc
perlflow-bpfcc
perlstat-bpfcc
phpcalls-bpfcc
phpflow-bpfcc
phpstat-bpfcc
pidpersec-bpfcc
pro

Fixes applied after installation:

```python
# /usr/lib/python3/dist-packages/bcc/table.py
from collections.abc import MutableMapping
```
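
The edit above is needed because the `MutableMapping` alias in `collections` was removed in Python 3.10; the class has lived in `collections.abc` since Python 3.3. A version-tolerant form of the import, shown here as a sketch rather than as what upstream BCC ships, would be:

```python
try:
    from collections.abc import MutableMapping  # Python 3.3 and later
except ImportError:
    from collections import MutableMapping      # legacy fallback for very old interpreters

# Quick self-check: dict is registered as a MutableMapping.
print(issubclass(dict, MutableMapping))
```

Newer BCC packages may no longer need this patch, so it is worth checking the installed `table.py` before editing it.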


```c
// /lib/modules/5.15.153.1-microsoft-standard-WSL2/build/include/linux/compiler-clang.h
#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
//#define __HAVE_BUILTIN_BSWAP32__
// #define __HAVE_BUILTIN_BSWAP64__
//#define __HAVE_BUILTIN_BSWAP16__
#endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP */
```

## bpftool
* https://github.com/iovisor/bcc/tree/master/libbpf-tools

bpftool is a tool for inspection and simple manipulation of eBPF programs and maps.

# bpftrace
* https://github.com/iovisor/bpftrace

bpftrace is a newer front end that provides a special-purpose, high-level language for developing BPF tools.

Actions: [bpftrace.ipynb](./bpftrace.ipynb)

## Features

Event Sources:

- dynamic instrumentation, kernel-level: kprobe
- dynamic instrumentation, user-level: uprobe
- static tracing, kernel-level: tracepoint, software
- static tracing, user-level: usdt, via libbcc
- timed sampling events: profile
- interval events: interval
- PMC events: hardware
- synthetic events: BEGIN, END


Actions:

- filtering: predicates
- per-event output: `printf()`
- basic variables: `@global`, `$local`, `@per[tid]`
- built-in variables: `pid`, `tid`, `comm`, `nsecs`, ...
- associative arrays: `@array[key] = value`
- frequency counting: `count()`, `++`
- statistics: `min()`, `max()`, `sum()`, `avg()`, `stats()`
- histogram: `hist()`, `lhist()`
- timestamps and time deltas: `nsecs`, hash storage
- stack trace, kernel: `kstack`
- stack trace, user: `ustack`
- symbol resolution, kernel-level: `ksym()`, `kaddr()`
- symbol resolution, user-level: `usym()`, `uaddr()`
- C struct navigation: `->`
- array access: `[]`
- shell commands: `system()`
- printing files: `cat()`
- positional parameters: `$1`, `$2`, ...


General Features:

- low-overhead instrumentation: BPF JIT, maps
- production safe: BPF verifier
- tools: under `/tools`
- tutorial: `/docs/tutorial_one_liners_chinese.md`
- reference guide: `/docs/reference_guide.md`
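
To tie the event sources and actions above together: a bpftrace program consists of a probe, an optional predicate, and an action block, and it can be passed to the CLI with `-e`. The sketch below only assembles such a command line from Python (the notebook's language) and prints it. `bpftrace_cmd` is a helper invented for this example; actually running the result needs root and a kernel that exposes the `syscalls` tracepoints.

```python
import shlex


def bpftrace_cmd(probe, action, predicate=None):
    """Build an argv list for: bpftrace -e '<probe> /<predicate>/ { <action> }'."""
    filter_part = f" /{predicate}/" if predicate else ""
    program = f"{probe}{filter_part} {{ {action} }}"
    return ["bpftrace", "-e", program]


# Count openat() calls per process name, skipping PID 1.
argv = bpftrace_cmd(
    probe="tracepoint:syscalls:sys_enter_openat",
    predicate="pid != 1",
    action="@[comm] = count();",
)
print(shlex.join(argv))
```

The printed line can be pasted after `sudo` in a terminal; the `@[comm] = count();` action combines the associative array and frequency counting entries from the list above.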

## Installation

In [21]:
!echo $ROOT_PWD | sudo -S apt-get install bpftrace

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
bpftrace is already the newest version (0.9.4-1).
0 upgraded, 0 newly installed, 0 to remove and 43 not upgraded.


In [38]:
!which bpftrace

/usr/bin/bpftrace
