=== Structure of this repository ===

The driver/ directory contains the bpfhv guest driver for
Linux kernels (>= 4.18):
  - bpfhv.c: driver source code

The proxy/ directory contains the implementation of an external
backend process associated with the QEMU bpfhv-proxy network backend
(-netdev bpfhv-proxy).
The external backend provides the queue processing functionality
for a single bpfhv device that belongs to a QEMU VM. In other words,
QEMU only implements the control functionality of the device, while
RX and TX queues are processed by the external process. As with
vhost-user, QEMU and the external backend process use a dedicated
control channel to exchange the information needed for the packet
processing tasks (e.g. guest memory map, addresses of each TX and RX
queue, file descriptors for notifications, etc.).
Files:
  - backend.c: main file, implements the control protocol and
               two packet processing loops (the first one is
               a poll() event loop, whereas the second one uses
               busy-wait);
  - sring.[ch]: hv implementation of a device which uses a minimal
                descriptor format, with no support for offloads (and
                reduced per-packet overhead);
  - sring_progs.c: eBPF programs for the sring device;
  - sring_gso.[ch]: hv implementation of a device which uses an
                    extended descriptor format, supporting checksum
                    offloads and TCP/UDP segmentation offloads;
  - sring_gso_progs.c: eBPF programs for the sring_gso device;
  - vring_packed.[ch]: hv implementation of the packed virtqueue
                       in the VirtIO 1.1 specification;
  - vring_packed_progs.c: eBPF programs for the vring_packed device;
  - start-qemu.sh: an example script to start a QEMU VM with a
                   bpfhv device peered with a bpfhv-proxy network
                   backend;
  - start-proxy.sh: an example script to start the external backend
                    process and configure the backend network device
                    (e.g. a TAP interface or a netmap port).
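
The two processing loops mentioned for backend.c differ mainly in how
they wait for work. The following is a minimal sketch of the two waiting
strategies, not the code in backend.c: the function names, the use of a
producer index in shared memory and the spin budget are all assumptions
made for illustration.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not taken from backend.c): sleep in poll() until
 * the notification file descriptor becomes readable, then drain it.
 * Returns 1 if a notification arrived, 0 on timeout, -1 on error. */
int wait_kick_poll(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    uint64_t scratch;
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret < 0) {
        return -1;
    }
    if (ret == 0 || !(pfd.revents & POLLIN)) {
        return 0;
    }
    /* Consume the pending notification so the next poll() can block. */
    if (read(fd, &scratch, sizeof(scratch)) < 0) {
        return -1;
    }
    return 1;
}

/* Hypothetical helper: spin on a producer index (assumed to live in
 * memory shared with the guest) until it differs from the index already
 * consumed, or until the spin budget runs out.
 * Returns 1 if new work showed up, 0 otherwise. */
int wait_work_busy(const volatile uint32_t *prod, uint32_t cons,
                   unsigned long max_spins)
{
    assert(prod != NULL);
    while (max_spins-- > 0) {
        if (*prod != cons) {
            return 1;
        }
    }
    return 0;
}
```

The poll() variant leaves the CPU idle while the queues are empty but
pays for a system call and a wakeup on each notification. The busy-wait
variant avoids that cost at the price of keeping a core busy.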

=== Some advantages of bpfhv ===

  - Doorbells can be placed on separate pages (configurable stride).
  - The provider can evolve the metadata header (e.g., the virtio-net
    header) to balance the needs of FreeBSD and Linux (the virtio-net
    header suits Linux, but not FreeBSD).
  - VirtIO 1.1 coexists with 1.0, while 0.95 is still around. This
    shows both the need for evolution and the compatibility
    problems that come with it.
  - You can define a metadata format (e.g. virtio-net header)
    that fits the specific hardware NIC features used by the
    cloud provider.
  - The provider can inject code to encrypt/decrypt the payload,
    together with a hardcoded key. The encrypt/decrypt routines
    can be helper functions that take as arguments the OS packet
    pointer and the key.
  - Simplification of device paravirtualization. A fixed datapath
    ABI means that you need to stay backward compatible. Look at
    the virtio implementation in Linux 4.20: it needs to support
    both split and packed rings, which makes it complex, error
    prone and less efficient.
  - The virtual switch and backend (tap, netmap, other) can be
    changed under the hood.
  - The datapath can adapt to changing workloads.
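
As an illustration of the first point, a configurable stride lets the
doorbell of each queue land on its own page when the stride is at least
one page. The sketch below only shows the address arithmetic. The
function names and the layout (doorbell i at offset i * stride from the
start of a doorbell region) are assumptions, not the actual bpfhv
register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the doorbell for queue `qidx` inside a hypothetical
 * doorbell region where consecutive doorbells are `stride` bytes apart.
 * Each doorbell is assumed to be a 32-bit register. */
size_t doorbell_offset(unsigned int qidx, size_t stride)
{
    assert(stride >= sizeof(uint32_t));
    return (size_t)qidx * stride;
}

/* Returns 1 if the doorbells of queues q1 and q2 fall on different
 * pages, 0 if they share a page. */
int doorbells_on_separate_pages(unsigned int q1, unsigned int q2,
                                size_t stride, size_t page_size)
{
    assert(page_size > 0);
    return doorbell_offset(q1, stride) / page_size !=
           doorbell_offset(q2, stride) / page_size;
}
```

With 4096-byte pages, a stride of 4096 gives every queue its own
doorbell page, whereas a stride of 64 packs many doorbells into the
same page.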

=== TODOs (driver) ===

  - Let BPFHV_MAX_TX_BUFS and BPFHV_MAX_RX_BUFS be variable.
    This of course requires reshaping the layout of the
    context data structures.
  - Try to replace dma_map_single() with dma_map_page() on
    the RX datapath? Not sure this is relevant.
  - What if the eBPF program needs to modify the SG layout,
    e.g., for encapsulation or encryption? This would require
    changing the paddr/vaddr/len in the buffer descriptors,
    and DMA mapping and unmapping... So maybe we should ask
    the eBPF program to DMA map/unmap, so that it can do
    that after encapsulation or encryption (i.e. once
    the SG layout is stable).

=== TODOs (qemu) ===

  - Replace cpu_physical_memory_[un]map() with dma_memory_[un]map()
    and the MemoryRegionCache library. This should only be necessary
    if the guest platform has an IOMMU.
    See the code in virtqueue_pop() and virtqueue_push().
  - Let backend.ops.init fail (vring packed < 2^15).
  - Move the generic vring_packed code to the top of the files.
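
A possible shape for the backend.ops.init item is sketched below, under
stated assumptions: the struct, the function signatures and the error
code are hypothetical and do not reflect the current backend code. The
strict "< 2^15" bound follows the wording of the TODO (the VirtIO 1.1
specification allows packed virtqueues of up to 2^15 entries).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Bound taken from the TODO: the number of buffers must stay below it. */
#define VRING_PACKED_BUFS_LIMIT (1u << 15)

/* Hypothetical per-device operations table with a fallible init hook. */
struct backend_ops {
    int (*init)(unsigned int num_bufs);
};

/* Hypothetical init for the vring_packed device: reject queue sizes
 * that the packed ring cannot represent instead of silently accepting
 * them. Returns 0 on success, -EINVAL on an unsupported size. */
int vring_packed_init(unsigned int num_bufs)
{
    if (num_bufs == 0 || num_bufs >= VRING_PACKED_BUFS_LIMIT) {
        return -EINVAL;
    }
    return 0;
}

/* Caller side: propagate the init failure rather than ignoring it. */
int backend_setup(const struct backend_ops *ops, unsigned int num_bufs)
{
    assert(ops != NULL && ops->init != NULL);
    return ops->init(num_bufs);
}
```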