Uses ebpf_exporter to export Prometheus metrics about Bluetooth data in Linux, and ships a docker-compose setup with Grafana and Prometheus for easy visualization of this data.
This project attaches kprobes and kretprobes to kernel functions to export Prometheus metrics about some relevant Bluetooth components.
Number of times read and write are called on all Bluetooth sockets.
Additionally, the syscall's return value is recorded, so negative values can be mapped to the error codes defined in errno-base.h (for example, -11 is EAGAIN).
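Here is a minimal C sketch of that mapping. It assumes the recorded value is the raw in-kernel return: a byte count on success and -errno on failure. The table covers only a few codes from errno-base.h. ret_label is a hypothetical helper and is not part of this project:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* A few of the codes defined in errno-base.h. */
struct errno_entry {
    int code;
    const char *name;
};

static const struct errno_entry errno_base[] = {
    { EPERM, "EPERM" },   { ENOENT, "ENOENT" }, { EINTR, "EINTR" },
    { EIO, "EIO" },       { EBADF, "EBADF" },   { EAGAIN, "EAGAIN" },
    { ENOMEM, "ENOMEM" }, { EFAULT, "EFAULT" }, { EINVAL, "EINVAL" },
    { EPIPE, "EPIPE" },
};

/* Turn a recorded return value into a metric label: "ok" for a
 * non-negative byte count, the errno name for a known -errno,
 * and "other" for anything not in the table. */
static const char *ret_label(long ret)
{
    if (ret >= 0)
        return "ok";
    for (size_t i = 0; i < sizeof errno_base / sizeof errno_base[0]; i++) {
        if (errno_base[i].code == -ret)
            return errno_base[i].name;
    }
    return "other";
}
```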
Notice how the number of packets decreases and becomes more unstable when the audio codec is changed from AAC to SBC.
Same as the metric above, but shows the length of the reads/writes.
The packet size stays the same even when the codec changes.
When a new sk_buff is being allocated, the kernel checks whether the sk_wmem_alloc field of the struct sock is smaller than sk_sndbuf. If it is not, there are two possible outcomes:
- When the socket is non-blocking, an EAGAIN (-11) error is returned.
- When the socket is blocking, the call waits in a loop until sk_wmem_alloc becomes smaller than sk_sndbuf.
This check is done in the function sock_alloc_send_pskb.
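AF_UNIX datagram sockets allocate their buffers through the same sock_alloc_send_pskb check. That means the non-blocking case can be reproduced without Bluetooth hardware. The following is a minimal sketch in which make_nonblocking_pair and fill_until_eagain are hypothetical helpers:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a connected AF_UNIX datagram pair and make the sending end
 * (fds[0]) non-blocking. Returns 0 on success, -1 on error. */
static int make_nonblocking_pair(int fds[2])
{
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
        return -1;
    int flags = fcntl(fds[0], F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Send datagrams without reading them on the other end. Each unread
 * datagram stays charged to the sender's sk_wmem_alloc, so eventually
 * the allocation is refused and send() fails with EAGAIN.
 * Returns the number of successful sends, or -1 on any other outcome. */
static int fill_until_eagain(int fd, size_t payload, int max_sends)
{
    char buf[512] = {0};
    assert(payload <= sizeof buf);
    for (int i = 0; i < max_sends; i++) {
        if (send(fd, buf, payload, 0) < 0)
            return errno == EAGAIN ? i : -1;
    }
    return -1; /* the buffer never filled up within max_sends */
}
```

With a blocking descriptor, the same send() would instead sleep inside sock_alloc_send_pskb until the reader frees buffer space, which is the second case above.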
The default value of sk_sndbuf (taken from /proc/sys/net/core/wmem_default) is quite high, so PipeWire and PulseAudio set a much lower value to keep the audio from going out of sync.
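One way for a user-space program to lower sk_sndbuf is the generic SO_SNDBUF socket option, which the socket layer should handle for Bluetooth sockets as well. This sketch does not claim to show how PipeWire or PulseAudio do it. set_sndbuf is a hypothetical helper, and it works on any socket descriptor, so it can be tried without a controller:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel for a send buffer of `bytes` and return the value it
 * actually applied (as reported by getsockopt), or -1 on error. */
static int set_sndbuf(int fd, int bytes)
{
    int applied = 0;
    socklen_t len = sizeof applied;

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &applied, &len) < 0)
        return -1;
    return applied;
}
```

Linux doubles the requested value to leave room for bookkeeping overhead (see socket(7)), so getsockopt reports roughly twice what was asked for. The request itself is capped by /proc/sys/net/core/wmem_max.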
This metric shows a heatmap of the value of sk_wmem_alloc every time an L2CAP or SCO send syscall is called.
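A heatmap like this is typically built from histogram buckets. Below is a userspace sketch of power-of-two bucketing, one of the bucket layouts ebpf_exporter supports (exp2). The bucket count and helper names are assumptions and do not come from this project's configuration:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BUCKETS 32 /* assumption: larger values are clamped into the last bucket */

/* Bucket index for v: bucket 0 holds only 0, bucket k >= 1 holds
 * [2^(k-1), 2^k). */
static unsigned log2_bucket(uint64_t v)
{
    unsigned k = 0;
    while (v) {
        v >>= 1;
        k++;
    }
    return k < NUM_BUCKETS ? k : NUM_BUCKETS - 1;
}

struct histogram {
    uint64_t counts[NUM_BUCKETS];
};

/* Record one sample, e.g. sk_wmem_alloc read at a send call. */
static void hist_observe(struct histogram *h, uint64_t value)
{
    unsigned b = log2_bucket(value);
    assert(b < NUM_BUCKETS);
    h->counts[b]++;
}
```

For example, samples of 5000 and 6000 bytes both land in bucket 13, which covers [4096, 8192).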
Every struct hci_dev associated with a controller has the fields acl_cnt, sco_cnt and le_cnt.
When an ACL or LE packet is sent to btusb, acl_cnt or le_cnt is decremented.
After the packets are processed, the controller sends an event packet of type HCI_EV_NUM_COMP_PKTS (0x13) with the number of completed packets, and that number is added back to these fields.
When acl_cnt, le_cnt or sco_cnt reaches 0, no more packets are dequeued from the queue, so none are sent to btusb. This keeps the controller from being flooded with packets it can't handle.
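The credit accounting can be modelled in a few lines. This is a simplified, single-link sketch of the ACL case. struct controller and its functions are hypothetical. The clamp to acl_pkts mirrors, in simplified form, what the kernel does when it handles HCI_EV_NUM_COMP_PKTS:

```c
#include <assert.h>

/* Hypothetical model of one controller's ACL credits. acl_pkts and acl_cnt
 * are named after the struct hci_dev fields; the rest is simplified. */
struct controller {
    int acl_pkts; /* buffers the controller advertised */
    int acl_cnt;  /* credits currently available */
    int queued;   /* packets waiting in the host queue */
    int sent;     /* packets handed to btusb so far */
};

/* TX worker: dequeue only while credits remain. */
static void sched_acl(struct controller *c)
{
    while (c->queued > 0 && c->acl_cnt > 0) {
        c->queued--;
        c->acl_cnt--;
        c->sent++;
    }
}

/* HCI_EV_NUM_COMP_PKTS (0x13) reported `completed` packets: return the
 * credits, never exceeding what the controller advertised, and retry. */
static void num_comp_pkts(struct controller *c, int completed)
{
    assert(completed >= 0);
    c->acl_cnt += completed;
    if (c->acl_cnt > c->acl_pkts)
        c->acl_cnt = c->acl_pkts;
    sched_acl(c);
}
```

Starting with 4 credits and 10 queued packets, the first pass sends 4 and stops. An event reporting 2 completions then lets exactly 2 more through.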
Tracks how long each urb takes to complete after it is submitted. The delta is measured from the moment the sk_buff is sent to the btusb layer until the completion callback stored in the urb's complete field (of type usb_complete_t) is invoked.
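Measuring this delta requires storing a start timestamp for each in-flight urb. That timestamp is looked up again when the completion callback receives the same struct urb pointer. In eBPF this would typically be a hash map keyed by the pointer and filled with bpf_ktime_get_ns(). The sketch below is a userspace stand-in that uses a fixed-size table and hypothetical helper names. It does not show which kernel functions the project actually probes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INFLIGHT 64

/* Stand-in for a BPF hash map: in-flight urb pointer -> submit time (ns). */
struct inflight {
    const void *urb[MAX_INFLIGHT];
    uint64_t start_ns[MAX_INFLIGHT];
};

/* Remember when `urb` was submitted. Returns -1 if the table is full. */
static int on_submit(struct inflight *m, const void *urb, uint64_t now_ns)
{
    assert(urb != NULL); /* NULL marks a free slot */
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (m->urb[i] == NULL) {
            m->urb[i] = urb;
            m->start_ns[i] = now_ns;
            return 0;
        }
    }
    return -1;
}

/* Called from the completion path with the same pointer: return the
 * latency in ns and forget the entry, or -1 if the submit was missed. */
static int64_t on_complete(struct inflight *m, const void *urb, uint64_t now_ns)
{
    assert(urb != NULL);
    for (size_t i = 0; i < MAX_INFLIGHT; i++) {
        if (m->urb[i] == urb) {
            m->urb[i] = NULL;
            return (int64_t)(now_ns - m->start_ns[i]);
        }
    }
    return -1;
}
```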
Shows the delta between the packet reaching the btusb layer and the controller sending back the HCI_EV_NUM_COMP_PKTS event packet.
When the headset was taken to the kitchen for about a minute and a half, the time for the controller to acknowledge a packet went up, but the time to send the urb to the controller stayed the same.
Note: a few times, my controller (AX200) did not send this event packet. When that happens, the BPF_QUEUE used by this metric becomes outdated and reports wrong values.
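The problem is easier to see with a model of the queue. A timestamp is pushed for every packet that reaches btusb, and one timestamp is popped for each completed packet reported by HCI_EV_NUM_COMP_PKTS. This is a userspace sketch with hypothetical names. It assumes a single FIFO for all connections and that completions arrive in send order, both of which are simplifications:

```c
#include <assert.h>
#include <stdint.h>

#define QCAP 128

/* Stand-in for a BPF_MAP_TYPE_QUEUE holding send timestamps (ns). */
struct ts_queue {
    uint64_t ts[QCAP];
    unsigned head, len;
};

static int push(struct ts_queue *q, uint64_t t)
{
    if (q->len == QCAP)
        return -1; /* full: the sample is dropped */
    q->ts[(q->head + q->len) % QCAP] = t;
    q->len++;
    return 0;
}

static int pop(struct ts_queue *q, uint64_t *t)
{
    if (q->len == 0)
        return -1;
    *t = q->ts[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    return 0;
}

/* HCI_EV_NUM_COMP_PKTS reported `completed` packets at time `now`: pop one
 * timestamp per packet and return the latency of the last one (0 if the
 * queue was empty). */
static uint64_t on_num_comp_pkts(struct ts_queue *q, unsigned completed, uint64_t now)
{
    uint64_t t = 0, last = 0;
    for (unsigned i = 0; i < completed && pop(q, &t) == 0; i++)
        last = now - t;
    assert(q->len <= QCAP);
    return last;
}
```

For example, suppose packets are pushed at t=300 and t=400 but the event for the first one is lost. The event for the second packet at t=450 then pops 300 and reports 150 instead of 50. Every later sample stays shifted the same way.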
The kernel's job is to be the bridge between user space and the Bluetooth controller, and the user-space interface is a socket.
#include <sys/socket.h>
#include <bluetooth/bluetooth.h> // BlueZ header defining PF_BLUETOOTH and BTPROTO_*
// The domain is always the same: AF_BLUETOOTH and PF_BLUETOOTH are equivalent
int l2cap_fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_L2CAP);
int sco_fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO);
int hci_fd = socket(PF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);
The protocol passed as the third argument depends on the use case: BTPROTO_L2CAP (on top of ACL) is used for high-quality audio, BTPROTO_SCO for bidirectional, simultaneous voice and lower-quality audio, and BTPROTO_HCI to talk directly to the controller.
Depending on the protocol, the kernel uses different source files to handle data coming from and going to user space. Each one registers a socket type, and the callbacks defined in its struct proto_ops are invoked whenever user space wants to connect, bind, or read/write data.
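The dispatch can be pictured as a table of callbacks indexed by protocol number. This is roughly how af_bluetooth keeps one registered family per BTPROTO_* value, whose create callback sets sock->ops. The following is a cut-down sketch with hypothetical types. The real sendmsg takes a struct socket and a struct msghdr rather than a plain buffer:

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down, hypothetical analogue of struct proto_ops. */
struct mini_proto_ops {
    const char *name;
    long (*sendmsg)(const void *buf, size_t len);
};

/* Placeholder handlers: a real one would build an sk_buff and queue it. */
static long l2cap_send(const void *buf, size_t len) { (void)buf; return (long)len; }
static long hci_send(const void *buf, size_t len)   { (void)buf; return (long)len; }
static long sco_send(const void *buf, size_t len)   { (void)buf; return (long)len; }

/* Indexed like BTPROTO_L2CAP (0), BTPROTO_HCI (1) and BTPROTO_SCO (2). */
static const struct mini_proto_ops proto_table[] = {
    { "l2cap", l2cap_send },
    { "hci", hci_send },
    { "sco", sco_send },
};

struct mini_socket {
    const struct mini_proto_ops *ops;
};

/* socket(): pick the callbacks once, based on the protocol argument. */
static int mini_socket_create(struct mini_socket *s, int proto)
{
    if (proto < 0 || (size_t)proto >= sizeof proto_table / sizeof proto_table[0])
        return -1;
    s->ops = &proto_table[proto];
    return 0;
}

/* write(): dispatch to whatever sendmsg the protocol registered. */
static long mini_write(struct mini_socket *s, const void *buf, size_t len)
{
    assert(s->ops != NULL);
    return s->ops->sendmsg(buf, len);
}
```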
To understand some of these metrics, let's trace the life of a Bluetooth L2CAP audio packet.
- User space writes binary data that should be delivered over L2CAP to the connected device.
#include <unistd.h>
struct buffer *buf = alloc_data(); // hypothetical helper returning encoded audio
ssize_t n = write(fd, buf->data, buf->size); // -1 with errno == EAGAIN if a non-blocking fd's send buffer is full
- Inside the bluetooth module, the callback declared in the sendmsg field of struct proto_ops is called. In this case the function is l2cap_sock_sendmsg, which receives a struct msghdr containing the data from user space.
- The struct msghdr is converted into a struct sk_buff, which is used from then on across this layer.
- The sk_buff is then added to the linked list data_q inside the struct hci_chan that the L2CAP channel sends through. This list is initialized when the ACL connection is set up. After that, a struct work_struct is enqueued in a workqueue associated with the controller.
- Later, in a worker thread, the function hci_tx_work is invoked and tries to dequeue the sk_buffs from all the channels. Each sk_buff is eventually dequeued and sent to the lower btusb layer.
- The btusb module receives the sk_buff and converts it to a USB Request Block (urb), setting all its configuration and callbacks. usb_submit_urb is then called so the lower layer can perform the communication.
- The xhci_hcd module drives the USB host controller and is responsible for sending the struct urb to the endpoint address registered by the Bluetooth controller.
- The controller receives the data and sends it to the device. This code is closed source, and vendors generally ship only binary blobs through the linux-firmware project.
- After the packet is transmitted, the controller sends back an event packet to update availability, signalling that this slot can now be used by a new packet.
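The enqueue-then-defer pattern in the steps above can be sketched in user space. All types and names here are hypothetical stand-ins: a bare length replaces a real sk_buff, and a flag replaces queue_work. Credits and scheduling priorities are deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: a payload length and a link. */
struct skb {
    size_t len;
    struct skb *next;
};

/* FIFO list, playing the role of a struct sk_buff_head such as data_q. */
struct skb_list {
    struct skb *head, *tail;
    size_t qlen;
};

static void list_enqueue(struct skb_list *q, struct skb *s)
{
    s->next = NULL;
    if (q->tail)
        q->tail->next = s;
    else
        q->head = s;
    q->tail = s;
    q->qlen++;
}

static struct skb *list_dequeue(struct skb_list *q)
{
    struct skb *s = q->head;
    if (!s)
        return NULL;
    assert(q->qlen > 0);
    q->head = s->next;
    if (!q->head)
        q->tail = NULL;
    q->qlen--;
    return s;
}

struct dev_sim {
    struct skb_list *chans; /* one data_q per channel */
    size_t nchans;
    bool work_pending;      /* stands in for queue_work() on tx_work */
    size_t bytes_to_driver; /* payload handed to the "btusb" stage */
};

/* sendmsg side: queue the buffer and schedule the worker, then return. */
static void send_skb(struct dev_sim *d, size_t chan, struct skb *s)
{
    list_enqueue(&d->chans[chan], s);
    d->work_pending = true;
}

/* Worker side: drain every channel's queue towards the driver. */
static size_t tx_worker(struct dev_sim *d)
{
    size_t sent = 0;
    if (!d->work_pending)
        return 0;
    d->work_pending = false;
    for (size_t i = 0; i < d->nchans; i++) {
        struct skb *s;
        while ((s = list_dequeue(&d->chans[i])) != NULL) {
            d->bytes_to_driver += s->len;
            sent++;
        }
    }
    return sent;
}
```

Because of this split, the sending call can return as soon as the buffer is queued, before the data ever reaches btusb.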
The packet starts in the bluetooth module, then goes through btusb, and finally reaches xhci_hcd.
Receiving a packet from the controller goes in the reverse direction, from xhci_hcd through btusb and bluetooth, until the data is handled by the socket reading it.
# Requires ebpf_exporter to be installed on the host
go get -u -v github.com/cloudflare/ebpf_exporter/
sudo ebpf_exporter --config.file=config.yaml
# In another terminal session to start Prometheus and Grafana
docker-compose -f docker/docker-compose.yaml up
# Visit http://localhost:3000 with user admin and password foobar and check the panel
The eBPF programs are C files in the src directory.
The python directory contains Python scripts that print the eBPF data structures, for a quicker development cycle.
ebpf_exporter expects all metrics to be in a single YAML file. The script aggregate.py merges the individual metric configurations from the exporter directory with the eBPF programs from the src directory.
All of the tests were done on a single controller and a couple of devices. Open an issue if you think any of these eBPF programs or metrics could be misleading in other devices or kernel versions.