This project leverages eBPF (BCC) to capture TCP metrics from the kernel for performance diagnosis in microservices architectures. It probes statistics at two levels: flows and packets. The flow-level statistics currently comprise sixteen metrics, such as flight size, CWND, sampled RTT, and the numbers of fast retransmissions and timeouts. The packet-level statistics break the RTT down into latencies in the TCP layer, the IP layer, the MAC layer, and the network (from NIC to NIC). MicroBPF also measures application-layer latencies.
Most of the following flow-level statistics are adopted from SNAP (NSDI'11) and NetPoirot (SIGCOMM'16).
Index | Statistics | Definition |
---|---|---|
1 | FlightSize | Packets sent but not ACKed |
2 | RWND | Receive window size |
3 | CWND | Congestion window size |
4 | MSS | Maximum segment size |
5 | RePackets | Retransmitted packets |
6 | BSent | Number of bytes sent |
7 | BReceived | Number of bytes received |
8 | fastRetrans | Number of fast retransmissions |
9 | Timeout | Number of timeouts |
10 | CurAppWQueue | Number of bytes in the send buffer |
11 | MaxAppWQueue | Max CurAppWQueue |
12 | SampleRTT | Total number of RTT samples |
13 | SumRTT | Sum of RTTs that TCP samples |
14 | ReadByte | Number of bytes when the socket makes a read call |
15 | WriteByte | Number of bytes when the socket makes a write call |
16 | Duration | Duration in which connection has been open |
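The raw counters above are most useful once combined into derived per-flow indicators. Below is a minimal sketch of such post-processing; the aggregation logic and the helper `summarize_flow` are illustrative assumptions, not part of MicroBPF itself (only the field names come from the table above).

```python
# Sketch: deriving per-flow health indicators from the flow-level counters
# in the table above. The computation is illustrative, not MicroBPF code.

def summarize_flow(stats):
    """stats: dict keyed by the metric names in the table above."""
    # Average sampled RTT (SumRTT over SampleRTT).
    avg_rtt_us = (stats["SumRTT"] / stats["SampleRTT"]
                  if stats["SampleRTT"] else 0.0)
    # Rough share of sent packets that were retransmitted,
    # approximating the packet count as BSent / MSS.
    sent_pkts = max(stats["BSent"] // stats["MSS"], 1)
    retrans_rate = stats["RePackets"] / sent_pkts
    # Average throughput over the connection lifetime (bytes/s).
    goodput = stats["BSent"] / stats["Duration"] if stats["Duration"] else 0.0
    return {"avg_rtt_us": avg_rtt_us,
            "retrans_rate": retrans_rate,
            "goodput_Bps": goodput}

flow = {"SumRTT": 50000, "SampleRTT": 25, "BSent": 1448000,
        "MSS": 1448, "RePackets": 10, "Duration": 2.0}
print(summarize_flow(flow))
# {'avg_rtt_us': 2000.0, 'retrans_rate': 0.01, 'goodput_Bps': 724000.0}
```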
MicroBPF aims to measure the latencies in different layers. This figure shows an overview of MicroBPF.
A request R generated by the sender traverses the networking stack and departs from the sender's NIC. After crossing the network, it arrives at the receiver's NIC and traverses the receiver's networking stack. The application then processes the request and returns the response R back to the sender.
This figure shows the data flow of a packet generated from a host to microservices.
This figure shows the main kernel function invocations and buffers when receiving and transmitting packets.
This table shows the kernel functions for probing latencies when receiving packets.
Layer | Start Function | End Function |
---|---|---|
MAC Layer | eth_type_trans() | ip_rcv() |
IP Layer | ip_rcv() | tcp_v4_rcv() |
TCP Layer | tcp_v4_rcv() | skb_copy_datagram_iter() |
This table shows the kernel functions for probing latencies when transmitting packets.
Layer | Start Function | End Function |
---|---|---|
TCP Layer | tcp_write_queue_tail() | ip_queue_xmit() |
IP Layer | ip_queue_xmit() | dev_queue_xmit() |
MAC Layer | dev_queue_xmit() | dev_hard_start_xmit() |
QDISC Layer* | | |
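Given per-packet timestamps recorded at the boundary functions in the two tables above, each layer's latency is just the difference between its end and start timestamps. A hedged sketch of that computation (the timestamp capture itself is done by MicroBPF's probes; the helper and the numbers below are illustrative assumptions):

```python
# Sketch: turning timestamps (ns) recorded at the boundary kernel functions
# into per-layer latencies for a received packet. In MicroBPF the timestamps
# come from kprobes; the values below are made up for illustration.

RX_BOUNDARIES = [
    ("MAC", "eth_type_trans", "ip_rcv"),
    ("IP", "ip_rcv", "tcp_v4_rcv"),
    ("TCP", "tcp_v4_rcv", "skb_copy_datagram_iter"),
]

def layer_latencies(ts, boundaries=RX_BOUNDARIES):
    """ts: dict mapping kernel function name -> timestamp in ns."""
    return {layer: ts[end] - ts[start] for layer, start, end in boundaries}

ts = {"eth_type_trans": 1000, "ip_rcv": 1400,
      "tcp_v4_rcv": 2100, "skb_copy_datagram_iter": 9100}
print(layer_latencies(ts))  # {'MAC': 400, 'IP': 700, 'TCP': 7000}
```

The transmit path works the same way with the second table's boundaries (tcp_write_queue_tail() through dev_hard_start_xmit()).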
To measure the network latency in VMs, uBPF timestamps the SKB in eth_type_trans()/dev_hard_start_xmit() and sends the metrics to a measurement node, which calculates the network latency. A better way to measure the network latency would be to timestamp in the physical NIC driver, but there is no physical NIC driver in AWS VMs. We will add this feature for physical machines soon.
This table shows the kernel functions for measuring the application layer latencies.
Side | Ideal trace point | Practical trace point |
---|---|---|
Receive | recv() | skb_copy_datagram_iter() |
Transmit | send() | tcp_transmit_skb() |
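With the practical trace points above, the server-side application latency can be approximated as the gap between a request being copied to userspace (skb_copy_datagram_iter()) and the response entering the TCP layer (tcp_transmit_skb()). A rough sketch of that pairing; matching each request to its response by simple alternation is an assumption for illustration:

```python
# Sketch: approximating application-layer latency from the practical trace
# points above. A request counts as delivered at skb_copy_datagram_iter and
# the response as started at tcp_transmit_skb. Pairing events by simple
# alternation (rather than by connection state) is an assumption.

def app_latency(events):
    """events: chronological list of (timestamp_ns, function_name) for one
    connection. Returns the application-layer latencies in ns."""
    latencies, last_delivery = [], None
    for ts, fn in events:
        if fn == "skb_copy_datagram_iter":
            last_delivery = ts          # request handed to the application
        elif fn == "tcp_transmit_skb" and last_delivery is not None:
            latencies.append(ts - last_delivery)  # response produced
            last_delivery = None
    return latencies

events = [(100, "skb_copy_datagram_iter"), (5100, "tcp_transmit_skb"),
          (9000, "skb_copy_datagram_iter"), (12000, "tcp_transmit_skb")]
print(app_latency(events))  # [5000, 3000]
```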
We launched two AWS EC2 instances, each with a single CPU core: one runs the Apache server (VM B) and the other runs the Apache benchmark (VM A). The number of concurrent connections is 10.
This figure shows the network latencies and RTTs from A (Apache benchmark) to B (Apache server). The TCP layer latency is measured in VM B. We can see that the network latency is stable (around 1000 us), while the TCP layer latency and the RTT both fluctuate greatly and have similar CDFs. That is because the network stack processes ACKs in the TCP layer: on the sender side, the stack timestamps an SKB when it is passed to the IP layer, while on the receiver side, the stack has to process the SKB in the TCP layer before it can return the ACK. Kernel scheduling in the TCP layer therefore significantly affects RTTs.
Similarly, the next figure shows the preliminary results from B to A. The TCP layer latency is measured in VM A (the benchmark) and is quite small. The network latency is stable (around 550 us), while the RTT still fluctuates.
We argue that splitting RTTs into kernel latencies and network latencies may provide a new perspective for performance diagnosis, system monitoring, and congestion control in data centers.
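The decomposition this argument rests on can be written down directly. A toy sketch, assuming synchronized clocks and end-host stack latencies obtained from the per-layer probes; the function `split_rtt` and all numbers are illustrative assumptions:

```python
# Sketch: splitting a measured RTT into kernel-stack latency and network
# latency, as argued above. In MicroBPF the components would come from the
# per-layer probes plus clock synchronization; these inputs are made up.

def split_rtt(rtt_us, sender_stack_us, receiver_stack_us):
    """Attribute the residual after removing both end-host stack
    latencies to the network (NIC to NIC, both directions)."""
    network_us = rtt_us - sender_stack_us - receiver_stack_us
    return {"kernel_us": sender_stack_us + receiver_stack_us,
            "network_us": network_us}

print(split_rtt(rtt_us=2600, sender_stack_us=300, receiver_stack_us=1200))
# {'kernel_us': 1500, 'network_us': 1100}
```

On the numbers reported above, a stable network component with a fluctuating kernel component would show up directly in such a split.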
- `tcpin.py`: traces the received packets in the kernel.
- `tcpout.py`: traces the transmitted packets in the kernel.
- `tcpack.py`: traces flow-level metrics triggered by ACKs.
- `app.py`: measures the application layer latency.
- `tcp.py`: measures the latencies in different layers in hosts, i.e., the combination of tcpin.py, tcpout.py, and app.py.
- `tcpsock.py`: just an example to probe ReadByte and WriteByte.
- `clock.py`: clock synchronization for uBPF.
- `nic/`: files for measuring the network latencies.
Refer to perf.md.
Refer to network_kernel.md.
Refer to docker.md.
Refer to docker.md.
Refer to test_example.md.