ranchyang96/MicroBPF

Introduction

This project leverages eBPF (BCC) to capture TCP metrics from the kernel for performance diagnosis in microservices architectures. It probes statistics at two levels: flows and packets. The flow-level statistics currently comprise sixteen metrics, such as flight size, CWND, sampled RTT, and the numbers of fast retransmissions and timeouts. The packet-level statistics break the RTT down into latencies in the TCP layer, IP layer, MAC layer, and the network (from NIC to NIC). MicroBPF also measures application-layer latencies.

Flow-level Statistics

Most of the following flow-level statistics are adopted from SNAP (NSDI '11) and NetPoirot (SIGCOMM '16).

| Index | Statistic | Definition |
|-------|--------------|-----------------------------------------------------------|
| 1 | FlightSize | Packets sent but not yet ACKed |
| 2 | RWND | Receive window size |
| 3 | CWND | Congestion window size |
| 4 | MSS | Maximum segment size |
| 5 | RePackets | Number of retransmitted packets |
| 6 | BSent | Number of bytes sent |
| 7 | BReceived | Number of bytes received |
| 8 | fastRetrans | Number of fast retransmissions |
| 9 | Timeout | Number of timeouts |
| 10 | CurAppWQueue | Number of bytes in the send buffer |
| 11 | MaxAppWQueue | Maximum of CurAppWQueue |
| 12 | SampleRTT | Total number of RTT samples |
| 13 | SumRTT | Sum of the RTTs that TCP samples |
| 14 | ReadByte | Number of bytes read when the socket makes a read call |
| 15 | WriteByte | Number of bytes written when the socket makes a write call |
| 16 | Duration | Time for which the connection has been open |
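
For a concrete picture of how such flow-level metrics can be read from the kernel with BCC, here is a minimal sketch. It assumes a kprobe on tcp_rcv_established() and struct tcp_sock field names from recent kernels (packets_out, snd_cwnd, srtt_us); it is illustrative only and is not the project's tcpack.py, which is triggered by ACKs and exports the full metric set.

```python
# Minimal flow-level sketch (assumption: kprobe on tcp_rcv_established, recent-kernel field names).
from socket import ntohs
from bcc import BPF

prog = r"""
#include <net/sock.h>
#include <linux/tcp.h>

struct flow_stat_t {
    u32 flight;     /* FlightSize: packets sent but not yet ACKed */
    u32 snd_cwnd;   /* CWND in packets */
    u32 srtt_us;    /* smoothed RTT in microseconds */
    u16 dport;      /* remote port (network byte order) */
};
BPF_PERF_OUTPUT(events);

int kprobe__tcp_rcv_established(struct pt_regs *ctx, struct sock *sk)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct flow_stat_t st = {};

    st.flight   = tp->packets_out;
    st.snd_cwnd = tp->snd_cwnd;
    st.srtt_us  = tp->srtt_us >> 3;    /* the kernel stores srtt left-shifted by 3 */
    st.dport    = sk->__sk_common.skc_dport;
    events.perf_submit(ctx, &st, sizeof(st));
    return 0;
}
"""

b = BPF(text=prog)

def print_event(cpu, data, size):
    ev = b["events"].event(data)
    print("dport=%d flight=%u cwnd=%u srtt=%uus"
          % (ntohs(ev.dport), ev.flight, ev.snd_cwnd, ev.srtt_us))

b["events"].open_perf_buffer(print_event)
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        break
```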

Packet-level Statistics

Overview

MicroBPF aims to measure the latencies in different layers. This figure shows an overview of MicroBPF.

Consider a request R generated by the sender. It traverses the sender's networking stack and departs from the sender's NIC. After crossing the network, it arrives at the receiver's NIC and traverses the receiver's networking stack. The application then processes the request and returns the response R to the sender.

This figure shows the data flow of a packet travelling from a host to microservices.

This figure shows the main kernel function invocations and buffers involved in receiving and transmitting packets.

Receiving packets

This table shows the kernel functions for probing latencies when receiving packets.

| Layer | Start Function | End Function |
|-----------|------------------|---------------------------|
| MAC Layer | eth_type_trans() | ip_rcv() |
| IP Layer | ip_rcv() | tcp_v4_rcv() |
| TCP Layer | tcp_v4_rcv() | skb_copy_datagram_iter() |
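
To make the start/end pattern in this table concrete, below is a minimal BCC sketch for the MAC-layer row only: it stamps each skb in eth_type_trans() and reads the stamp back in ip_rcv(), keyed by the skb pointer. This is a hedged illustration under simple assumptions (dropped or GRO-merged skbs are not handled) and is not the project's tcpin.py.

```python
# Per-skb MAC-layer receive latency sketch (assumption: the skb pointer is a stable key
# between eth_type_trans() and ip_rcv()).
from time import sleep
from bcc import BPF

prog = r"""
#include <linux/skbuff.h>

BPF_HASH(start_ts, struct sk_buff *, u64);   /* skb pointer -> entry timestamp (ns) */
BPF_HISTOGRAM(mac_lat_us);

/* MAC layer start: timestamp the skb in eth_type_trans() */
int kprobe__eth_type_trans(struct pt_regs *ctx, struct sk_buff *skb)
{
    u64 ts = bpf_ktime_get_ns();
    start_ts.update(&skb, &ts);
    return 0;
}

/* MAC layer end: look the same skb up in ip_rcv() and record the delta */
int kprobe__ip_rcv(struct pt_regs *ctx, struct sk_buff *skb)
{
    u64 *tsp = start_ts.lookup(&skb);
    if (tsp == 0)
        return 0;
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    mac_lat_us.increment(bpf_log2l(delta_us));
    start_ts.delete(&skb);
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing MAC-layer receive latency... hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["mac_lat_us"].print_log2_hist("usecs")
```

The IP- and TCP-layer rows (and the transmit-side table below) follow the same pattern with their respective start and end functions.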

Transmitting packets

This table shows the kernel functions for probing latencies when transmitting packets.

| Layer | Start Function | End Function |
|---------------|--------------------------|------------------------|
| TCP Layer | tcp_write_queue_tail() | ip_queue_xmit() |
| IP Layer | ip_queue_xmit() | dev_queue_xmit() |
| MAC Layer | dev_queue_xmit() | dev_hard_start_xmit() |
| QDISC Layer\* | | |

\* MicroBPF is currently deployed only on AWS EC2 instances, and the default EC2 VM configuration has no qdisc layer, so qdisc latency is not probed yet.


The network latency

To measure the network latency between VMs, MicroBPF timestamps the SKB in eth_type_trans()/dev_hard_start_xmit() and sends the metrics to a measurement node, which calculates the network latency. A better way to measure the network latency would be to timestamp in the physical NIC driver, but AWS VMs expose no physical NIC driver. We will add this feature for physical machines soon.
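
The per-packet arithmetic on the measurement node is then straightforward once both NIC-level timestamps and a clock-offset estimate (e.g., from clock.py) are available. The sketch below shows that offline step only; the record layout, the packet key, and the helper name network_latency_us are hypothetical and not the actual code under nic/.

```python
def network_latency_us(tx_records, rx_records, clock_offset_us):
    """Compute one-way network latencies in microseconds (illustrative helper).

    tx_records / rx_records: dicts mapping a per-packet key (e.g. a TCP
    sequence number) to the timestamp taken at dev_hard_start_xmit() on the
    sender and at eth_type_trans() on the receiver, respectively.
    clock_offset_us: estimated receiver-minus-sender clock offset.
    """
    latencies = {}
    for key, t_tx in tx_records.items():
        t_rx = rx_records.get(key)
        if t_rx is None:
            continue  # packet not observed on the receiver (lost or still in flight)
        # Move the receiver timestamp onto the sender's clock before differencing.
        latencies[key] = (t_rx - clock_offset_us) - t_tx
    return latencies


# Toy example: the receiver clock runs 120 us ahead of the sender clock.
tx = {1001: 5_000, 1002: 6_000}
rx = {1001: 6_120, 1002: 7_180}
print(network_latency_us(tx, rx, clock_offset_us=120))   # {1001: 1000, 1002: 1060}
```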

The application layer latency

This table shows the kernel functions for measuring the application layer latencies.

| Side | Ideal trace point | Practical trace point |
|----------|-------------------|---------------------------|
| Receive | recv() | skb_copy_datagram_iter() |
| Transmit | send() | tcp_transmit_skb() |

Preliminary evaluations

Testbed

We launched two AWS EC2 instances, each with only one CPU core. One is running the Apache server (VM B) and the other is running the Apache benchmark (VM A). The number of concurrent connections is 10.

Preliminary results

This figure shows the network latencies and RTTs from A (Apache benchmark) to B (Apache server). The TCP layer latency is measured in VM B. We can see that the network latency is stable (around 1000 us), while the TCP layer latency and the RTT both fluctuate greatly and have similar CDFs. The reason is that the network stack processes ACKs in the TCP layer: on the sender side, the stack timestamps an SKB when it is handed down to the IP layer, whereas on the receiver side, the stack has to process the SKB in the TCP layer before returning the ACK. Kernel scheduling in the TCP layer therefore significantly affects RTTs.

Similarly, the next figure shows the preliminary results from B to A. The TCP layer latency is measured in VM A (the benchmark) and is quite small. The network latency is stable (around 550 us), while the RTT still fluctuates.

We argue that splitting RTTs into kernel latencies and network latencies may provide a new perspective for performance diagnosis, system monitoring, and congestion control in data centers.

BCC files

tcpin.py: traces received packets in the kernel.
tcpout.py: traces transmitted packets in the kernel.
tcpack.py: traces flow-level metrics triggered by ACKs.
app.py: measures the application layer latency.
tcp.py: measures the latencies in the different layers of a host, i.e., the combination of tcpin.py, tcpout.py and app.py.
tcpsock.py: a simple example that probes ReadByte and WriteByte.
clock.py: clock synchronization for MicroBPF.
nic/: files for measuring the network latencies.

Kernel Function Probes

Refer to perf.md.

Main functions in the network stack

Refer to network_kernel.md.

How to run

Refer to docker.md.

Container

Refer to docker.md.

Test Examples

Refer to test_example.md.

About

Probe TCP metrics and latencies from the kernel with BCC
