This project leverages eBPF (BCC) to capture TCP metrics from the kernel for performance diagnosis in microservices architectures. It probes statistics at two levels: flows and packets. The flow-level statistics currently comprise sixteen metrics, such as flight size, CWND, sampled RTT, and the numbers of fast retransmissions and timeouts. The packet-level statistics break the RTT down into latencies in the TCP layer, the IP layer, the MAC layer, and the network (from NIC to NIC). MicroBPF also measures application-layer latencies.
Most of the following flow-level statistics are adopted from SNAP (NSDI'11) and NetPoirot (SIGCOMM'16).
Index | Statistics | Definition |
---|---|---|
1 | FlightSize | Packets sent but not ACKed |
2 | RWND | Receive window size |
3 | CWND | Congestion window size |
4 | MSS | Maximum segment size |
5 | RePackets | Retransmitted packets |
6 | BSent | Number of bytes sent |
7 | BReceived | Number of bytes received |
8 | fastRetrans | Number of fast retransmissions |
9 | Timeout | Number of timeouts |
10 | CurAppWQueue | Number of bytes in the send buffer |
11 | MaxAppWQueue | Max CurAppWQueue |
12 | SampleRTT | Total number of RTT samples |
13 | SumRTT | Sum of RTTs that TCP samples |
14 | ReadByte | Number of bytes when the socket makes a read call |
15 | WriteByte | Number of bytes when the socket makes a write call |
16 | Duration | Duration in which connection has been open |
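The raw counters above are most useful once combined into derived per-flow indicators. Below is a minimal sketch of such post-processing; the aggregation logic and the helper `summarize_flow` are illustrative assumptions, not part of MicroBPF itself (only the field names come from the table above).

```python
# Sketch: deriving per-flow health indicators from the flow-level counters
# in the table above. The computation is illustrative, not MicroBPF code.

def summarize_flow(stats):
    """stats: dict keyed by the metric names in the table above."""
    # Average sampled RTT (SumRTT over SampleRTT).
    avg_rtt_us = (stats["SumRTT"] / stats["SampleRTT"]
                  if stats["SampleRTT"] else 0.0)
    # Rough share of sent packets that were retransmitted,
    # approximating the packet count as BSent / MSS.
    sent_pkts = max(stats["BSent"] // stats["MSS"], 1)
    retrans_rate = stats["RePackets"] / sent_pkts
    # Average throughput over the connection lifetime (bytes/s).
    goodput = stats["BSent"] / stats["Duration"] if stats["Duration"] else 0.0
    return {"avg_rtt_us": avg_rtt_us,
            "retrans_rate": retrans_rate,
            "goodput_Bps": goodput}

flow = {"SumRTT": 50000, "SampleRTT": 25, "BSent": 1448000,
        "MSS": 1448, "RePackets": 10, "Duration": 2.0}
print(summarize_flow(flow))
# {'avg_rtt_us': 2000.0, 'retrans_rate': 0.01, 'goodput_Bps': 724000.0}
```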
MicroBPF aims to measure the latencies in different layers. This figure shows an overview of MicroBPF.
A request R generated by the sender traverses the networking stack and departs from the sender's NIC. After crossing the network, it arrives at the receiver's NIC and traverses the receiver's networking stack. The application then processes the request and returns the response R back to the sender.
This figure shows the data flow of a packet generated from a host to microservices.
This figure shows the main kernel function invocations and buffers when receiving and transmitting packets.
This table shows the kernel functions for probing latencies when receiving packets.
Layer | Start Function | End Function |
---|---|---|
MAC Layer | eth_type_trans() | ip_rcv() |
IP Layer | ip_rcv() | tcp_v4_rcv() |
TCP Layer | tcp_v4_rcv() | skb_copy_datagram_iter() |
This table shows the kernel functions for probing latencies when transmitting packets.
Layer | Start Function | End Function |
---|---|---|
TCP Layer | tcp_write_queue_tail() | ip_queue_xmit() |
IP Layer | ip_queue_xmit() | dev_queue_xmit() |
MAC Layer | dev_queue_xmit() | dev_hard_start_xmit() |
QDISC Layer* | | |
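Given per-packet timestamps recorded at the boundary functions in the two tables above, each layer's latency is just the difference between its end and start timestamps. A hedged sketch of that computation (the timestamp capture itself is done by MicroBPF's probes; the helper and the numbers below are illustrative assumptions):

```python
# Sketch: turning timestamps (ns) recorded at the boundary kernel functions
# into per-layer latencies for a received packet. In MicroBPF the timestamps
# come from kprobes; the values below are made up for illustration.

RX_BOUNDARIES = [
    ("MAC", "eth_type_trans", "ip_rcv"),
    ("IP", "ip_rcv", "tcp_v4_rcv"),
    ("TCP", "tcp_v4_rcv", "skb_copy_datagram_iter"),
]

def layer_latencies(ts, boundaries=RX_BOUNDARIES):
    """ts: dict mapping kernel function name -> timestamp in ns."""
    return {layer: ts[end] - ts[start] for layer, start, end in boundaries}

ts = {"eth_type_trans": 1000, "ip_rcv": 1400,
      "tcp_v4_rcv": 2100, "skb_copy_datagram_iter": 9100}
print(layer_latencies(ts))  # {'MAC': 400, 'IP': 700, 'TCP': 7000}
```

The transmit path works the same way with the second table's boundaries (tcp_write_queue_tail() through dev_hard_start_xmit()).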
To measure the network latency in VMs, uBPF timestamps the SKB in eth_type_trans()/dev_hard_start_xmit() and sends the metrics to a measurement node, which calculates the network latency. A better way to measure the network latency would be to timestamp in the physical NIC driver, but there is no physical NIC driver in AWS VMs. We will add this feature for physical machines soon.
This table shows the kernel functions for measuring the application layer latencies.
Side | Ideal trace point | Practical trace point |
---|---|---|
Receive | recv() | skb_copy_datagram_iter() |
Transmit | send() | tcp_transmit_skb() |
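With the practical trace points above, the server-side application latency can be approximated as the gap between a request being copied to userspace (skb_copy_datagram_iter()) and the response entering the TCP layer (tcp_transmit_skb()). A rough sketch of that pairing; matching each request to its response by simple alternation is an assumption for illustration:

```python
# Sketch: approximating application-layer latency from the practical trace
# points above. A request counts as delivered at skb_copy_datagram_iter and
# the response as started at tcp_transmit_skb. Pairing events by simple
# alternation (rather than by connection state) is an assumption.

def app_latency(events):
    """events: chronological list of (timestamp_ns, function_name) for one
    connection. Returns the application-layer latencies in ns."""
    latencies, last_delivery = [], None
    for ts, fn in events:
        if fn == "skb_copy_datagram_iter":
            last_delivery = ts          # request handed to the application
        elif fn == "tcp_transmit_skb" and last_delivery is not None:
            latencies.append(ts - last_delivery)  # response produced
            last_delivery = None
    return latencies

events = [(100, "skb_copy_datagram_iter"), (5100, "tcp_transmit_skb"),
          (9000, "skb_copy_datagram_iter"), (12000, "tcp_transmit_skb")]
print(app_latency(events))  # [5000, 3000]
```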
We launched two AWS EC2 instances, each with a single CPU core: one runs the Apache server (VM B) and the other runs the Apache benchmark (VM A). The number of concurrent connections is 10.
This figure shows the network latencies and RTTs from A (Apache benchmark) to B (Apache server). The TCP layer latency is measured in VM B. We can see that the network latency is stable (around 1000 us), while the TCP layer latency and the RTT both fluctuate greatly and have similar CDFs. That is because the network stack processes ACKs in the TCP layer: on the sender side, the stack timestamps an SKB when it is passed to the IP layer, while on the receiver side, the stack has to process the SKB in the TCP layer before it can return the ACK. Kernel scheduling in the TCP layer therefore significantly affects RTTs.
Similarly, the next figure shows the preliminary results from B to A. The TCP layer latency is measured in VM A (the benchmark) and is quite small. The network latency is stable (around 550 us), while the RTT still fluctuates.
We argue that splitting RTTs into kernel latencies and network latencies may provide a new perspective for performance diagnosis, system monitoring, and congestion control in data centers.
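The decomposition this argument rests on can be written down directly. A toy sketch, assuming synchronized clocks and end-host stack latencies obtained from the per-layer probes; the function `split_rtt` and all numbers are illustrative assumptions:

```python
# Sketch: splitting a measured RTT into kernel-stack latency and network
# latency, as argued above. In MicroBPF the components would come from the
# per-layer probes plus clock synchronization; these inputs are made up.

def split_rtt(rtt_us, sender_stack_us, receiver_stack_us):
    """Attribute the residual after removing both end-host stack
    latencies to the network (NIC to NIC, both directions)."""
    network_us = rtt_us - sender_stack_us - receiver_stack_us
    return {"kernel_us": sender_stack_us + receiver_stack_us,
            "network_us": network_us}

print(split_rtt(rtt_us=2600, sender_stack_us=300, receiver_stack_us=1200))
# {'kernel_us': 1500, 'network_us': 1100}
```

On the numbers reported above, a stable network component with a fluctuating kernel component would show up directly in such a split.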
- `tcpin.py`: traces the received packets in the kernel.
- `tcpout.py`: traces the transmitted packets in the kernel.
- `tcpack.py`: traces flow-level metrics triggered by ACKs.
- `app.py`: measures the application layer latency.
- `tcp.py`: measures the latencies in different layers in hosts, i.e., the combination of tcpin.py, tcpout.py, and app.py.
- `tcpsock.py`: just an example to probe ReadByte and WriteByte.
- `clock.py`: clock synchronization for uBPF.
- `nic/`: files for measuring the network latencies.
Refer to perf.md.
Refer to network_kernel.md.
Refer to docker.md.
Refer to docker.md.
Refer to test_example.md.