
[GSoC 2019 Report] Support Kafka Metrics in Linkerd

This page records the achievements and future work of the Google Summer of Code 2019 project Support Kafka Metrics in Linkerd.

Related web pages:

Project Overview

Linkerd is a popular service mesh for Kubernetes. It can automatically detect several network protocols and create metrics for them. Kafka, however, is not currently detected, so its traffic is handled as generic TCP traffic.

This project aims to:

  1. Implement a Kafka codec so that the protocol can be detected and decoded;
  2. Integrate the codec with Linkerd;
  3. Create corresponding metrics for Kafka in Linkerd.

Achievements

In this section, we will describe the achievements of the project in three parts.

Kafka Codec

A new Kafka codec has been implemented; its source code is in this GitHub repository.

Design of the Codec

A major achievement of the project is the design of the codec. Because Rust differs considerably from traditional OOP languages such as Java, the design is distinctive as well.

Figure 1 shows the architecture of a Kafka request, where the RequestBody type is a Rust enum (related code). Because Rust doesn't support inheritance, we use an enum rather than a traditional OOP class hierarchy. The architecture of a Kafka response is similar.

Figure 1: Architecture of a Kafka request
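The enum-based design can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: only the names RequestBody, HeaderRequest, and DecodedRequest come from this report, while the fields, the two example variants, and the `name` helper are assumptions made for the example.

```rust
/// Header shared by every request. The field layout follows the Kafka
/// request header (v1) and is an assumption about the real type.
#[derive(Debug, Clone, PartialEq)]
pub struct HeaderRequest {
    pub api_key: i16,
    pub api_version: i16,
    pub correlation_id: i32,
    pub client_id: Option<String>,
}

/// Hypothetical, simplified request bodies.
#[derive(Debug, Clone, PartialEq)]
pub struct MetadataRequest {
    pub topics: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ApiVersionsRequest;

/// One variant per Kafka request type, instead of a class hierarchy.
#[derive(Debug, Clone, PartialEq)]
pub enum RequestBody {
    Metadata(MetadataRequest),
    ApiVersions(ApiVersionsRequest),
}

impl RequestBody {
    /// Human-readable request type, e.g. for logging (hypothetical helper).
    pub fn name(&self) -> &'static str {
        match self {
            RequestBody::Metadata(_) => "Metadata",
            RequestBody::ApiVersions(_) => "ApiVersions",
        }
    }
}

/// A fully decoded request: frame size, header, and typed body.
#[derive(Debug, Clone, PartialEq)]
pub struct DecodedRequest {
    pub size: i32,
    pub header: HeaderRequest,
    pub body: RequestBody,
}
```

Because `match` on an enum must be exhaustive, adding a new request variant makes the compiler point at every place that needs to handle it, which is one practical benefit of this design over inheritance.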

Note that each variant of the RequestBody enum is a specific Kafka request, such as the metadata request (related code, and related Kafka protocol). In addition, the FromByte trait is implemented for every Kafka type. Primitive types such as i32 and more complex types such as HeaderRequest all implement it, so a buffer can be decoded in a uniform way. A good example is HeaderRequest (related code): you simply call decode_buffer(buf) for each field in sequence, without having to spell out the type of each primitive being decoded.

When parsing Kafka traffic, the first few bytes in the buffer are peeked and the request header is decoded. If everything checks out, e.g. the request size is reasonable and the API key is defined in the protocol, the traffic is judged to be Kafka. The header and body are then decoded accordingly, and a DecodedRequest or DecodedResponse value is returned to the caller of the codec.
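A trait-driven decoder along these lines might look like the sketch below. The names FromByte, decode_buffer, and HeaderRequest come from this report; the signatures, the `DecodeError` type, and the `take` helper are assumptions, and the real codec may differ.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    NotEnoughBytes,
    InvalidLength,
    InvalidUtf8,
}

/// Anything that can be decoded from the front of a byte buffer.
pub trait FromByte: Sized {
    fn decode(buf: &mut &[u8]) -> Result<Self, DecodeError>;
}

/// The target type is inferred from the call site.
pub fn decode_buffer<T: FromByte>(buf: &mut &[u8]) -> Result<T, DecodeError> {
    T::decode(buf)
}

/// Split `n` bytes off the front of the buffer (hypothetical helper).
fn take<'a>(buf: &mut &'a [u8], n: usize) -> Result<&'a [u8], DecodeError> {
    let slice: &'a [u8] = *buf;
    if slice.len() < n {
        return Err(DecodeError::NotEnoughBytes);
    }
    let (head, tail) = slice.split_at(n);
    *buf = tail;
    Ok(head)
}

impl FromByte for i16 {
    fn decode(buf: &mut &[u8]) -> Result<Self, DecodeError> {
        let b = take(buf, 2)?;
        Ok(i16::from_be_bytes([b[0], b[1]]))
    }
}

impl FromByte for i32 {
    fn decode(buf: &mut &[u8]) -> Result<Self, DecodeError> {
        let b = take(buf, 4)?;
        Ok(i32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Kafka nullable string: i16 length (-1 means null), then UTF-8 bytes.
impl FromByte for Option<String> {
    fn decode(buf: &mut &[u8]) -> Result<Self, DecodeError> {
        let len: i16 = decode_buffer(buf)?;
        if len == -1 {
            return Ok(None);
        }
        if len < 0 {
            return Err(DecodeError::InvalidLength);
        }
        let bytes = take(buf, len as usize)?;
        String::from_utf8(bytes.to_vec())
            .map(Some)
            .map_err(|_| DecodeError::InvalidUtf8)
    }
}

#[derive(Debug, PartialEq)]
pub struct HeaderRequest {
    pub api_key: i16,
    pub api_version: i16,
    pub correlation_id: i32,
    pub client_id: Option<String>,
}

impl FromByte for HeaderRequest {
    fn decode(buf: &mut &[u8]) -> Result<Self, DecodeError> {
        // Fields are decoded in wire order; each type is inferred.
        Ok(HeaderRequest {
            api_key: decode_buffer(buf)?,
            api_version: decode_buffer(buf)?,
            correlation_id: decode_buffer(buf)?,
            client_id: decode_buffer(buf)?,
        })
    }
}
```

The struct literal never names i16 or i32 in the decoding calls: the field types drive type inference, which is what makes the sequential decode_buffer(buf) style compact.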
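The detection heuristic described above can be sketched as a pure function over the peeked bytes. Everything here is an assumption for illustration: the `Detection` type, the size bounds, and the small table of API keys (an illustrative subset of the Kafka protocol, not the full list the codec would check).

```rust
/// Outcome of inspecting the first bytes of a connection.
#[derive(Debug, PartialEq)]
pub enum Detection {
    Kafka,
    NotKafka,
    NeedMoreBytes,
}

/// Assumed bounds on the size field. 10 bytes is the smallest v1 request
/// header (api_key + api_version + correlation_id + client_id length).
pub const MIN_REQUEST_SIZE: i32 = 10;
pub const MAX_REQUEST_SIZE: i32 = 100 * 1024 * 1024;

/// Illustrative subset of API keys: Produce, Fetch, ListOffsets,
/// Metadata, ApiVersions.
pub const KNOWN_API_KEYS: [i16; 5] = [0, 1, 2, 3, 18];

pub fn detect_kafka(peeked: &[u8]) -> Detection {
    // Need the 4-byte size plus api_key and api_version (2 bytes each).
    if peeked.len() < 8 {
        return Detection::NeedMoreBytes;
    }
    let size = i32::from_be_bytes([peeked[0], peeked[1], peeked[2], peeked[3]]);
    let api_key = i16::from_be_bytes([peeked[4], peeked[5]]);
    let api_version = i16::from_be_bytes([peeked[6], peeked[7]]);

    let size_ok = size >= MIN_REQUEST_SIZE && size <= MAX_REQUEST_SIZE;
    let key_ok = KNOWN_API_KEYS.contains(&api_key);
    let version_ok = api_version >= 0;

    if size_ok && key_ok && version_ok {
        Detection::Kafka
    } else {
        Detection::NotKafka
    }
}
```

A plain-text protocol usually fails the size check immediately: the ASCII bytes "GET " read as a big-endian i32 give a value far above any reasonable request size.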

Linkerd Integration

A few PRs have been opened to integrate the above codec with Linkerd (list of related PRs).

When Linkerd is injected into Kubernetes as a proxy and a new connection is set up, the first few bytes are peeked to detect the protocol of the traffic. The above PRs add a new protocol type, Kafka, which calls the codec to try to decode the traffic. If the attempt succeeds, the connection is treated as Kafka: incoming traffic is parsed as Kafka requests and outgoing traffic as responses.

Decoding is performed in KafkaIo, a wrapper around the original Io type (related code). Inside the read() and write() methods of the Io, the buffer is passed to the codec to be parsed. This lets us analyze the traffic with minimal changes to the codebase. A side effect of this design, however, concerns the state of the buffer: it may be read in fragments or read several times, so we may end up parsing incomplete content or decoding the same bytes more than once. To solve this, KafkaIo keeps state that records which part of the buffer has already been decoded, and decoding is retried when the content cannot yet be decoded meaningfully.
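One way to handle fragmented reads is to accumulate bytes until a complete length-prefixed frame is available. The sketch below is not the KafkaIo implementation; `FrameTracker` and its methods are hypothetical, and for simplicity it copies bytes into its own buffer rather than tracking an offset into the proxy's buffer.

```rust
/// Accumulates bytes seen across read()/write() calls and yields only
/// complete Kafka frames (4-byte big-endian size followed by the payload).
#[derive(Default)]
pub struct FrameTracker {
    pending: Vec<u8>,
}

impl FrameTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Feed the bytes observed by one I/O call. Returns the payloads of all
    /// frames completed so far; a trailing partial frame stays pending.
    pub fn feed(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        self.pending.extend_from_slice(chunk);
        let mut frames = Vec::new();
        loop {
            if self.pending.len() < 4 {
                break; // size prefix not complete yet
            }
            let size = i32::from_be_bytes([
                self.pending[0],
                self.pending[1],
                self.pending[2],
                self.pending[3],
            ]);
            if size < 0 {
                // Not a meaningful frame: drop what we have and resync later.
                self.pending.clear();
                break;
            }
            let total = 4 + size as usize;
            if self.pending.len() < total {
                break; // wait for the rest of the frame
            }
            // Remove the whole frame from the buffer, keeping only the payload.
            let payload: Vec<u8> = self.pending.drain(..total).skip(4).collect();
            frames.push(payload);
        }
        frames
    }

    /// Number of buffered bytes that are not yet part of a complete frame.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

Each byte is consumed exactly once, so a frame is never decoded twice, and a frame split across several reads is only handed to the codec once it is complete.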

Metrics Creation

Currently, the size, header, and type of each request/response are printed to the console and the log when the proxy runs. More detailed metrics and more ways to present them are listed under Future Work.

How to Run the Code

Development Environment

To develop this project, you may need the following dependencies:

  • IDE: We recommend IntelliJ or VS Code for developing Linkerd and the Kafka codec. Both have powerful plugins for Rust.

  • Rust and Cargo

  • Kafka: It can run on the host machine, in Docker, or in Kubernetes.

  • Kubernetes: Following Linkerd's recommendation, Minikube is the best option for running Kubernetes.

Test Cases

Test cases are the best starting point for the project. For example:

Other notes

A few notes that we believe will be helpful to new developers:

Minikube proxy for Google

Because Google and the Kubernetes registry are blocked in some regions, you may need to set up an HTTPS proxy for Minikube to download the Kubernetes dependencies. You can run a local HTTP proxy on the host machine, for example at localhost:8123; inside Minikube, however, use the IP address 10.0.2.2 instead of localhost. This StackOverflow page explains why.

Linkerd cannot be compiled

One reason Linkerd may fail to compile is that the NodeJS version is too old (related Slack discussion). No error message is shown in this case, and the build may appear to hang for a very long time without making any progress.

Future Work

We discuss the future work of the project in three parts, mirroring the Achievements section above.

Kafka Codec

The current version of the codec doesn't support multi-version decoding; that is, only Kafka messages of the latest version can be decoded successfully. Supporting every version of every Kafka message by hand would be tedious. The official Kafka client addresses this by describing its messages in JSON (related code) and using a dedicated generator to convert the JSON files into Java code.

It would be nice to implement a Rust generator that turns the JSON files into Rust source code. This would make maintenance easier, because we would only need to re-run the generator when a new version of Kafka is released.
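The code-emitting half of such a generator could look like the sketch below. It is only an illustration: `FieldSpec`, `MessageSpec`, and `generate_struct` are hypothetical names, the spec is built in memory rather than parsed from Kafka's JSON files (a JSON library would be needed for that), and version handling is left out entirely.

```rust
/// In-memory form of one field from a message description (hypothetical).
pub struct FieldSpec {
    pub name: &'static str,
    pub kafka_type: &'static str,
}

/// In-memory form of one message description (hypothetical).
pub struct MessageSpec {
    pub name: &'static str,
    pub fields: Vec<FieldSpec>,
}

/// Map a type name from the message description to a Rust type.
fn rust_type(kafka_type: &str) -> String {
    // A "[]" prefix denotes an array of the element type.
    if let Some(elem) = kafka_type.strip_prefix("[]") {
        return format!("Vec<{}>", rust_type(elem));
    }
    match kafka_type {
        "bool" => "bool".to_string(),
        "int8" => "i8".to_string(),
        "int16" => "i16".to_string(),
        "int32" => "i32".to_string(),
        "int64" => "i64".to_string(),
        "string" => "String".to_string(),
        // Anything else is assumed to be the name of a nested struct.
        other => other.to_string(),
    }
}

/// Convert a CamelCase field name to snake_case.
fn snake_case(name: &str) -> String {
    let mut out = String::new();
    for (i, ch) in name.chars().enumerate() {
        if ch.is_ascii_uppercase() {
            if i > 0 {
                out.push('_');
            }
            out.push(ch.to_ascii_lowercase());
        } else {
            out.push(ch);
        }
    }
    out
}

/// Emit the Rust source of a struct for the given message.
pub fn generate_struct(spec: &MessageSpec) -> String {
    let mut src = format!("pub struct {} {{\n", spec.name);
    for field in &spec.fields {
        src.push_str(&format!(
            "    pub {}: {},\n",
            snake_case(field.name),
            rust_type(field.kafka_type)
        ));
    }
    src.push_str("}\n");
    src
}
```

A full generator would also emit the decoding code for each struct and branch on the version ranges given for each field in the message descriptions.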

Linkerd Integration

Although a few test cases are created, we still need more tests (both unit and integration tests) to cover more complicated scenarios before merging the PRs into the production branch.

Metrics

A web UI to visualize the state of Kafka traffic would be appreciated, as would a Prometheus report for Kafka.

Acknowledgements
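As a rough idea of what a Prometheus report could contain, the sketch below counts requests per API and renders them in the Prometheus text exposition format. The `KafkaMetrics` type, the metric name `kafka_requests_total`, and the `api` label are all hypothetical; a real integration would go through Linkerd's existing metrics machinery rather than a standalone type.

```rust
use std::collections::BTreeMap;

/// Per-API request counters (hypothetical type for illustration).
#[derive(Default)]
pub struct KafkaMetrics {
    // BTreeMap keeps the output order deterministic.
    requests_by_api: BTreeMap<String, u64>,
}

impl KafkaMetrics {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record one decoded request of the given type, e.g. "Metadata".
    pub fn record_request(&mut self, api_name: &str) {
        *self
            .requests_by_api
            .entry(api_name.to_string())
            .or_insert(0) += 1;
    }

    /// Render the counters in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let mut out = String::from("# TYPE kafka_requests_total counter\n");
        for (api, count) in &self.requests_by_api {
            out.push_str(&format!(
                "kafka_requests_total{{api=\"{}\"}} {}\n",
                api, count
            ));
        }
        out
    }
}
```

The same counters could later back a web UI, since both views only need the per-API totals.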

Special thanks to my mentors Eliza Weisman and Thomas Rampelberg for their patience and guidance. I didn't know much about Rust, Kafka, or Kubernetes before I joined this GSoC project, and I learned a lot this summer.

Thanks to the Linkerd community for their support.

And lastly, thanks to Google for this wonderful journey.
