cnetstat design

The cnetstat data processing pipeline looks like this:

  1. Use lsns to get a list of all the net namespaces we can see.
  2. Use nsenter -t <pid> -n netstat to get a list of connections in each namespace, with their PID.
  3. Use docker to get a map from PIDs to Docker container labels, which include the Kubernetes namespace, pod, and container name.
  4. Match the PIDs from netstat with the PIDs from Docker, yielding a list of connections with their container identifiers.
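
Steps 2 and 4 can be sketched roughly as follows. This is a minimal illustration, not cnetstat's actual code: it assumes netstat is run with flags like `-tnp` (the exact flags are not specified above), and the names `Connection`, `parse_netstat`, and `attach_containers` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Connection(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str
    pid: Optional[int]  # None when netstat prints "-" instead of PID/program


def parse_netstat(output: str) -> List[Connection]:
    """Parse the TCP rows of `netstat -tnp`-style output (flags are an assumption)."""
    conns = []
    for line in output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # banner, column header, or non-TCP row
        pid_field = fields[6].split("/", 1)[0]
        pid = int(pid_field) if pid_field.isdigit() else None
        conns.append(Connection(fields[0], fields[3], fields[4], fields[5], pid))
    return conns


def attach_containers(
    conns: List[Connection], pid_to_container: Dict[int, str]
) -> List[Tuple[Connection, Optional[str]]]:
    """Step 4: pair each connection with its container identity, or None."""
    return [(c, pid_to_container.get(c.pid)) for c in conns]
```

Connections whose PID has no entry in the map (host processes, or rows with no PID) are kept and paired with `None` rather than dropped.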

Pid-to-pod translation

One of the key things cnetstat needs to do is translate host PIDs to Kubernetes container and pod names. We know of two ways of doing this pid-to-pod translation.

The Docker way:

  1. Use docker ps to get the Docker container ID and labels of every pod.
  2. Iterate through all pods, using docker inspect to get the root PID of each one.
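
A rough sketch of the Docker way is below. It makes two assumptions that this document does not confirm: the Kubernetes identity is taken from the `io.kubernetes.pod.namespace`, `io.kubernetes.pod.name`, and `io.kubernetes.container.name` labels, and, as a simplification, the labels are read from the `docker inspect` output rather than from `docker ps`. The names `ContainerIdentity`, `build_pid_map`, and `docker_pid_map` are hypothetical.

```python
import json
import subprocess
from typing import Dict, Iterable, NamedTuple

# Label keys assumed for this sketch; they are not named in this document.
NAMESPACE_LABEL = "io.kubernetes.pod.namespace"
POD_LABEL = "io.kubernetes.pod.name"
CONTAINER_LABEL = "io.kubernetes.container.name"


class ContainerIdentity(NamedTuple):
    namespace: str
    pod: str
    container: str


def build_pid_map(inspect_entries: Iterable[dict]) -> Dict[int, ContainerIdentity]:
    """Map each container's root PID to its Kubernetes identity."""
    pid_map = {}
    for entry in inspect_entries:
        pid = entry.get("State", {}).get("Pid", 0)
        labels = entry.get("Config", {}).get("Labels") or {}
        if pid <= 0 or POD_LABEL not in labels:
            continue  # stopped container, or one without the assumed pod label
        pid_map[pid] = ContainerIdentity(
            labels.get(NAMESPACE_LABEL, ""),
            labels[POD_LABEL],
            labels.get(CONTAINER_LABEL, ""),
        )
    return pid_map


def docker_pid_map() -> Dict[int, ContainerIdentity]:
    """Run docker ps and docker inspect, then build the PID map."""
    ids = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return {}
    raw = subprocess.run(
        ["docker", "inspect", *ids], capture_output=True, text=True, check=True
    ).stdout
    return build_pid_map(json.loads(raw))
```

Only the root PID of each container ends up in the map, which is the limitation of the Docker way discussed in this section.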

The cgroup way:

  1. Iterate through /sys/fs/cgroup/cpu,cpuacct/kubepods/... to get all Kubernetes pods and containers on the system, with all of their PIDs and their Kubernetes UIDs.
  2. Use docker ps to translate Kubernetes UIDs into Kubernetes container and pod names.
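
Step 1 of the cgroup way might look something like the sketch below. The directory layout is an assumption, not something this document specifies: pod directories are taken to be named `pod<uid>` somewhere under the root passed in, and each cgroup directory is taken to list its PIDs in a `cgroup.procs` file. Other cgroup drivers may name directories differently. The function name `pids_by_pod_uid` is hypothetical.

```python
import os
from typing import Dict, List


def pids_by_pod_uid(cgroup_root: str) -> Dict[str, List[int]]:
    """Walk a kubepods cgroup tree and collect every PID under each pod UID.

    Assumes pod directories are named "pod<uid>" and that each cgroup
    directory lists its processes in a cgroup.procs file.
    """
    result: Dict[str, List[int]] = {}
    for dirpath, _dirnames, filenames in os.walk(cgroup_root):
        if "cgroup.procs" not in filenames:
            continue
        # Look for a path component below the root that names a pod.
        rel = os.path.relpath(dirpath, cgroup_root)
        pod_parts = [p for p in rel.split(os.sep) if p.startswith("pod")]
        if not pod_parts:
            continue  # a cgroup above the pod level
        uid = pod_parts[0][len("pod"):]
        with open(os.path.join(dirpath, "cgroup.procs")) as f:
            pids = [int(line) for line in f if line.strip()]
        result.setdefault(uid, []).extend(pids)
    return result
```

Because the walk reads every `cgroup.procs` file under a pod, it returns all PIDs in the pod rather than only the root PID.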

They both require iterating over all pods in the system. The cgroup way has the advantage that it gives all PIDs in a pod, not just the root PID, and also that we could cache known UIDs. The Docker way has the advantage that it uses public interfaces, instead of implementation details.

We're using the Docker way because it seems easier for a proof-of-concept, but we are not committed to it for the long term.

Net namespaces

One important design point is that cnetstat builds its pid-to-pod mapping by talking to Docker, but it doesn't just iterate through connections from Docker-owned PIDs. Instead, it uses lsns to get a list of all net namespaces on a host, gets all connections from all of those namespaces, and then reports container identities for the PIDs that have them.

This is how cnetstat lists all connections, including those from the host or non-Docker container systems.
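
As an illustration, namespace enumeration could be sketched as below. It assumes `lsns` is invoked as `lsns -t net -n -o NS,PID` (one namespace inode and one PID per line) and that netstat takes `-tnp`. The exact invocations cnetstat uses are not stated here, and the helper names are hypothetical.

```python
from typing import Dict, List


def parse_lsns(output: str) -> Dict[str, int]:
    """Map each net namespace inode to one PID inside that namespace.

    Expects output shaped like `lsns -t net -n -o NS,PID` (an assumption).
    """
    ns_to_pid: Dict[str, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip blank rows and namespaces listed without a PID
        ns_to_pid.setdefault(fields[0], int(fields[1]))
    return ns_to_pid


def nsenter_netstat_argv(pid: int) -> List[str]:
    """Build the command that lists connections in the net namespace of pid."""
    # The netstat flags are an assumption; the nsenter form is from the pipeline.
    return ["nsenter", "-t", str(pid), "-n", "netstat", "-tnp"]
```

Running the resulting command once per namespace, rather than once per Docker-owned PID, is what lets host and non-Docker connections show up in the output.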

Future goals

Add an option to run in a loop and print connections periodically

This is required for some of the features below.

Track the owning process of TIME_WAIT connections

When a process closes a connection, the connection goes into the TIME_WAIT state. netstat no longer prints an associated PID for it (likely because the kernel no longer considers it associated with a PID), but we still want to know which process opened it so we can debug processes that open lots of short-lived connections. We just need to keep our own map from connections to PIDs and update it in the polling loop.
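
One possible shape for that map is sketched below. It is only an illustration of the idea: the class name `OwnerTracker`, the connection key layout, and the policy of forgetting connections once they vanish from a poll are all assumptions, not cnetstat's design.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical connection key: (proto, local address, remote address)
ConnKey = Tuple[str, str, str]


class OwnerTracker:
    """Remember the last PID seen for each connection across polls."""

    def __init__(self) -> None:
        self._owners: Dict[ConnKey, int] = {}

    def update(
        self, snapshot: Iterable[Tuple[ConnKey, Optional[int]]]
    ) -> Dict[ConnKey, Optional[int]]:
        """Take one poll's (key, pid-or-None) pairs and return resolved owners."""
        seen = set()
        resolved: Dict[ConnKey, Optional[int]] = {}
        for key, pid in snapshot:
            seen.add(key)
            if pid is not None:
                self._owners[key] = pid  # netstat still reports the owner
            # Fall back to the remembered owner when netstat printed no PID.
            resolved[key] = self._owners.get(key)
        # Forget connections that no longer appear, so the map stays bounded.
        for key in list(self._owners):
            if key not in seen:
                del self._owners[key]
        return resolved
```

A connection that is opened and reaches TIME_WAIT entirely between two polls is never seen with a PID, so this approach cannot attribute it.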

Use socket open/close events directly

Instead of using netstat to get the list of open connections every time, we should get the list of connections once, at startup, and then get a stream of socket open/close events from the kernel. We skipped this initially to get cnetstat working quickly, but it seems like the right approach for two reasons: the algorithmic complexity will be better, and we will be able to attribute every connection to a process, even a connection that is not open at the moment we poll.

Include a Kubernetes pod specification for running cnetstat as a daemonset

Support non-Docker container runtimes

We would gladly accept a pull request for pid-to-pod translation for a non-Docker container runtime.