The cnetstat data processing pipeline looks like this:
- Use `lsns` to get a list of all the net namespaces we can see.
- Use `nsenter -t <pid> -n netstat` to get a list of connections in each namespace, with their PID.
- Use `docker` to get a map from PIDs to Docker container labels, which include the Kubernetes namespace, pod, and container name.
- Match the PIDs from netstat with the PIDs from Docker, yielding a list of connections with their container identifiers.
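The first two steps can be sketched in Python. This is an assumption: this section doesn't say what language cnetstat is written in. The sketch also assumes `lsns -t net -n -o NS,PID`, which lists only net namespaces, without a header, as an inode number and a PID. It assumes `netstat -tnp` as the netstat flags: TCP only, numeric addresses, with PIDs. The helper names `run`, `parse_lsns`, `netstat_command`, and `LSNS_COMMAND` are hypothetical.

```python
import subprocess
from typing import Dict, List

# Assumed lsns invocation: net namespaces only, no header, columns NS and PID.
LSNS_COMMAND = ["lsns", "-t", "net", "-n", "-o", "NS,PID"]


def run(command: List[str]) -> str:
    """Run a command and return its stdout; raises if it exits non-zero."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def parse_lsns(output: str) -> Dict[int, int]:
    """Map each net namespace inode to one PID that lives in it."""
    namespaces: Dict[int, int] = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ns_inode, pid = int(fields[0]), int(fields[1])
        namespaces.setdefault(ns_inode, pid)  # one PID per namespace is enough
    return namespaces


def netstat_command(pid: int) -> List[str]:
    """Build the argv that runs netstat inside the net namespace of `pid`."""
    return ["nsenter", "-t", str(pid), "-n", "netstat", "-tnp"]
```

A caller would loop over `parse_lsns(run(LSNS_COMMAND)).items()` and hand each `run(netstat_command(pid))` result to the matching step. Because `nsenter -n` switches only the network namespace, netstat still reads the host's `/proc`. The PIDs it prints are therefore host PIDs, which is what lets them be matched against Docker's PIDs.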
One of the key things cnetstat needs to do is translate host PIDs to Kubernetes container and pod names. We know of two ways of doing this pid-to-pod translation.
The Docker way:
- Use `docker ps` to get the Docker container ID and labels of every container (each pod runs as one or more Docker containers).
- Iterate through all containers, using `docker inspect` to get the root PID of each one.
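Here is a minimal Python sketch of the Docker way. It assumes the container JSON comes from `docker inspect`. For example, `docker inspect $(docker ps -q)` prints a JSON array with one object per container. The sketch reads the root PID from `State.Pid` and the Kubernetes identity from `Config.Labels`. The label keys `io.kubernetes.pod.namespace`, `io.kubernetes.pod.name`, and `io.kubernetes.container.name` are the ones Docker-based Kubernetes nodes are expected to set, but treat them as an assumption to check on your own nodes. `ContainerId` and `pid_map_from_inspect` are hypothetical names.

```python
import json
from typing import Dict, NamedTuple

# Assumed label keys on Docker containers started by Kubernetes.
NAMESPACE_LABEL = "io.kubernetes.pod.namespace"
POD_LABEL = "io.kubernetes.pod.name"
CONTAINER_LABEL = "io.kubernetes.container.name"


class ContainerId(NamedTuple):
    namespace: str
    pod: str
    container: str


def pid_map_from_inspect(inspect_json: str) -> Dict[int, ContainerId]:
    """Map each running Kubernetes container's root PID to its identity."""
    result: Dict[int, ContainerId] = {}
    for container in json.loads(inspect_json):
        pid = container.get("State", {}).get("Pid", 0)
        labels = container.get("Config", {}).get("Labels") or {}
        # Pid is 0 for stopped containers; skip those and non-Kubernetes containers.
        if pid == 0 or POD_LABEL not in labels:
            continue
        result[pid] = ContainerId(
            labels.get(NAMESPACE_LABEL, ""),
            labels[POD_LABEL],
            labels.get(CONTAINER_LABEL, ""),
        )
    return result
```

Only the root PID of each container ends up in the map, so connections owned by that process's children would not be matched.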
The cgroup way:
- Iterate through `/sys/fs/cgroup/cpu,cpuacct/kubepods/...` to get all Kubernetes pods and containers on the system, with all of their PIDs and their Kubernetes UIDs.
- Use `docker ps` to translate Kubernetes UIDs into Kubernetes container and pod names.
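A sketch of the cgroup walk follows, in Python. It assumes the cgroupfs driver layout `<root>/[<qos class>/]pod<uid>/<container id>/cgroup.procs`, where `<root>` is the `kubepods` directory. The systemd cgroup driver names these directories differently, as `.slice`/`.scope` units, so the pattern would need adjusting there. `pids_from_cgroups` is a hypothetical name. Its result would still need `docker ps` to turn pod UIDs and container IDs into names.

```python
import os
import re
from typing import Dict, Tuple

# Assumed cgroupfs naming for pod directories: "pod" followed by the pod UID.
POD_DIR = re.compile(r"^pod([0-9a-f-]+)$")


def pids_from_cgroups(root: str) -> Dict[int, Tuple[str, str]]:
    """Map every PID in a container cgroup under `root` to (pod UID, container ID)."""
    result: Dict[int, Tuple[str, str]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cgroup.procs" not in filenames:
            continue
        parts = os.path.relpath(dirpath, root).split(os.sep)
        # Container cgroups sit directly inside a pod<uid> directory;
        # pod-level and QoS-level cgroups fail this check and are skipped.
        if len(parts) < 2:
            continue
        pod_match = POD_DIR.match(parts[-2])
        if pod_match is None:
            continue
        container_id = parts[-1]
        with open(os.path.join(dirpath, "cgroup.procs")) as procs:
            for line in procs:
                line = line.strip()
                if line:
                    result[int(line)] = (pod_match.group(1), container_id)
    return result
```

Unlike the Docker way, this collects every PID in each container, not just the root PID.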
Both approaches require iterating over every pod on the system. The cgroup way has two advantages: it yields every PID in a pod, not just the root PID, and it would let us cache translations for UIDs we have already seen. The Docker way has the advantage that it uses public interfaces rather than implementation details such as the cgroup directory layout.
We're using the Docker way because it seems easier for a proof-of-concept, but we are not committed to it for the long term.
One important design point is that cnetstat builds its pid-to-pod mapping by talking to Docker, but it doesn't just iterate through connections from Docker-owned PIDs. Instead, it uses `lsns` to get a list of all net namespaces on a host, gets all connections from all of those namespaces, and then reports container identities for the PIDs that have them.
This is how cnetstat lists all connections, including those from the host or non-Docker container systems.
This is required for some of the features below.
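Here is a sketch of that left join, assuming Python and `netstat -tnp` output. Every parsed connection is kept. A connection gets `-` as its identity when its PID has no entry in the PID-to-container map, such as a host process, or when netstat shows `-` in the PID column. `Connection`, `parse_netstat`, and `attribute` are hypothetical names. For brevity, the container identity is a plain string like `namespace/pod/container`.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Connection(NamedTuple):
    proto: str
    local: str
    remote: str
    state: str
    pid: Optional[int]  # None when netstat prints "-" (e.g. TIME_WAIT sockets)


def parse_netstat(output: str) -> List[Connection]:
    """Parse `netstat -tnp` output into Connection records, skipping headers."""
    connections: List[Connection] = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines and non-TCP rows
        pid_field = fields[6]  # "1234/nginx" or "-"
        pid = int(pid_field.split("/", 1)[0]) if pid_field[0].isdigit() else None
        connections.append(Connection(fields[0], fields[3], fields[4], fields[5], pid))
    return connections


def attribute(connections: List[Connection],
              pid_to_container: Dict[int, str]) -> List[Tuple[Connection, str]]:
    """Pair every connection with its container identity, or "-" if it has none.

    This is a left join: host connections and connections from PIDs that
    Docker doesn't know about are kept, just without a container identity.
    """
    return [
        (c, pid_to_container.get(c.pid, "-") if c.pid is not None else "-")
        for c in connections
    ]
```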
When a process actively closes a TCP connection, the socket goes into state `TIME_WAIT`. netstat no longer prints an associated PID, because the socket is no longer attached to any process's open file descriptors. However, we still want to know which process opened it, so we can debug processes that open lots of short-lived connections. We just need to keep our own map from connections to PIDs and update it in the polling loop.
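A minimal Python sketch of that map follows. It assumes each poll yields `(key, pid)` pairs, where `key` is `(proto, local address, remote address)` and `pid` is `None` when netstat printed `-`. `PidTracker` is a hypothetical name. A PID seen while the connection was open is remembered through `TIME_WAIT`. Entries are dropped once a connection disappears from a poll, so the map stays bounded.

```python
from typing import Dict, Iterable, Optional, Tuple

Key = Tuple[str, str, str]  # (proto, local address, remote address)


class PidTracker:
    """Remembers which PID owned each connection across polls."""

    def __init__(self) -> None:
        self._owner: Dict[Key, int] = {}

    def update(self, snapshot: Iterable[Tuple[Key, Optional[int]]]) -> Dict[Key, Optional[int]]:
        """Record one poll and return each connection's best-known PID."""
        seen: Dict[Key, Optional[int]] = {}
        for key, pid in snapshot:
            if pid is not None:
                self._owner[key] = pid  # netstat knows the owner right now
            seen[key] = self._owner.get(key)  # else fall back to what we remembered
        # Forget connections that vanished entirely, so the map doesn't grow forever.
        for key in list(self._owner):
            if key not in seen:
                del self._owner[key]
        return seen
```

A connection that opens and closes between two polls is first seen in `TIME_WAIT`, so a polling tracker like this cannot attribute it.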
Instead of running netstat to list every open connection on each poll, we should get the list of connections once, at startup, and then get a stream of socket open/close events from the kernel. We didn't do this initially for the sake of getting cnetstat working quickly, but it does seem like the right thing to do for two reasons. First, the work per update becomes proportional to the number of socket events rather than the number of open connections. Second, it ensures that we can attribute every connection to a process, even one that opens and closes between two polls.
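The bookkeeping side of that design can be sketched in Python independently of where the events come from. `SocketEvent` and `ConnectionTable` are hypothetical names, and this sketch does not bind to any real kernel API. One possible source is the `sock:inet_sock_set_state` tracepoint, read via eBPF. The sketch also assumes the event source can report the owning PID. That is not guaranteed, since some TCP state changes are processed while handling incoming packets, outside the owning process.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (local address, remote address)


@dataclass
class SocketEvent:
    """Hypothetical event record; a real kernel event source would fill these in."""
    key: Key
    state: str          # e.g. "ESTABLISHED", "TIME_WAIT", or "CLOSE"
    pid: Optional[int]  # owning PID if the source knows it, else None


class ConnectionTable:
    """Connection state seeded from one startup snapshot, then kept current by events."""

    def __init__(self, snapshot: Dict[Key, Tuple[str, Optional[int]]]) -> None:
        self.conns: Dict[Key, Tuple[str, Optional[int]]] = dict(snapshot)

    def apply(self, event: SocketEvent) -> None:
        """Apply one event in O(1), instead of rescanning every connection."""
        if event.state == "CLOSE":
            # The socket is gone entirely; forget it.
            self.conns.pop(event.key, None)
            return
        # Keep the previously known PID if this event doesn't carry one,
        # so TIME_WAIT sockets stay attributed to the process that opened them.
        _, old_pid = self.conns.get(event.key, ("", None))
        pid = event.pid if event.pid is not None else old_pid
        self.conns[event.key] = (event.state, pid)
```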
We would gladly accept a pull request for pid-to-pod translation for a non-Docker container runtime.