Missing cAdvisor network metrics when using gVisor #6500

@karanthukral

Description

After upgrading gVisor from release 20210322.0 to 20210720, we noticed that some cAdvisor Prometheus container metrics (e.g. container_network_receive_packets_total and container_network_receive_bytes_total) were missing for all pods/containers running on gVisor.

While debugging, we tried to pin down the gVisor change that caused this. We went through the gVisor changelog and ran several different gVisor releases to find the one where the metrics stopped working. Release 20210518.0 was the first release in which the metrics stopped being collected; 20210510.0 was the last release from which we successfully received metrics through cAdvisor. Looking at the changelog between those two versions, we came across the commit in which the sandbox was updated to use the pod cgroup instead of the (first) container cgroup it had been using. cAdvisor is built to deliver per-container metrics, but that commit moves the sandbox (and its metrics) to the pod's cgroup, so cAdvisor stops emitting these metrics for all gVisor containers.

When I compared the sandbox config for my pod between the working and broken versions of gVisor, the only differences that stood out were in the cgroup section. Examples of the sandbox config:

working

"sandbox": {
    "id": "e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
    "pid": 342475,
    "cgroup": {
      "name": "/kubepods/burstable/pod308d8927-7e25-443c-8dad-d390f7023b0e/e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
      "parents": null,
      "own": {
        "blkio": true,
        "cpu": true,
        "cpuset": true,
        "devices": true,
        "freezer": true,
        "hugetlb": true,
        "memory": true,
        "net_prio": true,
        "perf_event": true,
        "pids": true,
        "rdma": true,
        "systemd": true
      }
    },
    "originalOomScoreAdj": -999
}

not working

"sandbox": {
    "id": "71e675350a4c07b79cae482e650c1beeb1e23488112d3b751838a9dc8e17399a",
    "pid": 359425,
    "cgroup": {
      "name": "/kubepods/burstable/pod2baf1dc0-3715-4153-a8c2-b46225e219cc",
      "parents": null,
      "own": {
        "devices": true,
        "rdma": true
      }
    },
    "originalOomScoreAdj": -999
}
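To make the difference between the two configs concrete, here is a small Python sketch that compares them. It is only an illustration: the dictionaries are trimmed copies of the configs above, the helper names (`owned_controllers`, `cgroup_level`) are made up for this example, and the path classification assumes the kubepods layout shown in these two configs.

```python
# Trimmed copies of the two sandbox configs quoted above.
WORKING = {
    "id": "e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
    "cgroup": {
        "name": "/kubepods/burstable/pod308d8927-7e25-443c-8dad-d390f7023b0e/"
                "e7b82bbf3ee2901aaec811ec59d807edaf6c6967884103ac418cfa3b032481da",
        "own": {
            "blkio": True, "cpu": True, "cpuset": True, "devices": True,
            "freezer": True, "hugetlb": True, "memory": True, "net_prio": True,
            "perf_event": True, "pids": True, "rdma": True, "systemd": True,
        },
    },
}
BROKEN = {
    "id": "71e675350a4c07b79cae482e650c1beeb1e23488112d3b751838a9dc8e17399a",
    "cgroup": {
        "name": "/kubepods/burstable/pod2baf1dc0-3715-4153-a8c2-b46225e219cc",
        "own": {"devices": True, "rdma": True},
    },
}


def owned_controllers(sandbox):
    """Return the set of cgroup controllers listed as true under "own"."""
    own = sandbox["cgroup"].get("own") or {}
    return {name for name, owned in own.items() if owned}


def cgroup_level(sandbox):
    """Classify the sandbox cgroup path as container-level or pod-level.

    Assumption: a container-level path ends in the sandbox id, and a
    pod-level path ends in a "pod<uid>" component, as in the configs above.
    """
    last = sandbox["cgroup"]["name"].rstrip("/").rsplit("/", 1)[-1]
    if last == sandbox["id"]:
        return "container"
    if last.startswith("pod"):
        return "pod"
    return "unknown"


if __name__ == "__main__":
    for label, sandbox in (("working", WORKING), ("not working", BROKEN)):
        print(label, "->", cgroup_level(sandbox), "cgroup")
    lost = owned_controllers(WORKING) - owned_controllers(BROKEN)
    print("controllers no longer listed under own:", sorted(lost))
```

In the working config the sandbox sits in a container-level cgroup; in the broken one it sits directly in the pod cgroup and lists only `devices` and `rdma` under `own`.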

Steps to reproduce

To reproduce the bug, you need a running k8s cluster with cAdvisor metrics enabled as part of the kubelet. We use Prometheus to scrape the metrics, but to test this issue it is enough to run a curl command against the kubelet on the node where the gVisor pod is running. These steps assume you are able to authenticate requests against the kubelet on that node.

  1. Run release 20210518.0 or newer of the gVisor containerd shim and runsc.
  2. Deploy a pod using gVisor on k8s that is able to receive and/or respond to network calls. I used sample-golang-notes.
  3. Run GET requests against your running app.
  4. Run curl -H "authorization: Bearer $TOKEN" -k https://KUBELET_IP/metrics/cadvisor and search/grep for the container_network_receive_bytes_total or container_network_receive_packets_total metrics associated with your namespace. (Use double quotes around the header so the shell expands $TOKEN.)

You should see that the mentioned metrics are missing from the response. If you downgrade gVisor to release 20210510.0 or older and follow the same steps, the cAdvisor metrics should start coming through again.
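The grep in step 4 can also be scripted. Below is a minimal Python sketch that takes the text returned by /metrics/cadvisor and reports which of the two network metrics have no series for a given namespace. The function names and the sample text are hypothetical, and the sketch assumes the series carry a `namespace` label; adjust the label name if your kubelet exposes something different.

```python
import re

NETWORK_METRICS = (
    "container_network_receive_bytes_total",
    "container_network_receive_packets_total",
)

# Matches label pairs such as namespace="demo" in a Prometheus text line.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def network_series(metrics_text, namespace):
    """Collect the label sets of network samples that belong to one namespace."""
    found = {name: [] for name in NETWORK_METRICS}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, brace, rest = line.partition("{")
        if not brace or name not in found:
            continue  # not one of the metrics we care about
        label_part, _, _value = rest.rpartition("}")
        labels = dict(LABEL_RE.findall(label_part))
        if labels.get("namespace") == namespace:
            found[name].append(labels)
    return found


def missing_metrics(metrics_text, namespace):
    """Return the network metric names with no sample for the namespace."""
    series = network_series(metrics_text, namespace)
    return sorted(name for name, samples in series.items() if not samples)


# Synthetic sample for illustration only, not real kubelet output.
SAMPLE = """\
# TYPE container_network_receive_bytes_total counter
container_network_receive_bytes_total{interface="eth0",namespace="demo",pod="notes-abc"} 2048
container_cpu_usage_seconds_total{namespace="demo",pod="notes-abc"} 1.5
"""

if __name__ == "__main__":
    print(missing_metrics(SAMPLE, "demo"))
```

To use it against a real node, save the curl output from step 4 to a file and pass the file contents to `missing_metrics` together with your namespace. On an affected gVisor release, both metric names should come back as missing.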

runsc version

> runsc --version
runsc version release-20210518.0
spec: 1.0.2

docker version (if using docker)

No response

uname

Linux NODE 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux

kubectl (if using Kubernetes)

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:53:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Labels

type: bug
