Enhancement idea: Include cgroup name for OOMKilling to help identify the pod #766

@kposborne

This would make it easier to work out what happened. Today I had to go to the dmesg logs to find out which pod had a process OOM-killed, because multiple pods on this node could be running a kubectl process. The dmesg output includes the cgroup name, which contains the pod UID.
Example event:

{
  "verb": "ADDED",
  "event": {
    "metadata": {
      "name": "<node_name>.17683ea4fced9353",
      "namespace": "default",
      "uid": "f5f30161-50d1-4d64-a93f-cb258da8f01a",
      "resourceVersion": "18283211",
      "creationTimestamp": "2023-06-13T14:35:38Z",
      "managedFields": [
        {
          "manager": "node-problem-detector",
          "operation": "Update",
          "apiVersion": "v1",
          "time": "2023-06-13T14:35:38Z"
        }
      ]
    },
    "involvedObject": {
      "kind": "Node",
      "name": "<node_name>",
      "uid": "<node_name>"
    },
    "reason": "OOMKilling",
    "message": "Memory cgroup out of memory: Killed process 3504804 (kubectl) total-vm:753508kB, anon-rss:58776kB, file-rss:29400kB, shmem-rss:0kB, UID:0 pgtables:288kB oom_score_adj:999",
    "source": {
      "component": "kernel-monitor",
      "host": "<node_name>"
    },
    "firstTimestamp": "2023-06-13T14:35:38Z",
    "lastTimestamp": "2023-06-13T14:35:38Z",
    "count": 1,
    "type": "Warning",
    "eventTime": null,
    "reportingComponent": "",
    "reportingInstance": ""
  }
}
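
To illustrate what the event currently carries, here is a minimal Python sketch that pulls the PID and process name out of the `message` field shown above. The regular expression and the helper name `parse_killed_process` are assumptions made for illustration; they are not part of node-problem-detector. Note that neither value identifies the pod when several pods run the same binary.

```python
import re

# Message text copied from the example event above.
MESSAGE = (
    "Memory cgroup out of memory: Killed process 3504804 (kubectl) "
    "total-vm:753508kB, anon-rss:58776kB, file-rss:29400kB, shmem-rss:0kB, "
    "UID:0 pgtables:288kB oom_score_adj:999"
)

# Hypothetical pattern: matches "Killed process <pid> (<command>)".
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)")


def parse_killed_process(message):
    """Return (pid, command) from an OOMKilling message, or None if no match."""
    match = KILLED_RE.search(message)
    if match is None:
        return None
    return int(match.group("pid")), match.group("comm")


print(parse_killed_process(MESSAGE))
```

The PID can be used to find the matching `oom-kill:` line in dmesg, but the event alone gives no pod or container reference.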

The dmesg logs include the pod UID in the task_memcg field:

[Tue Jun 13 14:35:37 2023] Memory cgroup stats for /kubepods/burstable/pod28deef67-f194-4a8d-8c74-13b355109e91:
[Tue Jun 13 14:35:38 2023] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=fb90f9be7bb1086237c30b31138abaed4a8904e9b8ab9c1d3058540a74a4e794,mems_allowed=0,oom_memcg=/kubepods/burstable/pod28deef67-f194-4a8d-8c74-13b355109e91,task_memcg=/kubepods/burstable/pod28deef67-f194-4a8d-8c74-13b355109e91/fb90f9be7bb1086237c30b31138abaed4a8904e9b8ab9c1d3058540a74a4e794,task=kubectl,pid=3504804,uid=0
[Tue Jun 13 14:35:38 2023] Memory cgroup out of memory: Killed process 3504804 (kubectl) total-vm:753508kB, anon-rss:58776kB, file-rss:29400kB, shmem-rss:0kB, UID:0 pgtables:288kB oom_score_adj:999
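
As a rough sketch of the extraction being requested, the following Python snippet reads the `task_memcg` value from the `oom-kill:` line and pulls the pod UID out of it. It assumes the cgroupfs-style path layout seen in this log (`/kubepods/<qos>/pod<uid>/<container-id>`); other cgroup drivers may format the path differently. The function name and both regular expressions are hypothetical and do not reflect how node-problem-detector works.

```python
import re

# The oom-kill line from the dmesg output above, without the timestamp.
OOM_KILL_LINE = (
    "oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),"
    "cpuset=fb90f9be7bb1086237c30b31138abaed4a8904e9b8ab9c1d3058540a74a4e794,"
    "mems_allowed=0,"
    "oom_memcg=/kubepods/burstable/pod28deef67-f194-4a8d-8c74-13b355109e91,"
    "task_memcg=/kubepods/burstable/pod28deef67-f194-4a8d-8c74-13b355109e91"
    "/fb90f9be7bb1086237c30b31138abaed4a8904e9b8ab9c1d3058540a74a4e794,"
    "task=kubectl,pid=3504804,uid=0"
)

# Assumption: fields are comma-separated and the cgroup path contains no commas.
TASK_MEMCG_RE = re.compile(r"task_memcg=(?P<cgroup>[^,\s]+)")
# Assumption: the pod directory is "pod" followed by a lowercase, dashed UUID.
POD_UID_RE = re.compile(
    r"/pod(?P<uid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})(?:/|$)"
)


def pod_uid_from_oom_kill(line):
    """Return (task_memcg, pod_uid) from an oom-kill line.

    Returns None if the line has no task_memcg field; pod_uid is None
    if the cgroup path does not contain a recognizable pod directory.
    """
    cgroup_match = TASK_MEMCG_RE.search(line)
    if cgroup_match is None:
        return None
    cgroup = cgroup_match.group("cgroup")
    uid_match = POD_UID_RE.search(cgroup)
    return cgroup, (uid_match.group("uid") if uid_match else None)


cgroup, pod_uid = pod_uid_from_oom_kill(OOM_KILL_LINE)
print(pod_uid)
```

The UID can then be compared against `metadata.uid` of the pods on the node, for example with `kubectl get pods -A -o custom-columns=NAME:.metadata.name,UID:.metadata.uid`.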

Metadata


Labels

lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
