
Built-in K3s Containerd doesn't report OOM events for cgroups v2 #4572

Closed · 1 task
ghost opened this issue Nov 24, 2021 · 2 comments


ghost commented Nov 24, 2021

Environmental Info:
K3s Version:
k3s version v1.21.6+k3s1 (df033fa)
go version go1.16.8

Node(s) CPU architecture, OS, and Version:
Linux hostname 5.11.0-1017-aws #18~20.04.1-Ubuntu SMP Fri Aug 27 11:21:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
K3s v1.21.6+k3s1 cluster with 3 servers and 5 agents; all servers are using Linux cgroups v2.
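
As a quick sanity check (not from the original report), the cgroup version in use on a node can be confirmed with:

stat -fc %T /sys/fs/cgroup/
# prints "cgroup2fs" on the unified cgroup v2 hierarchy, "tmpfs" on cgroups v1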

Describe the bug:
When a process inside a pod is killed due to OOM, containerd doesn't report OOM events. This affects only systems using cgroups v2; with v1 it works as expected.

Steps To Reproduce:

  • Installed K3s:
    k3s-agent systemd service:
[Unit]
Description=Lightweight Kubernetes
Documentation=https://k3s.io
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target

[Service]
Type=exec
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/etc/systemd/system/k3s-agent.service.env
KillMode=process
Delegate=yes
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/k3s \
    agent \
        '-c' \
        '/etc/rancher/k3s/config.yaml' \
        '--server' \
        'https://master:6443' \

/etc/rancher/k3s/config.yaml:

no-flannel: true
node-name: HOSTNAME
kubelet-arg:
- eviction-hard=imagefs.available<5%,nodefs.available<5%,memory.available<5%
- eviction-soft=imagefs.available<10%,nodefs.available<10%,memory.available<10%
- eviction-soft-grace-period=imagefs.available=5m,nodefs.available=5m,memory.available=5m
- cloud-provider=external
- "provider-id=aws:///us-east-1b/i-111111111"

node-label:
- "group-name=worker-group"
- "node-type=worker"
  • Create a deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress-oom-crasher
  labels:
    app: stress-oom-crasher
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress-oom-crasher
  template:
    metadata:
      labels:
        app: stress-oom-crasher
    spec:
      containers:
      - name: stress-oom-tester
        image: python:3.9.9
        command: ["/bin/sleep", "3650d"]
        resources:
          limits:
            memory: "123Mi"
            cpu: 50m
          requests:
            memory: "123Mi"
            cpu: 50m
  • Inside the stress-oom-crasher pod, execute the Python code below to cause an OOM event (see the usage sketch after the snippet):
# grow a list until the container exceeds its 123Mi memory limit
l = []
new_list4k = [0] * 4096
while True:
    l.extend(new_list4k)
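
One way to run the snippet without an interactive shell (a hedged usage sketch; it assumes the deployment above and kubectl access to the cluster):

kubectl exec -i deploy/stress-oom-crasher -- python3 - <<'EOF'
l = []
new_list4k = [0] * 4096
while True:
    l.extend(new_list4k)
EOF
# the container should be OOM-killed once it exceeds its 123Mi limit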

Expected behavior:
On the node where the stress-oom-crasher pod is running, ctr events should show OOM events, e.g.

2021-11-24 10:59:04.757581973 +0000 UTC k8s.io /tasks/oom {"container_id":"3166ec37d31ee3089e272d6f3261585786fdcdc41d3cda4a3aac3ebd2b324586"}
2021-11-24 10:59:04.75831734 +0000 UTC k8s.io /tasks/oom {"container_id":"75c684a3665b008f1037324c7511150fe6cfad0b14d79d5030fda0130c59478f"}

Actual behavior:
There are no OOM events in the output of the ctr events command.
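
For reference, on a k3s node the embedded containerd can be watched with the bundled ctr (a hedged note; the socket path assumes k3s's default embedded containerd):

k3s ctr events
# or, with a standalone ctr binary pointed at k3s's containerd socket:
ctr -a /run/k3s/containerd/containerd.sock events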

Additional context / logs:
I noticed that if I run a container manually, e.g.
ctr run -t --memory-limit=126000000 docker.io/library/python:3.9.9 test_oom bash
and trigger an OOM, the expected /tasks/oom event is shown in the output of ctr events.
In this case the corresponding cgroup is created under /sys/fs/cgroup/k8s.io/, whereas when a container is created by k3s its cgroup is created under /sys/fs/cgroup/kubepods/.
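
A quick way to check which cgroup hierarchy a given container landed in (a hedged sketch, not from the original report; <container-id> and <pid> are placeholders taken from the crictl output):

# list the container and find its PID (uses k3s's bundled crictl)
k3s crictl ps --name stress-oom-tester
k3s crictl inspect <container-id> | grep -i '"pid"'
# then inspect the process's cgroup path:
cat /proc/<pid>/cgroup
# per the observation above, kubelet-managed containers show a path under kubepods...,
# while containers started directly with ctr show one under k8s.io/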

Backporting

  • Needs backporting to older releases
@brandond (Member)

Have you tested to see if this behavior is unique to our packaging of containerd? Can you reproduce the same behavior with upstream containerd 1.4 when using cgroupv2?
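
A minimal way to test that (a hedged sketch along the lines of the ctr command above, but run against a stock containerd 1.4 install on a cgroup v2 host rather than k3s's embedded one):

ctr image pull docker.io/library/python:3.9.9
ctr run -t --memory-limit=126000000 docker.io/library/python:3.9.9 test_oom bash
# trigger the OOM inside the container, then in another shell watch for /tasks/oom:
ctr events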

stale bot commented May 23, 2022

This repository uses a bot to automatically label issues which have not had any activity (commit/comment/label) for 180 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the bot can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the bot will automatically close the issue in 14 days. Thank you for your contributions.

stale bot added the status/stale label May 23, 2022
stale bot closed this as completed Jun 6, 2022